List<ComparisonResult> -> Map<ModelID, EloScore>
Calculate relative Elo rating changes for LLMs based on win/loss outcomes from blind A/B comparisons.
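A minimal sketch of that mapping, using the standard Elo expected-score and update formulas. The field names on ComparisonResult, the K-factor of 32, the starting rating of 1000, and the handling of ties as half a win are all assumptions made for illustration; the card itself only specifies the input and output types.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ComparisonResult:
    # Hypothetical shape: the card does not define these fields.
    model_a: str
    model_b: str
    winner: str  # "a", "b", or "tie" (tie support is an assumption)


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def compute_elo(
    results: List[ComparisonResult],
    k: float = 32.0,         # assumed K-factor
    initial: float = 1000.0  # assumed starting rating
) -> Dict[str, float]:
    """Apply sequential Elo updates, one per blind A/B comparison."""
    actual = {"a": 1.0, "b": 0.0, "tie": 0.5}
    ratings: Dict[str, float] = {}
    for res in results:
        r_a = ratings.setdefault(res.model_a, initial)
        r_b = ratings.setdefault(res.model_b, initial)
        # Rating change is proportional to (actual - expected) for model A;
        # model B receives the opposite change, so the total is conserved.
        delta = k * (actual[res.winner] - expected_score(r_a, r_b))
        ratings[res.model_a] = r_a + delta
        ratings[res.model_b] = r_b - delta
    return ratings
```

Because the updates are applied one comparison at a time, the final ratings depend on the order of the input list. Shuffling the results and averaging over several passes is one common way to reduce that sensitivity.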
Problem it solves
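Subjective human judgments of generated text are hard to quantify consistently. Reducing each judgment to a pairwise win or loss lets the outcomes be aggregated into one comparable score per model.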
Consumes
Emits
The real projects this mechanism was found in. Attribution is the point — this is how the best teams actually do it.