Chess rating system
A chess rating system is a system used in chess to estimate the strength of a player, based on their performance versus other players. Such systems are used by organizations such as FIDE, the US Chess Federation (USCF or US Chess), the International Correspondence Chess Federation, and the English Chess Federation. Most systems recalculate ratings after a tournament or match, but some recalculate ratings after individual games. Popular online chess sites such as Chess.com, Lichess, and the Internet Chess Club also implement rating systems. In almost all systems, a higher number indicates a stronger player. In general, players' ratings go up if they perform better than expected and down if they perform worse than expected. The magnitude of the change depends on the ratings of their opponents. The Elo rating system is currently the most widely used (though it has many variations and improvements). Elo-like rating systems have been adopted in many other contexts, such as other games like Go, online competitive gaming, and dating apps.[1]
The first modern rating system was used by the Correspondence Chess League of America in 1939. Soviet player Andrey Khachaturov proposed a similar system in 1946.[2] The first one that made an impact on international chess was the Ingo system in 1948. The USCF adopted the Harkness system in 1950. Shortly after, the British Chess Federation started using a system devised by Richard W. B. Clarke. The USCF switched to the Elo rating system in 1960, which was adopted by FIDE in 1970.[3]
Ingo system
This was the system of the West German Chess Federation from 1948 until 1992, designed by Anton Hoesslinger and published in 1948. It was replaced by an Elo-based system, the Deutsche Wertungszahl. It influenced some other rating systems.
New players receive a high, fixed starting score. A player's new rating is based on the average rating of the entrants to their competition: one point is subtracted from that average for each percentage point by which the player's score exceeds 50%, and one point is added for each percentage point below 50% (e.g. a 12–4 or 24–8 wins-to-losses result is a 75% tournament outcome, so 25 points are subtracted). Every player is thus completely recalibrated after each tournament, and the most a player can gain or lose relative to the tournament average is 50 points (by winning or losing every game). Unlike other modern, nationally used chess systems, lower numbers indicate better performance.[4]
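Expressed as a short sketch based on the description above (the function name is illustrative):

```python
def ingo_new_rating(avg_entrant_rating: float, points: float, games: int) -> float:
    """Ingo: lower is better, so each percentage point above 50%
    subtracts one point from the tournament entrants' average."""
    percentage = 100.0 * points / games
    return avg_entrant_rating - (percentage - 50.0)

# A 75% result (12-4) against a field averaging 120 gives 120 - 25 = 95:
print(ingo_new_rating(120, 12, 16))  # 95.0
```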
Harkness system
This system was devised by tournament organizer Kenneth Harkness, who introduced it in Chess Review in 1942 and fully expounded it in articles published 14 years later, in 1956. It was used by the USCF from 1950 to 1960 and also by other organizations.
When players compete in a tournament, the average rating of their competition is calculated. If a player scores 50%, they receive the average competition rating as their performance rating. If they score more than 50%, their new rating is the competition average plus 10 points per percentage point exceeding 50. If they score less, their new rating is the competition average minus 10 points per percentage point shy of 50.[5]
Example
A player with a rating of 1600 plays in an eleven-round tournament and scores 2½–8½ (22.7%) against competition with an average rating of 1850. This is 27.3% below 50% (50–22.7%), so their new rating is 1850 − (10 × 27.3) = 1577.[6]
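The calculation is simple enough to express directly; a minimal Python sketch reproducing the example above (names are illustrative):

```python
def harkness_new_rating(avg_opponent_rating: float, points: float, games: int) -> float:
    """New rating = competition average, plus or minus 10 points per
    percentage point above or below a 50% score."""
    percentage = 100.0 * points / games
    return avg_opponent_rating + 10.0 * (percentage - 50.0)

# 2.5 points from 11 rounds is about 22.7%, i.e. 27.3 points below 50%:
print(round(harkness_new_rating(1850, 2.5, 11)))  # 1577
```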
English Chess Federation system
The ECF grading system was used by the English Chess Federation until 2020. It was published in 1958 by Richard W. B. Clarke. Every game won, lost or drawn in a registered competition (including English congresses, local and county leagues, and registered, approved team events) contributes to a player's grade, so each game has a large potential effect, but no single result takes immediate effect: results are averaged into a personal grade (ECF Grade) over a cycle of at least 30 games.
A player's contributing score for each game is taken to be their opponent's grade, except that an opponent more than 40 points above or below the player is treated as being exactly 40 points away. This figure is then adjusted by adding 50 points for a win, subtracting 50 points for a loss, and making no adjustment for a draw. Negative grades are treated as nil, so a personal grade of 50 arose quickly in the lower leagues, and experienced novices aspire to a grade of 100. The cyclical averaging and the persistence of grades across cycles are the system's hallmarks. The maximum gain in a single cycle is 90 points (an opponent capped at 40 points above, plus 50 for the win), which would require beating much higher-graded opponents in every game; the opposite applies to losses.
To convert between ECF grades and Elo ratings, the formula Elo = (ECF × 7.5) + 700 was sometimes used.[7]
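A sketch of the per-game scoring and the conversion, assuming the 40-point cap and ±50 adjustments described above (helper names are illustrative):

```python
def ecf_game_points(own_grade: float, opp_grade: float, result: str) -> float:
    """Points contributed by one game: the opponent's grade, treated as at
    most 40 points above or below one's own, then +50 for a win, -50 for a
    loss, unchanged for a draw. Games are averaged over a 30+ game cycle."""
    capped = min(max(opp_grade, own_grade - 40), own_grade + 40)
    return capped + {"win": 50, "draw": 0, "loss": -50}[result]

def ecf_to_elo(ecf_grade: float) -> float:
    """Approximate conversion sometimes used: Elo = (ECF * 7.5) + 700."""
    return ecf_grade * 7.5 + 700

# A 120-graded player beating a 160-graded opponent banks 160 + 50 = 210:
print(ecf_game_points(120, 160, "win"))  # 210
print(ecf_to_elo(100))                   # 1450.0
```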
Elo rating system
The Elo system was invented by Arpad Elo and is the most common rating system. It is used by FIDE, other organizations and some chess websites such as the Internet Chess Club and chess24.com. Elo once stated that the process of rating players was in any case rather approximate; he compared it to "the measurement of the position of a cork bobbing up and down on the surface of agitated water with a yard stick tied to a rope and which is swaying in the wind".[8] Any attempt to consolidate all aspects of a player's strength into a single number inevitably misses some of the picture.
FIDE divides its tournaments into categories according to the average rating of the players. Each category is 25 rating points wide: category 1 is for an average rating of 2251 to 2275, category 2 is 2276 to 2300, and so on. For women's tournaments, the categories are 200 rating points lower, so a women's category 1 covers an average rating of 2051 to 2075, etc.[9]
| Rating range | Category |
|---|---|
| 2600+ | No formal title, but sometimes informally known as "super grandmasters"[11] |
| 2500–2599 | Grandmasters (GM) |
| 2400–2499 | International Masters (IM) |
| 2300–2399 | FIDE Masters (FM) |
| 2200–2299 | Candidate Masters (CM) |
| 2000–2199 | Experts |
| 1800–1999 | Class A, category 1 |
| 1600–1799 | Class B, category 2 |
| 1400–1599 | Class C, category 3 |
| 1200–1399 | Class D, category 4 |
| 1000–1199 | Class E, category 5 |
| Below 1000 | Novices |
The USCF uses its own modification of the Elo system, in which the K-factor varies and bonus points are awarded for superior performance in a tournament.[12] USCF ratings are generally 50 to 100 points higher than their FIDE equivalents.[13]
| Category | Rating range |
|---|---|
| Senior master | 2400 and up |
| National master | 2200–2399 |
| Expert | 2000–2199 |
| Class A | 1800–1999 |
| Class B | 1600–1799 |
| Class C | 1400–1599 |
| Class D | 1200–1399 |
| Class E | 1000–1199 |
| Class F | 800–999 |
| Class G | 600–799 |
| Class H | 400–599 |
| Class I | 200–399 |
| Class J | 100–199 |
Example
Elo gives an example of amending the rating of Lajos Portisch, a 2635-rated player before his tournament, who scores 10½ points out of a possible 16 (one game against each of 16 opponents). First, the rating difference between Portisch and each opponent is recorded. Then the expected score against each opponent is determined from a table, which lists the expected score for every band of rating difference. For instance, one opponent was Vlastimil Hort, who was rated 2600. The rating difference of 35 points gave Portisch an expected score of 0.55 against him. No single game can actually end 0.55 (only 0, ½ or 1 are possible), but because the expectation is above 0.5, even a draw slightly lowers Portisch's rating and slightly raises Hort's, moving their ratings (ignoring their other results in the tournament) slightly closer together.
Portisch's expected scores are summed over all his games, giving him a total expected score of 9.66 for the tournament. The formula is then:
- new rating = old rating + (K × (W−We))
K is 10; W is the actual match/tournament score; We is the expected score.
Portisch's new rating[14] is 2635 + 10×(10.5−9.66) = 2643.4.
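For comparison with the table lookup, here is a minimal sketch using the continuous Elo formula, which agrees closely with Elo's table (a 35-point difference gives about 0.55 either way); the function names are illustrative:

```python
def expected_score(own_rating: float, opp_rating: float) -> float:
    """Elo expected score on the standard 400-point logistic scale."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - own_rating) / 400.0))

def updated_rating(old_rating: float, k: float, score: float, expected: float) -> float:
    """new rating = old rating + (K * (W - We))"""
    return old_rating + k * (score - expected)

print(round(expected_score(2635, 2600), 2))            # 0.55, matching Elo's table
print(round(updated_rating(2635, 10, 10.5, 9.66), 1))  # 2643.4
```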
Linear approximation
Elo devised a linear approximation to his full system, removing the need for look-up tables of expected score. With that method, a player's new rating is

Rnew = Rold + K × ((W − L)/2 + ΣDi / 4C)

where Rnew and Rold are the player's new and old ratings respectively, Di is the opponent's rating minus the player's rating, W is the number of wins, L is the number of losses, C = 200 and K = 32. The term (W − L)/2 is the player's score above or below an even (50%) result, and ΣDi / 4C converts the rating differences into an expected score on the basis that 4C rating points equals 100%.[15]
The USCF used a modification of this system to calculate ratings after individual games of correspondence chess, with K = 32 and C = 200.[16]
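A sketch of the linear approximation with the stated constants (names are illustrative):

```python
def elo_linear_update(old_rating: float, opponent_ratings: list[float],
                      wins: int, losses: int, k: float = 32, c: float = 200) -> float:
    """Rnew = Rold + K * ((W - L)/2 + sum(Di)/(4C)), where each Di is the
    opponent's rating minus the player's; no expected-score table needed."""
    d_sum = sum(opp - old_rating for opp in opponent_ratings)
    return old_rating + k * ((wins - losses) / 2 + d_sum / (4 * c))

# 3 wins and 1 loss by a 1500 player against 1500, 1550, 1600 and 1650:
print(elo_linear_update(1500, [1500, 1550, 1600, 1650], wins=3, losses=1))  # 1544.0
```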
Glicko rating system
The Glicko system is a more modern approach, which was invented by Mark Glickman as an improvement of the Elo system. It is used by Chess.com, Free Internet Chess Server and other online chess servers. The Glicko-2 system is a refinement of the original Glicko system and is used by Lichess, the Australian Chess Federation and other online websites.
Turkey UKD system
The Turkish Chess Federation (TSF) uses the national UKD rating system in combination with the Elo system.[17]
USA ICCF system
The ICCF U.S.A. used its own system in the 1970s. It now uses the Elo system.
Deutsche Wertungszahl
The Deutsche Wertungszahl system replaced the Ingo system in Germany.
Chessmetrics
The Chessmetrics system was invented by Jeff Sonas. It is based on computer analysis of a large database of games and is intended to be more accurate than the Elo system.
Universal Rating System
The Universal Rating System was developed by Mark Glickman, Jeff Sonas, J. Isaac Miller and Maxime Rischard, with the support of the Grand Chess Tour, the Kasparov Chess Foundation, and the Chess Club and Scholastic Center of Saint Louis.[18]
Rating systems using computers as a reference
Many rating systems give a rating to players at a given time, but cannot compare players from different eras. In 2006, Matej Guid and Ivan Bratko pioneered a new way of rating players, by comparing their moves against the recommended moves of a chess engine. The authors used the program Crafty and argued that even a lower-ranked program (Elo around 2700) could identify good players.[19] In their follow-up study, they used Rybka 3 to estimate chess player ratings.[20]
In 2017, Jean-Marc Alliot compared players using Stockfish 6 with an ELO rating around 3300, well above top human players.[21]
Chronology
- 1933 – The Correspondence Chess League of America (now ICCF U.S.A.) is the first national organization to use a numerical rating system. It chooses the Short system which clubs on the west coast of the US had used. In 1934 the CCLA switched to the Walt James Percentage System but in 1940 returned to a point system designed by Kenneth Williams.
- 1942 – Chess Review uses the Harkness system, an improvement of the Williams system.
- 1944 – The CCLA changes to an improved version of the Williams system devised by William Wilcock. A slight change to the system was made in 1949.
- 1946 – The USSR Chess Federation uses a non-numerical system to classify players.
- 1948 – The Ingo system is published and used by the West German Chess Federation.
- 1949 – The Harkness system is submitted to the USCF. The British Chess Federation adopts it later and uses it at least as late as 1967.[22]
- 1950 – The USCF starts using the Harkness system and publishes its first rating list in the November issue of Chess Life. Reuben Fine is first with a rating of 2817 and Sammy Reshevsky is second with 2770.[23]
- 1959 – The USCF names Arpad Elo the head of a committee to examine all rating systems and make recommendations.
- 1961 – Elo develops his system and it is used by the USCF.[24] It is published in the June 1961 issue of Chess Life.[25]
- 1970 – FIDE starts using the Elo system. Bobby Fischer is at the top of the list.[26]
- 1978 – Elo's book (The Rating of Chessplayers, Past and Present) on his rating system is published.
- 1993 – Deutsche Wertungszahl replaces the Ingo system in Germany.
- 2001 – the Glicko system by Glickman is published.[27]
- 2005 – Chessmetrics is published by Jeff Sonas.[28]
- 2006 – Matej Guid and Ivan Bratko publish the research paper "Computer Analysis of World Chess Champions", which rates champions by comparing their moves to the moves chosen by the computer program Crafty.[29]
- 2017 – Jean-Marc Alliot publishes the research paper "Who is the Master?", which rates champions by comparing their moves to Stockfish 6.[21]
Notes
[edit]- ^ "Tinder matchmaking is more like Warcraft than you might think - Kill Screen". 2017-08-19. Archived from the original on 2017-08-19. Retrieved 2024-02-04.
According to Tinder CEO Jonathan Badeen, Tinder uses a variation of ELO scoring to determine how you rank among the site's userbase, and therefore, which profiles to suggest to you and whose queues your profile shows up in.
- ^ (Hooper & Whyld 1992:332)
- ^ (Hooper & Whyld 1992:332)
- ^ (Harkness 1967:205–6).
- ^ (Harkness 1967:185–88)
- ^ (Harkness 1967:187)
- ^ "The calculation of ECF Grades on a monthly basis". English Chess Federation. Retrieved 2022-07-08.
- ^ Chess Life, 1962.
- ^ FIDE Handbook, Section B.0.0, FIDE web site
- ^ Elo, 1978, p. 18
- ^ "How to face a Super Grandmaster?". Saint Louis Chess Club. January 25, 2019.
- ^ (Just & Burg 2003:259–73)
- ^ (Just & Burg 2003:112)
- ^ (Elo 1978:37)
- ^ (Elo 1978:28–29)
- ^ "The United States Chess Federation - CC Ratings Explanation". www.uschess.org.
- ^ "TSF UKD Bilgi Sistemi". ukd.tsf.org.tr.
- ^ "Universal Rating System". 2017-01-03.
- ^ Riis, Søren (2 November 2006). "Review of "Computer Analysis of World Chess Champions"". Chessbase.
- ^ Riis, Søren (11 November 2011). "Review of "Using Chess Engines to Estimate Human Skill"". Chessbase.
- ^ a b Alliot, Jean-Marc (2017). "Who is the Master?". ICGA Journal. 39: 3–43. doi:10.3233/ICG-160012.
- ^ (Harkness 1967:184)
- ^ (Lawrence 2009)
- ^ (Harkness 1967:184)
- ^ (Elo 1978:197)
- ^ (Elo 1978:68, 89)
- ^ "Glickman website". Archived from the original on June 11, 2010.
- ^ "Welcome to the Chessmetrics site". chessmetrics.com. Archived from the original on November 15, 2011.
- ^ Guid, Matej; Bratko, Ivan. "Computer analysis of world chess champions". International Computer Games Association Journal. 29 (2): 3–14.
References
- Elo, Arpad (1978), The Rating of Chess Players, Past and Present, Arco, ISBN 978-0-668-04721-0
- Harkness, Kenneth (1967), The Official Chess Handbook, McKay
- Hooper, David; Whyld, Kenneth (1992), The Oxford Companion to Chess (2nd ed.), Oxford University Press, ISBN 978-0-19-280049-7
- Just, Tim; Burg, Daniel B. (2003), U.S. Chess Federation's Official Rules of Chess (5th ed.), McKay, ISBN 978-0-8129-3559-2
- Lawrence, Al (February 2009), "Ratings, Rules, and Rockets: USCF's 2nd decade: 1949–1958", Chess Life, 2009 (2): 9
Chess rating system
Fundamentals
Purpose and Design Goals
A chess rating system is a numerical method designed to estimate and quantify a player's relative skill level based on their performance in games against other players, providing an objective measure of playing strength.[1] Its primary goals encompass facilitating fair pairings in tournaments by matching players of comparable ability, predicting the expected outcomes of matches through probabilistic scoring, tracking individual progress across games and events, and enabling standardized classification of players—from beginners with low ratings to elite grandmasters with high ones.[3] These objectives ensure competitive balance and motivate improvement by offering a clear, merit-based hierarchy.[4]

The historical motivation for chess rating systems arose in the early 20th century, as the growing number of competitive players necessitated replacing subjective assessments—such as informal titles or opinion-based rankings—with consistent, data-driven evaluations to support organized play.[5] Prior to formalized systems, player strength was often gauged through ad hoc judgments by experts or organizers, leading to inconsistencies that hindered fair competition and reliable performance tracking.[3] Key challenges addressed by these systems include accommodating variable competition levels across tournaments and regions, mitigating the effects of infrequent play that can cause ratings to become outdated or unstable, and grappling with the uncertainty inherent in skill measurement due to factors like performance variability and limited game samples.[4] By incorporating statistical principles, rating systems aim to produce robust estimates despite these issues, ensuring ratings reflect true ability as accurately as possible.[6]

Metrics for evaluating the success of a chess rating system focus on its predictive accuracy for expected scores in matches, the long-term stability of ratings amid ongoing play, and its adaptability to diverse formats such as over-the-board and online chess.[1] The Elo system exemplifies these goals as a foundational benchmark for modern rating designs.[5]
Core Principles and Components
Chess rating systems fundamentally rely on three key components: initial rating assignment for new players, evaluation of performance through game scores, and adjustment of ratings based on those results against opponents of known strength. Initial ratings are typically assigned to unrated players by averaging the ratings of opponents they face in their first few games, often weighted by factors such as the number of games played and the recency of those opponents' ratings, with a cap on the effective number of games considered to ensure reliability. For example, in FIDE's system (as of March 2024), the average opponent rating includes two hypothetical opponents rated 1800 (each counted as a draw), adjusted based on the player's score, and bounded between 1400 and 2200 after at least five games.[6][1] Alternatively, provisional periods allow ratings to stabilize after a minimum number of games against rated opponents, preventing premature assignments based on limited data.[1] Performance evaluation centers on the actual score achieved in each game—1 point for a win, 0.5 for a draw, and 0 for a loss—aggregated over multiple games to reflect overall results.[4]

Central to these systems are concepts like the expected score, which estimates the probability of one player defeating another based on their rating difference, and the rating difference itself as a predictor of outcomes. The expected score is derived from a logistic function modeling win probabilities, where a larger rating advantage for player A over B increases the likelihood of A's victory.[3] The K-factor modulates the magnitude of rating adjustments, typically higher for inexperienced players to allow rapid convergence to true strength and lower for veterans to reflect greater stability, ensuring changes are proportional to surprise in results.[4] The general update principle across most systems follows the formula:

new rating = old rating + K × (actual score − expected score)

This adjustment rewards outperforming expectations and penalizes underperformance, promoting convergence toward a player's underlying ability over time.[3]

Special cases in score calculation include draws, which contribute 0.5 points to the actual score, and forfeits, which are generally excluded unless due to exceptional circumstances like force majeure with at least one move played, to maintain the integrity of performance metrics. Forfeits without moves played are often not rated, avoiding artificial inflation of scores. Byes are typically not included in rating calculations, as they do not involve played games.[1][6]

Rating inflation, where average ratings rise without corresponding skill improvements, and deflation, the opposite trend, arise from factors like inconsistent K-factor applications or expanding player pools, potentially skewing comparisons across eras.[7] Regularization techniques mitigate these by imposing rating floors (e.g., minimums of 1000), periodic recalibrations through normalization of rating distributions, or adjustments based on long-term performance trends to preserve the system's statistical validity.[4] These methods ensure ratings remain a reliable measure of relative strength.[6]
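A minimal sketch of this shared update principle (the names are illustrative; each concrete system supplies its own expected-score function and K):

```python
def rating_update(old_rating: float, k: float, actual: float, expected: float) -> float:
    """Shared Elo-family rule: adjust in proportion to the surprise."""
    return old_rating + k * (actual - expected)

# A player scores 1.0 where 0.64 was expected, with K = 20:
print(round(rating_update(1500, 20, 1.0, 0.64), 1))  # 1507.2
```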
Historical Systems
Ingo System
The Ingo system, developed in 1948 by Anton Hoesslinger for the West German Chess Federation, represented one of the earliest attempts to create a numerical rating framework for chess players. Named after Hoesslinger's hometown of Ingolstadt, it was published in the April 1948 edition of Bayerischen Schachnachrichten and marked a shift from qualitative assessments to quantifiable measures of skill in the pre-Elo era.[5][4] The system gained acceptance in West Germany, where it was used to generate ranking lists that aligned closely with estimates from experienced chess experts, providing a simple tool for organizing tournaments and player classifications.[5]

At its core, the Ingo system relied on a point-based scoring approach, awarding 1 point for a win, 0.5 points for a draw, and 0 points for a loss. A player's rating was computed as the average points per game over a specified period, adjusted to account for the relative strength of opponents through a performance metric derived from the deviation of the achieved score from an expected neutral outcome.[5] This resulted in a scale where lower numbers denoted stronger players, starting from near zero for top performers and extending upward for novices; for context, typical club-level ratings fell between 100 and 200. Unlike later systems, it featured no formal categories such as letter grades, though equivalents to modern classifications placed elite players below 50 and beginners above 250. The method's simplicity facilitated manual calculations, making it practical for federation use without advanced computational resources.[5]

Despite its innovations, the Ingo system had notable limitations, including the absence of probabilistic models to predict expected outcomes between players of known ratings, which reduced its utility for forecasting match results. It was also highly sensitive to the number of games played, as ratings could fluctuate significantly with small sample sizes, and while it incorporated opponent strength, the adjustment was rudimentary and prone to inconsistencies across varying competition levels.[5] The system remained in use in West Germany for over four decades but was eventually supplanted in the early 1990s by more statistically robust methods like the Deutsche Wertungszahl. It served as a foundational influence on subsequent approaches, including the Harkness system adopted by the U.S. Chess Federation in 1950, which built upon its opponent-adjusted scoring principles.[5]
Harkness System
The Harkness system was introduced in 1950 by Kenneth Harkness for the United States Chess Federation (USCF), marking an advancement in rating methodologies by incorporating adjustments for opponent strength while building on the simplicity of earlier approaches like the Ingo system.[8][4] It served as the primary rating mechanism for the USCF from 1950 until 1960, when it was supplanted by the Elo system, and was praised for its ability to reward performance against stronger opposition more equitably than flat scoring methods.[4]

Under the Harkness system, ratings were updated after each tournament based on the player's performance relative to the average strength of opponents. The new rating was calculated as the average rating of all opponents faced plus 10 points for each percentage point by which the player's score exceeded 50%, or minus 10 points for each percentage point below 50%. For example, a player scoring 60% in a tournament against an average opponent rating of 1800 would receive a new rating of 1800 + (10 × 10) = 1900. This method ensured that performing as expected against the field maintained the rating, while superior or inferior results led to appropriate adjustments. Class designations were tied to specific thresholds, such as Class A (1800–1999) or reaching 2200 for Expert/Master status, facilitating clear progression through player categories.[8][4]

The system's key advantage lay in its explicit consideration of opponent strength, which provided a fairer measure of skill progression compared to unadjusted win counts, though it required manual tabulation and was eventually deemed insufficiently probabilistic for broader adoption.[4]
English Chess Federation System
The English Chess Federation (ECF) grading system, originally developed under the British Chess Federation (BCF), traces its origins to informal classifications in the 1920s but was formally established in the 1950s and refined in the post-World War II period. Devised by statistician Richard W. B. Clarke, it served as the primary domestic rating mechanism for chess players in England until its replacement in 2020. Grades ranged from 0 for novice players to over 300 for grandmaster-level strength, providing a three-digit scale distinct from international systems.[9][10]

The core method relied on computing an average performance rating derived from a player's most recent games, typically the last 30 or more, with weighting applied based on opponent strength to emphasize competitive encounters. For each event or set of games, a performance grade was calculated by adjusting the average opponent grade according to the player's results, using predefined tables for expected outcomes. This approach incorporated variants for rapidplay games, which had separate grading lists to account for faster time controls, and later adaptations for online play under ECF oversight.[11][10]

Grade adjustments followed a structured formula: change in grade = (actual score − expected score) × factor, where the expected score was interpolated from grade difference tables (e.g., a 20-grade advantage yielding an expected win probability of about 64%), and the factor was a constant such as 20 to scale the update magnitude. Updates occurred biannually until later monthly shifts, ensuring grades reflected sustained performance rather than isolated results.[11][12]

New players received provisional grades after a minimum threshold of games, often 10 to 30, to avoid volatility from limited data; these were marked and updated cautiously until stability was achieved. For county and team competitions, adjustments weighted games by event prestige and team context, such as board order or match importance, to better capture collective contributions without inflating individual grades.[10][13]

The system remained active for domestic grading until 2020, when it transitioned to an Elo-based model, but historical ECF grades continue to inform event seeding and are mapped to FIDE ratings for equivalence, with an ECF grade of 200 approximately aligning to a FIDE rating of 2100.[14][9] This structure bore similarities to early US systems in its emphasis on performance relative to opponents.[15]
Standard Rating Systems
Elo Rating System
The Elo rating system, developed by Hungarian-American physicist and chess master Arpad Elo in the late 1950s and early 1960s, represents a statistical method for estimating player skill levels in chess based on game outcomes.[5] As chairman of the United States Chess Federation (USCF) Rating Committee starting in 1959, Elo refined earlier systems to create a more robust model, which the USCF officially adopted in 1960 to replace inconsistent prior methods.[2] The system's mathematical foundation draws from the logistic distribution to model performance differences between players, assuming that skill variations follow a logistic probability curve rather than a normal one for better fit to competitive outcomes.[5] Elo validated the model through extensive analysis of thousands of games, including data from U.S. Open Tournaments (1973–1975) involving 1,514 players over 12 rounds and historical crosstables spanning 120 years, confirming its predictive accuracy with chi-square tests showing close alignment to observed results (standard deviation ≈1.65).[5]

The Fédération Internationale des Échecs (FIDE) adopted the Elo system in 1970 following trials and presentations at congresses like the 1965 Weisbaden event, marking its transition to an international standard. This adoption led to FIDE's first official rating list in 1971, covering the top players, with ratings scaled such that a 200-point difference predicts approximately a 76% win probability for the higher-rated player.[2]

The system's core formula calculates the expected score for player A against opponent B as E_A = 1 / (1 + 10^((R_B − R_A)/400)), where R_A and R_B are the current ratings of players A and B, respectively; the 400 scaling factor ensures that a 400-point difference yields an expected score of 0.91 (i.e., 91% win probability).[5] After a game, player A's updated rating is R'_A = R_A + K × (S_A − E_A), with S_A denoting the actual score (1 for a win, 0.5 for a draw, 0 for a loss) and K as the development coefficient that controls rating volatility.[5] This update symmetrically adjusts the opponent's rating by the negative of the change, preserving the zero-sum nature of the system.[1]

The K-factor varies to balance stability for established players against responsiveness for newcomers and juniors, mitigating floor effects for beginners (who start with provisional ratings around 1200–1500) and ceiling effects for top players (whose gains diminish above certain thresholds).[6] In the original USCF implementation, K = 32 for most adult players, with variations such as 24 or 16 based on rating level, and higher values for juniors to accelerate convergence.[5] FIDE's current regulations (effective 1 March 2024) set K = 40 for new players until 30 games are completed and for juniors under 18 with ratings below 2300 (until the end of their 18th year); K = 20 for players under 2400; and K = 10 for those reaching 2400 or higher, with further reductions if the product of games played and K exceeds 700 in a period to prevent excessive swings.[1]

To illustrate, consider a game between Player A (rating 2400) and Player B (rating 2200), assuming K = 32 for both. The rating difference is 200 points, so Player A's expected score is E_A = 1 / (1 + 10^(−200/400)) ≈ 0.76; Player B's expected score is E_B ≈ 0.24.
- If A wins (S_A = 1), A's rating change is 32 × (1 − 0.76) = +7.68 (rounded to 8), so A moves to 2408; B loses 8 points to 2192.
- If the game draws (S_A = 0.5), A's change is 32 × (0.5 − 0.76) = −8.32 (rounded to −8), so A moves to 2392; B gains 8 to 2208.
- If B wins (S_A = 0), A's change is 32 × (0 − 0.76) = −24.32 (rounded to −24), so A moves to 2376; B gains 24 to 2224.
Glicko Rating System
The Glicko rating system was created by Mark Glickman in 1995 to address limitations in the Elo system by incorporating a measure of rating uncertainty.[16] It extends the Elo framework by introducing a rating deviation (RD), which quantifies the reliability of a player's rating estimate, starting at 350 for unrated players to reflect high initial uncertainty.[16] The system scales the logistic probability using the factor q = ln(10)/400 ≈ 0.00575, ensuring compatibility with Elo-like rating values around 1500 for average players.[16] Widely adopted in online chess, it powers rating calculations on platforms like Lichess and the Internet Chess Club (ICC), where it supports dynamic adjustments for irregular play.[17][18] It has also been adopted by the USCF for over-the-board ratings since 2016.[6]

In the original Glicko-1 variant, ratings update after a period of games using r' = r + q/(1/RD² + 1/d²) × Σ g(RD_j)(s_j − E_j), where g(RD_j) = 1/√(1 + 3q²RD_j²/π²) discounts games against uncertain opponents, E_j = 1/(1 + 10^(−g(RD_j)(r − r_j)/400)) is the expected outcome against opponent j, and d² = [q² Σ g(RD_j)² E_j(1 − E_j)]⁻¹ is the estimated variance from the opponents' outcomes.[16] The RD itself updates to RD' = √((1/RD² + 1/d²)⁻¹), narrowing the deviation as more data accumulates.[16] This probabilistic approach builds directly on Elo's expected score concept but modulates changes based on uncertainty, allowing larger adjustments for players with high RD (e.g., novices) and smaller ones for established ratings.[16]

The enhanced Glicko-2 version introduces volatility (often denoted σ) to model fluctuations in performance stability, updating it iteratively based on game frequency and outcome consistency within a rating period—ideally 10-15 games for optimal convergence.[19] The volatility update solves for the new volatility σ' using an equation that balances prior volatility, new variance from results, and a system parameter τ (typically 0.5-1.2) to prevent excessive swings: f(x) = e^x(Δ² − φ² − v − e^x) / (2(φ² + v + e^x)²) − (x − a)/τ² = 0, where a = ln(σ²), Δ is the estimated rating improvement, and v is the estimated variance derived from the game outcomes, solved via the Illinois method until convergence.[19] The new rating then becomes μ' = μ + φ'² Σ g(φ_j)(s_j − E_j) (with R' = 173.7178 × μ' + 1500 scaled accordingly), where φ' = 1/√(1/(φ² + σ'²) + 1/v), and the new RD' = 173.7178 × φ'.[19] Glicko-2 is preferred in modern implementations like Lichess for its nuanced handling of player consistency.[17]

A key advantage of the Glicko system is its treatment of inactivity and sparse data: RD increases over time without games, such as RD' = min(√(RD² + c²t), 350), where c is a system constant and t the number of periods of inactivity, reflecting growing uncertainty and enabling catch-up adjustments upon return.[16] This is especially useful for online chess, where players may play infrequently; for instance, during a new player's "rating probation" phase, high initial RD (e.g., 350) shrinks rapidly with 5-10 games but regrows if play halts, promoting fair convergence to true strength even with irregular participation.[19] Overall, these features make Glicko more robust than Elo for environments with variable game volumes, reducing overconfidence in ratings derived from limited data.[16]
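The Glicko-1 update translates directly into code. Below is a compact sketch following the published formulas above (variable names are illustrative):

```python
import math

Q = math.log(10) / 400  # scaling factor, ~0.00575

def g(rd: float) -> float:
    """Discount games against opponents whose own ratings are uncertain."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)

def expected(r: float, r_j: float, rd_j: float) -> float:
    """Expected score against opponent j, damped by the opponent's RD."""
    return 1 / (1 + 10 ** (-g(rd_j) * (r - r_j) / 400))

def glicko1_update(r: float, rd: float, results: list[tuple[float, float, float]]):
    """One rating-period update; results = [(r_j, rd_j, score), ...]."""
    inv_d2 = Q**2 * sum(g(rd_j)**2 * expected(r, r_j, rd_j) * (1 - expected(r, r_j, rd_j))
                        for r_j, rd_j, _ in results)  # this is 1 / d^2
    denom = 1 / rd**2 + inv_d2
    delta = sum(g(rd_j) * (s - expected(r, r_j, rd_j)) for r_j, rd_j, s in results)
    return r + (Q / denom) * delta, math.sqrt(1 / denom)  # (r', RD')

# Glickman's worked example: a 1500 player (RD 200) beats a 1400 (RD 30)
# and loses to a 1550 (RD 100) and a 1700 (RD 300):
print(glicko1_update(1500, 200, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]))
# -> approximately (1464, 151.4)
```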
Variant and National Systems
Turkey UKD System
The UKD (Ulusal Kuvvet Derecesi) system, meaning "National Strength Rating," was adopted by the Turkish Chess Federation (TSF) on January 1, 2005, to evaluate players' strengths in national over-the-board tournaments.[20] This performance-based rating provides a four-digit numerical measure of a player's ability, calculated and published by the TSF, and is mandatory for participation in domestic competitions.[20] Unlike the international Elo system, UKD emphasizes localized progression for Turkish players, particularly in youth and school programs, where it facilitates ranking and advancement within the national framework. Players with UKD below 1000 are considered unrated.[20]

The calculation method mirrors the Elo system's core principle of adjusting ratings based on game outcomes relative to expected results, but incorporates TSF-specific parameters.[20] The formula is UKD_y = UKD_e + ΔD, where UKD_y is the new rating, UKD_e is the existing rating, and ΔD is the change derived from the difference between actual and expected scores, scaled by a variable K-factor: 25 for ratings ≤1599, 20 for 1600–1999, 15 for 2000–2399, and 10 for ≥2400.[20] Expected scores (We) are derived from rating differences, capped at 350 points, and only games at standard time controls (minimum 60 minutes per player) contribute to updates; rapid and blitz events are excluded.[20] Initial ratings for unrated players are set at 800 if no rated opponents are faced, or as the average UKD of the rated opponents minus 300 points, after at least seven games including three against rated opponents.[20]

For age-group and youth tournaments, such as the Türkiye Yaş Grupları Şampiyonası, UKD assignments include tailored adjustments to promote development in school and junior chess, which receive significant emphasis in Turkey's national program.[21] Here, an average UKD is computed for each category (e.g., 1247 for under-8 girls), with players achieving 50% scores receiving this baseline; deviations of 0.5 points above or below adjust the rating by ±20 points, ensuring the higher of the adjusted or standard calculation is applied, while zero-point performers start at 1001.[21] This approach supports the TSF's focus on youth participation, integrating ratings with national ID numbers (T.C. Kimlik No.) for seamless tracking and licensing.[20]

Players reaching a UKD of 1400 or higher become eligible for FIDE-rated tournaments in Turkey, enabling a transition to international Elo ratings through performance in those events.[22] Today, the UKD system coexists with FIDE Elo, serving as the primary domestic metric while Elo handles global play, with the TSF maintaining an online query tool linked to national IDs for real-time access.[23]
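A sketch of a single event's update under these rules, assuming the standard Elo logistic curve for the expected score We (the TSF publishes its own tables; names are illustrative):

```python
def ukd_k_factor(rating: float) -> int:
    """TSF K-factor ladder described above."""
    if rating <= 1599:
        return 25
    if rating <= 1999:
        return 20
    if rating <= 2399:
        return 15
    return 10

def ukd_update(ukd_e: float, games: list[tuple[float, float]]) -> float:
    """UKD_y = UKD_e + sum of K * (W - We); differences capped at 350."""
    k = ukd_k_factor(ukd_e)
    delta = 0.0
    for opp_ukd, score in games:
        diff = max(-350.0, min(350.0, ukd_e - opp_ukd))
        we = 1 / (1 + 10 ** (-diff / 400))  # assumed logistic expectation
        delta += k * (score - we)
    return ukd_e + delta

# A 1550 player beats a 1700 and draws a 1500:
print(round(ukd_update(1550, [(1700, 1.0), (1500, 0.5)])))  # about 1566
```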
USA ICCF System
The USA ICCF rating system governs correspondence chess ratings for American players participating in events sanctioned by the International Correspondence Chess Federation (ICCF), managed by the United States Chess Federation (USCF) since the 1970s. This system supports slower-paced play through postal mail, email, or server-based formats, where participants have extended time for analysis, often days or weeks per move. It applies to both national USCF tournaments and international ICCF competitions hosted on the ICCF web server, with USCF providing official ratings for these games.[24][25]

The method is a modified Elo system tailored for correspondence chess, featuring a higher K-factor of 32 to enable larger rating adjustments per game, compensating for the fewer games typically played compared to over-the-board (OTB) formats. Expected scores are calculated using a scaling constant C = 200 in the logistic formula, rather than the standard 400, which steepens the probability curve and amplifies outcome expectations for rating differences, aligning with the deeper analytical play that reduces errors and increases draw frequency. For established players below 2100, the update formula approximates Rn = Ro + 0.04(ED) ± 16, where Rn is the new rating, Ro the old rating, and ED the rating difference (capped at 350 points); draws omit the ±16 adjustment, and the 0.04 coefficient effectively decreases (to 0.03 or 0.02) above 2100 or 2400 for stability. Provisional ratings form over the first 25 games as an average of results, with no changes for outcomes against opponents differing by over 400 points.[5][26]

New players start without a rating, but initial games assign provisional values: a win credits the winner with 1700 and the loser with 1300, while a draw assigns 1500 to both, providing an entry point around 1600 on average. Forfeits and incomplete games, including historical adjournments in postal play, receive limited credit—such as 20% of full points (maximum 10) for the winner with no game count—ensuring only completed results fully impact ratings. While unified lists track overall performance, separate considerations apply to themed events (e.g., specific openings) versus open tournaments, though primary publication focuses on comprehensive USCF correspondence lists.[26]

Key differences from OTB systems include greater draw emphasis, as correspondence play yields higher draw rates due to exhaustive analysis, influencing expected score calculations and rating volatility. USCF correspondence ratings typically run about 100 points higher than equivalent FIDE OTB ratings, reflecting format-specific strengths in dual-format players. The system converges with FIDE through ICCF's formal cooperation, where ICCF titles (e.g., Correspondence Chess Grandmaster) are recognized by FIDE, allowing seamless title progression for players active in both correspondence and OTB arenas. Usage centers on USCF-organized postal/email tournaments and ICCF server events, fostering participation from beginners to elites without engine assistance.[5][27][28]
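A sketch of the sub-2100 per-game formula as stated above, assuming ED is the opponent's rating minus the player's (the sign convention is not spelled out here):

```python
def iccf_game_update(old_rating: float, opp_rating: float, result: float) -> float:
    """Rn = Ro + 0.04 * ED +/- 16 for players below 2100; draws omit the
    +/-16 term, and ED is capped at 350 points in either direction."""
    ed = max(-350.0, min(350.0, opp_rating - old_rating))
    bonus = {1.0: 16.0, 0.5: 0.0, 0.0: -16.0}[result]
    return old_rating + 0.04 * ed + bonus

# Beating a player rated 100 points higher gains 0.04 * 100 + 16 = 20 points:
print(round(iccf_game_update(1800, 1900, 1.0), 1))  # 1820.0
```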
Deutsche Wertungszahl
The Deutsche Wertungszahl (DWZ), translating to "German evaluation number," serves as the official domestic rating system of the Deutscher Schachbund (DSB), Germany's national chess federation. Developed in the late 1980s and early 1990s following the reunification of Germany, it was implemented nationwide across all DSB regional associations on January 1, 1993, succeeding the Ingo system in the former West Germany and the NWZ system in the former East Germany. This transition aimed to standardize player evaluation within the unified federation while incorporating refinements to earlier methods for greater accuracy in assessing playing strength.[29] Unlike cumulative systems such as Elo, which track ongoing performance across all games, the DWZ functions as a periodic performance index rather than a persistent rating, emphasizing recent form to provide a snapshot of current ability.

The DWZ is calculated using an adapted Elo formula: new rating = old rating + E × (actual score − expected score), where the expected score follows the logistic probability model, and E is a development coefficient starting at 30 and adjusted higher for younger or less experienced players to allow faster rating changes. Computations are based on all rated games entered into the DSB's DeWIS database, with updates integrated shortly after tournaments, typically within 24 hours.[30][31][32]

The DWZ categorizes players into performance levels, with ratings of 2200 or higher designating master-strength players eligible for advanced titles and competitions within the DSB framework. The system includes age-adjusted factors in the development coefficient, which is higher for younger players (under 20) to accelerate rating changes, while more experienced players see smaller adjustments for stability. These features help foster fairer domestic pairings and progression. DWZ ratings coexist with FIDE Elo for international play, showing rough equivalence where a DWZ of 2000 aligns approximately with a FIDE rating of 2000, though variances occur due to differing update cadences and game pools.[32][33]
Alternative Approaches
Chessmetrics
Chessmetrics is a retrospective chess rating system developed by statistician Jeff Sonas, who began working on it in the summer of 1999 and launched the original website in late 2001, with significant updates in 2005. The system and website have not been updated since 2005, providing ratings up to that year.[34] Unlike real-time systems like Elo, Chessmetrics computes monthly ratings based on historical game outcomes across the pre-FIDE and post-FIDE eras up to 2005, enabling comparisons of player strengths from the 19th century to the early 21st century. It focuses primarily on top-level players, starting from seed ratings and expanding through interconnected game data up to 11 degrees of separation.[35]

The methodology employs regression-based calculations to estimate ratings from tournament and match results, using a weighted performance rating that emphasizes recent games over a 48-month window. Games are weighted linearly by recency—100% for the most recent month, decreasing to 2% for the 47th month—while requiring at least five weighted games for a valid rating. To handle uneven or sparse historical data, the system incorporates priors akin to Bayesian estimation by "padding" performance ratings with simulated draws against average opponents and a baseline of 2300, effectively stabilizing estimates in data-poor periods like the 19th century. For instance, Paul Morphy's peak rating is calculated at 2743 in June 1859, placing him as the world number one at age 22.[35][36]

This approach provides advantages in cross-era consistency by applying uniform criteria to all available games, avoiding the era-specific adjustments in FIDE ratings. It highlights potential inflation in modern ratings, as historical peaks like Morphy's align more closely with today's elite levels than FIDE's inflated scale suggests, attributing discrepancies to increased game volume and activity in contemporary chess.[37] However, Chessmetrics is not intended for active tournament use or real-time updates, limiting its application to analytical and historical contexts rather than ongoing player classification.[38] It serves as a complement to the Elo system for deeper historical analysis, offering a data-driven lens on long-term player dominance.
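A sketch of the recency weighting and the weighted performance it feeds, assuming Sonas's roughly linear expectancy of about 850 rating points per 100% score and omitting the padding step (all names and the example data are illustrative):

```python
def recency_weight(months_ago: int) -> float:
    """Linear decay: 100% for the current month, about 2% at month 47."""
    return max(0.0, (48 - months_ago) / 48.0)

def weighted_performance(games: list[tuple[int, float, float]]) -> float:
    """games = [(months_ago, opponent_rating, score), ...]."""
    weights = [recency_weight(m) for m, _, _ in games]
    total = sum(weights)
    avg_opp = sum(w * r for w, (_, r, _) in zip(weights, games)) / total
    pct = sum(w * s for w, (_, _, s) in zip(weights, games)) / total
    return avg_opp + 850.0 * (pct - 0.5)  # assumed linear performance model

# Two recent wins against 2500-rated opposition and an old loss to a 2600:
print(round(weighted_performance([(1, 2500, 1), (2, 2500, 1), (40, 2600, 0)])))  # ~2866
```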
Universal Rating System
The Universal Rating System (URS) was proposed in 2017 as a unified approach to rating chess players across various formats, including over-the-board (OTB), online, rapid, and blitz games, aiming to provide a single global rating that reflects overall playing strength. Developed by a team of experts including Dr. Mark Glickman, Jeff Sonas, Dr. J. Isaac Miller and Maxime Rischard, and funded by organizations such as the Grand Chess Tour, Kasparov Chess Foundation, and the Chess Club and Scholastic Center of Saint Louis, the URS seeks to address the fragmentation caused by separate rating lists for different time controls and platforms. URS ratings continue to be computed and published independently as of July 2025, though without formal widespread adoption.[39][40][41]

The method employs a weighted, Elo-like performance rating calculated over a six-year window of game history, using exponential decay to prioritize recent results while incorporating data from all formats. Format multipliers adjust for variations in time controls, such as reducing the influence of faster games like blitz due to their higher variability, with specific adjustments modeled via metrics like M60 (minutes allocated for the first 60 moves) to normalize across classical, rapid, and blitz. Initial ratings are established through iterative convergence from multiple historical sources, and while rapid and blitz maintain linked pools for format-specific insights, the system integrates them into a cohesive universal score, for example by treating online games with a lower K-factor (around 16) to account for their volume and speed. This builds on Glickman's prior Glicko framework for handling rating uncertainty in online contexts.[39][42]

The primary goals of the URS are to reduce rating fragmentation across platforms and formats, enabling better cross-format predictions of player performance, such as using classical results to refine rapid and blitz expectations. For instance, it has demonstrated superior predictive accuracy over traditional Elo systems in analyzing thousands of high-level games, outperforming FIDE ratings in outcome forecasts across player levels. Adoption has been limited but notable, with the system used by the Grand Chess Tour for 2017 wildcard selections and offered freely to organizers, alongside ongoing discussions for partial integration with FIDE to enhance global standardization.[43][44]
Computer-Referenced Systems
Computer-referenced chess rating systems emerged in the 2000s as powerful chess engines surpassed human performance levels, providing a benchmark for evaluating player strength relative to artificial intelligence. These systems leverage engine evaluations to calibrate human ratings, often extending traditional Elo methodologies to hybrid human-AI contexts. For instance, modern engines like Stockfish 17.1 achieve Elo ratings of approximately 3644 in engine-only tournaments such as the CCRL 40/40 list, far exceeding the top human grandmaster rating of 2839 held by Magnus Carlsen as of November 2025.[45][46]

Key methods in these systems include centaur ratings, which assess human-computer teams—known as "centaurs"—in advanced chess formats where players consult engines during play. Pioneered in the late 1990s but popularized through freestyle tournaments in the mid-2000s, centaur play combines human intuition with computational analysis, often yielding strengths superior to either alone. Engine-adjusted expectations further refine ratings by scaling human performance against predicted outcomes in engine-analyzed positions, accounting for the non-linear gap between human and machine play.[47]

Events like the Top Chess Engine Championship (TCEC) exemplify this benchmarking, where engines compete under standardized conditions to establish relative strengths, with top performers rated around 3600-3800 Elo as of 2025—roughly 800-1000 points above elite humans.[48] This scaling allows for contextualizing human ratings; for example, a top grandmaster at 2800 Elo might be expected to score under 10% against a 3400-rated engine in classical time controls. Such comparisons highlight the vast performance divide while informing hybrid evaluations.

These systems find applications in online platforms, where bot opponents are calibrated to human-equivalent ratings for training and matchmaking, and in predicting outcomes for human-AI matches. However, challenges arise from non-transitive strengths, where centaur teams exhibit unpredictable hierarchies—amateur-human-strong-computer pairings can defeat grandmaster-weak-computer teams, as seen in the 2005 PAL/CSS Freestyle Tournament won by the amateur duo "ZackS" using multiple engines. This non-transitivity complicates direct Elo translations and underscores the unique dynamics of hybrid play.[47]
Evolution and Chronology
Key Milestones
The formalization of chess rating systems began in the 1940s with the introduction of the Ingo system in 1948, developed by Anton Hoesslinger and named after his hometown of Ingolstadt, Germany; this marked the first significant numerical approach to evaluating player strength based on tournament performance percentages, influencing early international efforts.[8][4] In 1950, the United States Chess Federation (USCF) adopted the Harkness system, devised by Kenneth Harkness, which refined prior methods by incorporating fixed point adjustments for wins, draws, and losses relative to expected outcomes, and published its inaugural rating list that year.[8] Concurrently, the English Chess Federation (ECF), then the British Chess Federation, began standardizing its grading system in the early 1950s under Richard W. B. Clarke, establishing a national framework that emphasized performance against competition strength and persisted as a distinct variant.[8][4]

The 1960 USCF adoption of the Elo rating system, developed by Arpad Elo, represented a probabilistic breakthrough by modeling player strength on a logistic scale to predict game outcomes more accurately than additive point systems.[4] FIDE followed suit in 1970, implementing Elo internationally and establishing a unified global standard that became the benchmark for classical over-the-board chess.[4] In 1995, Mark Glickman introduced the Glicko system to address Elo's limitations in handling rating uncertainty and infrequent play, incorporating a rating deviation metric to better reflect volatility in player assessments, particularly for less active competitors.[16]

During the 2000s, Jeff Sonas launched Chessmetrics, a regression-based system for retrospective historical ratings that adjusted for era-specific competition and game quality, enabling cross-temporal comparisons of players like Emanuel Lasker and Garry Kasparov.[34] This period also saw the rise of online chess platforms, which adapted Elo variants for digital play and later contributed to systems like the Universal Rating System (URS), blending results across time controls to provide a holistic player evaluation.[39]

In the 2010s, national variants such as Turkey's UKD system—combining Elo with local performance factors—and Germany's DWZ (Deutsche Wertungszahl), a Bayesian Elo adaptation emphasizing recent results introduced in 1991/1992, increasingly integrated with FIDE ratings through dual-list maintenance and conversion formulas to facilitate international participation. Simultaneously, computer-referenced systems proliferated, using engine evaluations to calibrate human performance against AI benchmarks, enhancing analytical tools for training and rating validation amid advancing computational chess strength.[49][50]

Post-2020, FIDE introduced hybrid rating regulations for rapid and online events, allowing verified remote participation in rated tournaments via platforms like the FIDE Online Arena, with separate lists for rapid (10-60 minutes per player) and blitz to accommodate the surge in digital chess while maintaining integrity through anti-cheating measures.
In 2024, FIDE further updated its rating regulations effective March 1, including a minimum initial rating of 1400 after five games for new players, revised K-factors (40 for newcomers and those under 2300, 20 for under 2400, and 10 for 2400 and above), mandatory pre-registration for high-rated players, and minimum time controls to enhance tournament reliability and combat rating inflation.[1][51]
Timeline of Developments
- 1920s: The English Chess Federation (then British Chess Federation) introduces grading bands to classify player strengths in domestic competitions.
- 1948: The German Chess Federation introduces the Ingo system, an early numerical rating method based on performance classifications.[5]
- 1950: The USCF implements the Harkness system, a point-based rating approach that awards fixed points for wins and draws relative to opponents.[4]
- 1958: The British Chess Federation (now English Chess Federation) publishes its grading system, devised by Richard W. B. Clarke, to classify player strengths in domestic competitions.[8]
- 1959: Arpad Elo proposes a probabilistic rating model using logistic distribution to estimate expected scores and update ratings accordingly.[52]
- 1960: The USCF adopts Elo's rating system, replacing the Harkness method for more accurate strength estimation.[4]
- 1970: FIDE officially adopts the Elo rating system for international player evaluations.[53]
- 1990s: Germany introduces the Deutsche Wertungszahl (DWZ) system in 1991/1992, an Elo variant with adjustments for tournament frequency; the International Correspondence Chess Federation (ICCF) adapts its rating process, raising title qualification thresholds to 55%, 75%, and 85% in 1980.[50][54]
- 1995: Mark Glickman publishes the Glicko rating system, incorporating rating deviation to account for uncertainty in player strength estimates.[16]
- 2002: Jeff Sonas launches Chessmetrics, a historical rating tool using regression methods to evaluate past performances across eras.[55]
- 2000s: Turkey's Chess Federation implements the UKD system, blending Elo with national performance metrics; concepts for a Universal Rating System (URS) emerge, aiming for a single rating across time controls.[39]
- 2010s: The Top Chess Engine Championship (TCEC) establishes computer engine ratings using Elo-like calculations starting from its inaugural season; FIDE introduces online rating rules via the FIDE Online Arena platform in 2014.[56][57]
- 2020s: Post-pandemic, FIDE approves hybrid chess competitions—combining online play with in-person verification—for official ratings, expanding accessibility during restrictions. In 2024, FIDE updates rating regulations effective March 1, including revised initial ratings (minimum 1400 after five games), K-factors, and tournament requirements.[1]
