Elo Rating System

(warning: it's mostly text below, but it's not too hard!)

The Elo rating system (named after its creator, Arpad Elo) was originally designed to rate the relative skill of chess players. Today, though, it is used to rate skill in many other competitions, from esports to table tennis. It's one of the most popular skill rating systems in use today.

How Does Elo Work?

The core premise of the Elo system is that the difference in rating between two players predicts the likelihood of each match result. For example, consider two chess players, Ellen and Adam. If Ellen is rated 2000 and Adam is rated 1800, then Elo would project Ellen to win 66% of the time, Adam to win 15% of the time, and the two to draw 19% of the time. If Ellen were rated higher than 2000, so that the difference in rating were larger, she'd have an even higher probability of winning.

Now, let's say Ellen wins. A win counts for at most 32 points in the system, but since Ellen was already expected to win 66% of the time, Elo gives her only 8 points, and Adam loses 8 points. If they draw, Adam actually gains 8 points and Ellen loses 8 points because, as we said before, Ellen was expected to win, so Adam overperformed with a draw. And if Ellen loses, well, Adam gains a big 24 points and Ellen loses 24 points.

We won't delve into how those exact numbers are generated here, but simply know that the Elo system is relative (i.e. the difference in rating is what matters) and that it deals in probabilities: Ellen isn't expected to win every time. Sometimes Ellen may be tired, or perhaps she just had an off day. The particular reason doesn't matter, just that Ellen won't always beat Adam, even though she's higher rated.
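
For readers who do want a peek at the arithmetic, the point values in the Ellen and Adam example are consistent with the standard Elo formulas and a K-factor of 32 (the "at most 32 points" a win can be worth). Below is a minimal Python sketch under that assumption; the function names are made up for illustration. Note that the formula produces an expected score in which a draw counts as half a win, so 66% wins plus half of 19% draws comes to roughly 76%. How that expected score splits into separate win and draw percentages is not something the formula itself provides.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for player A (a draw counts as 0.5)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_change(rating_a, rating_b, score_a, k=32):
    """Points player A gains (or loses, if negative) after one game.

    score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k=32 is an assumption taken from the "at most 32 points" remark.
    """
    return k * (score_a - expected_score(rating_a, rating_b))


ellen, adam = 2000, 1800
print(round(expected_score(ellen, adam), 2))  # 0.76

for label, score in [("win", 1.0), ("draw", 0.5), ("loss", 0.0)]:
    # Ellen's change; Adam's change is the same size with the opposite sign.
    print(label, round(rating_change(ellen, adam, score)))
# win 8
# draw -8
# loss -24
```

Because each player's change is the mirror image of the other's, the total number of points in the system stays the same after every game.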

For more information on the specifics of Elo, check out this video or the Wikipedia article.

How Does Natively Implement Elo?

Natively uses Elo to rate the relative difficulty of books just as Elo rates the relative skill of players. In this analogy, the 'books' are the players, 'difficulty' is the skill and the 'comparisons' are the matches.

So, let's look at a comparison between よつばと！ and ふらいんぐういっち:

If よつばと！ has a level of 20 and ふらいんぐういっち has a level of 18, then Elo (the level here corresponds to the Elo rating) would project よつばと！ to be selected as harder 66% of the time. As you can see, this is exactly analogous to Ellen and Adam's chess match! The books are the players, the difficulty levels are the Elo ratings, and a 'win' is being selected as 'harder' in the comparison.
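
To make the analogy concrete, here is a small Python sketch of how a single comparison could be scored. It is only an illustration, not Natively's actual implementation: it assumes one level corresponds to 100 Elo points (which matches level 20 versus 18 mirroring 2000 versus 1800), that a "same difficulty" answer is treated like a draw, and that K is 32. All names in it are hypothetical.

```python
POINTS_PER_LEVEL = 100  # assumption: 1 level == 100 Elo points
K_FACTOR = 32           # assumption: same K as in the chess example

# A comparison verdict for book A, expressed as an Elo score.
OUTCOME_SCORE = {"harder": 1.0, "same": 0.5, "easier": 0.0}


def expected_harder(level_a, level_b):
    """Expected score for book A being picked as the harder book."""
    diff_points = (level_b - level_a) * POINTS_PER_LEVEL
    return 1.0 / (1.0 + 10 ** (diff_points / 400.0))


def level_change(level_a, level_b, outcome):
    """Change in book A's level after one comparison against book B."""
    delta_points = K_FACTOR * (OUTCOME_SCORE[outcome] - expected_harder(level_a, level_b))
    return delta_points / POINTS_PER_LEVEL


yotsuba, flying_witch = 20, 18  # よつばと！ and ふらいんぐういっち
for verdict in ("harder", "same", "easier"):
    print(verdict, round(level_change(yotsuba, flying_witch, verdict), 2))
# harder 0.08
# same -0.08
# easier -0.24
```

Under these assumptions, the expected verdict barely moves the level, while a surprising one moves it about three times as much.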

On The Nature Of Difficulty

As stated in the previous section, even if よつばと！ is rated as more difficult than ふらいんぐういっち, we don't expect everyone to agree. Indeed, the difference in level between two books only corresponds to the share of people we expect to pick the higher-level book as harder, and that share is never 100%. Even books with a level difference of 5 may get one or two people saying they're the same difficulty!

And it's understandable that people will disagree. Some people may find complex grammar more difficult than lots of vocabulary, or vice versa. Some may have read a version with more furigana. Or perhaps someone remembered the books a bit poorly. Hopefully, people have written reviews of the book that explain what they struggled with, so that you have a better sense of its difficulty.

The good news is that this is all OK! Elo handles Ellen losing to Adam just as well as it handles よつばと！ being rated easier than ふらいんぐういっち. And of course, if よつばと！ is rated easier enough times, the rating system will update the ratings accordingly!
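
The self-correcting behaviour can be seen in a short Python sketch. It assumes the standard Elo update with K = 32 and uses the 2000 and 1800 ratings from the chess example as stand-ins for the two books; the names and the choice of 20 consecutive "easier" verdicts are purely illustrative, not a description of Natively's code.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def apply_comparison(rating_a, rating_b, score_a, k=32):
    """Return both updated ratings; what A gains, B loses (k=32 is assumed)."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Stand-in ratings for よつばと！ (higher) and ふらいんぐういっち (lower).
yotsuba, flying_witch = 2000.0, 1800.0
gaps = [yotsuba - flying_witch]

# Hypothetical run: 20 readers in a row say よつばと！ was the easier book.
for _ in range(20):
    yotsuba, flying_witch = apply_comparison(yotsuba, flying_witch, 0.0)
    gaps.append(yotsuba - flying_witch)

# The gap shrinks with every surprising verdict and eventually flips sign.
print([round(g) for g in gaps[:4]])
print(yotsuba < flying_witch)  # True
```

Each unexpected verdict moves the ratings more than an expected one would, so a consistent stream of disagreement pulls the two books toward (and eventually past) each other.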
