This guest blog post is written by /u/tornmandate, who we noticed had a pretty cool algorithm going on. Many thanks to Torn for coming up with this method and writing this blog post.
SteamDB informed me they liked my formula, and it seems that a lot of the users would agree that it produces better results than Wilson's score interval's lower bound (AKA the old formula). So this is a short write-up on how I came up with it, and why it's probably a better estimate of the score.
If you've ever taken a slightly longer look at the Steam store, you've probably noticed their method of sorting games by review score is pretty bad. They just divide the positive reviews by the total reviews to get the rating. A game with a single positive review would be ranked above some other game that has 48 positive reviews and a single negative review. While they do have “steps” at 50 and 500 total reviews, meaning that no game with a rating of at least 80% will be ranked below a game with fewer than 50 reviews, and no game with a rating of at least 95% will be below a game with fewer than 500 reviews, it's still a bad system. If our 48 to 1 rated game suddenly accrued 11 more negative ratings, it would drop to 80% positive but cross the 50-review step, so Steam would miraculously place it higher than it was before.
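To make the first complaint concrete, here is a minimal sketch of the plain ratio described above. The helper name `naiveRating` is made up for illustration and this is not Steam's actual code; the 50 and 500 review steps are left out.

```javascript
// Hypothetical helper: positive reviews divided by total reviews.
function naiveRating(positive, negative) {
  return positive / (positive + negative);
}

const singleReview = naiveRating(1, 0);  // one positive review, nothing else
const wellReviewed = naiveRating(48, 1); // 48 positive reviews, 1 negative

// The one-review game gets the higher ratio, so it sorts first.
console.log(singleReview > wellReviewed); // true
```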
I looked over some alternatives, including Wilson's formula that SteamDB used, but I didn't quite like any of them. Some sites seemed to have pretty good sorting formulas, but exactly what those formulas were was not disclosed. So, I sat down to create my own, starting with trying to word the rules by which Steam's games should be sorted.
- Games with more reviews should have an adjusted rating that is closer to their real rating, because the more reviews we have, the more certain we are that the score they give us is correct.
- All ratings should be biased towards the average rating — 50%.
And that's all, really. It just needs to be a little more precise: “For every 10x the reviews we have, we should be 2x more certain that the rating is correct.” Almost good, but I can't quite figure out what “2x more certain” means. But it sounds like it should be equivalent to “2x less uncertain”, and I can work with that.
Clearly at 0 reviews we're 100% uncertain as to what the rating should be. Let's stretch that a bit and say that we're 100% uncertain at even just 1 review, then we can apply the earlier thought. So at 10 reviews we should be 2x less uncertain, that is 50% uncertain. At 100 reviews, 25%. 1000 reviews, 12.5%, and so forth.
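The halving rule can be sketched in a few lines. The helper name `uncertainty` is an assumption for illustration, and this version follows the prose literally (100% uncertain at 1 review); the final formula further down adds 1 to the review count so that it also works at 0 reviews.

```javascript
// Hypothetical helper: uncertainty halves for every tenfold increase in reviews.
function uncertainty(totalReviews) {
  return Math.pow(2, -Math.log10(totalReviews));
}

// Prints values close to 1, 0.5, 0.25 and 0.125.
for (const n of [1, 10, 100, 1000]) {
  console.log(n, uncertainty(n));
}
```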
So given a game with 100 reviews of which 90% are positive, we're 25% uncertain that this 90% is the correct rating. So we're 75% certain that it is the correct rating. In other words, 75% of the final rating is 90%, and the other 25% is the average rating of 50%, which also nicely fits with our second rule. This gives us a final rating of 75% * 90% + 25% * 50% = 80%. This looks good, and these rules can be translated into a formula that gives us the adjusted rating with respect to the number of reviews and the “real rating”.
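The same arithmetic as a small sketch, with a made-up helper name `blend`: the raw score is weighted by how certain we are, and the 50% average by how uncertain we are.

```javascript
// Hypothetical helper: mix the raw review score with the 50% average.
function blend(reviewScore, uncertainty) {
  const certainty = 1 - uncertainty;
  return certainty * reviewScore + uncertainty * 0.5;
}

// 90% positive at 25% uncertainty comes out at approximately 0.8.
console.log(blend(0.9, 0.25));
```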
As LaTeX:
\( \text{Total Reviews} = \text{Positive Reviews} + \text{Negative Reviews} \)
\( \text{Review Score} = \frac{\text{Positive Reviews}}{\text{Total Reviews}} \)
\( \text{Rating} = \text{Review Score} - (\text{Review Score} - 0.5) \cdot 2^{-\log_{10}(\text{Total Reviews} + 1)} \)
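The step from the weighted average in the worked example to this formula is a short rearrangement. As a sketch, write \(s\) for the review score and \(u\) for the uncertainty; both symbols are shorthand introduced here, not part of the original notation.

```latex
\begin{align*}
u &= 2^{-\log_{10}(\text{Total Reviews} + 1)} \\
\text{Rating} &= (1 - u)\,s + u \cdot 0.5 \\
              &= s - u\,s + 0.5\,u \\
              &= s - (s - 0.5)\,u
\end{align*}
```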
Compared to Wilson's formula, it's very short, and it's not nearly as difficult to understand the idea behind how it works. One could easily get the question "Why would this random formula give better results than what a very established mathematician came up with?" I can't really answer that, but it would seem a lot of you agree that it does indeed produce better results when rating Steam's games. I can, however, try to give some insight into this.
For one, Wilson's formula isn't really meant to be used quite like this. It takes a rating and the sample size (the number of reviews), and outputs a confidence interval. And a confidence interval basically says that “We are some% sure that the score is between x and y”. If you increase the % of how sure you are, the distance between x and y also increases, and vice versa. But to get a single rating, it's not quite okay to just take the lower bound of that interval.
Secondly, because of what was mentioned in the last paragraph, it always gives us a lower rating than the original. This is clearly the incorrect behaviour, as something that just came out and gets a single negative review will be marked as having a score of 0%. Meanwhile, an established terrible game can have 10 positive and 500 negative reviews, and it will rank higher. This is also the reason why one of the two rules I listed was that all ratings should be biased towards the average.
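The two games from this paragraph can be compared side by side. This is a rough sketch, not SteamDB's old code: `wilsonLowerBound` is the textbook lower bound of the Wilson score interval, the value z = 1.96 (roughly 95% confidence) is an assumption, and `adjustedRating` is just the formula from this post returned as a fraction.

```javascript
// Lower bound of the Wilson score interval.
// z = 1.96 is an assumed confidence level, not a value taken from SteamDB.
function wilsonLowerBound(positive, negative, z = 1.96) {
  const n = positive + negative;
  const p = positive / n;
  const z2 = z * z;
  const centre = p + z2 / (2 * n);
  const margin = z * Math.sqrt((p * (1 - p) + z2 / (4 * n)) / n);
  return (centre - margin) / (1 + z2 / n);
}

// The formula from this post, as a fraction between 0 and 1.
function adjustedRating(positive, negative) {
  const n = positive + negative;
  const score = positive / n;
  return score - (score - 0.5) * Math.pow(2, -Math.log10(n + 1));
}

const games = [
  { name: 'new game, 0 positive / 1 negative', positive: 0, negative: 1 },
  { name: 'established game, 10 positive / 500 negative', positive: 10, negative: 500 },
];

for (const g of games) {
  console.log(
    g.name,
    'wilson:', wilsonLowerBound(g.positive, g.negative),
    'adjusted:', adjustedRating(g.positive, g.negative)
  );
}
```

With these inputs the Wilson lower bound puts the established bad game above the new one, while the adjusted rating puts them the other way around.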
Finally, while Wilson's formula probably gives us a more “precise” rating, so to say, it's not necessarily what we want to see. There's a lot of mathematics behind why what it does is correct, while the previously mentioned numbers of 2 and 10 that I picked for my formula were rather arbitrary. Still, I selected them so that the result would also account for the high number of reviews when assigning a good score. It's why you'll probably notice far fewer games with a low review count among the top games than before.
I think that's important because a game that is very popular and very highly rated should be ranked higher than a game that isn't as popular and is also very highly rated. Not because we can be more certain that this rating is indeed correct, but because you, as a random person who has yet to try that game, are more likely to enjoy it if a lot of other people have liked it as well, provided it's not a niche game. And I think this aspect is definitely important and should be accounted for when trying to represent an entire game with just a single number.
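A quick sketch of that effect, using two invented games with the same 95% positive ratio but very different review counts. The helper name `adjustedRating` is made up here; it is the formula from this post returned as a fraction.

```javascript
// The formula from this post, as a fraction between 0 and 1.
function adjustedRating(positive, negative) {
  const n = positive + negative;
  const score = positive / n;
  return score - (score - 0.5) * Math.pow(2, -Math.log10(n + 1));
}

const smallGame = adjustedRating(95, 5);       // 95% positive from 100 reviews
const popularGame = adjustedRating(9500, 500); // 95% positive from 10,000 reviews

// Same raw ratio, but the popular game keeps more of its 95%.
console.log(smallGame < popularGame); // true
```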
SteamDB now uses this new algorithm on all of its pages. You can find it on the top rated games page, and, as a new addition, the browser extension now also displays the SteamDB rating on Steam Store game pages. You can also find reference implementations in PHP and JavaScript below.
EDIT, September 2018: Games with fewer than 500 votes in total will show an ❓ unverified icon to signal that there's not enough certainty in the rating.
One important note: SteamDB rating is calculated from all purchase type reviews, whereas Steam only uses Steam purchases.
PHP implementation
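function GetRating( $positiveVotes, $negativeVotes ) {
	$totalVotes = $positiveVotes + $negativeVotes;
	// Raw share of positive votes, between 0 and 1
	$average = $positiveVotes / $totalVotes;
	// Pull the raw share towards 0.5; the pull roughly halves for every tenfold increase in votes
	$score = $average - ( $average - 0.5 ) * 2 ** -log10( $totalVotes + 1 );
	// Return the rating as a percentage
	return $score * 100;
}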
JavaScript implementation
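function GetRating( positiveVotes, negativeVotes ) {
	const totalVotes = positiveVotes + negativeVotes;
	// Raw share of positive votes, between 0 and 1
	const average = positiveVotes / totalVotes;
	// Pull the raw share towards 0.5; the pull roughly halves for every tenfold increase in votes
	const score = average - ( average - 0.5 ) * Math.pow( 2, -Math.log10( totalVotes + 1 ) );
	// Return the rating as a percentage
	return score * 100;
}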
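One edge case worth knowing about: with zero votes the JavaScript version divides 0 by 0 and returns NaN. A possible guard is sketched below. The wrapper name `getRatingOrAverage` and the choice to fall back to 50 are assumptions, based only on the idea that a game with no reviews is fully uncertain; they are not part of the reference implementation.

```javascript
// Reference implementation from this post, repeated so the sketch runs on its own.
function GetRating( positiveVotes, negativeVotes ) {
	const totalVotes = positiveVotes + negativeVotes;
	const average = positiveVotes / totalVotes;
	const score = average - ( average - 0.5 ) * Math.pow( 2, -Math.log10( totalVotes + 1 ) );
	return score * 100;
}

// Hypothetical wrapper: return the 50% average when there are no votes at all,
// because 0 / 0 evaluates to NaN in JavaScript.
function getRatingOrAverage( positiveVotes, negativeVotes ) {
	if ( positiveVotes + negativeVotes === 0 ) {
		return 50;
	}
	return GetRating( positiveVotes, negativeVotes );
}

console.log( GetRating( 0, 0 ) );           // NaN
console.log( getRatingOrAverage( 0, 0 ) );  // 50
console.log( getRatingOrAverage( 90, 10 ) ); // a little above 80
```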
This post and formula provided above are dedicated to the public domain. Reference implementations are available under the MIT license. Feel free to use it anywhere for any purpose.