Why aggregated customer reviews for games are fundamentally flawed

I don’t know about you, but I find myself relying on customer reviews to make game purchase decisions more and more. I think the written descriptions are especially helpful because they allow you to work out whether you appreciate the same things in a game as the person writing the review.

However, I’ve been giving some thought to the underlying mathematics of aggregated customer reviews and have created a theory which suggests we need to be wary of trusting these aggregated scores too blindly.

A quick test

I have a question for you. If you came across the following two games on your game storefront of choice, which would you consider to be the better game?

  • Game A: 50,000 units sold, 75% positive reviews
  • Game B: 5,000 units sold, 90% positive reviews

I’m guessing that most of you chose Game B. Until recently I would have too: it looks like the classic example of a great game that has been poorly marketed. Now I’m not so sure.

Let’s delve into the mathematics.

We need a metric

Before I start, I need to stress that this is just a theory, an interesting point of view to consider. I’m going to make a lot of assumptions here, and there will always be exceptions.

When building a mathematical model of a real-life system like this, it’s useful to design some metrics. The most important one for this theory is a Quality Factor (QF) that we can use to compare games. I define this as the percentage of people who would consider themselves satisfied with the experience a game offers. The rest of this theory looks at how we can ascertain the QF for a given game.

Option 1 – Aggregated review score

Now you might be thinking that we already have a QF available to us by way of the aggregated review score.

Let’s see how the maths works out with our two games from earlier. Suppose 100,000 people visit each game’s store page (our first assumption), check out the game and either decide to buy it or not.

  • Game A has 50,000 purchasers, 75% of whom are satisfied (based on the review score). This means 37,500 people are left satisfied out of the original 100,000.
  • Game B has 5,000 purchasers, 90% of whom are satisfied (based on the review score). This means 4,500 people are left satisfied out of the original 100,000.

Hopefully you can already see what I’m getting at. Out of the original 100,000 people who checked out each game, 37,500 were satisfied with Game A whereas only 4,500 were satisfied with Game B.

In terms of the QF, this would mean Game A has a QF of 37.5% and Game B has a QF of 4.5%.

This obviously tells a very different story to the review scores themselves.

Option 2 – Aggregated review score * number of sales

I propose that a better way of calculating the QF is to multiply the aggregated review score by the number of sales, then divide by the number of people who visited the store page.

As you can see from the examples in the last section, that’s how we end up with the “correct” values for QF, i.e. 50,000 * 75% / 100,000 = 37.5%.
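For anyone who wants to play with the numbers, here’s a minimal Python sketch of that calculation. The function name quality_factor and its parameter names are my own labels for illustration, and the 100,000 visitors per page is the same assumption as before.

```python
def quality_factor(units_sold: int, positive_share: float, visitors: int) -> float:
    """Share of all store-page visitors who bought the game and were satisfied."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    satisfied = units_sold * positive_share  # buyers who left happy
    return satisfied / visitors


# Assumption from the article: both store pages get the same traffic.
VISITORS = 100_000

games = {
    "Game A": (50_000, 0.75),  # (units sold, share of positive reviews)
    "Game B": (5_000, 0.90),
}

for name, (units, share) in games.items():
    qf = quality_factor(units, share, VISITORS)
    print(f"{name}: QF = {qf:.1%}")
```

With the figures from the quick test, Game A comes out at 37.5% and Game B at 4.5%.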

On first inspection this seems insane. Why would the number of sales have any influence on the quality of a game?

Well, it’s not that the number of sales has an influence on the quality. Rather, the number of sales is a result of the quality.

The problem with just using the aggregated review score is that it discounts all the people who never bought the game. I know this sounds obvious but it’s so important.

If all the people who didn’t buy the game were given a copy and then forced to leave a review score, there’s a good chance those scores would be negative, along the lines of “I just don’t really enjoy this sort of game”.

Remember, our QF is a measure of the percentage of people who would be satisfied with the experience a game offers. It needs to include all the people who wouldn’t buy the game in the first place.

Assumptions

Before I move on to some of the implications of this theory, I should list all of the assumptions, many of which you will already have spotted.

  1. We’ve assumed the same number of people visit both store pages. If a game is poorly marketed then of course fewer people are going to end up on its store page. However, I think the effect of this is probably overstated a lot these days. Perhaps lots of people are visiting the page; they’re just not purchasing the game. I believe these non-purchasers need to be taken into account in the quality measure of a game.
  2. We’ve assumed that everybody who didn’t buy the game would not have enjoyed it. Clearly there are going to be people who mistakenly disregarded a game they would have enjoyed if they had tried it.
  3. Finally, the biggest assumption of all is that our definition of QF is correct in the first place. Our definition is one that rewards games that merely satisfy on a broad scale. It does not take the intensity of satisfaction into account. More on this later.
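
To get a feel for how much assumption 1 matters, here’s a rough Python sketch that asks how few visitors Game B’s page would need for its QF to match Game A’s. The function names are hypothetical, and the only inputs are the sales and review figures from the quick test plus the 100,000 visitors assumed for Game A.

```python
def satisfied_buyers(units_sold: int, positive_share: float) -> float:
    """Number of purchasers who were satisfied, going by the review score."""
    return units_sold * positive_share


def break_even_visitors(units_sold: int, positive_share: float, target_qf: float) -> float:
    """Visitor count at which a game's QF would equal target_qf."""
    if not 0 < target_qf <= 1:
        raise ValueError("target_qf must be in (0, 1]")
    return satisfied_buyers(units_sold, positive_share) / target_qf


# Game A's QF under the article's assumption of 100,000 visitors.
game_a_qf = satisfied_buyers(50_000, 0.75) / 100_000

# How small would Game B's audience have to be to reach the same QF?
game_b_visitors = break_even_visitors(5_000, 0.90, game_a_qf)
print(f"Game B matches Game A's QF at about {game_b_visitors:,.0f} visitors")
```

With these figures, Game B’s page would need to have had only about 12,000 visitors, against Game A’s 100,000, for the two QFs to come out equal. So the conclusion only flips if the gap in store page traffic is very large.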

Measure of expectation

Based on this theory, I would conclude that the aggregated review score is less useful as a measure of de facto quality, and is rather a measure of how well the expectations of the purchaser were met.

Think about it. If you don’t think a game looks good, you won’t buy it just so you can leave a negative review. Well you might, but then you’re probably in the minority… and a sadist.

The reason negative review scores get left at all is that a player purchases a game they think they will enjoy but is then left dissatisfied with it.

Likewise, a positive review score is normally left when the player expects to enjoy the game, but then has their expectations wildly exceeded.

Therefore, I propose that the aggregated review score should not be used as a measure of objective quality, but as an objective modifier to apply to your own subjective expectation.

For example, if you think the game looks amazing but it only has a 75% rating, it’s probably not quite as good as you thought but could still be a lot of fun. If you think it looks merely OK, but has a 95% rating, then it’s probably better than you think.

What about niche games?

I don’t believe my theory is a fair measure of niche games. As has already been stated in the assumptions, we are basing our entire QF on the idea that the quality of a game is equal to its broad appeal and its ability to merely satisfy. We don’t account for the intensity of satisfaction that niche games can bring to their players.

Pricing

To finish, I wanted to add a note on pricing and what my theory has to say about it.

My theory says that a game has a constant QF that is equal to the percentage of people who would be satisfied with the game if they purchased it.

If we lower the price of that game, we lower the expectation threshold required for people to purchase the game. In other words, people “take a punt” because the game is cheap or on sale.

However, the QF remains fixed. We’re just increasing the sales due to the lower price. Here’s how the numbers turn out:

Game A, normal price, 50,000 sales. 37,500 people satisfied. Review rating 75%.
Game A, discounted price, 75,000 sales. 37,500 people satisfied. Review rating 50%.

We’ve turned the theory on its head so that we’re no longer measuring the quality factor (that’s a constant). Instead, we are using the quality factor to predict the aggregated review score.
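
Here’s a small Python sketch of that reversed calculation, using the Game A figures above. The function name predicted_review_score is my own, and it bakes in the theory’s assumption that the number of satisfied players stays fixed, so every extra sale from a discount goes to someone who ends up dissatisfied.

```python
def predicted_review_score(qf: float, visitors: int, sales: int) -> float:
    """Predict the share of positive reviews from a fixed QF and a sales count."""
    if sales <= 0:
        raise ValueError("sales must be positive")
    satisfied = qf * visitors  # constant under the theory, whatever the price
    # If fewer people buy than would be satisfied, every buyer is happy.
    return min(1.0, satisfied / sales)


QF = 0.375          # Game A's quality factor from earlier
VISITORS = 100_000  # same store-page traffic assumption as before

for label, sales in [("normal price", 50_000), ("discounted price", 75_000)]:
    score = predicted_review_score(QF, VISITORS, sales)
    print(f"Game A, {label}: {sales:,} sales, predicted rating {score:.0%}")
```

The cap at 100% is my addition for the case where sales fall below the number of people who would be satisfied. The article’s two scenarios never hit it.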

In conclusion, the theory predicts that low prices and discounts could actually lead to lower review scores. Scary, huh?

If you have found this article interesting, consider following me on Twitter or subscribing to the Twice Circled newsletter. This is a monthly newsletter containing blog post summaries and development updates on my latest projects.