Kotaku calls out Quarter to Three for aberrant review scores

Spoken like a true critic.

I do think Metacritic's pretense is entirely wrong, precisely because it tries to be objective within such a narrow margin. What's the difference between a game at 65% and a game at 60%? It's really just too silly to consider, and it means that a 1% or 100% review makes more of an impact than a 60% or 40% review, while enabling critics to huddle around that all-too-comfortable middle ground.

A model like the one on Rotten Tomatoes makes a lot more sense. Is a game a positive experience or a negative one? If the Rotten Tomatoes model was in place, then something like 90% of all the critics on Metacritic would love absolutely everything, and that would actually expose the problem in a way that Metacritic can't.
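To put rough numbers on that, here is a minimal Python sketch. The scores are invented for illustration, and the cutoff of 60 for a "positive" review is my assumption, not something taken from either site.

```python
from statistics import mean

# Hypothetical scores on a 100-point scale, invented for illustration:
# most critics huddle in the comfortable 60-90 band.
scores = [85, 80, 75, 90, 70, 65, 80, 55, 88, 72]

# Assumed cutoff for calling a review "positive".
POSITIVE_CUTOFF = 60


def average_score(scores):
    """Metacritic-style view: a plain average of the numbers."""
    return mean(scores)


def percent_positive(scores, cutoff=POSITIVE_CUTOFF):
    """Rotten Tomatoes-style view: share of reviews at or above the cutoff."""
    positive = sum(1 for s in scores if s >= cutoff)
    return 100 * positive / len(scores)


print(average_score(scores))     # 76
print(percent_positive(scores))  # 90.0
```

The average looks like a meaningful 76, while the thumbs-up count shows nine critics out of ten approving, which is the pattern I mean.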

ab·er·rant
/ˈabərənt/

Adjective
Departing from an accepted standard.
Diverging from the normal type.

Hi folks! I'll post the e-mail I just sent Tom:

Thanks, Tom. I'm not sure we disagree on much of this at all, though.
When I say you're well-known for aberrant scoring, that isn't a
criticism. It's a statement of fact. Your scoring deviates from the
norm, and therefore it is aberrant. That's not to say the norm is
"okay" or that it shouldn't be changed. But I think that's a fair way
to describe the way you score games.

And I think we agree that the system doesn't work! When your scores
are averaged with scores from Game Informer and IGN, the resulting
numerical output is flawed because you use a different scale than they
do. That is a problem - one of many with Metacritic. I would LOVE to
see IGN and Game Informer and every other review site out there adopt
the same 5-star system that you use. It'd make Metacritic scores
meaningless to publishers and it'd encourage gamers to read criticism
rather than thinking things like "Oh, IGN gave it a 7, better stay
away!"

But as I think I've laid out quite clearly, scoring discrepancies
definitely aren't the only problem with the current system.

Hope that makes things clearer. Thanks again for weighing in on the discussion!

Classy response, Jason. Tom, your defensive attitude was unnecessary. You guys pretty much say the same thing, with slightly different foci. And yes, Jason's use of "aberrant" was accurate and completely appropriate in his article - don't let overloaded perceptions of a word override its actual meaning.

I'm thinking beyond definition and towards connotation, which is just as important. "Aberrant" is more synonymous with "deviant" than simply "different".

It's the difference between calling someone "pig-headed" versus "resolute".

I would simply point out that another viable review metric is "how strongly would I recommend this to others." For example, with the full breadth of a 100-point scale that I'd totally hate using, I could see giving Fez a slightly lower score than Super Meat Boy, even though I'd give both games 5/5 on the Qt3 scale. That'd be due to technical issues that can be experience-defining, such as the shockingly prevalent save corruption bug. I mean, my friend bought the game and has found it completely unplayable due to that bug; it'd be astounding to ignore it. Despite that, I like Fez just as well, if not more so, but I'd be hard-pressed to ignore its technical issues in a review's copy. If my review score system allowed the necessary gradation (again, as at IGN or GameSpot), then I'd probably let my final score reflect those issues.

But I agree that the "objective merit" system isn't viable. Marking down Fez or Meat Boy for "limited appeal" or "lack of replayability" is nonsense. I generally review games for my own experiences with them or thoughts about them, and I'm similarly stingy with my 5/5 reviews.

Actually, saying that Tom is known for aberrant scoring is fair. Tom doesn't use the 60-90 scale almost everyone else uses, and that makes him "aberrant".

My problem with the article, and the reason why I think Tom was defensive about that phrase, is not that phrase in itself, but the context in which it appears. Let me put both paragraphs here:

"'The problem is the scale,' said Obsidian’s Urquhart. 'There's an expectation that a good game is between 80 and 90. If a good game is between 80 and 90, and let's say an average game is gonna maybe get 50 scores, if you wanna hit that 85 and someone gives you a 35, that just took ten 90s down to 85... Just math-wise, how do you deal with that? Some guy who wants to make a name for himself can absolutely screw the numbers.'

One reviewer well-known for aberrant scores is Tom Chick, who runs the blog Quarter To Three. Chick is listed for having the lowest Metacritic score on BioShock Infinite (a 60) and Halo 4 (a 20), among others. He uses a 1-5 scale that Metacritic converts into multiples of 20, so Chick’s 'I liked this game,'–3 out of 5–is converted into a 60, which most Metacritic readers see as a bad score."
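For anyone who wants to check the arithmetic in those two paragraphs, here is a quick Python sketch. It assumes a plain unweighted average, which is my simplification and not necessarily how Metacritic combines scores; the function names are mine.

```python
from statistics import mean


def stars_to_hundred(stars):
    """Convert a 1-5 star rating to multiples of 20, as the article describes."""
    return stars * 20


def average_with_outlier(n_high, high, outlier):
    """Plain mean of n_high identical high scores plus one outlier score."""
    return mean([high] * n_high + [outlier])


print(stars_to_hundred(3))                # 60
print(average_with_outlier(10, 90, 35))   # 85
print(average_with_outlier(49, 90, 35))   # 88.9
```

So Urquhart's "ten 90s down to 85" does check out, and a 3-star "I liked this game" really does land as a 60. With the 50 reviews he mentions, though, a single 35 among forty-nine 90s only pulls a plain average down to 88.9.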

Now, the last phrase before the mention of Tom Chick says "Some guy who wants to make a name for himself can absolutely screw the numbers". And then Tom Chick is introduced as someone who "screws the numbers". That phrase, however, leaves a lingering implication that he's also someone who "wants to make a name for himself". And that's what sounds offensive to Tom, and that's why he acted defensively.

So, it's not the text or the article, but the placement and the context of that phrase. People who know Tom will not draw that implication - that he wants to make a name for himself - but people who don't know Tom, or who only read his scores and not his reviews, will be led into it. I'd say that's unfair.

No, it's not "the difference". That you choose to overload an actual definition with connotation is your choice/mistake, not his.

Again, this enters the area of objective reviews, of reviews like consumer guides. Exactly the problem Tom is pointing out. Even the way bugs influence your experience varies.

For instance, Fallout New Vegas was buggy as hell but was still one of my favorite games of the year. Fallout 3 was also really, really buggy, but many reviewers, Tom included, quite enjoyed it. Some people were soured on XCOM because of bugs; others gave it their GOTY.

This is factually untrue. They provide an average of the reviews by diverse publications. The main problem is that scales between publications don't match.

You could argue that Metacritic should do the extra legwork of trying to normalize scores, but why should it? GameSpot/Edge/IGN/Polygon don't say 'we rate on a 1-10 scale but use only 40% of that scale'. The pretense is that they use the full scale, EXCEPT they give 60 for just showing up, as Tom mentioned.

Do you have any idea what QA actually means?

Of course connotation matters, you deviant.
"Aberrant" was not the best word for that article.
(At least, I don't think so, given the context of the article, which was not disagreeing with Mr Chick.)

If developers are silly enough to tie their financial well-being to a review aggregator, I don't really have any sympathy when that decision blows up in their faces. They agreed to that bonus structure knowing the problems with Metacritic. I'm sure that if some "aberrant" critic gave a game a particularly high score, barely pushing the game over its bonus threshold, the developer would insist on its strict right to compensation. Games are a business; business ain't beanbag.

No one cares too much about typos in a blog if meaning can be discerned, Mercanis. AP Style certainly doesn't, and specifically holds blogs to a different standard than writing for publication. I don't think you've ever really fully grasped this.

In cinema it's expected that contracts will be tied to metrics, but those metrics are box office results. Leaving aside whether that's good or bad, why aren't film production contracts tied to critical reviews? Because there is a critical voice about cinema as a medium that is distinct from what is popular. We accept that it is possible to make a critically acclaimed movie that pushes the medium forward in an important way, yet that this may not lead to ticket sales. But game reviews don't work that way. Why? For the most part there is no genuine critical voice.

Echoing Tom's comments, if there's a problem with Metacritic it's that the major game review sites are a farce. They lack integrity. All this fuss about game developer bonuses seems beside the point when the foundations of the reviewing system are so screwed up.

That's a prescriptive view to take on word-use; fact of the matter is that the word carries connotation. Since the author didn't choose the word by accident (one hopes) its usage deserves scrutiny.

ESPECIALLY when you consider the way fanboys address Tom after their game du jour gets an 'aberrant' score.

Why am I not surprised that a "former lawyer" runs Metacritic, and not somebody with a math major? If 90% of IGN's scores are in the 6-9 range, and Tom Chick's are 1-5, it's easy enough to normalize the data before making comparisons.
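One way to do it, as a rough Python sketch: express each score relative to that outlet's own scoring history (a z-score). To be clear, this is just one standard normalization I'm picking for illustration, not anything Metacritic is known to do, and both review histories below are invented.

```python
from statistics import mean, pstdev


def z_score(score, history):
    """How many standard deviations a score sits from the outlet's own average."""
    mu = mean(history)
    sigma = pstdev(history)  # population standard deviation of the history
    return (score - mu) / sigma


# Hypothetical review histories on a 100-point scale, invented for illustration.
narrow_outlet = [60, 70, 80, 90, 80, 70]    # huddles in the 60-90 band
five_star_outlet = [20, 40, 60, 80, 100, 60]  # 1-5 stars converted to multiples of 20

# The same raw 60 means different things depending on who gave it.
print(z_score(60, five_star_outlet))  # 0.0, exactly average for this outlet
print(z_score(60, narrow_outlet))     # negative, well below this outlet's average
```

After normalizing, a 60 from the five-star outlet lines up with a 75 from the narrow one, since both sit exactly at their outlet's average.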

The guy does say he does "super-secret critic weighings", but it sounds more like anti-lawsuit talk rather than any attempt at rationality.

What is Metacritic for? It's not an objective measure of merit, but it is an aggregate of reviewers, many of whom have popular preferences. And the more reviewers Metacritic has, the more that is true, and the less "outliers" like Qt3 matter (to the score, I mean). So it is a proxy for popularity, hence sales, hence it is a usable tool for maximizing profit, hence publishers will continue to use it, informally or formally (via contracts).

It is NOT a tool for finding artistic worth, playability, or any other "subjective" metric. You know what is? A game review from somebody who cares about those things.

"I don't think you've ever really fully grasped this."

I feel like my intelligence has been vaguely insulted.

I typically keep typos to myself, but ever since Mr. Chick said he welcomed them being pointed out, I've posted when I notice them. That's all there is to it, really.

I admit I used to point out each and every little "error", but I'd like to think I've loosened up in recent memory.

Sarcasm?