7-9 scale - the discussion

IIRC Rotten Tomatoes did this for a little while with game reviews and the end result was almost every game was 80+% “Fresh” since even the harshest game reviews usually have some stupid “worthwhile for those who follow the genre” caveat.

My issue with sites like Gamerankings and Metacritic is that they are not in any way, other than accidental, reflections of game quality.

  1. The sample size is too small to be statistically relevant.

  2. The reviewers at the various sites used to come up with a ranking are not judging on the same scale or, more importantly, examining the same things. Some reviewers are looking at overall game quality, some at perceived consumer appeal, and others are looking to impress an editor or the demographic of their readers. For example, some sites care about sound enough to call it out separately; others do not. This leads to wildly different views of the same games, and indeed of features within the game.

  3. All games are not judged by consistent standards. There is no question that sequels are compared to an original, that annual franchises are viewed differently than new IP and that the new and fresh is preferable to the familiar. Games are often judged against the current standard of the genre and/or what competitive products are on the market. While these standards are fine when it comes to informing the general consumer, they cloud the picture if one attempts to use these reviews as an objective measure of product quality.

  4. Developers actually use these sites to guide feature design. I’ve worked at several companies that have established a “Gamerankings” or “Metacritic” target for their products. In my opinion this is absurd. Using such a target, designers will inevitably tailor their design towards the bias of game reviewers. This rarely results in good game design, because game reviewers are never the target audience of a consumer software entertainment product. They tend to be, no offense to any reviewers out there, considerably more jaded than the typical consumer.

There is no truly objective measure of video game quality that I’ve come across. I know I base my personal measure of success on sales numbers. Those are the numbers that tell me if I still have a job after the game ships.

I have no issue with people using these sites to help make purchasing decisions, but the aggregate numbers should be taken with a large grain of salt if one is attempting to determine product quality.

Let me get this straight – in your opinion, it is utter chance that multiple reviewers rated Half-Life 2, Civ 4, and Unreal Tournament 2004 higher than Catfight, Panty Raider, and MTV: Celebrity Deathmatch. The fact the higher rated games just happen to be better products than the lower rated ones is purely random. In your mind, the most cogent explanation for why the top rated games at Gamerankings and Metacritic land on multiple magazine/website “Top 50 Games of All Time” lists, and would likely match up fairly well with Qt3ers’ lists of “Top 5 Games for a Desert Island”, is coincidence.

I don’t agree, but I’m pretty sure I’d like to play poker against you. Any two cards can potentially win; it’s pretty random.

Since you work in the industry, I can understand your preference for sales numbers as a measure of success; it directly impacts your job security. As a consumer, however, I like that developers shoot for high ratings. They could just as easily target the “Mall of America Tycoon” segment, since average or below average-rated titles like that regularly make the top selling list.

Many of the posters on this forum are the ones who will be rating these games, even though some wish for a day when they can write for the ratingless New York Times Games Supplement and forget about the “7-9” years. I tend to agree with their ratings, jaded or not; there’s a much better chance I’ll like a higher rated game than a lower rated one. I’m glad developers are trying to make games that will please them and get high ratings, because I want more high-quality games.

And that is precisely the reason that people do not trust review scores.

I think GameRankings would be extremely useful if it were to, say, drop the 50 highest scores. Take Jade Empire, for example. I may be off in my counting by one or two, but it takes 59 review outlets just to start getting to scores under 90%.

And I’d really like to know the criteria for becoming one of the “bolded” sites/mags. What I mean is, only scores from the ones in bold get used in computing the average. Sorry “Gamer Within”, you just don’t have the clout of “Deeko.com” and “Gamenikki”, or even “Lawrence”.

Sure, the ranking systems work fine if you look at the top few and bottom few games. It’s fairly simple to agree that Catwoman is a bad, bad game and that Civilization is a good one. Where the system breaks down is at any point between universal acclaim and universal horror.

There is no definition of quality used by these systems. If I actually read the reviews, I can, through the reviewer’s language, gain an understanding of how the reviewer felt about the product. If I look at an aggregate score, all I’m getting is a (somewhat) normalized integer.

Let’s look at 3 recent racing game scores, for example. For the Xbox 360, on Metacritic, PGR 3 is ranked above NFS: Most Wanted which is ranked above Ridge Racer. All three are fairly close in score, but by the ranking theory of game quality PGR 3 is clearly better, right? Unless I don’t like loaded track racing, I suppose. Of course, Most Wanted is better than Ridge Racer…oh, unless I just want a cool arcade drift game without cops.

My point is that assigning a numeric score based on reviews that take very different criteria of quality into account is folly. It requires that you ignore the variables that make the individual reviews different in the first place.

By all means use the top and bottom half of these lists to make binary good/bad game decisions, but don’t try to guess which game is better out of several with similar scores. Read the actual reviews, don’t rely on gamerankings or metacritic to determine quality.

Btw, I’m an analytical chemist by training so I must admit I have a bit of a tough time with this sort of aggregate approach to disparate pieces of data.

Pure numbers game. From the site:

Q. What does it take to get a site included in the composite score of Game Rankings?

A. This is the most commonly asked question. The things we look for when adding a new site are: at least 300 archived reviews if they review multiple systems, or 100 reviews if they concentrate on only one system or genre.

The site does at least 15 reviews a month.

The site is visually appealing and looks professional.

The site reviews a variety of titles.

The site has its own domain name and is not hosted on GeoCities or another free server.

The reviews need to be well written.

The site conducts itself in a professional manner.

I’ve never understood this. In the same sentence that some people decry useless scores in reviews they race off to Gamerankings for nothing but an aggregate score. I thought it was the text that was important?

Gamerankings is taking the wrong approach, then. What they should do is convert each site’s score into a letter grade and then average those together. Then an 8/10 and 80% might be a “B” and a 3/5 might also be a “B”.
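A minimal sketch of that conversion in Python. The per-scale tables, percentage cutoffs, and grade points are my own illustrative assumptions, not anything Gamerankings publishes:

```python
# Per-scale letter tables. The point is that each scale gets its own
# mapping rather than a linear percentage conversion, so a 3/5 can land
# on "B" just like an 8/10 does. The exact tables are made up here.
LETTER_TABLES = {
    "out_of_5": {5: "A", 4: "B+", 3: "B", 2: "C", 1: "F"},
    "out_of_10": {10: "A+", 9: "A", 8: "B", 7: "C+", 6: "C", 5: "D"},
}

def percent_to_letter(pct):
    """Letter grade for a percentage score (cutoffs are assumptions)."""
    for grade, cutoff in [("A", 90), ("B", 80), ("C", 70), ("D", 60)]:
        if pct >= cutoff:
            return grade
    return "F"

# Average letter grades on a 4-point scale, then snap to the nearest letter.
GRADE_POINTS = {"A+": 4.3, "A": 4.0, "B+": 3.3, "B": 3.0,
                "C+": 2.3, "C": 2.0, "D": 1.0, "F": 0.0}

def average_grade(letters):
    mean = sum(GRADE_POINTS[g] for g in letters) / len(letters)
    return min(GRADE_POINTS, key=lambda g: abs(GRADE_POINTS[g] - mean))
```

With these tables an 8/10 and an 80% both come in as a “B”, and a 3/5 also lands on “B” even though it’s only 60% linearly, which is the whole argument for letter grades over raw percentages.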

RE: Normalized game ratings. IIRC gamerankings lets you view all scores for a given publication (can’t find it right now, though!) Couldn’t those pages just be scraped to produce normalized averages?
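If you could get each publication’s full score history, per-site normalization is straightforward in principle. A sketch of the idea; the target mean of 75 and spread of 10 are arbitrary choices of mine:

```python
import statistics

def normalize(scores_by_site):
    """Re-center each site's scores so harsh and generous graders become
    comparable: z-score within each site, rescaled to mean 75 / sd 10.

    scores_by_site: {site_name: {game_name: raw_score_as_percent}}
    """
    normalized = {}
    for site, scores in scores_by_site.items():
        mu = statistics.mean(scores.values())
        sd = statistics.pstdev(scores.values()) or 1.0  # guard one-score sites
        normalized[site] = {game: 75 + 10 * (raw - mu) / sd
                            for game, raw in scores.items()}
    return normalized
```

A site whose scores cluster at 90+ and a site that averages 70 would then rate the same game comparably, which is what a raw average of their scores can’t do.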

Do you mean that their advertising people are calling your editorial people? Or that your advertising people are telling editorial that they’re getting threatening calls?

I only ask because I’ve yet to actually hear of a real-world situation where pressure is exerted by advertising $$$ on the newsroom (admittedly, I’m not in game journalism). I’d started to think that the common reader complaint – that the reviews were driven by advertising – is a whiny fanboy myth.

Never once got pressure from an ad person to alter a review in my 15 years of magazine editorial work.

I did get a couple of calls from game companies telling me they were going to pull their ads. (Including one from the CEO of a mid-sized company – I felt special.) My basic response is “Oh, well, you need to be sure to tell the advertising people that, it’s not really my department.”

If a publisher thought a review had unfair elements, I was more than happy to discuss those. But the ad threats were just a waste of time. Even if you discard the ethical issues, the editors are too far removed from the money chain to care.

And the threats are counterproductive anyway. They didn’t directly affect coverage, but you have to think that when you’re limited in space and you have two cool games to cover but only room to cover one of them, the game from the company whose enraged CEO tried to use monetary pressure to influence your editorial is going to take at least a subconscious -5 on the 1d20 roll to see which one gets coverage…

The ad threats go to the ad people; the polite requests for another reviewer to voice their opinion go to the editor.

Both work. Not that I’ve done that sort of job for many years.

:twisted:

I find your last point about game preference close to a straw man argument; of course personal taste for a genre, or specific types within a genre, will override any score. I’ll never buy a sports game, regardless of ranking, because I don’t like sports. The remainder of my post assumes that gamers have interest in racing games in general.

Statistics can’t say whether PGR 3 is “clearly better”, but it can provide the probability that chance produced higher scores for PGR 3, relative to the other two games. If chance can’t account for the difference, something else, perhaps an actual difference in quality, may explain the scores. You need a standard deviation (or standard error, to be more precise) to perform this test; you can’t just provide three average scores.

Metacritic ratings for the 3 games are 88 (PGR 3), 84 (Most Wanted), and 78 (Ridge Racer). Assume the statistical tests find that chance might reasonably account for the 4-point difference between PGR 3 and Most Wanted, but that it’s highly improbable that chance accounts for the 10-point PGR 3/Ridge Racer difference. I would expect to find that, among real-world players of all 3 games, there’d be debate over whether PGR 3 or Most Wanted is superior, but most gamers would find Ridge Racer the least compelling of the three.
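To make that concrete, here’s the test sketched in Python. The standard errors are invented for illustration (Metacritic doesn’t publish them), and a plain two-sample z-test is only an approximation:

```python
import math

def two_sided_p(mean_a, se_a, mean_b, se_b):
    """P-value for a two-sample z-test on mean review scores: how likely
    is a gap at least this large if the games are really rated equally?"""
    z = (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # two-sided p from the standard normal CDF, via the error function
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical standard errors of 2 points on each average score:
p_close = two_sided_p(88, 2.0, 84, 2.0)  # PGR 3 vs Most Wanted
p_far = two_sided_p(88, 2.0, 78, 2.0)    # PGR 3 vs Ridge Racer
```

With those assumed errors, the 4-point gap is well within chance (p ≈ 0.16) while the 10-point gap is not (p < 0.001), matching the expectation described above.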

That’s why I argue that ratings are useful tools; because I believe they coincide with mass opinion a significant majority of the time. I disagree with you that ratings will only be useful at the extreme upper and lower ends of a population. I think many people would see similar meaningful distinctions between the four segments of games ranked 90-95%, 75-80%, 60-65%, and 45-50%, not just between the highest and lowest. At the same time, I think we are in agreement that there would be more debate between people as to the relative quality of the 92% versus the 93% games, or the relative suckitude of the 46% versus the 47%.

So you work for the gaming press? And does your print mag or web site (or whatever) capitulate to these demands?

It’s stuff like this that makes the whole game review system worthless. I wonder why on GameSpot the review might rail about how crappy a game is, yet the score is still like a 6.5.

What the gaming press needs is some outside media watchdog group to keep them on the straight and narrow, or the gaming press needs to form something like a corporate union where, if any member is threatened like that, all members will tell said publisher to take a hike.

It’s not really necessary, though, because those sorts of threats are usually empty ones. Publishers advertise purely out of self-interest. They don’t buy ads as a reward for good reviews; they buy ads because they want to reach a certain demographic. They may hurt the publication by pulling those ads, but they hurt their own marketing efforts, too. So they’ll bark a lot about negative reviews, but it’s pretty uncommon for them to actually bite.

He works for GameSpy, as do I, and NO, we do not capitulate to these demands. They can rant all they like and the sales people can moan all they like, but there HAS to be a separation of church and state for there to be any credibility. Luckily my bosses understand that. If they didn’t, I’d quit. If a publisher points out factual errors, we can fix it (and are happy to). But if they disagree with our opinion on their biggest title of the year, too bad. We have no agenda or vendetta out for any publisher, any more than we are in bed with any publisher because they use our technology in their games. If it sucks in implementation, we say so (see Mr. Chick’s review of BF2: Special Forces).

I don’t expect everyone to believe me, but that’s the way we run our operation.

What is the “MacReview” syndrome?

On a side note, one of my students stopped by to talk to me this afternoon before he ran off to one of his clubs. Good kid, likes to talk about games, really pushing Call of Duty 2 on me.

Conversation turned to magazines and he mentioned that he subscribed to CGW. And that he couldn’t understand why their reviewer “hated Age of Empires 3”. I explained that a 3/5 is hardly hate, and that the review explained in quite a bit of detail where it came up short. He’s still convinced it’s a five-star game.

So Tom, there’s a junior in Maryland who’s a little mad at you.

Troy

Here you go, sorted by average game score. IGN clocks in at 70.6%, which would, in theory, blow flyinj’s argument out of the water. Stranger things have happened :)