Steel cage deathmatch fight: Tom Chick vs. Bruce Shelley

People don’t read or care about those little blurbs that explain what a score means. As Ben says, people are accustomed to seeing both 5-star and 0-100 scores in many other contexts, and they evaluate scores accordingly.

To me, 0-100 (and 0.0 to 10.0) map directly onto grade-school scoring. Anything below 60 is failing (quite bad). Anything below 80 is essentially poor. And unless you’re hard up for a given genre, you really shouldn’t much consider anything below an 85.

Most magazines/web sites reviewing on this scale have 80%+ of their games in the range of 60-93.

On a 5-star scale, anything below a 4 is considered something to avoid. The only ‘check this out’ scores are 4, 5, and 4.5 (if they have it).

It would be nice if Gamerankings tweaked their translations of 5-star scores into 100-point scores. It would be nice if they had a sliding scale per site that adjusted for each site’s tendency to score games a bit higher or lower than their peers. But this adds complexity, and it really doesn’t add much value to their final composite scores. For any game with 12+ reviews (i.e., all major games), the scoring subtleties of individual sites and systems get averaged out.
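That averaging-out claim is easy to sketch. Here’s a toy simulation (the bias numbers are made up for illustration, and this is not how Gamerankings actually computes anything): each site scores a game within a few points of its “true” quality, and with a dozen reviews the composite lands close to that true score.

```python
import random

random.seed(0)

def composite(true_score, n_reviews, max_bias=5):
    """Average n_reviews scores, each skewed by a random per-site bias."""
    scores = [true_score + random.uniform(-max_bias, max_bias)
              for _ in range(n_reviews)]
    return sum(scores) / n_reviews

# A single review can be off by as much as 5 points, but the
# 12-review composite is typically within a point or two of the
# true score (and can never be off by more than the max bias).
print(round(composite(80, 1), 1))
print(round(composite(80, 12), 1))
```

The point is just that per-site quirks mostly cancel once you average enough of them, which is why the translation subtleties matter less for big releases than for niche games with only a couple of reviews.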

That said, as a developer, I was certainly frustrated by the 5-star systems: you are much more likely to get a 3- or 4-star rating, which translates to about 10 points lower on the 0-100 scale than if the game were scored directly on the latter system (at least the way most sites seem to distribute their scores).

Finally, on a mostly unrelated note, I was watching that ‘Skating with Celebrities’ show (yeah, go ahead and mock me). They generated 6 scores per skating pair - 3 judges each giving an ‘artistic’ and ‘technical’ score. Despite huge variations in the performances (Todd Bridges could barely skate. Bruce Jenner was about as artistic as, well, I would be on skates), every pair got a score between 7.8 and 8.3 from every judge, down to the last pair, who was actually quite good technically and artistically, and really broke through to something like an 8.5 average. Basically, they were using 5-7 points of a 100-point range.

Frankly, I think the best solution would be to use letter grades - there’s little ambiguity there (for American readers).

From F to A+. A is great, B is good, C is mediocre, etc…

I liked GameDaily’s system, as explained in the OMM archives, even better.

I like the “thumbs up, thumbs down” system, if you absolutely have to have ratings. Mostly because it’s impossible to read anything more into a binary score than what is explicitly intended (“Does the reviewer recommend the game, or not?”), but also because it prevents the reader (and the writer, for that matter) from using the rating as a crutch.

How about this: Gun Tests, the only unbiased firearms magazine (they don’t carry advertisements), uses a five-star system, but they often don’t even refer to the star number. Instead, the stars really stand for:

1 - Don’t buy (piece of crap)
2 - Conditional Buy (marginally functional, but has at least one significant flaw)
3 - Buy it (you won’t be screwed if you buy it, but it ain’t anything to get excited over)
4 - Our pick (since they usually are reviewing three items from the same genre, this denotes the one that they want)
5 - Best buy (not only a great example of the type, but also a hell of a deal, money-wise)

The problem of course is that if you give EA’s new game a Don’t Buy, they’ll come back at you with a Don’t Advertise. Is there a non-advertisement game review magazine?

H.

But then the score gets averaged into Gamerankings.com as 100% or 0% :)

I agree with the “3 stars is not the same as 60%” crowd. The ranges of values in actual use on each scale don’t correspond in a straight linear fashion. In any case, boiling a review down to a single numerical score only gives you a vague sense of whether you in particular may like the game, given varying tastes.

The fact that the system throws a monkey wrench into the Gamerankings stats is like the shiny red cherry on top of a tasty hot fudge sundae.

:)

I want a three-grade system, like Daily Radar had: bad, average, good. You don’t want to play bad games, you can play average games if you don’t have anything better to do, and you really want to play good games. What’s not to like?

In my opinion, overall game quality can be usefully assessed on a curve relative to other games. Most people – jaded game reviewers with postmodern English degrees notwithstanding – have internal “Top 10” lists for games. The trick is to aggregate meaningfully, rather than the crude approach used by GameRankings.

My preference for converting ratings to population percentages is because it is useful to me as a reader of reviews; the statistical benefits are secondary. If I only had access to two reviewers, and they both gave a binary “buy/don’t buy” or “thumbs up/down” ranking, it would be helpful to know what percentage of games each reviewer has rated favorably. I only buy 10-15 games a year, all PC. Somebody who gives a “buy” recommendation to 190 of the 250 yearly titles isn’t going to be as helpful to me as someone who only gives that designation to 65 titles. At least your example had three rankings (excellent, decent, and bad). Tom made reference to merely a “thumbs up” in his piece, which doesn’t give me enough information as a consumer. For similar reasons, I don’t watch Ebert & Roeper’s TV show (two rankings), but do read Ebert’s print reviews (9 rankings). I’m only going to pay money for 10-20 movies in a year; hearing that 75% of releases are “recommended” isn’t a strict enough criterion.

I do care about game reviews, because both the numbers and the text help me make good financial decisions. For the last 13 years, numerical rating systems have given me consistently useful information, although there have been a few outliers (cough…Black & White…cough). It’s disconcerting when reviewers insult the people who use their services, or when they believe that the average person has the time and financial resources to play half the titles released in a given year because they are “decent.”

Rather than trying to decide on the most effective rating system, how can we get readers/devs/marketdroids/etc. to pay more attention to the text of the review? How can we drive home the point that the score is (or should be) ancillary to what the reviewer has to say? I’d love to see one of the bolder magazines or websites step up and remove scores from their reviews, but that’s asking too much at this point. Committing Gamerankings suicide is contrary to their bottom line, and that’s understandable. Still, it’s not going to do us any good to debate rating systems all day and array game scores into statistical models as we work to more accurately quantify a medium that, IMO, shouldn’t be quantified.

Gamerankings is turning into the gaming version of that database that allows musicians to have their songs scanned and compared to other popular songs throughout history so they can know if they’ve got a hit. I don’t think this is where we want to go.

It’s really not that hard to convert between the 5-star system and the 10-point system: just add five. *** == 8/10, which is about what Gamerankings has for the game.
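For what it’s worth, the gap between the two conversions is easy to see in code. These are my own toy functions, not anything Gamerankings publishes: a straight proportional conversion turns 3/5 stars into 60%, while the “just add five” rule of thumb puts the same score at 80%.

```python
def stars_to_percent_linear(stars, max_stars=5):
    # Straight proportional mapping: 3 of 5 stars -> 60%
    return 100.0 * stars / max_stars

def stars_to_percent_add_five(stars):
    # The "just add five" rule of thumb: 3 stars -> 8/10 -> 80%
    return (stars + 5) * 10

for s in (2, 3, 4, 5):
    print(s, stars_to_percent_linear(s), stars_to_percent_add_five(s))
```

Note that the two mappings only agree at 5 stars, which is exactly why a 3-star game looks 20 points worse once it’s fed through the linear conversion.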

Over at our site, we recently rejiggered our rating system because Gamerankings was giving us stupid percentages. The bottom line is that some people like to read the whole text, some only read the first paragraph, and some just go for the numbers. I can’t really think of a good way to “force” someone to read something they ultimately don’t want to read.

And as regards AoEIII, that game drives me nuts. Insert attack-move comment here. It’s unfortunately uninstalled until I see what the upcoming patch does to it, but ultimately it’s a frustrating game. Not because of bugs or interface. But because whenever I play it I keep thinking, “this game should be so much better than it is”. I have all these nice neat rows of soldiers that march along only to dissolve into blobby blobs in combat. Micro-ing is just a real pain. At least resource gathering is much better. Man, I get depressed just thinking about it. I love the ESO interface though… has a few problems but to me it’s the current standard-by-which-all-subsequent-games-are-judged.

And the replay system sucks. sucks sucks sucks. It stinks. It is, without any doubt, the worst replay system I have ever come across. The manual hammers the importance of the Home City, but in replays you can’t see anyone’s deck. And god, even making the replay… ugh. You have to check some box before the game starts. As opposed to Dawn of War or Warcraft III’s replay system. Just click a button at the end of the game. Easy peasy. And anyone can do it, not just the guy hosting the game.

DeepT, this is actually unfair when directed at Ensemble/MGS as a whole. For the record, they’ve been in touch with me and have asked for more specifics about what I didn’t like. Microsoft’s PR folks are very receptive to listening to criticism and soliciting input, which I presume is fed back to the developer. And the fact that Ensemble’s Rob Fermier participates in a thread like this speaks volumes.

At any rate, you can’t be bothered to read text review or accept that a 3-star game doesn’t suck, so you’re probably not even reading this. So I’ll just give your posts in this thread 1 1/2 stars. Which, for the record, is a thumbs-down.

Phil, instead of being frustrated with the 5-star scale, why aren’t you frustrated with the 7-9 scale? That’s the problem, dammit. The review sites that shove their ratings at the top end create expectations for those that don’t.

Don’t give in, Phil! Don’t drink the Gamerankings.com Kool-Aid!

Too complicated by 1/3! It was hit/miss. With the occasional hit+ and miss-, which I think were two ratings too many. :) Actually, four ratings too many, since I agree with Kalle that we should just stop tacking numbers onto our opinions.

If game reviews want to be taken seriously, and if they want to encourage good writers and good readers, and if they want to avoid all those juvenile arguments about whether Game X is a 7 or an 8 and whether it should have more points for its Fun Factor and I can’t believe the graphics for Game Y got a 9 while the graphics for Game Z got a 6, then they should buck up and commit to the text. And just the text.

We’re not compiling a goddamn database, which is what Gamerankings would have us do. We’re offering – hopefully – commentary, opinions, analysis. Fuck the numbers.

-Tom

I guess we are, though, because apparently that’s the only way that Sidd_Budd will know which games to buy.

The whole concept of making game buying decisions based on whatever happens to land in the top X% of an aggregate rating scale seems totally crazy to me. I just don’t even know what to say to that.

No, we just credit our readers as being intelligent, thinking beings who are capable of using reviews to make informed choices about which games they want to play, even if they can’t afford to (or have time to) play every game that’s worth playing. What we don’t do is assume that our readers are mindless robots who need hard numerical data so that they can play the top X number of games (based on carefully compiled aggregate ratings) by rote, and who might flail about screaming “DOES NOT COMPUTE! DOES NOT COMPUTE!” if confronted by a situation where (number of worthwhile games available) > (number of games they have time and/or money to play), or if they ever discover that they may have, by accident or design, played a game that falls outside their targeted statistical range.

But maybe we give them (or at least some of them) too much credit.

I’m for the thumbs up/thumbs down with bullets on the good and bad.
Of course, I’d also be for no score; really, all you need is in the review.

I think this is the beginning of a thoroughly beautiful ranking system. You start just there and then begin to elaborate it step by step. Give a hit+B. Next time a miss-Y4. Gamerankings would crumble under its own weight, skidding into howling oblivion. Of death.

Everyone would get what they wanted. Readers would have their beloved overall score. Writers could be sure readers actually read their reviews, because they’d probably want to know what the fricking thing meant. And PR people would be far too confused to annoy you with whiny griping. (“Hey, the game was far better than a miss++Xg4!.. I guess…”)

Best thing last: you could even apply a pineapple to the score.

Less than 60% is a failing grade in the US? Huh. In Canada (or at least, Alberta where I grew up), it’s 50% on average, though when it comes to finals, a 48% will let you pass.

No wonder Albertans are a bunch of retards!

No offense intended to Albertans who aren’t retards.

Clearly, there is only one solution to this problem: combine the absurdity of modern game reviews’ scoring systems with the inanity of online “Who are you?” quizzes.

To wit: “If Age of Empires 3 was a Marvel superhero, which one would it be?”

For just a little bit, I’m gonna play devil’s advocate and take Ensemble’s side, again just a little bit. It’s human nature to want to clarify things as much as possible, and that’s easier to do with numbers than with stars. It’s a very logical step to convert 3 out of 5 stars into 60%; after all, you got 60% of the stars! Also, since grade school we have been programmed to know that 60% is pretty much a failing grade. Now we see why developers aren’t exactly dancing in the streets when they get a 3 out of 5 score. Sidd comments that 60% should mean a game scored higher than 60% of all the games that were scored. That would only be logical if the extreme lower numbers were ever used, and we know they very seldom are. Really terrible games are given 50s and such, which is pretty close to the 60% in question. Again, see why developers aren’t excited about the score?

I remember when I first went to work at Computer Games Magazine. I was in Scott Udell’s office (some of you know who that is) and we were talking about Tom. Tom did freelance work for us even back in those days. Scott made a statement that still resonates with me today: “Woe be to the game Tom Chick doesn’t like.” Tom has a rep for being hard on games. Great, I say! While I make a good living and my gaming budget is larger than it should be, a lot of folks can’t just plop down $50 on every game that comes to town. It’s a significant investment for them, and they need someone who will “tell it like it is.” Tom is just such a person, and I give him much props for that. I appreciate that he writes reviews for people who are actually considering buying said game.

BUT, again, I look at Ensemble’s position. I think it’s perfectly logical to look at 3 out of 5 stars as 60%; it simply is the logical reading. Tom said there’s no way he would give the game a 60%, but that is just what he did. So why not give it 3.5 stars or 4 stars?

To me, this whole argument reminds me of a story an elementary school teacher once told me (and MAN was she hot! But…that’s another story for another time…). She told me about cavemen (no, I’m not calling anyone a caveman, so don’t even go there) and how they viewed the sun. They didn’t understand it and were even afraid of it, but they knew they had to deal with it. So they offered sacrifices to it (we don’t offer sacrifices to the ratings system, but maybe we should! Couldn’t hurt…), as that was the only way they knew how. How can we “deal” with the system? I think the only way is to come to terms with human nature and realize that 3 out of 5 is not gonna get developers dancing around like Mick Jagger no matter how much we scream “But it’s a GOOD score!” 3 out of 5 is always going to read as a failing grade, period.

I’m just trying to give my point of view as a regular reader of magazine reviews; the numerical score adds value to the review over and above the text. I use both in my purchasing decisions, and have said so repeatedly in my posts in this thread and similar ones in the past. Does using both words and numbers make me a “mindless robot”? In my opinion, it is unrealistic and condescending to expect a publication to cater only to gamers who treat the hobby with such respect and gravitas that they are willing to sift through 75 or 80 text-only reviews to decide on a purchase.

I’d support a magazine or website that ran just text reviews with no rankings. People who want that structure could support it, and I could continue to subscribe to the publications that give both. For the record, I wouldn’t subscribe to anything that gave just a rating with no explanatory text, either. But I’d rather not support reviewers who assign a number and then treat their readers with smug disdain, derision, and character-based attacks for using it along with the text to make a purchase.