Games are slow because developers

…use slow compiler settings. So says a nameless author on Extreme Tech.

One guy looked at the Battlefield 2 executables and discovered that they were compiled using Microsoft Visual C++ 7.1 (= Visual Studio 2003) using the default /GB switch. This switch enables the ancient “blended” code generation which optimizes for the Pentium Pro, P2, and P3 CPUs.

But VC++ 7.1 provides optimizations for current P4 and Athlon systems that should give ~10% better performance, which is nothing to sneeze at for framerate-hungry shooter fans. There’s even a new switch to automatically use the SSE/SSE2 units for floating-point operations. Yet the developers enabled none of these features. Why?

In discussions with game developers over the past few years, I’ve learned that they tend to be pretty wary of automatic optimizations generated by simple use of compiler switches. Sometimes a large software build will break when certain automatic optimizations are turned on. Some of this is likely institutional memory, as compilers have improved over the years. Some of it is likely laziness coupled with tight schedules, as alluded to above. If you’re a game developer on constant 80-hour a week crunch mode, experimenting with compiler switches is probably the last thing on your mind.

Still, it’s an interesting thought. And the issue may not simply reside in the game code itself. Many game developers use third-party libraries and game engines, including physics engines, AI engines, audio processing libraries and more. If that’s the case, then optimizing the core game code alone may not have as large an impact as it might seem.

Personally, I could see not enabling SSE for general FP arithmetic because that unit is likely kept busy by some graphics library. I’m amazed that nobody thought to enable P4 optimizations, though… or did they?

Imagine that, a bitwise protected binary being compiled for the most generic of platforms? Who would have thought?

Of course this is one reason why JITing is kind of cool. Proc specific instructions at an asm layer.

Enabling optimizations tailored to a particular type of CPU (i.e. /G7) is generally fine and shouldn’t break code. But OTOH the kinds of optimization you get going from blended to /G7 don’t really do all that much for general-purpose code, especially when you consider the dissimilarity between the two architectures.
OTOH enabling generation of SSE/SSE2 code could improve things quite a bit, but it would not work if those instructions were not present, which basically makes it a commercial non-event to compile for SSE2, and probably a bit iffy to compile for SSE, considering that SSE was only added very late in the Athlon range.
But this still does not mean that we have under optimized code. The most effective optimizations come from algorithmic changes, which a compiler cannot do. (erm, in a general manner)

I heard games have a thing known as “minimum system requirements”, and I’m pretty sure Battlefield 2 runs only on P4 and Athlon processors… so they’re optimizing for a platform on which the binary is not intended to run.

Of course this is one reason why JITing is kind of cool. Proc specific instructions at an asm layer.

Sure, once the .NET JIT actually does some optimizations to speak of.

The article quoted MSDN to the effect that a 10-15% improvement for floating-point heavy code could be expected. That’s not huge but it’s not nonexistent either.

It could be that /G7 might have failed to produce a speedup. Or it might have penalized P-M/AMD-whatever too much.

Besides the obvious processor requirements, /arch:SSE can be slower as well. It can also have a bad effect on numerical code.

That’s not the way SSE works.

That would be me. I’m not sure why my producer didn’t add my byline.

At any rate, this thread is all good stuff, and what I was looking for. I tried to be somewhat broadminded, because I suspected it wasn’t as simple as throwing a compiler switch, or Intel’s evangelists would have been all over this.

That’s certainly possible. It would be interesting to have the involved programmer’s input on this issue…

That’s not the way SSE works.

I was thinking of library functions that might need to preserve SSE state between calls since loading/unloading all SSE registers strikes me as an expensive operation. Granted, I have no idea if there actually are such library functions. :?:

A library definitely can’t rely on the state of the SSE registers being preserved between calls.
This sort of thing is defined by the calling conventions. IIRC, all x86 calling conventions for SSE are caller saves.

And it’s not hugely expensive - it’s just another bit added to function call overhead.

Well, Loyd Case has just posted some of the feedback he got on his article.

One developer said his team was only using AMDs and they flat-out didn’t care about optimal P4 performance! The other replies were more reasonable: danger of breaking builds by changing switches, no time or inclination to test various switch settings for just one platform (PC) of diminishing importance, or even no opportunity if the final build is not in the hands of the developers (I was pretty surprised by that one).

Interestingly, Microsoft is aware of this situation and has reacted by completely removing the /G optimization switches from the current VC++ version. The compiler now always produces “blended” code but favors the current generation of CPUs rather than ancient ones, as in the previous versions.

I’m curious to see what experimenting with compiler switches might do in a timedemo on various platforms. I have to admit that I’m one of the guilty parties who tends to ignore compiler switches. Hell, I still use the default compiler settings that came with our engine license, although that’s more an issue of not caring at this point because we’re far from completion. I’ve got a couple timedemos handy at home, I’ll see what I come up with – I’m an optimization freak, and I love playing with stuff like this.

None of it helps if you’re spending double digits in stricmp.

Any developer who uses strings internally for anything other than actual outputted text is both insane and stupid.

I remember at one point in MDK2 we were using strings to look up animations in code. One day a programmer took a few hours to convert animation names to animation IDs, and we got a 15% performance boost across the board.

Hooray competence!

It’s hard to imagine any professional dev would do this. The closest I’ve ever come to using strings for non-output was a color-coding system I made for embedding color tags in strings for multi-colored output text. And I’m not even sure that counts.

Or using .NET where comparisons between interned identifiers are simple 32-bit pointer comparisons!

Yeah, nobody’s using .NET for commercial games yet but I had to plug the platform…

Well, we are. :roll:

Seriously? Are you the guys who made that .NET-based flight sim add-on or whatever, much to the disgust of scharmers?

Right, but that’s not what I’m talking about. I mean actual string compares. Internal identifiers are a separate issue.

code like

if (!stricmp(anim, "MYANIMATION_SKULLFUCK")) // or any self-made string class that ends up falling back on str*cmp

instead of

if (animid == ANIMID_SKULLFUCK)


No, actually we’re using Reality Engine as our middleware and it relies heavily on .NET. The scripting language used for writing most of the gameplay code is C#, while the core engine processes and tick-intensive operations are done in C++. It’s very similar to how Unreal Engine uses UnrealScript, and it’s actually a very nice setup.

It’s hard to imagine any professional dev would do this. The closest I’ve ever come to using strings for non-output was a color-coding system I made for embedding color tags in strings for multi-colored output text. And I’m not even sure that counts.

You would not believe how many PC codebases I’ve seen with this problem. And we won’t even discuss a certain famous first-person shooter that derived its network game class from a CDocument…