Games and Concurrency

Our very own Kyle Wilson has published an interesting essay on the effect that the move to multicore architectures has on games.

It’s commonly known by now that processors have bumped against a physical speed limit. Game consoles are adopting multicore architectures in order to achieve greater theoretical speed, but this makes them much harder to program. Kyle compares the Xbox 360 and PS3 architectures, specifically with regards to porting between them, and makes a few interesting predictions:

- No more 60 Hz games, because of the latency introduced by the extensive multithreading needed to use all those cores
- No more PC games, because the increasing number of cores in consoles won’t be mirrored in consumer PCs, thus preventing ports
- Middleware will become mandatory, since writing good multithreaded code is extremely difficult

Please go read the article; I’ve just summarized it here…

(By the way, Kyle, please add the RSS link to the left sidebar on your home page so that people have a chance to find it!)

It’s good to see more articles about this topic, but his conclusions are wack.

Easiest to point out: the “no more 60 Hz games” claim shows a general lack of understanding of parallelism. Latency vs. throughput is one of the first things anyone learns about pipelined or parallel computing architectures. High latency does not imply low throughput; you can have a deeper pipeline instead. In other words, you can have a game that runs at 60 Hz even if any individual frame has a latency higher than 1/60 sec. (Not that high latency is good, but there’s nothing stopping you.)
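A minimal sketch of what I mean, with every name invented (this is not from Kyle’s article): an update thread simulates frame N+1 while a render thread draws frame N. Each frame spends roughly two frame periods in flight, but a finished frame still comes out every 1/60 sec.

    // Two-stage frame pipeline (illustrative only): throughput stays at
    // 60 Hz even though each frame's input-to-display latency is roughly
    // two frame periods.
    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <utility>

    struct FrameState { int id = 0; /* snapshot of game state for one frame */ };

    class FrameQueue {                   // single-producer, single-consumer handoff
        std::queue<FrameState> q_;
        std::mutex m_;
        std::condition_variable cv_;
    public:
        void push(FrameState f) {
            { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(f)); }
            cv_.notify_one();
        }
        FrameState pop() {               // blocks until the next frame is ready
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [&] { return !q_.empty(); });
            FrameState f = std::move(q_.front());
            q_.pop();
            return f;
        }
    };

    int main() {
        FrameQueue ready;
        std::thread update([&] {         // stage 1: read input, run physics/AI
            for (int frame = 0; frame < 600; ++frame) {
                FrameState s;
                s.id = frame;
                ready.push(s);           // hand the finished frame to the renderer
            }
        });
        std::thread render([&] {         // stage 2: draw the previous frame
            for (int frame = 0; frame < 600; ++frame) {
                FrameState s = ready.pop();
                (void)s;                 // submit draw calls for s here
            }
        });
        update.join();
        render.join();
    }

(In a real engine each stage would also be paced to the display’s refresh; the point is only that the two stages overlap.)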

I’m not going to go into the rest of it, except to also say that middleware isn’t a magic bullet, because you still have to, you know, write the code that actually makes your game different from any other game. Even highly parallelized middleware probably won’t help you with that. I’d say the jury is out on whether middleware becomes more or less important because of this.

Okay, I’ve never written a real-time game, but from my understanding as a developer I’d say that the pipelining you suggest is not likely to be an effective solution to the 60 Hz issue. (And I would assume that Kyle is aware of this technique and has considered it, although he doesn’t mention it… like you say, it’s not exactly revolutionary!)

If you’re pipelining frame calculations, you’re introducing an extra delay between any change in the game state due to current user input and the next frame whose calculation can reflect that change. That means you’ll get smooth 60 Hz graphics, but you’ll also get noticeably laggy controls. I don’t think that’s a solution anyone would be happy with. Any attempt to forward-communicate input to the frames already in the pipeline would again run afoul of the latency issue, so apparently the only choice is to lower the overall frame rate.

That doesn’t make sense; it’s not as if the code they’re writing has any knowledge of what kind of processor it runs on. I don’t think they’re writing all Xbox 360 games in assembly language. The number of cores should have nothing whatsoever to do with portability. I know games aren’t developed at the same level of abstraction as most business software is today, and maybe they’re still writing things in straight C, but even at that level you’re abstracted from the hardware by libraries, the compiler, etc. Is there something special about console development I’m missing here?

Also, PCs have multi-core processors. There’s no reason to think that console hardware will somehow be way ahead of PC hardware. So even if the conclusion weren’t utterly wacky, the premise is wrong.

First, I have to disagree that writing good multithreaded code is extremely difficult. There are just a few basic concepts to keep in mind; it’s far from extremely difficult.

Anyway, middleware doesn’t take care of things like creating thread-safe code. No API can do that for you (an API can provide thread-safe mechanisms, but you can still use thread-safe libraries to write non-thread-safe code). Middleware (really just a library in this case, since most games aren’t exactly giant enterprise systems with distributed transaction managers, ORM solutions, etc.) exists mainly to wrap low-level functions in a high-level programming interface. It also exists to limit redundancy, increase maintainability, create cohesion between projects by using the same toolset, and so on. Middleware is necessary, but not because writing thread-safe code is hard; it’s necessary to the efficiency of the organization, with or without dual-core processors.
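To make the “thread-safe libraries, non-thread-safe code” point concrete, here’s a toy sketch (the LockedQueue type and everything else in it is invented for illustration): every method on the container takes a lock, yet the naive check-then-pop usage is still a race, because the check and the removal aren’t atomic together.

    // Each method of LockedQueue is individually thread safe, but composing
    // a separate empty() check with a separate removal would still race.
    // try_pop() does the check and the removal under one lock.
    #include <deque>
    #include <functional>
    #include <mutex>
    #include <thread>
    #include <utility>

    template <typename T>
    class LockedQueue {
        std::deque<T> q_;
        mutable std::mutex m_;
    public:
        void push(T v)     { std::lock_guard<std::mutex> lk(m_); q_.push_back(std::move(v)); }
        bool empty() const { std::lock_guard<std::mutex> lk(m_); return q_.empty(); }
        bool try_pop(T& out) {             // check and remove atomically
            std::lock_guard<std::mutex> lk(m_);
            if (q_.empty()) return false;
            out = std::move(q_.front());
            q_.pop_front();
            return true;
        }
    };

    void worker(LockedQueue<int>& jobs) {
        // Racy usage: calling empty() and then a separate removal lets another
        // thread drain the queue in between the two "thread safe" calls.
        int job;
        while (jobs.try_pop(job)) {
            // process job...
        }
    }

    int main() {
        LockedQueue<int> jobs;
        for (int i = 0; i < 100; ++i) jobs.push(i);
        std::thread a(worker, std::ref(jobs));
        std::thread b(worker, std::ref(jobs));
        a.join();
        b.join();
    }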

I don’t buy the death of 60 Hz gaming, either. You can localize processes to cores (or subsets of cores) and avoid much of the difficulty of large-scale synchronization. It just requires going from a model where threads are logically localized to one where they’re spatially localized. If you have a 3-core, 6-thread architecture with a different subsystem in each thread, the most efficient way to scale it to a 12-core, 24-thread architecture is to spatially subdivide your process into 4 regions. Most processes in a game should be spatially local (with a few exceptions, primarily AI concerns). Do your processing on a local basis, with information flowing across your spatial boundaries.

This may mean doing more processing near any given boundary. Consider a visibility test that crosses a spatial bound from subdivision 1 to subdivision 2: rather than trying to pull all the visibility information from 2 into 1, you project the information to the boundary of 1 by default, and then read that projection from the first subdivision. That’s wasted ops if subdivision 1 has occlusion right next to the boundary, but the whole point of concurrent processors is that rather than optimizing a single thread to run fast in serial, you’re using parallel methods to offload the calculation in the first place.
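For what it’s worth, here is a very rough sketch of that kind of spatial decomposition (every name and number is made up): bucket entities by world region, update each region on its own thread, and leave the boundary exchange, which is where the real difficulty and the “projection” work live, as a separate pass.

    // Spatial work decomposition sketch: one thread per world region.
    #include <functional>
    #include <thread>
    #include <vector>

    struct Entity { float x = 0, y = 0; };

    constexpr int   kRegionsPerAxis = 2;        // 2x2 grid -> 4 regions, one per core
    constexpr float kWorldSize      = 1000.0f;

    int regionOf(const Entity& e) {
        int rx = static_cast<int>(e.x / (kWorldSize / kRegionsPerAxis));
        int ry = static_cast<int>(e.y / (kWorldSize / kRegionsPerAxis));
        return ry * kRegionsPerAxis + rx;
    }

    void updateRegion(std::vector<Entity>& local, float dt) {
        for (Entity& e : local) e.x += dt;      // physics/AI touching only this region
    }

    int main() {
        std::vector<Entity> world(10000);

        // Phase 1: bucket entities by region (serial here for brevity).
        std::vector<std::vector<Entity>> regions(kRegionsPerAxis * kRegionsPerAxis);
        for (const Entity& e : world) regions[regionOf(e)].push_back(e);

        // Phase 2: update every region concurrently.
        std::vector<std::thread> workers;
        for (auto& r : regions)
            workers.emplace_back(updateRegion, std::ref(r), 1.0f / 60.0f);
        for (auto& t : workers) t.join();

        // Phase 3 (not shown): exchange or "project" information across
        // region boundaries before the next frame.
    }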

These problems have been encountered repeatedly in scientific computing, where we long ago lost the ability to do many worthwhile problems in anything but a massively parallel way. There are solutions for decomposing the calculations to achieve good scalability, but they are hard to implement. Game makers will most likely have to progress (as a whole) from hackers and programmers to computer scientists. Minimizing data transmission across processor boundaries, and coding to minimize communication overhead, potential race conditions, and blocking locks, isn’t really taught as a core part of most CS programs that I know of. If computation really is moving to a multi-processor world, it will have to start being done.

On the user input vs. frame rate thing, I really doubt that adding, say, a 1/10th-of-a-second delay between user input and the change appearing on screen will be that bad.

Not so much computer scientists as professional software engineers. Some will have to be computer scientists, but those will be people in very specific roles working out essential bits of the architecture. The bulk of the developers on a game won’t have to worry about the hard parts of optimizing for a multi-processor architecture. If the game developer addresses that at all, it will be in some library where the problem is hidden from them.

Actually, it’s probably less that the individual developer will have to transform and more that game development organizations will need to. Even in the software application space, most development organizations are not really professional engineering shops. Half the software shops out there probably need a good smack upside the head with a copy of “Professional Software Engineering” (or at least a few chapters of “The Mythical Man-Month”), and I’m sure it’s even worse in games.

IMHO (and I reserve the right to be wrong):

I think the problem really is that programmers are not used to writing in a multi-core (multi-threaded, if you will) manner. All the algorithms and tricks we’re used to take advantage of a single CPU and a single thread.

For example:
Code a binary search tree. Now code it again for N threads to work on it simultaneously.
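Even the “easy” answer changes the shape of the code: wrap the whole tree in a reader-writer lock so N threads can search at once while inserts serialize. That gets you safety, not scalability; fine-grained or lock-free nodes are where it gets genuinely hard. A sketch (C++17, names invented):

    // Simplest concurrent BST: searches share the lock, inserts take it exclusively.
    #include <memory>
    #include <shared_mutex>
    #include <thread>

    class ConcurrentBST {
        struct Node {
            int key;
            std::unique_ptr<Node> left, right;
            explicit Node(int k) : key(k) {}
        };
        std::unique_ptr<Node> root_;
        mutable std::shared_mutex lock_;
    public:
        void insert(int key) {
            std::unique_lock<std::shared_mutex> wl(lock_);   // exclusive for writers
            std::unique_ptr<Node>* cur = &root_;
            while (*cur) cur = (key < (*cur)->key) ? &(*cur)->left : &(*cur)->right;
            *cur = std::make_unique<Node>(key);
        }
        bool contains(int key) const {
            std::shared_lock<std::shared_mutex> rl(lock_);   // shared for readers
            const Node* cur = root_.get();
            while (cur && cur->key != key)
                cur = (key < cur->key) ? cur->left.get() : cur->right.get();
            return cur != nullptr;
        }
    };

    int main() {
        ConcurrentBST tree;
        int keys[] = {5, 3, 8};
        for (int k : keys) tree.insert(k);
        std::thread r1([&] { (void)tree.contains(3); });     // readers run concurrently
        std::thread r2([&] { (void)tree.contains(7); });
        r1.join();
        r2.join();
    }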

Take that to a much higher level: write a skinned, skeletal animation system, then write it so that one animation is handled by N CPUs.

The classic mistake is to say, “I need to play 20 animations, so I’ll divide the tasks up among N CPUs”: CPU #1 will play animations 1 and 8, CPU #2 does 2 and 9, and so on. That isn’t a parallel algorithm. Ideally, all N CPUs participate in all animations.
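Roughly, the difference looks like this (made-up types, not real engine code): instead of handing each thread whole animations, every thread takes a slice of the bones of the same skeleton.

    // Data parallelism within one animation: N threads each take a slice of
    // the same skeleton's bones. (Local pose evaluation per bone is
    // independent; composing with the parent hierarchy would be a later pass.)
    #include <algorithm>
    #include <cstddef>
    #include <functional>
    #include <thread>
    #include <vector>

    struct Bone { float rotation = 0.0f; };

    void animateSlice(std::vector<Bone>& bones, std::size_t begin, std::size_t end, float t) {
        for (std::size_t i = begin; i < end; ++i)
            bones[i].rotation = t * 0.1f;          // evaluate this bone's keyframes at time t
    }

    void animateSkeleton(std::vector<Bone>& bones, float t, unsigned numThreads) {
        std::vector<std::thread> pool;
        std::size_t chunk = (bones.size() + numThreads - 1) / numThreads;
        for (unsigned i = 0; i < numThreads; ++i) {
            std::size_t begin = i * chunk;
            std::size_t end   = std::min(bones.size(), begin + chunk);
            if (begin >= end) break;
            pool.emplace_back(animateSlice, std::ref(bones), begin, end, t);
        }
        for (auto& th : pool) th.join();
    }

    int main() {
        std::vector<Bone> skeleton(256);
        unsigned n = std::thread::hardware_concurrency();
        animateSkeleton(skeleton, 0.5f, n ? n : 4);
    }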

Nothing we’ve been taught in our CS classes addresses this line of thought. However, once we get to the point where we write any algorithm to work on N CPUs, we’ll have the problem licked and applications will run beautifully on multi-CPU systems.

Ah, I did misunderstand the point a little up there. Creating a good multi-threaded architecture that uses threading well to solve a problem can indeed be very difficult, depending on the nature of the problem. Phrased as it was (“multithreaded code”), I was thinking primarily about the issues of writing thread-safe code. It certainly can be a challenge to architect things properly to take advantage of parallelism when the problem is not inherently parallel. We already have examples of parallel processing in games; just look at SLI.

Either way, PC gaming is not doomed because consoles now have multi-core processors. Actually, if this causes middleware to become “mandatory” (which it should be anyway), that middleware will abstract the game developer from the hardware, making cross-platform development more, not less, possible and likely.

That’s a human-perceptible level of delay. It will be a nuisance, especially to people used to snappier response. I would hate to shell out big bucks for some super-duper next+3-gen console only to have it feel slow and clunky.

Okay, that one’s easy; a thread-safe binary tree is probably a bad example. I think I know what you mean, though: some problems just don’t parallelize easily.

Of course, you don’t have to break every algorithm down to run in parallel. You can do something else while the search runs. The AI processes in one thread while the physics calculations run in another. The screen is rendered for everything but the character models while another thread works out the skeletal animations and yet another does the water-surface reflections. It’s more an architecture problem than a straight-up CS or math problem.
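Something like this, in spirit (a toy sketch with invented function names, nothing real behind them): each subsystem updates on its own thread, and the frame ends when they’ve all finished.

    // One asynchronous task per subsystem, joined at a per-frame barrier.
    #include <future>

    void updateAI()        { /* pathfinding, decision making */ }
    void updatePhysics()   { /* integrate rigid bodies, resolve collisions */ }
    void updateAnimation() { /* skeletal animation, blending */ }
    void renderFrame()     { /* submit draw calls using the updated state */ }

    void runFrame() {
        auto ai   = std::async(std::launch::async, updateAI);
        auto phys = std::async(std::launch::async, updatePhysics);
        auto anim = std::async(std::launch::async, updateAnimation);
        ai.get();
        phys.get();
        anim.get();          // frame barrier: wait for every subsystem
        renderFrame();       // then draw with the results
    }

    int main() {
        for (int frame = 0; frame < 60; ++frame)
            runFrame();
    }

The catch, of course, is keeping the subsystems from touching each other’s data mid-frame, which is exactly the thread-safety issue discussed above.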

I always figured that games have been taking advantage of multi-threading already. It’s not as if it is something new.

Not for some games, but it will be noticeable, and for twitch games (especially fighters), quite bad.

Wow. I’m embarrassed by the attention. To deal with some of the points raised:

The problem with a deeper pipeline is the greater latency between input and output. The primary benefit of a 60 Hz game is smoother response, not smoother animation and physics. That benefit is lost with a deep pipe. I think the cost/benefit of 60 Hz is going to tilt further toward cost with the next couple of console iterations.

Very true. But the parts of a game that take lots of time (physical simulation, rendering, scene hierarchy update, animation) tend to be the parts that aren’t very different from game to game. Optimized middleware for those systems allows gameplay programmers to get on with writing their game without having to know about job queues, futures, active objects, etc.

This prediction was for five or ten years from now. Yes, PCs will have multi-core processors. As I said in the article, I just bought a multi-core PC and I expect the number of cores in PCs to increase. But I think PCs are going to top out at sixteen or thirty-two cores while game consoles continue to add specialized ancillary processors like the PS3’s SPEs. At that point, consoles will be able to run more complex simulations than PCs, and I think that hardcore gaming will gradually shift to the more powerful gaming platform.

MMORPG developers use spatial locality in load-balancing, but I think it’s less of an option in single-player games. In a single-player game, you’re usually only interested in simulating a small area around the player and the rest of the world is static, inhabited only by inert physical objects and disabled AIs, if at all.

Scientific computation, unfortunately, tends to be more about maximizing throughput than minimizing latency. I think that game developers can learn a lot from existing parallel programming theory, but I also think that we’re going to have to invent some of our own design patterns and practices to solve the problems that are unique to our industry. The next few years are going to be an interesting time.

Thanks to all for the feedback!

Done, thanks.

I think the article is pretty accurate as far as identifying the general trends in multi-core development. CPUs have shown a trend toward trading higher latency for higher bandwidth for years now. Multi-core architectures are just more of the same. Game developers are going to have to deal with these latency issues and I think there will be a real temptation to drop back from a 60th of a second of latency to a 30th of a second of latency. And once that is done, the temptation to throttle back the throughput (frame rate) will be even greater. That isn’t to say that some systems won’t continue to demand more than 1 update per frame…

On the subject of middleware, again I think he’s correct as far as the general trend. The more discrete and complex a system is, the more open developers are to licensing it externally. One of the common (though not often well-justified) arguments against licensing external libraries is the cost you pay in translation and data marshalling between “your” optimized data structures and “their” optimized data structures. The move to multi-core development will force software developers to better define the interactions between all systems and will definitely raise the bar in terms of complexity. As a result, data marshalling costs are going to be everywhere, between internal systems as well as external ones. The fact that the systems you write yourself can no longer be tightly coupled will help drive more people to use middleware.

But I think PCs are going to top out at sixteen or thirty-two cores while game consoles continue to add specialized ancillary processors like the PS3’s SPEs. At that point, consoles will be able to run more complex simulations than PCs, and I think that hardcore gaming will gradually shift to the more powerful gaming platform.

I don’t agree; there will always be a market for high-end PCs. This sounds a little like a “why would a home PC user ever need more than 640K” type of statement. Specialized processors for specific tasks will no doubt exist on PCs and consoles; they already do (a graphics card, a DSP on a sound card, etc.). I even recall an article a while back about someone developing a physics card. Continued specialization of consoles is not a reason for PCs to die, especially since consoles and PCs share similar technology. What is available for a console will also be available for a PC. If someone comes out with a specialized ancillary processor that revolutionizes gaming, you can bet they’ll slap it on a card you can stick in your PC at the same time the biz-dev guys are trying to ink deals with Sony, MS, and Nintendo (and if a console maker develops it in-house, someone will copy the technology within a year and slap it on a PC card).

Perhaps there’s an argument for the nature of the PC evolving from what’s in the average box now into something different, but not for the death of PC gaming. It wasn’t so long ago that no PC had a sound card; now sound is often an integrated chipset. Perhaps ten years from now you’ll choose your CPU, GPU, and sound card, plus an optional physics processor, neural-network chip, etc.

That’s true, and that’s why, of my three crazy future predictions, that’s the one that I have the least confidence in. I could also see the PC market fracturing, with commodity consumer PCs getting cheaper and cheaper while high-end gaming PCs become more like specialized luxury consoles built on an open platform.

It’s really very hard to make predictions at this point. CPUs are suddenly back where GPUs were ten years ago. The next decade is going to be a period of radical growth and radical experimentation. A lot of people are going to blunder into dead ends in the process of finding the right solutions for the next couple of hardware generations. Will we even be using C++ to write games ten years from now? I could see it going either way.

I gather there’s another vector for latency creeping into the X360/PS3-generation games, coming from your television. Word is that hi-def sets with fixed resolutions, like LCD and plasma, run the signal through a video scaler during conversion that frequently lags a frame or two behind. For passive television watching that’s no problem: just delay the audio by the same amount and you’ll never notice the difference. In games, though, the simulation is continually running ahead, irrespective of when you happen to see it. In an interactive system, those delayed frames add directly to your reactive input latency.

In the next gen, we’ll all be a little worse at gaming. (It’s not just my age, I swear!)

I don’t have time to get too deep into this discussion but I want to respond to this point because several people said it.

Look, a 30 Hz frame rate already doubles your latency over a 60 Hz frame rate. A well-designed pipeline will give you less than double the latency, and certainly no more than double it. So which would you rather have: 30 Hz with latency X, or 60 Hz with latency X?
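To put rough numbers on that (assuming a simple two-stage pipeline and ignoring display latency):

    Serial loop at 60 Hz:         new frame every ~16.7 ms, input-to-display latency ~16.7 ms
    Serial loop at 30 Hz:         new frame every ~33.3 ms, input-to-display latency ~33.3 ms
    Two-stage pipeline at 60 Hz:  new frame every ~16.7 ms, input-to-display latency at most ~33.3 ms

So the pipelined 60 Hz case is, at worst, the same latency as dropping to 30 Hz, while still delivering twice as many frames.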

I don’t like pipelined architectures like this. I would rather just do it all straight up and serial. But if we’re being forced to do it, that’s just how it is.

1/10 sec, as Jason suggested, is a ridiculously high amount of latency. Any amount of added latency is too much, but you kind of have to just do the best you can given all your constraints.