Computers can play Go now


Yesterday, Google's DeepMind announced that they reached the most significant milestone in pure AI since the solving of checkers in 2007: a Go AI called AlphaGo that's competitive with human professionals. (Strictly speaking, computers could play Go before yesterday, but they were ranked around midlevel amateurs at their best.) The whole paper is available here.

For those of you who aren't as nerdy about AI as I am, here's a quick primer on why this was thought to be a very hard problem (so hard that the people behind the prior state of the art thought it was at least a decade away):

In the most theoretical conception, game-playing computers for perfect-information zero-sum games (most abstract board games: those with no hidden state, where one player's gain is the other's loss—a description that's not entirely accurate, but more readable than perfect accuracy allows for) are as simple as exploring every possible move and every possible countermove from the current position to the end of the game. Assuming perfect play on both sides, every position is either a win, a loss, or a draw—that is, abstract strategy games are perfectly determined. (Checkers isn't completely solved, but we do know now, as a result of the work from 2007, that perfect play on both sides from the start always yields a draw.)
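To make that concrete, here's a minimal sketch of exhaustive game-tree search in Python, on a toy game instead of checkers (a single Nim pile: each player takes one or two stones, and whoever takes the last stone wins). The game and the `solve` function are my own illustration, not anything from the paper:

```python
# Exhaustively "solve" a toy game position by exploring every move and
# countermove to the end. Returns +1 if the player to move wins with
# perfect play, -1 if they lose (Nim has no draws).
def solve(pile):
    if pile == 0:
        return -1  # the previous player took the last stone; we lost
    # try every legal move, assuming the opponent then plays perfectly
    return max(-solve(pile - take) for take in (1, 2) if take <= pile)
```

It turns out that piles that are multiples of 3 are losses for the player to move; the point, though, is that this exhaustive approach visits every reachable position, which is exactly what becomes intractable for real games.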

This is, however, a massively impractical way to actually play a game, because the number of positions to explore rapidly becomes intractable. Speed-focused modern chess engines search on the order of millions of nodes (positions in the game tree) per second, but searching a chess position exhaustively to a depth of 7 requires visiting about 60 billion nodes. Traditional games like chess and checkers yielded to some optimizations of this process:

  1. It's easy to look at a chess or checkers board and tell how well the game is going for a given player: in both games, it comes down primarily to the balance of material. (The last I read, positional advantages are worth a pawn or two in a game of chess if they're all going your way; the queen is worth nine pawns.) So you don't need to explore all the way to the bottom of the game tree to get an idea of which directions are promising: you just explore as deep as you can and evaluate the position you reach.
  2. If you have a good evaluation function (that is, one that generally rates a position as better only when it's closer to winning), you can do some easy pruning when you come across a game state that's already worse than the best alternative you've explored elsewhere: the opponent would never let the game get there, so you just don't explore any further in that direction. It works even better if you can guess which moves are likely to be good and explore those first: if you try the best move first, every subsequent move turns out to be worse, and you can cut them off quickly.
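Those two ideas together are depth-limited search with alpha-beta pruning. Here's a hedged sketch on a toy game (a single Nim pile, take one or two stones, last stone wins) standing in for a real engine; the `alphabeta` function and its crude "unknown positions are even" evaluation are my own illustration:

```python
# Depth-limited negamax with alpha-beta pruning on a toy Nim game.
def alphabeta(pile, depth, alpha=-2, beta=2):
    if pile == 0:
        return -1       # the previous player took the last stone; we lost
    if depth == 0:
        return 0        # stand-in "evaluation function": call it even
    best = -2
    for take in (1, 2):  # trying likely-best moves first improves pruning
        if take > pile:
            continue
        score = -alphabeta(pile - take, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break       # prune: the opponent would never allow this line
    return best
```

The `break` is the whole trick: entire subtrees are never visited at all, which is what drives the effective branching factor down.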

So chess computers today, working with those tools, are better than the best human players: the effective branching factor (the number of moves actually explored at each node in the tree), using the pruning techniques above, drops from about 35 to somewhere between 1 and 10, depending on the exact techniques used. The reason Go didn't fall to traditional techniques is that it's just so much more computationally difficult. Chess's branching factor (the number of possible moves at each state) is about 35, and games last about 80 moves; Go's branching factor is about 250 on average, and games run about 150 moves. Go also features a few difficulties that chess does not:
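For a sense of scale, those branching factors and game lengths imply full game trees of roughly these sizes (back-of-the-envelope only; transpositions and pruning shrink what a real engine actually sees):

```python
import math

# A branching factor b sustained over d moves gives about b**d positions.
chess_exponent = 80 * math.log10(35)    # exponent of 10 for chess's tree
go_exponent = 150 * math.log10(250)     # exponent of 10 for Go's tree

print(f"chess: ~10^{chess_exponent:.0f} positions")
print(f"go:    ~10^{go_exponent:.0f} positions")
```

Chess's tree is absurd already; Go's is more than two hundred orders of magnitude bigger, which is why brute force never had a chance.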

  1. It's a very non-local game, both in space and in time: a move made at the very start of the game, halfway across the board, can have implications fifty turns later. This creates a horizon problem. In chess, most positions become quiescent—unlikely to change the overall evaluation with deeper search—once the captures stop, and modern chess engines play all the way through capture sequences for exactly this reason; there's no similar signal for Go engines to use.
  2. It's very difficult to evaluate a Go board position on purely objective grounds, or rather, we haven't figured out how to express, mathematically, what makes a good Go position good. Neither current control of territory nor current number of captures bears very strongly on the eventual result.

Because of the size of the problem space for Go, traditional techniques don't work. The branching factor remains too high. Modern Go programs use one (or sometimes both) of two approaches: either they use hand-coded expert knowledge to sort and select promising moves for tree expansion (which frequently misses oddball moves that are nevertheless good), or they randomly play out a bunch of games from the current position to the end, and sample the result to pick the best move on aggregate (which frequently misses obviously good moves). The best of the pre-AlphaGo bunch used both, combining expert knowledge to pick the best moves to sample with the oddball-finding power of random simulation.
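The random-playout idea can be sketched in a few lines. This is flat Monte Carlo sampling on a toy game (a single Nim pile, take one or two stones, last stone wins); it's purely my own illustration of the technique, not how any real Go engine is structured:

```python
import random

def playout(pile, to_move):
    """Play random legal moves to the end; return the winning player."""
    player = to_move
    while pile > 0:
        pile -= random.choice([t for t in (1, 2) if t <= pile])
        player ^= 1
    return player ^ 1  # the player who just moved took the last stone

def best_move(pile, samples=2000):
    """Pick the move whose random playouts win most often for player 0."""
    scores = {}
    for take in (1, 2):
        if take > pile:
            continue
        wins = sum(playout(pile - take, to_move=1) == 0
                   for _ in range(samples))
        scores[take] = wins / samples
    return max(scores, key=scores.get)
```

Even against random opposition, the statistically better move floats to the top of the sample, which is why this finds oddball moves that hand-coded knowledge misses, and why it can also miss a move that's obviously good but only under precise follow-up play.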

AlphaGo does a little bit of that, but at its heart, it learned to play a lot like humans do: DeepMind's researchers fed it a diet of 30 million sample positions and their eventual results, and built a neural network to learn what a good board looks like and what a bad one looks like. (Literally—the input to the program is a 19x19 image, with pixel values representing empty, black stone, or white stone.) They built a second neural network to identify which moves are the most promising ones to simulate through in random play, and Bob's your uncle: a professional-level Go program. Their paper suggests it's about as good as a mid-level human professional—in the exhibition match included in the paper, AlphaGo beat the European human champion five games to zero, four by resignation, but the European champ isn't a top-tier player worldwide. February will bring an exhibition match against the South Korean champion, a world-class player; we'll see how AlphaGo does against him. AlphaGo also won 499 out of 500 games against the prior state-of-the-art Go programs.
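The input encoding is simple enough to sketch. Below is a toy version of turning a board into binary feature planes for a neural network; the real AlphaGo used many more feature planes (liberties, move history, and so on), so treat this as the basic idea only:

```python
# Encode a 19x19 Go board as one binary plane per color, the kind of
# image-like input a convolutional network consumes. Illustrative only.
EMPTY, BLACK, WHITE = 0, 1, 2

def encode(board):
    """board: 19x19 grid of EMPTY/BLACK/WHITE -> dict of binary planes."""
    planes = {c: [[0] * 19 for _ in range(19)] for c in (BLACK, WHITE)}
    for r in range(19):
        for c in range(19):
            stone = board[r][c]
            if stone != EMPTY:
                planes[stone][r][c] = 1
    return planes
```

From the network's point of view, "reading" a Go position is then the same kind of problem as recognizing objects in a photo, which is why the image-recognition toolkit transferred so well.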

That said, humans still have one thing to be proud of: efficiency. The version of AlphaGo that beat the European champ ran on about 1200 CPUs and about 200 GPUs, whereas the human brain, which is nearly as good, draws about 20 watts.

Finally, shameless plug: the reason I have all of this in my head right now is that I've been working on AI for the Old Norse tafl games, which I've written on at some length.


This makes me think back to the Deep Blue days of the mid-90s, when, despite that landmark in AI progress, there was actually a fair amount of pessimism about AI.

“Sure we can make a computer play chess,” went the skeptics, “But that’s easy. Things like image recognition, voice recognition, self-driving cars - these are much harder problems. We can’t even begin to make a computer play a decent game of Go.”

And they were right; those are harder problems. But not, as we now see, impossible ones.


What about CS GO without cheating?


Amazing stuff.


Wait, that Demis Hassabis? Indeed, that Demis Hassabis!

Also, great AI news on the heels of the sad news of the death of Marvin Minsky, a pioneer of neural networks, earlier this week.


This is a lot more significant than Deep Blue because this is built on some general purpose learning algorithms that already apply to other games and problems.


Two steps away from an AI chip that plays (any) game with you?


Sort of. There’s a fair bit of directed learning involved in getting it to this point, but the basic concept is sound, and they’ve taught AlphaGo, or code very much like it, how to play a bunch of classic arcade games, too.

Ultimately, yes. The most interesting thing about this development is that it learned to play a lot like a human would—it studied past games and figured out from that study what was good play and what wasn’t, then played a lot (against itself and past versions of itself), which is roughly analogous to a human playing a lot and working on Go problems. The obstacle to a general game-playing AI (I could buy a copy of Arkham Horror, finally, and not have to rope the wife into an eight-hour marathon!) is that training neural networks is currently pretty slow. As I mentioned in the first post, AlphaGo had to study about thirty million positions and play itself many, many times to get to its current level of competence; presumably, that will improve as DeepMind hones its learning techniques and discovers new and faster ones.

Anyway, GoGameGuru has all the game replays, which are interesting if you’re familiar with Go (it doesn’t take much—if you’ve seen a human game or five and a game-against-computer or five, you’ve probably already noticed the differences in play, if only subconsciously). AlphaGo has a style that looks very human. Apparently, Go professionals expect the Korean champ to beat AlphaGo, and another source has the match taking place in March. If I were the champ, I’d be pushing for, like, next week. I don’t know how long the million-dollar prize is gonna be good for.

Here’s a commentary on one of AlphaGo’s games against the European champ.


Really enjoyed this explanation, thanks.


An update: the Lee Sedol-AlphaGo games will be played starting on Wednesday at 1pm Korea time, or 11pm the night before Eastern time (so tomorrow). The schedule is for March 9th, 10th, 12th, 13th, and 15th, all with the same start time.

The matches will be played with two hours of main time, plus three sixty-second byo-yomi periods, which is about twice as much time as AlphaGo had when it beat Fan Hui. I’m not altogether sure this was wise on Lee Sedol’s part—AlphaGo did better against Fan Hui as the time controls lengthened. (Fan Hui won some of the quicker informal games, but lost all of the longer time-control games.) Then again, Lee may feel he’s a better player with thinking time than on a blitz clock. As a final potential spoiling factor, Lee Sedol just lost a major match today, and that may throw him a little going into Wednesday.

Anyway, I’ll be watching the first game, or at least part of it. (EST is not friendly to Korea time.) I fully expect that GoGameGuru or someone else will have a commentary as soon as the first game is over, and I’ll probably have some remarks after the first game or two.


If it’s broadly the same program and set of training data as last time, I’m going to predict a 5-0 sweep by Lee Sedol. However, it’s possible they’ve done some serious work on AlphaGo and its training data, which I believe was entirely from amateur games, or that they held something back from the previous version. In that case, maybe it could be 4-1 or 3-2 to Sedol with some very narrow margins of victory. But I think people expecting AlphaGo to win again vastly underestimate the difference between the European champion and a top Korean professional. I’ve played a lot of Starcraft; I’m under no such illusions :P.

And if I’m wrong, I for one welcome our new AI overlords.


I’m expecting a 3-2 or 4-1 win for AlphaGo, myself, unless Lee Sedol finds a repeatably exploitable weakness. We’ll see, though, and the first game should be telling. I’ll be liveblogging the first two or three hours of the broadcast, if you can’t catch the broadcast or would prefer a kinda-real-time text recap tomorrow.


Looks like AlphaGo has gotten better and won the first game handily. This is really a remarkable demonstration of the wider applicability of neural networks and a validation of them as a model for human thinking.


I agree with “wider applicability”, but I think the types of networks they are using are a very very loose model of the brain. So I think they show that the architecture doesn’t have to be terribly similar to come up with powerful results.


Mike is right: neural networks are much more a model for generic machine learning than a model for the human brain, but nevertheless it’s an awesome pattern that seems to be opening up whole new fields in AI design.

I wrote up my thoughts from the first game. The liveblog probably won’t be back until the last match of the series. I don’t have that many 2:45am nights in me.


I know it’s just one game but what an achievement.


In an hour and twenty minutes, we’ll see if it can be repeated. Lee played an unorthodox early game, perhaps in the hopes of throwing AlphaGo off, so I’m very curious to see his opening tonight, even though I probably shouldn’t stay up again. (Kevin, I envy you and your more suitable timezone!)


AlphaGo won again last night to go up 2-0. I’ll post a link to the game when I get into work.


It’s amazing to me how broadly applicable these networks seem to be. Also, I think there is evidence that these aren’t the only architectures that can succeed. The changes that allowed the networks to do well in object recognition were fairly fine-scale, engineering-level things (over what came before) that basically helped avoid the over-fitting problem. Increased computing resources and data seem to be the main enabling factors. So you’d have to expect that there is a lot more to come.


Here’s the game replay, or you can go to DeepMind’s Youtube channel and watch the stream. The replay has decent commentary in the chat window, albeit from strong amateur players rather than pros. You can also find some remarks from An Younggil, GoGameGuru’s resident pro, in the comment thread. I’m watching the game now, myself. >.>