Learning to play Go is only the start
In 2016 Lee Sedol, one of the world’s best players of Go, lost a match in Seoul to a computer program called AlphaGo by four games to one. It was a big event, both in the history of Go and in the history of artificial intelligence (AI). Go occupies roughly the same place in the culture of China, Korea and Japan as chess does in the West. After its victory over Mr Lee, AlphaGo beat dozens of renowned human players in a series of anonymous games played online, before re-emerging in May 2017 to face Ke Jie, the game’s best player, in Wuzhen, China. Mr Ke fared no better than Mr Lee, losing to the computer 3-0.
For AI researchers, Go is equally exalted. Chess fell to the machines in 1997, when Garry Kasparov lost a match to Deep Blue, an IBM computer. But until Mr Lee’s defeat, Go’s complexity had made it resistant to the march of machinery. AlphaGo’s victory was an eye-catching demonstration of the power of a type of AI called machine learning, which aims to get computers to teach complicated tasks to themselves.
AlphaGo learned to play Go by studying thousands of games between expert human opponents, extracting rules and strategies from those games and then refining them in millions more matches which the program played against itself. That was enough to make it stronger than any human player. But researchers at DeepMind, the firm that built AlphaGo, were confident that they could improve it. In a paper just published in Nature they have unveiled the latest version, dubbed AlphaGo Zero. It is much better at the game, learns to play much more quickly and requires far less computing hardware to do well. Most important, though, unlike the original version, AlphaGo Zero has managed to teach itself the game without recourse to human experts at all.
The eyes have it
Like all the best games, Go is easy to learn but hard to master. Two players, Black and White, take turns placing stones on the intersections of a board consisting of 19 vertical lines and 19 horizontal ones. The aim is to control more territory than your opponent. Stones that are surrounded by an opponent’s are removed from the board. Players carry on until neither wishes to continue. Each then adds the number of his stones on the board to the number of empty grid intersections he has surrounded. The player with the larger total wins.
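The counting rule just described can be sketched in a few lines of Python. This is a minimal illustration rather than a full Go engine: the function name `area_score` and the board encoding (strings of `B`, `W` and `.`) are assumptions made for this example, captured stones are assumed to have been removed already, and tournament refinements are ignored.

```python
from collections import deque


def area_score(board):
    """Return (black_total, white_total) for a finished position.

    `board` is a list of equal-length strings using 'B', 'W' and '.'.
    Each player scores one point per stone plus one point per empty
    intersection in regions bordered only by that player's stones.
    """
    rows, cols = len(board), len(board[0])
    score = {"B": 0, "W": 0}
    seen = set()
    for r in range(rows):
        for c in range(cols):
            cell = board[r][c]
            if cell in score:
                score[cell] += 1  # a stone on the board
                continue
            if (r, c) in seen:
                continue
            # Flood-fill this empty region, noting which colours touch it.
            size, borders = 0, set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols:
                        neighbour = board[ny][nx]
                        if neighbour == ".":
                            if (ny, nx) not in seen:
                                seen.add((ny, nx))
                                queue.append((ny, nx))
                        else:
                            borders.add(neighbour)
            # Only a region touched by a single colour counts as surrounded.
            if len(borders) == 1:
                score[borders.pop()] += size
    return score["B"], score["W"]


# A small 5x5 position, invented for this example.
example = [
    ".B.W.",
    ".B.W.",
    ".BBW.",
    ".B.W.",
    ".B.W.",
]
print(area_score(example))  # (11, 10)
```

In the example the two empty pockets in the middle column touch stones of both colours, so they count for neither player.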
The difficulty comes from the sheer number of possible moves. A 19×19 board offers 361 different places on which Black can put the initial stone. White then has 360 options in response, and so on. The total number of legal board arrangements is in the order of 10^170, a number so large it defies any physical analogy (there are reckoned to be about 10^80 atoms in the observable universe, for instance).
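The scale of those numbers is easy to check. The sketch below, with names chosen for this example, counts the opening choices and computes a crude upper bound on board arrangements by letting each intersection be empty, black or white. That bound includes illegal arrangements, which is why it comes out somewhat above the figure quoted for legal ones.

```python
import math

BOARD_SIZE = 19
POINTS = BOARD_SIZE * BOARD_SIZE  # intersections on the board


def opening_sequences(moves):
    """Count ways to place the first `moves` stones, ignoring captures."""
    total = 1
    for placed in range(moves):
        total *= POINTS - placed
    return total


def colouring_upper_bound():
    """Every point is empty, black or white; legality is not checked."""
    return 3 ** POINTS


print(POINTS)                # 361
print(opening_sequences(2))  # 129960, i.e. 361 * 360
# Number of decimal digits' worth of arrangements: a little over 172.
print(math.log10(colouring_upper_bound()))
```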
Human experts focus instead on understanding the game at a higher level. Go’s simple rules give rise to plenty of emergent structure. Players talk of features such as “eyes” and “ladders”, and of concepts such as “threat” and “life-and-death”. But although human players understand such concepts, explaining them in the hyper-literal way needed to program a computer is much harder. Instead, the original AlphaGo studied thousands of examples of human games, a process called supervised learning. Since human play reflects human understanding of such concepts, a computer exposed to enough of it can come to understand those concepts as well. Once AlphaGo had arrived at a decent grasp of tactics and strategy with the help of its human teachers, it kicked away its crutches and began playing millions of unsupervised training games against itself, improving its play with every game.
Supervised learning is useful for much more than Go. It is the basic idea behind many of the recent advances in AI, helping computers learn to do things such as identify faces in pictures, recognise human speech reliably, filter spam from e-mail efficiently and more. But as Demis Hassabis, DeepMind’s boss, observes, supervised learning has limits. It relies on the availability of training data to feed to the computer to show the machine what it is meant to be doing. Such data must be filtered by human experts. The training data for face recognition, for instance, consist of thousands of pictures, some with faces and some without, each labelled as such by a person. That makes such data sets expensive, assuming they are available at all. And, as the paper points out, there can be more subtle problems. Relying on human experts for guidance risks imposing human limits on a computer’s ability.
AlphaGo Zero is designed to avoid all these problems by skipping the training-wheels phase entirely. The program starts only with the rules of the game and a “reward function”, which awards it a point for a win and docks a point for a loss. It is then encouraged to experiment, repeatedly playing games against other versions of itself, subject only to the constraint that it must try to maximise its reward by winning as much as possible.
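The idea of learning from nothing but the rules and a reward can be shown on a game far simpler than Go. The sketch below is a toy and makes no claim to be DeepMind's method, which is not detailed here: it uses a lookup table and a made-up game in which players alternately take one, two or three stones from a pile and whoever takes the last stone wins. Every name in it (`reward`, `self_play_game`, `train` and the parameters) is an assumption made for illustration.

```python
import random


def reward(winner, player):
    """A point for a win, minus a point for a loss."""
    return 1 if winner == player else -1


def legal_moves(pile):
    """The rules: take 1, 2 or 3 stones, never more than remain."""
    return [n for n in (1, 2, 3) if n <= pile]


def choose_move(values, pile, epsilon, rng):
    """Mostly pick the best-rated move, but sometimes experiment."""
    moves = legal_moves(pile)
    if rng.random() < epsilon:
        return rng.choice(moves)
    return max(moves, key=lambda m: values.get((pile, m), 0.0))


def self_play_game(values, start_pile, epsilon, rng):
    """Play the current table against itself; return moves and winner."""
    history = {0: [], 1: []}
    pile, player = start_pile, 0
    while True:
        move = choose_move(values, pile, epsilon, rng)
        history[player].append((pile, move))
        pile -= move
        if pile == 0:
            return history, player  # taking the last stone wins
        player = 1 - player


def train(games=20000, start_pile=10, epsilon=0.2, step=0.1, seed=0):
    """Nudge each move's rating towards the reward it led to."""
    rng = random.Random(seed)
    values = {}
    for _ in range(games):
        history, winner = self_play_game(values, start_pile, epsilon, rng)
        for player, moves in history.items():
            outcome = reward(winner, player)
            for key in moves:
                old = values.get(key, 0.0)
                values[key] = old + step * (outcome - old)
    return values


values = train()
# Ratings for each move available with three stones left.
for move in legal_moves(3):
    print(move, round(values.get((3, move), 0.0), 2))
```

Like the program described above, the table begins with no idea of what it is doing. After training it should rate taking all three stones from a pile of three above the alternatives, since that move always wins.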
The program started by placing stones randomly, with no real idea of what it was doing. But it improved rapidly. After a single day it was playing at the level of an advanced professional. After two days it had surpassed the performance of the version that beat Mr Lee in 2016.
DeepMind’s researchers were able to watch their creation rediscover the Go knowledge that human beings have accumulated over thousands of years. Sometimes, it seemed eerily human-like. After about three hours of training the program was preoccupied with the idea of greedily capturing stones, a phase that most human beginners also go through. At others it seemed decidedly alien. For example, ladders are patterns of stones that extend in a diagonal slash across the board as one player attempts to capture a group of his opponent’s stones. They are frequent features of Go games. Because a ladder consists of a simple, repeating pattern, human novices quickly learn to extrapolate it and work out if building a particular ladder will succeed or fail. But AlphaGo Zero, which is not capable of extrapolation and instead experiments with new moves semi-randomly, took longer than expected to come to grips with the concept.
Climbing the ladder
Nevertheless, learning for itself rather than relying on hints from people seemed, on balance, to be a big advantage. For example, joseki are specialised sequences of well-known moves that take place near the edges of the board. (Their scripted nature makes them a little like chess openings.) AlphaGo Zero discovered the standard joseki taught to human players. But it also discovered, and eventually preferred, several others that were entirely of its own invention. The machine, says David Silver, who led the AlphaGo project, seemed to play with a distinctly non-human style.
The result is a program that is not just superhuman, but crushingly so. Skill at Go (and chess, and many other games) can be quantified with something called an Elo rating, which gives the probability, based on past performance, that one player will beat another. A player has a 50:50 chance of beating an opponent with the same Elo rating, but only a 25% chance of beating one with a rating 200 points higher. Mr Ke has a rating of 3,661. Mr Lee’s is 3,526. After 40 days of training AlphaGo Zero had an Elo rating of more than 5,000—putting it as far ahead of Mr Ke as Mr Ke is of a keen amateur, and suggesting that it is, in practice, impossible for Mr Ke, or any other human being, ever to defeat it. When it played against the version of AlphaGo that first beat Mr Lee, it won by 100 games to zero.
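The arithmetic behind those odds can be sketched with the conventional Elo formula, in which a rating gap of 400 points corresponds to odds of ten to one. That the ratings quoted here follow exactly this scale is an assumption, and `expected_score` is a name chosen for this example. Under it, a player rated 200 points below an opponent wins about 24% of the time, in line with the rounded 25% above.

```python
def expected_score(rating_a, rating_b):
    """Probability that player A beats player B on the usual Elo scale.

    Assumes the standard logistic form with a 400-point scale factor.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Equal ratings give an even chance.
print(expected_score(3000, 3000))  # 0.5

# An opponent rated 200 points higher.
print(expected_score(3000, 3200))

# Mr Ke (3,661) against a rating of 5,000.
print(expected_score(3661, 5000))
```

On this scale the gap between Mr Ke's 3,661 and a rating of 5,000 leaves him with well under a one-in-a-thousand chance in any single game.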
There is, of course, more to life than Go. Algorithms such as the ones that power the various iterations of AlphaGo might, its creators hope, be applied to other tasks that are conceptually similar. (DeepMind has already used those that underlie the original AlphaGo to help Google slash the power consumption of its data centres.) But an algorithm that can learn without guidance from people means that machines can be let loose on problems that people do not understand how to solve. Anything that boils down to an intelligent search through an enormous number of possibilities, said Mr Hassabis, could benefit from AlphaGo’s approach. He cited classic thorny problems such as working out how proteins fold into their final, functional shapes, predicting which molecules might have promise as medicines, or accurately simulating chemical reactions.
Advances in AI often trigger worries about human obsolescence. DeepMind hopes such machines will end up as assistants to biological brains, rather than replacements for them, in the way that other technologies from search engines to paper have done. Watching a machine invent new ways to tackle a problem can, after all, help push people down new and productive paths. One of the benefits of AlphaGo, says Mr Silver, is that, in a game full of history and tradition, it has encouraged human players to question the old wisdom, and to experiment. After losing to AlphaGo, Mr Ke studied the computer’s moves, looking for ideas. He then went on a 22-game winning streak against human opponents, an impressive feat even for someone of his skill. Supervised learning, after all, can work in both directions.
Source: The Economist