The world has quietly crowned a new chess champion. While it has now been over two decades since a human has been honored with that title, the latest victor represents a breakthrough in another significant way: It’s an algorithm that can be generalized to other learning tasks.
It gets crazier. AlphaZero, the new reigning champion, acquired all its chess know-how in a mere four hours. AlphaZero is almost as different from its fellow AI chess competitors as Deep Blue was from Garry Kasparov, back when the latter first faced off against a supercomputer in 1996. And what’s more, AlphaZero stands to upend not merely the world of chess, but the whole realm of strategic decision-making. If that doesn’t give you pause, it probably should.
From its origins in India, the game of chess has stood the test of time as a measure of strategic intelligence. Games of imperfect information, like the variation of poker known as Texas Hold ’Em, arguably have more in common with our day-to-day strategic decisions. But chess remains an important measure of how we think about intelligence. Chess requires being able to gauge an opponent’s tactics, memorize hundreds of board positions, and think ahead several moves. At least that was the common approach to the game until recently, and also the way conventional chess AIs like Deep Blue were programmed.
The previous reigning champion, Stockfish 8, was no exception. It used a search algorithm to explore different move combinations, scoring the resulting positions with rules that had been programmed into it by its creators. Such chess engines make widespread use of opening books and endgame tables, effectively supplying the search algorithm with all the commonly accepted chess wisdom from which to draw its moves. AlphaZero, the new champion, soundly defeated Stockfish 8 in a 100-game series without losing a single game to its adversary. To do so, it took a completely different tack.
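To make the search-based approach concrete, here is a minimal sketch in Python. It is not Stockfish's code, and it does not play chess. It assumes a toy game chosen purely for illustration: players alternate removing one to three stones from a pile, and whoever takes the last stone wins. The names `search_value` and `best_move` are hypothetical. The program learns nothing; it simply explores every move combination and picks the move that leaves the opponent worst off.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # assumed toy rule: remove one to three stones per turn


@lru_cache(maxsize=None)
def search_value(stones: int) -> int:
    """Return +1 if the player to move can force a win, otherwise -1."""
    if stones == 0:
        # The previous player took the last stone, so the player to move has lost.
        return -1
    # A position is as good for us as the worst position we can hand the opponent.
    return max(-search_value(stones - m) for m in MOVES if m <= stones)


def best_move(stones: int) -> int:
    """Choose the move that leaves the opponent in the worst position."""
    legal = [m for m in MOVES if m <= stones]
    return min(legal, key=lambda m: search_value(stones - m))


if __name__ == "__main__":
    for pile in (5, 7, 10):
        print(pile, "stones: take", best_move(pile))
```

Exhaustive search works here only because the toy game is tiny. In chess the tree of possibilities is far too large to explore fully, which is why conventional engines lean on hand-tuned rules, opening books and endgame tables to decide where to look.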
The creators of AlphaZero, the London-based AI project known as DeepMind, have pioneered an approach to AI known as deep reinforcement learning. Instead of looking at games like chess and Go as search problems, they treated them as reinforcement learning problems. Reinforcement learning may sound vaguely familiar if you took an Intro to Psychology class in college; it’s precisely the way humans learn. We actually don’t play chess like a search engine, exhaustively exploring different move combinations in our heads to find the best one. Rather, through repeated playing we gain a set of associations about different board positions and whether they are advantageous. Through repeated exposure, good board positions get reinforced in our minds, and poor ones get pruned, though unlike pure reinforcement learning, we may augment this with information taken from books or word of mouth. Then we draw upon these associations during gameplay.
The mathematical basis of how we apply reinforcement learning as humans has been painstakingly worked out over the last 30 years. That brings us to AlphaZero. By simply playing against itself for a mere four hours, the equivalent of over 22 million training games, AlphaZero learned the associations between chess moves and their outcomes. In doing so, it was learning much the way a human does, but because the computer can compress 100,000 hours of human chess play into a few minutes, it builds up a set of associations far more quickly than we ever could, and over a far wider range of move combinations.
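The idea of learning associations through self-play can be sketched in a few lines of Python. This is an illustration under loud assumptions, not DeepMind's method: AlphaZero pairs a deep neural network with a tree search, whereas the sketch below uses a plain lookup table and a toy game in which players alternate removing one to three stones from a pile and whoever takes the last stone wins. The function names, the learning rate `alpha`, the exploration rate `epsilon` and the episode count are all hypothetical choices made for the example.

```python
import random

MOVES = (1, 2, 3)  # assumed toy rule: remove one to three stones per turn


def greedy_move(value, stones):
    """Pick the move that leaves the opponent in the position we rate lowest."""
    legal = [m for m in MOVES if m <= stones]
    return min(legal, key=lambda m: value[stones - m])


def self_play_train(start=10, episodes=20000, alpha=0.02, epsilon=0.1, seed=0):
    """Learn a table of position values purely from games played against itself."""
    rng = random.Random(seed)
    # value[s]: estimated outcome (+1 win, -1 loss) for the player to move at s stones.
    value = {s: 0.0 for s in range(start + 1)}
    value[0] = -1.0  # the rules themselves: facing an empty pile means you have lost
    for _ in range(episodes):
        stones, visited = start, []
        while stones > 0:
            visited.append(stones)
            if rng.random() < epsilon:
                # Occasionally try a random move so every position gets explored.
                move = rng.choice([m for m in MOVES if m <= stones])
            else:
                move = greedy_move(value, stones)
            stones -= move
        # Whoever moved last won. Walk back through the game, flipping the
        # outcome each step because the two sides alternate turns.
        outcome = 1.0
        for s in reversed(visited):
            value[s] += alpha * (outcome - value[s])  # nudge the estimate toward the result
            outcome = -outcome
    return value


if __name__ == "__main__":
    table = self_play_train()
    for pile in (5, 7, 10):
        print(pile, "stones: take", greedy_move(table, pile))
```

Nothing in the table is told what a good position looks like. Positions that tend to precede wins drift toward +1 and those that precede losses drift toward -1, so with these settings the program should settle on leaving its opponent a multiple of four stones, the same strategy an exhaustive search would find.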
Building upon research done in psychology and animal cognition, DeepMind created a reinforcement learning algorithm first to conquer a handful of early Atari video games. Realizing the importance of such a multipurpose learning algorithm, Google quickly snapped up the company in a potentially lucrative acquisition. Within a few years, Google demonstrated that potential by using deep reinforcement learning to optimize the heating and cooling of its data centers, reducing its energy footprint by 15 percent.
DeepMind made further waves by applying reinforcement learning to the board game Go, long thought beyond the scope of AI because of its almost infinite variety of move combinations. Now the company has shown that the same approach can dominate in chess. Since reinforcement learning is the method we humans use to gain many kinds of skills, what can deep reinforcement learning not learn?
Deep reinforcement learning is nothing less than a watershed for AI, and by extension humanity. With the advent of such über-algorithms capable of learning new skills within a matter of hours, and with no human intervention or assistance, we may be looking at the first instance of superintelligence on the planet. How we apply deep reinforcement learning in the years to come is one of the most important questions facing humanity, and the basis of a discussion that needs to be taken up in circles far wider than Silicon Valley boardrooms.
Aaron Krumins is the author of a forthcoming book on reinforcement learning.