In the first part of this article I described how AlphaZero calculates variations. In this part I’ll cover how it learns, by itself, to play chess. I’ll have to gloss over some details, but hopefully there’s enough to give you a better understanding of how AlphaZero works.

Let’s jump right into the middle of this. AlphaZero’s learning happens using a neural network, which can be pictured as layers of neurons arranged from left to right.

A neural network is our attempt at making a computer system more like the human brain and less like, well, a computer. The input, i.e., the current position on the chessboard, comes in on the left. It gets processed by the first layer of neurons, each of which then sends its output to each neuron in the next layer and so on, until the rightmost layer of neurons does its thing and produces the final output. That output has two parts:

- An evaluation of the chess position it was given.
- An evaluation of each legal move in the position.

Hey, AlphaZero sounds like a chess player already: “White’s a bit better here, and Bg5 or h4 look like good moves!”

So these neurons must be smart little devils, right? A neuron is actually a very simple processing unit (it can be in software or hardware) that accepts a number of inputs, multiplies each one by a particular weight, sums the answers and then applies a so-called activation function that gives an output, typically in the range of 0 to 1. One thing to notice is that what a neuron outputs potentially depends on every other neuron in the network before it, which allows the network to capture subtleties, like in chess where White’s castled king is safe, but after h3 the assessment changes as Black can open the g-file with g7-g5-g4.

Based on the data published for AlphaGo Zero (AlphaZero’s Go-playing predecessor), AlphaZero’s neural network probably has up to 80 layers and hundreds of thousands of neurons. Do the math and realize that this means hundreds of millions of weights. Weights are important because training the network (also called learning) is a matter of giving the weights values so that the network plays chess well.

Imagine there’s a neuron that during training has taken on the role of assessing king safety. It takes input from all preceding neurons in the network and learns what weights to give them. If AlphaZero gets mated after moving all its pawns in front of its king, it will adjust its weights to reduce the possibility of making this error again.

AlphaZero starts out as a blank slate, a big neural network with random weights. It has been engineered to learn how to play two-player, alternate-move games, but knows absolutely nothing about any particular game at all, much as we are born with a vast capacity to learn language, but with no knowledge of any particular language.

The first step was to give AlphaZero the rules of chess. This meant it could now play random, but at least legal, moves. The natural next step would seem to be to give it master games to learn from, a technique called supervised learning. However, this would have resulted in AlphaZero only learning how we play chess, with all its flaws, so the Google team chose instead to use a more ambitious approach called reinforcement learning. This means that AlphaZero was left to play millions of games against itself. After each game it would tweak some of its weights to try to encode (i.e., remember) what worked well and what didn’t.

When it started this learning process, AlphaZero could only play random moves and all it knew was that checkmate is the goal of the game. Imagine trying to learn principles like central control or the minority attack, simply from who checkmated whom at the end of the game! During this learning period, AlphaZero’s progress was measured by playing one-second-per-move tournaments against Stockfish and previous versions of itself.

It seems utterly incredible, but after four hours of self-play AlphaZero had learned enough about chess to exceed Stockfish’s rating, while examining only about 0.1 percent of the number of positions Stockfish examined. While this is pretty mind-blowing, remember humankind learned chess in a similar way.
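To make the description of a single neuron a bit more concrete (inputs multiplied by weights, summed, then passed through an activation function), here is a minimal Python sketch. The sigmoid activation, the function names and the toy numbers are all assumptions chosen for illustration; the article does not say which activation function AlphaZero uses, and this is in no way AlphaZero’s actual code.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Assumed activation function: squashes any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs: Sequence[float], weights: Sequence[float]) -> float:
    """Multiply each input by its weight, sum the results, apply the activation."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    return sigmoid(weighted_sum)


def layer_output(inputs: Sequence[float],
                 weight_rows: Sequence[Sequence[float]]) -> List[float]:
    """Send the same inputs to every neuron in a layer (one weight row per neuron)."""
    return [neuron_output(inputs, row) for row in weight_rows]


def count_weights(layer_sizes: Sequence[int]) -> int:
    """Weights needed if every neuron feeds every neuron in the next layer."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


# Toy network (hypothetical numbers): 3 inputs -> 2 neurons -> 1 output neuron.
position_features = [1.0, 0.0, 0.5]
hidden = layer_output(position_features, [[0.2, -0.4, 0.6], [-0.3, 0.8, 0.1]])
evaluation = neuron_output(hidden, [1.5, -2.0])

print(evaluation)                # a single number between 0 and 1
print(count_weights([3, 2, 1]))  # 3*2 + 2*1 = 8 weights
```

Even this toy network needs 8 weights for 6 neurons and inputs, and the count grows with the product of neighboring layer sizes, which is why a network with many large layers ends up with far more weights than neurons. Training, in this picture, means nudging the numbers in those weight lists until the final output becomes a useful evaluation.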