DeepMind AI MuZero can learn AND master Chess, Go, Atari

Today the machine learning algorithm MuZero was detailed in a research paper published in Nature. MuZero expands on the abilities of systems like AlphaGo, AlphaGo Zero, and AlphaZero. Each new algorithm allowed a machine to become better at mastering games, starting with Go, then Chess and Shogi, and now Atari!

What is MuZero?

MuZero is a machine learning algorithm. An algorithm is a set of rules that a computer follows when it needs to learn new information and/or act on new information. When MuZero is employed on a machine learning-capable computer, it's able to learn and master games like Go and Chess.

The process of creating an algorithm like MuZero is important because it leads to machine learning and artificial intelligence systems able to handle real-world problems more advanced than any computer of the past has been able to crack.

AlphaGo, AlphaGo Zero, AlphaZero

The algorithm AlphaGo was made public in 2016 as the first program to master the game Go. AlphaGo mastered Go using neural networks and tree search, but it required human data and domain knowledge to be supplied before it could begin to attempt to master the game.

In 2017, the system AlphaGo Zero advanced beyond the first iteration, able to learn to play Go without the addition of human data or domain knowledge. Fast forward to 2018, and AlphaZero advanced beyond the first two releases, mastering Go, Chess, and Shogi. The latest algorithm from the same group that released the others is called MuZero.

MuZero can learn, too

Each of the first three releases required a set of rules pre-programmed for each of the games it would go on to master. Here at the tail end of 2020, MuZero makes a major leap: the algorithm needs no pre-programmed rules – it can learn those rules on the fly.

MuZero can both learn the rules of the games it'll aim to play AND master said games. As its creators put it, "MuZero learns the rules of the game, allowing it to also master environments with unknown dynamics."

As noted by the research published this week, "When evaluated on Go, chess and shogi—canonical environments for high-performance planning—the MuZero algorithm matched, without any knowledge of the game dynamics, the superhuman performance of the AlphaZero algorithm that was supplied with the rules of the game."
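To give a rough sense of what "planning without any knowledge of the game dynamics" means, here is a minimal Python sketch. The three functions below are hypothetical stand-ins for MuZero's learned networks – a representation function, a dynamics function, and a prediction function – and the toy reward and value formulas are invented purely for illustration. The real system trains deep neural networks and plans over many imagined steps with Monte Carlo tree search, not the single-step lookahead shown here.

```python
# Illustrative sketch only: these hand-written functions are hypothetical
# stand-ins for MuZero's three learned neural networks.

def representation(observation):
    # h: map a raw observation to an internal hidden state (learned in MuZero).
    return sum(observation)  # stand-in: a single number as the "hidden state"

def dynamics(hidden_state, action):
    # g: predict the next hidden state and the immediate reward for an action,
    # without ever being told the game's actual rules.
    next_state = hidden_state + action
    reward = 1.0 if next_state % 2 == 0 else 0.0  # made-up reward model
    return next_state, reward

def prediction(hidden_state):
    # f: predict a policy (action preferences) and a value for a hidden state.
    policy = [0.5, 0.5]                # made-up uniform preferences
    value = hidden_state * 0.1         # made-up value estimate
    return policy, value

def plan_one_step(observation, actions):
    # Plan entirely inside the learned model: imagine each action's outcome
    # with g, score it with f, and pick the best-scoring action.
    s = representation(observation)
    best_action, best_score = None, float("-inf")
    for a in actions:
        s_next, r = dynamics(s, a)
        _, v = prediction(s_next)
        if r + v > best_score:
            best_action, best_score = a, r + v
    return best_action

print(plan_one_step([3, 4], actions=[1, 2]))  # prints 1
```

The point of the structure is that the planner never consults the real game: both the "what happens next" step and the "how good is this" step come from models the algorithm learned for itself.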

Atari as proof

Researchers showed that MuZero was able to learn the rules of Go, Chess, and Shogi, then master those games (again, without pre-programmed rules). In addition, MuZero was tested against a set of 57 different Atari video games. MuZero was able to learn the rules of those Atari games and master them, too!

As the research published this week put it, "When evaluated on 57 different Atari games—the canonical video game environment for testing artificial intelligence techniques, in which model-based planning approaches have historically struggled—the MuZero algorithm achieved state-of-the-art performance."

More information

For more information on MuZero, take a peek at the research paper "Mastering Atari, Go, chess and shogi by planning with a learned model." The paper was published in Nature on December 23, 2020, with DOI 10.1038/s41586-020-03051-4, and was authored by Schrittwieser, J., Antonoglou, I., Hubert, T., et al.