Image Credit: DeepMind



Games have been at the center of some of the biggest deep learning breakthroughs of recent years. The Sputnik moment for deep learning and gaming came when DeepMind’s reinforcement learning agent AlphaGo beat Go world champion Lee Sedol. AlphaGo was later generalized into AlphaZero, which mastered chess, Go, and shogi. Reinforcement learning agents have also achieved superhuman performance in games such as Atari, Capture the Flag, StarCraft II, Dota 2, and Hide-and-Seek. However, in each case, the agents were trained on a single game at a time. Building agents that can master many games at once without major human intervention has remained an elusive goal in deep learning.

Recently, DeepMind published “Open-Ended Learning Leads to Generally Capable Agents”, a research paper that details methods and processes for training reinforcement learning agents that can master many different games without human intervention. This paper represents a major step towards more generally capable agents that can interact with real-world environments.

In essence, DeepMind’s recipe for building generally capable agents rests on three intuitive building blocks:

  1. A rich universe of training tasks.
  2. A flexible architecture and training methods.
  3. A rigorous process of measuring progress.

A Rich Universe of Training Tasks

To learn skills that transfer across many different games, DeepMind created XLand, an environment that is essentially a galaxy of games. In the XLand galaxy, games are positioned according to how close they are on characteristics such as cooperative or competitive dynamics. Each game can be played at different levels of complexity, which are adjusted dynamically to improve the agent’s learning.
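The two ideas in that paragraph, placing games by similarity and adjusting their difficulty as the agent learns, can be pictured with a small Python sketch. It is an illustration under loud assumptions, not DeepMind’s implementation: each game is reduced to three hypothetical scores (cooperation, competition, complexity), "proximity" is plain Euclidean distance, and difficulty is adapted using made-up success-rate thresholds of 0.1 and 0.9. The `Game` type, the function names, the game names, and all numbers are invented for illustration.

```python
import math
from dataclasses import dataclass

# Hypothetical thresholds (not from the paper): a game is worth training on
# while the agent's success rate stays strictly between them.
MIN_SUCCESS = 0.1
MAX_SUCCESS = 0.9


@dataclass(frozen=True)
class Game:
    """A game reduced to three made-up scores in [0, 1]."""
    name: str
    cooperation: float  # how much players gain by helping each other
    competition: float  # how much one player's gain is another's loss
    complexity: float   # stand-in for world size, number of objects, rules


def distance(a: Game, b: Game) -> float:
    """Euclidean distance between two games in the characteristic space."""
    return math.dist(
        (a.cooperation, a.competition, a.complexity),
        (b.cooperation, b.competition, b.complexity),
    )


def neighbours(target: Game, galaxy: list[Game], k: int) -> list[Game]:
    """Return the k games closest to `target`, excluding `target` itself."""
    others = [g for g in galaxy if g is not target]
    return sorted(others, key=lambda g: distance(target, g))[:k]


def accept_for_training(success_rate: float) -> bool:
    """Keep a game only if it is currently neither too hard nor too easy."""
    return MIN_SUCCESS < success_rate < MAX_SUCCESS


def adjust_complexity(game: Game, success_rate: float, step: float = 0.1) -> Game:
    """Raise complexity when the agent has mastered a game, lower it when it fails."""
    if success_rate >= MAX_SUCCESS:
        new_complexity = min(1.0, game.complexity + step)
    elif success_rate <= MIN_SUCCESS:
        new_complexity = max(0.0, game.complexity - step)
    else:
        new_complexity = game.complexity
    return Game(game.name, game.cooperation, game.competition, new_complexity)


# Usage with three invented games.
galaxy = [
    Game("cube_race", cooperation=0.2, competition=0.9, complexity=0.4),
    Game("team_tag", cooperation=0.7, competition=0.6, complexity=0.4),
    Game("tower_building", cooperation=0.9, competition=0.1, complexity=0.5),
]
print([g.name for g in neighbours(galaxy[0], galaxy, k=1)])  # ['team_tag']
print(accept_for_training(0.95))  # False: too easy, so complexity would go up
```

Representing each game as a point in a shared space is what makes "proximity" meaningful: nearby games should exercise similar skills, so progress on one plausibly helps on its neighbours.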

Image Credit: DeepMind

A Flexible Architecture and Training Method

DeepMind’s agent architecture is based on a goal-attentive agent (GOAT) neural network that uses attention over its current state. This mechanism helps the agent focus on specific subgoals…
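To make "focusing on subgoals" concrete, here is a sketch of generic scaled dot-product attention in plain Python. This is a swapped-in textbook mechanism, not the GOAT architecture itself, whose exact wiring may differ: it assumes the agent’s current state vector acts as the query and that each subgoal has a hypothetical embedding used as both key and value. The subgoal names and vectors are invented; the resulting weights show which subgoal the agent would attend to most.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    query: list[float], keys: list[list[float]], values: list[list[float]]
) -> tuple[list[float], list[float]]:
    """Scaled dot-product attention for a single query vector.

    Returns the attention weights (one per key) and the weighted sum of values.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim_v = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim_v)]
    return weights, context


# Hypothetical example: the agent's state is the query, and two made-up
# subgoal embeddings serve as both keys and values.
state = [1.0, 0.0]
subgoals = {
    "hold_purple_cube": [2.0, 0.0],
    "near_yellow_sphere": [0.0, 2.0],
}
embeddings = list(subgoals.values())
weights, context = attend(state, embeddings, embeddings)
focus = max(zip(weights, subgoals), key=lambda pair: pair[0])[1]
print(focus)  # hold_purple_cube
```

In a trained network the query, key, and value vectors would come from learned projections rather than being fixed by hand; the softmax weights are what let the agent concentrate on one subgoal while still keeping a little attention on the others.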
