DeepMind AlphaStar

AlphaStar is an advanced reinforcement learning agent developed by DeepMind to tackle the challenging real-time strategy game StarCraft II. Its policy is computed by a deep neural network that combines several architectures: a Transformer for processing observations of player and enemy units, a core LSTM for handling temporal sequences, and a residual network for extracting minimap features. To manage the combinatorial action space, AlphaStar uses an autoregressive policy together with a recurrent pointer network.
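The data flow through these components can be sketched in a few lines. Everything below is an illustrative stand-in, not the real model: `entity_encoder`, `spatial_encoder`, and `core_step` are toy placeholders for the Transformer, the residual network, and the LSTM core, and the dimension `D` is made up.

```python
import numpy as np

D = 8  # shared embedding width (illustrative; the real model is far larger)

def entity_encoder(units):
    """Stand-in for the Transformer: pools per-unit features."""
    return units.mean(axis=0)                 # (n_units, D) -> (D,)

def spatial_encoder(minimap):
    """Stand-in for the residual network over the minimap."""
    return minimap.reshape(-1)[:D]            # (H, W) -> (D,)

def core_step(state, x):
    """Stand-in for the LSTM core: one recurrent update."""
    return np.tanh(state + x)

def forward(state, units, minimap):
    """One agent step: encode both input streams, then update the core."""
    x = entity_encoder(units) + spatial_encoder(minimap)
    return core_step(state, x)                # new recurrent state

rng = np.random.default_rng(0)
state = np.zeros(D)
for _ in range(3):                            # three game steps
    state = forward(state, rng.random((5, D)), rng.random((4, 4)))
```

The point of the sketch is the shape of the computation: two parallel encoders feed a single recurrent core whose state persists across game steps, and the policy heads (omitted here) read from that state.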

Training Process

AlphaStar is trained using a combination of supervised learning from human replays and reinforcement learning that maximizes the win rate against opponents. The RL algorithm is an actor-critic policy-gradient method whose updates are performed asynchronously and off-policy: the data used for an update was generated by an earlier version of the policy. AlphaStar uses a combination of TD$\left(\lambda\right)$ and V-trace to correct for this mismatch between the behaviour policy and the current policy.
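V-trace (Espeholt et al., 2018) computes corrected value targets by weighting each temporal-difference error with truncated importance ratios. A minimal sketch for a single trajectory, with illustrative default clipping constants:

```python
import numpy as np

def vtrace_targets(rewards, values, bootstrap, rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-trace value targets for one trajectory.

    rewards, values, rhos: length-T arrays; rhos are the importance
    ratios pi(a|x)/mu(a|x) of the learner policy over the behaviour
    policy that generated the data.
    bootstrap: the value estimate V(x_T) after the trajectory.
    """
    clipped_rho = np.minimum(rho_bar, rhos)   # truncated IS weights
    clipped_c = np.minimum(c_bar, rhos)       # "trace-cutting" weights
    next_values = np.append(values[1:], bootstrap)
    deltas = clipped_rho * (rewards + gamma * next_values - values)
    vs = np.zeros_like(values, dtype=float)
    acc = 0.0
    for t in reversed(range(len(rewards))):   # backward recursion
        acc = deltas[t] + gamma * clipped_c[t] * acc
        vs[t] = values[t] + acc
    return vs
```

As a sanity check, in the on-policy case (all ratios equal to 1) with no discounting and zero value estimates, the targets reduce to the undiscounted return-to-go: `vtrace_targets(np.array([1., 0., 1.]), np.zeros(3), 0.0, np.ones(3), gamma=1.0)` gives `[2., 1., 1.]`.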

To address the game-theoretic challenges of multi-agent play, AlphaStar is also trained with league training, which approximates a fictitious self-play (FSP) setting: cycles are avoided by computing a best response against a uniform mixture of all previous policies, rather than against only the most recent one. The league of potential opponents therefore includes a diverse range of agents, drawn from both current agents and frozen snapshots of past ones.
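The core mechanism is easy to state in code. The toy `League` class below is a hypothetical sketch of the uniform-mixture idea only; AlphaStar's actual league is considerably more elaborate (it also weights opponents by win rate and includes specialised exploiter agents).

```python
import random

class League:
    """Toy sketch of fictitious-self-play opponent selection.

    Frozen snapshots of past policies accumulate over training, and
    each match samples an opponent uniformly from the whole history.
    The learner is thus pushed toward a best response to the uniform
    mixture of all previous policies, not just the latest one, which
    is what breaks the rock-paper-scissors cycles of naive self-play.
    """

    def __init__(self):
        self.snapshots = []

    def add_snapshot(self, policy):
        """Freeze a copy of the current policy into the league."""
        self.snapshots.append(policy)

    def sample_opponent(self, rng=random):
        """Uniform draw over every snapshot added so far."""
        return rng.choice(self.snapshots)
```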

Architecture

The architecture of AlphaStar is designed to process several input modalities and to manage the combinatorial action space. Observations of player and enemy units are processed with a Transformer, an attention-based architecture best known from natural language processing. Here it treats the visible units as a set of entities and applies self-attention across them, so each unit's embedding can incorporate information about every other unit, yielding an efficient representation of the game state.

AlphaStar also uses a type of neural network called a pointer network to manage the combinatorial action space. Rather than choosing from a fixed output vocabulary, a pointer network produces a distribution over the elements of its input sequence, which lets the agent point directly at units (for example, to select the targets of an action) no matter how many are on the field. The pointer network is combined with an autoregressive policy, in which each argument of an action is predicted conditioned on the arguments already chosen, making the enormous joint action space tractable.
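The pointer mechanism itself is just attention used as an output layer. The sketch below is a minimal illustration with made-up dimensions; `pointer_distribution` is a hypothetical name, not AlphaStar's API.

```python
import numpy as np

def pointer_distribution(query, unit_embeddings):
    """A pointer head: score each element of the input sequence against
    the current decoder state (query) and softmax over those scores.
    The result is a distribution over the units themselves, so the
    output space grows and shrinks with the number of units."""
    logits = unit_embeddings @ query               # one score per unit
    probs = np.exp(logits - logits.max())          # stable softmax
    return probs / probs.sum()

# Illustrative numbers only: 7 units with 4-dimensional embeddings.
rng = np.random.default_rng(0)
probs = pointer_distribution(rng.random(4), rng.random((7, 4)))
chosen_unit = int(np.argmax(probs))                # greedy selection
```

In an autoregressive policy, the chosen unit would then be fed back into the decoder state before the next action argument (e.g., the target location) is predicted.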

The architecture of AlphaStar also includes a residual network for extracting minimap features and a core LSTM for handling temporal sequences. Together, these components turn raw observations into a recurrent state from which the policy outputs are computed.
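The defining idea of a residual network is that each block computes a correction added to its input rather than a full replacement. A minimal fully-connected sketch (AlphaStar's minimap encoder uses convolutional blocks, omitted here for brevity):

```python
import numpy as np

def residual_block(x, W1, W2):
    """Simplest residual block: y = x + W2 @ relu(W1 @ x).
    The identity skip path lets gradients flow through deep stacks
    unimpeded, which is what makes very deep encoders trainable."""
    return x + W2 @ np.maximum(W1 @ x, 0.0)

# With identity weights, the block adds the (relu'd) input back to itself.
x = np.ones(5)
y = residual_block(x, np.eye(5), np.eye(5))   # -> array of 2.0s
```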

AlphaStar is an impressive example of the power of reinforcement learning in video games. Its ability to navigate the complex, dynamic environment of StarCraft II is a testament to the effectiveness of its neural network architecture and its innovative reinforcement learning techniques. As deep reinforcement learning continues to evolve, AlphaStar is likely to remain a valuable benchmark for the development of new and more advanced RL agents.
