TY - JOUR
AU - Anthony, Thomas
AU - Eccles, Tom
AU - Tacchetti, Andrea
AU - Kramár, János
AU - Gemp, Ian
AU - Hudson, Thomas C.
AU - Porcel, Nicolas
AU - Lanctot, Marc
AU - Pérolat, Julien
AU - Everett, Richard
AU - Werpachowski, Roman
AU - Singh, Satinder
AU - Graepel, Thore
AU - Bachrach, Yoram
AB - Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and Starcraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. We propose a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves. We also introduce a family of policy iteration methods that approximate fictitious play. With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state-of-the-art, and game theoretic equilibrium analysis shows that the new process yields consistent improvements.
TI - Learning to Play No-Press Diplomacy with Best Response Policy Iteration
JF - Computing Research Repository
DO - 10.48550/arxiv.2006.04635
DA - 2020-06-08
UR - https://www.deepdyve.com/lp/arxiv-cornell-university/learning-to-play-no-press-diplomacy-with-best-response-policy-tkpL1G0FOj
VL - 2023
IS - 2006
DP - DeepDyve
ER -