Reinforcement Learning of Informed Initial Policies for Decentralized Planning

Landon Kraemer and Bikramjit Banerjee, University of Southern Mississippi

Publisher
Association for Computing Machinery
Copyright
Copyright © 2014 by ACM Inc.
ISSN
1556-4665
DOI
10.1145/2668130

Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems where agents operate with noisy sensors and actuators, as well as local information. Prevalent solution techniques are centralized and model based, limitations that we address by distributed reinforcement learning (RL). We particularly favor alternate learning, where agents alternately learn best responses to each other, which appears to outperform concurrent RL. However, alternate learning requires an initial policy. We propose two principled approaches to generating informed initial policies: a naive approach that lays the foundation for a more sophisticated approach. We empirically demonstrate that the refined approach produces near-optimal solutions in many challenging benchmark settings, staking a claim to being an efficient (and realistic) approximate solver in its own right. Furthermore, alternate best response learning seeded with such policies quickly learns high-quality policies as well.

Categories and Subject Descriptors: [Computing Methodologies]: Cooperation and Coordination

General Terms: Algorithms, Experimentation

Additional Key Words and Phrases: Decentralized partially observable Markov decision processes, multiagent reinforcement learning

ACM Reference Format: Landon Kraemer and Bikramjit Banerjee. 2014. Reinforcement learning of informed initial policies for decentralized planning. ACM Transactions on Autonomous and Adaptive Systems. DOI: 10.1145/2668130
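The alternate best-response scheme the abstract describes is easy to sketch in code. The Python example below is not the authors' implementation and does not use the paper's benchmark domains; it is a minimal illustrative sketch on a hypothetical two-agent, one-step coordination problem with noisy observations, in which each agent in turn learns a tabular best response while its teammate's policy stays fixed, starting from a supplied (informed) initial policy.

    import random
    from collections import defaultdict

    def sample_episode():
        # Hypothetical stand-in for a Dec-POMDP benchmark: a hidden binary state
        # is observed by each of two agents through a noisy (85% accurate) sensor.
        state = random.randint(0, 1)
        obs = tuple(state if random.random() < 0.85 else 1 - state for _ in range(2))
        return state, obs

    def reward(state, actions):
        # The team is rewarded only if both agents' actions match the hidden state.
        return 1.0 if actions[0] == actions[1] == state else 0.0

    def greedy(q, obs):
        return max((0, 1), key=lambda a: q[(obs, a)])

    def alternate_best_response(initial_policies, sweeps=6, episodes=20000,
                                alpha=0.05, epsilon=0.1):
        # Agents alternately learn a tabular, epsilon-greedy best response to the
        # other's currently fixed policy, seeded with the supplied initial policies.
        policies = dict(initial_policies)
        for _ in range(sweeps):
            for learner in (0, 1):
                teammate = 1 - learner
                q = defaultdict(float)
                for _ in range(episodes):
                    state, obs = sample_episode()
                    if random.random() < epsilon:
                        a_learn = random.randint(0, 1)        # explore
                    else:
                        a_learn = greedy(q, obs[learner])     # exploit
                    a_team = policies[teammate](obs[teammate])
                    acts = [0, 0]
                    acts[learner], acts[teammate] = a_learn, a_team
                    r = reward(state, acts)
                    key = (obs[learner], a_learn)
                    q[key] += alpha * (r - q[key])            # one-step update
                policies[learner] = lambda o, q=q: greedy(q, o)  # freeze best response
        return policies

    if __name__ == "__main__":
        # An "informed" seed: each agent simply echoes its own observation.
        informed = {0: lambda o: o, 1: lambda o: o}
        learned = alternate_best_response(informed)
        trials = 10000
        total = 0.0
        for _ in range(trials):
            s, o = sample_episode()
            total += reward(s, [learned[0](o[0]), learned[1](o[1])])
        print("average team reward:", total / trials)

In this toy setting, seeding with observation-echoing policies plays the role of the informed initial policies: the first learner already faces a sensible teammate, so its best response is meaningful from the start rather than a response to arbitrary behavior, which is the motivation the abstract gives for generating informed initial policies before alternate learning begins.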

Journal

ACM Transactions on Autonomous and Adaptive Systems (TAAS), Association for Computing Machinery

Published: Dec 8, 2014
