TY - JOUR
AU - Wu, Bohan
AU - Gupta, Jayesh K.
AU - Kochenderfer, Mykel
AB - Learning interpretable and transferable subpolicies and performing task decomposition from a single, complex task is difficult. Such decomposition can lead to immense gains in sample efficiency for lifelong learning. Some traditional hierarchical reinforcement learning techniques enforce this decomposition in a top-down manner, while meta-learning techniques require a task distribution at hand to learn such decompositions. This article presents a framework for using diverse suboptimal world models to decompose complex task solutions into simpler modular subpolicies. Given these world models, the framework decomposes a single source task in a bottom-up manner, concurrently learning the required modular subpolicies as well as a controller to coordinate them. We perform a series of experiments on high-dimensional continuous-action control tasks to demonstrate the effectiveness of this approach at both complex single-task learning and lifelong learning. Finally, we perform ablation studies to understand the importance and robustness of different elements of the framework and the limitations of this approach.
TI - Model primitives for hierarchical lifelong reinforcement learning
JF - Autonomous Agents and Multi-Agent Systems
DO - 10.1007/s10458-020-09451-0
DA - 2020-02-25
UR - https://www.deepdyve.com/lp/springer-journals/model-primitives-for-hierarchical-lifelong-reinforcement-learning-8UFlhwt2Y7
VL - 34
IS - 1
DP - DeepDyve
ER -