In a teacher-student framework, a more experienced agent (teacher) helps accelerate the learning of another agent (student) by suggesting actions to take in certain states. In cooperative multi-agent reinforcement learning (MARL), where agents must cooperate with one another, a student can fail to cooperate effectively with others even when following a teacher's suggested actions, because the policies of all agents keep changing before convergence. When communication between agents is limited (i.e., under budget constraints), an advising strategy that uses actions as advice can therefore be less effective. We propose a partaker-sharer advising framework (PSAF) for cooperative MARL agents learning under budget constraints. In PSAF, each Q-learner can decide when to ask for Q-values and when to share its own. We perform experiments on three typical multi-agent learning problems. The evaluation results indicate that the proposed PSAF approach outperforms existing advising methods under both constrained and unconstrained budgets. Moreover, we analyse the influence of advising actions and sharing Q-values on agent learning.
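The abstract's core mechanism, each Q-learner deciding when to ask for and when to share Q-values under a communication budget, can be sketched as follows. This is an illustrative simplification, not the paper's exact decision rules: the visit-count thresholds, the budget bookkeeping, and the max-merge of shared Q-values are all assumptions made for the sketch.

```python
import random
from collections import defaultdict


class PSAFAgent:
    """Minimal sketch of a partaker-sharer Q-learner (illustrative only).

    ask_budget / share_budget model the limited communication budget;
    state visit counts stand in for the agent's confidence in a state.
    """

    def __init__(self, n_actions, ask_budget=100, share_budget=100,
                 alpha=0.1, gamma=0.95, epsilon=0.1):
        self.Q = defaultdict(lambda: [0.0] * n_actions)
        self.visits = defaultdict(int)
        self.n_actions = n_actions
        self.ask_budget = ask_budget
        self.share_budget = share_budget
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def maybe_ask(self, state, peers):
        """As a partaker: if unfamiliar with `state` and budget remains,
        request Q-values from the first willing peer."""
        if self.ask_budget <= 0 or self.visits[state] >= 5:
            return
        for peer in peers:
            advice = peer.maybe_share(state)
            if advice is not None:
                self.ask_budget -= 1
                # Merge the shared Q-values into our own estimates
                # (optimistic element-wise max; an assumption of this sketch).
                self.Q[state] = [max(a, b)
                                 for a, b in zip(self.Q[state], advice)]
                return

    def maybe_share(self, state):
        """As a sharer: share Q-values only for states visited often enough
        to be trusted, while the sharing budget lasts."""
        if self.share_budget > 0 and self.visits[state] >= 20:
            self.share_budget -= 1
            return list(self.Q[state])
        return None

    def act(self, state):
        """Epsilon-greedy action selection over the (possibly advised) Q-values."""
        self.visits[state] += 1
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        q = self.Q[state]
        return q.index(max(q))

    def update(self, s, a, r, s2):
        """Standard one-step Q-learning update."""
        target = r + self.gamma * max(self.Q[s2])
        self.Q[s][a] += self.alpha * (target - self.Q[s][a])
```

Because asking and sharing each draw on separate budgets, an experienced agent stops volunteering advice once its sharing budget is spent, and a novice stops requesting once its asking budget is spent, which is the budget-constrained setting the abstract evaluates.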
ACM Transactions on Autonomous and Adaptive Systems (TAAS) – Association for Computing Machinery
Published: Apr 19, 2021
Keywords: Multi-agent reinforcement learning