Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Time-aware deep reinforcement learning with multi-temporal abstraction

Time-aware deep reinforcement learning with multi-temporal abstraction Deep reinforcement learning (DRL) is advantageous, but it rarely performs well when tested on real-world decision-making tasks, particularly those involving irregular time series with sparse actions. Although irregular time series with sparse actions can be handled using temporal abstractions for the agent to grasp high-level states, they aggravate temporal irregularities by increasing the range of time intervals essential to represent a state and estimate expected returns. In this work, we propose a general Time-aware DRL framework with Multi-Temporal Abstraction (T-MTA) that incorporates the awareness of time intervals from two aspects: temporal discounting and temporal abstraction. For the former, we propose a Time-aware DRL method, whereas for the latter we propose a Multi-Temporal Abstraction mechanism. T-MTA was tested in three standard RL testbeds and two real-life tasks (control of nuclear reactors and prevention of septic shock), which represent four common contexts of learning environments, online and offline, as well as fully and partially observable. As T-MTA is a general framework, it can be combined with any model-free DRL method. In this work, we examined two in particular: the Deep Q-Network approach and its variants, and Truly Proximal Policy Optimization. Our results show that T-MTA significantly outperforms competing baseline frameworks, including a standalone Time-aware DRL framework, MTAs, and the original DRL methods without considering either type of temporal aspect, especially when partially observable environments are involved and the range of time intervals is large. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Applied Intelligence Springer Journals

Time-aware deep reinforcement learning with multi-temporal abstraction

Applied Intelligence , Volume 53 (17) – Sep 1, 2023

Loading next page...
 
/lp/springer-journals/time-aware-deep-reinforcement-learning-with-multi-temporal-abstraction-B6HfEGZq0m
Publisher
Springer Journals
Copyright
Copyright © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
ISSN
0924-669X
eISSN
1573-7497
DOI
10.1007/s10489-022-04392-5
Publisher site
See Article on Publisher Site

Abstract

Deep reinforcement learning (DRL) is advantageous, but it rarely performs well when tested on real-world decision-making tasks, particularly those involving irregular time series with sparse actions. Although irregular time series with sparse actions can be handled using temporal abstractions for the agent to grasp high-level states, they aggravate temporal irregularities by increasing the range of time intervals essential to represent a state and estimate expected returns. In this work, we propose a general Time-aware DRL framework with Multi-Temporal Abstraction (T-MTA) that incorporates the awareness of time intervals from two aspects: temporal discounting and temporal abstraction. For the former, we propose a Time-aware DRL method, whereas for the latter we propose a Multi-Temporal Abstraction mechanism. T-MTA was tested in three standard RL testbeds and two real-life tasks (control of nuclear reactors and prevention of septic shock), which represent four common contexts of learning environments, online and offline, as well as fully and partially observable. As T-MTA is a general framework, it can be combined with any model-free DRL method. In this work, we examined two in particular: the Deep Q-Network approach and its variants, and Truly Proximal Policy Optimization. Our results show that T-MTA significantly outperforms competing baseline frameworks, including a standalone Time-aware DRL framework, MTAs, and the original DRL methods without considering either type of temporal aspect, especially when partially observable environments are involved and the range of time intervals is large.

Journal

Applied IntelligenceSpringer Journals

Published: Sep 1, 2023

Keywords: Time-aware; Temporal abstraction; Deep reinforcement learning; Irregular time series; RL for real-world applications; Nuclear reactor control; Sepsis treatment

References