Robotics and Computer-Integrated Manufacturing 20 (2004) 553–562
Learning policies for single machine job dispatching
Yi-Chi Wang a,b, John M. Usher a,b,*
a Department of Information Management, Kun Shan University of Technology, Tainan Hsien 710, Taiwan
b Department of Industrial Engineering, Mississippi State University, P.O. Box 9542, 260 McCain Bldg, Miss. State, MS 39762, USA
Accepted 24 May 2004
Abstract
Reinforcement learning (RL) has received some attention in recent years from agent-based researchers because it deals with the
problem of how an autonomous agent can learn to select proper actions for achieving its goals through interacting with its
environment. Each time after an agent performs an action, the environment’s response, as indicated by its new state, is used by the
agent to reward or penalize its action. The agent’s goal is to maximize the total amount of reward it receives over the long run.
Although there have been several successful examples demonstrating the usefulness of RL, its application to manufacturing systems
has not been fully explored. In this study, a single machine agent employs the Q-learning algorithm to develop a decision-making
policy on selecting the appropriate dispatching rule from among three given dispatching rules. The system objective is to minimize
mean tardiness. This paper presents a factorial experiment design for studying the settings used to apply Q-learning to the single
machine dispatching rule selection problem. The factors considered in this study include two related to the agent’s policy table
design and three for developing its reward function. This study not only investigates the main effects of this Q-learning application
but also provides recommendations for factor settings and useful guidelines for future applications of Q-learning to agent-based
production scheduling.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: Reinforcement learning; Q-learning algorithm; Dispatching rule selection
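As an illustration of the rule-selection scheme summarized in the abstract, the following is a minimal sketch of a tabular Q-learning agent that picks one of three dispatching rules. It is not the implementation used in this study: the rule names (`RULE_A`, `RULE_B`, `RULE_C`), the state labels, the epsilon-greedy exploration, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions made for illustration, and the reward is assumed to be supplied by the caller (for example, a penalty that grows with tardiness).

```python
import random
from collections import defaultdict

# Placeholder names for the three dispatching rules (assumption: the
# abstract does not name them).
ACTIONS = ("RULE_A", "RULE_B", "RULE_C")


class RuleSelector:
    """Tabular Q-learning agent that selects a dispatching rule.

    All parameter defaults are illustrative assumptions.
    """

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)
        # Policy table: (state, action) -> estimated value, initially 0.0
        self.q = defaultdict(float)

    def choose(self, state):
        """Epsilon-greedy choice among the dispatching rules."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """One-step Q-learning update toward reward + gamma * max Q(next)."""
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


# Usage: penalize RULE_A once in hypothetical state "s0", then act greedily.
agent = RuleSelector(epsilon=0.0)
agent.update("s0", "RULE_A", reward=-5.0, next_state="s1")
chosen = agent.choose("s0")
```

After the single penalized step, the entry for `("s0", "RULE_A")` falls below the untouched entries, so a greedy agent avoids that rule in state `s0`. How states are defined and how the reward is computed are exactly the kinds of design factors the study examines, so they are left to the caller here.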
1. Introduction
In recent years, agent technology has been widely
recognized as a promising paradigm for developing
software applications able to
support complex tasks. An agent can be viewed as a
computational module that is able to act autonomously
to achieve its goal [1,2]. In fact, agents can be used to
represent physical shop-floor components such as parts,
machines, tools, and even human beings. Each agent is
in charge of information collection, data storage, and
decision-making for the corresponding shop floor
component. A popular scheme to achieve cooperation
among autonomous agents is through the negotiation-based
contract-net protocol [3]. The contract-net protocol
provides the advantage of real-time information
exchange, making it suitable for shop floor scheduling
and control. Agent-based approaches also offer a
promising solution for controlling future
manufacturing systems requiring flexibility, reliability,
adaptability, and reconfigurability [4].
In the application of multi-agent systems, one
significant issue in improving an autonomous agent’s
capability is how to enhance the agent’s intelligence.
Learning is one mechanism that can
provide the ability for an agent to increase its
intelligence while in operation. Reinforcement learning
(RL), developed in the early 1990s, has generated a lot
of interest from the research community. As opposed to
the popular approach of supervised learning whereby an
agent learns from examples provided by a knowledge-
able external supervisor [2], reinforcement learning
requires that the agent learn by directly interacting with
www.elsevier.com/locate/rcim
0736-5845/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.rcim.2004.07.003
* Corresponding author. Tel.: 662-325-7624; fax: 662-325-7618.
E-mail address: usher@engr.msstate.edu (J.M. Usher).