Jiang et al. / J Zhejiang Univ Sci A 2009 10(7):937-951
Monitoring correlative financial data streams by local pattern similarity*
Tao JIANG†1, Yu-cai FENG†1, Bin ZHANG2, Zhong-sheng CAO1, Ge FU1, Jie SHI1
(1College of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China)
(2Department of Computer Science, Hengyang Normal University, Hengyang 421008, China)
†E-mail: jiangtao_albert@yahoo.cn; fyc@dameng.com
Received June 12, 2008; Revision accepted Nov. 25, 2008; Crosschecked May 8, 2009
Abstract: Tools that monitor the correlations among thousands of financial data streams in an online fashion are both interesting
and useful. We aimed to find financial data streams that are highly correlated in their local patterns. We propose a novel distance
metric function, slope duration distance (SDD), which is compatible with the characteristics of actual financial data streams.
Moreover, we present a model for monitoring correlations among local patterns (MCALP), which dramatically decreases the
computational cost by using an algorithm for quick online segmenting and pruning (QONSP), with O(1) time cost at each time
tick t, together with our proposed new grid structure. Experimental results showed that MCALP improves performance by several
orders of magnitude relative to traditional naive linear scan techniques while maintaining high precision. Furthermore, the model
is incremental, parallelizable, and has a quick response time.
Key words: Data mining, Model, Data streams, Correlation, Local pattern, Pattern similarity
doi:10.1631/jzus.A0820445 Document code: A CLC number: TP391
INTRODUCTION
In many domains, including financial markets
and sensor networks, applications operate on data
streams. Monitoring correlative data streams is very
important for such applications, but processing the
data is not easy in an environment of high-speed
data streams, because stream time series have
characteristics that distinguish them from
traditional archived data. First, in stream time series,
data are frequently updated. Thus, previous methods
applied to traditional archived data may not work in
this scenario. Second, owing to the frequent updates,
it is very difficult to store all the data in memory or on
disk; therefore, efficient one-pass algorithms are
very important for achieving a real-time response.
In this study, we deal with an important scenario
in stream applications where incoming data are from a
set of continuous financial stream time series. At each
timestamp t, a new data item is appended to the
corresponding financial stream time series. We aim to
quickly find all the similar stream pairs up to the
current time, that is, pairs whose local pattern distances
do not exceed a user-specified threshold ε (locally
correlated stream pairs). Fig.1 illustrates the problem:
A and B are locally correlated from t(187) to t(200).
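The problem statement above can be made concrete with a naive baseline. The following is only a minimal sketch under stated assumptions: the fixed window length `w`, the function names, and the sample data are hypothetical, and a plain Euclidean distance over the most recent `w` values stands in for the local pattern distance, which is not defined in this part of the paper.

```python
import math
from itertools import combinations


def window_distance(a, b):
    """Stand-in distance (assumption): Euclidean distance between two
    equal-length windows. It is NOT the paper's local pattern distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlated_pairs(streams, w, eps):
    """Naive linear scan: compare the last w values of every stream pair
    and keep the pairs whose distance does not exceed eps."""
    # Only streams that already hold at least w values can be compared.
    recent = {name: vals[-w:] for name, vals in streams.items() if len(vals) >= w}
    return sorted(
        (p, q)
        for p, q in combinations(sorted(recent), 2)
        if window_distance(recent[p], recent[q]) <= eps
    )


# Hypothetical sample data: A and B move together over the last 3 ticks.
streams = {
    "A": [10.0, 10.5, 11.0, 11.4],
    "B": [20.0, 10.4, 11.1, 11.5],
    "C": [5.0, 4.0, 3.0, 2.0],
}
pairs = correlated_pairs(streams, w=3, eps=0.5)
print(pairs)
```

For n streams, this scan evaluates n(n-1)/2 pairs at every time tick, each at a cost proportional to `w`. That quadratic growth is why a pairwise scan becomes expensive for thousands of streams.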
Previous studies have approached the problem as
sub-sequence similarity matching with L_p-norm
distance functions (Agrawal et al., 1993) or dynamic
time warping (DTW) (Berndt and Clifford, 1996).
However, L_p-norms require the two sub-sequences to
have the same length, and a direct implementation of
DTW is inefficient. In addition, many
methods in these studies focused on detecting a single
static pattern over multiple stream time series data, or
checking which pattern (from multiple static patterns)
was close to a single stream time series up to the
Journal of Zhejiang University SCIENCE A
ISSN 1673-565X (Print); ISSN 1862-1775 (Online)
www.zju.edu.cn/jzus; www.springerlink.com
E-mail: jzus@zju.edu.cn
* Project (Nos. 2006AA01Z430 and 2007AA01Z309) supported by the National Hi-Tech Research and Development Program (863) of China