
Analyzing and forecasting financial series with singular spectral analysis

1 Introduction

One of the important problems of making management decisions based on monitoring is the low predictability of the resulting series. The reason for this is the high instability of some classes of open nonlinear systems categorized as dynamic chaos [8,10,11,21]. Examples of such unstable systems are gas-dynamic and hydrodynamic turbulent flows, high-temperature plasma, etc. [9,13,22]. In finance, chaotic behavior is particularly prominent in inertia-less environments such as electronic capital markets [1,15,17,26].

Investigating a multidimensional process is meaningful only if the parameters of the observation vector are correlated. Otherwise, the solution is limited to sequentially examining one-dimensional processes. Inter-dependencies make a regularization of chaotic observations possible (in other words, forecasting in general). However, the conventional assessment of inter-dependencies encounters serious problems due to the chaotic nature of the series [14,15]. Making correlation estimates over a large time interval is not feasible, and immediate estimates on a limited observation window are not stable. In addition, correlations larger than 0.9 can lead to degeneracy of the observation matrix. Thus, an alternative approach based on either of two branches of singular value decomposition (SVD) appears promising: immunocomputing (IC) [4,23,24] and singular spectrum analysis (SSA) [2,6,7].

IC has found its use in information processing for molecular protein complexes of the immune system, chiefly the recognition and classification of foreign cells intended for stimulating the body's defense mechanisms [4,24]. As a result, the applied aspects of SVD analysis acquired the name IC.

This article investigates the possibility of applying IC and SSA to forecasting multidimensional chaotic environments.
We consider highly dynamic chaotic processes which are not suitable for multivariate statistical analysis [2,19]. We use dimension reduction techniques based on representing the data matrices in the first singular basis.

2 Related work

The main directions of IC development are related to its practical applications, in particular to classification and clustering problems. For example, recognition of multidimensional images uses vector projections onto the space generated by several singular components. This approach gives rise to a specific pseudometric (proximity measure) called the Lr distance [2,12].

The problems of situational analysis are solved similarly: the observed situation, determined by the state vector X_0, is mapped to the regular situation among X_1, …, X_k that is closest in the Lr pseudometric [3,20].

In problems of approximation of random fields, the value of the sought hypersurface f(x_0) is estimated by linear interpolation over the k nearest points x_1, …, x_k: f = c_1 f(x_1) + … + c_k f(x_k), where c_j = (1 + d_j Σ_{i≠j} d_i^{−1})^{−1}. A variant of the Lr pseudometric is used as the distance d_j, j = 1, …, k, from x_0 to x_j.

If a segment of an m-dimensional series is represented as a matrix of size ⟨n×m⟩, then it can be approximated by a sum of elementary matrices of unit rank, making it possible to analyze separately series terms of simplified structure. This approach significantly reduces the dimension of the original problem.

An interesting direction in analyzing one-dimensional random series is a group of methods based on embedding a time series in a multidimensional space followed by a singular decomposition of the resulting Hankel matrix.
This embedding approach is also based on SVD and is known in the literature as "Caterpillar" [16,18]. The Caterpillar method identifies time series components and solves problems of forecasting, parameter estimation, and detection of various kinds. Applications of this approach that project onto the space of principal singular components rely on the Euclidean metric in the projection space, which makes the results much easier to analyze and interpret.

3 Methods

3.1 Basics of the IC and SSA approaches

We face a problem traditional for data analysis: representing the structure of m-dimensional data with k generalized features, where k < p < m. It is usually solved with the principal component method. However, the effectiveness of that method depends significantly on the correlation properties of the observation matrix. The correlations between variables in a multidimensional chaotic process change rapidly and within very wide limits. Very strong correlation can lead to degeneracy or poor conditioning of the estimation task. Therefore, we consider the singular decomposition technique (or IC) as an alternative approach to data dimension compression.

The problem consists in approximating a matrix of multidimensional observations X of size ⟨n×m⟩, n > m, and rank p ≤ m by another matrix Y of smaller rank k < p. The approximation is carried out by minimizing the quadratic distance between the matrices

(1)  (X − Y)^T (X − Y) = min,

subject to rank(Y) = k < min(n, p). The solution of this problem was found in [5].
A real observation matrix X of dimension ⟨n×m⟩ can be represented by an SVD:

(2)  X = L S R^T,

where S = diag(s_1, s_2, …, s_m) is a diagonal matrix whose elements s_1 ≥ s_2 ≥ … ≥ s_m ≥ 0 are called the singular values of X; L is a matrix of size ⟨n×m⟩ whose columns L_1, …, L_m are orthogonal vectors of unit length, i.e., L^T L = E, where E is the identity matrix (these columns are the left singular vectors of X); and R is a matrix of size ⟨m×m⟩ whose columns R_1, …, R_m are also orthogonal vectors of unit length, R^T R = R R^T = E, called the right singular vectors of X. These vectors are orthogonal in the Euclidean sense; from a probabilistic point of view, they may be correlated.

If the rank of the observation matrix is rank(X) = p < m, then only p of the singular values are nonzero. In this case, decomposition (2) can be rewritten as a sum of elementary matrices of unit rank:

(3)  X = Σ_{i=1}^{p} s_i L_i R_i^T = s_1 L_1 R_1^T + ⋯ + s_p L_p R_p^T.

According to the Eckart–Young theorem [2,5], the solution of optimization problem (1) is the sum of the first k terms in (3), i.e., X ≅ Y = Σ_{i=1}^{k} s_i L_i R_i^T = s_1 L_1 R_1^T + ⋯ + s_k L_k R_k^T. With k = 1 (the one-dimensional case), the best approximation is given by the first (maximum) singular value and the corresponding singular vectors, X ≅ s_1 L_1 R_1^T.
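The truncated decomposition and the Eckart–Young property are easy to verify numerically. A minimal NumPy sketch (random data, purely illustrative; the variable names are ours):

```python
import numpy as np

# Minimal numerical check of decomposition (2)-(3) and the Eckart-Young
# theorem; the data here are random and purely illustrative.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))                   # observation matrix, n > m

L, s, Rt = np.linalg.svd(X, full_matrices=False)   # X = L @ diag(s) @ Rt

k = 3
# Sum of the first k unit-rank terms s_i * L_i * R_i^T, as in Eq. (3).
Y = sum(s[i] * np.outer(L[:, i], Rt[i]) for i in range(k))

# Eckart-Young: Y is the best rank-k approximation; its squared Frobenius
# error equals the sum of the squared discarded singular values.
err2 = np.linalg.norm(X - Y, "fro") ** 2
assert np.isclose(err2, np.sum(s[k:] ** 2))
assert np.linalg.matrix_rank(Y) == k
```

Note that `np.linalg.svd` already returns the singular values in descending order, matching the convention s_1 ≥ s_2 ≥ … above.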
The matrix of observations X in this case turns into a sum of a small number of matrix terms of the same dimension but of very simple structure: each of them is a matrix of unit rank.

An important feature of the SVD is its stability to small perturbations of the observation matrix; in other words, this representation is a well-conditioned procedure. Such properties are not characteristic of the traditional spectral decomposition used in multidimensional statistical analysis. As already noted, this is important for processing multidimensional chaotic processes with strongly pronounced dependence between the individual parameters of the observation vector.

Several main developments have arisen from this approach. In problems of recognition, classification and clustering, vectors are projected onto the space generated by several singular components (3), which generates a specific pseudometric [12,24]. The tasks of situational analysis are solved in a similar fashion: the observed situation x_0 is associated with the regular situation among x_1, …, x_k that is closest in the pseudometric [24]. In random field interpolation, f(x_0) is estimated via linear interpolation on the k nearest points x_1, …, x_k: f = c_1 f(x_1) + … + c_k f(x_k), where c_j = (1 + d_j Σ_{i≠j} d_i^{−1})^{−1}. This approach stands out by employing the metric of the projection space as the measure of proximity d_j from x_0 to x_j. If a segment of an m-dimensional series is represented as a matrix ⟨n×m⟩, it can be approximated by a sum of elementary matrices of unit rank, making it possible to analyze separately series terms of simplified structure. This significantly reduces the dimensionality of the original problem.

Based on these approaches, as well as the Caterpillar method, new algorithms for identifying the local structure of multidimensional chaotic time series can be built.

3.2 Singular analysis algorithms for individual selected components

Let the selected component be a one-dimensional time series y = (y_1, …, y_N). We match it with a Hankel matrix of size K×L:

    Y = [ y_1   y_2     …  y_L
          y_2   y_3     …  y_{L+1}
          ⋮     ⋮       ⋱  ⋮
          y_K   y_{K+1} …  y_{K+L−1} ],    K + L − 1 = N,

where L is the width of the sliding window. Let us construct decomposition (3) for it. Then to each real harmonic of the series y_1, …, y_N correspond m*, m* < p, distinct singular values, and the harmonic is determined by the sum of the m* terms in Eq. (3) that correspond to these values. To fully restore such a component, it is necessary to average over the anti-diagonals of the resulting matrix.

At this stage, the choice of the width L of the sliding window is the most problematic issue. In the process of optimizing the algorithm, one can also vary the window shift parameter d.

One of the problems is that it is impossible to build a hierarchy of the components based on the values of the singular numbers s_i, i = 1, …, p: the periodic component that is important for analysis is not necessarily associated with one of the largest singular values. This problem is especially acute for the rapidly changing chaotic processes characteristic of the dynamics of quotations of financial instruments.
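The embedding into a Hankel trajectory matrix and the anti-diagonal averaging that restores a series can be sketched as follows (a minimal NumPy illustration; the function names are ours):

```python
import numpy as np

def hankel_embed(y, L):
    """Trajectory (Hankel) matrix of size K x L with K = N - L + 1."""
    K = len(y) - L + 1
    return np.array([y[i:i + L] for i in range(K)])

def diag_average(Y):
    """Restore a series of length K + L - 1 by averaging anti-diagonals."""
    K, L = Y.shape
    out = np.zeros(K + L - 1)
    cnt = np.zeros(K + L - 1)
    for i in range(K):
        out[i:i + L] += Y[i]
        cnt[i:i + L] += 1
    return out / cnt

y = np.sin(0.3 * np.arange(100))
Y = hankel_embed(y, L=20)                # 81 x 20, so K + L - 1 = 100 = N
assert np.allclose(diag_average(Y), y)   # embedding then averaging is lossless
```

Averaging over anti-diagonals is the step that turns a rank-truncated trajectory matrix, which is generally no longer Hankel, back into a one-dimensional series.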
In the process of tracking local fluctuations of the components, one should consider the sequential differences of the singular values determined on each window. The appearance of a zero in such a sequence indicates a quasi-periodic component, which can be visualized either by extracting the corresponding singular components or by considering the change of a pair of components with equal (or sufficiently close) singular values on their phase plane [2,6].

3.3 Analyzing correlation properties of financial series

As an example of initial data, consider a segment of observations of centered quotation values of five currency instruments with the highest degree of correlation over an observation interval of 500 days (Figure 1).

Figure 1: An example of the dynamics of quotations of five currency instruments with the highest degree of correlation in the observation interval T = 500 days.

The instruments (EURUSD, NZDUSD, AUDUSD, AUDJPY and NZDJPY) were selected by degree of correlation from the 16 most common currency instruments overall. The interdependence of currency pairs is largely related to the nature of international trade and global financial flows. The currencies of countries with large trade deficits tend to correlate negatively with those of countries that run surpluses. Similarly, the currencies of wealthy commodity exporters often correlate negatively with those of countries that rely heavily on imports. The color representation of the correlation matrix of all 16 instruments is shown in Figure 2, and that of the five selected instruments in Figure 3.
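The selection of the most correlated instruments can be illustrated on synthetic data; the exact selection rule is not specified in the text, so a simple mean-correlation score is assumed here:

```python
import numpy as np

# Synthetic stand-in for 16 instruments: a common factor with varying loadings.
rng = np.random.default_rng(1)
common = rng.standard_normal(500)
loads = np.linspace(0.2, 3.0, 16)
data = np.column_stack([w * common + rng.standard_normal(500) for w in loads])

C = np.abs(np.corrcoef(data, rowvar=False))   # 16 x 16 correlation magnitudes
np.fill_diagonal(C, 0.0)
score = C.mean(axis=1)                        # mean absolute correlation per instrument
top5 = np.argsort(score)[-5:]                 # the five most correlated instruments
assert score[top5].min() >= np.median(score)
```

Instruments loading more heavily on the common factor end up with higher mean correlation, which is the pattern the five selected currency pairs exhibit.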
The correlation estimates obtained over the entire observation interval oscillate within 0.91–0.98, which indicates poor conditioning of the observation matrix.

Figure 2: Correlation matrix estimates for 16 common currency instruments.

Figure 3: Correlation matrix estimates for the five currency instruments with the highest correlation.

This conclusion indicates that in problems of integral representation of the dynamics of market segments, it is practical to move from the traditional approach based on component data analysis to IC-based (i.e., SSA-algorithm) representations.

3.4 Calculating singular components for multidimensional series of the currency exchange market state

Let a matrix of observations X ⟨n×m⟩, n > m, correspond to the example with five correlated financial instruments. We construct the SVD (2) for this matrix. Limiting ourselves to k = 3 projections, we present the corresponding terms as X(i) = L_i s_i R_i^T, i = 1, …, k. Each projection is a matrix ⟨n×m⟩ of unit rank, so only the first column x_1(i), i = 1, …, k, and the following set of coefficients need to be extracted from each projection X(i):

(4)  C_j(i) = (Σ_{j=1}^{m} x_j(i)) / (Σ_{j=1}^{m} x_j(1)),    j = 1, …, m,

which will be needed to return to the original variables.

Figure 4 plots changes in the values of the first three singular components over an observation interval of T = 200 minute counts.
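The link between pairwise correlations in the 0.91–0.98 range and poor conditioning of the observation matrix is easy to reproduce on synthetic data (our own illustration, not the authors' dataset):

```python
import numpy as np

rng = np.random.default_rng(2)
base = rng.standard_normal(500)
# Five series that are noisy copies of one another, as highly correlated
# currency quotes would be.
X = np.column_stack([base + 0.15 * rng.standard_normal(500) for _ in range(5)])

C = np.corrcoef(X, rowvar=False)                 # 5 x 5 correlation matrix
assert C[np.triu_indices(5, 1)].min() > 0.9      # all pairwise correlations > 0.9
assert np.linalg.cond(C) > 50                    # large condition number
```

With off-diagonal correlations near 0.98, the smallest eigenvalue of the correlation matrix is close to zero, so procedures that invert this matrix become numerically unreliable; the SVD-based route avoids that inversion.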
This observation interval is extracted as an example from the general dataset shown in Figure 1.

Figure 4: Plots of changes in the values of the first three singular components on an observation interval of T = 200 minute counts.

It is important to point out that for this example, at k = 3, the variance criterion D(k) = (Σ_{j=1}^{k} s_j) / (Σ_{j=1}^{m} s_j) equals 0.98, so that the transition to decomposition (3) occurs almost without loss of information.

Figure 5 shows a plot of the logarithms of the individual singular values. We can infer that the singular components are of unequal significance and that the analysis can be limited to 2–3 components, which solves the problem of visualizing multidimensional data.

Figure 5: Logarithms of individual singular values in descending order.

3.5 Forecasting using SSA

1. Fix a sliding window X, L counts wide.
2. At each step, construct an SVD of the matrix X. The ratio of the sum of the first several singular values to the sum of all singular values is interpreted as the fraction of information explained by these first singular components.
3. Each selected component X_i is a matrix of unit rank and the same dimensions as X; therefore, only their first columns (or rows) and the coefficients (4) for recovering their estimates are needed for further analysis.
4. Apply one of the SSA procedures [2,6] to the selected one-dimensional series to provide filtering, interpolation and forecasting.
5. Using these estimates, recover the matrices X_i and calculate their weighted sum, which is the estimate of the initial matrix X.
6. Correct the estimate of X with respect to the bias and scale parameters.

4 Computational experiments

As an example of using the fusion of IC and SSA, we analyze the above data on five highly correlated currency instruments. Using the Caterpillar method, we select a segment of the series L minute counts long and construct a forecast for each component using SSA.
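The windowed embed-decompose-truncate-restore cycle described above can be sketched for a single one-dimensional component as follows (the window width L, the number of retained components k, and the test signal are assumptions of the example):

```python
import numpy as np

def ssa_reconstruct(y, L, k):
    """Embed a series, truncate its SVD to k components, diagonal-average back."""
    K = len(y) - L + 1
    Y = np.array([y[i:i + L] for i in range(K)])    # Hankel trajectory matrix
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    Yk = (U[:, :k] * s[:k]) @ Vt[:k, :]             # truncated decomposition (3)
    out = np.zeros(len(y))
    cnt = np.zeros(len(y))
    for i in range(K):                              # anti-diagonal averaging
        out[i:i + L] += Yk[i]
        cnt[i:i + L] += 1
    return out / cnt

t = np.arange(400)
signal = np.sin(2 * np.pi * t / 25)                 # one harmonic ~ 2 components
noisy = signal + 0.3 * np.random.default_rng(3).standard_normal(400)
smooth = ssa_reconstruct(noisy, L=50, k=2)
# Keeping the two leading singular components filters most of the noise out.
assert np.abs(smooth - signal).mean() < np.abs(noisy - signal).mean()
```

A single sinusoidal harmonic occupies a pair of singular components, which is why k = 2 suffices here; forecasting then extrapolates the smoothed components rather than the raw series.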
Figure 6 shows plots of the resulting forecast obtained by extrapolating the singular components and then recalculating into the initial dimension of the observed chaotic process.

Figure 6: Predicting a chaotic process using the fusion of IC and SSA (sliding window width L = 2 min, prediction horizon 30 s, step 4 s).

The figure demonstrates the fundamental feasibility of restoring a correct approximation of a chaotic process and the possibility of forecasting it using pre-aggregated data based on SSA. This approach is promising: the aggregated form contains information about the mood of the given market segment, and the forecast takes its general trends into account. Another expected advantage of the technique is its increased resistance to strong correlation between individual financial instruments of the selected segment.

Scanning the entire series r counts ahead (count step 4 s, sliding window width L = 2 min, window shift d = 1 min; Figure 7), we find that the standard deviation of the forecast increases monotonically with r, while the bias decreases at first but begins to increase from r = 8 onward. We conclude that this approach limits the forecast horizon r to around 7 steps, or 30 s. By comparison, standard procedures of local polynomial forecasting [16,18] possess the same qualities only when forecasting 1 or 2 steps (not more than 10 s) ahead. However, final conclusions about the feasibility of employing SSA in proactive management tasks require additional research.
In particular, it would be interesting to compare the proposed technique with component data analysis, at least in conditions of low dimension that would allow such a comparison.

Figure 7: Standard deviation and bias of predictions depending on the prediction horizon r (averaged over the series in Figure 1 and over 5 components; L = 2 min, count step 4 s).

5 Conclusion

Representing multivariate data matrices in the first singular basis together with dimension reduction methods makes it possible to pass from a multidimensional time series to an integral curve in a low-dimensional space. This curve can be interpreted as a phase trajectory in a generalized state space. The proposed transformation conforms to the set of constraints characteristic of most other similar approaches [19,25]. In particular, the singular representation can be applied to highly correlated and highly variable series, for which the observation matrix may be poorly conditioned, with all the ensuing consequences of incorrectly formulated identification and forecasting problems.

The feasibility of the proposed technique depends strongly on the initial properties of the data. The best predictions can be expected when the dimension of the dataset is comparable to the length of an adequate sliding window and when the data contain some latent periodicities. At the same time, it should be taken into account that data integration can create uncertainty in the reverse transformation, which can lead to ambiguous and paradoxical results when restoring one-dimensional components, especially in forecasting problems.

The proposed technique plays a significant role in the tasks of proactive management of financial instruments in capital markets. The data model closest to reality is based on the concept of stochastic chaos, which means that the series are fundamentally non-stationary and non-ergodic. This forces us to operate with these data inside a limited sliding window.
Under these conditions, the increased stability of singular components to variations in the statistical structure of the source data may increase the effectiveness of the entire asset management strategy. In particular, SSA can be useful for rapid detection of significant discrepancies in quotations. Furthermore, it appears suitable for precedent analysis, which deals with similarity metrics for multidimensional observation segments. These issues, along with selecting a data compression technique for observation segments with different dynamic characteristics, are the subject of our further research.

Dependence Modeling, de Gruyter


Publisher
de Gruyter
Copyright
© 2022 Andrey Makshanov et al., published by De Gruyter
ISSN
2300-2298
eISSN
2300-2298
DOI
10.1515/demo-2022-0112

Abstract

1IntroductionOne of the important problems of making management decisions based on monitoring is the low predictability of the resulting series. The reason for this is the high instability of some classes of open nonlinear systems categorized as dynamic chaos [8,10,11,21]. Examples of such unstable systems are gas-dynamic and hydrodynamic turbulent flows, high-temperature plasma, etc. [9,13,22]. In finance, chaotic behavior is particularly prominent in inertia-less environments such as electronic capital markets [1,15,17,26].Investigating a multidimensional process is meaningful only if the parameters of the observation vector are correlated. Otherwise, the solution is limited to sequentially examining one-dimensional processes. Inter-dependencies make a regularization of chaotic observations possible (in other words, forecasting in general). However, the conventional assessment of inter-dependencies encounters serious problems due to the chaotic nature of the series [14,15]. Making correlation estimates over a large time interval is not feasible, and immediate estimates on a limited observation window are not stable. In addition, correlations larger than 0.9 can lead to degeneracy of the observation matrix. Thus, an alternative approach based on either of two branches of singular value decomposition (SVD) appears promising immunocomputing (IC) [4,23,24] and singular spectrum analysis (SSA) [2,6,7]. IC has found its use in the tasks of information processing for molecular protein complexes of the immune system. First of all, it concerns the issues of recognition and classification of foreign cells intended for stimulating the body’s defense mechanisms [4,24]. As a result, the applied aspects of SVD analysis obtained the name of IC.This article investigates the possibility of applying IC and SSA to forecasting multidimensional chaotic environments. We consider highly dynamic chaotic processes which are not suitable for multivariate statistical analysis [2,19]. 
We use dimension reduction techniques based on the representation of data matrices in first singular basis in space.2Related workThe main directions of IC development are related to its practical applications, in particular, in classification and clustering problems. For example, recognition of multidimensional images uses vector projections on space generated by several singular components. This approach gives rise to a specific pseudometric (proximity measure) called the Lr distance [2,12].The problems of situational analysis are solved similarly: by mapping the observed situation, determined by state vector X0{X}_{0}, to a regular situation by the closest value of the pseudometric in Lr X1,…,Xk{X}_{1},\ldots ,{X}_{k}[3,20].In problems of approximation of random fields, the value of the sought hypersurface f(x0)f\left({x}_{0})is estimated with linear interpolation over kknearest points x1,…,xk{x}_{1},\ldots ,{x}_{k}: f=c1f(x1)+…+ckf(xk)f={c}_{1}f\left({x}_{1})+\ldots +{c}_{k}f\left({x}_{k}), where cj=1+dj∑i≠jkdi−1−1{c}_{j}={\left(1+{d}_{j}{\sum }_{i\ne j}^{k}{d}_{i}^{-1}\right)}^{-1}. A variant of the Lr pseudometric is used as dj,j=1,…,k{d}_{j},\hspace{0.33em}j=1,\ldots ,kfrom x0{x}_{0}to xj,j=1,…,k{x}_{j},\hspace{0.33em}j=1,\ldots ,k.If a segment of an mm-dimensional series is represented as a transport matrix of size <n×m>\lt n\times m\gt , then it can be approximated by a sum of elementary matrices of unit rank, thus making it possible to analyze separately series terms of the simplified structure. This approach significantly reduces the dimension of the original problem.An interesting direction in analyzing one-dimensional random series is a group of methods based on embedding a time series in a multidimensional space followed by a singular decomposition of the resulting Hankel matrix. This approach is also based on SVD and is known in the literature as “Caterpillar” [16,18]. 
The Caterpillar method identifies time series components and solves the problems of forecasting, parameter estimation and detecting various types of decomposition. The applications of this approach that use the projection into the space of principal singular components are based on the application of the Euclidean metric in the space of projections, which makes it much easier to analyze and interpret the results.3Methods3.1Basics of IC and SSA approachesWe have a traditional problem for data analysis of representing the structure of mm-dimensional data with kkgeneralized features, where k<p<mk\lt p\lt m. It is usually solved using the principal component method. However, its effectiveness significantly depends on the correlation properties of the observation matrix. The correlations between variables in a multidimensional chaotic process change rapidly and within very wide limits. Very strong correlation can lead to degeneracy or poor conditionality of the evaluation task. Therefore, we consider the singular decomposition technique (or IC) as an alternative approach to the problem of data dimension compression.The problem consists in approximating a matrix of multidimensional observations XXof size <n×m>\lt n\times m\gt , n>mn\gt mand rank p≤mp\le mby another matrix Y of a smaller rank k<pk\lt p. The corresponding approximation is carried out by minimizing the quadratic distance between the matrices (1)(X−Y)T(X−Y)=min,{\left(X-Y)}^{T}\left(X-Y)=\min ,restricting rank(Y)=k<min(n,p){\rm{rank}}\left(Y)=k\lt \min \left(n,\hspace{0.33em}p).The solution of this problem was found in [5]. 
A real observation matrix XXof dimension ⟨n×m⟩\langle n\times m\rangle can be represented with an SVD (LR-decomposition): (2)X=L∗S∗RT,X=L\ast S\ast {R}^{T},where S=diag(s1,s2,…,sn)S={\rm{diag}}\left({s}_{1},{s}_{2},\ldots ,{s}_{n})is a diagonal matrix whose elements s1≥s2≥…≥sn≥0{s}_{1}\ge {s}_{2}\hspace{0.33em}\ge \ldots \ge {s}_{n}\ge 0are called singular values of the matrix XX.LLis a matrix of size ⟨n×m⟩\langle n\times m\rangle , the columns L1,…,Ln{L}_{1},\ldots ,{L}_{n}of which are orthogonal vectors of unit length, i.e., LTL=LLT=E{L}^{T}L=L{L}^{T}=E, where EEis the unit matrix. These columns are the left singular vectors of XX.RRis a matrix of size ⟨m×m⟩\langle m\times m\rangle , the columns R1,…,Rn{R}_{1},\ldots ,{R}_{n}of which are also orthogonal vectors of unit length RTR=RRT=E{R}^{T}R=R{R}^{T}=Ewhich are called the right singular vectors of XX. These vectors are orthogonal in the Euclidean sense; from a probabilistic point of view, they are correlated.If the rank of the observation matrix is rank(X)=p<m{\rm{rank}}\left(X)=p\lt m, then among the singular numbers only ppwill be nonzero. In this case, the decomposition (2) can be rewritten as a sum of elementary matrices of unit rank: (3)X=∑i=1psiLiRiT=s1L1R1T+⋯+spLpRpT.X=\mathop{\sum }\limits_{i=1}^{p}{s}_{i}{L}_{i}{R}_{i}^{T}={s}_{1}{L}_{1}{R}_{1}^{T}+\cdots +{s}_{p}{L}_{p}{R}_{p}^{T}.According to the Eckart-Young theorem [2,5], the solution of optimization problem (1) is the sum of the first kkterms in (3), i.e., X≅Y=∑i=1psiLiRiT=s1L1R1T+⋯+skLkRkT.X\cong Y={\sum }_{i=1}^{p}{s}_{i}{L}_{i}{R}_{i}^{T}={s}_{1}{L}_{1}{R}_{1}^{T}+\cdots +{s}_{k}{L}_{k}{R}_{k}^{T}.With k=1k=1(one-dimensional case), the best approximation is given by the first (maximum) singular value and the corresponding singular vectors A≅s1L1R1′A\cong {s}_{1}{L}_{1}{R}_{1}^{^{\prime} }. 
The matrix of observations XXin this case turns into the sum of a small number of matrix segments of the same dimension, but of a very simple structure: each of them is a matrix of unit rank.An important feature of SVD is its stability to small perturbations of the observation matrix. In other words, this representation of each matrix is a well-conditioned procedure. Such properties are not characteristic of the traditional spectral decomposition used in problems of multidimensional statistical analysis. As it has been already noted, this is important for processing multidimensional chaotic processes with a strongly pronounced dependence between the individual parameters of the observation vector.Singular matrix decomposition is stable to small matrix perturbations, i.e., it is a well-conditioned procedure. Such properties are not characteristic of spectral decomposition used in problems of multidimensional statistical analysis. Several main developments have arisen from this approach. In problems of recognition, classification and clustering, vectors are projected on the space generated by several singular components (3), which generates a specific pseudometric [12, 24].The tasks of situational analysis are solved in a similar fashion: the observed situation x0{x}_{0}is associated with the closest by the pseudometric of the regular situations x1,…,xk{x}_{1},\ldots ,{x}_{k}[24].In random field interpolation, f(x0)f\left({x}_{0})is estimated via linear interpolation on k nearest points x1,…,xk{x}_{1},\ldots ,{x}_{k}: f=c1f(x1)+…+ckf(xk),f={c}_{1}f\left({x}_{1})+\ldots +{c}_{k}f\left({x}_{k}),where cj=11+dj∑i≠jk1di.{c}_{j}=\frac{1}{1+{d}_{j}{\sum }_{i\ne j}^{k}\frac{1}{{d}_{i}}}.This approach stands out by employing the metric used in the projection space as a measure of proximity dj{d}_{j}from x0{x}_{0}to xj{x}_{j}.If a segment of an mm-dimensional series is represented as a matrix ⟨n×m⟩\langle n\times m\rangle , then it can be approximated by the sum of elementary 
matrices of unit rank, thus making it possible to analyze separately series-terms of the simplified structure. This significantly reduces the dimensionality of the original problem.Based on these approaches as well as Caterpillar method, new algorithms for identifying local structure of multidimensional chaotic time series can be built.3.2Singular analysis algorithms for individual selected componentsLet the selected component be a one-dimensional time series y=(y1,…,yN)y=({y}_{1},\ldots ,{y}_{N}). We will match it with a Hankel matrix of size K×LK\times L: Y=y1y2…yLy2y3…yL+1⋮⋮⋱⋮yKyK+1…yL+K+1,L+K+1=N,Y=\left[\begin{array}{cccc}{y}_{1}& {y}_{2}& \ldots & {y}_{L}\\ {y}_{2}& {y}_{3}& \ldots & {y}_{L+1}\\ \vdots & \vdots & \ddots & \vdots \\ {y}_{K}& {y}_{K+1}& \ldots & {y}_{L+K+1}\end{array}\right],\hspace{1.0em}L+K+1=N,where LLis the width of the sliding window. Let us construct a decomposition (3) for it. Then, to each real harmonic of the series y1,…,yN{y}_{1},\ldots ,{y}_{N}corresponds m∗,m∗<p{m}^{\ast },\hspace{0.0em}{m}^{\ast }\lt pdifferent singular numbers, and it is determined by the sum of m∗{m}^{\ast }terms in Eq. (3) that correspond to these numbers. To fully restore such a component, it is necessary to average over the diagonals of the matrix XXof the same name.At this stage, the choice of the width of the sliding window LLis the most problematic issue. In the process of optimizing the algorithm, one can also vary the window shift parameter dd.One of the problems is that it is impossible to build a hierarchy of the corresponding components based on the values of singular numbers si{s}_{i}, i=1,…,pi=1,\ldots ,p: the periodic component that is important for analysis is not necessarily associated with one of the largest singular values. This problem is especially acute for the rapidly changing chaotic processes characteristic of the dynamics of quotations of financial instruments. 
In the process of tracking the local fluctuations of the components, it is necessary to consider the sequential differences of the singular values determined on each window. The appearance of a zero in such a sequence indicates the presence of a quasi-periodic component, which can be visualized either by extracting the corresponding singular components or by plotting the pair of components with equal (or sufficiently close) singular numbers on their phase plane [2,6].

3.3 Analyzing correlation properties of financial series

As an example of initial data, consider a segment of observations of centered quotation values of five currency instruments with the highest degree of correlation over an observation interval of 500 days (Figure 1).

Figure 1: An example of the dynamics of quotations of five currency instruments with the highest degree of correlation over the observation interval T = 500 days.

The instruments (EURUSD, NZDUSD, AUDUSD, AUDJPY and NZDJPY) were selected by degree of correlation from the 16 most common currency instruments. The interdependence of currency pairs is largely related to the nature of international trade and global financial flows. The currencies of countries with large trade deficits tend to correlate negatively with those of countries that run surpluses. Similarly, the currencies of wealthy commodity exporters often correlate negatively with those of countries that rely heavily on imports. The color-coded correlation matrix of all 16 instruments is shown in Figure 2, and that of the five selected instruments in Figure 3.
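The equal-singular-value test for a quasi-periodic pair can be sketched in a few lines of numpy (the thresholds and function name are our choices; a pure harmonic is used as an illustration, since it embeds as a rank-2 Hankel matrix with two equal singular values when both window dimensions are multiples of the period):

```python
import numpy as np

def equal_pair_indices(s, tol=0.05, floor=0.1):
    """Indices i where consecutive singular values s[i], s[i+1] are
    nearly equal (normalized difference below tol) among the significant
    ones (above floor * s[0]) -- the signature of a quasi-periodic pair."""
    s = np.asarray(s)
    diffs = np.abs(np.diff(s)) / s[0]
    significant = s[:-1] > floor * s[0]
    return np.where((diffs < tol) & significant)[0]

# Pure harmonic of period 10; L = 40 and K = 170 are multiples of 10,
# so the two leading singular values coincide exactly.
t = np.arange(209)
y = np.sin(2 * np.pi * t / 10)
L = 40
Y = np.array([y[i:i + L] for i in range(len(y) - L + 1)])
s = np.linalg.svd(Y, compute_uv=False)
assert list(equal_pair_indices(s)) == [0]
```

The `floor` guard excludes the near-zero tail of the spectrum, whose sequential differences are trivially small.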
The correlation estimates obtained over the entire observation interval lie within 0.91–0.98, which indicates poor conditioning of the observation matrix.

Figure 2: Correlation matrix estimates for 16 common currency instruments.

Figure 3: Correlation matrix estimates for the five currency instruments with the highest correlation.

This conclusion indicates that, in problems of integral representation of the dynamics of market segments, a transition from the traditional approach based on component data analysis to IC-based (i.e., SSA-algorithm) representations would be practical.

3.4 Calculating singular components for multidimensional series of the currency exchange market state

Let a matrix of observations X ⟨n×m⟩, n > m, correspond to the example with five correlated financial instruments. We construct the SVD (2) of this matrix. Limiting ourselves to k = 3 projections, we present the corresponding terms as

X(i) = L_i s_i R_i^T,  i = 1, …, k.

Each projection is an ⟨n×m⟩ matrix of unit rank, so each of its columns is proportional to the first one: x_j(i) = C_j(i) x_1(i). Hence only the first column x_1(i), i = 1, …, k, and the set of coefficients

(4)  C_j(i) = r_{ij} / r_{i1},  j = 1, …, m,

where r_{ij} is the j-th component of the right singular vector R_i, need to be extracted from each projection X(i); these will be needed to return to the original variables. Figure 4 plots the changes in the values of the first three singular components over an observation interval of T = 200 minute counts.
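The projections X(i) and the recovery coefficients (4) are straightforward to compute; the sketch below uses synthetic stand-in data (the quotation series themselves are not reproduced here), and exploits the fact that a unit-rank matrix is fully restored from its first column plus m coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 5
# Synthetic stand-in for n x m centered quotes of five correlated instruments.
base = np.cumsum(rng.standard_normal(n))
X = np.column_stack([base + 0.1 * np.cumsum(rng.standard_normal(n))
                     for _ in range(m)])
X -= X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
# Rank-one projections X(i) = L_i s_i R_i^T.
projections = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(k)]

Xi = projections[0]
C = Vt[0] / Vt[0, 0]   # coefficients of Eq. (4) for i = 1
# First column plus coefficients restore the whole projection.
assert np.allclose(Xi, np.outer(Xi[:, 0], C))
```

The same first-column-plus-coefficients compression applies to each of the k projections.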
This observation interval is extracted, as an example, from the general dataset shown in Figure 1.

Figure 4: Plots of changes in the values of the first three singular components over the observation interval T = 200 minute counts.

It is important to point out that, for this example at k = 3, the variance criterion

D(k) = Σ_{j=1}^{k} s_j / Σ_{j=1}^{m} s_j

is equal to 0.98, so the transition to decomposition (3) occurs almost without loss of information. Figure 5 shows a plot of the logarithms of the individual singular values. We can infer that the singular components are unequally significant and that the analysis can be limited to 2–3 components, which solves the problem of visualizing multidimensional data.

Figure 5: Logarithms of the individual singular values in descending order.

3.5 Forecasting using SSA

1. Fix a sliding window X of width L counts.
2. At each step, construct an SVD of the matrix X. The ratio of the sum of the several first singular values to the sum of all singular values is interpreted as the fraction of information explained by these first singular components.
3. Each selected component X_i is a matrix of unit rank and the same dimensions as X; therefore, only their first columns (or rows) and the coefficients (4) for recovering their estimates are needed for further analysis.
4. Apply one of the SSA procedures [2,6] to the selected one-dimensional series to provide filtering, interpolation and forecasting.
5. Using these estimates, recover the matrices X_i and calculate their weighted sum, which is the estimate of the initial matrix X.
6. Correct the estimate of X with respect to the bias and scale parameters.

4 Computational experiments

As an example of the fusion of IC and SSA, we analyze the above-presented data on five highly correlated currency instruments. Using the Caterpillar method, we select a segment of the series L minute counts long and construct a forecast for each component using SSA.
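The per-component SSA forecasting mentioned above can be sketched with the classical recurrent forecast (a textbook variant, cf. [2,6]; the authors' exact fused IC+SSA pipeline may differ in details such as the bias and scale correction):

```python
import numpy as np

def ssa_forecast(y, L, r, steps):
    """Classical SSA recurrent forecasting: rank-r reconstruction of the
    trajectory matrix, then extension by the induced linear recurrence."""
    y = np.asarray(y, float)
    N = len(y)
    K = N - L + 1
    X = np.column_stack([y[i:i + L] for i in range(K)])  # L x K
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Ur = U[:, :r]
    Xr = (Ur * s[:r]) @ Vt[:r]          # rank-r approximation
    # Hankelization: average antidiagonals to reconstruct the series.
    rec = np.zeros(N)
    cnt = np.zeros(N)
    for j in range(K):
        rec[j:j + L] += Xr[:, j]
        cnt[j:j + L] += 1
    rec /= cnt
    # Min-norm linear recurrence from the last coordinates of the
    # leading left singular vectors.
    pi = Ur[-1]
    R = (Ur[:-1] @ pi) / (1.0 - pi @ pi)   # length L - 1
    series = list(rec)
    for _ in range(steps):
        series.append(R @ series[-(L - 1):])
    return np.array(series[N:])

t = np.arange(200)
y = np.sin(2 * np.pi * t / 20)
fc = ssa_forecast(y, L=40, r=2, steps=10)
truth = np.sin(2 * np.pi * (200 + np.arange(10)) / 20)
assert np.allclose(fc, truth, atol=1e-4)  # a pure harmonic is rank 2
```

A pure harmonic is recovered essentially exactly because it satisfies the length-(L−1) recurrence; for chaotic quotations the forecast quality degrades with the horizon, as discussed below.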
Figure 6 shows plots of the resulting forecast obtained by extrapolating the singular components and then recalculating into the initial dimension of the observed chaotic process.

Figure 6: Predicting a chaotic process using the fusion of IC and SSA (sliding window width L = 2 min, prediction horizon 30 s, step 4 s).

The figure demonstrates the fundamental feasibility of restoring a correct approximation of a chaotic process and the possibility of forecasting it using pre-aggregated data based on SSA. This approach is promising: the aggregated form contains information about the mood of a given market segment, and the forecast takes its general trends into account. Another expected advantage of this technique is its increased resistance to strong correlation between individual financial instruments of the selected segment. Scanning the entire series r counts ahead (count step 4 s, sliding window width L = 2 min, window shift d = 1 min; Figure 7), we find that the standard deviation (SD) of the forecast increases monotonically with r, while the bias decreases at first but begins to increase from r = 8 onward. We conclude that this approach limits the forecast horizon r to about 7 steps, or 30 s. At the same time, standard procedures of local polynomial forecasting [16,18] possess comparable quality only when forecasting 1 or 2 steps (not more than 10 s) ahead. However, final conclusions about the feasibility of employing SSA in the tasks of proactive management require additional research.
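The SD-and-bias-versus-horizon scan can be reproduced for any forecaster in a few lines; the sketch below uses a naive persistence forecaster on synthetic data as a hypothetical stand-in (neither the data nor the forecaster is the paper's):

```python
import numpy as np

def horizon_errors(y, forecast, r_max, L):
    """Standard deviation and bias of r-step-ahead forecast errors over a
    scanning window of width L, for horizons r = 1..r_max."""
    sd, bias = [], []
    for r in range(1, r_max + 1):
        errs = np.array([forecast(y[t - L:t], r) - y[t + r - 1]
                         for t in range(L, len(y) - r + 1)])
        sd.append(errs.std())
        bias.append(errs.mean())
    return np.array(sd), np.array(bias)

# Hypothetical stand-in forecaster: persistence (repeat the last value).
persist = lambda window, r: window[-1]

t = np.arange(400)
y = np.sin(2 * np.pi * t / 50) + 0.1 * np.sin(2 * np.pi * t / 7)
sd, bias = horizon_errors(y, persist, r_max=8, L=30)
# For this smooth series the error SD grows with the horizon.
assert sd[0] < sd[-1]
```

Plugging an SSA-based forecaster into `forecast` yields curves of the kind shown in Figure 7.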
In particular, it would be interesting to compare the proposed technique with component data analysis, at least for the low dimensions in which such a comparison is possible.

Figure 7: Standard deviation and bias of predictions depending on the prediction horizon r (averaged over the series in Figure 1 and over 5 components, L = 2 min, count step 4 s).

5 Conclusion

Representation of multivariate data matrices in the first singular basis, together with dimension-reduction methods, makes it possible to pass from a multidimensional time series to an integral curve in a low-dimensional space. This curve can be interpreted as a phase trajectory in a generalized state space. The proposed transformation conforms to the set of constraints characteristic of most other similar approaches [19,25]. In particular, the singular representation can be implemented for highly correlated and highly variable series, for which the observation matrix may be poorly conditioned, with all the ensuing consequences of incorrectly formulated identification and forecasting problems. We propose a technique whose feasibility depends strongly on the initial properties of the data. The best predictions can be expected when the dimension of the dataset is comparable to the length of an adequate sliding window and when the data contain some latent periodicity. At the same time, it should be taken into account that data integration can create uncertainty in the reverse transformation, which can lead to ambiguous and paradoxical results when restoring one-dimensional components, especially in forecasting problems. The proposed technique acquires a significant role in the tasks of proactive management of financial instruments in capital markets. The data model closest to reality is based on the concept of stochastic chaos, which means that the series are fundamentally non-stationary and non-ergodic. This forces us to operate with these data inside a limited sliding window.
Under these conditions, the increased stability of the singular components to variations in the statistical structure of the source data may increase the effectiveness of the entire asset-management strategy. In particular, SSA can be useful for the rapid detection of significant discrepancies in quotations. Furthermore, it appears suitable for precedent analysis, which deals with similarity metrics for multidimensional observation segments. These issues, along with selecting a data-compression technique for observation segments with different dynamic characteristics, are the subject of our further research.

Journal: Dependence Modeling (de Gruyter)

Published: Jan 1, 2022

Keywords: multidimensional chaotic processes; forecasting; singular spectrum analysis; immunocomputing; Forex; 37M20; 37M10; 90C90
