A Durbin–Levinson regularized estimator of high-dimensional autocovariance matrices

Alessandro Giovannelli & Tommaso Proietti

Biometrika (2018), 105(4), 783. doi: 10.1093/biomet/asy042

SUMMARY

The autocovariance matrix of a stationary random process plays a central role in prediction theory and time series analysis. When the dimension of the matrix is of the same order of magnitude as the number of observations, the sample autocovariance matrix is an inconsistent estimator. In the nonparametric framework, recent proposals have concentrated on banding and tapering the sample autocovariance matrix. We introduce an alternative approach via a modified Durbin–Levinson algorithm that receives as input the banded and tapered sample partial autocorrelations and returns a consistent and positive-definite estimator of the autocovariance matrix. We establish the convergence rate of our estimator and characterize the properties of the optimal linear predictor obtained from it. The computational complexity of the latter is of the order of the square of the banding parameter, which renders our method scalable for high-dimensional time series.

1. Introduction

Estimation of the autocovariance matrix of a random process from a sample realization plays a central role in optimal linear prediction and signal extraction, and has applications in statistical signal processing, communications and information theory, speech recognition, neuroscience and econometrics. Along with forecasting and power spectral estimation, this learning problem is also relevant to interpolation, noise separation and the detection of sinusoidal signals by covariance analysis (Quinn & Hannan, 2001; Li, 2013). When the process is stationary, the $n$-dimensional autocovariance matrix is a Toeplitz matrix that depends solely on the autocovariances from lag 0 to lag $n-1$.

This paper is concerned with the high-dimensional setting, which arises when the dimension $n$ is of the same order of magnitude as the sample size. In this situation the sample autocovariance matrix is inconsistent, as was shown by Wu & Pourahmadi (2009). Wu & Pourahmadi (2009) and Bickel & Gel (2011) proposed banding the sample autocovariance matrix: given a banding parameter $\ell$, banding leaves the first $\ell$ subdiagonals of the sample autocovariance matrix intact and sets the remaining ones to zero. McMurry & Politis (2010) proposed a banded and tapered estimator, which gradually shrinks the autocovariances to zero starting from lag $\ell$, and developed the theory of optimal linear prediction using the resulting autocovariance sequence. The asymptotic properties of the two estimators were further investigated by Cai et al. (2013). Unfortunately, banding and tapering do not yield positive-definite estimators.

We propose an autocovariance matrix estimator based on banding and tapering the sample partial autocorrelation sequence using a trapezoidal kernel with banding parameter $\ell$, and then deriving the implied autocovariance sequence by inverting the Durbin–Levinson algorithm (Levinson, 1946; Durbin, 1960). The mapping yields a regularized sample autocovariance matrix that is positive definite. Our estimator exploits the fact that the partial autocorrelations provide a unique and unrestricted parameterization of the positive-definite autocovariance function of a stationary stochastic process, as shown by Barndorff-Nielsen & Schou (1973); see also Monahan (1984) and Bingham (2012). Outside the time series literature, the advantages of modelling a covariance matrix via partial autocorrelations have been discussed by Daniels & Pourahmadi (2009).
Another advantage of our approach is that it produces a regularized estimator of the inverse autocovariance matrix with a band structure, enabling estimation of the oracle linear predictor in $O(\ell^2)$ operations. Hence, optimal linear prediction is feasible even for large $n$, with a complexity that increases only with the square of the banding parameter.

2. Methods

2.1. Banded and tapered estimators

Let $\{Y_t,\ t = 1, 2, \ldots\}$ be a real-valued discrete-time stationary random process with mean zero, autocovariance function $\gamma(j) = E(Y_t Y_{t-j})$ $(j = 0, \pm 1, \ldots)$ and spectral density $f(\omega) = (2\pi)^{-1}\sum_{j=-\infty}^{\infty}\gamma(j)\exp(-{\rm i}\omega j)$, $\omega \in [-\pi, \pi]$, where ${\rm i}^2 = -1$. The $n \times n$ Toeplitz autocovariance matrix
\begin{equation*}
\Gamma_n = \left[\begin{array}{cccc}
\gamma(0) & \gamma(1) & \cdots & \gamma(n-1)\\
\gamma(1) & \gamma(0) & \cdots & \gamma(n-2)\\
\vdots & \vdots & \ddots & \vdots\\
\gamma(n-1) & \gamma(n-2) & \cdots & \gamma(0)
\end{array}\right]
\end{equation*}
plays an important role in optimal linear prediction and interpolation from a finite sample. In particular, the minimum mean square error linear predictor of $Y_{n+h}$ at time $n$ based on $Y_1, \ldots, Y_n$, for $h \geq 1$, is $\hat{Y}_{n+h|n} = \sum_{j=1}^{n}\phi_{nj}^{(h)} Y_{n-j+1}$, where the coefficients $\phi_n^{(h)} = [\phi_{n1}^{(h)}, \ldots, \phi_{nn}^{(h)}]^{\rm T}$ are the solution to the Yule–Walker system $\phi_n^{(h)} = \Gamma_n^{-1}\gamma_n^{(h)}$, with $\gamma_n^{(h)} = [\gamma(h), \ldots, \gamma(h+n-1)]^{\rm T}$.

We consider the problem of estimating $\Gamma_n$ from a time series realization $\{y_t,\ t = 1, \ldots, n\}$. Denoting the sample autocovariance at lag $k$ by
\begin{equation}
\hat{\gamma}(k) = \frac{1}{n}\sum_{t=k+1}^{n} y_t y_{t-k} \quad (k = 0, 1, \ldots, n-1), \tag{1}
\end{equation}
the sample autocovariance matrix $\hat{\Gamma}_n = \{\hat{\gamma}(|i-j|),\ i, j = 1, \ldots, n\}$ is an inconsistent, positive-definite estimator. In particular, Wu & Pourahmadi (2009) showed that the operator norm of the estimation error matrix, $\rho(\hat{\Gamma}_n - \Gamma_n)$, i.e., its largest eigenvalue in absolute value, does not converge to zero as $n$ tends to infinity.

Wu & Pourahmadi (2009) and Bickel & Gel (2011) proposed the banded autocovariance matrix estimator
\begin{equation}
\hat{\Gamma}_{n,\ell} = \{\hat{\gamma}(|i-j|)\, I(|i-j| \leq \ell),\ i, j = 1, \ldots, n\}, \tag{2}
\end{equation}
where $\ell$ is the banding parameter and $I(\cdot)$ is the indicator function. Wu & Pourahmadi (2009) proved the consistency of (2) for the class of nonlinear short-range dependent processes considered by Wu (2005), and obtained an explicit upper bound for the operator norm of $\hat{\Gamma}_{n,\ell} - \Gamma_n$. Bickel & Gel (2011) proved the consistency of (2) under the Frobenius norm, and proposed a cross-validation method for selecting the optimal banding parameter.
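To fix ideas, the sample autocovariance (1) and the banded estimator (2) can be coded in a few lines. The following is a minimal Python sketch of our own; the function names and the use of NumPy and SciPy are illustrative choices, not code from the paper.

```python
import numpy as np
from scipy.linalg import toeplitz

def sample_acov(y):
    # Sample autocovariances (1) at lags 0, ..., n-1; the process mean is
    # taken to be zero, consistent with the assumptions of Section 2.1.
    n = len(y)
    return np.array([np.dot(y[k:], y[:n - k]) / n for k in range(n)])

def banded_estimator(y, ell):
    # Banded Toeplitz estimator (2): keep lags 0, ..., ell, zero out the rest.
    g = sample_acov(y)
    g[ell + 1:] = 0.0
    return toeplitz(g)
```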
McMurry & Politis (2010) proposed the banded and tapered autocovariance matrix estimator
\begin{equation*}
\hat{\Gamma}_{n,{\rm MP}} = \{\hat{\gamma}(|i-j|)\, w(|i-j|),\ i, j = 1, \ldots, n\}
\end{equation*}
with $w(|i-j|) = \kappa\{(i-j)/\ell\}$, where $\ell$ is the banding parameter and
\begin{equation}
\kappa(u) = \begin{cases} 1, & |u| \leq 1,\\ 2-|u|, & 1 < |u| \leq 2,\\ 0, & |u| > 2. \end{cases} \tag{3}
\end{equation}
McMurry & Politis (2015) further developed the theory of optimal linear prediction using the banded and tapered sample autocovariance sequence. Both $\hat{\Gamma}_{n,\ell}$ and $\hat{\Gamma}_{n,{\rm MP}}$ preserve the Toeplitz structure, but they may not be positive definite: the estimator $\{\hat{\gamma}(|i-j|)\, w(|i-j|)\}$ is positive definite if and only if $\sum_k w(|k|)\hat{\gamma}(|k|)\exp(-{\rm i}\omega k) > 0$ for all $\omega \in [-\pi, \pi]$, a condition that is not guaranteed by either the uniform or the trapezoidal kernel. McMurry & Politis (2015) presented and compared several alternative positivity corrections.
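The trapezoidal kernel (3) and the banded and tapered estimator can be sketched as follows, continuing the previous sketch and reusing sample_acov and toeplitz from it.

```python
def kappa(u):
    # Trapezoidal kernel (3).
    u = abs(u)
    if u <= 1:
        return 1.0
    return 2.0 - u if u <= 2 else 0.0

def mp_estimator(y, ell):
    # Banded and tapered estimator of McMurry & Politis (2010): the weights
    # equal 1 up to lag ell and decay linearly to 0 between lags ell and 2*ell.
    g = sample_acov(y)
    w = np.array([kappa(k / ell) for k in range(len(y))])
    # Toeplitz by construction, but not guaranteed to be positive definite.
    return toeplitz(w * g)
```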
2.2. Regularized Durbin–Levinson algorithm

The Durbin–Levinson algorithm processes the autocovariances $\{\gamma(0), \ldots, \gamma(k)\}$ recursively for $k = 1, \ldots, n-1$, and computes the coefficients of the optimal one-step-ahead predictor at time $t > k$ based on $k$ past values, $\hat{Y}_{t\mid t-1} = \sum_{j=1}^{k}\phi_{kj} Y_{t-j}$, where $\phi_{kk}$ is the partial autocorrelation between $Y_t$ and $Y_{t-k}$, as well as the prediction error variance $\varsigma_k = {\rm var}(Y_t \mid Y_{t-1}, \ldots, Y_{t-k})$. It provides the solution to a Yule–Walker system of equations in $O(k^2)$ operations, by making efficient use of the Toeplitz structure of $\Gamma_{k+1}$. The following regularized Durbin–Levinson algorithm augments the usual recursions with equations that regularize the partial autocorrelation at lag $k$ by applying the weight $w_k = \kappa(k/\ell)$, where $\kappa(u)$ is the trapezoidal kernel given in (3) and $\ell$ is the banding parameter. It then computes the regularized autocovariance $\gamma_r(k)$, corresponding to a process with partial autocorrelations $w_k\phi_{kk}$, which satisfy $|w_k\phi_{kk}| < 1$. Finally, it obtains the coefficients $\pi_{kj}$ $(j = 1, \ldots, k)$ of the regularized predictor.

The algorithm is initialized as follows: set $\gamma_r(0) = \gamma(0)$ and
\begin{equation}
\begin{array}{lll}
\varsigma_0 = \gamma(0), & v_0 = \gamma_r(0), & \\
\phi_{11} = \gamma(1)/\gamma(0), & \pi_{11} = w_1\phi_{11}, & \gamma_r(1) = v_0\pi_{11},\\
\varsigma_1 = (1-\phi_{11}^2)\varsigma_0, & v_1 = (1-\pi_{11}^2)v_0. &
\end{array} \tag{4}
\end{equation}
Then, for $k = 2, \ldots, n-1$, the following recursions are run:
\begin{equation}
\begin{array}{ll}
\phi_{kk} = \dfrac{\gamma(k)-\sum_{j=1}^{k-1}\phi_{k-1,j}\,\gamma(k-j)}{\varsigma_{k-1}}, & \pi_{kk} = w_k\phi_{kk},\\[6pt]
 & \gamma_r(k) = \sum_{j=1}^{k-1}\pi_{k-1,j}\,\gamma_r(k-j) + v_{k-1}\pi_{kk},\\[6pt]
\phi_{kj} = \phi_{k-1,j}-\phi_{kk}\phi_{k-1,k-j}, & \pi_{kj} = \pi_{k-1,j}-\pi_{kk}\pi_{k-1,k-j} \quad (j = 1, \ldots, k-1),\\[6pt]
\varsigma_k = (1-\phi_{kk}^2)\varsigma_{k-1}, & v_k = (1-\pi_{kk}^2)v_{k-1}.
\end{array} \tag{5}
\end{equation}
The definitions and recursions on the left-hand sides of (4) and (5) constitute the traditional Durbin–Levinson algorithm, mapping the autocovariances $\gamma(k)$ into the partial autocorrelations $\phi_{kk}$. The recursions on the right compute the regularized partial autocorrelations $\pi_{kk} = w_k\phi_{kk}$ and perform the reverse mapping, leading to the regularized autocovariances $\gamma_r(k)$ $(k = 0, \ldots, n-1)$. If $w_k = 1$ for all $k$, we recover the usual Durbin–Levinson recursions yielding the raw partial autocorrelations, and $\gamma_r(k) = \gamma(k)$ for all $k$. If the uniform kernel were used instead, so that $w_k = 1$ for $k \leq m$ and $w_k = 0$ for $k > m$, then (5) would provide the coefficients of the Yule–Walker predictor based on $m$ lagged values, i.e., the coefficients of the autoregressive predictor of order $m$; in this case $\gamma_r(k) = \gamma(k)$ for $k = 0, 1, \ldots, m$ and $\gamma_r(k) = \sum_{j=1}^{m}\phi_{mj}\,\gamma_r(k-j)$ for $|k| > m$.

The sequence $\{\pi_{kk},\ k = 1, \ldots, n-1\}$ is a proper partial autocorrelation function, as $|\pi_{kk}| < 1$ by construction. It is associated with an auxiliary process $Y_t^*$, which is autoregressive of order $L = \lfloor 2\ell\rfloor$ for the trapezoidal kernel with banding parameter $\ell$, since $\pi_{kk} = 0$ for $k > L$. An alternative, equivalent expression for the regularized partial autocorrelations is
$$\pi_{kk} = \frac{\gamma_r(k)-\sum_{j=1}^{k-1}\pi_{k-1,j}\,\gamma_r(k-j)}{\gamma_r(0)-\sum_{j=1}^{k-1}\pi_{k-1,j}\,\gamma_r(j)} = \frac{{\rm cov}(Y_t^*, Y_{t-k}^* \mid Y_{t-1}^*, \ldots, Y_{t-k+1}^*)}{{\rm var}(Y_t^* \mid Y_{t-1}^*, \ldots, Y_{t-k+1}^*)}.$$
The algorithm leaves the first $\ell$ autocovariances intact, i.e., $\gamma_r(j) = \gamma(j)$ for $j \leq \ell$, because for the trapezoidal kernel $w_j = 1$ for $j \leq \ell$.
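The recursions (4)–(5) translate directly into code. The sketch below is our own rendering (variable names are illustrative), reusing kappa and the imports from the sketches above; it returns the regularized autocovariances $\gamma_r(k)$, the final coefficient row $\pi_{n-1,j}$ and the variances $v_k$.

```python
def regularized_durbin_levinson(gamma, ell):
    # Run the recursions (4)-(5) on autocovariances gamma[0..n-1] with
    # trapezoidal weights w_k = kappa(k / ell).  Requires n >= 2.
    n = len(gamma)
    w = [kappa(k / ell) for k in range(n)]
    phi = np.zeros(n)     # phi[j] holds phi_{k,j} for the current k
    pi = np.zeros(n)      # pi[j] holds pi_{k,j} for the current k
    gr = np.zeros(n)      # regularized autocovariances gamma_r(k)
    v = np.zeros(n)       # regularized prediction error variances v_k
    gr[0] = gamma[0]
    s, v[0] = gamma[0], gamma[0]               # varsigma_0 and v_0, as in (4)
    phi[1] = gamma[1] / gamma[0]
    pi[1] = w[1] * phi[1]
    gr[1] = v[0] * pi[1]
    s *= 1.0 - phi[1] ** 2
    v[1] = (1.0 - pi[1] ** 2) * v[0]
    for k in range(2, n):                      # recursions (5)
        phikk = (gamma[k] - np.dot(phi[1:k], gamma[k-1:0:-1])) / s
        pikk = w[k] * phikk
        gr[k] = np.dot(pi[1:k], gr[k-1:0:-1]) + v[k-1] * pikk
        # both updates use the rows at step k-1, so compute them jointly
        phi[1:k], pi[1:k] = (phi[1:k] - phikk * phi[k-1:0:-1],
                             pi[1:k] - pikk * pi[k-1:0:-1])
        phi[k], pi[k] = phikk, pikk
        s *= 1.0 - phikk ** 2
        v[k] = (1.0 - pikk ** 2) * v[k-1]
    return gr, pi, v
```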
The following two lemmata establish the properties of the regularized Durbin–Levinson algorithm that are essential for characterizing the estimator of the autocovariance matrix derived from it.

Lemma 1. Let $\gamma_r(k)$ and $\pi_{n-1,k}$ $(k = 1, \ldots, n-1)$ be defined as in (4) and (5), and let $0 \leq w_k \leq 1$ $(k = 1, \ldots, n-1)$ be the trapezoidal weights with banding parameter $\ell$. Then the following results hold.

(i) The regularized autocovariance function $\{\gamma_r(k),\ k = 0, \pm 1, \ldots\}$ is a positive-definite sequence.

(ii) The roots of the polynomial $\Pi_k(z) = 1-\sum_{j=1}^{k}\pi_{kj} z^j$ lie strictly outside the unit circle; that is, $\Pi_k(1/z)$ is a Schur, or minimum-phase, polynomial.

(iii) We have that
\begin{equation}
\sum_{j=1}^{n-1}|\pi_{n-1,j}-\phi_{n-1,j}| \leq 2\sum_{j=\ell+1}^{n-1}|\phi_{jj}|\prod_{i=1}^{j-1}(1+|\phi_{ii}|). \tag{6}
\end{equation}

The proof is given in the Supplementary Material, along with the proofs of the other theoretical results. Lemma 2 characterizes the nature of the tapered approximation of $Y_t$.

Lemma 2. Let the autocovariance function of $Y_t$ be summable, with $\sum_{j=-\infty}^{\infty}|j|^r|\gamma(j)| < \infty$ for $r \in \mathbb{N}$ and $\sum_j \gamma(j)\exp(-{\rm i}\omega j) > 0$ for $\omega \in [-\pi, \pi]$, and write the infinite autoregressive representation of $Y_t$ as $Y_t = \sum_{j=1}^{\infty}\phi_j Y_{t-j} + \epsilon_t$, with $E(\epsilon_t) = 0$, $E(\epsilon_t^2) = \sigma^2$ and $E(\epsilon_t\epsilon_{t-j}) = 0$ for $j \neq 0$. Then, for a positive constant $C$,
\begin{align}
\sum_{j=1}^{n-1}|\pi_{n-1,j}-\phi_j| &\leq C\sum_{j=\ell+1}^{\infty}|\phi_{jj}|,\nonumber\\
\log v_{n-1}-\log\sigma^2 &\leq \sum_{j=\ell+1}^{\infty}\frac{\phi_{jj}^2}{1-\phi_{jj}^2}. \tag{7}
\end{align}
If, moreover, $n, \ell \rightarrow \infty$ such that $\ell = o(n)$, then $\sum_{j=1}^{\infty} j^r|\gamma(j)| < \infty$ implies
\begin{equation}
\lim_{n\rightarrow\infty}\sum_{j=1}^{n-1} j^r|\gamma_r(j)| < \infty. \tag{8}
\end{equation}

Remark 1. The assumption $\sum_{j=-\infty}^{\infty}|j|^r|\gamma(j)| < \infty$ implies that the spectral density has $r$ continuous derivatives. If $Y_t$ is a stationary autoregressive moving average process, its autocovariance function decreases to zero at a geometric rate and the assumption holds for any positive integer $r$.

2.3. Regularized Durbin–Levinson autocovariance matrix estimator

Given the time series $\{y_t,\ t = 1, \ldots, n\}$, our estimator of the autocovariance matrix is obtained by running the regularized Durbin–Levinson recursions (5) on the raw sample autocovariances $\{\hat{\gamma}(k),\ k = 0, 1, \ldots, n-1\}$, where $\hat{\gamma}(k)$ is given in (1). The regularized Durbin–Levinson algorithm shrinks the sample partial autocorrelations $\hat{\phi}_{kk}$ towards zero by setting $\tilde{\pi}_{kk} = w_k\hat{\phi}_{kk}$, and yields the regularized sample autocovariance sequence $\{\tilde{\gamma}_r(k),\ k = 0, 1, \ldots, n-1\}$, along with the estimated autoregressive coefficients $\tilde{\pi}_{kj}$ $(j = 1, \ldots, k-1)$. The Toeplitz matrix constructed from the regularized sample autocovariances,
\begin{equation}
\tilde{\Gamma}_n = \{\tilde{\gamma}_r(|i-j|),\ i, j = 1, \ldots, n\}, \tag{9}
\end{equation}
will be referred to as the regularized Durbin–Levinson estimator of the autocovariance matrix. The estimator (9) is positive definite, as implied by Lemma 1.
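A usage sketch, combining the functions above under assumed illustrative settings (an AR(2) design with $n = 500$ and $\ell = 10$; neither the design nor the seed is taken from the paper):

```python
rng = np.random.default_rng(0)
n, ell, burn = 500, 10, 100
e = rng.standard_normal(n + burn)
y = np.zeros(n + burn)
for t in range(2, n + burn):     # stationary AR(2): roots well inside the unit circle
    y[t] = 1.2 * y[t-1] - 0.5 * y[t-2] + e[t]
y = y[burn:]                     # drop the burn-in

g_hat = sample_acov(y)
g_r, pi_last, v = regularized_durbin_levinson(g_hat, ell)
Gamma_tilde = toeplitz(g_r)      # the estimator (9); positive definite by Lemma 1
```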
Furthermore, the output of the regularized Durbin–Levinson algorithm enables estimation of the inverse autocovariance matrix via the factorization (Pourahmadi, 2001, Ch. 7; Golub & Van Loan, 2012)
$$\tilde{\Gamma}_n^{-1} = C_n^{\rm T} D_n C_n,$$
where $D_n = {\rm diag}(\tilde{v}_0^{-1}, \tilde{v}_1^{-1}, \ldots, \tilde{v}_{n-1}^{-1})$ and
\begin{equation*}
C_n = \left[\begin{array}{ccccc}
1 & 0 & 0 & \cdots & 0\\
-\tilde{\pi}_{11} & 1 & 0 & \cdots & 0\\
-\tilde{\pi}_{22} & -\tilde{\pi}_{21} & 1 & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
-\tilde{\pi}_{n-1,n-1} & -\tilde{\pi}_{n-1,n-2} & -\tilde{\pi}_{n-1,n-3} & \cdots & 1
\end{array}\right].
\end{equation*}
Adoption of the trapezoidal kernel with banding parameter $\ell$ gives $\tilde{\pi}_{kj} = 0$ for $j > L$, and hence $\tilde{\Gamma}_n^{-1}$ is a band matrix whose nonzero elements are confined to the main diagonal and the $2L$ adjacent sub- and superdiagonals ($L$ on each side). Thus, regularization of the partial autocorrelation function provides a way of regularizing the inverse autocovariance matrix.
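To make the factorization concrete, the sketch below builds $C_n$ and $D_n$ from the full set of coefficient rows; storing every row $\pi_{k\cdot}$ while running the recursion is bookkeeping we add for illustration (the paper only needs the band structure), and the one-step predictor exploits the fact that at most $L = 2\ell$ coefficients are nonzero.

```python
def inverse_factorization(pi_rows, v):
    # Gamma^{-1} = C' D C, where row k of C is (-pi_{k,k}, ..., -pi_{k,1}, 1, 0, ...);
    # pi_rows[k] = (pi_{k,1}, ..., pi_{k,k}) is assumed to have been stored while
    # running regularized_durbin_levinson (a small, hypothetical modification).
    n = len(v)
    C = np.eye(n)
    for k in range(1, n):
        C[k, :k] = -np.asarray(pi_rows[k])[::-1]
    D = np.diag(1.0 / v)
    # Banded result: entries vanish beyond L diagonals on each side of the main one.
    return C.T @ D @ C

def one_step_predict(y, pi_last):
    # y-hat_{n+1|n} = sum_j pi_{n-1,j} y_{n-j+1}; with the trapezoidal kernel only
    # the first 2*ell coefficients are nonzero, so each forecast is cheap.
    m = len(pi_last) - 1
    return float(np.dot(pi_last[1:], y[::-1][:m]))
```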
The sampling properties of the estimator (9) will be investigated under a suitable set of assumptions concerning the nature of the underlying random process and the design of the estimator.

Assumption 1. Let $\{Y_t\}$ be the stationary linear process $Y_t = \sum_{j=0}^{\infty}\psi_j\epsilon_{t-j}$ with $\psi_0 = 1$ and $\epsilon_t$ satisfying: (i) $E(\epsilon_t \mid \mathcal{F}_{t-1}) = 0$; (ii) $E(\epsilon_t^2 \mid \mathcal{F}_{t-1}) = \sigma^2 < \infty$ and $E(\epsilon_t^4) < \infty$; (iii) $f(\omega) = (2\pi)^{-1}\sum_{j=-\infty}^{\infty}\gamma(j)\exp(-{\rm i}\omega j) \neq 0$ for $\omega \in [-\pi, \pi]$; (iv) $\sum_{j=-\infty}^{\infty}(1+|j|)|\gamma(j)| < \infty$.

Remark 2. Assumption 1(ii) can be relaxed to allow for conditional heteroscedasticity as in Goncalves & Kilian (2007), where the condition was replaced with the assumptions that $E(\epsilon_t^2) = \sigma^2$ and that the joint cumulants of $\epsilon_t$ up to order 8 are absolutely summable. The assumption that $Y_t$ is a linear process can be relaxed as in Wu & Pourahmadi (2009) and McMurry & Politis (2010), where it was assumed that $Y_t$ is a nonlinear process of the form $Y_t = g(\epsilon_t, \epsilon_{t-1}, \ldots)$, with $g(\cdot)$ a measurable function of the independent and identically distributed random variables $\epsilon_s$ $(s = t, t-1, \ldots)$, having finite fourth moment and finite physical dependence (Wu, 2005).

Remark 3. Assumptions 1(iii) and (iv) imply that $\sum_j (1+|j|)|\psi_j| < \infty$ and that the infinite autoregressive representation $Y_t = \sum_{j=1}^{\infty}\phi_j Y_{t-j} + \epsilon_t$ has $\sum_j (1+|j|)|\phi_j| < \infty$; see Theorem 3.8.4 in Brillinger (1981, p. 78). Moreover, the spectral density is continuous and bounded, and there exist two positive real numbers $M_{\rm L} = \inf_{\omega} f(\omega)$ and $M_{\rm U} = \max_{\omega} f(\omega)$ such that the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $\Gamma_n$ satisfy (Brockwell & Davis, 1991, Proposition 4.5.3)
$$0 < M_{\rm L} \leq \lambda_1 \leq \cdots \leq \lambda_n < M_{\rm U} < \infty.$$

Assumption 2. The following conditions hold for the banding parameter $\ell$ and a rate $p_n$ such that $\ell < p_n < n$.

Remark 4. If $|\gamma(j)| = O(j^{-d})$ with $d > 1$, the optimal rate for $\ell$ is determined according to Assumption 2(v) by setting $\beta = 1/(2d)$, so that $r_n = O(n^{-(d-1)/(2d)})$, and $\alpha < 1-d^{-1}$; thus, the optimized rate is $p_n^{1/2} r_n = n^{-(d-1)/(2d)+\alpha/2}$. If $|\gamma(j)| = O(|\xi|^{j})$ with $|\xi| < 1$, then $r_n < \ell n^{-1/2} + C|\xi|^{\ell}$ for a positive constant $C$; the optimal choice $\ell = O\{\log(n)\}$ implies $r_n = O(n^{-1/2}\log n)$, and hence $p_n^{1/2} r_n = O(n^{(\alpha-1)/2}\log n)$ with $\alpha < 1/2$.

Remark 5. Wu & Pourahmadi (2009) and McMurry & Politis (2010) obtained the faster convergence rate $r_n$ for the banded and tapered estimator. The rate $p_n$ is not a tuning parameter and does not need to be chosen in practice; it guarantees that $\sum_{j=p_n+1}^{\infty}|\tilde{\gamma}_r(j)| = o_{\rm p}(1)$. As seen in the previous remark, it does not affect the optimal choice of $\ell$, which coincides with that of McMurry & Politis (2010, Corollary 1). McMurry & Politis (2015) also needed to introduce a rate playing a similar role to $p_n$ in order to establish the convergence of the best linear predictor to the oracle predictor. Bickel & Gel (2011) defined their $\ell$-banded estimator in terms of a submatrix of $\hat{\Gamma}_n$ of dimension $p_n < n$.

In the simulation study, the relative frequency with which the regularized Durbin–Levinson estimator outperforms the McMurry–Politis estimator, conditional on the autoregressive order exceeding the moving average order, is $\bar{I}(\rho_1<\rho_2 \mid m>q) = N^{-1}\sum_i I(\rho_1^{(i)} < \rho_2^{(i)} \mid m_n^{(i)} > q_n^{(i)})$. The relative performances of the two estimators vis-à-vis the raw sample autocovariance matrix are measured by the average operator norm ratios $A(\rho_1/\rho_0) = N^{-1}\sum_i (\rho_1^{(i)}/\rho_0^{(i)})$ and $A(\rho_2/\rho_0) = N^{-1}\sum_i (\rho_2^{(i)}/\rho_0^{(i)})$. Finally, for a comparative assessment of the out-of-sample forecasting performance of the regularized Durbin–Levinson predictor, we report the average ratios $A(S_1/S_2) = N^{-1}\sum_i \{S_1^{(i)}(H)/S_2^{(i)}(H)\}$ and $A(S_1/S_0) = N^{-1}\sum_i \{S_1^{(i)}(H)/S_0^{(i)}(H)\}$ for $H = 10$.

Table 1. Performance of the regularized Durbin–Levinson and McMurry & Politis estimators of randomly generated autoregressive moving average covariance matrices, and comparison of the forecasting performance of the corresponding predictors; all values are multiplied by 100

Scenario 1: a = b = 1.2

                                  n=50    n=100   n=250   n=500   n=750   n=1000
$\bar{I}(\rho_1<\rho_2)$          50.0    48.5    49.3    51.5    53.0    53.1
$\bar{I}(\rho_1<\rho_2\mid m>q)$  62.4    62.5    64.4    65.3    68.8    69.1
$A(\rho_1/\rho_0)$                80.1    84.1    86.7    89.5    90.2    91.3
$A(\rho_2/\rho_0)$                77.1    80.9    83.8    86.0    86.8    87.4
$A(S_1/S_2)$                      95.7    91.0    83.9    78.2    76.3    73.5
$A(S_1/S_0)$                     115.6   110.8   107.8   106.3   105.6   104.4

Scenario 2: a = 2.2, b = 1.2

                                  n=50    n=100   n=250   n=500   n=750   n=1000
$\bar{I}(\rho_1<\rho_2)$          51.3    52.3    55.2    56.0    57.2    59.0
$\bar{I}(\rho_1<\rho_2\mid m>q)$  64.2    66.0    68.9    69.7    69.9    70.0
$A(\rho_1/\rho_0)$                74.4    79.9    81.7    82.8    84.4    85.6
$A(\rho_2/\rho_0)$                70.2    73.3    74.9    75.3    76.4    77.2
$A(S_1/S_2)$                      94.4    90.1    78.9    72.0    68.1    66.2
$A(S_1/S_0)$                     122.2   120.0   113.9   111.2   110.1   110.2
Table 1 shows that the comparative performance of the regularized Durbin–Levinson estimator improves with the sample size and tends to be superior to that of $\hat{\Gamma}_{n,{\rm MP}}$ when the autoregressive order is greater than the moving average order. Comparison of the two scenarios further reveals that the performance also depends on the size of the partial and inverse partial autocorrelations, given the orders $(m_n^{(i)}, q_n^{(i)})$, with the regularized Durbin–Levinson estimator showing greater improvements than $\hat{\Gamma}_{n,{\rm MP}}$ in the second scenario. Further simulation results, presented in the Supplementary Material, confirm that the strategy of banding and tapering the partial autocorrelation function is more effective than banding and tapering the autocovariance function when the serial dependence structure of the true generating process is characterized by a high dynamic range. The average ratios $A(S_1/S_2)$ show that the regularized Durbin–Levinson predictor outperforms that of McMurry and Politis, achieving a reduction in the root mean square forecast error that increases with $n$; already for $n > 250$ the percentage reduction is around 25% and 33% in the two scenarios, respectively. At the same time, the index $A(S_1/S_0)$ shows that the accuracy loss with respect to the oracle predictor is not substantial, averaging 5% in the first scenario and 10% in the second for $n > 250$.
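For reference, the operator-norm accuracy measures underlying Table 1 can be sketched as follows; this is our own illustration, not the authors' code, with labels 0, 1 and 2 standing for the raw, regularized Durbin–Levinson and McMurry–Politis estimators, and with the true matrix supplied by the generated autoregressive moving average design.

```python
def norm_ratios(Gamma_true, Gamma_raw, Gamma_dl, Gamma_mp):
    # rho(.) is the operator (spectral) norm of the estimation error matrix;
    # the ratios rho1/rho0 and rho2/rho0 are averaged over the Monte Carlo
    # replications to produce A(rho1/rho0) and A(rho2/rho0) in Table 1.
    rho0 = np.linalg.norm(Gamma_raw - Gamma_true, 2)
    rho1 = np.linalg.norm(Gamma_dl - Gamma_true, 2)
    rho2 = np.linalg.norm(Gamma_mp - Gamma_true, 2)
    return rho1 / rho0, rho2 / rho0
```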
5.2. Forecasting El Niño 3.4 sea surface temperature

Our illustrative example deals with forecasting sea surface temperatures in the Pacific region known as Niño 3.4. This time series was analysed by Bickel & Gel (2011) and provides an interesting case study, since its cyclical behaviour is a manifestation of the El Niño phenomenon, which produces an increase in sea surface temperatures in the eastern and central Pacific regions. The monthly time series for the period from January 1950 to December 2016, made available by the United States National Oceanic and Atmospheric Administration, is plotted in Fig. 1(a). Predictions of sea surface temperatures are an important input for global circulation models and are relevant to the prediction of El Niño–Southern Oscillation events, which are the dominant forcing factors for inter-annual climate variability.

Fig. 1. El Niño 3.4 sea surface temperature series, 1950.1–2016.12: (a) original time series; (b) index of predictability $P(h)$ versus $h$ for the regularized Durbin–Levinson predictor (solid), the McMurry and Politis predictor (dashed) and the autoregressive moving average predictor (dotted); (c) raw (dots) and regularized (solid line) sample partial autocorrelations; (d) raw (dots) and regularized (solid line) sample spectra.

Evaluation of the forecasting ability of the regularized Durbin–Levinson estimator, compared with the McMurry and Politis estimator and traditional autoregressive moving average predictors, is carried out by a rolling out-of-sample forecasting exercise. Starting from January 1970, we compute the $h$-step-ahead predictors $(h = 1, \ldots, 48)$ for values of the banding parameter $\ell = 1, \ldots, 48$, using a training sample of size $n_0 = 288$, corresponding to 24 consecutive years; we proceed by adding one future observation and removing the earliest one until the end of the sample is reached, so that each forecast is based on a fixed number of observations, re-estimating the coefficients of the linear predictors for each rolling window. The experiment yields $H = 564-h+1$ prediction errors for each forecasting method. The predictive performance depends on the forecast horizon $h$ and the banding parameter $\ell$. As a benchmark, we consider the $h$-step-ahead predictions that arise from fitting autoregressive moving average models of orders $m \leq 13$ and $q \leq 2$.
Denoting by $\tilde{y}_{t\mid t-h}$ a generic $h$-step-ahead predictor of $y_t$ constructed from the training data $\{y_{t-h}, y_{t-h-1}, \ldots, y_{t-h-n_0+1}\}$, i.e., the $n_0$ consecutive past observations available at time $t-h$, for each forecast lead $h$ and banding parameter $\ell$ we compute the mean square forecast error $S^2(h,\ell) = H^{-1}\sum_{t=n_0+h}^{n}(y_t - \tilde{y}_{t|t-h})^2$ and the measure of predictability
$$P(h,\ell) = \max\left\{0,\ 1-\frac{S^2(h,\ell)}{\hat{\gamma}(0)}\right\} \quad (h, \ell = 1, \ldots, 48).$$
For instance, in the case of the regularized Durbin–Levinson one-step-ahead predictor with banding parameter $\ell$, the coefficients $\tilde{\pi}_{n_0 j}$ $(j = 1, \ldots, n_0)$ are the output of the regularized Durbin–Levinson algorithm applied to the sample partial autocorrelations of the training data.
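The rolling evaluation can be sketched as follows; `forecast` stands for any of the competing $h$-step predictors (an illustrative callable of our own, not an interface from the paper), and the sketch reuses NumPy from the earlier blocks.

```python
def predictability(y, forecast, h, ell, n0=288):
    # Mean square forecast error S^2(h, ell) over the rolling windows and the
    # predictability index P(h, ell) = max{0, 1 - S^2(h, ell) / gamma_hat(0)}.
    n = len(y)
    errors = [y[t] - forecast(y[t - h - n0 + 1: t - h + 1], h, ell)
              for t in range(n0 + h - 1, n)]   # each window holds n0 observations
    s2 = float(np.mean(np.square(errors)))
    g0 = float(np.dot(y, y)) / n               # sample variance (mean zero)
    return max(0.0, 1.0 - s2 / g0)
```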
Figure 2 is a contour plot of $P(h,\ell)$ for the regularized Durbin–Levinson predictor, and Fig. 1(b) compares its maximum across $\ell$, $P(h) = \max_{\ell=1,\ldots,48} P(h,\ell)$ $(h = 1, \ldots, 48)$, with the corresponding measures for the McMurry–Politis and autoregressive moving average predictors.

Fig. 2. El Niño 3.4 sea surface temperature series: contour plot of the predictability of the regularized Durbin–Levinson predictor at forecast horizon $h$ with banding parameter $\ell$, $P(h,\ell) = \max\{0,\ 1- S^2(h,\ell)/\hat{\gamma}(0)\}$, for $h = 1, \ldots, 48$ and $\ell = 1, \ldots, 48$.

The value of $\ell$ that is optimal for different horizons, in the sense that it minimizes the mean square forecast error, grows rapidly from 10 to values in the range 28–33. The empirical rule based on (12) selects a value around 10, with some variability across the rolling samples; as can be seen from Fig. 2, choosing a small $\ell$ is detrimental to predictability at forecast lead times greater than 6. For the McMurry and Politis predictor, the optimal banding parameter ranges from 42 to 48. For the autoregressive moving average models, the $P(h)$ measure is defined as the maximum predictability across the autoregressive and moving average orders; the selected specifications are in most cases autoregressive of order 12, possibly with a first-order moving average component.

The main evidence is that the predictability of sea surface temperatures is high at intra-annual horizons up to lead time 6, after which it decreases quite rapidly, although some predictability remains over longer horizons. Secondly, the regularized Durbin–Levinson predictor outperforms both competitors. The superior performance relative to the McMurry and Politis predictor is attributable to the fact that sea surface temperatures are characterized by strong cyclical features, which are captured more effectively by regularizing the partial autocorrelations rather than the sample autocovariances. Fig. 1(c) displays the raw and regularized sample partial autocorrelations for $\ell = 28$; the estimated log-spectral density is plotted alongside the log-periodogram of the series in Fig. 1(d). The two sharp peaks occurring at the annual and semi-annual frequencies, $\pi/6$ and $\pi/3$, are due to the seasonal cycle. The peak close to the origin reflects the alternation of El Niño and La Niña episodes with a periodicity of about four years. Our regularized Durbin–Levinson predictor seems to capture the main stylized features of the series quite well.

Acknowledgement

The authors are grateful to the referees, associate editor and editor for their valuable comments. Tommaso Proietti acknowledges support from the Center for Research in Econometric Analysis of Time Series, funded by the Danish National Research Foundation.

Supplementary material

Supplementary material available at Biometrika online includes detailed proofs and extended simulation results.

References

Barndorff-Nielsen, O. & Schou, G. (1973). On the parametrization of autoregressive models by partial autocorrelations. J. Mult. Anal. 3, 408–19.

Berk, K. N. (1974). Consistent autoregressive spectral estimates. Ann. Statist. 2, 489–502.

Bhansali, R. (1978). Linear prediction by autoregressive model fitting in the time domain. Ann. Statist. 6, 224–31.

Bickel, P. J. & Gel, Y. R. (2011). Banded regularization of autocovariance matrices in application to parameter estimation and forecasting of time series. J. R. Statist. Soc. B 73, 711–28.

Bingham, N. (2012). Szegö's theorem and its probabilistic descendants. Prob. Surv. 9, 287–324.

Brillinger, D. R. (1981). Time Series: Data Analysis and Theory. San Francisco: Holden-Day.

Brockwell, P. J. & Davis, R. A. (1991). Time Series: Theory and Methods. New York: Springer.

Cai, T. T., Ren, Z. & Zhou, H. H. (2013). Optimal rates of convergence for estimating Toeplitz covariance matrices. Prob. Theory Rel. Fields 156, 101–43.

Daniels, M. J. & Pourahmadi, M. (2009). Modeling covariance matrices via partial autocorrelations. J. Mult. Anal. 100, 2352–63.

Durbin, J. (1960). The fitting of time-series models. Rev. Inst. Int. Statist. 28, 233–44.

Golub, G. H. & Van Loan, C. F. (2012). Matrix Computations, 4th ed. Baltimore: Johns Hopkins University Press.

Goncalves, S. & Kilian, L. (2007). Asymptotic and bootstrap inference for AR(∞) processes with conditional heteroskedasticity. Econ. Rev. 26, 609–41.

Gupta, S. D., Mazumdar, R. R. & Glynn, P. (2013). On the convergence of the spectrum of finite order approximations of stationary time series. J. Mult. Anal. 121, 1–21.

Levinson, N. (1946). The Wiener (root mean square) error criterion in filter design and prediction. Stud. Appl. Math. 25, 261–78.

Lewis, R. & Reinsel, G. C. (1985). Prediction of multivariate time series by autoregressive model fitting. J. Mult. Anal. 16, 393–411.

Li, T.-H. (2013). Time Series with Mixed Spectra. Boca Raton, Florida: CRC Press.

Lütkepohl, H. & Saikkonen, P. (1997). Impulse response analysis in infinite order cointegrated vector autoregressive processes. J. Economet. 81, 127–57.
McMurry, T. L. & Politis, D. N. (2010). Banded and tapered estimates for autocovariance matrices and the linear process bootstrap. J. Time Ser. Anal. 31, 471–82.

McMurry, T. L. & Politis, D. N. (2015). High-dimensional autocovariance matrices and optimal linear prediction. Electron. J. Statist. 9, 753–88.

Monahan, J. F. (1984). A note on enforcing stationarity in autoregressive-moving average models. Biometrika 71, 403–4.

Ng, C. T. & Joe, H. (2010). Generating random AR(p) and MA(q) Toeplitz correlation matrices. J. Mult. Anal. 101, 1532–45.

Politis, D. N. (2003). Adaptive bandwidth choice. J. Nonparam. Statist. 15, 517–33.

Pourahmadi, M. (2001). Foundations of Time Series Analysis and Prediction Theory. New York: Wiley.

Quinn, B. G. & Hannan, E. J. (2001). The Estimation and Tracking of Frequency. Cambridge: Cambridge University Press.

Saikkonen, P. & Lütkepohl, H. (1996). Infinite-order cointegrated vector autoregressive processes. Economet. Theory 12, 814–44.

Wu, W. B. (2005). Nonlinear system theory: Another look at dependence. Proc. Nat. Acad. Sci. 102, 14150–4.

Wu, W. B. & Pourahmadi, M. (2009). Banding sample autocovariance matrices of stationary processes. Statist. Sinica 19, 1755–68.

© 2018 Biometrika Trust