# A robust goodness-of-fit test for generalized autoregressive conditional heteroscedastic models

## Summary

The estimation of time series models with heavy-tailed innovations has been widely discussed, but corresponding goodness-of-fit tests have attracted less attention, primarily because the autocorrelation function commonly used in constructing goodness-of-fit tests necessarily imposes certain moment conditions on the innovations. As a bounded random variable has finite moments of all orders, we address the problem by first transforming the residuals with a bounded function. More specifically, we consider the sample autocorrelation function of the transformed absolute residuals of a fitted generalized autoregressive conditional heteroscedastic model. With the corresponding residual empirical distribution function naturally employed as the transformation, a robust goodness-of-fit test is then constructed. The asymptotic distributions of the test statistic under the null hypothesis and local alternatives are derived, and Monte Carlo experiments are conducted to examine finite-sample properties. The proposed test is shown to be more powerful than existing tests when the innovations are heavy-tailed.

## 1. Introduction

The heavy-tail phenomenon has attracted considerable attention in time series analysis, and great efforts have been made in model fitting and parameter estimation; see, for example, Davis & Resnick (1986) and Ling (2005). The generalized autoregressive conditional heteroscedastic model (Engle, 1982; Bollerslev, 1986) is well known for its success in capturing time-dependent conditional variances or scales, which are often observed in financial data; see Zivot (2009) and Guo et al. (2017).
Although a stationary generalized autoregressive conditional heteroscedastic process with Gaussian innovations can be heavy-tailed (He & Teräsvirta, 1999; Basrak et al., 2002), numerous empirical studies have shown that the residuals $$\{\hat{\varepsilon}_t\}$$ of fitted generalized autoregressive conditional heteroscedastic models of financial returns appear to have high or even nonexistent kurtosis; see, for instance, Mikosch & Stărică (2000), Mittnik & Paolella (2003) and § 6 of this paper. Various robust estimators that allow $$E(\varepsilon_t^4)=\infty$$ yet still achieve $$\surd{n}$$-consistency have been introduced. For example, the least absolute deviations estimator in Peng & Yao (2003) and the Pearsonian quasi maximum likelihood estimator in Zhu & Li (2015) require only a finite fractional moment of $$\varepsilon_t$$, i.e., $$E(|\varepsilon_t|^{2\gamma})<\infty$$ for some $$\gamma>0$$, and the Laplacian quasi maximum likelihood estimator in Berkes & Horváth (2004) requires $$E(\varepsilon_t^2)<\infty$$. In contrast to the many studies of robust parameter estimation, research on corresponding goodness-of-fit tests, despite its importance, is still quite limited, primarily because the autocorrelation function commonly used in constructing the test imposes certain moment conditions on the innovations. As a bounded random variable has finite moments of all orders, we can remove such conditions through a bounded transformation. Although the distribution function of the innovations is a natural transformation, it is unknown in practice, so an alternative is to employ the empirical distribution function of the residuals. For conditional heteroscedastic models, diagnostic tools constructed from the sample autocorrelation functions of squared residuals (Li & Mak, 1994) and absolute residuals (Li & Li, 2005) are particularly popular. However, the former require $$E(\varepsilon_t^4)<\infty$$ (Li, 2004) and the latter $$E(\varepsilon_t^2)<\infty$$ (Li & Li, 2005). 
Even worse, the convergence rates of these residual sample autocorrelation functions can become extremely slow under generalized autoregressive conditional heteroscedastic alternatives if $$E(\varepsilon_t^4)=\infty$$ (Davis & Mikosch, 1998; Basrak et al., 2002), possibly undermining the power of the corresponding test. To address these problems, in this paper we construct a robust goodness-of-fit test based on the sample autocorrelation function of the transformed absolute residuals, where the transformation is the residual empirical distribution function. This test is shown to be asymptotically equivalent to the test where the transformation is the true distribution function of $$|\varepsilon_t|$$. We also derive the asymptotic power of the test based on transformed absolute residuals with any known function, which includes as special cases those existing methods based on squared and absolute residual autocorrelations (Li & Mak, 1994; Li & Li, 2005). Doing so makes it possible to theoretically compare the commonly used goodness-of-fit tests in the literature. Our asymptotic analysis is crucially reliant on Lemmas A1 and A2 in the Appendix, which provide useful results for weighted residual empirical processes of generalized autoregressive conditional heteroscedastic models and hence are of independent interest.

## 2. Goodness-of-fit test based on transformed absolute residuals

### 2.1. Goodness-of-fit test based on residual empirical processes

Our null hypothesis is that the observed time series $$\{\,y_1, \ldots, y_n\}$$ is generated by the following model:   $$\label{garch} H_0:\quad y_t=\varepsilon_t h_t^{1/2}, \quad h_t=\omega_0+\sum_{i=1}^p\alpha_{0i}y_{t-i}^2+\sum_{j=1}^q\beta_{0j} h_{t-j},$$ (1) where $$\{\varepsilon_t\}$$ is a sequence of innovations.
Denote by $${\theta}=(\omega, \alpha_1, \ldots, \alpha_p, \beta_1, \ldots, \beta_q)^{\mathrm{\scriptscriptstyle T} } \in\Theta$$ the parameter vector of model (1), where the parameter space $${\Theta} \subset \mathbb{R}_+^{p+q+1}$$, with $$\mathbb{R}_+=(0, \infty)$$, is a compact set and the true parameter vector $${\theta}_0=(\omega_0, \alpha_{01}, \ldots, \alpha_{0p}, \beta_{01}, \ldots, \beta_{0q})^{\mathrm{\scriptscriptstyle T} }$$ is an interior point of $$\Theta$$. We call model (1) the garch$$(p,q)$$ model. Assumption 1. Model (1) satisfies the following conditions: (i) the innovations $$\{\varepsilon_t\}$$ are independent and identically distributed with $$\varepsilon^2_t$$ following a nondegenerate distribution and $$E(|\varepsilon_t|^{2\gamma})<\infty$$ for some $$\gamma>0$$; (ii) $$\{\,y_t\}$$ is a strictly stationary and ergodic process; (iii) $$\sum_{j=1}^{q}\beta_j<1$$ for all $$\theta\in\Theta$$; and (iv) the polynomials $$\sum_{j=1}^{p}\alpha_{0j} z^j$$ and $$1-\sum_{j=1}^{q}\beta_{0j} z^j$$ have no common root. A necessary and sufficient condition for Assumption 1(ii) to hold is given in Bougerol & Picard (1992), and Assumption 1(iv) is for the identifiability of model (1) (Berkes et al., 2003; Francq & Zakoïan, 2004). We further restrict the innovations $$\{\varepsilon_t\}$$ of model (1) so that the estimator converges to $$\theta_0$$ as $$n\rightarrow\infty$$; see Francq & Zakoïan (2010, pp. 231–5). For example, we assume $$E(\varepsilon_t)=0$$ and $$\mathrm{var}(\varepsilon_t)=1$$ for the Gaussian quasi maximum likelihood estimator (Hall & Yao, 2003), $$\mathrm{median}(|\varepsilon_t|)=1$$ for the least absolute deviations estimator (Peng & Yao, 2003; Chen & Zhu, 2015), and $$E(\varepsilon_t)=0$$ and $$E(|\varepsilon_t|)=1$$ for the Laplacian quasi maximum likelihood estimator (Berkes & Horváth, 2004). 
Define the functions   $$\label{meq1} \varepsilon_t({\theta})=y_t\big/h_t^{1/2}({\theta}), \quad h_t({\theta})=\omega+\sum_{i=1}^p\alpha_iy_{t-i}^2+\sum_{j=1}^q\beta_j h_{t-j}({\theta})\text{.}$$ (2) Then $$h_t(\theta_0)=h_t$$ and $$\varepsilon_t(\theta_0)=\varepsilon_t$$. Because the recursive equation in (2) depends on past observations that are infinitely far away, in practice initial values are needed for $$\{\,y_0^2,\ldots, y_{1-p}^2, h_0, \ldots, h_{1-q}\}$$. For simplicity, we set them to zero and denote the corresponding functions by $$\tilde{\varepsilon}_t({\theta})$$ and $$\tilde{h}_t({\theta})$$; fixing these initial values does not affect our asymptotic results. Let $$\hat{{\theta}}_n=(\hat{\omega}, \hat{\alpha}_1, \ldots, \hat{\alpha}_p, \hat{\beta}_1, \ldots, \hat{\beta}_q)^{\mathrm{\scriptscriptstyle T} }$$ be an estimator for model (1). The residuals of the fitted model are $$\hat{\varepsilon}_t = \tilde{\varepsilon}_t(\hat{{\theta}}_n)=y_t/\hat{h}_t^{1/2}$$, where $$\hat{h}_t= \tilde{h}_t(\hat{{\theta}}_n)$$. In the literature, the sample autocorrelation function of absolute or squared residuals is commonly used to check the adequacy of fitted conditional heteroscedastic models, whereas that of the residuals usually has very low power (Li & Li, 2008). Hence, we focus on the absolute residuals $$|\hat{\varepsilon}_t|$$. We first transform them with the residual empirical distribution function,   $$\label{Gnhat} \hat{G}_n(x)=\frac{1}{n}\sum_{t=1}^{n}I(|\hat{\varepsilon}_t| \leqslant x) \quad (0\leqslant x < \infty),$$ (3) and obtain $$\hat{G}_n(|\hat{\varepsilon}_t|)$$. Let $$G(\cdot)$$ be the distribution function of $$|\varepsilon_t|$$, so $$E\{G(|\varepsilon_t|)\}=0{\cdot}5$$.
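As an illustration, the transformation in (3) maps each absolute residual to its empirical distribution value. A minimal sketch in Python, with hypothetical residual values standing in for those of a fitted model:

```python
import bisect

def empirical_cdf_transform(abs_resid):
    # hat{G}_n(x) = (1/n) * #{t : |e_t| <= x}, evaluated at each |e_t|;
    # ties are counted with <=, matching the indicator in (3)
    n = len(abs_resid)
    sorted_r = sorted(abs_resid)
    return [bisect.bisect_right(sorted_r, x) / n for x in abs_resid]

# hypothetical absolute residuals
u = empirical_cdf_transform([0.5, 1.2, 0.8, 2.0])
# the largest residual maps to 1, and (absent ties) the values average to (n + 1) / (2n)
```
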
The sample autocorrelation function of $$\{\hat{G}_n(|\hat{\varepsilon}_t|)\}$$ at lag $$k$$ can be defined as $$\hat{\rho}_k = \hat{\gamma}_k / \hat{\gamma}_0$$, where the sample autocovariance function is   $$\label{gammakhat} \hat{\gamma}_k = \frac{1}{n} \sum_{t=k+1}^{n} \bigl\{\hat{G}_n(|\hat{\varepsilon}_t|)-0{\cdot}5\bigr \}\bigl\{\hat{G}_n(|\hat{\varepsilon}_{t-k}|) -0{\cdot}5\bigr\} \quad (k\geqslant 0)\text{.}$$ (4) Note that $$\hat{\gamma}_k$$ would take the same value if the squared residuals $$\hat{\varepsilon}_t^2$$ were used in (3) and (4). Andreou & Werker (2015) considered the $$f$$-rank autocorrelation coefficients (Hallin & Puri, 1994) of the residuals and squared residuals of autoregressive models with generalized autoregressive conditional heteroscedastic errors, which are fitted by the Gaussian quasi maximum likelihood method. The $$f$$-rank autocorrelation coefficients in Andreou & Werker (2015) have a symmetric form only when the reference distribution is Gaussian. The proposed $$\hat{\rho}_k$$ has a symmetric and simple form, which can be interpreted as the Spearman rank correlation coefficient (Wald & Wolfowitz, 1943; Bartels, 1982; Dufour & Roy, 1985; Hallin et al., 1985). Andreou & Werker (2015) used the local asymptotic normality approach (Le Cam & Yang, 1990; van der Vaart, 1998; Andreou & Werker, 2012) to derive the limiting distributions of residual-based statistics. To apply the method of Andreou & Werker (2015), we would have to assume that the residuals are based on the true values of $$\{\,y_0, y_{-1}, \ldots\}$$, which are unobservable in practice. This problem is circumvented by our asymptotic approach. For a predetermined positive integer $$M$$, we first derive the asymptotic null distribution of $$\hat{\rho} =(\hat{\rho}_1, \ldots, \hat{\rho}_M)^{\mathrm{\scriptscriptstyle T} }$$.
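The autocovariance in (4) requires only centring at 0·5. A small sketch, assuming the transformed values $$u_t = \hat{G}_n(|\hat{\varepsilon}_t|)$$ have already been computed:

```python
def rank_autocorr(u, k):
    # hat{rho}_k = hat{gamma}_k / hat{gamma}_0, with
    # hat{gamma}_k = (1/n) * sum_{t=k+1}^{n} (u_t - 0.5)(u_{t-k} - 0.5) as in (4)
    n = len(u)
    def gamma(j):
        return sum((u[t] - 0.5) * (u[t - j] - 0.5) for t in range(j, n)) / n
    return gamma(k) / gamma(0)

# hypothetical transformed values
rho1 = rank_autocorr([0.25, 0.5, 0.75, 1.0], 1)
```
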
Let $$\mathcal{F}_t$$ be the $$\sigma$$-field generated by $$\{\varepsilon_t, \varepsilon_{t-1}, \ldots\}$$, and let $$g(\cdot)$$ be the density function of $$|\varepsilon_t|$$. Assumption 2. Under $$H_0$$, the estimator $$\hat{{\theta}}_n$$ admits the representation   \begin{equation*} n^{1/2} (\hat{{\theta}}_n-{\theta}_0 )=n^{-1/2}\sum_{t=1}^{n}\xi_t + o_{\rm p}(1), \end{equation*} where $$\{\xi_{t},\mathcal{F}_t\}$$ is a strictly stationary and ergodic martingale difference sequence with $$\Gamma=\mathrm{var}(\xi_t)<\infty$$. Assumption 3. The density $$g$$ satisfies the following conditions: (i) $$\lim_{x\rightarrow0}\,xg(x)=0$$; (ii) $$\lim_{x\rightarrow\infty}\,xg(x)=0$$; and (iii) $$g$$ is continuous on $$(0, \infty)$$. Let $$\kappa = E\{|\varepsilon_t| g(|\varepsilon_t|)\}$$ and   $\Sigma= {I}_M + 144\{ 0{\cdot}25\kappa^2{D}{\Gamma}{D}^{\mathrm{\scriptscriptstyle T} }+ 0{\cdot}5\kappa ({D}{Q}^{\mathrm{\scriptscriptstyle T} } +{Q}{D}^{\mathrm{\scriptscriptstyle T} })\},$ where $$I_M$$ is the $$M\times M$$ identity matrix, $$D=(d_{1}, \ldots, d_{M})^{\mathrm{\scriptscriptstyle T} }$$ and $$Q=(q_{1}, \ldots, q_{M})^{\mathrm{\scriptscriptstyle T} }$$, with   $d_{k}=E\!\left\{\frac{0{\cdot}5-G(|\varepsilon_{t-k}|)}{h_t} \frac{\partial h_t(\theta_0)}{\partial\theta}\right\}\!,\quad q_k=E\bigl[\{G(|\varepsilon_t|)-0{\cdot}5\} \{G(|\varepsilon_{t-k}|)-0{\cdot}5\}\xi_t\bigr]\text{.}$ Theorem 1. Suppose that $$H_0$$ and Assumptions 1–3 hold. If $${\Sigma}$$ is positive definite, then $$n^{1/2} \hat{{\rho}} \rightarrow N({0}, {\Sigma})$$ in distribution as $$n \rightarrow \infty$$.  Because $$g(x)=f(x)+f(-x)$$ for $$0\leqslant x<\infty$$, where $$f(\cdot)$$ is the density function of $$\varepsilon_t$$, we can estimate $$\kappa$$ by $$\hat{\kappa} = n^{-1}\sum_{t=1}^{n} |\hat{\varepsilon}_{t}|\{\hat{f}_n(|\hat{\varepsilon}_{t}|) + \hat{f}_n(-|\hat{\varepsilon}_{t}|)\}$$, where $$\hat{f}_n(\cdot)$$ is the kernel density estimator of $$f(\cdot)$$. 
Let $$\xi_t=\xi_t({\theta}_0)$$, i.e., the function $$\xi_t({\theta})$$ evaluated at $${\theta}_0$$. Let $${\tilde{\xi}}_t(\theta)$$ be obtained by replacing $$\{\,y_0^2,\ldots, y_{1-p}^2, h_0, \ldots, h_{1-q}\}$$ with their initial values in $$\xi_t({\theta})$$, and write $${\hat{\xi}}_t = {\tilde{\xi}}_t(\hat{\theta}_n)$$. We can estimate $${\Gamma}$$, $${D}$$ and $${Q}$$ by $${\hat{\Gamma}} = n^{-1}\sum_{t=1}^{n} {\hat{\xi}}_t {\hat{\xi}}_t^{\mathrm{\scriptscriptstyle T} }$$, $${\hat{D}}=(\hat{d}_1, \ldots, \hat{d}_M)^{\mathrm{\scriptscriptstyle T} }$$ and $${\hat{Q}}=({\hat{q}}_1, \ldots,{\hat{q}}_M)^{\mathrm{\scriptscriptstyle T} },$$ respectively, where $$\hat{d}_k = n^{-1} \sum_{t=k+1}^{n} \hat{h}_t^{-1} \{0{\cdot}5-\hat{G}_n(|\hat{\varepsilon}_{t-k}|)\} {\partial \tilde{h}_t(\hat{\theta}_n)}/{\partial\theta}$$ and $$\hat{q}_k=n^{-1} \sum_{t=k+1}^{n} \{\hat{G}_n(|\hat{\varepsilon}_{t}|)-0{\cdot}5 \} \{\hat{G}_n(|\hat{\varepsilon}_{t-k}|)-0{\cdot}5 \}{\hat{\xi}}_t\text{.}$$ Under the conditions of Theorem 1, it can be shown that $$\hat{\kappa}=\kappa+o_{\rm p}(1)$$, $${\hat{\Gamma}}={\Gamma}+o_{\rm p}(1)$$, $${\hat{D}}={D}+o_{\rm p}(1)$$ and $${\hat{Q}}={Q}+o_{\rm p}(1)$$. Thus, a consistent estimator $$\hat{{\Sigma}}$$ of $${\Sigma}$$ can be obtained, leading us to construct the test statistic   $Q(M) = n\hat{{\rho}}^{\mathrm{\scriptscriptstyle T} } \hat{\Sigma}^{-1} \hat{{\rho}},$ which under $$H_0$$ is asymptotically distributed as $$\chi^2_M$$, the chi-squared distribution with $$M$$ degrees of freedom. One could also employ $$n^{1/2}\hat{\rho}_k/\hat{\Sigma}_{kk}^{1/2}$$ to examine the significance of the residual autocorrelation at lag $$k$$ individually, where $$\hat{\Sigma}_{kk}$$ is the $$k$$th diagonal element of $$\hat{\Sigma}$$.

### 2.2. Goodness-of-fit test based on predetermined transformations

We can also consider the transformation with any predetermined function $$\Psi(\cdot)$$.
The sample autocorrelation function of $$\{\Psi(|\hat{\varepsilon}_t|)\}$$ at lag $$k$$ can be defined as $$\hat{\rho}_k^\Psi = \hat{\gamma}_k^\Psi/\hat{\gamma}_0^\Psi$$, where   $\hat{\gamma}_k^\Psi = \frac{1}{n} \sum_{t=k+1}^{n} \{\Psi(|\hat{\varepsilon}_t|)-\hat{\mu}_\Psi\}\{ \Psi(|\hat{\varepsilon}_{t-k}|)-\hat{\mu}_\Psi \} \quad (k\geqslant0),$ with $$\hat{\mu}_\Psi=n^{-1}\sum_{t=1}^{n}\Psi(|\hat{\varepsilon}_t|)$$, is the sample autocovariance function. Let $$\hat{\rho}_\Psi = (\hat{\rho}_1^\Psi, \ldots, \hat{\rho}_M^\Psi)^{\mathrm{\scriptscriptstyle T} }$$. Denote the first and second derivatives of $$\Psi$$ by $$\psi$$ and $$\dot{\psi}$$. Let $$\mu_\Psi=E\{\Psi(|\varepsilon_t|)\}$$, $$\sigma_\Psi^2=\mathrm{var}\{\Psi(|\varepsilon_t|)\}$$, $$\kappa_\Psi = E\{|\varepsilon_t| \psi(|\varepsilon_t|)\}$$ and   ${\Sigma}_\Psi= {I}_M + \sigma_\Psi^{-4}\bigl\{ 0{\cdot}25\kappa_\Psi^2D_\Psi{\Gamma}D_\Psi^{\mathrm{\scriptscriptstyle T} }+ 0{\cdot}5\kappa_\Psi (D_\Psi Q_\Psi^{\mathrm{\scriptscriptstyle T} } +Q_\Psi D_\Psi^{\mathrm{\scriptscriptstyle T} })\bigr\},$ where $$D_\Psi=(d_{1}^\Psi, \ldots, d_{M}^\Psi)^{\mathrm{\scriptscriptstyle T} }$$ and $$Q_\Psi=( q_{1}^\Psi, \ldots, q_{M}^\Psi)^{\mathrm{\scriptscriptstyle T} }$$, with   $d_{k}^\Psi=E\!\left\{\frac{\mu_\Psi-\Psi(|\varepsilon_{t-k}|)}{h_t} \frac{\partial h_t(\theta_0)}{\partial\theta}\right\}\!,\quad q_{k}^\Psi=E\big[\{\Psi(|\varepsilon_t|)-\mu_\Psi\}\{\Psi(|\varepsilon_{t-k}|)-\mu_\Psi \}\xi_t\big]\text{.}$ Assumption 4. There exists $$m>0$$ such that the function $$\Psi^*(x)=|\dot{\psi}(x)|x^2+|\psi(x)|x$$ satisfies $$\Psi^*(x)\leqslant Cx^m$$ for $$x>1$$ and $$\Psi^*(x)\leqslant C$$ for $$0\leqslant x\leqslant 1$$, where $$C>0$$ is a constant, and $$E(|\varepsilon_t|^m)<\infty$$ and $$E\{\Psi^2(|\varepsilon_t|)\}<\infty$$. Theorem 2. Suppose that $$H_0$$ and Assumptions 1, 2 and 4 hold. 
If $${\Sigma}_\Psi$$ is positive definite, then $$n^{1/2} \hat{\rho}_\Psi \rightarrow N({0}, {\Sigma}_\Psi)$$ in distribution as $$n \rightarrow \infty$$.  In a similar way, we can obtain a consistent estimator $$\hat{\Sigma}_\Psi$$ of the asymptotic covariance matrix $${\Sigma}_\Psi$$ using sample averages. Thus a goodness-of-fit test, $$Q_{\Psi}(M)= n\hat{{\rho}}_\Psi^{\mathrm{\scriptscriptstyle T} } \hat{\Sigma}_\Psi^{-1} \hat{{\rho}}_\Psi$$, can be constructed. The first interesting example is $$\Psi(x)=x^c$$ for some $$c>0$$, and Assumption 4 is implied by $$E(|\varepsilon_t|^{2c})<\infty$$. This includes existing tests based on absolute and squared residuals, which correspond to cases with $$c=1$$ and 2, respectively; see Li & Li (2005) and Li (2004). From the proof of Theorem 1, when $$\Psi$$ is bounded, Theorem 2 still holds if, instead of Assumption 4, the derivative $$\psi$$ satisfies the conditions on the density $$g$$ in Assumption 3. For Theorem 4 in § 3, the conditions can be similarly substituted. Motivated by the transformation $$\hat{G}_n$$ in the previous subsection, we can also consider $$\Psi=G$$, although $$G$$ is unknown in practice. Let $$G_n$$ denote the empirical distribution function of $$\{|\varepsilon_t|\}$$, defined as $$G_n(x)=n^{-1}\sum_{t=1}^{n}I(|\varepsilon_t| \leqslant x)$$ for $$0\leqslant x<\infty$$. From the proofs of Theorems 1 and 2, it can readily be verified that $$n^{1/2}\hat{\rho}_k$$, $$n^{1/2}\hat{\rho}^{G_n}_k$$ and $$n^{1/2}\hat{\rho}^{G}_k$$ are asymptotically equivalent. Proposition 1. Suppose that $$H_0$$ and Assumptions 1 and 3 hold with $$n^{1/2} (\hat{{\theta}}_n-{\theta}_0)=O_{\rm p}(1)$$. Then $$n^{1/2}(\hat{{\gamma}}_k- \hat{\gamma}_k^G)= o_{\rm p}(1)$$ and $$n^{1/2}(\hat{{\gamma}}_k- \hat{\gamma}_k^{G_n})= o_{\rm p}(1)$$ for any positive integer $$k$$. Moreover, $$\hat{{\gamma}}_0$$, $$\hat{{\gamma}}_0^G$$ and $$\hat{{\gamma}}_0^{G_n}$$ all converge in probability to $$1/12$$ as $$n\rightarrow\infty$$.  
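For a predetermined $$\Psi$$ the only change relative to (4) is that the centring constant 0·5 is replaced by the sample mean $$\hat{\mu}_\Psi$$. A sketch, with $$\Psi(x)=x^c$$ as the leading example:

```python
def psi_autocorr(abs_resid, Psi, k):
    # hat{rho}_k^Psi = hat{gamma}_k^Psi / hat{gamma}_0^Psi, with the
    # transformed values centred at their sample mean mu_hat_Psi
    z = [Psi(x) for x in abs_resid]
    n = len(z)
    mu = sum(z) / n
    def gamma(j):
        return sum((z[t] - mu) * (z[t - j] - mu) for t in range(j, n)) / n
    return gamma(k) / gamma(0)

# Psi(x) = x (c = 1) recovers the absolute-residual autocorrelation of Li & Li (2005);
# Psi(x) = x**2 (c = 2) recovers the squared-residual version
rho1_abs = psi_autocorr([0.5, 1.2, 0.8, 2.0], lambda x: x, 1)
```
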
To apply joint tests $$Q(M)$$ and $$Q_\Psi(M)$$, we can consider several specific values of order $$M$$ or select $$M$$ as   $$\label{bic} \tilde{M}=\mathop{\rm{arg\,max}}\limits_{d_{\mathrm{min}}\leqslant M\leqslant d_{\mathrm{max}}}\{Q(M)-M\log n\},\quad \tilde{M}_\Psi=\mathop{\rm{arg\,max}}\limits_{d_{\mathrm{min}}\leqslant M\leqslant d_{\mathrm{max}}}\{Q_\Psi(M)-M\log n\},$$ (5) where the integer $$M$$ is searched over a fixed range $$[d_{\mathrm{min}}, d_{\mathrm{max}}]$$ for $$d_{\mathrm{min}}\geqslant 1$$ and some large enough $$d_{\mathrm{max}}$$. As shown in § 5, the performance of the automatic tests is insensitive to the choice of $$d_\mathrm{max}$$. Corollary 1. (i) Under the conditions of Theorem 1, $$Q(\tilde{M})\rightarrow\chi_{d_{\mathrm{min}}}^2$$ in distribution as $$n\rightarrow\infty$$.  (ii) Under the conditions of Theorem 2, $$Q_\Psi(\tilde{M}_\Psi)\rightarrow\chi_{d_{\mathrm{min}}}^2$$ in distribution as $$n\rightarrow\infty$$.  In § 3 we demonstrate that under the local alternatives, $$n^{1/2}\hat{\rho}$$ is asymptotically normal with a possible shift in the mean, $$\varUpsilon=(\varUpsilon_1, \ldots, \varUpsilon_M)^{\mathrm{\scriptscriptstyle T} }$$; see Theorem 3. As a result, $$\lim_{n\rightarrow\infty}{\mathrm{pr}}(\tilde{M}=d_{\mathrm{min}})=1$$, which may be undesirable for particular local alternatives with $$\varUpsilon_1=\cdots=\varUpsilon_{d_{\mathrm{min}}}=0$$ and $$\varUpsilon_K\neq0$$ for some $$d_{\mathrm{min}} < K\leqslant d_{\mathrm{max}}$$, since in such cases $$Q(\tilde{M})$$ would have no power. The test $$Q_\Psi(\tilde{M}_\Psi)$$ would suffer from the same problem, which can be avoided by using a smaller penalty, such as the Akaike information criterion-type penalty $$2M$$, to ensure that the probability of choosing a value of $$M$$ larger than $$d_{\mathrm{min}}$$ is nonzero. However, as shown in § 5, doing so may lead to seriously inflated Type I error rates. 
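A sketch of the statistic $$Q(M)=n\hat{\rho}^{\mathrm{\scriptscriptstyle T}}\hat{\Sigma}^{-1}\hat{\rho}$$ and the selection rule (5); `rho_hat` and the supplied inverse covariance are placeholders for the plug-in quantities of § 2.1, and the toy numbers below are purely illustrative:

```python
import math

def portmanteau_Q(rho_hat, Sigma_inv, n):
    # Q(M) = n * rho' Sigma^{-1} rho; Sigma_inv is the inverse of the
    # estimated covariance matrix, supplied directly for simplicity
    M = len(rho_hat)
    return n * sum(rho_hat[i] * Sigma_inv[i][j] * rho_hat[j]
                   for i in range(M) for j in range(M))

def select_M(rho_hat, Sigma_inv_fn, n, d_min, d_max):
    # tilde{M} = argmax_{d_min <= M <= d_max} { Q(M) - M log n }, as in (5)
    def crit(M):
        return portmanteau_Q(rho_hat[:M], Sigma_inv_fn(M), n) - M * math.log(n)
    return max(range(d_min, d_max + 1), key=crit)

def identity(M):
    return [[1.0 if i == j else 0.0 for j in range(M)] for i in range(M)]

# toy example: with Sigma = I_M, Q(M) = n * sum_k rho_k^2 (Box-Pierce form);
# the large autocorrelation at lag 3 outweighs the BIC-type penalty
rho = [0.02, 0.01, 0.3]
M_tilde = select_M(rho, identity, n=500, d_min=1, d_max=3)  # picks M = 3
```
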
In practice, the aforementioned problem can be remedied by choosing a proper $$d_\mathrm{min}$$. Suppose that the sample autocorrelation function $$\hat{\rho}_k$$ falls clearly outside the 95% confidence interval at certain lags. To guarantee that the joint test, $$Q(\tilde{M})$$, takes into account at least one of the lags, we need only choose $$d_{\mathrm{min}}$$ to be the smallest such lag by simply examining the plot of the residual autocorrelations, $$\hat{\rho}_k$$; if such a smallest lag does not exist, then we may set $$d_{\mathrm{min}}=1$$.

## 3. Asymptotic power under local alternatives

To study the power of the proposed test, we consider the following local alternatives. For each $$n$$, the observed time series $$\{\,y_{1,n}, \ldots, y_{n,n}\}$$ is generated by   $$\label{garch_alt} H_{1n}:\quad y_{t,n}=\varepsilon_t h_{t,n}^{1/2}, \quad h_{t,n}=\omega_0+\sum_{i=1}^p\alpha_{0i}y_{t-i,n}^2+\sum_{j=1}^q\beta_{0j} h_{t-j,n}+n^{-1/2}s_{t,n},$$ (6) where the subscript $$n$$ is used to emphasize the dependence of $$y_{t,n}, h_{t,n}$$ and $$s_{t,n}$$ on $$n$$. For simplicity, we consider $$s_{t,n}=s(y_{t-1,n}^2, \ldots, y_{t-p^*,n}^2, h_{t-1,n}, \ldots,h_{t-q^*,n})$$ for some positive integers $$p^*> p$$ and $$q^*> q$$, where the function $$s$$ satisfies the following condition. Assumption 5. The function $$s$$ and all elements of its gradient $$\nabla s$$ are nonnegative everywhere. Assumption 6. There exists a positive integer $$n_0$$ such that for each $$n\geqslant n_0$$, $$\{\,y_{t,n}\}$$ and $$\{h_{t,n}\}$$ are strictly stationary and ergodic processes, and $$E(s_{t,n_0}^{\delta_0})<\infty$$ for some constant $$\delta_0>0$$ independent of $$n$$. The nonnegativity of $$s$$ guarantees that $$h_{t,n}\geqslant 0$$; see Nelson & Cao (1992) for a discussion of the relaxation of the nonnegativity constraints on the parameters of generalized autoregressive conditional heteroscedastic models.
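To make the drifting sequence (6) concrete, here is a simulation sketch under a hypothetical GARCH(1,1) null with one omitted ARCH lag, $$s_{t,n}=a_2 y_{t-2,n}^2$$ (so $$p^*=2$$), entering at rate $$n^{-1/2}$$; all parameter values are illustrative, not from the paper:

```python
import random

def simulate_H1n(n, omega=0.1, alpha=0.1, beta=0.8, a2=0.2, seed=1):
    # y_{t,n} = eps_t * h_{t,n}^{1/2},
    # h_{t,n} = omega + alpha * y_{t-1,n}^2 + beta * h_{t-1,n} + n^{-1/2} * s_{t,n},
    # with s_{t,n} = a2 * y_{t-2,n}^2; setting a2 = 0 recovers the null model (1)
    rng = random.Random(seed)
    y = [0.0, 0.0]
    h = omega / (1.0 - alpha - beta)  # start at the unperturbed variance level
    out = []
    for _ in range(n):
        s = a2 * y[-2] ** 2
        h = omega + alpha * y[-1] ** 2 + beta * h + s / n ** 0.5
        y.append(rng.gauss(0.0, 1.0) * h ** 0.5)
        out.append(y[-1])
    return out

series = simulate_H1n(500)
```

Because the departure shrinks at rate $$n^{-1/2}$$, fitting a GARCH(1,1) to such a series gives residual autocorrelations whose limit is the shifted normal law of Theorem 3 rather than a fixed alternative.
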
The condition $$\nabla s\geqslant 0$$ is used to simplify our technical proofs, and we can similarly derive asymptotic results for other cases of $$\nabla s$$. The finite fractional moment of $$s_{t,n_0}$$ in Assumption 6 ensures the same to hold for $$y_{t,n}$$ and $$h_{t,n}$$, which will be needed in the proofs. Similar to (2), we define the functions   \begin{equation*} \varepsilon_{t,n}({\theta})=y_{t,n}\big/h_{t,n}^{1/2}({\theta}) , \quad h_{t,n}({\theta})=\omega+\sum_{i=1}^p\alpha_iy_{t-i,n}^2+\sum_{j=1}^q\beta_j h_{t-j,n}({\theta})\text{.} \end{equation*} For simplicity, with the initial values set to be independent of $$n$$, we denote the resulting functions by $$\tilde{\varepsilon}_{t,n}({\theta})$$ and $$\tilde{h}_{t,n}({\theta})$$, respectively. Under $$H_{1n}$$, the residuals are calculated as $$\hat{\varepsilon}_{t} = \tilde{\varepsilon}_{t,n}(\hat{{\theta}}_n)=y_{t,n}/\hat{h}_{t,n}^{1/2}$$, where $$\hat{h}_{t,n}= \tilde{h}_{t,n}(\hat{{\theta}}_n)$$. While $$h_t=h_t(\theta_0)$$ in (2), we can show that the departure $$n^{-1/2}s_{t,n}$$ in (6) results in   \begin{equation*}\label{eq_diff} h_{t,n}-h_{t,n}(\theta_0)=n^{-1/2}\sum_{k=0}^{\infty}e_1^{\mathrm{\scriptscriptstyle T} } B_{0}^ke_1s_{t-k,n}\geqslant 0, \end{equation*} where $$e_1=(1,0,\ldots,0)^{\mathrm{\scriptscriptstyle T} }$$ and   \begin{equation*} B_{0} = \begin{pmatrix} \beta_{01}\; & \;\cdots\; & \;\beta_{0q-1}\; & \;\beta_{0q}\\ & \;I_{q-1}\; &&\; 0\\ \end{pmatrix} \end{equation*} is a $$q\times q$$ matrix. Define the nonnegative $$\mathcal{F}_{t-1}$$-measurable random variables   $r_{t,n}=\frac{n^{1/2}\{h_{t,n}-h_{t,n}(\theta_0)\}}{h_{t,n}(\theta_0)}=\frac{1}{h_{t,n}(\theta_0)}\sum_{k=0}^{\infty}e_1^{\mathrm{\scriptscriptstyle T} } B_{0}^ke_1s_{t-k,n}\text{.}$ Let $$s_t=s(y_{t-1}^2, \ldots, y_{t-p^*}^2, h_{t-1}, \ldots,h_{t-q^*})$$ and   $$\label{eq_rt} r_t=\frac{1}{h_t}\sum_{k=0}^{\infty}e_1^{\mathrm{\scriptscriptstyle T} } B_{0}^ke_1s_{t-k}\text{.}$$ (7) Assumption 7. 
There exist processes $$\{r_{t,n}^{(l)}: t=1,\ldots,n\}$$ and $$\{r_{t,n}^{(u)}: t=1,\ldots,n\}$$ for each $$n$$ that satisfy the following conditions: (i) all the $$r_{t,n}^{(l)}$$ and $$r_{t,n}^{(u)}$$ are $$\mathcal{F}_{t-1}$$-measurable; (ii) the processes $$\{r_{t,n_0}^{(l)}\}$$ and $$\{r_{t,n_0}^{(u)}\}$$ are strictly stationary and ergodic with $$r_{t,n_0}^{(l)}\leqslant r_{t,n}\leqslant r_{t,n_0}^{(u)}$$ for all $$n\geqslant n_0$$; and (iii) for each fixed $$t$$, $$\:r_{t,n}^{(l)}$$ increases monotonically with $$n$$, while $$r_{t,n}^{(u)}$$ decreases monotonically with $$n$$, i.e., $$r_{t,n}^{(l)}\leqslant r_{t,n+1}^{(l)}\leqslant r_{t,n+1}^{(u)} \leqslant r_{t,n}^{(u)}$$ for all $$n$$, and $$\lim_{n\rightarrow\infty} r_{t,n}^{(l)}=\lim_{n\rightarrow\infty} r_{t,n}^{(u)}=r_t$$ with probability $$1$$. Proposition 2. Consider the case of $$s_{t,n}=a_0+\sum_{i=1}^{p^*}a_iy_{t-i,n}^2+\sum_{j=1}^{q^*}a_{p^*+j}h_{t-j,n}$$, where $$a_0,a_1, \ldots, a_{p^*+q^*}$$ are nonnegative constants. Under Assumptions 1 and 6, if $$q>0$$, then the conditions in Assumption 7 hold and $$E\{(r_{t,n_0}^{(u)})^{m}\}<\infty$$ for any $$m>0$$.  For other forms of $$s_{t,n}$$, Assumption 7 can also be readily verified, although additional moment restrictions on $$y_{t,n}$$ may be required. Assumption 2’. Under $$H_{1n}$$, the estimator $$\hat{{\theta}}_n$$ admits the representation   \begin{equation*} n^{1/2}(\hat{{\theta}}_n-{\theta}_0)=n^{-1/2}\sum_{t=1}^{n}\xi_{t,n} + \varDelta+ o_{\rm p}(1), \end{equation*} where $$\{\xi_{t,n},\mathcal{F}_t: t=1,\ldots,n\}$$ is a strictly stationary and ergodic martingale difference sequence for each sufficiently large $$n$$, $$\:\lim_{n\rightarrow\infty}\mathrm{var}(\xi_{t,n}) =\Gamma$$, and $$\varDelta\in\mathbb{R}^{p+q+1}$$ is a constant vector. It is possible to derive the explicit form of the shift $$\varDelta$$ under additional regularity conditions for the estimator $$\hat{\theta}_n$$ and those of the underlying model in (6).
Specifically, assuming that model (6) is locally asymptotically normal (van der Vaart, 1998), by Le Cam’s third lemma we have $$\varDelta=\lim_{n\rightarrow\infty}\mathrm{cov}\{n^{-1/2}\sum_{t=1}^n\xi_t, \Delta^{(n)}(\theta_0)\}$$, where $$\Delta^{(n)}(\theta_0) = -0{\cdot}5n^{-1/2}\sum_{t=1}^{n} \{1+\varepsilon_t{f^\prime(\varepsilon_t)}/{f(\varepsilon_t)}\}h_t^{-1}{\partial h_t(\theta_0)}/{\partial \theta}$$ is the central sequence of the garch($$p, q$$) model (Drost & Klaassen, 1997). Let $$V=(v_1, \ldots, v_M)^{\mathrm{\scriptscriptstyle T} }$$ with $$v_k=E[\{0{\cdot}5-G(|\varepsilon_{t-k}|)\}r_t]$$, and let $$V_\Psi=(v_1^\Psi, \ldots, v_M^\Psi)^{\mathrm{\scriptscriptstyle T} }$$ with $$v_k^\Psi=E[\{\mu_\Psi-\Psi(|\varepsilon_{t-k}|)\}r_t]$$. Theorem 3. Suppose that $$H_{1n}$$ and Assumptions 1, 2’, 3 and 5–7 hold with $$E\{(r_{t,n_0}^{(u)})^{4+\delta_1}\}<\infty$$ for some $$\delta_1>0$$. If $${\Sigma}$$ is positive definite, then $$n^{1/2} \hat{\rho} \rightarrow N(\varUpsilon, \Sigma)$$ in distribution as $$n \rightarrow \infty$$, where $$\varUpsilon=6\kappa(D\varDelta-V)$$, with $$\kappa$$, $$D$$ and $$\Sigma$$ defined as in Theorem 1. Theorem 4. Suppose that $$H_{1n}$$ and Assumptions 1, 2’ and 4–7 hold with $$E\{(r_{t,n_0}^{(u)})^{4+\delta_1}\}<\infty$$ for some $$\delta_1>0$$. If $${\Sigma}_\Psi$$ is positive definite, then $$n^{1/2} \hat{\rho}_\Psi \rightarrow N(\varUpsilon_\Psi, {\Sigma}_\Psi)$$ in distribution as $$n \rightarrow \infty$$, where $$\varUpsilon_\Psi=0{\cdot}5\kappa_\Psi (D_\Psi\varDelta-V_\Psi)/\sigma_\Psi^{2}$$, with $$\kappa_\Psi$$, $$D_\Psi$$, $$\sigma_\Psi$$ and $$\Sigma_\Psi$$ defined as in Theorem 2. 
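The geometric weights $$e_1^{\mathrm{\scriptscriptstyle T}}B_0^ke_1$$ that build the drift term $$r_t$$ in (7) can be computed directly from the companion matrix $$B_0$$; a small sketch (for $$q=1$$ the weights reduce to $$\beta_{01}^k$$):

```python
def companion_weights(beta, K):
    # B0 has first row (beta_1, ..., beta_q), the identity I_{q-1} on the
    # block below, and zeros in the last column below the first row;
    # returns e1' B0^k e1 for k = 0, ..., K - 1
    q = len(beta)
    v = [1.0] + [0.0] * (q - 1)  # v = B0^k e1, starting at k = 0
    out = []
    for _ in range(K):
        out.append(v[0])
        # B0 v = (beta . v, v_1, ..., v_{q-1})
        v = [sum(b * x for b, x in zip(beta, v))] + v[:-1]
    return out

w = companion_weights([0.5], 4)  # [1.0, 0.5, 0.25, 0.125]
```
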
We can show that under $$H_{1n}$$, the consistency of the estimators $$\hat{\Sigma}$$ and $$\hat{\Sigma}_\Psi$$ in the previous section still holds, and hence $$Q(M)$$ and $$Q_{\Psi}(M)$$ converge to the noncentral $$\chi^2_M$$ distribution with noncentrality parameter $$c_\Psi=\varUpsilon_\Psi^{\mathrm{\scriptscriptstyle T} } \Sigma^{-1}_\Psi\varUpsilon_\Psi$$ as $$n\rightarrow\infty$$, where $$\Psi=G$$ for $$Q(M)$$. In other words, the local power is determined by the value of $$c_\Psi$$.

## 4. Two applications

In this section we apply the asymptotic results from §§ 2 and 3 to generalized autoregressive conditional heteroscedastic models fitted by the Laplacian quasi maximum likelihood method (Berkes & Horváth, 2004) and the least absolute deviations method (Peng & Yao, 2003). We first derive the asymptotic distributions of these two estimators under $$H_{1n}$$. Let us write   $J=E\!\left\{\frac{1}{h_t^2} \frac{\partial h_t(\theta_0)}{\partial\theta} \frac{\partial h_t(\theta_0)}{\partial\theta^{\mathrm{\scriptscriptstyle T} }}\right\}\!,\quad \lambda=E\!\left \{\frac{r_t}{h_t}\frac{\partial h_t(\theta_0)}{\partial\theta}\right \}\!,$ where $$r_t$$ is defined as in (7). For model (1), the Laplacian quasi maximum likelihood estimator (Berkes & Horváth, 2004) is defined as $$\hat{\theta}_n^{\mathrm{LQML}}={\rm{arg\,min}}_{\theta\in\Theta}n^{-1}\sum_{t=1}^{n} \{\log\tilde{h}_{t}^{1/2}(\theta)+{|y_{t}|}/{\tilde{h}_{t}^{1/2}(\theta)} \}$$, where the identifiability conditions are $$E(\varepsilon_t)=0$$ and $$E(|\varepsilon_t|)=1$$. Under $$H_0$$ and Assumption 1, if $$E(\varepsilon_t^2)<\infty$$, then we can show that   $n^{1/2}(\hat{\theta}^{\mathrm{LQML}}_n-{\theta}_0 )=\frac{2J^{-1}}{n^{1/2}}\sum_{t=1}^{n} \frac{|\varepsilon_t|-1}{h_t}\frac{\partial h_t(\theta_0)}{\partial\theta} +o_{\rm p}(1),$ which converges in distribution to $$N[0, \,4 \{E(\varepsilon_t^2)-1\}J^{-1}]$$ as $$n\rightarrow\infty$$. Theorem 5. Suppose that $$H_{1n}$$ and Assumptions 1 and 5–7 hold.
If $$E(r_{t,n_0}^{(u)})<\infty$$, then $$\hat{\theta}_n^{\mathrm{LQML}}\rightarrow \theta_0$$ almost surely as $$n\rightarrow\infty$$. Moreover, if $$E(\varepsilon_t^2)<\infty$$ and $$E\{(r_{t,n_0}^{(u)})^{2+\delta_1}\}<\infty$$ for some $$\delta_1>0$$, then  $n^{1/2} (\hat{\theta}_n^{\mathrm{LQML}}-{\theta}_0)=\frac{2J^{-1}}{n^{1/2}}\sum_{t=1}^{n} \frac{|\varepsilon_t|-1}{h_{t,n}(\theta_0)}\frac{\partial h_{t,n}(\theta_0)}{\partial\theta} +J^{-1}\lambda+o_{\rm p}(1),$ which converges in distribution to $$N[J^{-1}\lambda, \, 4 \{E(\varepsilon_t^2)-1\}J^{-1}]$$ as $$n\rightarrow\infty$$.  For model (1), the least absolute deviations estimator in Peng & Yao (2003) is defined as $$\hat{\theta}_n^{\mathrm{LAD}}={\rm{arg\,min}}_{\theta\in\Theta}n^{-1}\sum_{t=1}^{n} |\log y_{t}^2-\log \tilde{h}_{t}({\theta}) |$$, where the identifiability condition is $$\mathrm{median}(|\varepsilon_t|)=1$$. Under $$H_0$$ and Assumption 1, if $$g(1)>0$$, then it can be shown that   $n^{1/2} (\hat{\theta}^{\mathrm{LAD}}_n-{\theta}_0 )=\frac{\{g(1)J\}^{-1}}{n^{1/2}}\sum_{t=1}^{n} \frac{{\mathrm{sgn}}(|\varepsilon_t|-1)}{h_t}\frac{\partial h_t(\theta_0)}{\partial\theta} +o_{\rm p}(1),$ which converges in distribution to $$N[0, \,\{g(1)\}^{-2}J^{-1}]$$ as $$n\rightarrow\infty$$, where $${\mathrm{sgn}}(x)=I(x>0)- I(x<0)$$ is the sign function; see Chen & Zhu (2015). Theorem 6. If $$H_{1n}$$ and Assumptions 1 and 5–7 hold, then $$\hat{\theta}_n^{\mathrm{LAD}}\rightarrow \theta_0$$ almost surely as $$n\rightarrow\infty$$. Moreover, if $$g(1)>0$$ and $$E\{(r_{t,n_0}^{(u)})^{4+\delta_1}\}<\infty$$ for some $$\delta_1>0$$, then  $n^{1/2} (\hat{\theta}_n^{\mathrm{LAD}}-{\theta}_0 )=\frac{\{g(1)J\}^{-1}}{n^{1/2}}\sum_{t=1}^{n} \frac{{\mathrm{sgn}}(|\varepsilon_t|-1)}{h_{t,n}(\theta_0)}\frac{\partial h_{t,n}(\theta_0)}{\partial\theta} +J^{-1}\lambda+o_{\rm p}(1),$ which converges in distribution to $$N[J^{-1}\lambda, \,\{g(1)\}^{-2}J^{-1}]$$ as $$n\rightarrow\infty$$.  
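Both criteria can be evaluated with the same truncated volatility recursion (2), using zero initial values as in § 2.1. A minimal sketch for a GARCH(1,1) specification, not the authors' implementation:

```python
import math

def volatilities(y, theta):
    # tilde{h}_t = omega + alpha * y_{t-1}^2 + beta * tilde{h}_{t-1},
    # with initial values y_0^2 = h_0 = 0
    omega, alpha, beta = theta
    h, y_prev, out = 0.0, 0.0, []
    for yt in y:
        h = omega + alpha * y_prev ** 2 + beta * h
        out.append(h)
        y_prev = yt
    return out

def laplacian_qml_loss(y, theta):
    # n^{-1} sum_t { log h_t^{1/2} + |y_t| / h_t^{1/2} }  (Berkes & Horvath, 2004)
    h = volatilities(y, theta)
    return sum(0.5 * math.log(ht) + abs(yt) / math.sqrt(ht)
               for yt, ht in zip(y, h)) / len(y)

def lad_loss(y, theta):
    # n^{-1} sum_t | log y_t^2 - log h_t |  (Peng & Yao, 2003)
    h = volatilities(y, theta)
    return sum(abs(math.log(yt ** 2) - math.log(ht))
               for yt, ht in zip(y, h)) / len(y)
```

Either criterion can then be minimized over $$\Theta$$ with a generic optimizer; the identifiability normalizations $$E(|\varepsilon_t|)=1$$ and $$\mathrm{median}(|\varepsilon_t|)=1$$ are properties of the innovation distribution, not constraints imposed in the code.
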
Given Theorems 5 and 6, the estimators $$\hat{\theta}_n^{\mathrm{LQML}}$$ and $$\hat{\theta}_n^{\mathrm{LAD}}$$ both satisfy Assumptions 2 and 2’ with $$\varDelta=J^{-1}\lambda$$, and we can then obtain the asymptotic distributions of $$n^{1/2}\hat{\rho}$$ and $$n^{1/2}\hat{\rho}_\Psi$$ under both $$H_0$$ and $$H_{1n}$$. Moreover, Theorems 1–4 ensure that the proposed statistic, $$n^{1/2}\hat{\rho}$$, has the same asymptotic distributions as $$n^{1/2}\hat{\rho}_\Psi$$ with $$\Psi=G$$ under both $$H_0$$ and $$H_{1n}$$. Therefore, we focus on $$n^{1/2}\hat{\rho}_\Psi$$ in the following discussion. By Theorems 1–4, under both $$H_0$$ and $$H_{1n}$$, the asymptotic covariance matrix of $$n^{1/2}\hat{\rho}_\Psi$$ is   $\Sigma_\Psi= {I}_M + \sigma_\Psi^{-4}\big(\kappa_\Psi^2\{E(\varepsilon_t^2)-1\}+2\kappa_\Psi E\big[\{\mu_\Psi-\Psi(|\varepsilon_t|)\}(|\varepsilon_t|-1)\big]\big) D_\Psi J^{-1}D_\Psi^{\mathrm{\scriptscriptstyle T} }$ for the Laplacian quasi maximum likelihood estimator $$\hat{\theta}_n^{\mathrm{LQML}}$$, and is   $\Sigma_\Psi= {I}_M + \sigma_\Psi^{-4}\left\{\frac{\kappa_\Psi^2}{4g^2(1)} +\frac{\kappa_\Psi}{g(1)} E\big[\{\mu_\Psi-\Psi(|\varepsilon_t|)\}\mathrm{sgn}(|\varepsilon_t|-1)\big] \right\} D_\Psi J^{-1}D_\Psi^{\mathrm{\scriptscriptstyle T} }$ for the least absolute deviations estimator $$\hat{\theta}_n^{\mathrm{LAD}}$$. When $$\Psi=G$$, $$\:\sigma_\Psi^2=1/12$$. Moreover, under $$H_{1n}$$, the asymptotic distributions of $$n^{1/2} \hat{\rho}_\Psi$$ for both estimators are shifted by   $\varUpsilon_\Psi=0{\cdot}5\kappa_\Psi (D_\Psi J^{-1}\lambda-V_\Psi)/\sigma_\Psi^{2}\text{.}$ We now consider when $$\varUpsilon_\Psi$$ is nonzero. Let $$b_1={\rm{arg\,min}}_{b\in\mathbb{R}^{p+q+1}}E\{(r_t- X_t^{\mathrm{\scriptscriptstyle T} } b)^2\}$$ and $$b_2={\rm{arg\,min}}_{b\in\mathbb{R}^{p+q+1}}E[\{\Psi(|\varepsilon_{t-k}|)- X_t^{\mathrm{\scriptscriptstyle T} } b\}^2]$$, where $$X_t=h_t^{-1}{\partial h_t(\theta_0)}/{\partial\theta}$$. 
Define the partial covariance (Fan & Yao, 2003)   $$\mathrm{pcov}\{r_t,\Psi(|\varepsilon_{t-k}|)\mid X_t\}=E[(r_t-X_t^{\mathrm{\scriptscriptstyle T} } b_1)\{\Psi(|\varepsilon_{t-k}|)-X_t^{\mathrm{\scriptscriptstyle T} } b_2\}]\text{.}$$ (8) Because $$b_1=J^{-1}\lambda$$, the $$k$$th element of the term $$D_\Psi J^{-1}\lambda-V_\Psi$$, i.e., $$d_k^{\Psi {\mathrm{\scriptscriptstyle T} }} J^{-1}\lambda-v_k^\Psi$$, can be written as $$-\mathrm{pcov}\{r_t,\Psi(|\varepsilon_{t-k}|)\mid X_t\}$$. Moreover, as $$\kappa_\Psi>0$$, the $$k$$th element of $$\varUpsilon_\Psi$$ is zero if and only if the partial covariance in (8) is zero. Consider the example in Proposition 2, where we have $$s_t=s_{1,t}+s_{2,t}$$ with $$s_{1,t}=a_0+\sum_{i=1}^{p}a_iy_{t-i}^2+\sum_{j=1}^{q}a_{p^*+j}h_{t-j}$$ and $$s_{2,t}=\sum_{i=p+1}^{p^*}a_iy_{t-i}^2+\sum_{j=q+1}^{q^*}a_{p^*+j}h_{t-j}$$. Then $$r_t=X_t^{\mathrm{\scriptscriptstyle T} } a+h_{t}^{-1}\sum_{k=0}^{\infty}e_1^{\mathrm{\scriptscriptstyle T} } B_{0}^ke_1s_{2,t-k}$$, where $$a=(a_0,a_1,\ldots,a_p,a_{p^*+1},\ldots,a_{p^*+q})^{\mathrm{\scriptscriptstyle T} }$$. As a result, when $$s_{2,t}= 0$$, i.e., when the model is correctly specified, the partial covariance in (8) is zero for all $$k>0$$, and the test $$Q_\Psi(M)$$ has no power. If the model is misspecified, i.e., when $$s_{2,t}\neq0$$, then by a method similar to the proof of identifiability for generalized autoregressive conditional heteroscedastic models (Francq & Zakoïan, 2004) we can show that $$r_t-X_t^{\mathrm{\scriptscriptstyle T} } b_1\neq 0$$ with probability 1, provided that Assumption 1 holds. Thus, (8) becomes nonzero at some values of $$k$$, giving the test nontrivial power.
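The sample analogue of the partial covariance in (8) replaces the population projections $$b_1$$ and $$b_2$$ by least-squares fits. A small sketch, with hypothetical inputs `r`, `psi` and `X` standing in for realizations of $$r_t$$, $$\Psi(|\varepsilon_{t-k}|)$$ and $$X_t$$:

```python
import numpy as np

def partial_cov(r, psi, X):
    """Sample analogue of the partial covariance in (8): project r and psi
    onto the columns of X by least squares, then average the product of
    the two residual series."""
    b1, *_ = np.linalg.lstsq(X, r, rcond=None)
    b2, *_ = np.linalg.lstsq(X, psi, rcond=None)
    return np.mean((r - X @ b1) * (psi - X @ b2))
```

In particular, when `r` lies exactly in the column span of `X`, as under correct specification where $$r_t=X_t^{\mathrm{\scriptscriptstyle T} } a$$, the partial covariance vanishes.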
In general, the local power of $$Q_\Psi(M)$$ is determined by the noncentrality parameter $$c_\Psi=\varUpsilon_\Psi^{\mathrm{\scriptscriptstyle T} } \Sigma^{-1}_\Psi\varUpsilon_\Psi$$, which depends on the departure $$s_{t,n}$$, the underlying model, the estimator $$\hat{\theta}_n$$ and the function $$\Psi$$. A direct comparison of the values of $$c_\Psi$$ across different functions $$\Psi$$ is difficult, so we next calculate $$c_\Psi$$ for specific scenarios. Table 1 presents the values of $$c_\Psi$$ under local alternatives of the garch$$(1,1)$$ model with $$(\omega_0, \alpha_0, \beta_0)=(1, 0{\cdot}3, 0{\cdot}2)$$ and three types of departure, namely $$s_{t,n}=G(|y_{t-2,n}|)$$, $$|y_{t-2,n}|$$ and $$y_{t-2,n}^2$$, for $$\{\varepsilon_t\}$$ following the zero-mean normal distribution and Student's $$t_7$$, $$t_{5}$$, $$t_3$$, $$t_{2{\cdot}5}$$ and $$t_1$$ distributions, standardized such that $$\mathrm{median}(|\varepsilon_t|)=1$$. We assume that the model is estimated by the least absolute deviations method, and we approximate the quantities in $$\varUpsilon_\Psi$$ and $$\Sigma_\Psi$$ by sample averages based on a generated sequence $$\{y_1,\ldots,y_n\}$$ with $$n=100\,000$$. We set $$M=6$$ and compare the three transformations $$\Psi(x)=G(x)$$, $$x$$ and $$x^2$$. Some values are left blank in Table 1 because of violations of the moment conditions on $$\varepsilon_t$$. It can be seen that $$\Psi=G$$ dominates the other transformations when $$E(\varepsilon_t^4)=\infty$$, and even for moderate-tailed or Gaussian innovations when the departure is $$s_{t,n}=G(|y_{t-2,n}|)$$ or $$|y_{t-2,n}|$$. The desirable performance of $$\Psi=G$$ is also observed in other situations; see the Supplementary Material. Moreover, consistent with these results, our first simulation experiment in § 5 demonstrates that the proposed test, $$Q(M)$$, performs favourably compared with existing tests.

Table 1.
Noncentrality parameter $$c_{\Psi}$$ $$(\times 10^2)$$ under different local alternatives of the garch$$(1,1)$$ model with $$(\omega_0, \alpha_0, \beta_0)=(1, 0{\cdot}3, 0{\cdot}2)$$, for $$\Psi(x)=G(x)$$, $$x$$ and $$x^2$$. The three column groups correspond to the departures $$s_{t,n}=G(|y_{t-2,n}|)$$, $$s_{t,n}=|y_{t-2,n}|$$ and $$s_{t,n}=y_{t-2,n}^2$$, respectively.

| | $$G$$ | $$x$$ | $$x^2$$ | $$G$$ | $$x$$ | $$x^2$$ | $$G$$ | $$x$$ | $$x^2$$ |
|---|---|---|---|---|---|---|---|---|---|
| $$t_1$$ | 3E-05 | | | 2E-03 | | | 99·52 | | |
| $$t_{2{\cdot}5}$$ | 0·05 | 3E-03 | | 1·17 | 0·13 | | 31·38 | 8·45 | |
| $$t_3$$ | 0·07 | 0·01 | | 1·32 | 0·27 | | 26·10 | 12·15 | |
| $$t_5$$ | 0·10 | 0·03 | 3E-03 | 1·42 | 0·72 | 0·11 | 17·07 | 16·86 | 3·98 |
| $$t_7$$ | 0·11 | 0·05 | 0·01 | 1·42 | 0·93 | 0·26 | 14·35 | 16·96 | 7·91 |
| Normal | 0·15 | 0·10 | 0·04 | 1·36 | 1·25 | 0·74 | 9·32 | 13·62 | 12·80 |
Small values are written in standard form, e.g., 3E-05 means $$3\times 10^{-5}$$.

5. Simulation experiments

This section presents the results of three simulation experiments carried out to assess the empirical power of the proposed test $$Q(M)$$, to evaluate the performance of the automatic method for selecting $$M$$, and to verify the asymptotic equivalence in Proposition 1. The least absolute deviations estimator (Peng & Yao, 2003) is employed throughout. In the first experiment, we compare the power of the proposed test, $$Q(M)$$, with that of three existing goodness-of-fit tests: the sign-based test of Chen & Zhu (2015), $$Q_{{\mathrm{sgn}}}(M)$$; the test based on absolute residuals in Li & Li (2005), $$Q_{\mathrm{abs}}(M)$$; and the test based on squared residuals in Li (2004), $$Q_{\mathrm{sqr}}(M)$$. For comparison, $$M$$ is fixed at 6. We generate 1000 replications from   $$y_{t,n}=\varepsilon_t h_{t,n}^{1/2}, \hspace{5mm} h_{t,n}=0{\cdot}01 +0{\cdot}03y_{t-1,n}^2 +0{\cdot}2 h_{t-1,n}+n^{-1/2}s_{t,n},$$ (9) where $$\{\varepsilon_t\}$$ are independent and identically distributed, following the normal distribution with mean zero or Student's $$t_7$$, $$t_{5}$$, $$t_3$$, $$t_{2{\cdot}5}$$ or $$t_1$$ distribution, standardized such that median$$(|\varepsilon_t|)=1$$. We consider the departures $$s_{t,n}=2y_{t-2,n}^2$$, $$2|y_{t-2,n}|$$ and $$2G(|y_{t-2,n}|)$$, and the sample size is $$n=1000$$. The density function of $$\varepsilon_t$$ is estimated by the kernel density method with the Gaussian kernel and its rule-of-thumb bandwidth, $$h=0{\cdot}9n^{-1/5}\min(\hat{\sigma},\hat{R}/1{\cdot}34)$$, where $$\hat{\sigma}$$ and $$\hat{R}$$ are the sample standard deviation and interquartile range of the residuals $$\{\hat{\varepsilon}_t\}$$, respectively; see Silverman (1986). Figure 1 displays the power of the four tests.
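For reference, the local-alternative process (9) can be simulated directly from its recursion. In this sketch the condition median$$(|\varepsilon_t|)=1$$ is imposed only approximately, by rescaling the innovations by their sample median absolute value, and the burn-in length is an arbitrary choice:

```python
import numpy as np

def simulate_h1n(n, innov, s_fun, burn=500, rng=None):
    """Simulate y_{t,n} from the local alternative (9):
    h_{t,n} = 0.01 + 0.03 y_{t-1,n}^2 + 0.2 h_{t-1,n} + n^{-1/2} s_{t,n}.
    `innov(m, rng)` draws m raw innovations; `s_fun` maps y_{t-2,n}
    to the departure s_{t,n}."""
    if rng is None:
        rng = np.random.default_rng()
    m = n + burn
    eps = innov(m, rng)
    eps = eps / np.median(np.abs(eps))      # approximate median(|eps|) = 1
    y = np.zeros(m)
    h = np.full(m, 0.01 / (1.0 - 0.03 - 0.2))  # start near unconditional level
    for t in range(1, m):
        s = s_fun(y[t - 2]) if t >= 2 else 0.0
        h[t] = 0.01 + 0.03 * y[t - 1] ** 2 + 0.2 * h[t - 1] + s / np.sqrt(n)
        y[t] = eps[t] * np.sqrt(max(h[t], 1e-12))
    return y[burn:]
```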
When the tails of $$\varepsilon_t$$ become heavier, the power of $$Q_{\mathrm{abs}}(M)$$ and of $$Q_{\mathrm{sqr}}(M)$$ drops dramatically. Although both $$Q(M)$$ and $$Q_{{\mathrm{sgn}}}(M)$$ maintain their power, $$Q(M)$$ is clearly more powerful, suggesting that the information loss from its transformation of the absolute residuals is relatively small. Finally, although $$Q_{\mathrm{abs}}(M)$$ performs well when $$s_{t,n}=2y_{t-2,n}^2$$ and $$\varepsilon_t$$ is lighter-tailed, the proposed $$Q(M)$$ is almost always the most powerful test for the other two types of departure, even when $$\varepsilon_t$$ is moderate-tailed.

Fig. 1. Power (%) of four goodness-of-fit tests, $$Q(6)$$ (circles), $$Q_{\mathrm{sgn}}(6)$$ (triangles), $$Q_{\mathrm{abs}}(6)$$ (squares) and $$Q_{\mathrm{sqr}}(6)$$ (pluses), for six different innovation distributions and three different departures: (a) $$s_{t,n}=2 y_{t-2,n}^2$$; (b) $$s_{t,n}=2|y_{t-2,n}|$$; (c) $$s_{t,n}=2 G(|y_{t-2,n}|)$$.

The second experiment evaluates the performance of the proposed order selection method. We compare three methods: the Bayesian information criterion-type method in (5), where the penalty term is $$M\log n$$; the Akaike information criterion-type method, for which the penalty term in (5) is replaced by $$2M$$; and the mixed method, for which the penalty term in (5) is replaced by $$2M$$ if and only if $$n^{1/2}\max(|\hat{\rho}_1|, \ldots, |\hat{\rho}_{d_{\mathrm{max}}}|)> (\log n)^{1/2}$$. We set $$d_{\mathrm{min}}=1$$ and $$d_{\mathrm{max}}=5$$, $$25$$ or $$50$$.
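Criterion (5) itself is not reproduced in this excerpt; assuming it selects $$M$$ by maximizing the cumulated squared autocorrelations against the stated penalties, the three rules can be sketched as:

```python
import numpy as np

def select_M(rho_hat, n, d_min=1, d_max=None, penalty="bic"):
    """Assumed form of the order selection in (5): choose M in
    [d_min, d_max] maximizing n * sum_{k<=M} rho_k^2 minus a penalty,
    M*log(n) for the BIC-type rule and 2*M for the AIC-type rule.
    The mixed rule switches to the AIC penalty only when
    sqrt(n) * max|rho_k| exceeds sqrt(log n)."""
    rho = np.asarray(rho_hat, dtype=float)
    if d_max is None:
        d_max = len(rho)
    if penalty == "mixed":
        big = np.sqrt(n) * np.max(np.abs(rho[:d_max])) > np.sqrt(np.log(n))
        penalty = "aic" if big else "bic"
    pen = (lambda M: 2.0 * M) if penalty == "aic" else (lambda M: M * np.log(n))
    crit = [n * np.sum(rho[:M] ** 2) - pen(M) for M in range(d_min, d_max + 1)]
    return d_min + int(np.argmax(crit))
```

With a pronounced autocorrelation at a single lag, all three rules tend to select that lag, consistent with the insensitivity to $$d_{\mathrm{max}}$$ reported below.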
The data are generated from (9) with $$s_{t,n}=cy_{t-2,n}^2$$, where $$c=0$$ corresponds to the size and $$c=1,\ldots, 5$$ to the power. The innovations $$\{\varepsilon_t\}$$ are Student $$t_3$$-distributed; the findings under the other innovation distributions from the previous experiment are similar. All other settings are preserved from the first experiment. Figure 2 shows that the rejection rates are insensitive to the value of $$d_{\mathrm{max}}$$ but vary across selection methods. The size of the Bayesian information criterion-based automatic test is close to the nominal rate. Although its power is slightly smaller than that of the Akaike information criterion-based test, the latter is severely oversized. The behaviour of the mixed method falls between that of the other two. In addition, comparing the performance of the Bayesian information criterion-based automatic test $$Q(\tilde{M})$$ for $$c=2$$ in Fig. 2 with that of $$Q(6)$$ under Student $$t_3$$-distributed innovations in Fig. 1(a), we see that the automatic test has power comparable to that with a fixed $$M$$. Based on these findings, we recommend the Bayesian information criterion-based method for automatic selection of $$M$$.

Fig. 2. Rejection rates (%) of the automatic test, $$Q(\tilde{M})$$, for (a) $$d_{\mathrm{max}}=5$$, (b) $$d_{\mathrm{max}}=25$$ and (c) $$d_{\mathrm{max}}=50$$, with three selection rules: Bayesian information criterion-type rule (squares), Akaike information criterion-type rule (triangles) and mixed method (circles); horizontal dashed lines indicate the 5% nominal level.
The third experiment verifies the asymptotic equivalence of the test $$Q_\Psi(M)$$ based on the transformations $$\hat{G}_n$$, $$G_n$$ and $$G$$. We generate 1000 replications from   \begin{equation*} y_t=\varepsilon_t h_t^{1/2}, \quad h_t=0{\cdot}01+0{\cdot}2y_{t-1}^2 +0{\cdot}2 h_{t-1}, \end{equation*} where $$\{\varepsilon_t\}$$ follow the normal distribution with mean zero, standardized such that median$$(|\varepsilon_t|)=1$$, and the sample sizes are $$n=200$$, 2000 and 20 000. Figure 3 displays the histograms of $$n^{1/2}(\hat{\gamma}_k- \hat{\gamma}_k^G)$$, $$n^{1/2}(\hat{\gamma}_k^{G_n}- \hat{\gamma}_k^G)$$ and $$n^{1/2}\hat{\gamma}_k^G$$ for $$k=1$$. As $$n$$ increases, the distributions of $$n^{1/2}(\hat{\gamma}_1- \hat{\gamma}_1^G)$$ and $$n^{1/2}(\hat{\gamma}_1^{G_n}- \hat{\gamma}_1^G)$$ both shrink towards zero, while that of $$n^{1/2}\hat{\gamma}_1^G$$ maintains the same shape, confirming the asymptotic results in Proposition 1.

Fig. 3. Histograms of $$n^{1/2}(\hat{\gamma}_1- \hat{\gamma}_1^G)$$ (top row), $$n^{1/2}(\hat{\gamma}_1^{G_n}- \hat{\gamma}_1^G)$$ (middle row) and $$n^{1/2}\hat{\gamma}_1^G$$ (bottom row) under $$H_0$$ for sample sizes $$n=200$$ (left column), 2000 (middle column) and 20 000 (right column).
The results of three further simulation studies are reported in the Supplementary Material, where we verify the asymptotic distributions of $$Q(M)$$ under $$H_0$$ and $$H_{1n}$$ and apply the proposed order selection method to all test statistics in the first experiment. In particular, the null distribution of $$Q(M)$$ is well approximated by the $$\chi_M^2$$ distribution even in small samples, and when $$n$$ is large, $$Q(M)$$ converges to a noncentral $$\chi_M^2$$ distribution under $$H_{1n}$$, although the convergence appears slower for heavier-tailed innovation distributions. Finally, the proposed order selection method performs well when applied to the other test statistics.

6. An empirical example

In this section we analyse the daily log returns, in percentage form, of the exchange rate of the Chinese yuan against the United States dollar from 23 January 2009 to 9 October 2015. The sample size is $$n=1520$$. Figure 4 shows clear volatility clustering. The sample autocorrelation function lies inside or near the bounds of $$\pm1{\cdot}96/n^{1/2}$$ at the first 30 lags, so a pure generalized autoregressive conditional heteroscedastic model is suggested.

Fig. 4. Daily log returns (%) of yuan-to-dollar exchange rates from 23 January 2009 to 9 October 2015.
We fit four models using the least absolute deviations method: the garch$$(1,1)$$ model and the autoregressive conditional heteroscedastic models of orders $$p=6$$, $$7$$ and $$8$$, defined as $$y_t=\varepsilon_t h_t^{1/2}$$ with $$h_t=\omega_0+\sum_{i=1}^p\alpha_{0i}y_{t-i}^2$$ and denoted by arch$$(p)$$. The estimated coefficients and associated standard errors are listed in Table 2. Before conducting goodness-of-fit tests, we first plot the sample autocorrelation functions of the absolute residuals transformed by $$\Psi(x)=\hat{G}_n(x)$$, $${\mathrm{sgn}}(x-1)$$, $$x$$ and $$x^2$$, together with the corresponding 95% confidence bands. Figure 5 shows that the residual autocorrelation function $$\hat{\rho}_k$$ falls noticeably outside the confidence band at lag $$k=6$$ for all fitted arch$$(p)$$ models, yet stays inside the band at all lags for the fitted garch$$(1,1)$$ model. In contrast, $$\hat{\rho}_k^{{\mathrm{sgn}}}$$, $$\hat{\rho}_k^{\mathrm{abs}}$$ and $$\hat{\rho}_k^{\mathrm{sqr}}$$ all either lie inside the confidence bands or protrude only slightly; the last two sample autocorrelation functions in particular are very small at almost all lags.

Fig. 5. Sample autocorrelation functions of absolute residuals transformed by $$\Psi=\hat{G}_n$$, $${\mathrm{sgn}}(x-1)$$, $$x$$ and $$x^2$$ (from top to bottom) for four fitted models, with corresponding 95% confidence bands.

Table 2.
Estimation results $$(\times 10^2)$$, with standard errors, for all fitted models in the exchange rate example

| | arch$$(6)$$ Estimate | SE | arch$$(7)$$ Estimate | SE | arch$$(8)$$ Estimate | SE | garch Estimate | SE |
|---|---|---|---|---|---|---|---|---|
| $$\omega$$ | 0·01 | 2E-03 | 0·01 | 2E-03 | 0·01 | 2E-03 | 2E-03 | 6E-04 |
| $$\alpha_1$$ | 19·13 | 3·11 | 18·68 | 3·09 | 17·25 | 2·99 | 11·50 | 1·70 |
| $$\alpha_2$$ | 9·30 | 2·24 | 9·19 | 2·22 | 8·65 | 2·20 | | |
| $$\alpha_3$$ | 5·78 | 1·86 | 4·94 | 1·73 | 5·38 | 1·81 | | |
| $$\alpha_4$$ | 3·59 | 1·50 | 2·60 | 1·37 | 2·56 | 1·37 | | |
| $$\alpha_5$$ | 0·04 | 0·68 | 4E-06 | 0·76 | 7E-05 | 0·79 | | |
| $$\alpha_6$$ | 5·02 | 1·39 | 5·03 | 1·44 | 4·31 | 1·40 | | |
| $$\alpha_7$$ | | | 1·10 | 0·72 | 0·70 | 0·71 | | |
| $$\alpha_8$$ | | | | | 1·59 | 0·83 | | |
| $$\beta_1$$ | | | | | | | 69·34 | 3·00 |
SE, standard error; small values are written in standard form, e.g., 2E-03 means $$2\times 10^{-3}$$.

We next compare the performance of the proposed test, based on $$Q(M)$$, with those of the tests based on $$Q_{\mathrm{sgn}}(M)$$, $$Q_{\mathrm{abs}}(M)$$ and $$Q_{\mathrm{sqr}}(M)$$. For each test, we employ the Bayesian information criterion-type method in (5) to select $$M$$, with $$d_{\mathrm{min}}=6$$ because $$\hat{\rho}_k$$ first falls outside its confidence band at $$k=6$$ in Fig. 5, and $$d_{\mathrm{max}}=30$$. Table 3 lists the $$p$$-values of these tests with automatically selected orders $$\tilde{M}$$, indicated by a superscript A. We also report the $$p$$-values for the tests with $$M=9$$, because $$\hat{\rho}_k$$ for both the fitted arch$$(6)$$ and arch$$(7)$$ models is significant at lag 9. The $$p$$-values of $$Q_{\mathrm{abs}}$$ and $$Q_{\mathrm{sqr}}$$ are all close or even equal to unity. Although $$Q_{\mathrm{sgn}}$$ has smaller $$p$$-values, it fails to reject any of the fitted arch$$(p)$$ models at the 5% significance level. By contrast, our proposed test detects the inadequacy of the fitted arch$$(6)$$ and arch$$(7)$$ models for both $$M=\tilde{M}$$ and $$M=9$$, which indicates that $$\Psi=\hat{G}_n$$ achieves better performance in detecting possible autocorrelation structures.

Table 3.
The $$p$$-values of four goodness-of-fit tests with selected order $$\tilde{M}$$ or $$M=9$$

| | $$Q^{\rm A}$$ | $$Q_{\mathrm{sgn}}^{\rm A}$$ | $$Q_{\mathrm{abs}}^{\rm A}$$ | $$Q_{\mathrm{sqr}}^{\rm A}$$ | $$Q(9)$$ | $$Q_{\mathrm{sgn}}(9)$$ | $$Q_{\mathrm{abs}}(9)$$ | $$Q_{\mathrm{sqr}}(9)$$ |
|---|---|---|---|---|---|---|---|---|
| arch$$(6)$$ | 0·0014 | 0·4130 | 0·8210 | 1·0000 | 0·0015 | 0·3349 | 0·9242 | 1·0000 |
| arch$$(7)$$ | 0·0185 | 0·2790 | 0·8666 | 1·0000 | 0·0125 | 0·0787 | 0·9483 | 1·0000 |
| arch$$(8)$$ | 0·0904 | 0·1872 | 0·9139 | 1·0000 | 0·0981 | 0·1187 | 0·9805 | 1·0000 |
| garch | 0·1329 | 0·1367 | 0·9474 | 1·0000 | 0·1272 | 0·1034 | 0·9925 | 1·0000 |

Finally, we evaluate the tail-heaviness of $$\varepsilon_t$$. The Pickands and Hill estimates of the tail index are calculated for the squared residuals of the fitted garch$$(1,1)$$ model.
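A generic Hill estimator for the tail index of a positive sample can be sketched as follows; the number $$k$$ of upper order statistics used is a tuning choice, and the paper's exact procedure is described in its Supplementary Material (see also Resnick, 2007).

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimate of the tail index based on the k largest order
    statistics of a positive sample: the reciprocal of the mean excess
    of the log order statistics over log x_(k+1)."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]   # descending order
    logs = np.log(xs[:k]) - np.log(xs[k])
    return 1.0 / np.mean(logs)
```

Applied to the squared residuals, a tail index estimate between 1 and 2 is consistent with the conclusion drawn in the text, namely $$E(\varepsilon_t^2)<\infty$$ but $$E(\varepsilon_t^4)=\infty$$.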
The implication is that $$E(\varepsilon_t^2)<\infty$$ and $$E(\varepsilon_t^4)=\infty$$; see the Supplementary Material and Resnick (2007) for details. We also apply the strict stationarity tests of Francq & Zakoïan (2012) based on least absolute deviations, and confirm the stationarity of the observed log returns at the 1% significance level. Moreover, $$\hat{\alpha}_1\hat{\sigma}^2+\hat{\beta}_1=3{\cdot}5$$, which is much greater than 1, implying that the observed sequence has an infinite second-order moment. This, together with the heavy-tailedness of $$\varepsilon_t$$, may explain the considerable volatility exhibited in Fig. 4.

7. Conclusion and discussion

For a time series model, let $$\{\varepsilon_t\}$$ and $$\{\hat{\varepsilon}_t\}$$ denote the innovations and the corresponding residuals, respectively. In constructing a goodness-of-fit test, the sample autocorrelation function of $$\{\hat{\varepsilon}_t\}$$, $$\{|\hat{\varepsilon}_t|\}$$ or $$\{\hat{\varepsilon}_t^2\}$$ is usually employed. However, to ensure the existence of the autocorrelation function of $$\{{\varepsilon}_t\}$$, $$\{|{\varepsilon}_t|\}$$ or $$\{{\varepsilon}_t^2\}$$, a finite second- or even fourth-order moment is unavoidable. The essence of our approach is to transform the residuals before calculating the conventional autocorrelation function. Such a transformation is simple to perform and leads to a rich class of tests through various transformation functions. When the absolute residuals are transformed by their corresponding empirical distribution function, no moment condition on $$\varepsilon_t$$ is required, and the resulting goodness-of-fit test is applicable to arbitrarily heavy-tailed innovations.
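To illustrate the construction, the following sketch transforms the absolute residuals by their empirical distribution function (normalized ranks), computes the first $$M$$ sample autocorrelations and refers the portmanteau sum to a $$\chi^2_M$$ reference. The paper's $$Q(M)$$ additionally standardizes by an estimated covariance matrix that accounts for parameter estimation; that correction is omitted here for simplicity.

```python
import numpy as np
from scipy.stats import chi2, rankdata

def robust_gof(abs_resid, M):
    """Rank-based portmanteau sketch: empirical-distribution transform of
    the absolute residuals, then n * sum of the first M squared sample
    autocorrelations, compared with a chi-squared_M reference."""
    u = rankdata(abs_resid) / len(abs_resid)   # empirical df transform
    n = len(u)
    v = u - u.mean()
    denom = np.sum(v ** 2)
    rho = np.array([np.sum(v[k:] * v[:-k]) / denom for k in range(1, M + 1)])
    Q = n * np.sum(rho ** 2)
    return Q, chi2.sf(Q, M)                    # statistic and p-value
```

Because the transform depends only on ranks, the statistic is well defined however heavy-tailed the residuals are, which is the point of the approach summarized above.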
There is an extensive body of literature on time series models with innovations of infinite variance, such as the infinite variance autoregressive (Davis & Resnick, 1986; Ling, 2005) and autoregressive moving-average (Zhu & Ling, 2015) models, where the corresponding estimators may not even be $$\surd{n}$$-consistent. To the best of our knowledge, no goodness-of-fit test is currently available that is well suited to such situations; the method in this paper could be adopted to resolve this problem, which we leave for future research.

Acknowledgement

We thank the editor, associate editor and two referees for their invaluable comments, which have led to substantial improvements of the paper, and we acknowledge the Hong Kong Research Grants Council for partial support.

Supplementary material

Supplementary material available at Biometrika online includes further results on the noncentrality parameter, additional simulation studies, tail index estimation in the empirical example, and all technical proofs.

Appendix: Three important lemmas

Lemmas A1 and A2 below can be used to derive the asymptotic distributions of weighted residual empirical processes for generalized autoregressive conditional heteroscedastic models, and hence are of independent interest. Lemma A3 provides a Hájek projection for the Spearman rank autocorrelation coefficient.

Lemma A1. Suppose that $$H_0$$ and Assumptions 1 and 3 hold with $$n^{1/2} (\hat{{\theta}}_n-{\theta}_0)=O_{\rm p}(1)$$. If $$\{w_t\}$$ is a strictly stationary and ergodic process with $$0\leqslant w_t\leqslant 1$$ and $$w_t\in \mathcal{F}_{t-1}$$, then  $\sup_{0\leqslant x <\infty} \left | n^{-1/2} \sum_{t=1}^{n} w_t \big\{ I(|\hat{\varepsilon}_t|\leqslant x) - I(|\varepsilon_t|\leqslant x) \big\} - 0{\cdot}5\,xg(x) d_w^{\mathrm{\scriptscriptstyle T} } n^{1/2} (\hat{{\theta}}_n-{\theta}_0) \right | = o_{\rm p}(1),$ where $$d_w=E\{w_t h_t^{-1} {\partial h_t(\theta_0)}/{\partial\theta}\}$$.

Lemma A2.
Suppose that $$H_0$$ and Assumptions 1 and 3 hold with $$n^{1/2} (\hat{{\theta}}_n-{\theta}_0)=O_{\rm p}(1)$$. If $$\{w_t\}$$ is a strictly stationary and ergodic process with $$0\leqslant w_t\leqslant 1$$ and each $$w_t$$ is independent of $$\mathcal{F}_{t}$$, then  \begin{equation*} \sup_{0\leqslant x <\infty} \left | n^{-1/2} \sum_{t=1}^{n} w_t \big\{ I(|\hat{\varepsilon}_t|\leqslant x) - I(|\varepsilon_t|\leqslant x) \big\} - E(w_t)xg(x) {d}_0^{*{\mathrm{\scriptscriptstyle T} }} n^{1/2} (\hat{{\theta}}_n-{\theta}_0) \right | = o_{\rm p}(1), \end{equation*}where $$d_0^*=0{\cdot}5E\{h_t^{-1} {\partial h_t(\theta_0)}/{\partial\theta}\}$$.

Lemma A3. Let $$X_1, \ldots, X_n$$ be a sample of independent observations with distribution function $$F(x)$$ and empirical distribution function $$F_n(x)=n^{-1}\sum_{t=1}^{n} I(X_t\leqslant x)$$ for $$-\infty < x < \infty$$. Then, for any positive integer $$k$$,  $n^{-1/2} \sum_{t=k+1}^{n} \{F_n(X_t)F_n(X_{t-k}) - F(X_t)F(X_{t-k})\}= - n^{-1/2} \sum_{t=k+1}^{n} \{F(X_t) - 0{\cdot}5\} + o_{\rm p}(1)\text{.}$

References

Andreou, E. & Werker, B. J. (2012). An alternative asymptotic analysis of residual-based statistics. Rev. Econ. Statist. 94, 88–99.
Andreou, E. & Werker, B. J. (2015). Residual-based rank specification tests for AR-GARCH type models. J. Economet. 185, 305–31.
Bartels, R. (1982). The rank version of von Neumann's ratio test for randomness. J. Am. Statist. Assoc. 77, 40–6.
Basrak, B., Davis, R. A. & Mikosch, T. (2002). Regular variation of GARCH processes. Stoch. Proces. Appl. 99, 95–115.
Berkes, I. & Horváth, L. (2004). The efficiency of the estimators of the parameters in GARCH processes. Ann. Statist. 32, 633–55.
Berkes, I., Horváth, L. & Kokoszka, P. (2003). GARCH processes: Structure and estimation. Bernoulli 9, 201–27.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. J. Economet. 31, 307–27.
Bougerol, P. & Picard, N. (1992). Stationarity of GARCH processes and of some nonnegative time series. J. Economet. 52, 115–27.
Chen, M. & Zhu, K. (2015). Sign-based portmanteau test for ARCH-type models with heavy-tailed innovations. J. Economet. 189, 313–20.
Davis, R. A. & Mikosch, T. (1998). The sample autocorrelations of heavy-tailed processes with applications to ARCH. Ann. Statist. 26, 2049–80.
Davis, R. A. & Resnick, S. I. (1986). Limit theory for the sample covariance and correlation function of moving averages. Ann. Statist. 14, 533–58.
Drost, F. C. & Klaassen, C. A. (1997). Efficient estimation in semiparametric GARCH models. J. Economet. 81, 193–221.
Dufour, J.-M. & Roy, R. (1985). Some robust exact results on sample autocorrelations and tests of randomness. J. Economet. 29, 257–73.
Engle, R. F. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987–1007.
Fan, J. & Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. New York: Springer.
Francq, C. & Zakoïan, J.-M. (2004). Maximum likelihood estimation of pure GARCH and ARMA-GARCH processes. Bernoulli 10, 605–37.
Francq, C. & Zakoïan, J.-M. (2010). GARCH Models: Structure, Statistical Inference and Financial Applications. Chichester: John Wiley & Sons.
Francq, C. & Zakoïan, J.-M. (2012).
Strict stationarity testing and estimation of explosive and stationary generalized autoregressive conditional heteroscedasticity models. Econometrica 80, 821–61.
Guo, S., Box, J. L. & Zhang, W. (2017). A dynamic structure for high dimensional covariance matrices and its application in portfolio allocation. J. Am. Statist. Assoc. 112, 235–53.
Hall, P. & Yao, Q. (2003). Inference in ARCH and GARCH models with heavy-tailed errors. Econometrica 71, 285–317.
Hallin, M., Ingenbleek, J.-F. & Puri, M. L. (1985). Linear serial rank tests for randomness against ARMA alternatives. Ann. Statist. 13, 1156–81.
Hallin, M. & Puri, M. L. (1994). Aligned rank tests for linear models with autocorrelated error terms. J. Mult. Anal. 50, 175–237.
He, C. & Teräsvirta, T. (1999). Fourth moment structure of the GARCH$$(p,q)$$ process. Economet. Theory 15, 824–46.
Le Cam, L. & Yang, G. L. (1990). Asymptotics in Statistics. New York: Springer.
Li, G. & Li, W. K. (2005). Diagnostic checking for time series models with conditional heteroscedasticity estimated by the least absolute deviation approach. Biometrika 92, 691–701.
Li, G. & Li, W. K. (2008). Least absolute deviation estimation for fractionally integrated autoregressive moving average time series models with conditional heteroscedasticity. Biometrika 95, 399–414.
Li, W. K. (2004). Diagnostic Checks in Time Series. New York: Chapman & Hall/CRC.
Li, W. K. & Mak, T. K. (1994). On the squared residual autocorrelations in non-linear time series with conditional heteroskedasticity. J. Time Ser. Anal. 15, 627–36.
Ling, S. (2005).
Self-weighted least absolute deviation estimation for infinite variance autoregressive models. J. R. Statist. Soc. B  67, 381– 93. Google Scholar CrossRef Search ADS   Mikosch T. & Stărică C. ( 2000). Limit theory for the sample autocorrelations and extremes of a GARCH$$(1,1)$$ process. Ann. Statist.  28, 1427– 51. Google Scholar CrossRef Search ADS   Mittnik S. & Paolella M. S. ( 2003). Prediction of financial downside-risk with heavy-tailed conditional distributions. In Handbook of Heavy Tailed Distributions in Finance , Rachev S. T. ed. Amsterdam: Elsevier, pp. 385– 404. Nelson D. B. & Cao C. Q. ( 1992). Inequality constraints in the univariate GARCH model. J. Bus. Econ. Statist.  10, 229– 35. Peng L. & Yao Q. ( 2003). Least absolute deviation estimation for ARCH and GARCH models. Biometrika  90, 967– 75. Google Scholar CrossRef Search ADS   Resnick S. I. ( 2007). Heavy-Tail Phenomena: Probabilistic and Statistical Modeling.  New York: Springer. Silverman B. W. ( 1986). Density Estimation for Statistics and Data Analysis . London: Chapman & Hall. Google Scholar CrossRef Search ADS   van der Vaart A. W. ( 1998). Asymptotic Statistics.  New York: Cambridge University Press. Google Scholar CrossRef Search ADS   Wald A. & Wolfowitz J. ( 1943). An exact test for randomness in the non-parametric case based on serial correlation. Ann. Math. Statist.  14, 378– 88. Google Scholar CrossRef Search ADS   Zhu K. & Li W. K. ( 2015). A new Pearson-type QMLE for conditionally heteroskedastic models. J. Bus. Econ. Statist.  33, 552– 65. Google Scholar CrossRef Search ADS   Zhu K. & Ling S. ( 2015). LADE-based inference for ARMA models with unspecified and heavy-tailed heteroscedastic noises. J. Am. Statist. Assoc.  110, 784– 94. Google Scholar CrossRef Search ADS   Zivot E. ( 2009). Practical issues in the analysis of univariate GARCH Models. In Handbook of Financial Time Series , Mikosch T. Kreiß J.-P. Davis R. A. & Andersen T. G. eds. New York: Springer, pp. 113– 55. 

# A robust goodness-of-fit test for generalized autoregressive conditional heteroscedastic models

Biometrika, Volume 105 (1) – Mar 1, 2018, 18 pages

Publisher: Oxford University Press
Copyright: © 2017 Biometrika Trust
ISSN: 0006-3444
eISSN: 1464-3510
DOI: 10.1093/biomet/asx063


Even worse, the convergence rates of these residual sample autocorrelation functions can become extremely slow under generalized autoregressive conditional heteroscedastic alternatives if $$E(\varepsilon_t^4)=\infty$$ (Davis & Mikosch, 1998; Basrak et al., 2002), possibly undermining the power of the corresponding test. To address these problems, in this paper we construct a robust goodness-of-fit test based on the sample autocorrelation function of the transformed absolute residuals, where the transformation is the residual empirical distribution function. This test is shown to be asymptotically equivalent to the test where the transformation is the true distribution function of $$|\varepsilon_t|$$. We also derive the asymptotic power of the test based on transformed absolute residuals with any known function, which includes as special cases the existing methods based on squared and absolute residual autocorrelations (Li & Mak, 1994; Li & Li, 2005). Doing so makes it possible to compare the commonly used goodness-of-fit tests in the literature theoretically. Our asymptotic analysis relies crucially on Lemmas A1 and A2 in the Appendix, which provide useful results for weighted residual empirical processes of generalized autoregressive conditional heteroscedastic models and hence are of independent interest.

## 2. Goodness-of-fit test based on transformed absolute residuals

### 2.1. Goodness-of-fit test based on residual empirical processes

Our null hypothesis is that the observed time series $$\{\,y_1, \ldots, y_n\}$$ is generated by the following model:   $$\label{garch} H_0:\quad y_t=\varepsilon_t h_t^{1/2}, \quad h_t=\omega_0+\sum_{i=1}^p\alpha_{0i}y_{t-i}^2+\sum_{j=1}^q\beta_{0j} h_{t-j},$$ (1) where $$\{\varepsilon_t\}$$ is a sequence of innovations. 
Denote by $${\theta}=(\omega, \alpha_1, \ldots, \alpha_p, \beta_1, \ldots, \beta_q)^{\mathrm{\scriptscriptstyle T} } \in\Theta$$ the parameter vector of model (1), where the parameter space $${\Theta} \subset \mathbb{R}_+^{p+q+1}$$, with $$\mathbb{R}_+=(0, \infty)$$, is a compact set and the true parameter vector $${\theta}_0=(\omega_0, \alpha_{01}, \ldots, \alpha_{0p}, \beta_{01}, \ldots, \beta_{0q})^{\mathrm{\scriptscriptstyle T} }$$ is an interior point of $$\Theta$$. We call model (1) the GARCH$$(p,q)$$ model. Assumption 1. Model (1) satisfies the following conditions: (i) the innovations $$\{\varepsilon_t\}$$ are independent and identically distributed with $$\varepsilon^2_t$$ following a nondegenerate distribution and $$E(|\varepsilon_t|^{2\gamma})<\infty$$ for some $$\gamma>0$$; (ii) $$\{\,y_t\}$$ is a strictly stationary and ergodic process; (iii) $$\sum_{j=1}^{q}\beta_j<1$$ for all $$\theta\in\Theta$$; and (iv) the polynomials $$\sum_{j=1}^{p}\alpha_{0j} z^j$$ and $$1-\sum_{j=1}^{q}\beta_{0j} z^j$$ have no common root. A necessary and sufficient condition for Assumption 1(ii) to hold is given in Bougerol & Picard (1992), and Assumption 1(iv) is for the identifiability of model (1) (Berkes et al., 2003; Francq & Zakoïan, 2004). We further restrict the innovations $$\{\varepsilon_t\}$$ of model (1) so that the estimator converges to $$\theta_0$$ as $$n\rightarrow\infty$$; see Francq & Zakoïan (2010, pp. 231–5). For example, we assume $$E(\varepsilon_t)=0$$ and $$\mathrm{var}(\varepsilon_t)=1$$ for the Gaussian quasi maximum likelihood estimator (Hall & Yao, 2003), $$\mathrm{median}(|\varepsilon_t|)=1$$ for the least absolute deviations estimator (Peng & Yao, 2003; Chen & Zhu, 2015), and $$E(\varepsilon_t)=0$$ and $$E(|\varepsilon_t|)=1$$ for the Laplacian quasi maximum likelihood estimator (Berkes & Horváth, 2004). 
Define the functions   $$\label{meq1} \varepsilon_t({\theta})=y_t\big/h_t^{1/2}({\theta}), \quad h_t({\theta})=\omega+\sum_{i=1}^p\alpha_iy_{t-i}^2+\sum_{j=1}^q\beta_j h_{t-j}({\theta})\text{.}$$ (2) Then $$h_t(\theta_0)=h_t$$ and $$\varepsilon_t(\theta_0)=\varepsilon_t$$. Because the recursive equation in (2) depends on past observations that are infinitely far away, in practice initial values are needed for $$\{\,y_0^2,\ldots, y_{1-p}^2, h_0, \ldots, h_{1-q}\}$$. For simplicity, we set them to zero and denote the corresponding functions by $$\tilde{\varepsilon}_t({\theta})$$ and $$\tilde{h}_t({\theta})$$; fixing these initial values does not affect our asymptotic results. Let $$\hat{{\theta}}_n=(\hat{\omega}, \hat{\alpha}_1, \ldots, \hat{\alpha}_p, \hat{\beta}_1, \ldots, \hat{\beta}_q)^{\mathrm{\scriptscriptstyle T} }$$ be an estimator for model (1). The residuals of the fitted model are $$\hat{\varepsilon}_t = \tilde{\varepsilon}_t(\hat{{\theta}}_n)=y_t/\hat{h}_t^{1/2}$$, where $$\hat{h}_t= \tilde{h}_t(\hat{{\theta}}_n)$$. In the literature, the sample autocorrelation function of absolute or squared residuals is commonly used to check the adequacy of fitted conditional heteroscedastic models, whereas that of the residuals usually has very low power (Li & Li, 2008). Hence, we focus on the absolute residuals $$|\hat{\varepsilon}_t|$$. We first transform them with the residual empirical distribution function,   $$\label{Gnhat} \hat{G}_n(x)=\frac{1}{n}\sum_{t=1}^{n}I(|\hat{\varepsilon}_t| \leqslant x) \quad (0\leqslant x < \infty),$$ (3) and obtain $$\hat{G}_n(|\hat{\varepsilon}_t|)$$. Let $$G(\cdot)$$ be the distribution function of $$|\varepsilon_t|$$, so $$E\{G(|\varepsilon_t|)\}=0{\cdot}5$$. 
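As a concrete sketch of the recursion in (2) with zero initial values and the transformation in (3), one might compute the residuals and their empirical-distribution transform as below; the function names and the plain-loop implementation are our illustrative choices, not prescribed by the paper.

```python
import numpy as np

def garch_residuals(y, omega, alpha, beta):
    """Residuals y_t / h_t^{1/2} from the recursion in (2), with the
    pre-sample values of y^2 and h set to zero."""
    n = len(y)
    h = np.zeros(n)
    for t in range(n):
        h[t] = omega
        for i, a in enumerate(alpha):      # ARCH terms alpha_i * y_{t-i}^2
            if t - 1 - i >= 0:
                h[t] += a * y[t - 1 - i] ** 2
        for j, b in enumerate(beta):       # GARCH terms beta_j * h_{t-j}
            if t - 1 - j >= 0:
                h[t] += b * h[t - 1 - j]
    return y / np.sqrt(h)

def ecdf_transform(abs_resid):
    """The transform in (3) evaluated at each absolute residual: the
    fraction of absolute residuals less than or equal to that value."""
    x = np.asarray(abs_resid)
    return np.searchsorted(np.sort(x), x, side="right") / len(x)
```

For distinct absolute residuals the transformed values are simply the scaled ranks $$1/n, \ldots, 1$$, so they are bounded and have finite moments of all orders whatever the tails of $$\varepsilon_t$$.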
The sample autocorrelation function of $$\{\hat{G}_n(|\hat{\varepsilon}_t|)\}$$ at lag $$k$$ can be defined as $$\hat{\rho}_k = \hat{\gamma}_k / \hat{\gamma}_0$$, where the sample autocovariance function is   $$\label{gammakhat} \hat{\gamma}_k = \frac{1}{n} \sum_{t=k+1}^{n} \bigl\{\hat{G}_n(|\hat{\varepsilon}_t|)-0{\cdot}5\bigr \}\bigl\{\hat{G}_n(|\hat{\varepsilon}_{t-k}|) -0{\cdot}5\bigr\} \quad (k\geqslant 0)\text{.}$$ (4) Note that $$\hat{\gamma}_k$$ would take the same value if the squared residuals $$\hat{\varepsilon}_t^2$$ were used in (3) and (4). Andreou & Werker (2015) considered the $$f$$-rank autocorrelation coefficients (Hallin & Puri, 1994) of the residuals and squared residuals of autoregressive models with generalized autoregressive conditional heteroscedastic errors, which are fitted by the Gaussian quasi maximum likelihood method. The $$f$$-rank autocorrelation coefficients in Andreou & Werker (2015) have a symmetric form only when the reference distribution is Gaussian. The proposed $$\hat{\rho}_k$$ has a symmetric and simple form, which can be interpreted as the Spearman rank correlation coefficient (Wald & Wolfowitz, 1943; Bartels, 1982; Dufour & Roy, 1985; Hallin et al., 1985). Andreou & Werker (2015) used the local asymptotic normality approach (Le Cam & Yang, 1990; van der Vaart, 1998; Andreou & Werker, 2012) to derive the limiting distributions of residual-based statistics. To apply the method of Andreou & Werker (2015), we would have to assume that the residuals are based on the true values of $$\{\,y_0, y_{-1}, \ldots\}$$, which are unobservable in practice. This problem is circumvented by our asymptotic approach. For a predetermined positive integer $$M$$, we first derive the asymptotic null distribution of $$\hat{\rho} =(\hat{\rho}_1, \ldots, \hat{\rho}_M)^{\mathrm{\scriptscriptstyle T} }$$. 
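A direct implementation of (4), again with illustrative names, centres the transformed values at the known mean $$0{\cdot}5$$ rather than at their sample mean:

```python
import numpy as np

def rank_acf(abs_resid, max_lag):
    """The ratio gamma_k-hat / gamma_0-hat of (4), computed from the
    empirical-distribution transform of the absolute residuals."""
    x = np.asarray(abs_resid)
    n = len(x)
    g = np.searchsorted(np.sort(x), x, side="right") / n   # transform in (3)
    d = g - 0.5                      # E{G(|eps_t|)} = 0.5 under the null
    gamma = [np.sum(d[k:] * d[:n - k]) / n for k in range(max_lag + 1)]
    return np.array(gamma[1:]) / gamma[0]
```

Because only the ranks of the absolute residuals enter, $$\hat{\rho}_k$$ can be read as a Spearman-type rank autocorrelation, which is what makes the statistic insensitive to heavy tails.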
Let $$\mathcal{F}_t$$ be the $$\sigma$$-field generated by $$\{\varepsilon_t, \varepsilon_{t-1}, \ldots\}$$, and let $$g(\cdot)$$ be the density function of $$|\varepsilon_t|$$. Assumption 2. Under $$H_0$$, the estimator $$\hat{{\theta}}_n$$ admits the representation   \begin{equation*} n^{1/2} (\hat{{\theta}}_n-{\theta}_0 )=n^{-1/2}\sum_{t=1}^{n}\xi_t + o_{\rm p}(1), \end{equation*} where $$\{\xi_{t},\mathcal{F}_t\}$$ is a strictly stationary and ergodic martingale difference sequence with $$\Gamma=\mathrm{var}(\xi_t)<\infty$$. Assumption 3. The density $$g$$ satisfies the following conditions: (i) $$\lim_{x\rightarrow0}\,xg(x)=0$$; (ii) $$\lim_{x\rightarrow\infty}\,xg(x)=0$$; and (iii) $$g$$ is continuous on $$(0, \infty)$$. Let $$\kappa = E\{|\varepsilon_t| g(|\varepsilon_t|)\}$$ and   $\Sigma= {I}_M + 144\{ 0{\cdot}25\kappa^2{D}{\Gamma}{D}^{\mathrm{\scriptscriptstyle T} }+ 0{\cdot}5\kappa ({D}{Q}^{\mathrm{\scriptscriptstyle T} } +{Q}{D}^{\mathrm{\scriptscriptstyle T} })\},$ where $$I_M$$ is the $$M\times M$$ identity matrix, $$D=(d_{1}, \ldots, d_{M})^{\mathrm{\scriptscriptstyle T} }$$ and $$Q=(q_{1}, \ldots, q_{M})^{\mathrm{\scriptscriptstyle T} }$$, with   $d_{k}=E\!\left\{\frac{0{\cdot}5-G(|\varepsilon_{t-k}|)}{h_t} \frac{\partial h_t(\theta_0)}{\partial\theta}\right\}\!,\quad q_k=E\bigl[\{G(|\varepsilon_t|)-0{\cdot}5\} \{G(|\varepsilon_{t-k}|)-0{\cdot}5\}\xi_t\bigr]\text{.}$ Theorem 1. Suppose that $$H_0$$ and Assumptions 1–3 hold. If $${\Sigma}$$ is positive definite, then $$n^{1/2} \hat{{\rho}} \rightarrow N({0}, {\Sigma})$$ in distribution as $$n \rightarrow \infty$$.  Because $$g(x)=f(x)+f(-x)$$ for $$0\leqslant x<\infty$$, where $$f(\cdot)$$ is the density function of $$\varepsilon_t$$, we can estimate $$\kappa$$ by $$\hat{\kappa} = n^{-1}\sum_{t=1}^{n} |\hat{\varepsilon}_{t}|\{\hat{f}_n(|\hat{\varepsilon}_{t}|) + \hat{f}_n(-|\hat{\varepsilon}_{t}|)\}$$, where $$\hat{f}_n(\cdot)$$ is the kernel density estimator of $$f(\cdot)$$. 
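A minimal sketch of the estimator $$\hat{\kappa}$$, assuming a Gaussian kernel with the rule-of-thumb bandwidth of Silverman (1986); the paper leaves the kernel and bandwidth open, so both choices here are ours.

```python
import numpy as np

def kappa_hat(resid, bandwidth=None):
    """n^{-1} sum_t |eps_t| { f_n(|eps_t|) + f_n(-|eps_t|) }, with f_n a
    Gaussian kernel density estimate of the innovation density f."""
    e = np.asarray(resid)
    n = len(e)
    if bandwidth is None:
        bandwidth = 1.06 * e.std() * n ** (-0.2)   # rule-of-thumb choice
    def f_n(x):
        # kernel density estimate evaluated at each point of x
        u = (x[:, None] - e[None, :]) / bandwidth
        return np.exp(-0.5 * u ** 2).sum(axis=1) / (n * bandwidth * (2 * np.pi) ** 0.5)
    a = np.abs(e)
    return np.mean(a * (f_n(a) + f_n(-a)))
```

For standard normal innovations, $$\kappa=E\{|\varepsilon_t|\,g(|\varepsilon_t|)\}=1/\pi\approx 0{\cdot}318$$, which provides a quick numerical check of such an implementation.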
Let $$\xi_t=\xi_t({\theta}_0)$$, i.e., the function $$\xi_t({\theta})$$ evaluated at $${\theta}_0$$. Let $${\tilde{\xi}}_t(\theta)$$ be obtained by replacing $$\{\,y_0^2,\ldots, y_{1-p}^2, h_0, \ldots, h_{1-q}\}$$ with their initial values in $$\xi_t({\theta})$$, and write $${\hat{\xi}}_t = {\tilde{\xi}}_t(\hat{\theta}_n)$$. We can estimate $${\Gamma}$$, $${D}$$ and $${Q}$$ by $${\hat{\Gamma}} = n^{-1}\sum_{t=1}^{n} {\hat{\xi}}_t {\hat{\xi}}_t^{\mathrm{\scriptscriptstyle T} }$$, $${\hat{D}}=(\hat{d}_1, \ldots, \hat{d}_M)^{\mathrm{\scriptscriptstyle T} }$$ and $${\hat{Q}}=({\hat{q}}_1, \ldots,{\hat{q}}_M)^{\mathrm{\scriptscriptstyle T} },$$ respectively, where $$\hat{d}_k = n^{-1} \sum_{t=k+1}^{n} \hat{h}_t^{-1} \{0{\cdot}5-\hat{G}_n(|\hat{\varepsilon}_{t-k}|)\} {\partial \tilde{h}_t(\hat{\theta}_n)}/{\partial\theta}$$ and $$\hat{q}_k=n^{-1} \sum_{t=k+1}^{n} \{\hat{G}_n(|\hat{\varepsilon}_{t}|)-0{\cdot}5 \} \{\hat{G}_n(|\hat{\varepsilon}_{t-k}|)-0{\cdot}5 \}{\hat{\xi}}_t\text{.}$$ Under the conditions of Theorem 1, it can be shown that $$\hat{\kappa}=\kappa+o_{\rm p}(1)$$, $${\hat{\Gamma}}={\Gamma}+o_{\rm p}(1)$$, $${\hat{D}}={D}+o_{\rm p}(1)$$ and $${\hat{Q}}={Q}+o_{\rm p}(1)$$. Thus, a consistent estimator $$\hat{{\Sigma}}$$ of $${\Sigma}$$ can be obtained, leading us to construct the test statistic   $Q(M) = n\hat{{\rho}}^{\mathrm{\scriptscriptstyle T} } \hat{\Sigma}^{-1} \hat{{\rho}},$ which under $$H_0$$ is asymptotically distributed as $$\chi^2_M$$, the chi-squared distribution with $$M$$ degrees of freedom. One could also employ $$n^{1/2}\hat{\rho}_k/\hat{\Sigma}_{kk}^{1/2}$$ to examine the significance of the residual autocorrelation at lag $$k$$ individually, where $$\hat{\Sigma}_{kk}$$ is the $$k$$th diagonal element of $$\hat{\Sigma}$$.

### 2.2. Goodness-of-fit test based on predetermined transformations

We can also consider the transformation with any predetermined function $$\Psi(\cdot)$$. 
The sample autocorrelation function of $$\{\Psi(|\hat{\varepsilon}_t|)\}$$ at lag $$k$$ can be defined as $$\hat{\rho}_k^\Psi = \hat{\gamma}_k^\Psi/\hat{\gamma}_0^\Psi$$, where   $\hat{\gamma}_k^\Psi = \frac{1}{n} \sum_{t=k+1}^{n} \{\Psi(|\hat{\varepsilon}_t|)-\hat{\mu}_\Psi\}\{ \Psi(|\hat{\varepsilon}_{t-k}|)-\hat{\mu}_\Psi \} \quad (k\geqslant0),$ with $$\hat{\mu}_\Psi=n^{-1}\sum_{t=1}^{n}\Psi(|\hat{\varepsilon}_t|)$$, is the sample autocovariance function. Let $$\hat{\rho}_\Psi = (\hat{\rho}_1^\Psi, \ldots, \hat{\rho}_M^\Psi)^{\mathrm{\scriptscriptstyle T} }$$. Denote the first and second derivatives of $$\Psi$$ by $$\psi$$ and $$\dot{\psi}$$. Let $$\mu_\Psi=E\{\Psi(|\varepsilon_t|)\}$$, $$\sigma_\Psi^2=\mathrm{var}\{\Psi(|\varepsilon_t|)\}$$, $$\kappa_\Psi = E\{|\varepsilon_t| \psi(|\varepsilon_t|)\}$$ and   ${\Sigma}_\Psi= {I}_M + \sigma_\Psi^{-4}\bigl\{ 0{\cdot}25\kappa_\Psi^2D_\Psi{\Gamma}D_\Psi^{\mathrm{\scriptscriptstyle T} }+ 0{\cdot}5\kappa_\Psi (D_\Psi Q_\Psi^{\mathrm{\scriptscriptstyle T} } +Q_\Psi D_\Psi^{\mathrm{\scriptscriptstyle T} })\bigr\},$ where $$D_\Psi=(d_{1}^\Psi, \ldots, d_{M}^\Psi)^{\mathrm{\scriptscriptstyle T} }$$ and $$Q_\Psi=( q_{1}^\Psi, \ldots, q_{M}^\Psi)^{\mathrm{\scriptscriptstyle T} }$$, with   $d_{k}^\Psi=E\!\left\{\frac{\mu_\Psi-\Psi(|\varepsilon_{t-k}|)}{h_t} \frac{\partial h_t(\theta_0)}{\partial\theta}\right\}\!,\quad q_{k}^\Psi=E\big[\{\Psi(|\varepsilon_t|)-\mu_\Psi\}\{\Psi(|\varepsilon_{t-k}|)-\mu_\Psi \}\xi_t\big]\text{.}$ Assumption 4. There exists $$m>0$$ such that the function $$\Psi^*(x)=|\dot{\psi}(x)|x^2+|\psi(x)|x$$ satisfies $$\Psi^*(x)\leqslant Cx^m$$ for $$x>1$$ and $$\Psi^*(x)\leqslant C$$ for $$0\leqslant x\leqslant 1$$, where $$C>0$$ is a constant, and $$E(|\varepsilon_t|^m)<\infty$$ and $$E\{\Psi^2(|\varepsilon_t|)\}<\infty$$. Theorem 2. Suppose that $$H_0$$ and Assumptions 1, 2 and 4 hold. 
If $${\Sigma}_\Psi$$ is positive definite, then $$n^{1/2} \hat{\rho}_\Psi \rightarrow N({0}, {\Sigma}_\Psi)$$ in distribution as $$n \rightarrow \infty$$.  In a similar way, we can obtain a consistent estimator $$\hat{\Sigma}_\Psi$$ of the asymptotic covariance matrix $${\Sigma}_\Psi$$ using sample averages. Thus a goodness-of-fit test, $$Q_{\Psi}(M)= n\hat{{\rho}}_\Psi^{\mathrm{\scriptscriptstyle T} } \hat{\Sigma}_\Psi^{-1} \hat{{\rho}}_\Psi$$, can be constructed. The first interesting example is $$\Psi(x)=x^c$$ for some $$c>0$$, and Assumption 4 is implied by $$E(|\varepsilon_t|^{2c})<\infty$$. This includes existing tests based on absolute and squared residuals, which correspond to cases with $$c=1$$ and 2, respectively; see Li & Li (2005) and Li (2004). From the proof of Theorem 1, when $$\Psi$$ is bounded, Theorem 2 still holds if, instead of Assumption 4, the derivative $$\psi$$ satisfies the conditions on the density $$g$$ in Assumption 3. For Theorem 4 in § 3, the conditions can be similarly substituted. Motivated by the transformation $$\hat{G}_n$$ in the previous subsection, we can also consider $$\Psi=G$$, although $$G$$ is unknown in practice. Let $$G_n$$ denote the empirical distribution function of $$\{|\varepsilon_t|\}$$, defined as $$G_n(x)=n^{-1}\sum_{t=1}^{n}I(|\varepsilon_t| \leqslant x)$$ for $$0\leqslant x<\infty$$. From the proofs of Theorems 1 and 2, it can readily be verified that $$n^{1/2}\hat{\rho}_k$$, $$n^{1/2}\hat{\rho}^{G_n}_k$$ and $$n^{1/2}\hat{\rho}^{G}_k$$ are asymptotically equivalent. Proposition 1. Suppose that $$H_0$$ and Assumptions 1 and 3 hold with $$n^{1/2} (\hat{{\theta}}_n-{\theta}_0)=O_{\rm p}(1)$$. Then $$n^{1/2}(\hat{{\gamma}}_k- \hat{\gamma}_k^G)= o_{\rm p}(1)$$ and $$n^{1/2}(\hat{{\gamma}}_k- \hat{\gamma}_k^{G_n})= o_{\rm p}(1)$$ for any positive integer $$k$$. Moreover, $$\hat{{\gamma}}_0$$, $$\hat{{\gamma}}_0^G$$ and $$\hat{{\gamma}}_0^{G_n}$$ all converge in probability to $$1/12$$ as $$n\rightarrow\infty$$.  
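For a predetermined transformation the same computation goes through with sample-mean centring; the helper below is a hypothetical sketch, with `psi = lambda x: x` and `psi = lambda x: x ** 2` recovering the absolute- and squared-residual autocorrelations discussed above.

```python
import numpy as np

def psi_acf(abs_resid, psi, max_lag):
    """Sample autocorrelations of Psi(|eps_t|) as in Section 2.2, with
    a user-supplied transformation Psi and sample-mean centring."""
    z = psi(np.asarray(abs_resid))
    n = len(z)
    d = z - z.mean()                 # centre at mu_Psi-hat
    gamma = [np.sum(d[k:] * d[:n - k]) / n for k in range(max_lag + 1)]
    return np.array(gamma[1:]) / gamma[0]
```

With a bounded transformation such as the empirical distribution function, the moment condition in Assumption 4 is vacuous, whereas $$\Psi(x)=x^c$$ demands $$E(|\varepsilon_t|^{2c})<\infty$$.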
To apply the joint tests $$Q(M)$$ and $$Q_\Psi(M)$$, we can consider several specific values of order $$M$$ or select $$M$$ as   $$\label{bic} \tilde{M}=\mathop{\rm{arg\,max}}\limits_{d_{\mathrm{min}}\leqslant M\leqslant d_{\mathrm{max}}}\{Q(M)-M\log n\},\quad \tilde{M}_\Psi=\mathop{\rm{arg\,max}}\limits_{d_{\mathrm{min}}\leqslant M\leqslant d_{\mathrm{max}}}\{Q_\Psi(M)-M\log n\},$$ (5) where the integer $$M$$ is searched over a fixed range $$[d_{\mathrm{min}}, d_{\mathrm{max}}]$$ for $$d_{\mathrm{min}}\geqslant 1$$ and some large enough $$d_{\mathrm{max}}$$. As shown in § 5, the performance of the automatic tests is insensitive to the choice of $$d_\mathrm{max}$$. Corollary 1. (i) Under the conditions of Theorem 1, $$Q(\tilde{M})\rightarrow\chi_{d_{\mathrm{min}}}^2$$ in distribution as $$n\rightarrow\infty$$.  (ii) Under the conditions of Theorem 2, $$Q_\Psi(\tilde{M}_\Psi)\rightarrow\chi_{d_{\mathrm{min}}}^2$$ in distribution as $$n\rightarrow\infty$$.  In § 3 we demonstrate that under the local alternatives, $$n^{1/2}\hat{\rho}$$ is asymptotically normal with a possible shift in the mean, $$\varUpsilon=(\varUpsilon_1, \ldots, \varUpsilon_M)^{\mathrm{\scriptscriptstyle T} }$$; see Theorem 3. As a result, $$\lim_{n\rightarrow\infty}{\mathrm{pr}}(\tilde{M}=d_{\mathrm{min}})=1$$, which may be undesirable for particular local alternatives with $$\varUpsilon_1=\cdots=\varUpsilon_{d_{\mathrm{min}}}=0$$ and $$\varUpsilon_K\neq0$$ for some $$d_{\mathrm{min}} < K\leqslant d_{\mathrm{max}}$$, since in such cases $$Q(\tilde{M})$$ would have no power. The test $$Q_\Psi(\tilde{M}_\Psi)$$ would suffer from the same problem, which can be avoided by using a smaller penalty, such as the Akaike information criterion-type penalty $$2M$$, to ensure that the probability of choosing a value of $$M$$ larger than $$d_{\mathrm{min}}$$ is nonzero. However, as shown in § 5, doing so may lead to seriously inflated Type I error rates. 
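Given $$\hat{\rho}$$ and a consistent estimate $$\hat{\Sigma}$$ (treated as supplied here), the statistic $$Q(M)$$ and the lag choice in (5) take only a few lines; the function name is illustrative.

```python
import numpy as np
from scipy.stats import chi2

def automatic_Q(rho, Sigma, n, d_min, d_max):
    """Q(M) = n * rho' Sigma^{-1} rho for each M, with the lag of (5)
    maximizing the BIC-type criterion Q(M) - M log n."""
    Q, crit = {}, {}
    for M in range(d_min, d_max + 1):
        r = rho[:M]
        Q[M] = float(n * r @ np.linalg.solve(Sigma[:M, :M], r))
        crit[M] = Q[M] - M * np.log(n)
    M_sel = max(crit, key=crit.get)
    p_value = chi2.sf(Q[M_sel], df=d_min)   # Corollary 1: chi-squared, d_min df
    return M_sel, Q[M_sel], p_value
```

Replacing `np.log(n)` with `2.0` gives the Akaike-type penalty mentioned above, at the cost of the inflated Type I error rates reported in § 5.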
In practice, the aforementioned problem can be remedied by choosing a proper $$d_\mathrm{min}$$. Suppose that the sample autocorrelation function $$\hat{\rho}_k$$ falls clearly outside the 95% confidence interval at certain lags. To guarantee that the joint test, $$Q(\tilde{M})$$, takes into account at least one of the lags, we need only choose $$d_{\mathrm{min}}$$ to be the smallest such lag by simply examining the plot of the residual autocorrelations, $$\hat{\rho}_k$$; if such a smallest lag does not exist, then we may set $$d_{\mathrm{min}}=1$$.

## 3. Asymptotic power under local alternatives

To study the power of the proposed test, we consider the following local alternatives. For each $$n$$, the observed time series $$\{\,y_{1,n}, \ldots, y_{n,n}\}$$ is generated by   $$\label{garch_alt} H_{1n}:\quad y_{t,n}=\varepsilon_t h_{t,n}^{1/2}, \quad h_{t,n}=\omega_0+\sum_{i=1}^p\alpha_{0i}y_{t-i,n}^2+\sum_{j=1}^q\beta_{0j} h_{t-j,n}+n^{-1/2}s_{t,n},$$ (6) where the subscript $$n$$ is used to emphasize the dependence of $$y_{t,n}, h_{t,n}$$ and $$s_{t,n}$$ on $$n$$. For simplicity, we consider $$s_{t,n}=s(y_{t-1,n}^2, \ldots, y_{t-p^*,n}^2, h_{t-1,n}, \ldots,h_{t-q^*,n})$$ for some positive integers $$p^*> p$$ and $$q^*> q$$, where the function $$s$$ satisfies the following condition. Assumption 5. The function $$s$$ and all elements of its gradient $$\nabla s$$ are nonnegative everywhere. Assumption 6. There exists a positive integer $$n_0$$ such that for each $$n\geqslant n_0$$, $$\{\,y_{t,n}\}$$ and $$\{h_{t,n}\}$$ are strictly stationary and ergodic processes, and $$E(s_{t,n_0}^{\delta_0})<\infty$$ for some constant $$\delta_0>0$$ independent of $$n$$. The nonnegativity of $$s$$ guarantees that $$h_{t,n}\geqslant 0$$; see Nelson & Cao (1992) for a discussion of the relaxation of the nonnegativity constraints on the parameters of generalized autoregressive conditional heteroscedastic models. 
The condition $$\nabla s\geqslant 0$$ is used to simplify our technical proofs, and we can similarly derive asymptotic results for other cases of $$\nabla s$$. The finite fractional moment of $$s_{t,n_0}$$ in Assumption 6 ensures the same to hold for $$y_{t,n}$$ and $$h_{t,n}$$, which will be needed in the proofs. Similar to (2), we define the functions   \begin{equation*} \varepsilon_{t,n}({\theta})=y_{t,n}\big/h_{t,n}^{1/2}({\theta}) , \quad h_{t,n}({\theta})=\omega+\sum_{i=1}^p\alpha_iy_{t-i,n}^2+\sum_{j=1}^q\beta_j h_{t-j,n}({\theta})\text{.} \end{equation*} For simplicity, with the initial values set to be independent of $$n$$, we denote the resulting functions by $$\tilde{\varepsilon}_{t,n}({\theta})$$ and $$\tilde{h}_{t,n}({\theta})$$, respectively. Under $$H_{1n}$$, the residuals are calculated as $$\hat{\varepsilon}_{t} = \tilde{\varepsilon}_{t,n}(\hat{{\theta}}_n)=y_{t,n}/\hat{h}_{t,n}^{1/2}$$, where $$\hat{h}_{t,n}= \tilde{h}_{t,n}(\hat{{\theta}}_n)$$. While $$h_t=h_t(\theta_0)$$ in (2), we can show that the departure $$n^{-1/2}s_{t,n}$$ in (6) results in   \begin{equation*}\label{eq_diff} h_{t,n}-h_{t,n}(\theta_0)=n^{-1/2}\sum_{k=0}^{\infty}e_1^{\mathrm{\scriptscriptstyle T} } B_{0}^ke_1s_{t-k,n}\geqslant 0, \end{equation*} where $$e_1=(1,0,\ldots,0)^{\mathrm{\scriptscriptstyle T} }$$ and   \begin{equation*} B_{0} = \begin{pmatrix} \beta_{01}\; & \;\cdots\; & \;\beta_{0q-1}\; & \;\beta_{0q}\\ & \;I_{q-1}\; &&\; 0\\ \end{pmatrix} \end{equation*} is a $$q\times q$$ matrix. Define the nonnegative $$\mathcal{F}_{t-1}$$-measurable random variables   $r_{t,n}=\frac{n^{1/2}\{h_{t,n}-h_{t,n}(\theta_0)\}}{h_{t,n}(\theta_0)}=\frac{1}{h_{t,n}(\theta_0)}\sum_{k=0}^{\infty}e_1^{\mathrm{\scriptscriptstyle T} } B_{0}^ke_1s_{t-k,n}\text{.}$ Let $$s_t=s(y_{t-1}^2, \ldots, y_{t-p^*}^2, h_{t-1}, \ldots,h_{t-q^*})$$ and   $$\label{eq_rt} r_t=\frac{1}{h_t}\sum_{k=0}^{\infty}e_1^{\mathrm{\scriptscriptstyle T} } B_{0}^ke_1s_{t-k}\text{.}$$ (7) Assumption 7. 
There exist processes $$\{r_{t,n}^{(l)}: t=1,\ldots,n\}$$ and $$\{r_{t,n}^{(u)}: t=1,\ldots,n\}$$ for each $$n$$ that satisfy the following conditions: (i) all the $$r_{t,n}^{(l)}$$ and $$r_{t,n}^{(u)}$$ are $$\mathcal{F}_{t-1}$$-measurable; (ii) the processes $$\{r_{t,n_0}^{(l)}\}$$ and $$\{r_{t,n_0}^{(u)}\}$$ are strictly stationary and ergodic with $$r_{t,n_0}^{(l)}\leqslant r_{t,n}\leqslant r_{t,n_0}^{(u)}$$ for all $$n\geqslant n_0$$; and (iii) for each fixed $$t$$, $$\:r_{t,n}^{(l)}$$ increases monotonically with $$n$$, while $$r_{t,n}^{(u)}$$ decreases monotonically with $$n$$, i.e., $$r_{t,n}^{(l)}\leqslant r_{t,n+1}^{(l)}\leqslant r_{t,n+1}^{(u)} \leqslant r_{t,n}^{(u)}$$ for all $$n$$, and $$\lim_{n\rightarrow\infty} r_{t,n}^{(l)}=\lim_{n\rightarrow\infty} r_{t,n}^{(u)}=r_t$$ with probability $$1$$. Proposition 2. Consider the case of $$s_{t,n}=a_0+\sum_{i=1}^{p^*}a_iy_{t-i,n}^2+\sum_{j=1}^{q^*}a_{p^*+j}h_{t-j,n}$$, where $$a_0,a_1, \ldots, a_{p^*+q^*}$$ are nonnegative constants. Under Assumptions 1 and 6, if $$q>0$$, then the conditions in Assumption 7 hold and $$E\{(r_{t,n_0}^{(u)})^{m}\}<\infty$$ for any $$m>0$$.  For other forms of $$s_{t,n}$$, Assumption 7 can also be readily verified, although additional moment restrictions on $$y_{t,n}$$ may be required. Assumption 2’. Under $$H_{1n}$$, the estimator $$\hat{{\theta}}_n$$ admits the representation   \begin{equation*} n^{1/2}(\hat{{\theta}}_n-{\theta}_0)=n^{-1/2}\sum_{t=1}^{n}\xi_{t,n} + \varDelta+ o_{\rm p}(1), \end{equation*} where $$\{\xi_{t,n},\mathcal{F}_t: t=1,\ldots,n\}$$ is a strictly stationary and ergodic martingale difference sequence for each sufficiently large $$n$$, $$\:\lim_{n\rightarrow\infty}\mathrm{var}(\xi_{t,n}) =\Gamma$$, and $$\varDelta\in\mathbb{R}^{p+q+1}$$ is a constant vector. It is possible to derive the explicit form of the shift $$\varDelta$$ under additional regularity conditions for the estimator $$\hat{\theta}_n$$ and those of the underlying model in (6). 
Specifically, assuming that model (6) is locally asymptotically normal (van der Vaart, 1998), by Le Cam’s third lemma we have $$\varDelta=\lim_{n\rightarrow\infty}\mathrm{cov}\{n^{-1/2}\sum_{t=1}^n\xi_t, \Delta^{(n)}(\theta_0)\}$$, where $$\Delta^{(n)}(\theta_0) = -0{\cdot}5n^{-1/2}\sum_{t=1}^{n} \{1+\varepsilon_t{f^\prime(\varepsilon_t)}/{f(\varepsilon_t)}\}h_t^{-1}{\partial h_t(\theta_0)}/{\partial \theta}$$ is the central sequence of the GARCH($$p, q$$) model (Drost & Klaassen, 1997). Let $$V=(v_1, \ldots, v_M)^{\mathrm{\scriptscriptstyle T} }$$ with $$v_k=E[\{0{\cdot}5-G(|\varepsilon_{t-k}|)\}r_t]$$, and let $$V_\Psi=(v_1^\Psi, \ldots, v_M^\Psi)^{\mathrm{\scriptscriptstyle T} }$$ with $$v_k^\Psi=E[\{\mu_\Psi-\Psi(|\varepsilon_{t-k}|)\}r_t]$$. Theorem 3. Suppose that $$H_{1n}$$ and Assumptions 1, 2’, 3 and 5–7 hold with $$E\{(r_{t,n_0}^{(u)})^{4+\delta_1}\}<\infty$$ for some $$\delta_1>0$$. If $${\Sigma}$$ is positive definite, then $$n^{1/2} \hat{\rho} \rightarrow N(\varUpsilon, \Sigma)$$ in distribution as $$n \rightarrow \infty$$, where $$\varUpsilon=6\kappa(D\varDelta-V)$$, with $$\kappa$$, $$D$$ and $$\Sigma$$ defined as in Theorem 1. Theorem 4. Suppose that $$H_{1n}$$ and Assumptions 1, 2’ and 4–7 hold with $$E\{(r_{t,n_0}^{(u)})^{4+\delta_1}\}<\infty$$ for some $$\delta_1>0$$. If $${\Sigma}_\Psi$$ is positive definite, then $$n^{1/2} \hat{\rho}_\Psi \rightarrow N(\varUpsilon_\Psi, {\Sigma}_\Psi)$$ in distribution as $$n \rightarrow \infty$$, where $$\varUpsilon_\Psi=0{\cdot}5\kappa_\Psi (D_\Psi\varDelta-V_\Psi)/\sigma_\Psi^{2}$$, with $$\kappa_\Psi$$, $$D_\Psi$$, $$\sigma_\Psi$$ and $$\Sigma_\Psi$$ defined as in Theorem 2. 
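For intuition about the drifting design in (6), a GARCH(1,1) null with a local departure through one extra lag (so that $$p^*>p$$) can be simulated as below; the function name and all coefficient values are arbitrary illustrations, not taken from the paper.

```python
import numpy as np

def simulate_H1n(n, omega=0.1, alpha=0.1, beta=0.8, a0=0.5, a2=0.3, seed=0):
    """A path from (6) with s_{t,n} = a0 + a2 * y_{t-2,n}^2: the extra
    term enters as n^{-1/2} s_{t,n} and so shrinks as n grows."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    y, h = np.zeros(n), np.zeros(n)
    h[0] = omega / (1.0 - alpha - beta)   # start near the null stationary level
    y[0] = eps[0] * np.sqrt(h[0])
    for t in range(1, n):
        s = a0 + (a2 * y[t - 2] ** 2 if t >= 2 else 0.0)
        h[t] = omega + alpha * y[t - 1] ** 2 + beta * h[t - 1] + s / np.sqrt(n)
        y[t] = eps[t] * np.sqrt(h[t])
    return y
```

Fitting the null GARCH(1,1) to such data at increasing $$n$$ illustrates the $$O(n^{-1/2})$$ drift under which the tests retain nontrivial asymptotic power.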
We can show that under $$H_{1n}$$, the consistency of the estimators $$\hat{\Sigma}$$ and $$\hat{\Sigma}_\Psi$$ in the previous section still holds, and hence $$Q(M)$$ and $$Q_{\Psi}(M)$$ converge to the noncentral $$\chi^2_M$$ distribution with noncentrality parameter $$c_\Psi=\varUpsilon_\Psi^{\mathrm{\scriptscriptstyle T} } \Sigma^{-1}_\Psi\varUpsilon_\Psi$$ as $$n\rightarrow\infty$$, where $$\Psi=G$$ for $$Q(M)$$. In other words, the local power is determined by the value of $$c_\Psi$$.

## 4. Two applications

In this section we apply the asymptotic results from §§ 2 and 3 to generalized autoregressive conditional heteroscedastic models fitted by the Laplacian quasi maximum likelihood method (Berkes & Horváth, 2004) and the least absolute deviations method (Peng & Yao, 2003). We first derive the asymptotic distributions of these two estimators under $$H_{1n}$$. Let us write   $J=E\!\left\{\frac{1}{h_t^2} \frac{\partial h_t(\theta_0)}{\partial\theta} \frac{\partial h_t(\theta_0)}{\partial\theta^{\mathrm{\scriptscriptstyle T} }}\right\}\!,\quad \lambda=E\!\left \{\frac{r_t}{h_t}\frac{\partial h_t(\theta_0)}{\partial\theta}\right \}\!,$ where $$r_t$$ is defined as in (7). For model (1), the Laplacian quasi maximum likelihood estimator (Berkes & Horváth, 2004) is defined as $$\hat{\theta}_n^{\mathrm{LQML}}={\rm{arg\,min}}_{\theta\in\Theta}n^{-1}\sum_{t=1}^{n} \{\log\tilde{h}_{t}^{1/2}(\theta)+{|y_{t}|}/{\tilde{h}_{t}^{1/2}(\theta)} \}$$, where the identifiability conditions are $$E(\varepsilon_t)=0$$ and $$E(|\varepsilon_t|)=1$$. Under $$H_0$$ and Assumption 1, if $$E(\varepsilon_t^2)<\infty$$, then we can show that   $n^{1/2}(\hat{\theta}^{\mathrm{LQML}}_n-{\theta}_0 )=\frac{2J^{-1}}{n^{1/2}}\sum_{t=1}^{n} \frac{|\varepsilon_t|-1}{h_t}\frac{\partial h_t(\theta_0)}{\partial\theta} +o_{\rm p}(1),$ which converges in distribution to $$N[0, \,4 \{E(\varepsilon_t^2)-1\}J^{-1}]$$ as $$n\rightarrow\infty$$. Theorem 5. Suppose that $$H_{1n}$$ and Assumptions 1 and 5–7 hold. 
If $$E(r_{t,n_0}^{(u)})<\infty$$, then $$\hat{\theta}_n^{\mathrm{LQML}}\rightarrow \theta_0$$ almost surely as $$n\rightarrow\infty$$. Moreover, if $$E(\varepsilon_t^2)<\infty$$ and $$E\{(r_{t,n_0}^{(u)})^{2+\delta_1}\}<\infty$$ for some $$\delta_1>0$$, then  $n^{1/2} (\hat{\theta}_n^{\mathrm{LQML}}-{\theta}_0)=\frac{2J^{-1}}{n^{1/2}}\sum_{t=1}^{n} \frac{|\varepsilon_t|-1}{h_{t,n}(\theta_0)}\frac{\partial h_{t,n}(\theta_0)}{\partial\theta} +J^{-1}\lambda+o_{\rm p}(1),$ which converges in distribution to $$N[J^{-1}\lambda, \, 4 \{E(\varepsilon_t^2)-1\}J^{-1}]$$ as $$n\rightarrow\infty$$.  For model (1), the least absolute deviations estimator in Peng & Yao (2003) is defined as $$\hat{\theta}_n^{\mathrm{LAD}}={\rm{arg\,min}}_{\theta\in\Theta}n^{-1}\sum_{t=1}^{n} |\log y_{t}^2-\log \tilde{h}_{t}({\theta}) |$$, where the identifiability condition is $$\mathrm{median}(|\varepsilon_t|)=1$$. Under $$H_0$$ and Assumption 1, if $$g(1)>0$$, then it can be shown that   $n^{1/2} (\hat{\theta}^{\mathrm{LAD}}_n-{\theta}_0 )=\frac{\{g(1)J\}^{-1}}{n^{1/2}}\sum_{t=1}^{n} \frac{{\mathrm{sgn}}(|\varepsilon_t|-1)}{h_t}\frac{\partial h_t(\theta_0)}{\partial\theta} +o_{\rm p}(1),$ which converges in distribution to $$N[0, \,\{g(1)\}^{-2}J^{-1}]$$ as $$n\rightarrow\infty$$, where $${\mathrm{sgn}}(x)=I(x>0)- I(x<0)$$ is the sign function; see Chen & Zhu (2015). Theorem 6. If $$H_{1n}$$ and Assumptions 1 and 5–7 hold, then $$\hat{\theta}_n^{\mathrm{LAD}}\rightarrow \theta_0$$ almost surely as $$n\rightarrow\infty$$. Moreover, if $$g(1)>0$$ and $$E\{(r_{t,n_0}^{(u)})^{4+\delta_1}\}<\infty$$ for some $$\delta_1>0$$, then  $n^{1/2} (\hat{\theta}_n^{\mathrm{LAD}}-{\theta}_0 )=\frac{\{g(1)J\}^{-1}}{n^{1/2}}\sum_{t=1}^{n} \frac{{\mathrm{sgn}}(|\varepsilon_t|-1)}{h_{t,n}(\theta_0)}\frac{\partial h_{t,n}(\theta_0)}{\partial\theta} +J^{-1}\lambda+o_{\rm p}(1),$ which converges in distribution to $$N[J^{-1}\lambda, \,\{g(1)\}^{-2}J^{-1}]$$ as $$n\rightarrow\infty$$.  
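To make the two objective functions concrete, the following is a minimal numerical sketch of both estimators for a garch$$(1,1)$$ model. The function names, the Nelder–Mead optimizer and the crude initialization $$\tilde{h}_1=$$ sample variance of $$y$$ are our own illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def _vol(theta, y):
    # Conditional variance recursion tilde h_t(theta) for a GARCH(1,1) model,
    # initialized crudely at the sample variance of y.
    omega, alpha, beta = theta
    h = np.empty(len(y))
    h[0] = np.var(y)
    for t in range(1, len(y)):
        h[t] = omega + alpha * y[t - 1] ** 2 + beta * h[t - 1]
    return h

def lqml_garch11(y, theta0=(0.05, 0.1, 0.1)):
    # Laplacian quasi maximum likelihood: minimize the sample average of
    # log tilde h_t^{1/2} + |y_t| / tilde h_t^{1/2} over theta.
    def obj(theta):
        if theta[0] <= 0 or min(theta[1:]) < 0 or theta[2] >= 1:
            return np.inf
        s = np.sqrt(_vol(theta, y))
        return np.mean(np.log(s) + np.abs(y) / s)
    return minimize(obj, theta0, method="Nelder-Mead").x

def lad_garch11(y, theta0=(0.05, 0.1, 0.1)):
    # Least absolute deviations: minimize the sample average of
    # |log y_t^2 - log tilde h_t(theta)| over theta.
    def obj(theta):
        if theta[0] <= 0 or min(theta[1:]) < 0 or theta[2] >= 1:
            return np.inf
        return np.mean(np.abs(np.log(y ** 2 + 1e-300) - np.log(_vol(theta, y))))
    return minimize(obj, theta0, method="Nelder-Mead").x
```

Note that the two estimators target different scalings of the same model: the Laplacian quasi maximum likelihood estimator standardizes the innovations so that $$E(|\varepsilon_t|)=1$$, while the least absolute deviations estimator standardizes so that $$\mathrm{median}(|\varepsilon_t|)=1$$.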
Given Theorems 5 and 6, the estimators $$\hat{\theta}_n^{\mathrm{LQML}}$$ and $$\hat{\theta}_n^{\mathrm{LAD}}$$ both satisfy Assumptions 2 and 2’ with $$\varDelta=J^{-1}\lambda$$, and we can then obtain the asymptotic distributions of $$n^{1/2}\hat{\rho}$$ and $$n^{1/2}\hat{\rho}_\Psi$$ under both $$H_0$$ and $$H_{1n}$$. Moreover, Theorems 1–4 ensure that the proposed statistic, $$n^{1/2}\hat{\rho}$$, has the same asymptotic distributions as $$n^{1/2}\hat{\rho}_\Psi$$ with $$\Psi=G$$ under both $$H_0$$ and $$H_{1n}$$. Therefore, we focus on $$n^{1/2}\hat{\rho}_\Psi$$ in the following discussion. By Theorems 1–4, under both $$H_0$$ and $$H_{1n}$$, the asymptotic covariance matrix of $$n^{1/2}\hat{\rho}_\Psi$$ is   $\Sigma_\Psi= {I}_M + \sigma_\Psi^{-4}\big(\kappa_\Psi^2\{E(\varepsilon_t^2)-1\}+2\kappa_\Psi E\big[\{\mu_\Psi-\Psi(|\varepsilon_t|)\}(|\varepsilon_t|-1)\big]\big) D_\Psi J^{-1}D_\Psi^{\mathrm{\scriptscriptstyle T} }$ for the Laplacian quasi maximum likelihood estimator $$\hat{\theta}_n^{\mathrm{LQML}}$$, and is   $\Sigma_\Psi= {I}_M + \sigma_\Psi^{-4}\left\{\frac{\kappa_\Psi^2}{4g^2(1)} +\frac{\kappa_\Psi}{g(1)} E\big[\{\mu_\Psi-\Psi(|\varepsilon_t|)\}\mathrm{sgn}(|\varepsilon_t|-1)\big] \right\} D_\Psi J^{-1}D_\Psi^{\mathrm{\scriptscriptstyle T} }$ for the least absolute deviations estimator $$\hat{\theta}_n^{\mathrm{LAD}}$$. When $$\Psi=G$$, $$\:\sigma_\Psi^2=1/12$$. Moreover, under $$H_{1n}$$, the asymptotic distributions of $$n^{1/2} \hat{\rho}_\Psi$$ for both estimators are shifted by   $\varUpsilon_\Psi=0{\cdot}5\kappa_\Psi (D_\Psi J^{-1}\lambda-V_\Psi)/\sigma_\Psi^{2}\text{.}$ We now consider when $$\varUpsilon_\Psi$$ is nonzero. Let $$b_1={\rm{arg\,min}}_{b\in\mathbb{R}^{p+q+1}}E\{(r_t- X_t^{\mathrm{\scriptscriptstyle T} } b)^2\}$$ and $$b_2={\rm{arg\,min}}_{b\in\mathbb{R}^{p+q+1}}E[\{\Psi(|\varepsilon_{t-k}|)- X_t^{\mathrm{\scriptscriptstyle T} } b\}^2]$$, where $$X_t=h_t^{-1}{\partial h_t(\theta_0)}/{\partial\theta}$$. 
Define the partial covariance (Fan & Yao, 2003)   $$\label{eq_pcov} \mathrm{pcov}\{r_t,\Psi(|\varepsilon_{t-k}|)\mid X_t\}=E[(r_t-X_t^{\mathrm{\scriptscriptstyle T} } b_1)\{\Psi(|\varepsilon_{t-k}|)-X_t^{\mathrm{\scriptscriptstyle T} } b_2\}]\text{.}$$ (8) Because $$b_1=J^{-1}\lambda$$, the $$k$$th element of the term $$D_\Psi J^{-1}\lambda-V_\Psi$$, i.e., $$d_k^{\Psi {\mathrm{\scriptscriptstyle T} }} J^{-1}\lambda-v_k^\Psi$$, can be written as $$-\mathrm{pcov}\{r_t,\Psi(|\varepsilon_{t-k}|)\mid X_t\}$$. Moreover, as $$\kappa_\Psi>0$$, the $$k$$th element of $$\varUpsilon_\Psi$$ is zero if and only if the partial covariance in (8) is zero. Consider the example in Proposition 2, where we have $$s_t=s_{1,t}+s_{2,t}$$ with $$s_{1,t}=a_0+\sum_{i=1}^{p}a_iy_{t-i}^2+\sum_{j=1}^{q}a_{p^*+j}h_{t-j}$$ and $$s_{2,t}=\sum_{i=p+1}^{p^*}a_iy_{t-i}^2+\sum_{j=q+1}^{q^*}a_{p^*+j}h_{t-j}$$. Then $$r_t=X_t^{\mathrm{\scriptscriptstyle T} } a+h_{t}^{-1}\sum_{k=0}^{\infty}e_1^{\mathrm{\scriptscriptstyle T} } B_{0}^ke_1s_{2,t-k}$$, where $$a=(a_0,a_1,\ldots,a_p,a_{p^*+1},\ldots,a_{p^*+q})^{\mathrm{\scriptscriptstyle T} }$$. As a result, when $$s_{2,t}= 0$$, i.e., when the model is correctly specified, the partial covariance in (8) is zero for all $$k>0$$, and the test $$Q_\Psi(M)$$ has no power. If the model is misspecified, i.e., when $$s_{2,t}\neq0$$, then by a method similar to the proof of identifiability for generalized autoregressive conditional heteroscedastic models (Francq & Zakoïan, 2004) we can show that $$r_t-X_t^{\mathrm{\scriptscriptstyle T} } b_1\neq 0$$ with probability 1, provided that Assumption 1 holds. Thus, (8) becomes nonzero at some $$k$$ values, resulting in nontrivial power for the test. 
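The partial covariance in (8) is simply the mean product of least-squares projection residuals, which is straightforward to compute numerically. In the sketch below, the arrays `r`, `psi` and `X` stand in for samples of $$r_t$$, $$\Psi(|\varepsilon_{t-k}|)$$ and $$X_t$$ and are hypothetical.

```python
import numpy as np

def partial_cov(r, psi, X):
    # pcov{r, psi | X}: project r and psi onto the columns of X by least
    # squares (the coefficients b1 and b2 in the text) and average the
    # product of the two residual series.
    b1 = np.linalg.lstsq(X, r, rcond=None)[0]
    b2 = np.linalg.lstsq(X, psi, rcond=None)[0]
    return np.mean((r - X @ b1) * (psi - X @ b2))
```

When `r` lies exactly in the column span of `X`, corresponding to the correctly specified case $$r_t=X_t^{\mathrm{\scriptscriptstyle T} } a$$, the first residual vanishes and the partial covariance is zero, matching the no-power conclusion above.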
In general, the local power of $$Q_\Psi(M)$$ is determined by the noncentrality parameter $$c_\Psi=\varUpsilon_\Psi^{\mathrm{\scriptscriptstyle T} } \Sigma^{-1}_\Psi\varUpsilon_\Psi$$, which depends on the departure $$s_{t,n}$$, the underlying model, the estimator $$\hat{\theta}_n$$ and the function $$\Psi$$. It is difficult to make a direct comparison of the values of $$c_\Psi$$ across different functions $$\Psi$$. We next calculate $$c_\Psi$$ for specific scenarios. Table 1 presents the values of $$c_\Psi$$ under local alternatives of the garch$$(1,1)$$ model with $$(\omega_0, \alpha_0, \beta_0)=(1, 0{\cdot}3, 0{\cdot}2)$$ and three types of departure, namely $$s_{t,n}=G(|y_{t-2,n}|)$$, $$|y_{t-2,n}|$$ and $$y_{t-2,n}^2$$, for $$\{\varepsilon_t\}$$ following the zero-mean normal distribution and Student’s $$t_7$$, $$t_{5}$$, $$t_3$$, $$t_{2{\cdot}5}$$ and $$t_1$$ distributions, standardized such that $$\mathrm{median}(|\varepsilon_t|)=1$$. We assume that the model is estimated by the least absolute deviations method, and we approximate the quantities in $$\varUpsilon_\Psi$$ and $$\Sigma_\Psi$$ by sample averages based on a generated sequence $$\{\,y_1,\ldots,y_n\}$$ with $$n=100\,000$$. We set $$M=6$$ and compare the three transformations $$\Psi(x)=G(x)$$, $$x$$ and $$x^2$$. Some values are left blank in Table 1 because of violations of the moment conditions on $$\varepsilon_t$$. It can be seen that $$\Psi=G$$ dominates the other transformations when $$E(\varepsilon_t^4)=\infty$$, and even for moderate-tailed or Gaussian innovations when the departure is $$s_{t,n}=G(|y_{t-2,n}|)$$ or $$|y_{t-2,n}|$$. The desirable performance of $$\Psi=G$$ is also observed in other situations; see the Supplementary Material. Moreover, consistent with these results, our first simulation experiment in § 5 demonstrates that the proposed test, $$Q(M)$$, performs favourably compared with existing tests. Table 1.
Noncentrality parameter $$c_{\Psi}$$ $$(\times\, 10^2)$$ under different local alternatives of the garch$$(1,1)$$ model with $$(\omega_0, \alpha_0, \beta_0)=(1, 0{\cdot}3, 0{\cdot}2)$$, for $$\Psi(x)=G(x)$$, $$x$$ and $$x^2$$

              s_{t,n}=G(|y_{t-2,n}|)        s_{t,n}=|y_{t-2,n}|          s_{t,n}=y_{t-2,n}^2
              G       x       x^2           G       x       x^2          G        x        x^2
  t_1         3E-05                         2E-03                        99·52
  t_{2·5}     0·05    3E-03                 1·17    0·13                 31·38    8·45
  t_3         0·07    0·01                  1·32    0·27                 26·10    12·15
  t_5         0·10    0·03    3E-03         1·42    0·72    0·11         17·07    16·86    3·98
  t_7         0·11    0·05    0·01          1·42    0·93    0·26         14·35    16·96    7·91
  Normal      0·15    0·10    0·04          1·36    1·25    0·74         9·32     13·62    12·80

Small values are written in standard form, e.g., 3E-05 means $$3\times 10^{-5}$$. 5. Simulation experiments This section presents the results of three simulation experiments carried out to assess the empirical power of the proposed test $$Q(M)$$, evaluate the performance of the automatic method of selecting $$M$$, and verify the asymptotic equivalence in Proposition 1. The least absolute deviations estimator (Peng & Yao, 2003) is employed throughout. In the first experiment, we compare the power of the proposed test, $$Q(M)$$, with that of three existing goodness-of-fit tests: the sign-based test of Chen & Zhu (2015), $$Q_{{\mathrm{sgn}}}(M)$$; the test based on absolute residuals in Li & Li (2005), $$Q_{\mathrm{abs}}(M)$$; and the test based on squared residuals in Li (2004), $$Q_{\mathrm{sqr}}(M)$$. For comparison, $$M$$ is fixed at 6. We generate 1000 replications from   $$\label{garch11_alt} y_{t,n}=\varepsilon_t h_{t,n}^{1/2}, \hspace{5mm} h_{t,n}=0{\cdot}01 +0{\cdot}03y_{t-1,n}^2 +0{\cdot}2 h_{t-1,n}+n^{-1/2}s_{t,n},$$ (9) where $$\{\varepsilon_t\}$$ are independent and identically distributed, following the normal distribution with mean zero or Student’s $$t_7$$, $$t_{5}$$, $$t_3$$, $$t_{2{\cdot}5}$$ or $$t_1$$ distribution, and are standardized such that median$$(|\varepsilon_t|)=1$$. We consider departures $$s_{t,n}=2y_{t-2,n}^2$$, $$2|y_{t-2,n}|$$ and $$2G(|y_{t-2,n}|)$$, and the sample size is $$n=1000$$. The density function of $$\varepsilon_t$$ is estimated by the kernel density method with the Gaussian kernel and its rule-of-thumb bandwidth, $$h=0{\cdot}9n^{-1/5}\min(\hat{\sigma},\hat{R}/1{\cdot}34)$$, where $$\hat{\sigma}$$ and $$\hat{R}$$ are the sample standard deviation and interquartile range of the residuals $$\{\hat{\varepsilon}_t\}$$, respectively; see Silverman (1986). Figure 1 displays the power of the four tests.
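The rule-of-thumb bandwidth above is straightforward to compute; a minimal sketch:

```python
import numpy as np

def rule_of_thumb_bandwidth(resid):
    # h = 0.9 n^{-1/5} min(sigma_hat, R_hat / 1.34), where sigma_hat is the
    # sample standard deviation and R_hat the interquartile range of the
    # residuals (Silverman, 1986).
    n = len(resid)
    sigma = np.std(resid, ddof=1)
    q75, q25 = np.percentile(resid, [75, 25])
    return 0.9 * n ** (-0.2) * min(sigma, (q75 - q25) / 1.34)
```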
When the tails of $$\varepsilon_t$$ become heavier, the power of $$Q_{\mathrm{abs}}(M)$$ and of $$Q_{\mathrm{sqr}}(M)$$ drops dramatically. Although both $$Q(M)$$ and $$Q_{{\mathrm{sgn}}}(M)$$ maintain their power, $$Q(M)$$ is clearly more powerful, suggesting that the degree of information loss from its transformation of absolute residuals is relatively small. Finally, although $$Q_{\mathrm{abs}}(M)$$ performs well when $$s_{t,n}=2y_{t-2,n}^2$$ and $$\varepsilon_t$$ is lighter-tailed, the proposed $$Q(M)$$ is almost always the most powerful test for the other two types of departure, even when $$\varepsilon_t$$ is moderate-tailed. Fig. 1. Power (%) of four goodness-of-fit tests, $$Q(6)$$ (circles), $$Q_{\mathrm{sgn}}(6)$$ (triangles), $$Q_{\mathrm{abs}}(6)$$ (squares) and $$Q_{\mathrm{sqr}}(6)$$ (pluses), for six different innovation distributions and three different departures: (a) $$s_{t,n}=2 y_{t-2,n}^2$$; (b) $$s_{t,n}=2|y_{t-2,n}|$$; (c) $$s_{t,n}=2 G(|y_{t-2,n}|)$$. The second experiment evaluates the performance of the proposed order selection method. We compare three different methods: the Bayesian information criterion-type method in (5), where the penalty term is $$M\log n$$; the Akaike information criterion-type method, for which the penalty term in (5) is replaced by $$2M$$; and the mixed method, for which the penalty term in (5) is replaced by $$2M$$ if and only if $$n^{1/2}\max(|\hat{\rho}_1|, \ldots, |\hat{\rho}_{d_{\mathrm{max}}}|)> (\log n)^{1/2}$$. We set $$d_{\mathrm{min}}=1$$ and $$d_{\mathrm{max}}=5, 25$$ or $$50$$.
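The three selection rules differ only in their penalty terms. Since criterion (5) is not reproduced in this section, the sketch below assumes it takes the usual portmanteau form, maximizing the cumulative statistic $$n\sum_{k\leqslant M}\hat{\rho}_k^2$$ minus the penalty; this functional form is our assumption for illustration.

```python
import numpy as np

def select_order(rho_hat, n, d_min=1, d_max=50, penalty="bic"):
    # Sketch of the automatic choice of M: maximize the cumulative
    # portmanteau statistic n * sum_{k<=M} rho_k^2 minus a penalty term
    # (M log n for the BIC-type rule, 2M for the AIC-type rule).  The exact
    # criterion (5) is in the paper; this form is an illustrative assumption.
    d_max = min(d_max, len(rho_hat))
    stat = n * np.cumsum(np.asarray(rho_hat[:d_max]) ** 2)
    pen = np.log(n) if penalty == "bic" else 2.0
    crit = [stat[M - 1] - M * pen for M in range(d_min, d_max + 1)]
    return d_min + int(np.argmax(crit))
```

The mixed rule would switch from the Bayesian to the Akaike penalty whenever $$n^{1/2}\max_k|\hat{\rho}_k|$$ exceeds $$(\log n)^{1/2}$$.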
The data are generated from (9) with $$s_{t,n}=cy_{t-2,n}^2$$, where $$c=0$$ corresponds to the size and $$c=1,\ldots, 5$$ to the power. The innovations $$\{\varepsilon_t\}$$ are Student $$t_3$$-distributed; the findings under the other innovation distributions from the previous experiment are similar. All other settings are preserved from the first experiment. Figure 2 shows that the rejection rates are insensitive to the value of $$d_{\mathrm{max}}$$, but vary for different selection methods. The size of the Bayesian information criterion-based automatic test is close to the nominal rate. Although the power of that test is slightly smaller than the power of the Akaike information criterion-based test, the latter is severely oversized. The behaviour of the mixed method falls between that of the other two methods. In addition, when comparing the performance of the Bayesian information criterion-based automatic test $$Q(\tilde{M})$$ for $$c=2$$ in Fig. 2 with Fig. 1(a) for $$Q(6)$$ and Student $$t_3$$-distributed innovations, we can see that the automatic test has power comparable to that with a fixed $$M$$. Based on these findings, we recommend using the Bayesian information criterion-based method for automatic selection of $$M$$. Fig. 2. Rejection rates (%) of the automatic test, $$Q(\tilde{M})$$, for (a) $$d_{\mathrm{max}}=5$$, (b) $$d_{\mathrm{max}}=25$$, and (c) $$d_{\mathrm{max}}=50$$, with three selection rules: Bayesian information criterion-type rule (squares), Akaike information criterion-type rule (triangles), and mixed method (circles); horizontal dashed lines indicate the 5% nominal level. The third experiment is conducted to verify the asymptotic equivalence of the test $$Q_\Psi(M)$$ based on the transformations $$\hat{G}_n$$, $$G_n$$ and $$G$$. We generate 1000 replications from   \begin{equation*} y_t=\varepsilon_t h_t^{1/2}, \quad h_t=0{\cdot}01+0{\cdot}2y_{t-1}^2 +0{\cdot}2 h_{t-1}, \end{equation*} where $$\{\varepsilon_t\}$$ follow the normal distribution with mean zero and median$$(|\varepsilon_t|)=1$$, and the sample sizes are $$n=200,$$ 2000 and 20 000. Figure 3 displays the histograms of $$n^{1/2}(\hat{\gamma}_k- \hat{\gamma}_k^G)$$, $$n^{1/2}(\hat{\gamma}_k^{G_n}- \hat{\gamma}_k^G)$$ and $$n^{1/2}\hat{\gamma}_k^G$$ with $$k=1$$. It can be seen that as $$n$$ increases, the distributions of $$n^{1/2}(\hat{\gamma}_1- \hat{\gamma}_1^G)$$ and $$n^{1/2}(\hat{\gamma}_1^{G_n}- \hat{\gamma}_1^G)$$ both shrink towards zero, while that of $$n^{1/2}\hat{\gamma}_1^G$$ maintains the same shape, thereby confirming the asymptotic results in Proposition 1. Fig. 3. Histograms of $$n^{1/2}(\hat{\gamma}_1- \hat{\gamma}_1^G)$$ (top row), $$n^{1/2}(\hat{\gamma}_1^{G_n}- \hat{\gamma}_1^G)$$ (middle row) and $$n^{1/2}\hat{\gamma}_1^G$$ (bottom row) under $$H_0$$ for sample sizes $$n=200$$ (left column), 2000 (middle column) and 20 000 (right column).
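This data-generating process and the rank-based transformation $$\Psi=\hat{G}_n$$ can be sketched as follows. The burn-in length, the start value for $$h_t$$ and the scaling of the Gaussian innovations by $$\Phi^{-1}(0{\cdot}75)\approx 0{\cdot}6745$$, so that $$\mathrm{median}(|\varepsilon_t|)=1$$, are our own implementation choices.

```python
import numpy as np

def garch11_sim(n, omega=0.01, alpha=0.2, beta=0.2, seed=0, burn=500):
    # Simulate y_t = eps_t h_t^{1/2}, h_t = omega + alpha y_{t-1}^2
    # + beta h_{t-1}, with Gaussian eps_t scaled so median(|eps_t|) = 1.
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + burn) / 0.6745
    y = np.empty(n + burn)
    h = omega / (1.0 - beta)  # arbitrary positive start value
    for t in range(n + burn):
        y[t] = eps[t] * np.sqrt(h)
        h = omega + alpha * y[t] ** 2 + beta * h
    return y[burn:]

def rank_autocorr(x, k):
    # Lag-k sample autocorrelation of the ECDF-transformed absolute values,
    # i.e. of the ranks of |x_t| scaled to [0, 1): the transformation
    # Psi = G_n hat studied in the paper (the constant shift from using
    # ranks 0..n-1 cancels after centring).
    u = np.argsort(np.argsort(np.abs(x))).astype(float) / len(x)
    u -= u.mean()
    return float(np.sum(u[k:] * u[:-k]) / np.sum(u * u))
```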
The results of three further simulation studies are reported in the Supplementary Material, wherein we verify the asymptotic distributions of $$Q(M)$$ under $$H_0$$ and $$H_{1n}$$ and apply the proposed order selection method to all test statistics in the first experiment. In particular, we show that the null distribution of $$Q(M)$$ is well approximated by the $$\chi_M^2$$ distribution even in small samples and that when $$n$$ is large, $$Q(M)$$ converges to a noncentral $$\chi_M^2$$ distribution under $$H_{1n}$$, although the convergence rate seems slower for heavier-tailed innovation distributions. Finally, the proposed order selection method performs well when applied to other test statistics. 6. An empirical example In this section we analyse the daily log returns, in percentage form, of the exchange rate of the Chinese yuan to the United States dollar from 23 January 2009 to 9 October 2015. The sample size is $$n=1520$$. Figure 4 shows clear volatility clustering. The sample autocorrelation function lies inside or near the bounds of $$\pm1{\cdot}96/n^{1/2}$$ at the first 30 lags, so a pure generalized autoregressive conditional heteroscedastic model is suggested. Fig. 4. Daily log returns (%) of yuan-to-dollar exchange rates from 23 January 2009 to 9 October 2015.
We fit four models using the least absolute deviations method: the garch$$(1,1)$$ model and the autoregressive conditional heteroscedastic models of orders $$p=6$$, $$7$$ and $$8$$, defined as $$y_t=\varepsilon_t h_t^{1/2}$$, $$h_t=\omega_0+\sum_{i=1}^p\alpha_{0i}y_{t-i}^2$$ and denoted by arch$$(p)$$. The estimated coefficients and associated standard errors are listed in Table 2. Before conducting goodness-of-fit tests, we first plot the sample autocorrelation functions of the absolute residuals transformed by $$\Psi(x)=\hat{G}_n(x)$$, $${\mathrm{sgn}}(x-1)$$, $$x$$ and $$x^2$$, along with their corresponding 95% confidence bands. Figure 5 shows that the residual autocorrelation function $$\hat{\rho}_k$$ falls noticeably outside the confidence band at lag $$k=6$$ for all fitted arch$$(p)$$ models, yet falls inside the band at all lags for the fitted garch$$(1,1)$$ model. In contrast, $$\hat{\rho}_k^{{\mathrm{sgn}}}$$, $$\hat{\rho}_k^{\mathrm{abs}}$$ and $$\hat{\rho}_k^{\mathrm{sqr}}$$ all either lie inside the confidence bands or stand out only slightly. The last two sample autocorrelation functions in particular are very small at almost all lags. Fig. 5. Sample autocorrelation functions of absolute residuals transformed by $$\Psi=\hat{G}_n$$, $${\mathrm{sgn}}(x-1)$$, $$x$$ and $$x^2$$ (from top to bottom) for four fitted models, with corresponding 95% confidence bands. Table 2.
Estimation results $$(\times\, 10^2)$$, with standard errors, for all fitted models in the exchange rate example

              arch(6)            arch(7)            arch(8)            garch
              Est.     SE        Est.     SE        Est.     SE        Est.     SE
  ω           0·01     2E-03     0·01     2E-03     0·01     2E-03     2E-03    6E-04
  α_1         19·13    3·11      18·68    3·09      17·25    2·99      11·50    1·70
  α_2         9·30     2·24      9·19     2·22      8·65     2·20
  α_3         5·78     1·86      4·94     1·73      5·38     1·81
  α_4         3·59     1·50      2·60     1·37      2·56     1·37
  α_5         0·04     0·68      4E-06    0·76      7E-05    0·79
  α_6         5·02     1·39      5·03     1·44      4·31     1·40
  α_7                            1·10     0·72      0·70     0·71
  α_8                                               1·59     0·83
  β_1                                                                  69·34    3·00

SE, standard error; small values are written in standard form, e.g., 2E-03 means $$2\times 10^{-3}$$. We next compare the performance of the proposed test, based on $$Q(M)$$, with those of the tests based on $$Q_{\mathrm{sgn}}(M)$$, $$Q_{\mathrm{abs}}(M)$$ and $$Q_{\mathrm{sqr}}(M)$$. For each test, we employ the Bayesian information criterion-type method in (5) to select $$M$$, and use $$d_{\mathrm{min}}=6$$ because $$\hat{\rho}_k$$ first falls outside its confidence band at $$k=6$$ in Fig. 5; $$d_{\mathrm{max}}$$ is set to 30. Table 3 lists the $$p$$-values of these tests with automatically selected orders $$\tilde{M}$$, indicated by a superscript A. We also report the $$p$$-values for the tests with $$M=9$$, because $$\hat{\rho}_k$$ for both the fitted arch$$(6)$$ and arch$$(7)$$ models is significant at lag 9. The $$p$$-values of $$Q_{\mathrm{abs}}$$ and $$Q_{\mathrm{sqr}}$$ are all close or even equal to unity. Although $$Q_{\mathrm{sgn}}$$ has smaller $$p$$-values, it fails to reject any of the fitted arch$$(p)$$ models at the 5% significance level. By contrast, the inadequacy of the fitted arch$$(6)$$ and arch$$(7)$$ models is successfully detected by our proposed test for both $$M=\tilde{M}$$ and $$M=9$$, which indicates that $$\Psi=\hat{G}_n$$ achieves better performance in detecting possible autocorrelation structures. Table 3.
The $$p$$-values of four goodness-of-fit tests with selected order $$\tilde{M}$$ or $$M=9$$

              Q^A      Q_sgn^A  Q_abs^A  Q_sqr^A  Q(9)     Q_sgn(9)  Q_abs(9)  Q_sqr(9)
  arch(6)     0·0014   0·4130   0·8210   1·0000   0·0015   0·3349    0·9242    1·0000
  arch(7)     0·0185   0·2790   0·8666   1·0000   0·0125   0·0787    0·9483    1·0000
  arch(8)     0·0904   0·1872   0·9139   1·0000   0·0981   0·1187    0·9805    1·0000
  garch       0·1329   0·1367   0·9474   1·0000   0·1272   0·1034    0·9925    1·0000

Finally, we evaluate the tail-heaviness of $$\varepsilon_t$$. The Pickands and Hill estimates of the tail index are calculated for the squared residuals of the fitted garch$$(1,1)$$ model.
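As a minimal sketch, the Hill estimate of the tail index can be computed as follows; the number of upper order statistics $$k$$ is the usual tuning parameter, and its choice here is illustrative.

```python
import numpy as np

def hill_estimator(x, k):
    # Hill estimate of the tail index alpha from the k largest order
    # statistics of |x|: 1 / mean(log X_(i) - log X_(k+1)), i = 1, ..., k.
    xs = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    return 1.0 / np.mean(np.log(xs[:k]) - np.log(xs[k]))
```

Applied to the squared residuals $$\hat{\varepsilon}_t^2$$, an estimated tail index between 1 and 2 corresponds to $$E(\varepsilon_t^2)<\infty$$ but $$E(\varepsilon_t^4)=\infty$$.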
The implication is that $$E(\varepsilon_t^2)<\infty$$ and $$E(\varepsilon_t^4)=\infty$$; see the Supplementary Material and Resnick (2007) for details. We also adopt the strict stationarity tests in Francq & Zakoïan (2012) based on least absolute deviations, and confirm the stationarity of the observed log returns at the 1% significance level. Moreover, $$\hat{\alpha}_1\hat{\sigma}^2+\hat{\beta}_1=3{\cdot}5$$, which is much greater than 1, implying that the observed sequence has an infinite second-order moment. This phenomenon, together with the heavy-tailedness of $$\varepsilon_t$$, may have led to the considerable volatility exhibited in Fig. 4. 7. Conclusion and discussion For a time series model, let $$\{\varepsilon_t\}$$ and $$\{\hat{\varepsilon}_t\}$$ denote the innovations and corresponding residuals, respectively. In constructing a goodness-of-fit test, the sample autocorrelation function of $$\{\hat{\varepsilon}_t\}$$, $$\{|\hat{\varepsilon}_t|\}$$ or $$\{\hat{\varepsilon}_t^2\}$$ is usually employed. However, to ensure the existence of the autocorrelation function of $$\{{\varepsilon}_t\}$$, $$\{|{\varepsilon}_t|\}$$ or $$\{{\varepsilon}_t^2\}$$, the requirement of a finite second- or even fourth-order moment is unavoidable. The essence of our approach in this paper is to transform the residuals before calculating the conventional autocorrelation function. Such a transformation is simple to perform and yet leads to a rich class of tests through various transformation functions. When the absolute residuals are transformed by their corresponding empirical distribution function, no moment condition for $$\varepsilon_t$$ is required, and the resultant goodness-of-fit test is applicable to arbitrarily heavy-tailed innovations. 
There is an extensive body of literature on time series models with innovations of infinite variance, such as the infinite variance autoregressive (Davis & Resnick, 1986; Ling, 2005) and autoregressive moving-average (Zhu & Ling, 2015) models. The corresponding estimators may not even be $$\surd{n}$$-consistent. To the best of our knowledge, no goodness-of-fit test is currently available that is well suited to such situations; we therefore propose that the method in this paper be adopted to resolve this problem, which we leave for future research. Acknowledgement We thank the editor, associate editor and two referees for their invaluable comments, which have led to substantial improvements of the paper, and we acknowledge the Hong Kong Research Grants Council for partial support. Supplementary material Supplementary material available at Biometrika online includes further results on the noncentrality parameter, additional simulation studies, tail index estimation in the empirical example, and all technical proofs. Appendix Three important lemmas Lemmas A1 and A2 below can be used to derive asymptotic distributions of weighted residual empirical processes for generalized autoregressive conditional heteroscedastic models, and hence are of independent interest. Lemma A3 provides a Hájek projection for the Spearman rank autocorrelation coefficient. Lemma A1. Suppose that $$H_0$$ and Assumptions 1 and 3 hold with $$n^{1/2} (\hat{{\theta}}_n-{\theta}_0)=O_{\rm p}(1)$$. If $$\{w_t\}$$ is a strictly stationary and ergodic process with $$0\leqslant w_t\leqslant 1$$ and $$w_t\in \mathcal{F}_{t-1}$$, then  $\sup_{0\leqslant x <\infty} \left | n^{-1/2} \sum_{t=1}^{n} w_t \big\{ I(|\hat{\varepsilon}_t|\leqslant x) - I(|\varepsilon_t|\leqslant x) \big\} - 0{\cdot}5\,xg(x) d_w^{\mathrm{\scriptscriptstyle T} } n^{1/2} (\hat{{\theta}}_n-{\theta}_0) \right | = o_{\rm p}(1),$ where $$d_w=E\{w_t h_t^{-1} {\partial h_t(\theta_0)}/{\partial\theta}\}$$.  Lemma A2. 
Suppose that $$H_0$$ and Assumptions 1 and 3 hold with $$n^{1/2} (\hat{{\theta}}_n-{\theta}_0)=O_{\rm p}(1)$$. If $$\{w_t\}$$ is a strictly stationary and ergodic process with $$0\leqslant w_t\leqslant 1$$ and each $$w_t$$ is independent of $$\mathcal{F}_{t}$$, then  \begin{equation*} \sup_{0\leqslant x <\infty} \left | n^{-1/2} \sum_{t=1}^{n} w_t \big\{ I(|\hat{\varepsilon}_t|\leqslant x) - I(|\varepsilon_t|\leqslant x) \big\} - E(w_t)xg(x) {d}_0^{*{\mathrm{\scriptscriptstyle T} }} n^{1/2} (\hat{{\theta}}_n-{\theta}_0) \right | = o_{\rm p}(1), \end{equation*}where $$d_0^*=0{\cdot}5E\{h_t^{-1} {\partial h_t(\theta_0)}/{\partial\theta}\}$$.  Lemma A3. Let $$X_1, \ldots, X_n$$ be a sample of independent observations with distribution function $$F(x)$$ and empirical distribution function $$F_n(x)=n^{-1}\sum_{t=1}^{n} I(X_t\leqslant x)$$ for $$-\infty < x < \infty$$. Then, for any positive integer $$k$$,  $n^{-1/2} \sum_{t=k+1}^{n} \{F_n(X_t)F_n(X_{t-k}) - F(X_t)F(X_{t-k})\}= - n^{-1/2} \sum_{t=k+1}^{n} \{F(X_t) - 0{\cdot}5\} + o_{\rm p}(1)\text{.}$ References Andreou E. & Werker B. J. ( 2012). An alternative asymptotic analysis of residual-based statistics. Rev. Econ. Statist.  94, 88– 99. Google Scholar CrossRef Search ADS   Andreou E. & Werker B. J. ( 2015). Residual-based rank specification tests for AR-GARCH type models. J. Economet.  185, 305– 31. Google Scholar CrossRef Search ADS   Bartels R. ( 1982). The rank version of von Neumann’s ratio test for randomness. J. Am. Statist. Assoc.  77, 40– 6. Google Scholar CrossRef Search ADS   Basrak B., Davis R. A. & Mikosch T. ( 2002). Regular variation of GARCH processes. Stoch. Proces. Appl.  99, 95– 115. Google Scholar CrossRef Search ADS   Berkes I. & Horváth L. ( 2004). The efficiency of the estimators of the parameters in GARCH processes. Ann. Statist.  32, 633– 55. Google Scholar CrossRef Search ADS   Berkes I., Horváth L. & Kokoszka P. ( 2003). GARCH processes: Structure and estimation. Bernoulli  9, 201– 27. 
Google Scholar CrossRef Search ADS   Bollerslev T. ( 1986). Generalized autoregressive conditional heteroskedasticity. J. Economet.  31, 307– 27. Google Scholar CrossRef Search ADS   Bougerol P. & Picard N. ( 1992). Stationarity of GARCH processes and of some nonnegative time series. J. Economet.  52, 115– 27. Google Scholar CrossRef Search ADS   Chen M. & Zhu K. ( 2015). Sign-based portmanteau test for ARCH-type models with heavy-tailed innovations. J. Economet.  189, 313– 20. Google Scholar CrossRef Search ADS   Davis R. A. & Mikosch T. ( 1998). The sample autocorrelations of heavy-tailed processes with applications to arch. Ann. Statist.  26, 2049– 80. Google Scholar CrossRef Search ADS   Davis R. A. & Resnick S. I. ( 1986). Limit theory for the sample covariance and correlation function of moving averages. Ann. Statist.  14, 533– 58. Google Scholar CrossRef Search ADS   Drost F. C. & Klaassen C. A. ( 1997). Efficient estimation in semiparametric GARCH models. J. Economet.  81, 193– 221. Google Scholar CrossRef Search ADS   Dufour J.-M. & Roy R. ( 1985). Some robust exact results on sample autocorrelations and tests of randomness. J. Economet.  29, 257– 73. Google Scholar CrossRef Search ADS   Engle R. F. ( 1982). Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica  50, 987– 1007. Google Scholar CrossRef Search ADS   Fan J. & Yao Q. ( 2003). Nonlinear Time Series: Nonparametric and Parametric Methods . New York: Springer. Google Scholar CrossRef Search ADS   Francq C. & Zakoïan J.-M. ( 2004). Maximum likelihood estimation of pure GARCH and ARMA-GARCH processes. Bernoulli  10, 605– 37. Google Scholar CrossRef Search ADS   Francq C. & Zakoïan J.-M. ( 2010). GARCH Models: Structure, Statistical Inference and Financial Applications . Chichester: John Wiley & Sons. Google Scholar CrossRef Search ADS   Francq C. & Zakoïan J.-M. ( 2012). 
Strict stationarity testing and estimation of explosive and stationary generalized autoregressive conditional heteroscedasticity models. Econometrica 80, 821–61.
Guo, S., Box, J. L. & Zhang, W. (2017). A dynamic structure for high dimensional covariance matrices and its application in portfolio allocation. J. Am. Statist. Assoc. 112, 235–53.
Hall, P. & Yao, Q. (2003). Inference in ARCH and GARCH models with heavy-tailed errors. Econometrica 71, 285–317.
Hallin, M., Ingenbleek, J.-F. & Puri, M. L. (1985). Linear serial rank tests for randomness against ARMA alternatives. Ann. Statist. 13, 1156–81.
Hallin, M. & Puri, M. L. (1994). Aligned rank tests for linear models with autocorrelated error terms. J. Mult. Anal. 50, 175–237.
He, C. & Teräsvirta, T. (1999). Fourth moment structure of the GARCH$$(p,q)$$ process. Economet. Theory 15, 824–46.
Le Cam, L. & Yang, G. L. (1990). Asymptotics in Statistics. New York: Springer.
Li, G. & Li, W. K. (2005). Diagnostic checking for time series models with conditional heteroscedasticity estimated by the least absolute deviation approach. Biometrika 92, 691–701.
Li, G. & Li, W. K. (2008). Least absolute deviation estimation for fractionally integrated autoregressive moving average time series models with conditional heteroscedasticity. Biometrika 95, 399–414.
Li, W. K. (2004). Diagnostic Checks in Time Series. New York: Chapman & Hall/CRC.
Li, W. K. & Mak, T. K. (1994). On the squared residual autocorrelations in non-linear time series with conditional heteroskedasticity. J. Time Ser. Anal. 15, 627–36.
Ling, S. (2005).
Self-weighted least absolute deviation estimation for infinite variance autoregressive models. J. R. Statist. Soc. B 67, 381–93.
Mikosch, T. & Stărică, C. (2000). Limit theory for the sample autocorrelations and extremes of a GARCH$$(1,1)$$ process. Ann. Statist. 28, 1427–51.
Mittnik, S. & Paolella, M. S. (2003). Prediction of financial downside-risk with heavy-tailed conditional distributions. In Handbook of Heavy Tailed Distributions in Finance, S. T. Rachev, ed. Amsterdam: Elsevier, pp. 385–404.
Nelson, D. B. & Cao, C. Q. (1992). Inequality constraints in the univariate GARCH model. J. Bus. Econ. Statist. 10, 229–35.
Peng, L. & Yao, Q. (2003). Least absolute deviation estimation for ARCH and GARCH models. Biometrika 90, 967–75.
Resnick, S. I. (2007). Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. New York: Springer.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman & Hall.
van der Vaart, A. W. (1998). Asymptotic Statistics. New York: Cambridge University Press.
Wald, A. & Wolfowitz, J. (1943). An exact test for randomness in the non-parametric case based on serial correlation. Ann. Math. Statist. 14, 378–88.
Zhu, K. & Li, W. K. (2015). A new Pearson-type QMLE for conditionally heteroskedastic models. J. Bus. Econ. Statist. 33, 552–65.
Zhu, K. & Ling, S. (2015). LADE-based inference for ARMA models with unspecified and heavy-tailed heteroscedastic noises. J. Am. Statist. Assoc. 110, 784–94.
Zivot, E. (2009). Practical issues in the analysis of univariate GARCH models. In Handbook of Financial Time Series, T. Mikosch, J.-P. Kreiß, R. A. Davis & T. G. Andersen, eds. New York: Springer, pp. 113–55.
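The approximation in Lemma A3 above can be checked numerically. The following sketch (not part of the paper; the simulation setup and all names are illustrative) draws iid standard-uniform observations, for which $$F(x)=x$$ is known, and compares the two sides of the displayed identity directly; their difference is the $$o_{\rm p}(1)$$ remainder and should be small for large $$n$$.

```python
# Monte Carlo check of Lemma A3 (illustrative sketch, assuming iid U(0,1)
# data so that the true distribution function F(x) = x is known).
import numpy as np

def lemma_a3_sides(x, k):
    """Return (lhs, rhs) of the Lemma A3 identity for a sample x of
    continuous iid observations with F known to be U(0,1), at lag k."""
    n = len(x)
    # F_n(X_t) = #{s : X_s <= X_t} / n; with continuous data (no ties)
    # this is simply rank(X_t) / n, computed via a double argsort.
    ranks = np.argsort(np.argsort(x)) + 1
    Fn = ranks / n
    F = x  # true U(0,1) distribution function evaluated at the sample
    # Sums run over t = k+1, ..., n, pairing X_t with X_{t-k}.
    lhs = np.sum(Fn[k:] * Fn[:-k] - F[k:] * F[:-k]) / np.sqrt(n)
    rhs = -np.sum(F[k:] - 0.5) / np.sqrt(n)
    return lhs, rhs

rng = np.random.default_rng(12345)
x = rng.uniform(size=20000)
lhs, rhs = lemma_a3_sides(x, k=1)
# rhs is asymptotically N(0, 1/12), i.e. typically of size ~0.29, while
# the remainder lhs - rhs is of smaller order, so the sides agree closely.
print(lhs, rhs, lhs - rhs)
```

In this setting both sides are $$O_{\rm p}(1)$$ quantities of typical size $$(1/12)^{1/2}\approx 0{\cdot}29$$, whereas the remainder behaves like $$O_{\rm p}(n^{-1/2})$$, so the agreement is visibly closer than the scale of either side.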
© 2017 Biometrika Trust

Biometrika, Oxford University Press. Published: Mar 1, 2018.
