# Testing independence for multivariate time series via the auto-distance correlation matrix

## Summary

We introduce the matrix multivariate auto-distance covariance and correlation functions for time series, discuss their interpretation and develop consistent estimators for practical implementation. We also develop a test of the independent and identically distributed hypothesis for multivariate time series data and show that it performs better than the multivariate Ljung–Box test. We discuss computational aspects and present a data example to illustrate the method.

## 1. Introduction

In applications in fields such as economics (e.g., Tsay, 2014), medicine (McLachlan et al., 2004) and environmetrics (Hipel & McLeod, 1994), we often observe several time series evolving simultaneously. Analysing each component separately could lead to wrong conclusions because of possible interrelationships between the series. Such relationships are usually identified by employing the autocovariance function. For a $$d$$-dimensional stationary time series $$\{X_{t}, \, t \in \mathbb{Z}\}$$ with mean $$\mu$$, the autocovariance function is defined by \begin{eqnarray*} \Gamma(\,j) & = & E\bigl\{(X_{t+j}-\mu)(X_{t}-\mu)^{\rm T}\bigr\} = \{\gamma_{rm}(\,j)\}_{r,m=1}^d \quad (\,j \in \mathbb{Z})\hbox{.} \end{eqnarray*} A consistent estimator is the sample autocovariance function (Brockwell & Davis, 1991, p. 397) \begin{eqnarray*} \hat{\Gamma}(\,j) & = & \begin{cases} \displaystyle{n^{-1}\sum_{t=1}^{n-j}{(X_{t+j}-\bar{X})(X_{t}-\bar{X})^{\rm T}},} & \quad \hbox{$0 \leq j \leq n-1$,} \\ \displaystyle{n^{-1}\sum_{t=-j+1}^{n}{(X_{t+j}-\bar{X})(X_{t}-\bar{X})^{\rm T}},} & \quad \hbox{$-n+1 \leq j < 0$,} \end{cases} \end{eqnarray*} which is often used to measure pairwise dependence.
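As a concrete illustration, the sample autocovariance matrix $$\hat{\Gamma}(\,j)$$ can be computed directly; the following Python sketch (our own helper, not part of the paper's software) implements both branches of the definition:

```python
import numpy as np

def sample_autocov(X, j):
    """Sample autocovariance matrix Gamma_hat(j) of an (n, d) array X,
    with the 1/n normalization of Brockwell & Davis."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)              # centre each component at the sample mean
    if j >= 0:
        # n^{-1} sum_{t=1}^{n-j} (X_{t+j} - Xbar)(X_t - Xbar)^T
        return Xc[j:].T @ Xc[:n - j] / n
    # for j < 0, a change of summation index gives Gamma_hat(j) = Gamma_hat(-j)^T
    return sample_autocov(X, -j).T
```

For example, $$\hat{\Gamma}(0)$$ is symmetric and $$\hat{\Gamma}(-j) = \hat{\Gamma}^{\rm T}(\,j)$$, matching the two cases above.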
The multivariate Ljung–Box test statistic (Hosking, 1980; Li & McLeod, 1981) is formulated as $$\label{eq:mLB} {\mathrm{m}}\mathrm{LB}= n^2\sum_{j=1}^{p}{(n-j)^{-1}\,\mbox{tr}\bigl\{\hat{\Gamma}^{\rm T}(\,j)\hat{\Gamma}^{-1}(0)\hat{\Gamma}(\,j)\hat{\Gamma}^{-1}(0)\bigr\}},$$ (1) and it is widely used for testing the hypothesis $$\Gamma(1) = \dots = \Gamma(p) = 0$$. However, (1) should be applied carefully because the number of lags included is held constant but, in practice, the dependence might be of higher order (Hong, 1998, 2000; Xiao & Wu, 2014). Furthermore, the autocovariance function cannot always detect serial dependence for purely non-Gaussian and nonlinear models, though it is suitable for Gaussian models. Test statistics based on the autocovariance function for testing independence are not consistent against alternatives for models with zero autocovariance (Romano & Thombs, 1996; Shao, 2011), so alternative dependence measures need to be studied (Tjøstheim, 1996; Lacal & Tjøstheim, 2017, 2018). We propose the auto-distance covariance function as a statistic suitable for detecting nonlinear relationships in multivariate time series. This function is based on the distance covariance (Székely et al., 2007). Feuerverger (1993) provided an early treatment, and Zhou (2012), Dueck et al. (2014), Fokianos & Pitsillou (2017) and Davis et al. (2018) extended it to time series. Work on the closely related notion of the Hilbert–Schmidt independence criterion includes the papers by Gretton et al. (2008) and Sejdinovic et al. (2013). In the present paper, we introduce the auto-distance covariance matrix for identification of possible nonlinear relationships among the components of a vector series $$\{X_t\}$$, and we show that its sample version is a consistent estimator of the population auto-distance covariance matrix. The sample auto-distance covariance matrix may be used to construct tests for independence of multivariate time series. 
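For reference, (1) can be evaluated directly from the sample autocovariances. Below is a minimal, self-contained Python sketch (the function name is ours); under the null hypothesis the statistic is approximately chi-squared with $$pd^2$$ degrees of freedom (Hosking, 1980):

```python
import numpy as np

def mlb_statistic(X, p):
    """Multivariate Ljung-Box statistic (1) for an (n, d) series X using p lags."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    gamma = lambda j: Xc[j:].T @ Xc[:n - j] / n   # sample autocovariance, lag j >= 0
    g0_inv = np.linalg.inv(gamma(0))
    # n^2 * sum_j (n - j)^{-1} tr{Gamma_hat^T(j) Gamma_hat^{-1}(0) Gamma_hat(j) Gamma_hat^{-1}(0)}
    return n**2 * sum(
        np.trace(gamma(j).T @ g0_inv @ gamma(j) @ g0_inv) / (n - j)
        for j in range(1, p + 1)
    )
```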
This is accomplished by following Hong (1999), who introduced the so-called generalized spectral density function. The generalized spectral density matrix captures all forms of dependence because it is constructed from the characteristic function. Thus, we can develop statistics for testing independence by considering an increasing number of lags. The present paper extends the work of Zhou (2012) and Fokianos & Pitsillou (2017), who considered univariate testing of independence based on the auto-distance covariance, and of Székely et al. (2007), since some of our results can be transferred to independent data. Indeed, using the auto-distance covariance matrix to identify possible dependencies among the components of a random vector could give rise to novel dimension-reduction methods. All our methods are implemented in the R (R Development Core Team, 2018) package dCovTS (Pitsillou & Fokianos, 2016).

## 2. Auto-distance covariance matrix

### 2.1. Definitions

Suppose that $$\{X_t,\, t \in \mathbb{Z} \}$$ is a $$d$$-variate strictly stationary time series. Denote its cumulative distribution function by $$F(x_1, \dots, x_d)$$ and assume that $$E (|{X_{t;r}}|) < \infty$$ for $$r=1, \dots, d$$. Let $$F_r(\cdot)$$ denote the marginal distribution function of $$\{X_{t;r}\}$$ and $$F_{rm}(\cdot\,,\cdot)$$ that of $$(X_{t;r},X_{t;m})$$ with $$r,m=1, \dots, d$$. Let $$\{X_t: t= 1, \dots, n\}$$ be a sample of size $$n$$. By extending the results of Székely et al. (2007), Zhou (2012) defined the distance covariance function for multivariate time series, but without taking into account possible cross-dependencies between all possible pairs of the component series of $$\{X_t\}$$. Here, we define the pairwise distance covariance function as the distance between the joint characteristic function and the marginal characteristic functions of the pair $$(X_{t;r},X_{t+j;m})$$, for $$r,m = 1, \dots, d$$.
Denote the joint characteristic function of $$X_{t;r}$$ and $$X_{t+j;m}$$ by \begin{equation*} \phi_{j}^{(r,m)}(u,v) = E\bigl[\exp\{{\rm i}(uX_{t;r} + vX_{t+j;m})\}\bigr] \quad (u,v \in \mathbb{R};\, j \in \mathbb{Z}), \end{equation*} where $$r,m=1, \dots, d$$ and $${\rm i}^2=-1$$. Let $$\phi^{(r)}(u) = E[\exp\{{\rm i}(uX_{t;r})\}]$$ denote the marginal characteristic function of $$X_{t;r}$$ for $$r=1, \dots, d$$. Let $$\label{eq:Sigmamatrix} \Sigma_{j}(u,v) = \bigl\{\sigma_{j}^{(r,m)}(u,v)\bigr\} \quad (\,j \in \mathbb{Z})$$ (2) denote the $$d \times d$$ matrix whose $$(r,m)$$ element is $$\label{eq:sigmarmk} \sigma_{j}^{(r,m)}(u,v) = \mbox{cov}\bigl\{\exp({\rm i} uX_{t;r}),\,\exp({\rm i} vX_{t+j;m})\bigr\} = \phi_{j}^{(r,m)}(u,v) - \phi^{(r)}(u)\phi^{(m)}(v)\text{.}$$ (3) If $$\sigma_{j}^{(r,m)}(u,v)=0$$ for all $$(u,v) \in \mathbb{R}^2$$, then the random variables $$X_{t;r}$$ and $$X_{t+j;m}$$ are independent. Let the $$\|\cdot\|_{\mathcal{W}}$$-norm of $$\sigma_{j}^{(r,m)}(u,v)$$ be defined by \begin{equation*} \|\sigma_{j}^{(r,m)}(u,v)\|_{\mathcal{W}}^2 = \int_{\mathbb{R}^2}{|{\sigma_{j}^{(r,m)}(u,v)}|^2\,\mathcal{W}({\rm d} u,{\rm d} v)} \quad (\,j \in \mathbb{Z}), \end{equation*} where $$\mathcal{W}(\cdot\,,\cdot)$$ is an arbitrary positive weight function such that $$\|\sigma_{j}^{(r,m)}(u,v)\|_{\mathcal{W}}^2 < \infty$$. Feuerverger (1993) and Székely et al. (2007) employed the nonintegrable weight function $$\label{eq:weight} \mathcal{W}({\rm d} u,{\rm d} v) = \frac{1}{\pi |{u}|^2}\frac{1}{\pi |{v}|^2}\,{\rm d} u \,{\rm d} v\text{.}$$ (4) The choice of $$\mathcal{W}(\cdot\,,\cdot)$$ is key in this work. Obviously, (4) is nonintegrable in $$\mathbb{R}^2$$, but choices with $$\int{{\rm d}\mathcal{W}} < \infty$$ are possible. Following Hong (1999) and Chen & Hong (2012), suppose that $$\mathcal{W}(\cdot\,,\cdot) : \mathbb{R}^2 \rightarrow \mathbb{R}^{+}$$ is nondecreasing with bounded total variation.
This obviously holds for $$\mathcal{W}({\rm d} u,{\rm d} v) = {\rm d}\Phi(u)\,{\rm d}\Phi(v)$$, where $$\Phi(\cdot)$$ is the standard normal cumulative distribution function. In this case, $$\|\sigma_{j}^{(r,m)}(u,v)\|_{\mathcal{W}}^2$$ can be computed by Monte Carlo simulation. For related work, see also Meintanis & Iliopoulos (2008) and Hlávka et al. (2011). In what follows, we use (4) throughout, taking into account the fact that integrable weight functions may miss potential dependence among observations (Székely et al., 2007, p. 2771). Definition 1. The pairwise auto-distance covariance function between $$X_{t;r}$$ and $$X_{t+j;m}$$ is denoted by $$V_{rm}(\,j)$$ and defined as the positive square root of $$\label{eq:Vrm} V_{rm}^2(\,j) = \|\sigma_j^{(r,m)}(u,v)\|_{\mathcal{W}}^2 \quad (r, m = 1, \dots, d;\: j \in \mathbb{Z}),$$ (5) with $$\mathcal{W}(\cdot\,,\cdot)$$ given by (4). The auto-distance covariance matrix of $$\{X_t\}$$ at lag $$j$$ is denoted by $$V(\,j)$$ and is the $$d \times d$$ matrix $$\label{eq:V} V(\,j) = \bigl\{ V_{rm}(\,j) \bigr\}_{r,m=1}^d \quad (\,j \in \mathbb{Z})\text{.}$$ (6) Clearly, $$V_{rm}^2(\,j) \geq 0$$ for all $$j$$, and $$X_{t;r}$$ and $$X_{t+j;m}$$ are independent if and only if $$V_{rm}^2(\,j) = 0$$. Furthermore, we define the $$d \times d$$ matrices $$\label{eq:V2} V^{(2)}(\,j) = \bigl\{V_{rm}^2(\,j)\bigr\}_{r,m=1}^d \quad (\,j \in \mathbb{Z})\text{.}$$ (7) Definition 1 is valid for any weight function $$\mathcal{W}(\cdot\,, \cdot)$$ such that $$V_{rm}^{2}(\,j) < \infty$$; with the choice (4) it yields the pairwise auto-distance covariance function. Definition 2. The pairwise auto-distance correlation function between $$X_{t;r}$$ and $$X_{t+j;m}$$ is denoted by $$R_{rm}(\,j)$$ and defined as the positive square root of \begin{eqnarray*} R^2_{rm}(\,j) & = & \frac{V_{rm}^2(\,j)}{\{V^2_{rr}(0)V^2_{mm}(0)\}^{1/2}} \quad (r, m = 1, \dots, d;\: j \in \mathbb{Z}), \end{eqnarray*} provided that $$V_{rr}(0)V_{mm}(0) \neq 0$$.
The auto-distance correlation matrix of $$\{X_t\}$$ at lag $$j$$ is \begin{eqnarray*} R(\,j) & = & \bigl\{R_{rm}(\,j)\bigr\}_{r,m=1}^{d} \quad (\,j \in \mathbb{Z})\text{.} \end{eqnarray*} Similarly, define the $$d \times d$$ matrices \begin{eqnarray*} R^{(2)}(\,j) & = & \bigl\{R_{rm}^2(\,j)\bigr\}_{r,m=1}^d \quad (\,j \in \mathbb{Z})\text{.} \end{eqnarray*} Then (7) shows that $$R^{(2)}(\,j) = D^{-1}V^{(2)}(\,j)D^{-1}$$, where $$D = \mbox{diag} \{ V_{rr}(0) \}$$ ($$r = 1, \dots, d$$). All of the above population quantities exist and are well-defined owing to standard properties of the characteristic function. Davis et al. (2018) give existence results for more general weight functions. When $$j \neq 0$$, $$V_{rm}(\,j)$$ measures the dependence of $$X_{t;r}$$ on $$X_{t+j;m}$$. For $$j > 0$$ and if $$V_{rm}(\,j) > 0$$, we say that the series $$X_{t;m}$$ leads the series $$X_{t;r}$$ at lag $$j$$. In general, $$V_{rm}(\,j) \neq V_{mr}(\,j)$$ for $$r \neq m$$, since they measure different types of dependence between the series $$\{X_{t;r}\}$$ and $$\{X_{t;m}\}$$ ($$r,m= 1, \dots, d$$). Thus, $$V(\,j)$$ and $$R(\,j)$$ are nonsymmetric matrices, but by stationarity we have \begin{eqnarray*} V_{rm}^2(-j) & = & \big\| \mbox{cov}\bigl\{\exp({\rm i} uX_{t;r}),\exp({\rm i} vX_{{t-j};m})\bigr\} \big\|^2_{\mathcal{W}} \\ & = & \big\| \mbox{cov}\bigl\{\exp({\rm i} uX_{t;m}),\exp({\rm i} vX_{t+j;r})\bigr\} \big\|^2_{\mathcal{W}} = V_{mr}^2(\,j) \quad (r,m = 1, \dots, d)\text{.} \end{eqnarray*} Consequently, $$V(-j)=V^{\rm T}(\,j)$$ and $$R(-j)=R^{\rm T}(\,j)$$, because the matrices $$V(\,j)$$ and $$R(\,j)$$ have as elements the positive square roots of the elements of $$V^{(2)}(\,j)$$ and $$R^{(2)}(\,j)$$. Auto-distance covariance matrices can be interpreted as follows. 
For all $$j \in \mathbb{Z}$$, the diagonal elements $$\{V_{rr}(\,j)\}_{r=1}^d$$ correspond to the auto-distance covariance function of $$\{X_{t;r}\}$$ and explain dependence among the pairs $$(X_{t;r}, X_{t+j;r})$$ $$(r = 1, \dots, d)$$. The off-diagonal elements $$\{V_{rm}(0)\}_{r,m=1}^d$$ measure concurrent dependence between $$\{X_{t;r}\}$$ and $$\{X_{t;m}\}$$. If $$V_{rm}(0) > 0$$, then $$\{X_{t;r}\}$$ and $$\{X_{t;m}\}$$ are concurrently dependent. For $$j \neq 0$$, $$\:\{V_{rm}(\,j)\}_{r,m=1}^d$$ measures dependence between $$\{X_{t;r}\}$$ and $$\{X_{t+j;m}\}$$. If $$V_{rm}(\,j)=0$$ for all $$j \neq 0$$, then $$\{X_{t+j;m}\}$$ does not depend on $$\{X_{t;r}\}$$. For all $$j \in \mathbb{Z}$$, $$\:V_{rm}(\,j) = V_{mr}(\,j) = 0$$ implies that $$\{X_{t;r}\}$$ and $$\{X_{t+j;m}\}$$ are independent. Moreover, for all $$j \neq 0$$, if $$V_{rm}(\,j)=0$$ and $$V_{mr}(\,j)=0$$, then $$\{X_{t;r}\}$$ and $$\{X_{t;m}\}$$ have no lead-lag relationship. If for all $$j > 0$$, $$\:V_{rm}(\,j)=0$$ but there exists some $$j > 0$$ such that $$V_{mr}(\,j) > 0$$, then $$\{X_{t;m}\}$$ does not depend on any past values of $$\{X_{t;r}\}$$ but $$\{X_{t;r}\}$$ depends on some past values of $$\{X_{t;m}\}$$.

### 2.2. Estimation

To estimate (5) and (6), define, for $$j \geq 0$$, \begin{equation*} \hat{\sigma}_{j}^{(r,m)}(u,v) = \hat{\phi}^{(r,m)}_{j}(u,v) - \hat{\phi}^{(r)}(u)\hat{\phi}^{(m)}(v), \end{equation*} with \begin{equation*} \hat{\phi}^{(r,m)}_{j}(u,v) = (n-j)^{-1}\sum_{t=1}^{n-j}{\exp\bigl\{{\rm i}(uX_{t;r}+vX_{t+j;m})\bigr\}} \quad (u,v \in \mathbb{R}), \end{equation*} $$\hat{\phi}^{(r)}(u) = \hat{\phi}^{(r,m)}_j(u,0)$$ and $$\hat{\phi}^{(m)}(v) = \hat{\phi}^{(r,m)}_j(0,v)$$. Then the sample pairwise auto-distance covariance is defined by the positive square root of \begin{eqnarray*} \hat{V}_{rm}^2(\,j) & = & \pi^{-2}\int_{\mathbb{R}^2}{\frac{|{{\hat{\sigma}_{j}^{(r,m)}(u,v)}}|^2}{|{u}|^2|{v}|^2}\,{\rm d} u\,{\rm d} v}\text{.} \end{eqnarray*} Let $$Y_{t;m}=X_{t+j;m}$$.
Then, based on the sample $$\{(X_{t;r},Y_{t;m}):t=1, \dots, n-j\}$$, we calculate the $$(n-j) \times (n-j)$$ double-centred Euclidean distance matrices $$A^{r}=(A_{ts}^{r})$$ and $$B^{m}=(B_{ts}^{m})$$ with elements $$\label{eq:distmatrices} A_{ts}^{r} = a_{ts}^{r} - \bar{a}_{t\text{.}}^{r} - \bar{a}_{\text{.}s}^{r} + \bar{a}_{\text{..}}^{r}\, , \nonumber$$ where $$a_{ts}^{r} = |{X_{t;r} - X_{s;r}}|$$, $$\bar{a}_{t\text{.}}^{r} = (\sum_{s=1}^{n-j}{a_{ts}^{r}})/(n-j)$$, $$\bar{a}_{\text{.}s}^{r} = (\sum_{t=1}^{n-j}{a_{ts}^{r}})/(n-j)$$ and $$\bar{a}_{\text{..}}^{r} = (\sum_{t=1}^{n-j}\sum_{s=1}^{n-j}{a_{ts}^{r}})/(n-j)^2$$. Similarly, define the quantities $$b_{ts}^{m}=|{Y_{t;m}-Y_{s;m}}|$$ to obtain $$\bar{b}_{t\text{.}}^{m}$$, $$\bar{b}_{\text{.}s}^{m}$$, $$\bar{b}_{\text{..}}^{m}$$ and $$B_{ts}^{m}$$. Then, following Székely et al. (2007), we obtain that \begin{eqnarray*} \hat{V}_{rm}^2(\,j) & = & (n-j)^{-2}\sum_{t,s=1}^{n-j}{A_{ts}^rB_{ts}^m}\text{.} \end{eqnarray*} If $$j < 0$$ we set $$\hat{V}_{rm}^2(\,j) = \hat{V}_{mr}^2(-j)$$. By (7), define the $$d \times d$$ matrices $$\hat{V}^{(2)}(\,j) = \{\hat{V}_{rm}^2(\,j)\}$$ ($$j \in \mathbb{Z}$$). The sample distance covariance matrix is given by $$\hat{V}(\,j) = \{\hat{V}_{rm}(\,j)\}$$ ($$j \in \mathbb{Z}$$). Similarly, define $$\hat{R}^{(2)}(\,j)$$ and $$\hat{R}(\,j)$$ ($$j \in \mathbb{Z}$$).

### 2.3. Large-sample properties of the sample distance covariance matrix

The assumption of stationarity is quite restrictive for applications, so it is of interest to investigate the behaviour of the distance covariance function when this assumption does not hold. To gain some insight, suppose that $$\{X_t \}$$ is a univariate Gaussian process with $$E(X_t) = 0$$, $$\mbox{var}(X_t)=1$$ and $$\mbox{cov}(X_t,X_{t+j})=\rho(\,j)$$. Then (Székely et al.,
2007), for $$j \in \mathbb{Z}$$, \begin{equation*} R^2(\,j) = \displaystyle \frac{\rho(\,j)\mbox{arcsin}\{\rho(\,j)\} + \{1-\rho^2(\,j)\}^{1/2}-\rho(\,j)\mbox{arcsin}\{\rho(\,j)/2\}-\{4-\rho^2(\,j)\}^{1/2}+1}{1+\pi/3-3^{1/2}} \text{.} \end{equation*} This shows that the behaviour of $$\rho^2(\,j)$$ determines that of $$R^2(\,j)$$, so we expect $$R^2(\,j)$$ to decay as slowly as $$\rho^2(\,j)$$ does for a random walk. We can test whether the increments $$X_{t}-X_{t-1}$$ form an independent and identically distributed sequence by employing the methods of Fokianos & Pitsillou (2017). In the Supplementary Material we show that $$\hat{V}_{rm}^2(\,j)$$ is a $$V$$-statistic (Serfling, 1980, § 5.1.2) which can be approximated by another $$V$$-statistic with a kernel of lower order. Indeed, suppose that $$u,u' \in \mathbb{R}$$ and let $$X$$ be a real-valued random variable. Define \begin{align*} m_X(u) &= E(|{X-u}|), \quad \bar{m}_X = E\{m_X(X)\}, \\ d_X(u,u') &= |{u-u'}| - m_X(u) - m_X(u') + \bar{m}_X\text{.} \end{align*} With some abuse of notation and setting $$X \equiv X_{t;r}$$ and $$Y \equiv X_{t+j;m}$$, we obtain that \begin{equation*} V_{rm}^2(\,j) = E\bigl\{d_X(X,X')d_Y(Y,Y')\bigr\}, \end{equation*} where $$(X', Y')$$ is an independent and identically distributed copy of $$(X,Y)$$ (Székely & Rizzo, 2013, p. 1262); that is, there exists a kernel $$h:\mathbb{R}^2 \times \mathbb{R}^2 \rightarrow \mathbb{R}$$, given by $$\label{eq:kernel} h(x,y;x',y')=d_X(x,x')\,d_Y(y,y'),$$ (8) such that $$V_{rm}^2(\,j) = \int_{\mathbb{R}^2}\int_{\mathbb{R}^2}{h(x,y;x',y')\,{\rm d} F_{rm}(x,y)\,{\rm d} F_{rm}(x',y')}\text{.}$$ The kernel function is symmetric, continuous and positive semidefinite. Under independence, $$\hat{V}_{rm}^2(\cdot)$$ is a degenerate $$V$$-statistic.
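The double-centring computation of § 2.2 is straightforward to implement directly. Below is a minimal Python sketch of $$\hat{V}^2_{rm}(\,j)$$ (the paper's own implementation is the R package dCovTS; the function here is our illustration):

```python
import numpy as np

def dcov_sq_pair(x, y, j):
    """Sample pairwise auto-distance covariance V_hat^2_{rm}(j), for j >= 0,
    between component series x (the r-th) and y (the m-th)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xs, ys = x[:n - j], y[j:]                    # the pairs (X_{t;r}, X_{t+j;m})
    def centred(z):
        a = np.abs(z[:, None] - z[None, :])      # a_ts = |z_t - z_s|
        return a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    A, B = centred(xs), centred(ys)
    return (A * B).mean()                        # (n-j)^{-2} sum_{t,s} A_ts B_ts
```

For $$j < 0$$ one would set $$\hat{V}^2_{rm}(\,j) = \hat{V}^2_{mr}(-j)$$; the sample estimate is always nonnegative.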
If $$E (|{X_{t;r}}|^{2}) < \infty$$, then Székely & Rizzo (2013, Lemma 1) and Fubini’s theorem yield \begin{eqnarray*} \label{eq:hSzekely} {V}_{rm}^2(\,j) & = & E(|{X-X'}|\,|{Y-Y'}|) + E(|{X-X'}|)E(|{Y-Y''}|) - 2E(|{X-X'}|\,|{Y-Y''}|), \end{eqnarray*} where $$(X', Y')$$ and $$(X'', Y'')$$ are independent and identically distributed copies of $$(X,Y)$$. If $$X_{t;r}$$ is independent of $$X_{t+j;m}$$, then $$E\{h(x,y;X,Y)\} = 0,$$ so $$\hat{V}_{rm}^2(\,j)$$ has a first-order degeneracy. The following proposition establishes the strong consistency of the estimator $$\hat{V}(\,j)$$. Proposition 1. Let $$\{X_t\}$$ be a $$d$$-variate strictly stationary and ergodic process with $$E(|{X_{t;r}}|^{2}) < \infty$$ for $$r=1,\ldots, d$$. Then, for all $$j \in \mathbb{Z}$$, $$\:\hat{V}(\,j) \rightarrow V(\,j)$$ almost surely as $$n \rightarrow \infty$$. Proposition 1 follows directly from the strong law of large numbers for $$V$$-statistics of ergodic and stationary sequences (Aaronson et al., 1996). Alternatively, it can be proved by the methods of Székely et al. (2007), Fokianos & Pitsillou (2017, Proposition 1) and Davis et al. (2018) assuming $$E(|{X_{t;r}}|) < \infty$$, or by the approach of Zhou (2012) assuming $$E(|{X_{t;r}}|^{1+\epsilon}) < \infty$$ for some $$\epsilon >0$$. In particular, by assuming strict stationarity, ergodicity and that $$E(|{X_{t;r}}|^{2}) < \infty$$ for $$r = 1, \ldots, d$$, the strong consistency of the sample auto-distance covariance matrix is established by individually considering each element of $$\hat{V}^{(2)}(\,j)$$ and then the corresponding element of $$\hat{V}(\,j)$$. Previous related work on strong laws for $$V$$-statistics has required stationarity, ergodicity, existence of second moments, almost sure $$F_{rm}(\cdot\,,\cdot)$$ continuity of the kernel function, and uniform integrability. Under these assumptions, $$\hat{V}(\,j)$$ is a weakly consistent estimator of $$V(\,j)$$; see Borovkova et al. (1999, Theorem 1) and Aaronson et al.
(1996, Proposition 2.8). The following theorem is proved in the Supplementary Material and gives the limiting distribution of the sample pairwise auto-distance covariance function, $$\hat{V}_{rm}^2(\cdot)$$, when $$V^{2}_{rm}(\cdot) \neq 0$$ and $$\{X_{t} \}$$ is a pairwise independent sequence. Related results have been established by Zhou (2012) and Davis et al. (2018). We attack the problem by using results on $$U$$-statistics for $$\beta$$-mixing processes (Yoshihara, 1976). The first result shows that we can form asymptotic confidence intervals for $$V^{2}_{rm}(\cdot)$$; unfortunately, however, this gives no further information on the dependence structure in the data. Theorem 1. Suppose that $$\{X_{t}\}$$ is $$\beta$$-mixing and there exists a positive number $$\delta$$ such that for $$\nu=2+\delta$$: (i) $$E (|{X_{t;r}}|^{\nu}) < \infty$$ for $$r=1, \ldots,d$$; (ii) for any integers $$i_{1}$$ and $$i_{2}$$, $$\:\sup_{i_{1}, i_{2}} E (|{X_{i_{1};r}- X_{i_{2};r}}|^{\nu}) < \infty$$ for all $$r=1, \ldots,d$$; (iii) the mixing coefficients $$\beta(k)$$ satisfy $$\beta(k) = O(k^{-(2+\delta^\prime)/\delta^\prime})$$ for some $$\delta^\prime$$ such that $$0 < \delta^\prime < \delta$$. Then, for fixed $$j$$, the following hold. (a) If $$V^{2}_{rm}(\,j) \neq 0$$, then \begin{eqnarray*} n^{1/2} \bigl\{\hat{V}^2_{rm}(\,j) - V^2_{rm}(\,j)\bigr\} & \rightarrow & N(0,\,36\sigma^2) \end{eqnarray*} in distribution as $$n \rightarrow \infty$$, where $$\sigma^{2}$$ is given in the Supplementary Material. (b) If $$V^{2}_{rm}(\,j) = 0$$ and $$\delta^\prime < \delta/(3+\delta)$$, then $$\label{eq:LeuchtAsymp} n \hat{V}_{rm}^2(\,j) \rightarrow Z = \sum_{l}{\lambda_lZ_l^2}$$ (9) in distribution as $$n \rightarrow \infty$$, where $$(Z_l)$$ is an independent and identically distributed sequence of standard normal random variables and $$(\lambda_l)$$ is a sequence of nonzero eigenvalues which satisfy the Hilbert–Schmidt equation $$E\{h(x,y;X,Y)\Phi(X,Y)\} = \lambda \Phi(x,y)$$.
The kernel $$h(\cdot)$$ is defined by (8) and is represented by $$h(x,y;x',y') = \sum_{l=1}^{\infty}{\lambda_l\Phi_l(x,y)\Phi_l(x',y')}$$, where $$(\Phi_l)_l$$ is the sequence of corresponding orthonormal eigenfunctions. Theorem 1(b) is proved via a Hoeffding decomposition (Hoeffding, 1948), by showing that $$\hat{V}_{rm}^2(\cdot)$$ is approximated by a $$V$$-statistic of order two which is degenerate under independence; then, applying Theorem 1 of Leucht & Neumann (2013a) yields the result. If $$\{X_{t} \}$$ is an independent and identically distributed sequence, then Theorem 1(b) holds; see Székely et al. (2007). In general, it is of interest to approximate the asymptotic distribution of the matrix-variate $$V$$-statistic $$\{\hat{V}^{(2)}(\,j),\, j \in \mathbb{Z}\}$$, whether or not independence holds. To the best of our knowledge, this problem has not been addressed in the literature, but see Chen (2016). Furthermore, it is of interest to determine simultaneous confidence intervals for the elements of $$V(\,j)$$, mimicking the methodology for the ordinary autocorrelation function, under the assumption of independence. However, the asymptotic distribution given in (9) cannot be employed in applications, so a simulation-based method should be applied. We discuss this further in § 4.2.

## 3. Testing

In this section, we develop a test statistic for testing the null hypothesis $$H_{0}$$: $$\{{X_{t}}\}$$ is an independent and identically distributed sequence. Recall that the Frobenius norm of an $$m \times n$$ matrix $$A$$ is defined as $$\|A\|_{\rm F}^2 = \sum_{i=1}^{m}\sum_{j=1}^{n}{{|{a_{ij}}|}^2} = \mbox{tr}(A^{*}A)$$, where $$A^{*}$$ denotes the conjugate transpose and $$\mbox{tr}(A)$$ the trace of the matrix $$A$$. We obtain from (3) that $$\label{eq:supsigma} \sup_{(u,v) \in \mathbb{R}^2}\sum_{j=-\infty}^{\infty}\big|\sigma_{j}^{(r,m)}(u,v)\big| < \infty,$$ (10) provided that $$\{{X_{t}}\}$$ is a $$\beta$$-mixing process with coefficients decaying to zero.
Indeed, $${|{\mbox{cov}\{\exp({\rm i} uX_{t;r}), \exp({\rm i} vX_{t+j;m})\}}|} = |{\sigma_{j}^{(r,m)}(u,v)}| \leq C \beta(\,j)$$ for some constant $$C$$. Thus, the sequence of covariance matrices $$\{\Sigma_{j}(u,v),\, j \in \mathbb{Z} \}$$ defined by (2) has absolutely summable components for all $$(u,v) \in \mathbb{R}^2$$. Define the Fourier transform of $$\sigma_{j}^{(r,m)}(\cdot\,,\cdot)$$ by $$\label{eq:frm} f^{(r,m)}(\omega,u,v) = (2\pi)^{-1}\sum_{j=-\infty}^{\infty}\sigma_{j}^{(r,m)}(u,v)\exp(-{\rm i} j\omega) \quad (\omega \in [-\pi,\pi])\text{.}$$ (11) Because of (10), $$f^{(r,m)}(\cdot\,,\cdot\,,\cdot)$$ is bounded and uniformly continuous. If $$r=m$$, then $$f^{(r,r)}(\omega,u,v)$$ is called the generalized spectrum or generalized spectral density of $$X_{t;r}$$ at frequency $$\omega$$ for all $$(u,v) \in \mathbb{R}^2$$. If $$r \neq m$$, then $$f^{(r,m)}(\omega,u,v)$$ is called the generalized cross-spectrum or generalized cross-spectral density of $$X_{t;r}$$ and $$X_{t;m}$$ at frequency $$\omega$$ for all $$(u,v) \in \mathbb{R}^2$$. Collecting all the elements of (11) in a $$d \times d$$ matrix, we obtain the generalized spectral density matrix \begin{eqnarray*} F(\omega,u,v) & = & (2\pi)^{-1}\sum_{j=-\infty}^{\infty}{\Sigma_{j}(u,v)\exp(-{\rm i} j\omega)} = \bigl\{f^{(r,m)}(\omega,u,v)\bigr\}_{r,m=1}^d\text{.} \end{eqnarray*} Under the null hypothesis of independent and identically distributed data, $$\Sigma_{j}(u,v)=0$$ for all $$j \neq 0$$; in this case we denote $$F(\cdot\,,\cdot\,,\cdot)$$ by \begin{eqnarray*} F_0(\omega,u,v) & = & (2\pi)^{-1}\bigl\{\sigma_0^{(r,m)}(u,v)\bigr\}_{r,m=1}^d\text{.} \label{eq:F0matrix} \end{eqnarray*} In general, $$F_0(\cdot\,,\cdot\,,\cdot)$$ is not a diagonal matrix, but when $$X_{t;r}$$ and $$X_{t;m}$$ are independent for all $$r,m = 1, \dots, d$$, then $$F_0(\cdot\,,\cdot\,,\cdot)$$ reduces to a diagonal matrix.
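The quantities $$\sigma_{j}^{(r,m)}(u,v)$$ entering the generalized spectrum can be estimated directly through the empirical characteristic functions of § 2.2. A small Python sketch (our naming), showing that the estimate is near zero at nonzero lags for an iid series but not for dependent components:

```python
import numpy as np

def sigma_hat(x, y, j, u, v):
    """Empirical sigma_hat_j^{(r,m)}(u,v): joint empirical characteristic
    function minus the product of the marginal ones, at lag j >= 0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xs, ys = x[:n - j], y[j:]
    joint = np.mean(np.exp(1j * (u * xs + v * ys)))
    return joint - np.mean(np.exp(1j * u * xs)) * np.mean(np.exp(1j * v * ys))
```

For instance, with x standard normal and y = x, the population value of $$|\sigma_0(1,1)|$$ is approximately $$|e^{-2}-e^{-1}| \approx 0.23$$, while for an iid series at lag 1 the estimate is of order $$n^{-1/2}$$.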
Consider the following class of kernel-type estimators of the generalized spectral density: \begin{equation*} \hat{f}^{(r,m)}(\omega,u,v) = (2\pi)^{-1}\sum_{j=-(n-1)}^{n-1}{(1-{|{j}|}/n)^{1/2}K(\,j/p) \hat{\sigma}_{j}^{(r,m)}(u,v)\exp(-{\rm i} j\omega)} \quad (\omega \in [-\pi,\pi]), \end{equation*} where $$p$$ is a bandwidth parameter and $$K(\cdot)$$ is a univariate kernel function satisfying the following condition. Assumption 1. The kernel function $$K: \mathbb{R} \rightarrow [-1,1]$$ is symmetric and continuous at zero and all but a finite number of points, with $$K(0)=1$$, $$\int_{-\infty}^{\infty}{K^2(z)\,{\rm d} z} < \infty$$ and $${|{K(z)}|} \leq C {|{z}|}^{-b}$$ for large $$z$$ and $$b > 1/2$$. Next, let \begin{equation*} \hat{F}(\omega,u,v) = \bigl\{\hat{f}^{(r,m)}(\omega,u,v)\bigr\}_{r,m=1}^d, \quad \hat{F}_0(\omega,u,v) = (2\pi)^{-1}\bigl\{\hat{\sigma}_0^{(r,m)}(u,v)\bigr\}_{r,m=1}^d\text{.} \label{eq:Fhatmatrix} \end{equation*} Then, it is shown in the Supplementary Material that for $$\mathcal{W}(\cdot\,,\cdot)$$ given by (4), \begin{align} \label{eq:traceNormResult2} L_2^2\bigl\{\hat{F}(\omega,u,v),\hat{F}_0(\omega,u,v)\bigr\} & = \int_{\mathbb{R}^2}\int_{-\pi}^{\pi}{\big\|\hat{F}(\omega,u,v)-\hat{F}_0(\omega,u,v)\big\|^2_{\rm F}\,{\rm d}\omega \,\mathcal{W}({\rm d} u,{\rm d} v)} \nonumber \\[1ex] & = 2\pi^{-1}\sum_{j=1}^{n-1}{(1-j/n)K^2(\,j/p) \,\mbox{tr}\bigl\{\hat{V}^*(\,j)\hat{V}(\,j)\bigr\}}\text{.} \end{align} (12) In terms of the distance correlation matrix, (12) becomes $$\label{eq:traceNormResult3} L_2^2\bigl\{\hat{G}(\omega,u,v),\hat{G}_0(\omega,u,v)\bigr\} = 2\pi^{-1}\sum_{j=1}^{n-1}{(1-j/n)K^2(\,j/p) \,\mbox{tr}\bigl\{\hat{V}^*(\,j)\hat{D}^{-1}\hat{V}(\,j)\hat{D}^{-1}\bigr\}},$$ (13) where $$\hat{G}(\cdot\,,\cdot\,,\cdot)$$ and $$\hat{G}_0(\cdot\,,\cdot\,,\cdot)$$ are the estimators of the normalized multivariate generalized spectra $$G(\cdot\,,\cdot\,,\cdot)$$ and $$G_{0}(\cdot\,,\cdot\,,\cdot)$$; details are given in the Supplementary Material.
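With the truncation kernel $$K(z) = 1$$ for $$|z| \leq 1$$ and $$0$$ otherwise, the right-hand side of (12) can be evaluated directly from the sample pairwise auto-distance covariances. A Python sketch under that assumption (function names ours):

```python
import numpy as np

def dcov_sq(x, y, j):
    # sample pairwise auto-distance covariance V_hat^2(j), j >= 0 (double centring)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xs, ys = x[:n - j], y[j:]
    def centred(z):
        a = np.abs(z[:, None] - z[None, :])
        return a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    return (centred(xs) * centred(ys)).mean()

def L2_sq(X, p):
    """Right-hand side of (12) with the truncation kernel:
    (2/pi) * sum_{j=1}^{p} (1 - j/n) tr{V_hat^T(j) V_hat(j)}, where the trace
    equals sum_{r,m} V_hat^2_{rm}(j) because V_hat(j) has real entries."""
    n, d = X.shape
    return (2 / np.pi) * sum(
        (1 - j / n) * sum(dcov_sq(X[:, r], X[:, m], j)
                          for r in range(d) for m in range(d))
        for j in range(1, p + 1)
    )
```

With this kernel the displayed formulas give $$T_n = (\pi/2)\,n\,L_2^2$$ for the statistic $$T_n$$ of the following paragraphs, since $$n-j = n(1-j/n)$$.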
Equations (12) and (13) motivate our study of multivariate tests of independence. In particular, it is of interest to test whether the vector series $$\{X_t\}$$ is independent and identically distributed regardless of any possible dependence between the time series components $$\{X_{t;r}\}$$ for $$r = 1, \dots, d$$. Expression (13) can be viewed as a multivariate Ljung–Box-type statistic based on the distance covariance matrix instead of the ordinary autocovariance matrix. Indeed, by choosing $$K(z) = 1$$ for $${|{z}|} \leq 1$$ and 0 otherwise, (13) becomes \begin{eqnarray*} L_2^2\bigl\{\hat{G}(\omega,u,v),\hat{G}_0(\omega,u,v)\bigr\} & = & 2\pi^{-1}\sum_{j=1}^{p}{(1-j/n)\,\mbox{tr}\bigl\{\hat{V}^*(\,j)\hat{D}^{-1}\hat{V}(\,j)\hat{D}^{-1}\bigr\}}\text{.} \end{eqnarray*} Define $$T^{(r,m)}_n = \sum_{j=1}^{n-1}{(n-j)K^2(\,j/p)\hat{V}_{rm}^2(\,j)}$$. Then the test statistic motivated by (12) is \begin{eqnarray*} T_n & = & \sum_{r,m}{T^{(r,m)}_n} = \sum_{j=1}^{n-1}{(n-j)K^2(\,j/p)\, \mbox{tr}\bigl\{\hat{V}^*(\,j)\hat{V}(\,j)\bigr\}}\text{.} \end{eqnarray*} Similarly, by using (13), let \begin{eqnarray*} \bar{T}_n & = & \sum_{r,m}{\frac{T_n^{(r,m)}}{\{\hat{V}_{rr}^2(0)\hat{V}_{mm}^2(0)\}^{1/2}}} = \sum_{j=1}^{n-1}{(n-j)K^2(\,j/p) \,\mbox{tr}\bigl\{\hat{V}^*(\,j)\hat{D}^{-1}\hat{V}(\,j)\hat{D}^{-1}\bigr\}}\text{.} \end{eqnarray*} Theorem 2. Suppose that $$E ({|{X_{t;r}}|}^{2}) < \infty\; (r=1,\ldots,d)$$ and that Assumption 1 holds. Let $$p=cn^{\lambda}$$, where $$c > 0$$ and $$\lambda \in (0,1)$$.
If $$\{X_{t}\}$$ is an independent and identically distributed sequence, then \begin{eqnarray*} M_n^{(r,m)} = \frac{T^{(r,m)}_n - \hat{C}_0^{(r,m)}p\int_{0}^{\infty}{K^2(z)\,{\rm d} z}}{\bigl\{\hat{D}_0^{(r,m)}p\int_{0}^{\infty}{K^4(z)\,{\rm d} z}\bigr\}^{1/2}} & \rightarrow & N(0,1) \end{eqnarray*} in distribution as $$n \rightarrow \infty$$, where \begin{eqnarray*} C_0^{(r,m)} & = & \int_{\mathbb{R}^2}{\sigma_{0}^{(r,r)}(u,-u)\sigma_{0}^{(m,m)}(v,-v)\mathcal{W}({\rm d} u,{\rm d} v)}, \\ D_0^{(r,m)} & = & 2\int_{\mathbb{R}^2}\big|\sigma_{0}^{(r,r)}(u,u')\sigma_{0}^{(m,m)}(v,v')\big|^2 \mathcal{W}({\rm d} u,{\rm d} v)\mathcal{W}({\rm d} u',{\rm d} v') = 2V_{rr}^2(0)V_{mm}^2(0), \end{eqnarray*} and $$\hat{C}^{(r,m)}_0$$ and $$\hat{D}^{(r,m)}_0$$ are their sample counterparts. Theorem 2 implies the following result. Corollary 1. Suppose that $$E ({|{X_{t;r}}|}^{2}) < \infty\; (r=1,\ldots,d)$$ and that Assumption 1 holds. Let $$p=cn^{\lambda}$$, where $$c > 0$$ and $$\lambda \in (0,1)$$. If $$\{X_{t}\}$$ is an independent and identically distributed sequence, then \begin{align*} M_n & \equiv \frac{T_n - \bigl(\sum_{r,m}\hat{C}_0^{(r,m)}\bigr)p\int_{0}^{\infty}{K^2(z)\,{\rm d} z}}{\bigl\{\bigl(\sum_{r,m}\hat{D}_0^{(r,m)}\bigr)p \int_{0}^{\infty} {K^4(z)\,{\rm d} z}\bigr\}^{1/2}} \rightarrow N(0,1), \\ \bar{M}_n & \equiv \frac{\bar{T}_n - \bigl(\sum_{r,m}\hat{c}_0^{(r,m)}\bigr)p\int_{0}^{\infty}{K^2(z)\,{\rm d} z}}{d\bigl\{2p\int_{0}^{\infty}{K^4(z)\,{\rm d} z}\bigr\}^{1/2}} \rightarrow N(0,1) \end{align*} in distribution as $$n \rightarrow \infty$$, where $$c_0^{(r,m)} = C_0^{(r,m)}/\{V_{rr}(0)V_{mm}(0)\}$$ and $$\hat{c}_0^{(r,m)}$$ is the corresponding empirical analogue. The next result states the consistency of the test statistics. Theorem 3. Let $$\{X_{t}\}$$ be a $$\beta$$-mixing, strictly stationary, but not independent and identically distributed process with mixing coefficients satisfying $$\sum_{k} \beta(k) < \infty$$.
Suppose that $$E ({|{X_{t;r}}|}^{2}) < \infty\; (r=1,\ldots,d)$$ and that Assumption 1 holds. Let $$p=cn^{\lambda}$$ for $$c>0$$ and $$\lambda \in (0,1)$$. Then, as $$n \rightarrow \infty$$, \begin{align*} \label{eq:powereq} \displaystyle \frac{p^{1/2}}{n}M_n & \rightarrow \frac{({\pi}/{2})L_2^2\bigl\{F(\omega,u,v),F_0(\omega,u,v)\bigr\}}{\bigl\{\sum_{r,m}{D_0^{(r,m)}\int_{0}^{\infty}{K^4(z)\,{\rm d} z}}\bigr\}^{1/2}}, \\ \frac{p^{1/2}}{n}\bar{M}_n &\rightarrow \frac{({\pi}/{2})L_2^2\bigl\{G(\omega,u,v),G_0(\omega,u,v)\bigr\}}{d\bigl\{2\int_{0}^{\infty}{K^4(z)\,{\rm d} z}\bigr\}^{1/2}} \end{align*} in probability. When we are dealing with a nonstationary process, $$\bar{T}_n$$ converges to $$\infty$$ in probability, so the test will have asymptotic power 1. Rejection of the null does not allow us to conclude that the process is stationary under the alternative hypothesis. Following the earlier discussion, we can test the independent and identically distributed hypothesis for the increments $$X_{t}-X_{t-1}$$. In general, the results depend on the bandwidth parameter $$p$$ and the sample size $$n$$. We do not consider further the choice of bandwidth parameter, but our limited experience has shown that choosing roughly $$p \geq 15$$ for a sample size of $$n=500$$ yields a good asymptotic approximation. However, it is preferable to vary the value of $$p$$ and examine the sensitivity of the results. We suggest the use of simulation-based techniques to approximate the distribution of $$T_{n}$$ or $$\bar{T}_{n}$$ for small $$n$$.

## 4. Computation of the test statistic with applications

### 4.1. Bootstrap method

To approximate the asymptotic distribution of $$T_n$$ or $$\bar{T}_n$$, we follow the approach of Dehling & Mikosch (1994), who proposed a wild bootstrap for approximating the distribution of degenerate $$U$$-statistics for independent and identically distributed data. More recently, Leucht & Neumann (2013a,b) and Chwialkowski et al.
(2014) proposed a novel dependent wild bootstrap (Shao, 2010) for approximating the distribution of degenerate $$U$$- and $$V$$-statistics calculated from time series data. Their method relies on generating auxiliary random variables $$(W_{tn}^*)_{t=1}^{n-j}$$ and computing bootstrap realizations of $$\hat{V}_{rm}^2(\,j)$$ as \begin{eqnarray*} \hat{V}_{rm}^{2*}(\,j) & = & (n-j)^{-2}\sum_{t,s=1}^{n-j}{W_{tn}^{*}h(X_{t;r},Y_{t;m};X_{s;r},Y_{s;m})W_{sn}^{*}}, \end{eqnarray*} where $$h(\cdot)$$ is defined by (8), for $$r,m=1, \dots, d$$ and $$j=1,\dots,n-1$$. A bootstrap realization of $$T_n$$ is computed as \begin{eqnarray*} T_n^{*} & = & \sum_{j=1}^{n-1}{(n-j)K^2(\,j/p)\sum_{r,m}{\hat{V}_{rm}^{2*}(\,j)}}\text{.} \end{eqnarray*} To test whether $$\{X_t\}$$ is an independent and identically distributed sequence, we repeat the above steps $$B$$ times to obtain $$T_{n,1}^{*}, \dots, T_{n,B}^{*}$$ and then approximate the $$p$$-value of $$T_n$$ by $$\{\sum_{b=1}^{B}{\mathbb{I}(T_{n,b}^{*} \geq T_n)}\}/(B+1)$$, where $$\mathbb{I}(\cdot)$$ denotes the indicator function. For $$\bar{T}_n$$ the approach is analogous. We generate $$W_{tn}^{*}$$ as independent and identically distributed standard normal variables because we are operating under the null hypothesis. Theorem 2 and Corollary 1 show that we can employ the test statistic with a normal approximation, but experience has shown that a rather large sample size is required to achieve the nominal significance level. Furthermore, a standard nonparametric bootstrap provides an alternative test of the null hypothesis. This is implemented in the R package dCovTS (Pitsillou & Fokianos, 2016). The wild bootstrap saves computational time, as simulation is done in a separate loop. The results of Leucht & Neumann (2013b) imply the wild bootstrap validity of $$\hat{V}_{rm}^{2*}(\,j)$$ as a proxy for $$\hat{V}_{rm}^{2}(\,j)$$, for a fixed lag $$j$$ and for $$\tau$$-dependent processes. 
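To make the resampling scheme concrete, the following is a minimal Python sketch of the wild bootstrap $$p$$-value computation described above. It assumes the lag-$$j$$ kernel matrices $$h(X_{t;r},Y_{t;m};X_{s;r},Y_{s;m})$$ from (8) have already been evaluated and stored; the function name and the data layout are our own illustration, not the interface of the dCovTS package, which implements the method in R.

```python
import numpy as np

def wild_bootstrap_pvalue(h_mats, K, p, n, B=499, seed=None):
    """Wild bootstrap p-value for T_n = sum_j (n-j) K^2(j/p) sum_{r,m} Vhat2_rm(j).

    h_mats maps a lag j to a dict of (n-j) x (n-j) kernel matrices, one per
    component pair (r, m); the multipliers W*_tn are iid N(0, 1) under the null.
    """
    rng = np.random.default_rng(seed)

    def stat(w_full):
        total = 0.0
        for j, mats in h_mats.items():
            w = w_full[: n - j]
            # (n-j)^{-2} sum_{t,s} W_t h(...) W_s, summed over all pairs (r, m)
            quad = sum(w @ H @ w for H in mats.values())
            total += (n - j) * K(j / p) ** 2 * quad / (n - j) ** 2
        return total

    t_obs = stat(np.ones(n))  # W = 1 recovers the observed statistic
    exceed = sum(stat(rng.standard_normal(n)) >= t_obs for _ in range(B))
    return exceed / (B + 1)
```

Because the kernel matrices are fixed across bootstrap replications, only the multipliers are redrawn in the loop, which is what makes the wild bootstrap computationally cheap.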
Here we restrict ourselves to $$\beta$$-mixing processes. Interesting time series models, such as autoregressive moving average models with continuous innovations and generalized autoregressive conditional heteroscedastic models, belong to both classes of processes. The approximation of $$\hat{V}_{rm}^{2*}(\,j)$$ as a proxy for $$\hat{V}_{rm}^{2}(\,j)$$ for a fixed lag $$j$$ is guaranteed to hold under suitable conditions on the mixing coefficients and for processes that lie in both classes. To gain insight into the behaviour of $$T_{n}$$ and its wild bootstrap counterpart, we will need to study the joint distribution of $$\{\hat{V}_{rm}^{2*}(\,j)\}$$ as a proxy for the joint distribution of $$\{\hat{V}_{rm}^{2}(\,j)\}$$. Although we do not address this problem theoretically here, empirical evidence supports the belief that the distribution of $$T_{n}^{*}$$ approximates that of $$T_{n}$$ adequately. 4.2. Obtaining simultaneous critical values for auto-distance correlation plots It is customary to check the white noise assumption by plotting the sample autocorrelation function with simultaneous confidence intervals. The critical values employed for obtaining confidence intervals are computed using the asymptotic normality of the first $$q$$ sample autocorrelations for white noise (Brockwell & Davis, 1991, Theorem 7.2.1). Here we use a similar plot to check independence using the auto-distance correlation function. This task is complicated because the vector comprising $$R_{rm}(\,j)$$ ($$j = 1, \dots, q$$) is a function of degenerate $$V$$-statistics under the null hypothesis. To overcome this difficulty we resort to Monte Carlo simulation. Critical values chosen by the wild bootstrap asymptotically maintain the nominal size of a test statistic. 
Given $$B$$ bootstrap realizations of $$\hat{R}_{rm}(\,j)$$, say $$\{\hat{R}^{*}_{rm,b}(\,j): b= 1, \dots, B\}$$, we compute the $$p$$-value \begin{eqnarray*} p_{rm}(\,j) & = & (B+1)^{-1}\sum_{b=1}^{B}{\mathbb{I}\bigl\{\hat{R}^{*}_{rm,b}(\,j) \geq \hat{R}_{rm}(\,j)\bigr\}}\text{.} \end{eqnarray*} But the $$p$$-values $$\{p_{rm}(\,j) : j=1, \dots, q\}$$ correspond to testing the hypotheses $$R_{rm}(\,j)=0$$ ($$j=1, \dots, q$$). Because this is a multiple testing situation, we adjust these values at some prespecified level $$\alpha$$ to obtain a new set of $$p$$-values, say $$\{\tilde{p}_{rm}(\,j): j =1, \dots, q\}$$, by using the false discovery rate of Benjamini & Hochberg (1995). Using the adjusted $$p$$-values, we obtain critical points $$\{c_{rm}(\,j): j =1, \dots, q\}$$ for which \begin{eqnarray*} \tilde{p}_{rm}(\,j) & = & \frac{\#\bigl\{\hat{R}_{rm}^{*}(\,j) \geq c_{rm}(\,j)\bigr\}}{B} \quad (\,j = 1, \dots, q)\text{.} \end{eqnarray*} In our plots the horizontal line corresponds to $$c=\mbox{max}_{r,m,j}c_{rm}(\,j)$$, a conservative approach that guarantees that all simultaneous confidence intervals are at a given level $$\alpha$$. In the Supplementary Material we show that these critical values depend only on the length and not on the dimension of the series. 4.3. Results We now describe a limited simulation study concerning the test statistic $$\bar{T}_n$$ computed using a Lipschitz-continuous univariate kernel function $$K(\cdot)$$; that is, for any $$z_1, z_2 \in \mathbb{R}$$, $${|{K(z_1) - K(z_2)}|} \leq C{|{z_1 - z_2}|}$$ for some constant $$C$$. We use the Daniell, Parzen and Bartlett kernels. The Lipschitz condition rules out the truncated kernel, but results based on it are also given. In addition, we compare the performance of $$\bar{T}_n$$ with that of the multivariate Ljung–Box statistic (1). We first investigate the size of the test by considering bivariate standard normal data. 
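The kernels used in these simulations have standard closed forms from spectral analysis; the Python definitions below are a sketch (the paper's own computations use the R package dCovTS). As a usage example, a Riemann-sum check confirms that $$\int_{0}^{\infty}{K^2(z)\,{\rm d} z} = 1/3$$ for the Bartlett kernel, the quantity entering the centering of $$M_n$$.

```python
import numpy as np

def bartlett(z):
    """Bartlett kernel: Lipschitz-continuous with constant 1."""
    z = abs(z)
    return 1.0 - z if z <= 1.0 else 0.0

def parzen(z):
    """Parzen kernel: piecewise cubic, Lipschitz-continuous."""
    z = abs(z)
    if z <= 0.5:
        return 1.0 - 6.0 * z ** 2 + 6.0 * z ** 3
    if z <= 1.0:
        return 2.0 * (1.0 - z) ** 3
    return 0.0

def daniell(z):
    """Daniell kernel sin(pi z)/(pi z); support is the whole real line."""
    return 1.0 if z == 0.0 else float(np.sin(np.pi * z) / (np.pi * z))

def truncated(z):
    """Truncated kernel: the jump at |z| = 1 violates the Lipschitz condition."""
    return 1.0 if abs(z) <= 1.0 else 0.0

# Riemann-sum check: for the Bartlett kernel, int_0^inf K^2(z) dz = 1/3.
dz = 1e-5
i2 = sum(bartlett(z) ** 2 for z in np.arange(0.0, 1.0, dz)) * dz
```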
Table 1 shows the achieved levels of all test statistics calculated at 5% and 10% nominal levels, and indicates that the proposed test statistics approximate the nominal levels adequately.

Table 1. Empirical Type I error $$(\%)$$ of statistics for testing the hypothesis that the data are independent and identically distributed: the data are generated from the bivariate standard normal distribution, and the bandwidth $$p$$ is $$[3n^\lambda]$$ with $$\lambda=0{\cdot}1$$, $$0{\cdot}2$$ or $$0{\cdot}3$$; results are based on $$B=499$$ bootstrap replications for each of $$1000$$ simulations. Each cell reports the empirical level at the 10% / 5% nominal levels.

| | $$n=500$$, $$p=6$$ | $$n=500$$, $$p=11$$ | $$n=500$$, $$p=20$$ | $$n=1000$$, $$p=6$$ | $$n=1000$$, $$p=12$$ | $$n=1000$$, $$p=24$$ |
| --- | --- | --- | --- | --- | --- | --- |
| $$\bar{T}_n$$, bar | 11·5 / 5·5 | 10·2 / 5·2 | 10·8 / 4·7 | 9·7 / 4·8 | 8·3 / 3·7 | 12 / 6 |
| $$\bar{T}_n$$, trunc | 8·7 / 3·8 | 8·3 / 3·4 | 8·6 / 4·1 | 8 / 3 | 8 / 5 | 5 / 3 |
| $$\bar{T}_n$$, par | 9·7 / 5·1 | 9·3 / 4·3 | 8·4 / 4·0 | 8 / 3 | 11 / 6 | 12 / 6 |
| $$\bar{T}_n$$, dan | 8·9 / 3·9 | 10·7 / 5·7 | 10·2 / 4·6 | 10·2 / 5·0 | 11·7 / 5·5 | 9·2 / 4·2 |
| $${\mathrm{m}}{\small{\text{LB}}}$$ | 11·4 / 4·8 | 10·1 / 4·9 | 11·1 / 5·6 | 10·5 / 5·5 | 9·1 / 4·9 | 9·6 / 5·4 |

bar, Bartlett kernel; trunc, truncated kernel; par, Parzen kernel; dan, Daniell kernel; $${\mathrm{m}}{\small{\text{LB}}}$$, multivariate Ljung–Box statistic.

In order to compare the power of the test statistics, we consider a bivariate nonlinear moving average model of order 2, $$\label{eq:vNMA2model} X_{t;i} = \epsilon_{t;i} \epsilon_{t-1;i} \epsilon_{t-2;i} \quad (i=1, 2),$$ (14) where $$\{\epsilon_{t;i}: i=1, 2\}$$ is an independent and identically distributed sequence of standard normal random variables; a bivariate generalized autoregressive conditional heteroscedastic model of order $$(1,1)$$, $$\label{eq:vGARCH11model} X_{t;i} = h_{t;i}^{1/2}\epsilon_{t;i} \quad (i=1, 2),$$ (15) where \begin{eqnarray*} \begin{bmatrix} h_{t;1} \\ h_{t;2} \end{bmatrix} & = & \begin{bmatrix} 0{\cdot}003 \\ 0{\cdot}005 \end{bmatrix} + \begin{bmatrix} 0{\cdot}2 \!&\! 0{\cdot}1 \\ 0{\cdot}1 \!&\! 0{\cdot}3 \end{bmatrix} \begin{bmatrix} X_{t-1;1}^2 \\ X_{t-1;2}^2 \end{bmatrix} - \begin{bmatrix} 0{\cdot}4 \!&\! 0{\cdot}05 \\ 0{\cdot}05\! &\! 0{\cdot}5 \end{bmatrix} \begin{bmatrix} h_{t-1;1} \\ h_{t-1;2} \end{bmatrix} \end{eqnarray*} and $$\{\epsilon_{t;i}: i=1, 2\}$$ is a sample from a bivariate normal distribution with correlation $$\rho=0{\cdot}4$$; and a bivariate autoregressive model of order 1, \begin{eqnarray} \label{eq:vAR1modelIID} \begin{bmatrix} X_{t;1} \\ X_{t;2} \end{bmatrix} & = & \begin{bmatrix} 0{\cdot}04 \!&\!-0{\cdot}10 \\ 0{\cdot}11\! &\! 0{\cdot}50 \end{bmatrix} \begin{bmatrix} X_{t-1;1} \\ X_{t-1;2} \end{bmatrix} + \begin{bmatrix} \epsilon_{t;1} \\ \epsilon_{t;2} \end{bmatrix}, \end{eqnarray} (16) where the error is as in the previous model but with $$\rho=0$$ and $$0{\cdot}4$$. Table 2 shows that $$\bar{T}_n$$ attains greater power than (1) when data are generated by the nonlinear models (14) and (15). 
The Ljung–Box test statistic performs better than $$\bar{T}_n$$ when data are generated by (16) with $$\rho=0$$, as well as for small values of $$p$$ and large sample sizes. When $$p$$ is large, $$\bar{T}_n$$ generally performs better than (1).

Table 2. Empirical power $$(\%)$$ of all test statistics of size $$5\%$$: the bandwidth $$p$$ is $$[3n^\lambda]$$ with $$\lambda=0{\cdot}1$$, $$0{\cdot}2$$ or $$0{\cdot}3$$; the results are based on $$B=499$$ bootstrap replications for each of $$1000$$ simulations, and the test statistic $$\bar{T}_n$$ is calculated using the Bartlett kernel. The first three result columns correspond to $$n=500$$ ($$p=6,11,20$$), the next three to $$n=800$$ ($$p=6,12,23$$) and the last three to $$n=1000$$ ($$p=6,12,24$$).

| Model | Statistic | $$p=6$$ | $$p=11$$ | $$p=20$$ | $$p=6$$ | $$p=12$$ | $$p=23$$ | $$p=6$$ | $$p=12$$ | $$p=24$$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| (14) | $$\bar{T}_n$$ | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| (14) | $${\mathrm{m}}{\small{\text{LB}}}$$ | 40 | 32·5 | 26·6 | 47·1 | 35·9 | 28·5 | 46·6 | 36·6 | 26·8 |
| (15) | $$\bar{T}_n$$ | 76·4 | 79·5 | 78·3 | 97·2 | 98 | 95·4 | 99·4 | 98 | 99 |
| (15) | $${\mathrm{m}}{\small{\text{LB}}}$$ | 56·9 | 54·6 | 48·1 | 65·1 | 60·4 | 53·9 | 64·9 | 65·2 | 54·9 |
| (16), $$\rho=0$$ | $$\bar{T}_n$$ | 66·6 | 58·4 | 44·8 | 41·3 | 41·1 | 69·1 | 30·4 | 27 | 80·4 |
| (16), $$\rho=0$$ | $${\mathrm{m}}{\small{\text{LB}}}$$ | 48·5 | 36·1 | 25·6 | 75·7 | 59·4 | 44·2 | 86·2 | 71·8 | 55·8 |
| (16), $$\rho=0{\cdot}4$$ | $$\bar{T}_n$$ | 71·4 | 62·5 | 46·9 | 92·7 | 89·7 | 75·7 | 96·8 | 95·6 | 88 |
| (16), $$\rho=0{\cdot}4$$ | $${\mathrm{m}}{\small{\text{LB}}}$$ | 50·8 | 37·1 | 28·3 | 77·5 | 58·9 | 44·6 | 86·9 | 72·6 | 54·1 |

4.4. Application We analyse monthly unemployment rates in the 50 U.S. states from January 1976 to September 2011, seasonally adjusted and available from the website http://faculty.chicagobooth.edu/ruey.tsay/teaching/mtsbk/ accompanying the book of Tsay (2014). We first consider the 416 differenced monthly rates of Alaska, Arkansas, Colorado and Delaware. Figure 1 displays the sample auto-distance correlation matrices at each lag and reveals the dependence structure for lags 1 to 12. 
After the sixth lag, the auto-distance correlation function suggests that there is no dependence among the four series. Assuming that the four-dimensional series follows a vector autoregressive model and employing the Akaike information criterion, a fifth-order model is found to fit the data well. The auto-distance correlation plots for the resulting residual series in Fig. 2 show no serial dependence. Tests of independence for the residual series using the Bartlett kernel for computation of $$\bar{T}_n$$ and $${\mathrm{m}}{\small{\text{LB}}}$$ yield respective $$p$$-values of 0$$\cdot$$428, 0$$\cdot$$228, 0$$\cdot$$158 and 1, 0$$\cdot$$999, 0$$\cdot$$999 when $$p=6, 11, 19$$, confirming the adequacy of the model fit.

Fig. 1. Visualization of the sample auto-distance correlation matrices of the four-dimensional unemployment series for Alaska, Arkansas, Colorado and Delaware, starting from the top left at lag $$j=1$$. Darker rectangles correspond to higher distance correlations between series at a specific lag.

Fig. 2. Auto-distance correlation plots of the residuals vector process after fitting a fifth-order vector autoregressive model to the unemployment data. The dotted horizontal line in each panel is drawn following the method described in § 4.2. 
Acknowledgements We are grateful to the editor, associate editor and referees for their insightful comments. We also thank H. Dehling, G. Székely, M. Rizzo and M. Wendler for useful discussions. Pitsillou’s research was supported by the University of Cyprus.

Supplementary material Supplementary material available at Biometrika online describes how the statistic $$\hat{V}_{rm}^2(\cdot)$$ can be expressed as a $$V$$-statistic, gives derivations of the proposed test statistics, and provides more simulation results and proofs of the theoretical results.

References

Aaronson J., Burton R., Dehling H., Gilat D., Hill T. & Weiss B. (1996). Strong laws for L- and U-statistics. Trans. Am. Math. Soc. 348, 2845–66.
Benjamini Y. & Hochberg Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Statist. Soc. B 57, 289–300.
Borovkova S., Burton R. & Dehling H. (1999). Consistency of the Takens estimator for the correlation dimension. Ann. Appl. Prob. 9, 376–90.
Brockwell P. J. & Davis R. A. (1991). Time Series: Theory and Methods, 2nd ed. New York: Springer.
Chen B. & Hong Y. (2012). Testing for the Markov property in time series. Economet. Theory 28, 130–78.
Chen X. (2016). Gaussian approximation for the sup-norm of high-dimensional matrix-variate U-statistics and its applications. arXiv:1602.00199v3.
Chwialkowski K. P., Sejdinovic D. & Gretton A. (2014). A wild bootstrap for degenerate kernel tests. In Advances in Neural Information Processing Systems 27, Ghahramani Z., Welling M., Cortes C., Lawrence N. D. & Weinberger K. Q., eds. Neural Information Processing Systems Foundation, pp. 3608–16.
Davis R. A., Matsui M., Mikosch T. & Wan P. (2018). Applications of distance correlation to time series. Bernoulli, to appear. arXiv:1606.05481.
Dehling H. & Mikosch T. (1994). Random quadratic forms and the bootstrap for U-statistics. J. Mult. Anal. 51, 392–413.
Dueck J., Edelmann D., Gneiting T. & Richards D. (2014). The affinely invariant distance correlation. Bernoulli 20, 2305–30.
Feuerverger A. (1993). A consistent test for bivariate dependence. Int. Statist. Rev. 61, 419–33.
Fokianos K. & Pitsillou M. (2017). Consistent testing for pairwise dependence in time series. Technometrics 59, 262–70.
Gretton A., Fukumizu K., Teo C. H., Song L., Schölkopf B. & Smola A. (2008). A kernel statistical test of independence. In Advances in Neural Information Processing Systems 20, Platt J. C., Koller D., Singer Y. & Roweis S. T., eds. Neural Information Processing Systems Foundation, pp. 585–92.
Hipel K. W. & McLeod A. I. (1994). Time Series Modelling of Water Resources and Environmental Systems. Amsterdam: Elsevier.
Hlávka Z., Hušková M. & Meintanis S. G. (2011). Tests for independence in non-parametric heteroscedastic regression models. J. Mult. Anal. 102, 816–27.
Hoeffding W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Statist. 19, 293–325.
Hong Y. (1998). Testing for pairwise serial independence via the empirical distribution function. J. R. Statist. Soc. B 60, 429–53.
Hong Y. (1999). Hypothesis testing in time series via the empirical characteristic function: A generalized spectral density approach. J. Am. Statist. Assoc. 94, 1201–20.
Hong Y. (2000). Generalized spectral tests for serial dependence. J. R. Statist. Soc. B 62, 557–74.
Hosking J. R. M. (1980). The multivariate portmanteau statistic. J. Am. Statist. Assoc. 75, 602–8.
Lacal V. & Tjøstheim D. (2017). Local Gaussian autocorrelation and tests of serial independence. J. Time Ser. Anal. 38, 51–71.
Lacal V. & Tjøstheim D. (2018). Estimating and testing nonlinear local dependence between two time series. J. Bus. Econ. Statist., to appear.
Leucht A. & Neumann M. H. (2013a). Degenerate U- and V-statistics under ergodicity: Asymptotics, bootstrap and applications in statistics. Ann. Inst. Statist. Math. 65, 349–86.
Leucht A. & Neumann M. H. (2013b). Dependent wild bootstrap for degenerate U- and V-statistics. J. Mult. Anal. 117, 257–80.
Li W. K. & McLeod A. I. (1981). Distribution of the residual autocorrelations in multivariate ARMA time series models. J. R. Statist. Soc. B 43, 231–9.
McLachlan G. J., Do K.-A. & Ambroise C. (2004). Analyzing Microarray Gene Expression Data. Hoboken, New Jersey: Wiley.
Meintanis S. G. & Iliopoulos G. (2008). Fourier methods for testing multivariate independence. Comp. Statist. Data Anal. 52, 1884–95.
Pitsillou M. & Fokianos K. (2016). dCovTS: Distance covariance and correlation for time series analysis. R Journal 8, 324–40.
R Development Core Team (2018). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0. http://www.R-project.org.
Romano J. P. & Thombs L. A. (1996). Inference for autocorrelations under weak assumptions. J. Am. Statist. Assoc. 91, 590–600.
Sejdinovic D., Sriperumbudur B., Gretton A. & Fukumizu K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann. Statist. 41, 2263–91.
Serfling R. J. (1980). Approximation Theorems of Mathematical Statistics. New York: Wiley.
Shao X. (2010). The dependent wild bootstrap. J. Am. Statist. Assoc. 105, 218–35.
Shao X. (2011). Testing for white noise under unknown dependence and its applications to diagnostic checking for time series models. Economet. Theory 27, 312–43.
Székely G. J. & Rizzo M. L. (2013). Energy statistics: A class of statistics based on distances. J. Statist. Plan. Infer. 143, 1249–72.
Székely G. J., Rizzo M. L. & Bakirov N. K. (2007). Measuring and testing dependence by correlation of distances. Ann. Statist. 35, 2769–94.
Tjøstheim D. (1996). Measures of dependence and tests of independence. Statistics 28, 249–84.
Tsay R. S. (2014). Multivariate Time Series Analysis with R and Financial Applications. Hoboken, New Jersey: Wiley.
Xiao H. & Wu W. B. (2014). Portmanteau test and simultaneous inference for serial covariances. Statist. Sinica 24, 577–600.
Yoshihara K. (1976). Limiting behavior of U-statistics for stationary absolutely regular processes. Z. Wahr. verw. Geb. 35, 237–52.
Zhou Z. (2012). Measuring nonlinear dependence in time-series, a distance correlation approach. J. Time Ser. Anal. 33, 438–57.

© 2018 Biometrika Trust.


Biometrika, Advance Article – Jan 20, 2018
16 pages

Publisher: Oxford University Press
ISSN: 0006-3444
eISSN: 1464-3510
DOI: 10.1093/biomet/asx082

This is accomplished by following Hong (1999), who introduced the so-called generalized spectral density function. The generalized spectral density matrix captures all forms of dependence because it is constructed from the characteristic function. Thus, we can develop statistics for testing independence by considering an increasing number of lags. The present paper extends the work of Zhou (2012) and Fokianos & Pitsillou (2017), who considered univariate testing of independence based on the auto-distance covariance, and that of Székely et al. (2007), since some of our results can be transferred to independent data. Indeed, using the auto-distance covariance matrix to identify possible dependencies among the components of a random vector could give rise to novel dimension-reduction methods. All our methods are implemented in the R (R Development Core Team, 2018) package dCovTS (Pitsillou & Fokianos, 2016). 2. Auto-distance covariance matrix 2.1. Definitions Suppose that $$\{X_t,\, t \in \mathbb{Z} \}$$ is a $$d$$-variate strictly stationary time series. Denote its cumulative distribution function by $$F(x_1, \dots, x_d)$$ and assume that $$E (|{X_{t;r}}|) < \infty$$ for $$r=1, \dots, d$$. Let $$F_r(\cdot)$$ denote the marginal distribution function of $$\{X_{t;r}\}$$ and $$F_{rm}(\cdot\,,\cdot)$$ that of $$(X_{t;r},X_{t;m})$$ with $$r,m=1, \dots, d$$. Let $$\{X_t: t= 1, \dots, n\}$$ be a sample of size $$n$$. By extending the results of Székely et al. (2007), Zhou (2012) defined the distance covariance function for multivariate time series, but without taking into account possible cross-dependencies between all possible pairs of the component series of $$\{X_t\}$$. Here, we define the pairwise distance covariance function as the distance between the joint characteristic function and the marginal characteristic functions of the pair $$(X_{t;r},X_{t+j;m})$$, for $$r,m = 1, \dots, d$$. 
Denote the joint characteristic function of $$X_{t;r}$$ and $$X_{t+j;m}$$ by \begin{equation*} \phi_{j}^{(r,m)}(u,v) = E\bigl[\exp\{{\rm i}(uX_{t;r} + vX_{t+j;m})\}\bigr] \quad (u,v \in \mathbb{R};\, j \in \mathbb{Z}), \end{equation*} where $$r,m=1, \dots, d$$ and $${\rm i}^2=-1$$. Let $$\phi^{(r)}(u) = E[\exp\{{\rm i}(uX_{t;r})\}]$$ denote the marginal characteristic function of $$X_{t;r}$$ for $$r=1, \dots, d$$. Let $$\label{eq:Sigmamatrix} \Sigma_{j}(u,v) = \bigl\{\sigma_{j}^{(r,m)}(u,v)\bigr\} \quad (\,j \in \mathbb{Z})$$ (2) denote the $$d \times d$$ matrix whose $$(r,m)$$ element is $$\label{eq:sigmarmk} \sigma_{j}^{(r,m)}(u,v) = \mbox{cov}\bigl\{\exp({\rm i} uX_{t;r}),\,\exp({\rm i} vX_{t+j;m})\bigr\} = \phi_{j}^{(r,m)}(u,v) - \phi^{(r)}(u)\phi^{(m)}(v)\text{.}$$ (3) If $$\sigma_{j}^{(r,m)}(u,v)=0$$ for all $$(u,v) \in \mathbb{R}^2$$, then the random variables $$X_{t;r}$$ and $$X_{t+j;m}$$ are independent. Let the $$\|\cdot\|_{\mathcal{W}}$$-norm of $$\sigma_{j}^{(r,m)}(u,v)$$ be defined by \begin{equation*} \|\sigma_{j}^{(r,m)}(u,v)\|_{\mathcal{W}}^2 = \int_{\mathbb{R}^2}{|{\sigma_{j}^{(r,m)}(u,v)}|^2\,\mathcal{W}({\rm d} u,{\rm d} v)} \quad (\,j \in \mathbb{Z}), \end{equation*} where $$\mathcal{W}(\cdot\,,\cdot)$$ is an arbitrary positive weight function such that $$\|\sigma_{j}^{(r,m)}(u,v)\|_{\mathcal{W}}^2 < \infty$$. Feuerverger (1993) and Székely et al. (2007) employed a nonintegrable weight function $$\label{eq:weight} \mathcal{W}({\rm d} u,{\rm d} v) = \frac{1}{\pi |{u}|^2}\frac{1}{\pi |{v}|^2}\,{\rm d} u \,{\rm d} v\text{.}$$ (4) The choice of $$\mathcal{W}(\cdot\,,\cdot)$$ is key in this work. Obviously, (4) is nonintegrable in $$\mathbb{R}^2$$, but choices with $$\int{{\rm d}\mathcal{W}} < \infty$$ are possible. Following Hong (1999) and Chen & Hong (2012), suppose that $$\mathcal{W}(\cdot\,,\cdot) : \mathbb{R}^2 \rightarrow \mathbb{R}^{+}$$ is nondecreasing with bounded total variation. 
This obviously holds for $$\mathcal{W}({\rm d} u,{\rm d} v) = {\rm d}\Phi(u)\,{\rm d}\Phi(v)$$, where $$\Phi(\cdot)$$ is the standard normal cumulative distribution function. In this case, $$\|\sigma_{j}^{(r,m)}(u,v)\|_{\mathcal{W}}^2$$ can be computed by Monte Carlo simulation. For related work, see also Meintanis & Iliopoulos (2008) and Hlávka et al. (2011). In what follows, we use (4) throughout, since integrable weight functions may miss potential dependence among observations (Székely et al., 2007, p. 2771). Definition 1. The pairwise auto-distance covariance function between $$X_{t;r}$$ and $$X_{t+j;m}$$ is denoted by $$V_{rm}(\,j)$$ and defined as the positive square root of $$\label{eq:Vrm} V_{rm}^2(\,j) = \|\sigma_j^{(r,m)}(u,v)\|_{\mathcal{W}}^2 \quad (r, m = 1, \dots, d;\: j \in \mathbb{Z}),$$ (5) with $$\mathcal{W}(\cdot\,,\cdot)$$ given by (4). The auto-distance covariance matrix of $$\{X_t\}$$ at lag $$j$$ is denoted by $$V(\,j)$$ and is the $$d \times d$$ matrix $$\label{eq:V} V(\,j) = \bigl\{ V_{rm}(\,j) \bigr\}_{r,m=1}^d \quad (\,j \in \mathbb{Z})\text{.}$$ (6) Clearly, $$V_{rm}^2(\,j) \geq 0$$ for all $$j$$, and $$X_{t;r}$$ and $$X_{t+j;m}$$ are independent if and only if $$V_{rm}^2(\,j) = 0$$. Furthermore, we define the $$d \times d$$ matrices $$\label{eq:V2} V^{(2)}(\,j) = \bigl\{V_{rm}^2(\,j)\bigr\}_{r,m=1}^d \quad (\,j \in \mathbb{Z})\text{.}$$ (7) Definition 1 is valid for any weight function $$\mathcal{W}(\cdot\,, \cdot)$$ such that $$V_{rm}^{2}(\,j) < \infty$$; with (4), it is a pairwise auto-distance covariance function. Definition 2. The pairwise auto-distance correlation function between $$X_{t;r}$$ and $$X_{t+j;m}$$ is denoted by $$R_{rm}(\,j)$$ and defined as the positive square root of \begin{eqnarray*} R^2_{rm}(\,j) & = & \frac{V_{rm}^2(\,j)}{\{V^2_{rr}(0)V^2_{mm}(0)\}^{1/2}} \quad (r, m = 1, \dots, d;\: j \in \mathbb{Z}), \end{eqnarray*} provided that $$V_{rr}(0)V_{mm}(0) \neq 0$$. 
The auto-distance correlation matrix of $$\{X_t\}$$ at lag $$j$$ is \begin{eqnarray*} R(\,j) & = & \bigl\{R_{rm}(\,j)\bigr\}_{r,m=1}^{d} \quad (\,j \in \mathbb{Z})\text{.} \end{eqnarray*} Similarly, define the $$d \times d$$ matrices \begin{eqnarray*} R^{(2)}(\,j) & = & \bigl\{R_{rm}^2(\,j)\bigr\}_{r,m=1}^d \quad (\,j \in \mathbb{Z})\text{.} \end{eqnarray*} Then (7) shows that $$R^{(2)}(\,j) = D^{-1}V^{(2)}(\,j)D^{-1}$$, where $$D = \mbox{diag} \{ V_{rr}(0) \}$$ ($$r = 1, \dots, d$$). All of the above population quantities exist and are well-defined owing to standard properties of the characteristic function. Davis et al. (2018) give existence results for more general weight functions. When $$j \neq 0$$, $$V_{rm}(\,j)$$ measures the dependence of $$X_{t;r}$$ on $$X_{t+j;m}$$. For $$j > 0$$, if $$V_{rm}(\,j) > 0$$ then we say that the series $$\{X_{t;m}\}$$ leads the series $$\{X_{t;r}\}$$ at lag $$j$$. In general, $$V_{rm}(\,j) \neq V_{mr}(\,j)$$ for $$r \neq m$$, since they measure different types of dependence between the series $$\{X_{t;r}\}$$ and $$\{X_{t;m}\}$$ ($$r,m= 1, \dots, d$$). Thus, $$V(\,j)$$ and $$R(\,j)$$ are nonsymmetric matrices, but by stationarity we have \begin{eqnarray*} V_{rm}^2(-j) & = & \big\| \mbox{cov}\bigl\{\exp({\rm i} uX_{t;r}),\exp({\rm i} vX_{{t-j};m})\bigr\} \big\|^2_{\mathcal{W}} \\ & = & \big\| \mbox{cov}\bigl\{\exp({\rm i} uX_{t;m}),\exp({\rm i} vX_{t+j;r})\bigr\} \big\|^2_{\mathcal{W}} = V_{mr}^2(\,j) \quad (r,m = 1, \dots, d)\text{.} \end{eqnarray*} Consequently, $$V(-j)=V^{\rm T}(\,j)$$ and $$R(-j)=R^{\rm T}(\,j)$$, because the matrices $$V(\,j)$$ and $$R(\,j)$$ have as elements the positive square roots of the elements of $$V^{(2)}(\,j)$$ and $$R^{(2)}(\,j)$$. Auto-distance covariance matrices can be interpreted as follows. 
For all $$j \in \mathbb{Z}$$, the diagonal elements $$\{V_{rr}(\,j)\}_{r=1}^d$$ correspond to the auto-distance covariance function of $$\{X_{t;r}\}$$ and explain dependence among the pairs $$(X_{t;r}, X_{t+j;r})$$ $$(r = 1, \dots, d)$$. The off-diagonal elements $$\{V_{rm}(0)\}_{r,m=1}^d$$ measure concurrent dependence between $$\{X_{t;r}\}$$ and $$\{X_{t;m}\}$$. If $$V_{rm}(0) > 0$$, then $$\{X_{t;r}\}$$ and $$\{X_{t;m}\}$$ are concurrently dependent. For $$j \neq 0$$, $$\:\{V_{rm}(\,j)\}_{r,m=1}^d$$ measures dependence between $$\{X_{t;r}\}$$ and $$\{X_{t+j;m}\}$$. If $$V_{rm}(\,j)=0$$ for all $$j \neq 0$$, then $$\{X_{t+j;m}\}$$ does not depend on $$\{X_{t;r}\}$$. For all $$j \in \mathbb{Z}$$, $$\:V_{rm}(\,j) = V_{mr}(\,j) = 0$$ implies that $$\{X_{t;r}\}$$ and $$\{X_{t+j;m}\}$$ are independent. Moreover, for all $$j \neq 0$$, if $$V_{rm}(\,j)=0$$ and $$V_{mr}(\,j)=0$$, then $$\{X_{t;r}\}$$ and $$\{X_{t;m}\}$$ have no lead-lag relationship. If $$V_{rm}(\,j)=0$$ for all $$j > 0$$ but $$V_{mr}(\,j) > 0$$ for some $$j > 0$$, then $$\{X_{t;m}\}$$ does not depend on any past values of $$\{X_{t;r}\}$$ but $$\{X_{t;r}\}$$ depends on some past values of $$\{X_{t;m}\}$$. 2.2. Estimation To estimate (5) and (6), define, for $$j \geq 0$$, \begin{equation*} \hat{\sigma}_{j}^{(r,m)}(u,v) = \hat{\phi}^{(r,m)}_{j}(u,v) - \hat{\phi}^{(r)}(u)\hat{\phi}^{(m)}(v), \end{equation*} with \begin{equation*} \hat{\phi}^{(r,m)}_{j}(u,v) = (n-j)^{-1}\sum_{t=1}^{n-j}{\exp\bigl\{{\rm i}(uX_{t;r}+vX_{t+j;m})\bigr\}} \quad (u,v \in \mathbb{R}), \end{equation*} $$\hat{\phi}^{(r)}(u) = \hat{\phi}^{(r,m)}_j(u,0)$$ and $$\hat{\phi}^{(m)}(v) = \hat{\phi}^{(r,m)}_j(0,v)$$. Then the sample pairwise auto-distance covariance is defined by the positive square root of \begin{eqnarray*} \hat{V}_{rm}^2(\,j) & = & \pi^{-2}\int_{\mathbb{R}^2}{\frac{|{{\hat{\sigma}_{j}^{(r,m)}(u,v)}}|^2}{|{u}|^2|{v}|^2}\,{\rm d} u\,{\rm d} v}\text{.} \end{eqnarray*} Let $$Y_{t;m}=X_{t+j;m}$$. 
Then, based on the sample $$\{(X_{t;r},Y_{t;m}):t=1, \dots, n-j\}$$, we calculate the $$(n-j) \times (n-j)$$ Euclidean distance matrices $$A^{r}=(A_{ts}^{r})$$ and $$B^{m}=(B_{ts}^{m})$$ with elements $$\label{eq:distmatrices} A_{ts}^{r} = a_{ts}^{r} - \bar{a}_{t\text{.}}^{r} - \bar{a}_{\text{.}s}^{r} + \bar{a}_{\text{..}}^{r}\, ,$$ where $$a_{ts}^{r} = |{X_{t;r} - X_{s;r}}|$$, $$\bar{a}_{t\text{.}}^{r} = (\sum_{s=1}^{n-j}{a_{ts}^{r}})/(n-j)$$, $$\bar{a}_{\text{.}s}^{r} = (\sum_{t=1}^{n-j}{a_{ts}^{r}})/(n-j)$$ and $$\bar{a}_{\text{..}}^{r} = (\sum_{t=1}^{n-j}\sum_{s=1}^{n-j}{a_{ts}^{r}})/(n-j)^2$$. Similarly, define the quantities $$b_{ts}^{m}=|{Y_{t;m}-Y_{s;m}}|$$ to obtain $$\bar{b}_{t\text{.}}^{m}$$, $$\bar{b}_{\text{.}s}^{m}$$, $$\bar{b}_{\text{..}}^{m}$$ and $$B_{ts}^{m}$$. Then, following Székely et al. (2007), we obtain that \begin{eqnarray*} \hat{V}_{rm}^2(\,j) & = & (n-j)^{-2}\sum_{t,s=1}^{n-j}{A_{ts}^rB_{ts}^m}\text{.} \end{eqnarray*} If $$j < 0$$ we set $$\hat{V}_{rm}^2(\,j) = \hat{V}_{mr}^2(-j)$$. By (7), define the $$d \times d$$ matrices $$\hat{V}^{(2)}(\,j) = \{\hat{V}_{rm}^2(\,j)\}$$ ($$j \in \mathbb{Z}$$). The sample distance covariance matrix is given by $$\hat{V}(\,j) = \{\hat{V}_{rm}(\,j)\}$$ ($$j \in \mathbb{Z}$$). Similarly, define $$\hat{R}^{(2)}(\,j)$$ and $$\hat{R}(\,j)$$ ($$j \in \mathbb{Z}$$). 2.3. Large-sample properties of the sample distance covariance matrix The assumption of stationarity is quite restrictive for applications, so it is of interest to investigate the behaviour of the distance covariance function when this assumption does not hold. Consider the case where $$\{X_t \}$$ is a univariate Gaussian process with $$E(X_t) = 0$$, $$\mbox{var}(X_t)=1$$ and $$\mbox{cov}(X_t,X_{t+j})=\rho(\,j)$$. Then (Székely et al.,
2007), for $$j \in \mathbb{Z}$$, \begin{equation*} R^2(\,j) = \displaystyle \frac{\rho(\,j)\mbox{arcsin}\{\rho(\,j)\} + \{1-\rho^2(\,j)\}^{1/2}-\rho(\,j)\mbox{arcsin}\{\rho(\,j)/2\}-\{4-\rho^2(\,j)\}^{1/2}+1}{1+\pi/3-3^{1/2}} \text{.} \end{equation*} This shows that the behaviour of $$\rho^2(\,j)$$ determines that of $$R^2(\,j)$$, so we expect $$R^2(\,j)$$ to decay as slowly as $$\rho^2(\,j)$$ does for a random walk. We can test whether the increments $$X_{t}-X_{t-1}$$ form an independent and identically distributed sequence by employing the methods of Fokianos & Pitsillou (2017). In the Supplementary Material we show that $$\hat{V}_{rm}^2(\,j)$$ is a $$V$$-statistic (Serfling, 1980, § 5.1.2) which can be approximated by another $$V$$-statistic with a kernel of lower order. Indeed, suppose that $$u,u' \in \mathbb{R}$$ and let $$X$$ be a real-valued random variable. Define \begin{align*} m_X(u) &= E(|{X-u}|), \quad \bar{m}_X = E\{m_X(X)\}, \\ d_X(u,u') &= |{u-u'}| - m_X(u) - m_X(u') + \bar{m}_X\text{.} \end{align*} With some abuse of notation and setting $$X \equiv X_{t;r}$$ and $$Y \equiv X_{t+j;m}$$, we obtain that \begin{equation*} V_{rm}^2(\,j) = E\bigl\{d_X(X,X')d_Y(Y,Y')\bigr\}, \end{equation*} where $$(X', Y')$$ is an independent and identically distributed copy of $$(X,Y)$$ (Székely & Rizzo, 2013, p. 1262); that is, there exists a kernel $$h:\mathbb{R}^2 \times \mathbb{R}^2 \rightarrow \mathbb{R}$$, given by $$\label{eq:kernel} h(x,y;x',y')=d_X(x,x')\,d_Y(y,y'),$$ (8) such that $$V_{rm}^2(\,j) = \int_{\mathbb{R}^2}\int_{\mathbb{R}^2}{h(x,y;x',y')\,{\rm d} F_{rm}(x,y)\,{\rm d} F_{rm}(x',y')}\text{.}$$ The kernel function is symmetric, continuous and positive semidefinite. Under independence, $$\hat{V}_{rm}^2(\cdot)$$ is a degenerate $$V$$-statistic. 
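For concreteness, the centred-distance-matrix form of the estimator in § 2.2 is easy to compute directly. The following is an illustrative sketch in Python with NumPy (the function name pairwise_adcov_sq is ours, not part of the dCovTS package); it can be checked numerically against the three-moment representation of Székely & Rizzo (2013) that follows.

```python
import numpy as np

def pairwise_adcov_sq(x, y, j):
    """Sample pairwise auto-distance covariance squared between the scalar
    series x (component r) and y (component m) at lag j >= 0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    xr, ym = x[:n - j], y[j:]                      # pairs (X_{t;r}, X_{t+j;m})
    a = np.abs(xr[:, None] - xr[None, :])          # distance matrix a^r_{ts}
    b = np.abs(ym[:, None] - ym[None, :])          # distance matrix b^m_{ts}
    # double centring: A_{ts} = a_{ts} - row mean - column mean + grand mean
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    return (A * B).mean()                          # (n-j)^{-2} sum A_{ts} B_{ts}
```

The returned value is nonnegative, in agreement with (5), and coincides with the V-statistic form built from the moments of the pairwise distances.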
If $$E (|{X_{t;r}}|^{2}) < \infty$$, then Székely & Rizzo (2013, Lemma 1) and Fubini’s theorem yield \begin{eqnarray*} \label{eq:hSzekely} {V}_{rm}^2(\,j) & = & E(|{X-X'}|\,|{Y-Y'}|) + E(|{X-X'}|)E(|{Y-Y''}|) - 2E(|{X-X'}|\,|{Y-Y''}|), \end{eqnarray*} where $$(X', Y')$$ and $$(X'', Y'')$$ are independent and identically distributed copies of $$(X,Y)$$. If $$X_{t;r}$$ is independent of $$X_{t+j;m}$$, then $$E\{h(x,y;X,Y)\} = 0,$$ so $$\hat{V}_{rm}^2(\,j)$$ has a first-order degeneracy. The following proposition establishes the strong consistency of the estimator $$\hat{V}(\,j)$$. Proposition 1. Let $$\{X_t\}$$ be a $$d$$-variate strictly stationary and ergodic process with $$E(|{X_{t;r}}|^{2}) < \infty$$ for $$r=1,\ldots, d$$. Then, for all $$j \in \mathbb{Z}$$, $$\:\hat{V}(\,j) \rightarrow V(\,j)$$ almost surely as $$n \rightarrow \infty$$. Proposition 1 follows directly from the strong law of large numbers for $$V$$-statistics of ergodic and stationary sequences (Aaronson et al., 1996). Alternatively, it can be proved by the methods of Székely et al. (2007), Fokianos & Pitsillou (2017, Proposition 1) and Davis et al. (2018) assuming $$E(|{X_{t;r}}|) < \infty$$, or by the approach of Zhou (2012) assuming $$E(|{X_{t;r}}|^{1+\epsilon}) < \infty$$ for some $$\epsilon >0$$. In particular, by assuming strict stationarity, ergodicity and that $$E(|{X_{t;r}}|^{2}) < \infty$$ for $$r = 1, \ldots, d$$, the strong consistency of the sample auto-distance covariance matrix is established by individually considering each element of $$\hat{V}^{(2)}(\,j)$$ and then the corresponding element of $$\hat{V}(\,j)$$. Previous related work on strong laws for $$V$$-statistics has required stationarity, ergodicity, existence of second moments, almost sure $$F_{rm}(\cdot\,,\cdot)$$ continuity of the kernel function, and uniform integrability. Under these assumptions, $$\hat{V}(\,j)$$ is a weakly consistent estimator of $$V(\,j)$$; see Borovkova et al. (1999, Theorem 1) and Aaronson et al. 
(1996, Proposition 2.8). The following theorem is proved in the Supplementary Material and gives the limiting distribution of the sample pairwise auto-distance covariance function, $$\hat{V}_{rm}^2(\cdot)$$, when $$V^{2}_{rm}(\cdot) \neq 0$$ and $$\{X_{t} \}$$ is a pairwise independent sequence. Related results have been established by Zhou (2012) and Davis et al. (2018). We attack the problem by using results on $$U$$-statistics for $$\beta$$-mixing processes (Yoshihara, 1976). The first result shows that we can form asymptotic confidence intervals for $$V^{2}_{rm}(\cdot)$$; unfortunately, however, this gives no further information on the dependence structure in the data. Theorem 1. Suppose that $$\{X_{t}\}$$ is $$\beta$$-mixing and there exists a positive number $$\delta$$ such that for $$\nu=2+\delta$$: (i) $$E (|{X_{t;r}}|^{\nu}) < \infty$$ for $$r=1, \ldots,d$$; (ii) for any integers $$i_{1}$$ and $$i_{2}$$, $$\:\sup_{i_{1}, i_{2}} E (|{X_{i_{1};r}- X_{i_{2};r}}|^{\nu}) < \infty$$ for all $$r=1, \ldots,d$$; (iii) the mixing coefficients $$\beta(k)$$ satisfy $$\beta(k) = O(k^{-(2+\delta^\prime)/\delta^\prime})$$ for some $$\delta^\prime$$ such that $$0 < \delta^\prime < \delta$$. Then, for fixed $$j$$, the following hold. (a) If $$V^{2}_{rm}(\,j) \neq 0$$, then \begin{eqnarray*} n^{1/2} \bigl\{\hat{V}^2_{rm}(\,j) - V^2_{rm}(\,j)\bigr\} & \rightarrow & N(0,\,36\sigma^2) \end{eqnarray*} in distribution as $$n \rightarrow \infty$$, where $$\sigma^{2}$$ is given in the Supplementary Material. (b) If $$V^{2}_{rm}(\,j) = 0$$ and $$\delta^\prime < \delta/(3+\delta)$$, then $$\label{eq:LeuchtAsymp} n \hat{V}_{rm}^2(\,j) \rightarrow Z = \sum_{l}{\lambda_lZ_l^2}$$ (9) in distribution as $$n \rightarrow \infty$$, where $$(Z_l)$$ is an independent and identically distributed sequence of standard normal random variables and $$(\lambda_l)$$ is a sequence of nonzero eigenvalues which satisfy the Hilbert–Schmidt equation $$E\{h(x,y;X,Y)\Phi(X,Y)\} = \lambda \Phi(x,y)$$. 
The kernel $$h(\cdot)$$ is defined by (8) and is represented by $$h(x,y;x',y') = \sum_{l=1}^{\infty}{\lambda_l\Phi_l(x,y)\Phi_l(x',y')}$$, where $$(\Phi_l)_l$$ is the sequence of corresponding orthonormal eigenfunctions. Theorem 1(b) is proved via a Hoeffding decomposition (Hoeffding, 1948), by showing that $$\hat{V}_{rm}^2(\cdot)$$ is approximated by a $$V$$-statistic of order two which is degenerate under independence; then, applying Theorem 1 of Leucht & Neumann (2013a) yields the result. If $$\{X_{t} \}$$ is an independent and identically distributed sequence, then Theorem 1(b) holds; see Székely et al. (2007). In general, it is of interest to approximate the asymptotic distribution of the matrix-variate $$V$$-statistic $$\{\hat{V}^{(2)}(\,j),\, j \in \mathbb{Z}\}$$, whether or not independence holds. To the best of our knowledge, this problem has not been addressed in the literature, but see Chen (2016). Furthermore, it is of interest to determine simultaneous confidence intervals for the elements of $$V(\,j)$$, mimicking the methodology for the ordinary autocorrelation function, under the assumption of independence. However, the asymptotic distribution given in (9) cannot be employed in applications, so a simulation-based method should be applied. We discuss this further in § 4.2. 3. Testing In this section, we develop a test statistic for testing the null hypothesis $$H_{0}$$: $$\{{X_{t}}\}$$ is an independent and identically distributed sequence. Recall that the Frobenius norm of an $$m \times n$$ matrix $$A$$ is defined as $$\|A\|_{\rm F}^2 = \sum_{i=1}^{m}\sum_{j=1}^{n}{{|{\alpha_{ij}}|}^2} = \mbox{tr}(A^{*}A)$$, where $$A^{*}$$ denotes the conjugate transpose and $$\mbox{tr}(A)$$ the trace of the matrix $$A$$. We obtain from (3) that $$\label{eq:supsigma} \sup_{(u,v) \in \mathbb{R}^2}\sum_{j=-\infty}^{\infty}\big|\sigma_{j}^{(r,m)}(u,v)\big| < \infty,$$ (10) provided that $$\{{X_{t}}\}$$ is a $$\beta$$-mixing process with coefficients decaying to zero. 
Indeed, $${|{\mbox{cov}\{\exp({\rm i} uX_{t;r}), \exp({\rm i} vX_{t+j;m})\}}|} = |{\sigma_{j}^{(r,m)}(u,v)}| \leq C \beta(\,j)$$ for some constant $$C$$. Thus, the sequence of covariance matrices $$\{\Sigma_{j}(u,v),\, j \in \mathbb{Z} \}$$ defined by (2) has absolutely summable components for all $$(u,v) \in \mathbb{R}^2$$. Define the Fourier transform of $$\sigma_{j}^{(r,m)}(\cdot\,,\cdot)$$ by $$\label{eq:frm} f^{(r,m)}(\omega,u,v) = (2\pi)^{-1}\sum_{j=-\infty}^{\infty}\sigma_{j}^{(r,m)}(u,v)\exp(-{\rm i} j\omega) \quad (\omega \in [-\pi,\pi])\text{.}$$ (11) Because of (10), $$f^{(r,m)}(\cdot\,,\cdot\,,\cdot)$$ is bounded and uniformly continuous. If $$r=m$$, then $$f^{(r,r)}(\omega,u,v)$$ is called the generalized spectrum or generalized spectral density of $$X_{t;r}$$ at frequency $$\omega$$ for all $$(u,v) \in \mathbb{R}^2$$. If $$r \neq m$$, then $$f^{(r,m)}(\omega,u,v)$$ is called the generalized cross-spectrum or generalized cross-spectral density of $$X_{t;r}$$ and $$X_{t;m}$$ at frequency $$\omega$$ for all $$(u,v) \in \mathbb{R}^2$$. Collecting all the elements of (11) in a $$d \times d$$ matrix, we obtain the generalized spectral density matrix \begin{eqnarray*} F(\omega,u,v) & = & (2\pi)^{-1}\sum_{j=-\infty}^{\infty}{\Sigma_{j}(u,v)\exp(-{\rm i} j\omega)} = \bigl\{f^{(r,m)}(\omega,u,v)\bigr\}_{r,m=1}^d\text{.} \end{eqnarray*} Under the null hypothesis of independent and identically distributed data, $$\Sigma_{j}(u,v)=0$$ for all $$j \neq 0$$; in this case we denote $$F(\cdot\,,\cdot\,,\cdot)$$ by \begin{eqnarray*} F_0(\omega,u,v) & = & (2\pi)^{-1}\bigl\{\sigma_0^{(r,m)}(u,v)\bigr\}_{r,m=1}^d\text{.} \label{eq:F0matrix} \end{eqnarray*} In general, $$F_0(\cdot\,,\cdot\,,\cdot)$$ is not a diagonal matrix, but when $$X_{t;r}$$ and $$X_{t;m}$$ are independent for all $$r,m = 1, \dots, d$$, then $$F_0(\cdot\,,\cdot\,,\cdot)$$ reduces to a diagonal matrix. 
Consider the following class of kernel-type estimators: \begin{eqnarray*} \hat{f}^{(r,m)}(\omega,u,v) = (2\pi)^{-1}\sum_{j=-(n-1)}^{(n-1)}{(1-{|{j}|}/n)^{1/2}K(\,j/p) \hat{\sigma}_{j}^{(r,m)}(u,v)\exp(-{\rm i} j\omega)} & \\[-7pt] \quad (\omega \in [-\pi,\pi]), & \end{eqnarray*} where $$p$$ is a bandwidth parameter and $$K(\cdot)$$ is a univariate kernel function satisfying the following condition. Assumption 1. The kernel function $$K: \mathbb{R} \rightarrow [-1,1]$$ is symmetric and continuous at zero and all but a finite number of points, with $$K(0)=1$$, $$\int_{-\infty}^{\infty}{K^2(z)\,{\rm d} z} < \infty$$ and $${|{K(z)}|} \leq C {|{z}|}^{-b}$$ for large $$z$$ and $$b > 1/2$$. Next, let \begin{equation*} \hat{F}(\omega,u,v) = \bigl\{\hat{f}^{(r,m)}(\omega,u,v)\bigr\}_{r,m=1}^d, \quad \hat{F}_0(\omega,u,v) = (2\pi)^{-1}\bigl\{\hat{\sigma}_0^{(r,m)}(u,v)\bigr\}_{r,m=1}^d\text{.} \label{eq:Fhatmatrix} \end{equation*} Then, it is shown in the Supplementary Material that for $$\mathcal{W}(\cdot\,,\cdot)$$ given by (4), \begin{align} \label{eq:traceNormResult2} L_2^2\bigl\{\hat{F}(\omega,u,v),\hat{F}_0(\omega,u,v)\bigr\} & = \int_{\mathbb{R}^2}\int_{-\pi}^{\pi}{\big\|\hat{F}(\omega,u,v)-\hat{F}_0(\omega,u,v)\big\|^2_{\rm F}\,{\rm d}\omega \,\mathcal{W}({\rm d} u,{\rm d} v)} \nonumber \\[1ex] & = 2\pi^{-1}\sum_{j=1}^{n-1}{(1-j/n)K^2(\,j/p) \,\mbox{tr}\bigl\{\hat{V}^*(\,j)\hat{V}(\,j)\bigr\}}\text{.} \end{align} (12) In terms of the distance correlation matrix, (12) becomes $$\label{eq:traceNormResult3} L_2^2\bigl\{\hat{G}(\omega,u,v),\hat{G}_0(\omega,u,v)\bigr\} = 2\pi^{-1}\sum_{j=1}^{n-1}{(1-j/n)K^2(\,j/p) \,\mbox{tr}\bigl\{\hat{V}^*(\,j)\hat{D}^{-1}\hat{V}(\,j)\hat{D}^{-1}\bigr\}},$$ (13) where $$\hat{G}(\cdot\,,\cdot\,,\cdot)$$ and $$\hat{G}_0(\cdot\,,\cdot\,,\cdot)$$ are the estimators of the normalized multivariate generalized spectra $$G(\cdot\,,\cdot\,,\cdot)$$ and $$G_{0}(\cdot\,,\cdot\,,\cdot)$$; details are given in the Supplementary Material. 
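Once the lag matrices $$\hat{V}(\,j)$$ are available, the right-hand side of (13) is a simple kernel-weighted sum of traces; for real-valued matrices, $$\hat{V}^{*}(\,j) = \hat{V}^{\rm T}(\,j)$$. The sketch below (Python with NumPy) assembles this quantity; the Bartlett kernel and the function names are our illustrative choices, not part of the formal development.

```python
import numpy as np

def bartlett(z):
    """Bartlett kernel: K(z) = 1 - |z| for |z| <= 1 and 0 otherwise."""
    z = np.abs(z)
    return np.where(z <= 1.0, 1.0 - z, 0.0)

def weighted_trace_distance(V, n, p, D=None):
    """Right-hand side of (13): (2/pi) * sum_{j>=1} (1 - j/n) K^2(j/p)
    * tr{ V(j)^T D^{-1} V(j) D^{-1} }.  V is a list whose (j-1)th entry is
    the d x d lag-j matrix; D = diag{V_rr(0)}.  Taking D to be the identity
    recovers the unnormalized form in (12)."""
    d = V[0].shape[0]
    Dinv = np.eye(d) if D is None else np.linalg.inv(D)
    total = 0.0
    for j, Vj in enumerate(V, start=1):
        w = (1.0 - j / n) * bartlett(j / p) ** 2
        total += w * np.trace(Vj.T @ Dinv @ Vj @ Dinv)
    return 2.0 * total / np.pi
```

With the identity matrix for $$D$$, each trace reduces to the squared Frobenius norm of the lag matrix, which makes the connection with (12) transparent.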
Equations (12) and (13) motivate our study of multivariate tests of independence. In particular, it is of interest to test whether the vector series $$\{X_t\}$$ is independent and identically distributed regardless of any possible dependence between the time series components $$\{X_{t;r}\}$$ for $$r = 1, \dots, d$$. Expression (13) can be viewed as a multivariate Ljung–Box-type statistic based on the distance covariance matrix instead of the ordinary autocovariance matrix. Indeed, by choosing $$K(z) = 1$$ for $${|{z}|} \leq 1$$ and 0 otherwise, (13) becomes \begin{eqnarray*} L_2^2\bigl\{\hat{G}(\omega,u,v),\hat{G}_0(\omega,u,v)\bigr\} & = & 2\pi^{-1}\sum_{j=1}^{p}{(1-j/n)\,\mbox{tr}\bigl\{\hat{V}^*(\,j)\hat{D}^{-1}\hat{V}(\,j)\hat{D}^{-1}\bigr\}}\text{.} \end{eqnarray*} Define $$T^{(r,m)}_n = \sum_{j=1}^{n-1}{(n-j)K^2(\,j/p)\hat{V}_{rm}^2(\,j)}$$. Then the test statistic motivated by (12) is \begin{eqnarray*} T_n & = & \sum_{r,m}{T^{(r,m)}_n} = \sum_{j=1}^{n-1}{(n-j)K^2(\,j/p)\, \mbox{tr}\bigl\{\hat{V}^*(\,j)\hat{V}(\,j)\bigr\}}\text{.} \end{eqnarray*} Similarly, by using (13), let \begin{eqnarray*} \bar{T}_n & = & \sum_{r,m}{\frac{T_n^{(r,m)}}{\{\hat{V}_{rr}^2(0)\hat{V}_{mm}^2(0)\}^{1/2}}} = \sum_{j=1}^{n-1}{(n-j)K^2(\,j/p) \,\mbox{tr}\bigl\{\hat{V}^*(\,j)\hat{D}^{-1}\hat{V}(\,j)\hat{D}^{-1}\bigr\}}\text{.} \end{eqnarray*} Theorem 2. Suppose that $$E ({|{X_{t;r}}|}^{2}) < \infty\; (r=1,\ldots,d)$$ and that Assumption 1 holds. Let $$p=cn^{\lambda}$$, where $$c > 0$$ and $$\lambda \in (0,1)$$. 
If $$\{X_{t}\}$$ is an independent and identically distributed sequence, then \begin{eqnarray*} M_n^{(r,m)} = \frac{T^{(r,m)}_n - \hat{C}_0^{(r,m)}p\int_{0}^{\infty}{K^2(z)\,{\rm d} z}}{\bigl\{\hat{D}_0^{(r,m)}p\int_{0}^{\infty}{K^4(z)\,{\rm d} z}\bigr\}^{1/2}} & \rightarrow & N(0,1) \end{eqnarray*} in distribution as $$n \rightarrow \infty$$, where \begin{eqnarray*} C_0^{(r,m)} & = & \int_{\mathbb{R}^2}{\sigma_{0}^{(r,r)}(u,-u)\sigma_{0}^{(m,m)}(v,-v)\mathcal{W}({\rm d} u,{\rm d} v)}, \\ D_0^{(r,m)} & = & 2\int_{\mathbb{R}^2}\big|\sigma_{0}^{(r,r)}(u,u')\sigma_{0}^{(m,m)}(v,v')\big|^2 \mathcal{W}({\rm d} u,{\rm d} v)\mathcal{W}({\rm d} u',{\rm d} v') = 2V_{rr}^2(0)V_{mm}^2(0), \end{eqnarray*} and $$\hat{C}^{(r,m)}_0$$ and $$\hat{D}^{(r,m)}_0$$ are their sample counterparts. Theorem 2 implies the following result. Corollary 1. Suppose that $$E ({|{X_{t;r}}|}^{2}) < \infty\; (r=1,\ldots,d)$$ and that Assumption 1 holds. Let $$p=cn^{\lambda}$$, where $$c > 0$$ and $$\lambda \in (0,1)$$. If $$\{X_{t}\}$$ is an independent and identically distributed sequence, then \begin{align*} M_n & \equiv \frac{T_n - \bigl(\sum_{r,m}\hat{C}_0^{(r,m)}\bigr)p\int_{0}^{\infty}{K^2(z)\,{\rm d} z}}{\bigl\{\bigl(\sum_{r,m}\hat{D}_0^{(r,m)}\bigr)p \int_{0}^{\infty} {K^4(z)\,{\rm d} z}\bigr\}^{1/2}} \rightarrow N(0,1), \\ \bar{M}_n & \equiv \frac{\bar{T}_n - \bigl(\sum_{r,m}\hat{c}_0^{(r,m)}\bigr)p\int_{0}^{\infty}{K^2(z)\,{\rm d} z}}{d\bigl\{2p\int_{0}^{\infty}{K^4(z)\,{\rm d} z}\bigr\}^{1/2}} \rightarrow N(0,1) \end{align*} in distribution as $$n \rightarrow \infty$$, where $$c_0^{(r,m)} = C_0^{(r,m)}/\{V_{rr}(0)V_{mm}(0)\}$$ and $$\hat{c}_0^{(r,m)}$$ is the corresponding empirical analogue. The next result states the consistency of the test statistics. Theorem 3. Let $$\{X_{t}\}$$ be a $$\beta$$-mixing, strictly stationary, but not independent and identically distributed process with mixing coefficients satisfying $$\sum_{k} \beta(k) < \infty$$. 
Suppose that $$E ({|{X_{t;r}}|}^{2}) < \infty\; (r=1,\ldots,d)$$ and that Assumption 1 holds. Let $$p=cn^{\lambda}$$ for $$c>0$$ and $$\lambda \in (0,1)$$. Then, as $$n \rightarrow \infty$$, \begin{align*} \label{eq:powereq} \displaystyle \frac{p^{1/2}}{n}M_n & \rightarrow \frac{({\pi}/{2})L_2^2\bigl\{F(\omega,u,v),F_0(\omega,u,v)\bigr\}}{\bigl\{\sum_{r,m}{D_0^{(r,m)}\int_{0}^{\infty}{K^4(z)\,{\rm d} z}}\bigr\}^{1/2}}, \\ \frac{p^{1/2}}{n}\bar{M}_n &\rightarrow \frac{({\pi}/{2})L_2^2\bigl\{G(\omega,u,v),G_0(\omega,u,v)\bigr\}}{d\bigl\{2\int_{0}^{\infty}{K^4(z)\,{\rm d} z}\bigr\}^{1/2}} \end{align*} in probability. When we are dealing with a nonstationary process, $$\bar{T}_n$$ converges to $$\infty$$ in probability, so the test will have asymptotic power 1. Rejection of the null does not allow us to conclude that the process is stationary under the alternative hypothesis. Following the earlier discussion, we can test the independent and identically distributed hypothesis for the increments $$X_{t}-X_{t-1}$$. In general, the results depend on the bandwidth parameter $$p$$ and the sample size $$n$$. We do not consider further the choice of bandwidth parameter, but our limited experience has shown that choosing roughly $$p \geq 15$$ for a sample size of $$n=500$$ yields a good asymptotic approximation. However, it is preferable to vary the value of $$p$$ and examine the sensitivity of the results. We suggest the use of simulation-based techniques to approximate the distribution of $$T_{n}$$ or $$\bar{T}_{n}$$ for small $$n$$. 4. Computation of the test statistic with applications 4.1. Bootstrap method To approximate the asymptotic distribution of $$T_n$$ or $$\bar{T}_n$$, we follow the approach of Dehling & Mikosch (1994), who proposed a wild bootstrap for approximating the distribution of degenerate $$U$$-statistics for independent and identically distributed data. More recently, Leucht & Neumann (2013a,b) and Chwialkowski et al. 
(2014) proposed a novel dependent wild bootstrap (Shao, 2010) for approximating the distribution of degenerate $$U$$- and $$V$$-statistics calculated from time series data. Their method relies on generating auxiliary random variables $$(W_{tn}^*)_{t=1}^{n-j}$$ and computing bootstrap realizations of $$\hat{V}_{rm}^2(\,j)$$ as \begin{eqnarray*} \hat{V}_{rm}^{2*}(\,j) & = & (n-j)^{-2}\sum_{t,s=1}^{n-j}{W_{tn}^{*}h(X_{t;r},Y_{t;m};X_{s;r},Y_{s;m})W_{sn}^{*}}, \end{eqnarray*} where $$h(\cdot)$$ is defined by (8), for $$r,m=1, \dots, d$$ and $$j=1,\dots,n-1$$. A bootstrap realization of $$T_n$$ is computed as \begin{eqnarray*} T_n^{*} & = & \sum_{j=1}^{n-1}{(n-j)K^2(\,j/p)\sum_{r,m}{\hat{V}_{rm}^{2*}(\,j)}}\text{.} \end{eqnarray*} To test whether $$\{X_t\}$$ is an independent and identically distributed sequence, we repeat the above steps $$B$$ times to obtain $$T_{n,1}^{*}, \dots, T_{n,B}^{*}$$ and then approximate the $$p$$-value of $$T_n$$ by $$\{\sum_{b=1}^{B}{\mathbb{I}(T_{n,b}^{*} \geq T_n)}\}/(B+1)$$, where $$\mathbb{I}(\cdot)$$ denotes the indicator function. For $$\bar{T}_n$$ the approach is analogous. We generate $$W_{tn}^{*}$$ as independent and identically distributed standard normal variables because we are operating under the null hypothesis. Theorem 2 and Corollary 1 show that we can employ the test statistic with a normal approximation, but experience has shown that a rather large sample size is required to achieve the nominal significance level. Furthermore, a standard nonparametric bootstrap provides an alternative test of the null hypothesis. This is implemented in the R package dCovTS (Pitsillou & Fokianos, 2016). The wild bootstrap saves computational time, as simulation is done in a separate loop. The results of Leucht & Neumann (2013b) imply the wild bootstrap validity of $$\hat{V}_{rm}^{2*}(\,j)$$ as a proxy for $$\hat{V}_{rm}^{2}(\,j)$$, for a fixed lag $$j$$ and for $$\tau$$-dependent processes. 
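For a single pair $$(r,m)$$ and a fixed lag $$j$$, the multiplier step just described can be sketched as follows (Python with NumPy; the empirical double centring plays the role of the kernel in (8), and the function names are ours, not from dCovTS).

```python
import numpy as np

def centred_dist(u):
    """Double-centred distance matrix; empirical analogue of d_X in (8)."""
    u = np.asarray(u, float)
    a = np.abs(u[:, None] - u[None, :])
    return a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()

def wild_bootstrap_pvalue(x, y, j, B=199, seed=0):
    """Wild bootstrap p-value for one pair and one lag: the observed
    statistic is compared with B multiplier replicates W* H W* / (n-j)^2,
    with iid standard normal multipliers W*."""
    rng = np.random.default_rng(seed)
    xr, ym = x[:x.size - j], y[j:]
    m = xr.size
    H = centred_dist(xr) * centred_dist(ym)   # empirical kernel matrix
    stat = H.mean()                           # sample auto-distance cov. sq.
    boots = np.empty(B)
    for b in range(B):
        w = rng.standard_normal(m)
        boots[b] = w @ H @ w / m**2           # bootstrap replicate
    return np.sum(boots >= stat) / (B + 1)    # p-value as in the text
```

Summing such weighted replicates over lags and pairs gives a bootstrap realization of the global statistic, as described above.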
Here we restrict ourselves to $$\beta$$-mixing processes. Interesting time series models, such as autoregressive moving average models with continuous innovations and generalized autoregressive conditional heteroscedastic models, belong to both classes of processes. The approximation of $$\hat{V}_{rm}^{2*}(\,j)$$ as a proxy for $$\hat{V}_{rm}^{2}(\,j)$$ for a fixed lag $$j$$ is guaranteed to hold under suitable conditions on the mixing coefficients and for processes that lie in both classes. To gain insight into the behaviour of $$T_{n}$$ and its wild bootstrap counterpart, we will need to study the joint distribution of $$\{\hat{V}_{rm}^{2*}(\,j)\}$$ as a proxy for the joint distribution of $$\{\hat{V}_{rm}^{2}(\,j)\}$$. Although we do not address this problem theoretically here, empirical evidence supports the belief that the distribution of $$T_{n}^{*}$$ approximates that of $$T_{n}$$ adequately. 4.2. Obtaining simultaneous critical values for auto-distance correlation plots It is customary to check the white noise assumption by plotting the sample autocorrelation function with simultaneous confidence intervals. The critical values employed for obtaining confidence intervals are computed using the asymptotic normality of the first $$q$$ sample autocorrelations for white noise (Brockwell & Davis, 1991, Theorem 7.2.1). Here we use a similar plot to check independence using the auto-distance correlation function. This task is complicated because the vector comprising $$R_{rm}(\,j)$$ ($$j = 1, \dots, q$$) is a function of degenerate $$V$$-statistics under the null hypothesis. To overcome this difficulty we resort to Monte Carlo simulation. Critical values chosen by the wild bootstrap asymptotically maintain the nominal size of a test statistic. 
Given $$B$$ bootstrap realizations of $$\hat{R}_{rm}(\,j)$$, say $$\{\hat{R}^{*}_{rm,b}(\,j): b= 1, \dots, B\}$$, we compute the $$p$$-value \begin{eqnarray*} p_{rm}(\,j) & = & (B+1)^{-1}\sum_{b=1}^{B}{\mathbb{I}\bigl\{\hat{R}^{*}_{rm,b}(\,j) \geq \hat{R}_{rm}(\,j)\bigr\}}\text{.} \end{eqnarray*} But the $$p$$-values $$\{p_{rm}(\,j) : j=1, \dots, q\}$$ correspond to testing the hypotheses $$R_{rm}(\,j)=0$$ ($$j=1, \dots, q$$). Because this is a multiple testing situation, we adjust these values at some prespecified level $$\alpha$$ to obtain a new set of $$p$$-values, say $$\{\tilde{p}_{rm}(\,j): j =1, \dots, q\}$$, by using the false discovery rate of Benjamini & Hochberg (1995). Using the adjusted $$p$$-values, we obtain critical points $$\{c_{rm}(\,j): j =1, \dots, q\}$$ for which \begin{eqnarray*} \tilde{p}_{rm}(\,j) & = & \frac{\#\bigl\{\hat{R}_{rm}^{*}(\,j) \geq c_{rm}(\,j)\bigr\}}{B} \quad (\,j = 1, \dots, q)\text{.} \end{eqnarray*} In our plots the horizontal line corresponds to $$c=\max_{r,m,j}c_{rm}(\,j)$$, a conservative approach that guarantees that all simultaneous confidence intervals are at a given level $$\alpha$$. In the Supplementary Material we show that these critical values depend only on the length and not on the dimension of the series. 4.3. Results We now describe a limited simulation study concerning the test statistic $$\bar{T}_n$$ computed using a Lipschitz-continuous univariate kernel function $$K(\cdot)$$; that is, for any $$z_1, z_2 \in \mathbb{R}$$, $${|{K(z_1) - K(z_2)}|} \leq C{|{z_1 - z_2}|}$$ for some constant $$C$$. We use the Daniell, Parzen and Bartlett kernels. The Lipschitz condition rules out the truncated kernel, but results based on it are also given. In addition, we compare the performance of $$\bar{T}_n$$ with that of the multivariate Ljung–Box statistic (1). We first investigate the size of the test by considering bivariate standard normal data. 
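The false-discovery-rate adjustment of Benjamini & Hochberg (1995) used in § 4.2 amounts to a short computation on the sorted $$p$$-values. A minimal sketch in Python with NumPy follows (bh_adjust is our name; in practice an existing implementation, such as R's p.adjust, would normally be used).

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    p = np.asarray(pvals, dtype=float)
    q = p.size
    order = np.argsort(p)
    # raw adjusted values: sorted p-value times q / rank
    ranked = p[order] * q / np.arange(1, q + 1)
    # enforce monotonicity from the largest rank downwards and cap at 1
    adj = np.minimum(np.minimum.accumulate(ranked[::-1])[::-1], 1.0)
    out = np.empty(q)
    out[order] = adj   # return in the original order of the input
    return out
```

Applying this routine to the vector of lag-wise $$p$$-values $$\{p_{rm}(\,j) : j = 1, \dots, q\}$$ yields the adjusted values $$\{\tilde{p}_{rm}(\,j)\}$$ of § 4.2.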
Table 1 shows the achieved levels of all test statistics calculated at 5% and 10% nominal levels, and indicates that the proposed test statistics approximate the nominal levels adequately.

Table 1. Empirical Type I error $$(\%)$$ of statistics for testing the hypothesis that the data are independent and identically distributed: the data are generated from the bivariate standard normal distribution, and the bandwidth $$p$$ is $$[3n^\lambda]$$ with $$\lambda=0{\cdot}1$$, $$0{\cdot}2$$ or $$0{\cdot}3$$; results are based on $$B=499$$ bootstrap replications for each of $$1000$$ simulations.

                           n = 500                            n = 1000
                  p = 6      p = 11     p = 20      p = 6      p = 12     p = 24
                  10%   5%   10%   5%   10%   5%    10%   5%   10%   5%   10%   5%
 $$\bar{T}_n$$
   bar           11·5  5·5  10·2  5·2  10·8  4·7    9·7  4·8   8·3  3·7  12    6
   trunc          8·7  3·8   8·3  3·4   8·6  4·1    8    3     8    5     5    3
   par            9·7  5·1   9·3  4·3   8·4  4·0    8    3    11    6    12    6
   dan            8·9  3·9  10·7  5·7  10·2  4·6   10·2  5·0  11·7  5·5   9·2  4·2
 mLB             11·4  4·8  10·1  4·9  11·1  5·6   10·5  5·5   9·1  4·9   9·6  5·4

bar, Bartlett kernel; trunc, truncated kernel; par, Parzen kernel; dan, Daniell kernel; mLB, multivariate Ljung–Box statistic.

In order to compare the power of the test statistics, we consider a bivariate nonlinear moving average model of order 2, $$X_{t;i} = \epsilon_{t;i} \epsilon_{t-1;i} \epsilon_{t-2;i} \quad (i=1, 2),$$ (14) where $$\{\epsilon_{t;i}: i=1, 2\}$$ is an independent and identically distributed sequence of standard normal random variables; a bivariate generalized autoregressive conditional heteroscedastic model of order $$(1,1)$$, $$X_{t;i} = h_{t;i}^{1/2}\epsilon_{t;i} \quad (i=1, 2),$$ (15) where \begin{eqnarray*} \begin{bmatrix} h_{t;1} \\ h_{t;2} \end{bmatrix} & = & \begin{bmatrix} 0{\cdot}003 \\ 0{\cdot}005 \end{bmatrix} + \begin{bmatrix} 0{\cdot}2 & 0{\cdot}1 \\ 0{\cdot}1 & 0{\cdot}3 \end{bmatrix} \begin{bmatrix} X_{t-1;1}^2 \\ X_{t-1;2}^2 \end{bmatrix} - \begin{bmatrix} 0{\cdot}4 & 0{\cdot}05 \\ 0{\cdot}05 & 0{\cdot}5 \end{bmatrix} \begin{bmatrix} h_{t-1;1} \\ h_{t-1;2} \end{bmatrix} \end{eqnarray*} and $$\{\epsilon_{t;i}: i=1, 2\}$$ is a sample from a bivariate normal distribution with correlation $$\rho=0{\cdot}4$$; and a bivariate autoregressive model of order 1, \begin{eqnarray} \begin{bmatrix} X_{t;1} \\ X_{t;2} \end{bmatrix} & = & \begin{bmatrix} 0{\cdot}04 & -0{\cdot}10 \\ 0{\cdot}11 & 0{\cdot}50 \end{bmatrix} \begin{bmatrix} X_{t-1;1} \\ X_{t-1;2} \end{bmatrix} + \begin{bmatrix} \epsilon_{t;1} \\ \epsilon_{t;2} \end{bmatrix}, \end{eqnarray} (16) where the error is as in the previous model but with $$\rho=0$$ and $$0{\cdot}4$$. Table 2 shows that $$\bar{T}_n$$ attains greater power than (1) when data are generated by the nonlinear models (14) and (15).
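Model (14) is straightforward to simulate, which makes it convenient for replicating the power comparison; each component is white noise in the autocovariance sense yet serially dependent, which is exactly the regime where autocovariance-based tests such as (1) struggle. A minimal numpy sketch (the function name is ours):

```python
import numpy as np

def simulate_nma2(n, rng=None):
    """Simulate the bivariate nonlinear moving average model (14):
        X_{t;i} = eps_{t;i} * eps_{t-1;i} * eps_{t-2;i},  i = 1, 2,
    with i.i.d. standard normal innovations.  Returns an (n, 2) array.
    """
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal((n + 2, 2))   # two extra rows for the lags
    # element-wise product of eps_t, eps_{t-1}, eps_{t-2}
    return eps[2:] * eps[1:-1] * eps[:-2]
```

Because the three normal factors are independent with mean zero, each $$X_{t;i}$$ has mean zero and zero autocorrelation at every lag, yet $$X_{t;i}$$ and $$X_{t-1;i}$$ share the factors $$\epsilon_{t-1;i}$$ and $$\epsilon_{t-2;i}$$ and so are dependent.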
The Ljung–Box test statistic performs better than $$\bar{T}_n$$ when data are generated by (16) with $$\rho=0$$, as well as for small values of $$p$$ and large sample sizes. When $$p$$ is large, $$\bar{T}_n$$ generally performs better than (1).

Table 2. Empirical power $$(\%)$$ of all test statistics of size $$5\%$$: the bandwidth $$p$$ is $$[3n^\lambda]$$ with $$\lambda=0{\cdot}1$$, $$0{\cdot}2$$ or $$0{\cdot}3$$; the results are based on $$B=499$$ bootstrap replications for each of $$1000$$ simulations, and the test statistic $$\bar{T}_n$$ is calculated using the Bartlett kernel.

                              n = 500               n = 800               n = 1000
                         p = 6  p = 11  p = 20   p = 6  p = 12  p = 23   p = 6  p = 12  p = 24
 Model (14)
   $$\bar{T}_n$$         100    100     100      100    100     100      100    100     100
   mLB                    40     32·5    26·6     47·1   35·9    28·5     46·6   36·6    26·8
 Model (15)
   $$\bar{T}_n$$          76·4   79·5    78·3     97·2   98      95·4     99·4   98      99
   mLB                    56·9   54·6    48·1     65·1   60·4    53·9     64·9   65·2    54·9
 Model (16), $$\rho=0$$
   $$\bar{T}_n$$          66·6   58·4    44·8     41·3   41·1    69·1     30·4   27      80·4
   mLB                    48·5   36·1    25·6     75·7   59·4    44·2     86·2   71·8    55·8
 Model (16), $$\rho=0{\cdot}4$$
   $$\bar{T}_n$$          71·4   62·5    46·9     92·7   89·7    75·7     96·8   95·6    88
   mLB                    50·8   37·1    28·3     77·5   58·9    44·6     86·9   72·6    54·1

mLB, multivariate Ljung–Box statistic.

4.4. Application

We analyse monthly unemployment rates in the 50 U.S. states from January 1976 to September 2011, seasonally adjusted and available from the website http://faculty.chicagobooth.edu/ruey.tsay/teaching/mtsbk/ of Tsay's (2014) book. We first consider the 416 differenced monthly rates of Alaska, Arkansas, Colorado and Delaware. Figure 1 displays the sample auto-distance correlation matrices at each lag and reveals the dependence structure for lags 1 to 12.
After the sixth lag, the auto-distance correlation function suggests that there is no dependence among the four series. Assuming that the four-dimensional series follows a vector autoregressive model and employing the Akaike information criterion, a fifth-order model is found to fit the data well. The auto-distance correlation plots for the resulting residual series in Fig. 2 show no serial dependence. Tests of independence for the residual series, using the Bartlett kernel for computation of $$\bar{T}_n$$, yield $$p$$-values of 0·428, 0·228 and 0·158 for $$\bar{T}_n$$ and 1, 0·999 and 0·999 for $${\mathrm{m}}\mathrm{LB}$$ when $$p=6, 11, 19$$, confirming the adequacy of the model fit.

Fig. 1. Visualization of the sample auto-distance correlation matrices of the four-dimensional unemployment series for Alaska, Arkansas, Colorado and Delaware, starting from the top left at lag $$j=1$$. Darker rectangles correspond to higher distance correlations between series at a specific lag.

Fig. 2. Auto-distance correlation plots of the residual vector process after fitting a fifth-order vector autoregressive model to the unemployment data. The dotted horizontal line in each panel is drawn following the method described in § 4.2.
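The vector autoregressive fit with order chosen by the Akaike information criterion can be sketched with ordinary least squares; the following numpy sketch is ours, not the authors' code, and uses the standard Gaussian AIC form $$\log\det\hat{\Sigma} + 2pd^2/n$$.

```python
import numpy as np

def fit_var(x, p):
    """Least-squares fit of a VAR(p) with intercept to an (n, d) array.

    Returns the stacked coefficient matrix B of shape (1 + p*d, d)
    (intercept row first, then the lag 1, ..., lag p blocks) and the
    residual matrix of shape (n - p, d).
    """
    n, d = x.shape
    # design matrix: a column of ones, then lag-1, ..., lag-p values
    Z = np.column_stack([np.ones(n - p)] +
                        [x[p - j:n - j] for j in range(1, p + 1)])
    Y = x[p:]
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return B, Y - Z @ B

def var_aic(x, p):
    """Gaussian AIC for a VAR(p): log det(Sigma_hat) + 2*p*d^2 / n_eff."""
    _, resid = fit_var(x, p)
    n_eff, d = resid.shape
    sigma = resid.T @ resid / n_eff
    return np.linalg.slogdet(sigma)[1] + 2 * p * d * d / n_eff
```

Order selection then amounts to `min(range(1, 13), key=lambda p: var_aic(x, p))`; note that the effective sample size varies with `p` here, so in careful applications one fixes the estimation window across the candidate orders.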
Acknowledgements

We are grateful to the editor, associate editor and referees for their insightful comments. We also thank H. Dehling, G. Székely, M. Rizzo and M. Wendler for useful discussions. Pitsillou's research was supported by the University of Cyprus.

Supplementary material

Supplementary material available at Biometrika online describes how the statistic $$\hat{V}_{rm}^2(\cdot)$$ can be expressed as a $$V$$-statistic, gives derivations of the proposed test statistics, and provides more simulation results and proofs of the theoretical results.

References

Aaronson, J., Burton, R., Dehling, H., Gilat, D., Hill, T. & Weiss, B. (1996). Strong laws for L- and U-statistics. Trans. Am. Math. Soc. 348, 2845–66.
Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Statist. Soc. B 57, 289–300.
Borovkova, S., Burton, R. & Dehling, H. (1999). Consistency of the Takens estimator for the correlation dimension. Ann. Appl. Prob. 9, 376–90.
Brockwell, P. J. & Davis, R. A. (1991). Time Series: Theory and Methods, 2nd ed. New York: Springer.
Chen, B. & Hong, Y. (2012). Testing for the Markov property in time series. Economet. Theory 28, 130–78.
Chen, X. (2016). Gaussian approximation for the sup-norm of high-dimensional matrix-variate U-statistics and its applications. arXiv: 1602.00199v3.
Chwialkowski, K. P., Sejdinovic, D. & Gretton, A. (2014). A wild bootstrap for degenerate kernel tests. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence & K. Q. Weinberger, eds. Neural Information Processing Systems Foundation, pp. 3608–16.
Davis, R. A., Matsui, M., Mikosch, T. & Wan, P. (2018). Applications of distance correlation to time series. Bernoulli, to appear; arXiv: 1606.05481.
Dehling, H. & Mikosch, T. (1994). Random quadratic forms and the bootstrap for U-statistics. J. Mult. Anal. 51, 392–413.
Dueck, J., Edelmann, D., Gneiting, T. & Richards, D. (2014). The affinely invariant distance correlation. Bernoulli 20, 2305–30.
Feuerverger, A. (1993). A consistent test for bivariate dependence. Int. Statist. Rev. 61, 419–33.
Fokianos, K. & Pitsillou, M. (2017). Consistent testing for pairwise dependence in time series. Technometrics 59, 262–70.
Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B. & Smola, A. (2008). A kernel statistical test of independence. In Advances in Neural Information Processing Systems 20, J. C. Platt, D. Koller, Y. Singer & S. T. Roweis, eds. Neural Information Processing Systems Foundation, pp. 585–92.
Hipel, K. W. & McLeod, A. I. (1994). Time Series Modelling of Water Resources and Environmental Systems. Amsterdam: Elsevier.
Hlávka, Z., Hušková, M. & Meintanis, S. G. (2011). Tests for independence in non-parametric heteroscedastic regression models. J. Mult. Anal. 102, 816–27.
Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Statist. 19, 293–325.
Hong, Y. (1998). Testing for pairwise serial independence via the empirical distribution function. J. R. Statist. Soc. B 60, 429–53.
Hong, Y. (1999). Hypothesis testing in time series via the empirical characteristic function: A generalized spectral density approach. J. Am. Statist. Assoc. 94, 1201–20.
Hong, Y. (2000). Generalized spectral tests for serial dependence. J. R. Statist. Soc. B 62, 557–74.
Hosking, J. R. M. (1980). The multivariate portmanteau statistic. J. Am. Statist. Assoc. 75, 602–8.
Lacal, V. & Tjøstheim, D. (2017). Local Gaussian autocorrelation and tests of serial independence. J. Time Ser. Anal. 38, 51–71.
Lacal, V. & Tjøstheim, D. (2018). Estimating and testing nonlinear local dependence between two time series. J. Bus. Econ. Statist., to appear.
Leucht, A. & Neumann, M. H. (2013a). Degenerate U- and V-statistics under ergodicity: Asymptotics, bootstrap and applications in statistics. Ann. Inst. Statist. Math. 65, 349–86.
Leucht, A. & Neumann, M. H. (2013b). Dependent wild bootstrap for degenerate U- and V-statistics. J. Mult. Anal. 117, 257–80.
Li, W. K. & McLeod, A. I. (1981). Distribution of the residual autocorrelations in multivariate ARMA time series models. J. R. Statist. Soc. B 43, 231–9.
McLachlan, G. J., Do, K.-A. & Ambroise, C. (2004). Analyzing Microarray Gene Expression Data. Hoboken, New Jersey: Wiley.
Meintanis, S. G. & Iliopoulos, G. (2008). Fourier methods for testing multivariate independence. Comp. Statist. Data Anal. 52, 1884–95.
Pitsillou, M. & Fokianos, K. (2016). dCovTS: Distance covariance and correlation for time series analysis. R Journal 8, 324–40.
R Development Core Team (2018). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org.
Romano, J. P. & Thombs, L. A. (1996). Inference for autocorrelations under weak assumptions. J. Am. Statist. Assoc. 91, 590–600.
Sejdinovic, D., Sriperumbudur, B., Gretton, A. & Fukumizu, K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann. Statist. 41, 2263–91.
Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. New York: Wiley.
Shao, X. (2010). The dependent wild bootstrap. J. Am. Statist. Assoc. 105, 218–35.
Shao, X. (2011). Testing for white noise under unknown dependence and its applications to diagnostic checking for time series models. Economet. Theory 27, 312–43.
Székely, G. J. & Rizzo, M. L. (2013). Energy statistics: A class of statistics based on distances. J. Statist. Plan. Infer. 143, 1249–72.
Székely, G. J., Rizzo, M. L. & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Ann. Statist. 35, 2769–94.
Tjøstheim, D. (1996). Measures of dependence and tests of independence. Statistics 28, 249–84.
Tsay, R. S. (2014). Multivariate Time Series Analysis with R and Financial Applications. Hoboken, New Jersey: Wiley.
Xiao, H. & Wu, W. B. (2014). Portmanteau test and simultaneous inference for serial covariances. Statist. Sinica 24, 577–600.
Yoshihara, K. (1976). Limiting behavior of U-statistics for stationary absolutely regular processes. Z. Wahr. verw. Geb. 35, 237–52.
Zhou, Z. (2012). Measuring nonlinear dependence in time series: A distance correlation approach. J. Time Ser. Anal. 33, 438–57.

© 2018 Biometrika Trust. This article is published and distributed under the terms of the Oxford University Press Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices).

Journal: Biometrika, Oxford University Press. Published: Jan 20, 2018.
