Comparing Predictive Accuracy under Long Memory, With an Application to Volatility Forecasting

Abstract

This article extends the popular Diebold–Mariano test for equal predictive accuracy to situations in which the forecast error loss differential exhibits long memory. This situation can arise frequently, since long memory can be transmitted from the forecasts and the forecast objective to the forecast error loss differentials. The nature of this transmission depends on the (un)biasedness of the forecasts and on whether the involved series share common long memory. Further theoretical results show that the conventional Diebold–Mariano test is invalidated under these circumstances. Robust statistics based on a memory and autocorrelation consistent estimator and an extended fixed-bandwidth approach are considered. The subsequent extensive Monte Carlo study provides numerical results on various issues. As empirical applications, we consider recent extensions of the HAR model for the S&P500 realized volatility. While we find that forecasts improve significantly if jumps are considered, improvements achieved by the inclusion of an implied volatility index turn out to be insignificant.

1 Introduction

If the accuracy of competing forecasts is to be evaluated in a (pseudo-)out-of-sample setup, it has become standard practice to employ the test of Diebold and Mariano (1995) (hereafter DM test). Let ŷ1t and ŷ2t denote two competing forecasts for the forecast objective series yt and let the loss function be given by g(yt, ŷit) ≥ 0 for i = 1, 2. The forecast error loss differential is then

z_t = g(y_t, \hat{y}_{1t}) - g(y_t, \hat{y}_{2t}). \quad (1)

By imposing restrictions only on the loss differential zt, instead of on the forecast objective and the forecasts, Diebold and Mariano (1995) test the null hypothesis of equal predictive accuracy, that is H_0: E(z_t) = 0, by means of a simple t-statistic for the mean of the loss differentials. To account for serial correlation, a long-run variance estimator such as the heteroscedasticity and autocorrelation consistent (HAC) estimator is applied (see Newey and West (1987), Andrews (1991), and Andrews and Monahan (1992)). For weakly dependent and second-order stationary processes, this leads to an asymptotic standard normal distribution of the t-statistic.

Apart from the development of other forecast comparison tests such as those of West (1996) or Giacomini and White (2006), several direct extensions and improvements of the DM test have been proposed. Harvey, Leybourne, and Newbold (1997) suggest a version that corrects for the bias of the long-run variance estimation in finite samples. A multivariate DM test is derived by Mariano and Preve (2012). To mitigate the well-known size issues of HAC-based tests in finite samples of persistent short memory processes, Choi and Kiefer (2010) construct a DM test using the so-called fixed-bandwidth (or in short, fixed-b) asymptotics, originally introduced in Kiefer and Vogelsang (2005) (see also Li and Patton (2018)). The issue of near unit root asymptotics is tackled by Rossi (2005). These studies belong to the classical I(0)/I(1) framework.

Contrary to the aforementioned studies, we consider the situation in which the loss differentials follow long memory processes. Our first contribution is to show that long memory can be transmitted from the forecasts and the forecast objective to the forecast errors and subsequently to the forecast error loss differentials. We provide theoretical results for the mean squared error (MSE) loss function and Gaussian processes.
We give conditions under which the transmission occurs and characterize the memory properties of the forecast error loss differential. The memory transmission for non-Gaussian processes and other loss functions is demonstrated by means of Monte Carlo simulations resembling typical forecast scenarios. As a second contribution, we show (both theoretically and via simulations) that the original DM test is invalidated under long memory and suffers from severe upward size distortions. Third, we study two simple extensions of the DM statistic that permit valid inference under long and short memory. These extensions are based on the memory and autocorrelation consistent (MAC) estimator of Robinson (2005) (see also Abadir, Distaso, and Giraitis (2009)) and the extended fixed-b asymptotics (EFB) of McElroy and Politis (2012). The performance of these modified statistics is analyzed in a Monte Carlo study that is specifically tailored to reflect the properties that are likely to occur in the loss differentials. We compare several bandwidth and kernel choices, which allows us to give recommendations for practical applications.

Our fourth contribution is an empirical application in which we reconsider two recent extensions of the heterogeneous autoregressive model for realized volatility (HAR-RV) by Corsi (2009). First, we test whether forecasts obtained from HAR-RV type models can be improved by including information on model-free risk-neutral implied volatility, as measured by the CBOE volatility index (VIX). We find that short memory approaches (the classic DM test and fixed-b versions) reject the null hypothesis of equal predictive accuracy in favor of models including implied volatility. On the contrary, our long memory robust statistics do not indicate a significant improvement in forecast performance, which implies that previous rejections might be spurious due to neglected long memory. The second issue we tackle in our empirical applications relates to earlier work by, inter alia, Andersen, Bollerslev, and Diebold (2007) and Corsi, Pirino, and Renò (2010), who consider the decomposition of the quadratic variation of the log-price process into a continuous integrated volatility component and a discrete jump component. Here, we find that the separate treatment of continuous components and jump components significantly improves forecasts of realized variance for short forecast horizons, even if the memory in the loss differentials is accounted for.

The rest of this article is organized as follows. Section 2 reviews the classic DM test and presents the fixed-b approach for the short memory case. Section 3 covers the case of long-range dependence and contains our theoretical results on the transmission of long memory to the loss differential series. Two distinct approaches to design a robust t-statistic are discussed in Section 4. Section 5 contains our Monte Carlo study and in Section 6 we present our empirical results. Conclusions are drawn in Section 7. All proofs are contained in the Appendix.

2 DM Test

Diebold and Mariano (1995) construct a test for H_0: E[g(y_t, \hat{y}_{1t}) - g(y_t, \hat{y}_{2t})] = E(z_t) = 0, based solely on assumptions on the loss differential series zt. Suppose that zt follows the weakly stationary linear process

z_t = \mu_z + \sum_{j=0}^{\infty} \theta_j v_{t-j}, \quad (2)

where it is required that |\mu_z| < \infty and \sum_{j=0}^{\infty} \theta_j^2 < \infty hold. For simplicity of exposition, we additionally assume that v_t \sim iid(0, \sigma_v^2). If ŷ1t and ŷ2t perform equally well according to the loss function g(·), then μz = 0 holds; otherwise μz ≠ 0.
The corresponding t-statistic is based on the sample mean \bar{z} = T^{-1}\sum_{t=1}^{T} z_t and an estimate \hat{V} of the long-run variance V = \lim_{T\to\infty} \mathrm{Var}\left(T^{\tau}(\bar{z} - \mu_z)\right). The DM statistic is given by

t_{DM} = \frac{T^{\tau}\,\bar{z}}{\sqrt{\hat{V}}}. \quad (3)

Under stationary short memory, we have τ = 1/2, while the rate changes to τ = 1/2 − d under stationary long memory, with 0 < d < 1/2 being the long memory parameter. The (asymptotic) distribution of this t-statistic hinges on the autocorrelation properties of the loss differential series zt. In the following, we shall distinguish two cases: (i) zt is a stationary short memory process with d = 0, and (ii) strong dependence in the form of a long memory process (with 0 < d < 1/2) is present in zt, as presented in Section 3.

2.1 Conventional Approach: HAC

For the estimation of the long-run variance V, Diebold and Mariano (1995) suggest using the truncated long-run variance of an MA(h − 1) process for an h-step-ahead forecast. This is motivated by the fact that optimal h-step-ahead forecast errors of a linear time series process follow an MA(h − 1) process. Nevertheless, as pointed out by Diebold (2015), among others, the test is readily extendable to more general situations if, for example, HAC estimators are used (see also Clark (1999) for some early simulation evidence). The latter have become the standard class of estimators for the long-run variance. In particular,

\hat{V}_{HAC} = \sum_{j=-T+1}^{T-1} k\left(\frac{j}{B}\right) \hat{\gamma}_z(j), \quad (4)

where k(·) is a user-chosen kernel function, B denotes the bandwidth, and

\hat{\gamma}_z(j) = \frac{1}{T} \sum_{t=|j|+1}^{T} (z_t - \bar{z})(z_{t-|j|} - \bar{z})

is the usual estimator for the autocovariance of the process zt at lag j. The corresponding DM statistic is given by

t_{HAC} = \frac{T^{1/2}\,\bar{z}}{\sqrt{\hat{V}_{HAC}}}. \quad (5)

If zt is weakly stationary with absolutely summable autocovariances γz(j), it holds that V = \sum_{j=-\infty}^{\infty} \gamma_z(j). Regularity conditions assumed,1 a central limit theorem applies for \bar{z}. Consistency of the long-run variance estimator \hat{V}_{HAC} requires some additional regularity conditions (see, for instance, Andrews (1991) for additional technical details), in particular the assumption that the ratio b = B/T converges to zero as T → ∞. It follows that the tHAC-statistic is asymptotically standard normal under the null hypothesis, that is, t_{HAC} ⇒ N(0, 1). For notation comparable to the long memory case, note that V = 2π f_z(0), where f_z(0) is the spectral density function of zt at frequency zero.

2.2 Fixed-bandwidth Approach

Even though the application of HAC estimators is nowadays standard practice, the related tests are often found to be seriously size-distorted in finite samples, especially under strong persistence. Kiefer and Vogelsang (2005) develop an alternative asymptotic framework in which the ratio B/T approaches a fixed constant b ∈ (0, 1] as T → ∞. It is therefore called fixed-b inference, as opposed to the classical small-b HAC approach where b → 0. Under fixed-b (FB) asymptotics, the estimator \hat{V}(k, b) no longer converges to V. Instead, \hat{V}(k, b) converges to V multiplied by a functional of a Brownian bridge process, \hat{V}(k, b) ⇒ V\, Q(k, b). The corresponding t-statistic

t_{FB} = \frac{T^{1/2}\,\bar{z}}{\sqrt{\hat{V}(k, b)}} \quad (6)

has a nonnormal and nonstandard limiting distribution, that is,

t_{FB} ⇒ \frac{W(1)}{\sqrt{Q(k, b)}},

where W(r) is a standard Brownian motion on r ∈ [0, 1]. Both the choice of the bandwidth parameter b and the (twice continuously differentiable) kernel k appear in the limit distribution. For example, for the Bartlett kernel we have

Q(k, b) = \frac{2}{b}\left(\int_0^1 \tilde{W}(r)^2\, dr - \int_0^{1-b} \tilde{W}(r+b)\tilde{W}(r)\, dr\right),

with \tilde{W}(r) = W(r) - rW(1) denoting a standard Brownian bridge. Thus, critical values reflect the user choices on the kernel and the bandwidth even in the limit.
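For reference, the following short Python sketch (function names and the default bandwidth rule are our own choices, not part of the original test) implements the Bartlett-kernel versions of the tHAC-statistic (5) and the fixed-b statistic (6). The two share the same variance formula and differ only in the bandwidth convention and in the critical values against which they are compared.

```python
import numpy as np

def gamma_hat(z, j):
    """Sample autocovariance of z at lag j, with denominator T as in (4)."""
    T = len(z)
    zc = z - z.mean()
    return zc[j:] @ zc[:T - j] / T

def v_hat_bartlett(z, B):
    """HAC long-run variance (4) with Bartlett weights 1 - j/(B + 1)."""
    V = gamma_hat(z, 0)
    for j in range(1, B + 1):
        V += 2.0 * (1.0 - j / (B + 1)) * gamma_hat(z, j)
    return V

def t_hac(z, B=None):
    """DM statistic (5); default B is the Newey-West rule of thumb."""
    T = len(z)
    if B is None:
        B = int(4 * (T / 100) ** (2 / 9))
    return np.sqrt(T) * z.mean() / np.sqrt(v_hat_bartlett(z, B))

def t_fb(z, b=0.5):
    """Fixed-b statistic (6): same formula with B = b*T, but to be compared
    against the nonstandard critical values of W(1)/sqrt(Q(k, b))."""
    return np.sqrt(len(z)) * z.mean() / np.sqrt(v_hat_bartlett(z, int(b * len(z))))

# Example: iid loss differentials satisfying the null of equal accuracy.
rng = np.random.default_rng(0)
z = rng.standard_normal(500)
print(t_hac(z), t_fb(z))
```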
In many settings, fixed-b inference is more accurate than the conventional HAC estimation approach. Examples of its application to forecast comparisons are the aforementioned articles of Choi and Kiefer (2010) and Li and Patton (2018), who apply both techniques (HAC and fixed-b) to compare exchange rate forecasts. Our Monte Carlo simulation study sheds additional light on their relative empirical performance.

3 Long Memory in Forecast Error Loss Differentials

3.1 Preliminaries

Under long-range dependence in zt, one has to expect that neither conventional HAC estimators nor the fixed-b approach can be applied without further modification, since strong dependence such as fractional integration is ruled out by the assumption of a weakly stationary linear process. Given that zt has long memory, we show that HAC-based tests reject with probability one in the limit (as T → ∞) even under the null. This result is stated in our Proposition 6 (at the end of this section). As our finite-sample simulations clearly demonstrate, this implies strong upward size distortions and invalidates the use of the classic DM test statistic. Before we state these results formally, we first show that the loss differential zt may exhibit long memory in various situations. We start with a basic definition of stationary long memory time series, cf. Definition 1.2 of Beran et al. (2013).

Definition 1. A time series at with spectral density fa(λ), λ ∈ [−π, π], has long memory with memory parameter da ∈ (0, 1/2) if

f_a(\lambda) \sim L_f\, |\lambda|^{-2d_a} \quad \text{as } \lambda \to 0,

where the symmetric function Lf(·) is slowly varying at the origin. We then write at ∼ LM(da).

This is the usual definition of a stationary long memory process, and Theorem 1.3 of Beran et al. (2013) states that under this restriction and mild regularity conditions, Definition 1 is equivalent to γ_a(j) ∼ L_γ |j|^{2d_a − 1} as j → ∞, where γa(j) is the autocovariance function of at at lag j and Lγ(·) is slowly varying at infinity. If da = 0 holds, the process has short memory. Our results build on the asymptotic behavior of autocovariances that have the long memory property from Definition 1. Whether this memory is generated by fractional integration cannot be inferred. However, this does not affect the validity of the test statistics introduced in Section 4. We therefore adopt Definition 1, which covers fractional integration as a special case. A similar approach is taken by Dittmann and Granger (2002).2

Given Definition 1, we now state some assumptions regarding the long memory structure of the forecast objective and the forecasts.

Assumption 1 (Long Memory). The time series yt, ŷ1t, ŷ2t with expectations E(yt) = μy, E(ŷ1t) = μ1, and E(ŷ2t) = μ2 are causal Gaussian long memory processes (according to Definition 1) of orders dy, d1, and d2, respectively.

Similar to Dittmann and Granger (2002), we rely on the assumption of Gaussianity, since no results on the memory structure of squares and cross-products of non-Gaussian long memory processes are available in the existing literature. It shall be noted that Gaussianity is only assumed for the derivation of the memory transmission from the forecasts and the forecast objective to the loss differential, but not for the subsequent results. In the following, we make use of the concept of common long memory, in which a linear combination of long memory series has reduced memory. The amount of reduction is labeled δ.
Definition 2 (Common Long Memory). The time series at and bt have common long memory (CLM) if both at and bt are LM(d) and there exists a linear combination c_t = a_t − ψ_0 − ψ_1 b_t with ψ0 ∈ ℝ and ψ1 ∈ ℝ∖{0} such that ct ∼ LM(d − δ), for some d ≥ δ > 0. We write at, bt ∼ CLM(d, d − δ).

For simplicity and ease of exposition, we first exclude the possibility of common long memory among the series. This assumption is relaxed later on.

Assumption 2 (Absence of Common Long Memory). If at, bt ∼ LM(d), then a_t − ψ_0 − ψ_1 b_t ∼ LM(d) for all ψ0 ∈ ℝ, ψ1 ∈ ℝ, and at, bt ∈ {yt, ŷ1t, ŷ2t}.

To derive the long memory properties of the forecast error loss differential, we make use of a result in Leschinski (2017) that characterizes the memory structure of the product series atbt for two long memory time series at and bt. Such products play an important role in the following analysis. For convenience, the result is restated as Proposition 1 below.

Proposition 1 (Leschinski (2017), Memory of Products). Let at and bt be long memory series according to Definition 1 with memory parameters da and db, and means μa and μb, respectively. Then

a_t b_t \sim \begin{cases} LM(\max\{d_a, d_b\}), & \text{for } \mu_a, \mu_b \neq 0, \\ LM(d_a), & \text{for } \mu_a = 0, \mu_b \neq 0, \\ LM(d_b), & \text{for } \mu_b = 0, \mu_a \neq 0, \\ LM(\max\{d_a + d_b - 1/2,\, 0\}), & \text{for } \mu_a = \mu_b = 0 \text{ and } S_{a,b} \neq 0, \\ LM(d_a + d_b - 1/2), & \text{for } \mu_a = \mu_b = 0 \text{ and } S_{a,b} = 0, \end{cases}

where S_{a,b} = \sum_{j=-\infty}^{\infty} \gamma_a(j)\gamma_b(j), with γa(·) and γb(·) denoting the autocovariance functions of at and bt, respectively.

Proposition 1 shows that the memory of products of long memory time series critically depends on the means μa and μb of the series at and bt. If both series are mean zero, the memory of the product is d_a + d_b − 1/2, floored at zero when the sum of autocovariances S_{a,b} is nonzero. Since da, db < 1/2, this is always smaller than either of the original memory parameters. If only one of the series is mean zero, the memory of the product atbt is determined by the memory of that mean-zero series. Finally, if both series have nonzero means, the memory of the product is equal to the maximum of the memory orders of the two series.

Furthermore, Proposition 1 makes a distinction between antipersistent series and short memory series if the processes have zero means and da + db − 1/2 < 0. Our results below, however, do not require this distinction. The reason is that a linear combination involving the square of at least one of the series appears in each case, and squares cannot be antipersistent long memory processes (see the proofs of Propositions 2 and 5 for details). As discussed in Leschinski (2017), Proposition 1 is related to the results in Dittmann and Granger (2002), who consider the memory of nonlinear transformations of zero mean long memory time series that can be represented through a finite sum of Hermite polynomials. Their results include the square a_t^2 of a time series, which is also covered by Proposition 1 with at = bt. If the mean is zero (μa = 0), we have a_t^2 ∼ LM(max{2d_a − 1/2, 0}). Therefore, the memory is reduced to zero if da ≤ 1/4. However, as can be seen from Proposition 1, this behavior depends critically on the expectation of the series.

Since it is the most widely used loss function in practice, we focus on the MSE loss function g(yt, ŷit) = (yt − ŷit)² for i = 1, 2. The quadratic forecast error loss differential is then given by

z_t = (y_t - \hat{y}_{1t})^2 - (y_t - \hat{y}_{2t})^2 = \hat{y}_{1t}^2 - \hat{y}_{2t}^2 - 2 y_t(\hat{y}_{1t} - \hat{y}_{2t}). \quad (7)

As usual in DM tests, we do not need to know, or assume the form of, the forecasting models or methods used to generate the forecasts. The forecasts are taken as "primitives" in this analysis.
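Proposition 1 is easy to visualize numerically. The sketch below (our own illustration; the simple log-periodogram (GPH) regression stands in for more refined memory estimators, and all tuning choices are illustrative) simulates two independent fractionally integrated series and estimates the memory of their product, once with nonzero means and once after demeaning.

```python
import numpy as np

def frac_noise(d, n, rng, burn=1000):
    """FI(d) sample via the truncated MA(infinity) expansion of (1 - L)^(-d)."""
    m = n + burn
    psi = np.ones(m)
    j = np.arange(1, m)
    psi[1:] = np.cumprod((j - 1 + d) / j)   # psi_j = Gamma(j + d) / (Gamma(d) j!)
    return np.convolve(rng.standard_normal(m), psi)[:m][burn:]

def gph(x, m):
    """Log-periodogram (GPH) estimate of the memory parameter d."""
    n = len(x)
    lam = 2 * np.pi * np.arange(1, m + 1) / n
    I = np.abs(np.fft.fft(x - x.mean())[1:m + 1]) ** 2 / (2 * np.pi * n)
    return np.polyfit(-2 * np.log(lam), np.log(I), 1)[0]

rng = np.random.default_rng(42)
T, da, db = 5000, 0.4, 0.2
a = 1.0 + frac_noise(da, T, rng)            # mu_a = 1 (nonzero mean)
b = 1.0 + frac_noise(db, T, rng)            # mu_b = 1 (nonzero mean)
m = int(T ** 0.65)
print("nonzero means:", gph(a * b, m))      # approx. max{da, db} = 0.4
print("zero means:   ", gph((a - a.mean()) * (b - b.mean()), m))
# approx. max{da + db - 1/2, 0} = 0.1 in the demeaned case
```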
3.2 Transmission of Long Memory to the Loss Differential

Following the introduction of the necessary definitions and a preliminary result, we now present the result for the memory order of zt defined in (7) in Proposition 2. It is based on the memory of yt, ŷ1t, and ŷ2t and assumes the absence of common long memory for simplicity.

Proposition 2 (Memory Transmission in the Absence of Common Long Memory). Under Assumptions 1 and 2, the forecast error loss differential in (7) is zt ∼ LM(dz), where

d_z = \begin{cases} \max\{d_y, d_1, d_2\}, & \text{if } \mu_1 \neq \mu_2 \neq \mu_y, \\ \max\{d_1, d_2\}, & \text{if } \mu_1 = \mu_2 \neq \mu_y, \\ \max\{2d_1 - 1/2,\, d_2,\, d_y\}, & \text{if } \mu_1 = \mu_y \neq \mu_2, \\ \max\{2d_2 - 1/2,\, d_1,\, d_y\}, & \text{if } \mu_1 \neq \mu_y = \mu_2, \\ \max\{2\max\{d_1, d_2\} - 1/2,\; d_y + \max\{d_1, d_2\} - 1/2,\; 0\}, & \text{if } \mu_1 = \mu_2 = \mu_y. \end{cases}

Proof. See the Appendix.

The basic idea of the proof relates to Proposition 3 of Chambers (1998), which shows that the behavior of a linear combination of long memory series is dominated by the series with the strongest memory. Since we know from Proposition 1 that μ1, μ2, and μy play an important role for the memory of a squared long memory series, we set yt = yt* + μy and ŷit = ŷit* + μi, so that the starred series denote the demeaned series and μi denotes the expected value of the respective series. Straightforward algebra yields

z_t = \hat{y}_{1t}^{*2} - \hat{y}_{2t}^{*2} - 2\left[y_t^*(\mu_1 - \mu_2) + \hat{y}_{1t}^*(\mu_y - \mu_1) - \hat{y}_{2t}^*(\mu_y - \mu_2)\right] - 2\, y_t^*(\hat{y}_{1t}^* - \hat{y}_{2t}^*) + \text{const}. \quad (8)

From (8), it is apparent that zt is a linear combination of (i) the squared demeaned forecasts ŷ1t*² and ŷ2t*², (ii) the demeaned forecast objective yt*, (iii) the demeaned forecast series ŷ1t* and ŷ2t*, and (iv) products of the forecast objective with the forecasts, that is, yt*ŷ1t* and yt*ŷ2t*. The memory of the squared series and the product series is determined in Proposition 1, from which the zero mean product series yt*ŷit* is LM(max{dy + di − 1/2, 0}) or LM(dy + di − 1/2). Moreover, the memory of the squared zero mean series ŷit*² is max{2di − 1/2, 0}. By combining these results with that of Chambers (1998), the memory of the loss differential zt is the maximum of all memory parameters of the components in (8). Proposition 2 then follows from a case-by-case analysis.

The proposition demonstrates the transmission of long memory from the forecasts ŷ1t, ŷ2t and the forecast objective yt to the loss differential zt. The nature of this transmission, however, critically hinges on the (un)biasedness of the forecasts. If both forecasts are unbiased (i.e., if μ1 = μ2 = μy), the memory from all three input series is reduced, and the memory of the loss differential zt equals the maximum of (i) these reduced orders and (ii) zero. Therefore, only if the memory parameters are small enough that dy + max{d1, d2} < 1/2 is the memory of the loss differential zt reduced to zero. In all other cases, there is a transmission of dependence from the forecasts and/or the forecast objective to the loss differential. The reason can immediately be seen from (8): terms in the first bracket have larger memory than the remaining ones, because di > 2di − 1/2 and max{dy, di} > dy + di − 1/2. Therefore, these terms dominate the memory of the products and squares whenever biasedness is present, that is, whenever μi − μy ≠ 0 holds. Interestingly, the transmission of memory from the forecast objective yt is prevented if both forecasts have equal bias, that is μ1 = μ2. On the contrary, if μ1 ≠ μ2, dz is at least as high as dy.
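The role of the biases can be checked directly. In the following fragment (an informal check; the crude aggregated-variance estimator is our own stand-in for the semiparametric estimators used later in the paper), forecasts with distinct biases transmit the maximal memory parameter to zt, whereas unbiased forecasts reduce it towards zero, exactly as the first and last cases of Proposition 2 predict.

```python
import numpy as np

def frac_noise(d, n, rng, burn=1000):
    m = n + burn
    psi = np.ones(m); j = np.arange(1, m)
    psi[1:] = np.cumprod((j - 1 + d) / j)
    return np.convolve(rng.standard_normal(m), psi)[:m][burn:]

def agg_var_d(x, block_sizes):
    """Crude aggregated-variance estimate of d: Var(block means) ~ k^(2d - 1)."""
    lk, lv = [], []
    for k in block_sizes:
        nb = len(x) // k
        means = x[:nb * k].reshape(nb, k).mean(axis=1)
        lk.append(np.log(k)); lv.append(np.log(means.var()))
    return 0.5 * (np.polyfit(lk, lv, 1)[0] + 1)

rng = np.random.default_rng(1)
T, dy, d1, d2 = 10000, 0.3, 0.2, 0.2
ks = [10, 20, 50, 100, 200]
y  = frac_noise(dy, T, rng)
f1 = frac_noise(d1, T, rng)
f2 = frac_noise(d2, T, rng)

# Biased case, mu_1 = 1, mu_2 = -1, mu_y = 0: d_z = max{dy, d1, d2} = 0.3.
z = (y - (1 + f1)) ** 2 - (y - (-1 + f2)) ** 2
print("biased:  ", agg_var_d(z, ks))

# Unbiased case, all means zero: d_z = max{2*0.2 - 1/2, 0.3 + 0.2 - 1/2, 0} = 0.
z0 = (y - f1) ** 2 - (y - f2) ** 2
print("unbiased:", agg_var_d(z0, ks))
```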
3.3 Memory Transmission under Common Long Memory

The results in Proposition 2 are based on Assumption 2, which precludes common long memory among the series. Of course, in practice it is likely that such an assumption is violated. In fact, it can be argued that reasonable forecasts of long memory time series should have common long memory with the forecast objective. Therefore, we relax this assumption and replace it with Assumption 3, below.

Assumption 3 (Common Long Memory). The causal Gaussian process xt has long memory according to Definition 1 of order dx with expectation E(xt) = μx. If at, bt ∼ CLM(dx, dx − δ), then they can be represented as

y_t = \beta_y + \xi_y x_t + \eta_t \quad \text{for } a_t, b_t = y_t, \qquad \hat{y}_{it} = \beta_i + \xi_i x_t + \varepsilon_{it} \quad \text{for } a_t, b_t = \hat{y}_{it},

with ξy, ξi ≠ 0. Both ηt and εit are mean zero causal Gaussian long memory processes with parameters dη and dεi fulfilling 1/2 > dx > dη, dεi ≥ 0, for i = 1, 2.

Assumption 3 restricts the common long memory to a form in which both series at and bt can be represented as linear functions of their joint factor xt. This excludes more complicated forms of dependence that are sometimes considered in the cointegration literature, such as nonlinear or time-varying cointegration. We know from Proposition 2 that the transmission of memory critically depends on the biasedness of the forecasts, which leads to a complicated case-by-case analysis. If common long memory according to Assumption 3 is allowed for, the situation is even more complex, since several relationships are possible: CLM of yt with one of the ŷit; CLM of both ŷit with each other, but not with yt; and CLM of each ŷit with yt. Each of these situations has to be considered with all possible combinations of the ξj and the μj for all j ∈ {y, 1, 2}. To deal with this complexity, we focus on three important special cases: (i) the forecasts are biased and the ξj differ from each other, (ii) the forecasts are biased, but the ξj are equal, and (iii) the forecasts are unbiased and ξa = ξb whenever at and bt are in a common long memory relationship.

To understand the role of the coefficients ξa and ξb in the series that are subject to CLM, note that the forecast errors yt − ŷit impose a cointegrating vector of (1, −1). A different scaling of the forecast objective and the forecasts is not possible. In the case of CLM between yt and ŷit, for example, we have from Assumption 3 that

y_t - \hat{y}_{it} = \beta_y - \beta_i + x_t(\xi_y - \xi_i) + \eta_t - \varepsilon_{it}, \quad (9)

so that x_t(\xi_y - \xi_i) does not disappear from the linear combination if the scaling parameters ξy and ξi differ from each other. We refer to a situation where ξa = ξb as "balanced CLM," whereas CLM with ξa ≠ ξb is referred to as "unbalanced CLM."

In the special case (i), both forecasts are biased and the presence of CLM does not lead to a cancellation of the memory of xt in the loss differential. Of course, this can be seen as an extreme case, but it serves to illuminate the mechanisms at work, especially in contrast to the results in Propositions 4 and 5 below. By substituting the linear relations from Assumption 3 for those series involved in the CLM relationship in the loss differential zt = ŷ1t² − ŷ2t² − 2yt(ŷ1t − ŷ2t), and again setting at = at* + μa for those series that are not involved in the CLM relationship, it is possible to find expressions analogous to (8). Since terms analogous to those in the first bracket of (8) appear in each case, it is possible to focus on the transmission of memory from the forecasts and the objective function to the loss differential. We obtain the following result.

Proposition 3 (Memory Transmission with Biased Forecasts and Unbalanced CLM). Let ξi ≠ ξy, ξ1 ≠ ξ2, μi ≠ μy, and μ1 ≠ μ2, for i = 1, 2.
Then under Assumptions 1 and 3, the forecast error loss differential in (7) is zt ∼ LM(dz), where

d_z = \begin{cases} \max\{d_y, d_x\}, & \text{if } \hat{y}_{1t}, \hat{y}_{2t} \sim CLM(d_x, d_x - \delta), \text{ except if } \xi_1/\xi_2 = (\mu_y - \mu_2)/(\mu_y - \mu_1), \\ \max\{d_2, d_x\}, & \text{if } \hat{y}_{1t}, y_t \sim CLM(d_x, d_x - \delta), \text{ except if } \xi_1/\xi_y = -(\mu_1 - \mu_2)/(\mu_y - \mu_1), \\ \max\{d_1, d_x\}, & \text{if } \hat{y}_{2t}, y_t \sim CLM(d_x, d_x - \delta), \text{ except if } \xi_2/\xi_y = -(\mu_1 - \mu_2)/(\mu_y - \mu_2), \\ d_x, & \text{if } \hat{y}_{1t}, \hat{y}_{2t}, y_t \sim CLM(d_x, d_x - \delta), \text{ except if } \xi_1(\mu_y - \mu_1) + \xi_y(\mu_1 - \mu_2) = \xi_2(\mu_y - \mu_2). \end{cases}

Proof. See the Appendix.

In the absence of common long memory, we observe in Proposition 2 that the memory is given by max{d1, d2, dy} if the means differ from each other. Now, if two of the series share common long memory, they both have memory dx. Hence, Proposition 3 shows that the transmission mechanism is essentially unchanged and the memory of the loss differential is still dominated by the largest memory parameter. The only exception to this rule is the knife-edge case in which the differences in the means and the memory parameters offset each other.

Similar to (i), case (ii) refers to a situation of biasedness, but now with balanced CLM, so that the underlying long memory factor xt cancels out in the forecast error loss differentials. The memory transmission can thus be characterized by the following proposition.

Proposition 4 (Memory Transmission with Biased Forecasts and Balanced CLM). Let ξ1 = ξ2 = ξy. Then under Assumptions 1 and 3, the forecast error loss differential in (7) is zt ∼ LM(dz), where

d_z = \begin{cases} \max\{d_y, d_x\}, & \text{if } \hat{y}_{1t}, \hat{y}_{2t} \sim CLM(d_x, d_x - \delta) \text{ and } \mu_1 \neq \mu_2, \\ \max\{d_2, d_x\}, & \text{if } \hat{y}_{1t}, y_t \sim CLM(d_x, d_x - \delta) \text{ and } \mu_y \neq \mu_2, \\ \max\{d_1, d_x\}, & \text{if } \hat{y}_{2t}, y_t \sim CLM(d_x, d_x - \delta) \text{ and } \mu_y \neq \mu_1, \\ \tilde{d}, & \text{if } \hat{y}_{1t}, \hat{y}_{2t}, y_t \sim CLM(d_x, d_x - \delta), \text{ for some } 0 \le \tilde{d} < d_x. \end{cases}

Proof. See the Appendix.

We refer to the first three cases in Propositions 3 and 4 as "partial CLM," as there is always one of the ŷit or yt that is not part of the CLM relationship, and to the fourth case as "full CLM." We can observe that the dominance of the memory of the most persistent series under partial CLM is preserved for both balanced and unbalanced CLM. We therefore conclude that this effect is generated by the interaction with the series that is not involved in the CLM relationship. This can also be seen from Equations (22) to (24) in the proof. Only in the fourth case, with full CLM, does the memory transmission change between Propositions 3 and 4. In this case, the memory in the loss differential is reduced to dz < dx.

The third special case (iii) refers to a situation of unbiasedness similar to the last case in Proposition 2. In addition, it is assumed that there is balanced CLM as in Proposition 4, where ξa = ξb if at and bt are in a common long memory relationship. Compared with the setting of the previous propositions, this is the most ideal situation in terms of forecast accuracy. Here, we have the following result.

Proposition 5 (Memory Transmission with Unbiased Forecasts and Balanced CLM). Under Assumptions 1 and 3, and if μy = μ1 = μ2 and ξy = ξa = ξb, then zt ∼ LM(dz), with

d_z = \begin{cases} \max\{d_2 + \max\{d_x, d_\eta\} - 1/2,\; 2\max\{d_x, d_2\} - 1/2,\; d_{\varepsilon_1}\}, & \text{if } y_t, \hat{y}_{1t} \sim CLM(d_x, d_x - \tilde\delta), \\ \max\{d_1 + \max\{d_x, d_\eta\} - 1/2,\; 2\max\{d_x, d_1\} - 1/2,\; d_{\varepsilon_2}\}, & \text{if } y_t, \hat{y}_{2t} \sim CLM(d_x, d_x - \tilde\delta), \\ \max\{\max\{d_x, d_y\} + \max\{d_{\varepsilon_1}, d_{\varepsilon_2}\} - 1/2,\; 0\}, & \text{if } \hat{y}_{1t}, \hat{y}_{2t} \sim CLM(d_x, d_x - \tilde\delta), \\ \max\{d_\eta + \max\{d_{\varepsilon_1}, d_{\varepsilon_2}\} - 1/2,\; 2\max\{d_{\varepsilon_1}, d_{\varepsilon_2}\} - 1/2,\; 0\}, & \text{if } y_t, \hat{y}_{1t} \sim CLM(d_x, d_x - \tilde\delta) \text{ and } y_t, \hat{y}_{2t} \sim CLM(d_x, d_x - \tilde\delta). \end{cases}

Here, 0 < δ̃ ≤ 1/2 denotes a generic constant for the reduction in memory.

Proof. See the Appendix.

Proposition 5 shows that the memory of the forecasts and the objective variable can indeed cancel out if the forecasts are unbiased and if they have the same factor loading on xt (i.e., if ξ1 = ξ2 = ξy).
However, in the first two cases, the memory of the error series ε1t and ε2t imposes a lower bound on the memory of the loss differential. Furthermore, even though the memory can be reduced to zero in the third and fourth cases, this only occurs if the memory orders of xt, yt, and the error series are sufficiently small. Otherwise, the memory is reduced, but does not vanish.

Overall, the results in Propositions 2, 3, 4, and 5 show that long memory can be transmitted from the forecasts or the forecast objective to the forecast error loss differentials. Our results also show that the biasedness of the forecasts plays an important role for the transmission of dependence to the loss differentials. To gain further insight into the mechanisms found in Propositions 2, 3, 4, and 5, consider a situation in which two forecasts with different nonzero biases are compared. In the absence of CLM, it is obvious from Proposition 2 that the memory of the loss differential is determined by the maximum of the memory orders of the forecasts and the forecast objective. If one of the forecasts has common long memory with the objective, the same holds true, irrespective of the loadings ξa on the common factor. As can be seen from Proposition 3, even if both forecasts have CLM with the objective, the maximal memory order is transmitted to zt if the factor loadings ξa differ. Only if the factor loadings are equal is the memory reduced, as stated in Proposition 4. If we consider two forecasts that are unbiased, in the absence of CLM it can be seen from Proposition 2 that the memory of the loss differential is lower than that of the original series. The same holds true in the presence of CLM, as covered by Proposition 5.

In practical situations, it might be overly restrictive to impose exact unbiasedness (under which memory would be reduced according to Proposition 5). Our empirical application regarding the predictive ability of the VIX serves as an example, since the VIX is a biased forecast of future quadratic variation due to the existence of a variance risk premium (see Section 6). Biases can also be caused by estimation errors. This issue might be of less importance in a setup where the estimation period grows at a faster rate than the (pseudo-)out-of-sample period used for forecast evaluation. For the DM test, however, it is usually assumed that this is not the case; otherwise, it could not be used for the comparison of forecasts from nested models due to a degenerate limiting distribution (cf. Giacomini and White (2006) for a discussion). Instead, the sample of size T* is split into an estimation period TE and a forecasting period T such that T* = TE + T, and it is assumed that T grows at a faster rate than TE, so that TE/T → 0 as T* → ∞. Therefore, the estimation error shrinks at a lower rate than the growth rate of the evaluation period, and it remains relevant asymptotically.

3.4 Asymptotic and Finite-Sample Behaviour under Long Memory

After establishing that forecast error loss differentials may exhibit long memory in various situations, we now consider the effect of long memory on the HAC-based DM test. The following proposition establishes that the size of the test approaches unity as T → ∞. Thus, the test indicates with probability one that one of the forecasts is superior to the other, even if both forecasts perform equally well according to g(·). Note that the test also has an asymptotic rejection probability of one under the alternative.

Proposition 6 (DM under Long Memory).
For zt ∼ LM(d) with d ∈ (0, 1/2), the asymptotic size (under H0) of the tHAC-statistic equals unity as T → ∞.

Proof. See the Appendix.

This result shows that inference based on HAC estimators is asymptotically invalid under long memory. To explore to what extent this finding also affects the finite-sample performance of the tHAC- and tFB-statistics, we conduct a small-scale Monte Carlo experiment as an illustration. The results shown in Figure 1 are obtained with M = 5000 Monte Carlo repetitions. We simulate samples of T = 50 and T = 2000 observations from a fractionally integrated process using different values of the memory parameter d in the range from 0 to 0.4. The HAC estimator and the fixed-b approach are implemented with the commonly used Bartlett and Quadratic Spectral (QS) kernels.3

Figure 1: Size of the tHAC- and tFB-tests with T ∈ {50, 2000} for different values of the memory parameter d.

We start by commenting on the results for the small sample size of T = 50 in the left panel of Figure 1. As demonstrated by Kiefer and Vogelsang (2005), the fixed-b approach works exceptionally well in the short memory case of d = 0, with the Bartlett and QS kernels achieving approximately equal size control. The tHAC-statistic over-rejects more than the fixed-b approach and, as stated in Andrews (1991), better size control is provided if the Quadratic Spectral kernel is used. If the memory parameter d is positive, we observe that all tests severely over-reject the null hypothesis. For d = 0.4, the size of the HAC-based test is approximately 65% and that of the fixed-b version using the Bartlett kernel is around 40%. We therefore find that the size distortions are not only an asymptotic phenomenon; they are already severe in samples of just T = 50 observations. Moreover, even for small deviations of d from zero, all tests are over-sized. These findings motivate the use of long memory robust procedures.

Continuing with the results for T = 2000 in the right panel of Figure 1, we observe similar findings in general. In the short memory case, the size distortions observed in small samples vanish: all test statistics are well behaved for d = 0. On the contrary, for d > 0 the size distortions are stronger than for T = 50, although the magnitude of the additional distortion is moderate. This feature can be attributed to the slow divergence rate (as given in the proof of Proposition 6) of the test statistic under long memory.
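The size pattern in Figure 1 can be reproduced in a few lines. The sketch below is our own minimal version of the experiment (Bartlett kernel, Newey-West bandwidth rule, 5% nominal level, fewer replications than in the paper) and shows how the empirical size of the tHAC-test grows with d.

```python
import numpy as np

def frac_noise(d, n, rng, burn=500):
    """FI(d) sample via the truncated MA(infinity) expansion of (1 - L)^(-d)."""
    m = n + burn
    psi = np.ones(m); j = np.arange(1, m)
    psi[1:] = np.cumprod((j - 1 + d) / j)
    return np.convolve(rng.standard_normal(m), psi)[:m][burn:]

def t_hac(z, B):
    """Bartlett-kernel HAC t-statistic (5) for H0: E(z_t) = 0."""
    T = len(z)
    zc = z - z.mean()
    gam = lambda j: zc[j:] @ zc[:T - j] / T
    V = gam(0) + 2 * sum((1 - j / (B + 1)) * gam(j) for j in range(1, B + 1))
    return np.sqrt(T) * z.mean() / np.sqrt(V)

rng = np.random.default_rng(1)
T, reps = 500, 1000
B = int(4 * (T / 100) ** (2 / 9))       # Newey-West bandwidth rule of thumb
for d in (0.0, 0.2, 0.4):
    rej = np.mean([abs(t_hac(frac_noise(d, T, rng), B)) > 1.96
                   for _ in range(reps)])
    print(f"d = {d:.1f}: empirical size {rej:.3f}")  # far above 0.05 once d > 0
```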
4 Long-Run Variance Estimation under Long Memory

Since conventional HAC estimators lead to spurious rejections under long memory, we consider memory robust long-run variance estimators. To the best of our knowledge, only two extensions of this kind are available in the literature: the MAC estimator of Robinson (2005) and an extension of the fixed-b estimator from McElroy and Politis (2012). We do not assume that forecasts are obtained from some specific class of models. We merely extend the typical assumptions of Diebold and Mariano (1995) on the loss differentials so that long memory is allowed.

4.1 MAC Estimator

The MAC estimator is developed by Robinson (2005) and further explored and extended by Abadir, Distaso, and Giraitis (2009). Albeit stated in a somewhat different form, the same result is derived independently by Phillips and Kim (2007), who consider the long-run variance of a multivariate fractionally integrated process. Robinson (2005) assumes that zt is linear (in the sense of our Equation (2); see also Assumption L in Abadir, Distaso, and Giraitis (2009)) and that for λ → 0 its spectral density fulfills

f(\lambda) = b_0 |\lambda|^{-2d} + o(|\lambda|^{-2d}),

with b_0 > 0, |λ| ≤ π, d ∈ (−1/2, 1/2), and b_0 = \lim_{\lambda \to 0} |\lambda|^{2d} f(\lambda). Among others, this assumption covers stationary and invertible ARFIMA processes. For notational convenience, we drop the index z from the spectral density and the memory parameter here. A key result for the MAC estimator is that, as T → ∞,

\mathrm{Var}(T^{1/2-d}\,\bar{z}) \to b_0\, p(d), \quad \text{with } p(d) = \begin{cases} \dfrac{2\,\Gamma(1-2d)\sin(\pi d)}{d(1+2d)}, & \text{if } d \neq 0, \\ 2\pi, & \text{if } d = 0. \end{cases}

The case of short memory (d = 0) yields the familiar result that the long-run variance of the sample mean equals 2πb0 = 2πf(0). Hence, estimation of the long-run variance requires estimation of f(0) in the case of short memory. If long memory is present in the data generating process (DGP), estimation of the long-run variance additionally hinges on the estimation of d. The MAC estimator is therefore given by

\hat{V}(\hat{d}, m_d, m) = \hat{b}_m(\hat{d})\, p(\hat{d}).

In more detail, the estimation of V works as follows. First, if the estimator for d fulfills the condition \hat{d} - d = o_p(1/\log T), plug-in estimation is valid (cf. Abadir, Distaso, and Giraitis (2009)). Thus, p(d) can simply be estimated through p(\hat{d}). A popular estimator that fulfills this rather weak requirement is the local Whittle estimator with bandwidth m_d = \lfloor T^{q_d} \rfloor, where 0 < q_d < 1 denotes a generic bandwidth parameter and ⌊·⌋ denotes the largest integer not exceeding its argument. This estimator is given by

\hat{d}_{LW} = \arg\min_{d \in (-1/2, 1/2)} R_{LW}(d), \quad \text{where } R_{LW}(d) = \log\left(\frac{1}{m_d}\sum_{j=1}^{m_d} j^{2d} I_T(\lambda_j)\right) - \frac{2d}{m_d}\sum_{j=1}^{m_d} \log j,

I_T(\lambda_j) = (2\pi T)^{-1}\left|\sum_{t=1}^{T} \exp(it\lambda_j)\, z_t\right|^2

is the periodogram (which does not depend on \hat{d}), and λj = 2πj/T are the Fourier frequencies for j = 1, ..., ⌊T/2⌋. Many other estimation approaches (e.g., log-periodogram estimation) would be possible as well. Since the loss differential in (7) is a linear combination of processes with different memory orders, the local polynomial Whittle plus noise (LPWN) estimator of Frederiksen, Nielsen, and Nielsen (2012) is a particularly useful alternative. This estimator extends the local Whittle estimator by approximating the log-spectrum of possible short memory components and perturbation terms in the vicinity of the origin by polynomials. This leads to a reduction of finite-sample bias. The estimator is consistent for d ∈ (0, 1) and asymptotically normal in the presence of perturbations for d ∈ (0, 0.75), but with the variance inflated by a multiplicative constant compared with the local Whittle estimator.

Based on a consistent estimator \hat{d}, such as those discussed above, b0 can be estimated consistently by

\hat{b}_m(\hat{d}) = m^{-1}\sum_{j=1}^{m} \lambda_j^{2\hat{d}} I_T(\lambda_j).

The bandwidth m is determined according to m = ⌊T^q⌋ such that m → ∞ and m = o(T/(\log T)^2). The MAC estimator is consistent as long as \hat{d} \to_p d and \hat{b}_m(\hat{d}) \to_p b_0. These results hold under very weak assumptions; neither linearity of zt nor Gaussianity is required. Under somewhat stronger assumptions, the tMAC-statistic is also normally distributed (see Theorem 3.1 of Abadir, Distaso, and Giraitis (2009)): t_{MAC} ⇒ N(0, 1). The t-statistic using the feasible MAC estimator can be written as

t_{MAC} = \frac{T^{1/2-\hat{d}}\,\bar{z}}{\sqrt{\hat{V}(\hat{d}, m_d, m)}},

with md and m being the bandwidths for the estimation of d and b0, respectively.4
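A compact implementation of the feasible MAC statistic could look as follows (a sketch under our own tuning choices qd = q = 0.65; all function names are hypothetical). It combines the local Whittle estimate of d, the plug-in p(d̂), and the estimate b̂m(d̂) computed from the periodogram.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gamma as Gamma

def periodogram(z):
    """Periodogram I_T(lambda_j) at the Fourier frequencies j = 1, ..., T//2."""
    T = len(z)
    I = np.abs(np.fft.fft(z)) ** 2 / (2 * np.pi * T)
    lam = 2 * np.pi * np.arange(T) / T
    return lam[1:T // 2 + 1], I[1:T // 2 + 1]

def local_whittle(z, md):
    """Local Whittle estimate of d, minimizing R_LW(d) over the first md frequencies."""
    _, I = periodogram(z - z.mean())
    Ij = I[:md]
    j = np.arange(1, md + 1)
    R = lambda d: np.log(np.mean(j ** (2 * d) * Ij)) - 2 * d * np.mean(np.log(j))
    return minimize_scalar(R, bounds=(-0.49, 0.49), method="bounded").x

def p_of_d(d):
    """p(d): 2*pi for d = 0, else 2*Gamma(1 - 2d)*sin(pi*d) / (d*(1 + 2d))."""
    if abs(d) < 1e-10:
        return 2 * np.pi
    return 2 * Gamma(1 - 2 * d) * np.sin(np.pi * d) / (d * (1 + 2 * d))

def t_mac(z, qd=0.65, q=0.65):
    """Feasible MAC statistic: T^(1/2 - d_hat) * mean(z) / sqrt(V_hat)."""
    T = len(z)
    d = local_whittle(z, int(T ** qd))
    lam, I = periodogram(z - z.mean())
    m = int(T ** q)
    b0 = np.mean(lam[:m] ** (2 * d) * I[:m])   # b_hat_m(d_hat)
    return T ** (0.5 - d) * z.mean() / np.sqrt(b0 * p_of_d(d))

# Example: under the null with iid data, t_mac is approximately N(0, 1).
rng = np.random.default_rng(0)
print(t_mac(rng.standard_normal(1000)))
```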
4.2 Extended Fixed-Bandwidth Approach

Following up on the work of Kiefer and Vogelsang (2005), McElroy and Politis (2012) extend the fixed-bandwidth approach to long-range dependence. Their approach is similar to that of Kiefer and Vogelsang (2005) in many respects, as can be seen below. The test statistic suggested by McElroy and Politis (2012) is given by

t_{EFB} = \frac{T^{1/2}\,\bar{z}}{\sqrt{\hat{V}(k, b)}}.

In contrast to the tMAC-statistic, the tEFB-statistic involves a scaling of T^{1/2}. This affects the limit distribution, which depends on the memory parameter d. Analogously to the short memory case, the limiting distribution is derived by assuming that a functional central limit theorem applies to the partial sums of zt, so that

t_{EFB} ⇒ \frac{W_d(1)}{\sqrt{Q(k, b, d)}},

where Wd(r) is a fractional Brownian motion and Q(k, b, d) depends on the fractional Brownian bridge \tilde{W}_d(r) = W_d(r) - rW_d(1). Furthermore, Q(k, b, d) depends on the first and second derivatives of the kernel k(·). In more detail, for the Bartlett kernel we have

Q(k, b, d) = \frac{2}{b}\left(\int_0^1 \tilde{W}_d(r)^2\, dr - \int_0^{1-b} \tilde{W}_d(r+b)\tilde{W}_d(r)\, dr\right),

and thus a similar structure as in the short memory case. Further details and examples can be found in McElroy and Politis (2012). The joint distribution of Wd(1) and Q(k, b, d) is found through their joint Fourier-Laplace transform; see Fitzsimmons and McElroy (2010). It is symmetric around zero and has a cumulative distribution function that is continuous in d.

Besides the similarities to the short memory case, there are some important conceptual differences to the MAC estimator. First, the MAC estimator belongs to the class of "small-b" estimators in the sense that it estimates the long-run variance directly, whereas the fixed-b approach leads, also in the long memory case, to an estimate of the long-run variance multiplied by a functional of a fractional Brownian bridge. Second, the limiting distribution of the tEFB-statistic is not standard normal, but depends on the chosen kernel k, the fixed-bandwidth parameter b, and the long memory parameter d. While the first two are user-specific, the latter requires a plug-in estimator, as does the MAC estimator. As a consequence, the critical values depend on d. McElroy and Politis (2012) offer response curves for various kernels.5
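Because the limit law Wd(1)/√Q(k, b, d) is nonstandard, its critical values are typically obtained by simulation. The sketch below does this for the Bartlett kernel, approximating Wd by the normalized partial-sum process of FI(d) innovations rather than an exact fractional Brownian motion; the grid size, replication count, and this approximation are our own simplifications.

```python
import numpy as np

def frac_noise(d, n, rng, burn=500):
    m = n + burn
    psi = np.ones(m); j = np.arange(1, m)
    psi[1:] = np.cumprod((j - 1 + d) / j)
    return np.convolve(rng.standard_normal(m), psi)[:m][burn:]

def efb_limit_draws(d, b, n=500, reps=2000, seed=0):
    """Draws from W_d(1)/sqrt(Q(k, b, d)) for the Bartlett kernel, with W_d
    approximated by the partial-sum process of FI(d) noise on an n-point grid."""
    rng = np.random.default_rng(seed)
    r = np.arange(1, n + 1) / n
    h = int(b * n)
    out = np.empty(reps)
    for i in range(reps):
        W = np.cumsum(frac_noise(d, n, rng))   # scale cancels in the ratio below
        Wb = W - r * W[-1]                     # fractional Brownian bridge
        # Riemann sums for the two integrals defining Q(k, b, d):
        Q = (2 / b) * (np.mean(Wb ** 2) - np.sum(Wb[h:] * Wb[:n - h]) / n)
        out[i] = W[-1] / np.sqrt(Q)
    return out

draws = efb_limit_draws(d=0.3, b=0.5)
print("5% two-sided critical value:", np.quantile(np.abs(draws), 0.95))
```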
5 Monte Carlo Study

This section presents further results on memory transmission to the forecast error loss differentials and on the relative performance of the tMAC- and tEFB-statistics by means of extensive Monte Carlo simulations. It is divided into three parts. First, we conduct Monte Carlo experiments to verify the results obtained in Propositions 2–5 and to explore whether similar results apply for non-Gaussian processes and under the QLIKE loss function. The second part studies the memory properties of the loss differential in a number of empirically motivated forecasting scenarios. Finally, in the third part we explore the finite-sample size and power properties of the robustified tests discussed above and make recommendations for their practical application.

5.1 Memory Transmission to the Forecast Error Loss Differentials: Beyond MSE and Gaussianity

The results on the transmission of long memory from the forecasts or the forecast objective to the loss differentials in Propositions 2–5 are restricted to stationary Gaussian processes and forecasts evaluated under MSE loss. In this section, we first verify the validity of the predictions from our propositions. Furthermore, we study how these results translate to non-Gaussian processes, nonstationary processes, and the QLIKE loss function, which we use in our empirical application on volatility forecasting in Section 6. It is given by

QLIKE(\hat{y}_{it}, y_t) = \log \hat{y}_{it} + \frac{y_t}{\hat{y}_{it}}. \quad (10)

For a discussion of the role and importance of this loss function in the evaluation of volatility forecasts, see Patton (2011). All DGPs are based on fractional integration. Due to the large number of cases in Propositions 2–5, we restrict ourselves to representative situations. The first two DGPs are based on cases (i) and (v) of Proposition 2, which cover situations in which the forecasts and the forecast objective are generated from a system without common long memory. We simulate processes of the form

a_t = \mu_a + a_t^*/\hat{\sigma}_{a^*}, \quad (11)

where at ∈ {yt, ŷ1t, ŷ2t} and a_t^* = (1 - L)^{-d_a}\varepsilon_{at}. As in Section 3, the starred variable at* is a zero-mean process, whereas at has mean μa, and the εat are iid. The innovation sequences are either standard normal or t(5)-distributed. The standardization of at* neutralizes the effect of increasing values of the memory parameter d on the process variance and controls the scaling of the mean relative to the variance. The loss differential series zt is then calculated as in (1). We use 5000 Monte Carlo replications and consider sample sizes of T = {250, 2000}. The first two DGPs for zt are obtained by setting the means μa in (11) as follows:

DGP1: (μ1, μ2, μy) = (1, −1, 0)
DGP2: (μ1, μ2, μy) = (0, 0, 0).

The other DGPs represent the last cases of Propositions 3–5. These are based on the fractionally cointegrated system

\begin{pmatrix} y_t^* \\ \hat{y}_{1t}^* \\ \hat{y}_{2t}^* \\ x_t \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & \xi_y \\ 0 & 1 & 0 & \xi_1 \\ 0 & 0 & 1 & \xi_2 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \eta_t \\ \varepsilon_{1t} \\ \varepsilon_{2t} \\ x_t \end{pmatrix},

where ηt, ε1t, ε2t, and xt are mutually independent and fractionally integrated with parameters dη, dε1, dε2, and dx. DGPs 3 to 5 are then obtained by selecting the following parameter constellations:

DGP3: (μ1, μ2, μy, ξ1, ξ2, ξy) = (1, −1, 0, 1, 2, 1.5)
DGP4: (μ1, μ2, μy, ξ1, ξ2, ξy) = (1, −1, 0, 1, 1, 1)
DGP5: (μ1, μ2, μy, ξ1, ξ2, ξy) = (0, 0, 0, 1, 1, 1).

Each of our DGPs 2 to 5 is formulated such that the reduction in the memory parameter is the strongest among all cases covered by the respective proposition. Simulation results for other cases would therefore show an even stronger transmission of memory to the loss differentials. Since the QLIKE criterion is only defined for nonnegative forecasts, we consider a long memory stochastic volatility (LMSV) specification whenever QLIKE is used and simulate forecasts and forecast objective of the form exp(at/2), whereas the MSE is calculated directly for the at. It should be noted that the loss differential zt is a linear combination of several persistent and antipersistent component series. This is a very challenging setup for the empirical estimation of the memory parameter. We therefore resort to the aforementioned LPWN estimator of Frederiksen, Nielsen, and Nielsen (2012) with a bandwidth of md = ⌊T^0.65⌋ and a polynomial of degree one for the noise term, which can be expected to have the lowest bias in this setup among the available estimation methods. Even so, the estimation remains difficult, and any mismatch between the theoretical predictions from our propositions and the finite-sample results reported here is likely to be due to the finite-sample bias of the semiparametric estimators.
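The simulation design is straightforward to reproduce. The following sketch (a reduced version of the exercise: a single replication, the GPH regression instead of the LPWN estimator, and an illustrative bandwidth) generates DGP1 and reports the estimated memory of the loss differential under both MSE and QLIKE.

```python
import numpy as np

def frac_noise(d, n, rng, burn=1000):
    m = n + burn
    psi = np.ones(m); j = np.arange(1, m)
    psi[1:] = np.cumprod((j - 1 + d) / j)
    return np.convolve(rng.standard_normal(m), psi)[:m][burn:]

def gph(x, m):
    n = len(x)
    lam = 2 * np.pi * np.arange(1, m + 1) / n
    I = np.abs(np.fft.fft(x - x.mean())[1:m + 1]) ** 2 / (2 * np.pi * n)
    return np.polyfit(-2 * np.log(lam), np.log(I), 1)[0]

rng = np.random.default_rng(7)
T, dy, d1, d2 = 2000, 0.25, 0.2, 0.4

def standardized(mu, d):
    x = frac_noise(d, T, rng)
    return mu + x / x.std()                 # standardization as in (11)

# DGP1: (mu_1, mu_2, mu_y) = (1, -1, 0); Prop. 2 predicts d_z = max{dy, d1, d2} = 0.4
y, f1, f2 = standardized(0, dy), standardized(1, d1), standardized(-1, d2)
m = int(T ** 0.65)
print("MSE  :", gph((y - f1) ** 2 - (y - f2) ** 2, m))

# QLIKE under the LMSV-type transform exp(a_t / 2), cf. Equation (10)
Y, F1, F2 = np.exp(y / 2), np.exp(f1 / 2), np.exp(f2 / 2)
ql = lambda f, o: np.log(f) + o / f
print("QLIKE:", gph(ql(F1, Y) - ql(F2, Y), m))
```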
The results for DGPs 1 and 2 are given in Table 1. We start with the discussion of simulation results for cases covered by our theoretical results. Under MSE loss and with Gaussian innovations (top left panel of Table 1), Proposition 2 states that for DGP1 we have dz = 0.25 if d1, d2 ∈ {0, 0.2} and dz = 0.4 if either d1 or d2 is equal to 0.4. The bottom left panel reports results for DGP2, for which Proposition 2 states that dz = 0 if d1, d2 ∈ {0, 0.2} and dz = 0.3 if d1 = 0.4 or d2 = 0.4. We can observe that the memory tends to be slightly larger than predicted for small d1 and d2, and slightly smaller for dz = 0.4. In general, however, the results closely mirror the theoretical predictions of Proposition 2.

Table 1. Monte Carlo averages of estimated memory in the loss differential zt for DGP1 and DGP2 with dy = 0.25 (columns give d2; row groups give d1)

                       MSE                                            QLIKE
                       Gaussian               t(5)                    Gaussian               t(5)
DGP  T     d1/d2    0    0.2  0.4  0.6     0    0.2  0.4  0.6      0    0.2  0.4  0.6     0    0.2  0.4  0.6
1    250   0      0.32 0.32 0.35 0.43    0.30 0.31 0.34 0.41     0.29 0.32 0.38 0.44    0.22 0.25 0.32 0.41
           0.2    0.33 0.34 0.36 0.43    0.31 0.32 0.34 0.42     0.30 0.31 0.37 0.45    0.23 0.26 0.32 0.40
           0.4    0.36 0.36 0.38 0.45    0.33 0.34 0.37 0.43     0.31 0.32 0.37 0.45    0.24 0.27 0.33 0.41
           0.6    0.43 0.43 0.45 0.50    0.42 0.41 0.44 0.49     0.36 0.37 0.41 0.48    0.29 0.31 0.36 0.44
     2000  0      0.30 0.29 0.32 0.42    0.29 0.28 0.31 0.42     0.29 0.28 0.35 0.46    0.18 0.20 0.28 0.40
           0.2    0.29 0.29 0.32 0.41    0.28 0.28 0.31 0.41     0.29 0.28 0.35 0.46    0.18 0.20 0.27 0.40
           0.4    0.32 0.32 0.35 0.42    0.32 0.31 0.34 0.41     0.29 0.28 0.35 0.46    0.20 0.21 0.28 0.41
           0.6    0.42 0.42 0.43 0.48    0.42 0.41 0.42 0.47     0.35 0.34 0.38 0.48    0.26 0.25 0.31 0.44
2    250   0      0.13 0.15 0.26 0.43    0.10 0.14 0.23 0.41     0.11 0.14 0.26 0.41    0.10 0.13 0.20 0.35
           0.2    0.15 0.17 0.26 0.43    0.14 0.17 0.24 0.41     0.15 0.18 0.27 0.41    0.12 0.15 0.21 0.35
           0.4    0.26 0.27 0.31 0.43    0.23 0.24 0.29 0.42     0.25 0.27 0.31 0.41    0.20 0.21 0.25 0.36
           0.6    0.42 0.42 0.43 0.48    0.41 0.40 0.42 0.48     0.41 0.40 0.41 0.47    0.35 0.35 0.36 0.43
     2000  0      0.07 0.11 0.23 0.43    0.07 0.09 0.21 0.41     0.06 0.13 0.27 0.41    0.05 0.08 0.15 0.32
           0.2    0.11 0.13 0.23 0.42    0.09 0.11 0.20 0.40     0.13 0.17 0.26 0.40    0.07 0.09 0.17 0.31
           0.4    0.23 0.23 0.25 0.41    0.21 0.21 0.24 0.39     0.27 0.26 0.29 0.40    0.16 0.17 0.22 0.33
           0.6    0.43 0.42 0.41 0.46    0.41 0.40 0.39 0.45     0.41 0.41 0.40 0.46    0.32 0.31 0.33 0.41
With regard to the cases not covered by the theoretical derivations, we can observe that the results for t-distributed innovations are nearly identical to those obtained for the Gaussian distribution. The same holds true for the Gaussian long memory stochastic volatility model and the QLIKE loss function. If the innovations of the LMSV model are t-distributed, the memory in the loss differential is slightly lower, but still substantial.
Finally, in the presence of nonstationary long memory with d1 or d2 equal to 0.6, we can observe that the loss differential exhibits long memory with an estimated degree between 0.4 and 0.5. The only exception occurs when the QLIKE loss function is used for DGP1. Here, we observe some asymmetry in the results, in the sense that the estimated memory parameter of the loss differential is slightly lower if d2 is low relative to d1. However, the memory transmission is still substantial.

The results for DGP3 to DGP5, where forecasts and the forecast objective have common long memory, are shown in Table 2. If we again consider the left column, which displays the results for MSE loss and Gaussian innovations, Proposition 3 states for DGP3 that the memory for all d in the stationary range should be dx = 0.45. Proposition 4 does not give an exact prediction for DGP4, but states that the memory in the loss differential should be reduced compared with DGP3. Finally, for DGP5, Proposition 5 implies that dz = 0 for dε1, dε2 ∈ {0, 0.2} and dz = 0.3 if dε1 or dε2 equals 0.4.

Table 2. Monte Carlo averages of estimated memory in the loss differential zt for DGP3, DGP4, and DGP5 with dη = 0.2 (columns give dε2; row groups give dε1)

                         MSE                              QLIKE
                         Gaussian          t(5)           Gaussian          t(5)
DGP  T     dε1/dε2    0    0.2  0.4     0    0.2  0.4     0    0.2  0.4     0    0.2  0.4
3    250   0        0.31 0.32 0.38    0.29 0.31 0.37    0.29 0.33 0.40    0.27 0.31 0.39
           0.2      0.34 0.35 0.39    0.33 0.34 0.38    0.32 0.33 0.41    0.29 0.32 0.39
           0.4      0.41 0.42 0.44    0.40 0.40 0.43    0.37 0.38 0.43    0.36 0.37 0.42
     2000  0        0.34 0.32 0.38    0.34 0.32 0.38    0.32 0.30 0.39    0.31 0.29 0.38
           0.2      0.30 0.30 0.36    0.30 0.30 0.35    0.30 0.29 0.38    0.29 0.29 0.37
           0.4      0.39 0.39 0.41    0.38 0.38 0.40    0.37 0.36 0.40    0.36 0.35 0.39
4    250   0        0.29 0.31 0.35    0.27 0.30 0.35    0.28 0.30 0.37    0.24 0.27 0.35
           0.2      0.30 0.32 0.36    0.29 0.31 0.35    0.28 0.30 0.38    0.25 0.28 0.35
           0.4      0.35 0.36 0.39    0.34 0.35 0.38    0.31 0.32 0.39    0.27 0.30 0.37
     2000  0        0.26 0.26 0.33    0.25 0.26 0.33    0.25 0.25 0.35    0.23 0.24 0.33
           0.2      0.26 0.26 0.33    0.26 0.25 0.32    0.26 0.26 0.35    0.24 0.24 0.33
           0.4      0.33 0.33 0.36    0.33 0.32 0.36    0.29 0.29 0.36    0.27 0.27 0.34
5    250   0        0.12 0.14 0.25    0.11 0.13 0.23    0.10 0.13 0.25    0.10 0.12 0.21
           0.2      0.14 0.16 0.26    0.13 0.15 0.23    0.14 0.16 0.25    0.12 0.13 0.22
           0.4      0.26 0.25 0.30    0.23 0.24 0.28    0.25 0.25 0.30    0.22 0.22 0.26
     2000  0        0.06 0.10 0.23    0.07 0.10 0.21    0.05 0.10 0.25    0.04 0.07 0.19
           0.2      0.09 0.11 0.23    0.09 0.11 0.21    0.09 0.12 0.25    0.06 0.09 0.19
           0.4      0.23 0.23 0.26    0.22 0.21 0.24    0.25 0.24 0.27    0.19 0.19 0.23
As for DGP1 and DGP2, our estimates of dz are roughly in line with the theoretical predictions. For DGP3, where dz should be large, the estimates are somewhat lower, but the estimated degree of memory is still considerable. The results for DGP4 are indeed slightly lower than those for DGP3, as predicted by Proposition 4. Finally, for DGP5 we again observe that dz is somewhat overestimated if the true value is low, and vice versa. As in Table 1, the results are qualitatively the same if we consider t-distributed innovation sequences and the QLIKE loss function. Additional simulations with dx = 0.65 show that the results are virtually identical for d1, d2 ≤ 0.4.
If either d1 = 0.6 or d2 = 0.6, the memory transmission becomes even stronger, which is also in line with the findings for DGP1 and DGP2 in Table 1. Overall, we find that the finite-sample results presented in this section are in line with the theoretical findings from Section 3. Moreover, the practical relevance of the results in Propositions 2–5 extends far beyond the stationary Gaussian case with MSE loss, as demonstrated by the finding that the transmission results obtained with t-distributed innovations, nonstationary processes, and the QLIKE loss function are nearly identical.

5.2 Empirical Forecast Scenarios

The relevance of memory transmission to the forecast error loss differentials in practice is further examined by considering a number of simple forecast scenarios motivated by typical empirical examples. To ensure that the null hypothesis of equal predictive accuracy holds, we have to construct two competing forecasts that are different from each other, but perform equally well in terms of a loss function, here the MSE. The length of the estimation period equals TE = 250 and the memory parameter estimates are obtained by the LPWN estimator.

The first scenario is motivated by the spurious long memory literature. The DGP is a fractionally integrated process with a time-varying mean that is generated by a random level shift process as in Perron and Qu (2010) or Qu (2011). In detail,

yt = xt + μt, xt = (1−L)^(−1/4) ɛx,t, μt = μt−1 + πt ɛμ,t,

where ɛx,t ∼ iid N(0,1), ɛμ,t ∼ iid N(0,1), πt ∼ iid Bern(p), and ɛx,t, ɛμ,t, and πt are mutually independent.6 It is well known that it can be difficult to distinguish long memory from low-frequency contaminations such as structural breaks (cf. Diebold and Inoue (2001) or Granger and Hyung (2004)). Therefore, it is often assumed that the process is driven by either the one or the other, see, for example, Berkes et al. (2006), who suggest a test of the null hypothesis of a weakly dependent process with breaks against the alternative of long-range dependence, or Lu and Perron (2010), who demonstrate that a pure level shift process has superior predictive performance compared with ARFIMA and HAR models for the log-absolute returns of the S&P500. See also Varneskov and Perron (2017) for a related recent contribution. In the spirit of this dichotomy, we compare forecasts which solely consider the breaks with those that assume the absence of breaks and predict the process based on a fractionally integrated model (with the memory estimated by the local Whittle method).7 A simulation sketch of this DGP is given below.

Table 3 shows the results of this exercise. The average loss differential is close to zero. The estimated memory of the loss differentials is around 0.17 for larger sample sizes. While the classical DM test based on a HAC estimator over-rejects, both the tMAC and the tEFB-statistics control the size well, at least in larger samples.

Table 3. Estimated memory of the loss differentials d̂z, mean loss differential z̄, and rejection frequencies of the t-statistics for a spurious long memory scenario

T      d̂z      z̄        tHAC    tMAC    tEFB
250    0.113    0.012     0.201   0.110   0.098
500    0.144    0.003     0.227   0.080   0.073
1000   0.171    0.000     0.270   0.057   0.058
2000   0.172    −0.001    0.324   0.046   0.057

Note: The true DGP is fractionally integrated with random level shifts and the forecasts assume either a pure shift process or a pure long memory process.

As a second scenario, we consider simple predictive regressions based on two regressors that are fractionally cointegrated with the forecast objective. Here, xt is fractionally integrated of order d. Then

yt = xt + (1−L)^(−(d−δ)) ηt, xi,t = xt + (1−L)^(−(d−δ)) ɛi,t, ŷit = β̂0i + β̂1i xi,t−1,

where ηt and the ɛi,t are mutually independent and normally distributed with unit variances, β̂0i and β̂1i are the OLS estimators, and 0 < δ < d. To resemble processes in the lower nonstationary long memory region (such as the realized volatilities in our empirical application), we set d = 0.6. This corresponds to a situation where we forecast realized volatility of the S&P500 with either past values of the VIX or another past realized volatility such as that of a sector index. The cointegration strength is set to δ = 0.3. The results are shown in Table 4. Again, the Monte Carlo averages (z̄) are close to zero. The tMAC and tEFB-statistics tend to be conservative in larger samples, whereas the tHAC test rejects far too often. The strength of the memory in the loss differential lies roughly at 0.24.

Table 4. Estimated memory of the loss differentials d̂z, mean loss differential z̄, and rejection frequencies of the t-statistics for comparison of forecasts obtained from predictive regressions where the regressor variables are fractionally cointegrated with the forecast objective

T      d̂z      z̄        tHAC    tMAC    tEFB
250    0.182    0.011     0.289   0.063   0.059
500    0.207    −0.011    0.336   0.040   0.041
1000   0.228    −0.001    0.378   0.023   0.025
2000   0.238    −0.009    0.441   0.016   0.020

Our third scenario is closely related to the previous one. In practice, it is hard to distinguish fractionally cointegrated series from fractionally integrated series with highly correlated short-run components (cf. the simulation studies in Hualde and Velasco (2008)). Therefore, our third scenario is similar to the second, but with correlated innovations,

yt = (1−L)^(−d) ηt, xi,t = (1−L)^(−d) ɛi,t, and ŷit = β̂0i + β̂1i xi,t−1.

Here, all pairwise correlations between ηt and the ɛi,t are ρ = 0.4. Furthermore, we set d = 0.4, so that we operate in the stationary long memory region.
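To illustrate the mechanics behind these scenarios, the following sketch simulates the first DGP, a fractionally integrated process with random level shifts. This is an illustrative reimplementation under our own conventions, not the authors' code; the shift probability p and the seed are assumptions, and the fractional filter uses the standard type-II expansion.

```python
import numpy as np

def frac_integrate(eps, d):
    """Type-II fractional integration: x_t = sum_k psi_k * eps_{t-k},
    with psi_0 = 1 and psi_k = psi_{k-1} * (k - 1 + d) / k."""
    T = len(eps)
    psi = np.ones(T)
    for k in range(1, T):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    # Truncated convolution of the innovations with the MA coefficients
    return np.convolve(eps, psi)[:T]

def level_shift_dgp(T, d=0.25, p=5 / 250, seed=None):
    """y_t = x_t + mu_t with x_t ~ I(d) and a random level shift process
    mu_t = mu_{t-1} + pi_t * eps_{mu,t}, pi_t ~ Bern(p) (p is illustrative)."""
    rng = np.random.default_rng(seed)
    x = frac_integrate(rng.standard_normal(T), d)
    mu = np.cumsum(rng.binomial(1, p, T) * rng.standard_normal(T))
    return x + mu

y = level_shift_dgp(2000, seed=42)  # d = 1/4 as in the first scenario
```

The same frac_integrate helper can be reused to generate the fractionally (co)integrated series of the second and third scenarios.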
The situation is the same as in the previous scenarios, with strong long memory of d̂z ≈ 0.3 in the loss differentials, see Table 5. Apparently, the tests are quite conservative for this DGP. This can be attributed to the difficulty of estimating the memory of the forecast error loss differential series with the standard local Whittle estimator, see also our discussion in Section 5.1.

Table 5. Estimated memory of the loss differentials d̂z, mean loss differential z̄, and rejection frequencies of the t-statistics for comparison of forecasts obtained from predictive regressions where the regressor variables have correlated innovations with the forecast objective

T      d̂z      z̄       tHAC    tMAC    tEFB
250    0.275    0.001    0.364   0.028   0.026
500    0.288    0.001    0.417   0.017   0.020
1000   0.288    0.005    0.445   0.008   0.012
2000   0.287    0.003    0.485   0.004   0.009

Altogether, these results demonstrate that memory transmission can indeed occur in a variety of situations, whether due to level shifts, cointegration, or correlated innovations, and in both stationary and nonstationary series.

5.3 Size and Power of Long Memory Robust t-Statistics

We now turn our attention to the empirical size and power properties of the memory robust tEFB and tMAC-statistics by using the same DGPs as in Section 5.1. Thereby, we reflect the situations covered by our propositions and the distributional properties that are realistic for the forecast error loss differential zt. However, we have to ensure that the loss differentials have zero expectation (size) and that the distance from the null is comparable for different DGPs (power). As DGP1, DGP2, DGP4, and DGP5 are constructed in a symmetric way, we have E[MSE(yt, ŷ1t) − MSE(yt, ŷ2t)] = 0, but E[QLIKE(yt, ŷ1t) − QLIKE(yt, ŷ2t)] ≠ 0, due to the asymmetry of the QLIKE loss function. Furthermore, DGP3 is not constructed in a symmetric way, so that E[z̃t] ≠ 0, irrespective of the loss function. We therefore have to correct the means of the loss differentials. In addition, different DGPs generate different degrees of long memory. Given that sample means of long memory processes with memory d converge at the rate T^(1/2−d), we consider the local power to achieve comparable results across DGPs. Let z̃t be generated as in (1), with the ŷit and yt as described in Section 5.1, and z̄ = (MT)^(−1) ∑i=1M ∑t=1T z̃t, where M denotes the number of Monte Carlo repetitions. Then the loss differentials are obtained via

zt = z̃t − z̄ + c · SD(z̃t)/T^(1/2−dz). (12)

The parameter c controls the distance from the null hypothesis (c = 0). Here, each realization of z̃t is centered with the average sample mean from M = 5000 simulations of the respective DGP. Similarly, dz is determined as the Monte Carlo average of the LPWN estimates for the respective setup.
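As a concrete reading of equation (12), the following sketch constructs the centered and shifted loss differentials. It assumes our reading of the formula, namely that the local alternative shrinks at the convergence rate T^(1/2−dz) of the sample mean; it is an illustration, not the authors' simulation code.

```python
import numpy as np

def local_power_differential(z_tilde, z_bar, c, d_z):
    """Equation (12): center z_tilde by the Monte Carlo average mean z_bar
    and add a local alternative of order T**(d_z - 1/2)."""
    T = len(z_tilde)
    shift = c * np.std(z_tilde) / T ** (0.5 - d_z)
    return z_tilde - z_bar + shift

# c = 0 reproduces the size experiments, c in {3, 6, 9} the power experiments.
rng = np.random.default_rng(0)
z = local_power_differential(rng.standard_normal(250), z_bar=0.0, c=3, d_z=0.2)
```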
In the power simulations, the memory parameters are set to d1 = d2 = d and dɛ1 = dɛ2 = d to keep the tables reasonably concise.

Table 6 presents the size results for the tMAC-statistic. It tends to be liberal for small T, but generally controls the size well in larger samples. There are, however, two exceptions. First, the test remains liberal for DGP3, even if the sample size increases. This effect is particularly pronounced for d = 0 and if zt is based on the QLIKE loss function. Second, the test is conservative for DGP2, particularly for increasing values of d. With regard to the bandwidth parameters, we find that the size is slightly better controlled with qd = 0.65 and q = 0.6. However, the bandwidth choice seems to have limited effects on the size of the test, especially in larger samples.

Table 6. Size results of the tMAC-statistic for the DGPs described in Section 5.1 and (12) with Gaussian innovations

                      q = 0.5                                  q = 0.6
                      MSE               QLIKE                  MSE               QLIKE
qd    T     DGP\d     0    0.2  0.4     0    0.2  0.4          0    0.2  0.4     0    0.2  0.4
0.65  250   1         0.10 0.08 0.07    0.10 0.09 0.13         0.09 0.07 0.07    0.10 0.10 0.13
            2         0.01 0.02 0.03    0.02 0.04 0.05         0.01 0.03 0.03    0.02 0.04 0.04
            3         0.10 0.09 0.09    0.13 0.10 0.09         0.10 0.08 0.09    0.13 0.11 0.10
            4         0.10 0.08 0.09    0.10 0.09 0.10         0.09 0.09 0.09    0.10 0.08 0.10
            5         0.04 0.05 0.05    0.04 0.06 0.05         0.03 0.04 0.05    0.03 0.05 0.06
      2000  1         0.05 0.05 0.04    0.06 0.05 0.08         0.05 0.05 0.03    0.06 0.05 0.07
            2         0.02 0.02 0.01    0.02 0.05 0.02         0.01 0.02 0.01    0.02 0.04 0.02
            3         0.07 0.05 0.04    0.15 0.11 0.06         0.06 0.05 0.04    0.15 0.11 0.05
            4         0.06 0.05 0.04    0.07 0.06 0.06         0.05 0.05 0.04    0.06 0.05 0.05
            5         0.04 0.05 0.02    0.04 0.06 0.02         0.03 0.05 0.02    0.03 0.06 0.02
0.8   250   1         0.08 0.07 0.06    0.10 0.09 0.12         0.09 0.06 0.05    0.09 0.08 0.11
            2         0.02 0.02 0.02    0.02 0.03 0.04         0.02 0.02 0.02    0.02 0.04 0.04
            3         0.10 0.06 0.07    0.12 0.09 0.08         0.09 0.07 0.09    0.12 0.08 0.08
            4         0.09 0.07 0.08    0.10 0.08 0.10         0.08 0.06 0.07    0.10 0.08 0.10
            5         0.04 0.05 0.03    0.04 0.05 0.04         0.04 0.05 0.03    0.04 0.05 0.03
      2000  1         0.06 0.05 0.04    0.06 0.06 0.08         0.05 0.04 0.03    0.06 0.06 0.07
            2         0.02 0.01 0.00    0.02 0.04 0.02         0.02 0.01 0.00    0.02 0.04 0.02
            3         0.07 0.05 0.06    0.16 0.11 0.07         0.06 0.04 0.05    0.14 0.11 0.06
            4         0.06 0.04 0.06    0.06 0.05 0.07         0.05 0.04 0.04    0.06 0.05 0.06
            5         0.04 0.04 0.01    0.04 0.05 0.02         0.04 0.04 0.01    0.04 0.05 0.01

Size results for the tEFB-statistic are displayed in Table 7. To analyze the impact of the bandwidth b and the kernel choice, we set qd = 0.65.8 Size performance is more favorable with the MQS kernel than with the Bartlett kernel. Furthermore, it is positively impacted by using a large value of b (0.6 or 0.9). Similar to the tMAC-statistic, we observe that the test is liberal with T = 250, but that the overall performance is very satisfactory for T = 2000. Again, the test tends to be liberal for DGP3, especially for QLIKE. However, if the MQS kernel and a larger b are used, this effect disappears nearly completely.
The conservative behavior of the test for DGP2 and large values of d is also the same as for the tMAC-statistic. The tMAC-statistic tends to perform better than the tEFB-statistic using the Bartlett kernel, but worse when considering the MQS kernel (cf. Tables 6 and 7).

Table 7. Size results of the tEFB-statistic for the DGPs described in Section 5.1 and (12) with Gaussian innovations and m = ⌊T^0.65⌋

                      MSE                                      QLIKE
Kernel                MQS               Bartlett               MQS               Bartlett
d     T     DGP\b     0.3  0.6  0.9     0.3  0.6  0.9          0.3  0.6  0.9     0.3  0.6  0.9
0     250   1         0.08 0.06 0.06    0.10 0.10 0.08         0.07 0.06 0.06    0.10 0.10 0.09
            2         0.03 0.03 0.03    0.02 0.02 0.02         0.04 0.03 0.03    0.02 0.02 0.02
            3         0.12 0.10 0.09    0.16 0.14 0.14         0.11 0.09 0.08    0.15 0.14 0.14
            4         0.07 0.06 0.06    0.10 0.09 0.09         0.08 0.06 0.07    0.10 0.10 0.09
            5         0.05 0.04 0.04    0.04 0.04 0.04         0.04 0.04 0.05    0.04 0.04 0.04
      2000  1         0.06 0.05 0.05    0.07 0.05 0.06         0.06 0.04 0.05    0.07 0.06 0.06
            2         0.03 0.03 0.03    0.02 0.02 0.03         0.04 0.03 0.03    0.03 0.03 0.03
            3         0.07 0.06 0.06    0.10 0.10 0.09         0.08 0.06 0.07    0.10 0.09 0.10
            4         0.06 0.05 0.05    0.06 0.06 0.06         0.07 0.06 0.06    0.08 0.07 0.07
            5         0.04 0.04 0.04    0.05 0.05 0.04         0.05 0.04 0.04    0.05 0.05 0.05
0.2   250   1         0.06 0.05 0.05    0.08 0.08 0.07         0.08 0.06 0.06    0.09 0.08 0.08
            2         0.04 0.02 0.03    0.03 0.03 0.03         0.04 0.03 0.03    0.04 0.04 0.04
            3         0.09 0.07 0.08    0.11 0.10 0.10         0.09 0.07 0.07    0.11 0.10 0.10
            4         0.07 0.06 0.06    0.08 0.08 0.07         0.07 0.06 0.05    0.09 0.09 0.08
            5         0.05 0.03 0.04    0.04 0.05 0.05         0.05 0.05 0.04    0.05 0.05 0.06
      2000  1         0.05 0.04 0.04    0.06 0.05 0.05         0.05 0.04 0.04    0.07 0.05 0.05
            2         0.03 0.03 0.03    0.02 0.03 0.03         0.04 0.04 0.04    0.04 0.04 0.05
            3         0.07 0.05 0.06    0.08 0.07 0.08         0.07 0.06 0.06    0.07 0.07 0.08
            4         0.05 0.04 0.04    0.06 0.06 0.05         0.06 0.05 0.05    0.06 0.06 0.06
            5         0.06 0.04 0.05    0.05 0.05 0.06         0.05 0.05 0.05    0.07 0.07 0.06
0.4   250   1         0.05 0.05 0.05    0.07 0.07 0.06         0.10 0.09 0.08    0.14 0.13 0.11
            2         0.03 0.03 0.03    0.03 0.03 0.04         0.05 0.04 0.04    0.05 0.04 0.05
            3         0.08 0.07 0.07    0.09 0.09 0.08         0.06 0.06 0.05    0.09 0.08 0.08
            4         0.08 0.06 0.07    0.09 0.09 0.09         0.09 0.07 0.06    0.11 0.11 0.11
            5         0.04 0.03 0.03    0.05 0.05 0.05         0.04 0.04 0.04    0.05 0.05 0.05
      2000  1         0.04 0.04 0.04    0.04 0.04 0.04         0.08 0.05 0.06    0.08 0.08 0.07
            2         0.02 0.02 0.02    0.02 0.02 0.01         0.03 0.03 0.03    0.03 0.03 0.03
            3         0.04 0.04 0.04    0.04 0.04 0.04         0.04 0.04 0.04    0.04 0.04 0.04
            4         0.05 0.05 0.04    0.05 0.05 0.05         0.05 0.05 0.05    0.06 0.06 0.06
            5         0.03 0.03 0.03    0.02 0.02 0.02         0.03 0.03 0.03    0.03 0.03 0.03

With regard to the power results in Table 8, we find that the power tends to be a bit lower for d = 0.4, which is likely due to the fact that the scaling parameter dz in equation (12) tends to be underestimated if the true value is larger (cf. Section 5.1). In general, the power of the tEFB-statistic appears to be better if the Bartlett kernel is used compared with the MQS kernel. For DGP1, the power of the test using the MQS kernel is particularly low. The performance of the tMAC-statistic lies somewhere in between the two tEFB-statistics, which means that the ordering is directly inverse to the performance in terms of size control. It can also be observed in Table 8 that, given a specific value of the loss differential, tests using the QLIKE loss function are generally more powerful than tests using MSE loss.9
Table 8. Local power for both the tEFB and tMAC-statistics for the DGPs described in Section 5.1 and (12) with Gaussian innovations, b = 0.6, and m = ⌊T^0.65⌋

                   tEFB                                                                                  tMAC
                   MSE                                        QLIKE                                      MSE                   QLIKE
Kernel             Bartlett             MQS                   Bartlett             MQS
d    T     DGP\c   0    3    6    9     0    3    6    9      0    3    6    9     0    3    6    9      0    3    6    9     0    3    6    9
0    250   1       0.09 0.15 0.31 0.45  0.06 0.10 0.19 0.30   0.10 0.69 0.93 0.99  0.06 0.56 0.79 0.91   0.10 0.17 0.34 0.49  0.11 0.65 0.72 0.72
           2       0.02 0.25 0.68 0.83  0.03 0.13 0.37 0.59   0.02 0.94 1.00 1.00  0.03 0.89 0.96 0.99   0.01 0.29 0.76 0.84  0.02 0.88 0.90 0.90
           3       0.10 0.46 0.70 0.86  0.07 0.31 0.58 0.72   0.13 1.00 1.00 1.00  0.08 0.96 1.00 1.00   0.11 0.50 0.67 0.72  0.14 0.76 0.76 0.78
           4       0.09 0.28 0.54 0.68  0.06 0.17 0.38 0.55   0.10 0.95 1.00 1.00  0.06 0.83 0.98 1.00   0.10 0.31 0.57 0.67  0.12 0.75 0.75 0.76
           5       0.05 0.69 0.87 0.92  0.03 0.40 0.76 0.86   0.05 1.00 1.00 1.00  0.04 0.98 1.00 1.00   0.04 0.75 0.87 0.87  0.04 0.90 0.91 0.91
     2000  1       0.06 0.13 0.30 0.49  0.04 0.09 0.18 0.32   0.06 0.78 0.97 1.00  0.05 0.60 0.86 0.95   0.06 0.15 0.34 0.52  0.07 0.69 0.79 0.81
           2       0.02 0.44 0.84 0.90  0.03 0.21 0.53 0.76   0.03 0.99 1.00 1.00  0.03 0.96 1.00 1.00   0.01 0.53 0.88 0.89  0.02 0.94 0.95 0.95
           3       0.07 0.57 0.87 0.96  0.05 0.38 0.71 0.85   0.15 1.00 1.00 1.00  0.09 0.99 1.00 1.00   0.07 0.61 0.79 0.84  0.16 0.90 0.91 0.92
           4       0.06 0.32 0.66 0.81  0.05 0.20 0.46 0.66   0.08 0.99 1.00 1.00  0.05 0.92 1.00 1.00   0.06 0.36 0.69 0.76  0.07 0.85 0.88 0.89
           5       0.04 0.88 0.94 0.98  0.04 0.62 0.90 0.95   0.05 1.00 1.00 1.00  0.04 1.00 1.00 1.00   0.03 0.90 0.92 0.92  0.03 0.96 0.96 0.96
0.2  250   1       0.08 0.13 0.24 0.38  0.05 0.09 0.16 0.26   0.10 0.62 0.88 0.97  0.06 0.49 0.74 0.87   0.08 0.15 0.28 0.42  0.11 0.59 0.69 0.70
           2       0.03 0.18 0.49 0.68  0.03 0.10 0.27 0.46   0.04 0.86 0.98 1.00  0.03 0.76 0.92 0.98   0.02 0.20 0.55 0.73  0.03 0.81 0.83 0.84
           3       0.09 0.35 0.62 0.79  0.06 0.22 0.48 0.63   0.11 1.00 1.00 1.00  0.07 0.93 1.00 1.00   0.09 0.38 0.63 0.70  0.12 0.76 0.78 0.76
           4       0.08 0.21 0.44 0.61  0.05 0.14 0.31 0.46   0.10 0.89 1.00 1.00  0.05 0.76 0.95 0.99   0.09 0.23 0.48 0.62  0.10 0.72 0.75 0.76
           5       0.05 0.56 0.79 0.86  0.04 0.34 0.64 0.79   0.05 0.99 1.00 1.00  0.04 0.94 1.00 1.00   0.04 0.62 0.79 0.83  0.05 0.85 0.87 0.87
     2000  1       0.05 0.11 0.26 0.44  0.04 0.08 0.16 0.27   0.06 0.74 0.95 0.99  0.05 0.54 0.83 0.93   0.05 0.13 0.29 0.51  0.06 0.72 0.84 0.87
           2       0.03 0.26 0.64 0.80  0.03 0.14 0.37 0.59   0.05 0.91 1.00 1.00  0.04 0.82 0.95 0.99   0.02 0.29 0.71 0.81  0.04 0.83 0.86 0.87
           3       0.06 0.47 0.82 0.93  0.05 0.30 0.62 0.80   0.11 1.00 1.00 1.00  0.08 0.98 1.00 1.00   0.05 0.54 0.81 0.87  0.11 0.92 0.94 0.94
           4       0.06 0.25 0.58 0.76  0.05 0.16 0.36 0.56   0.06 0.97 1.00 1.00  0.04 0.88 0.99 1.00   0.05 0.27 0.64 0.78  0.05 0.88 0.92 0.93
           5       0.06 0.73 0.89 0.94  0.05 0.47 0.79 0.89   0.07 1.00 1.00 1.00  0.05 0.99 1.00 1.00   0.05 0.78 0.86 0.88  0.07 0.90 0.91 0.91
0.4  250   1       0.07 0.10 0.17 0.25  0.05 0.08 0.12 0.18   0.12 0.47 0.72 0.86  0.10 0.35 0.57 0.71   0.07 0.11 0.19 0.28  0.14 0.43 0.59 0.65
           2       0.03 0.08 0.18 0.29  0.03 0.05 0.11 0.19   0.05 0.59 0.83 0.93  0.03 0.46 0.68 0.81   0.03 0.08 0.21 0.31  0.05 0.59 0.69 0.72
           3       0.10 0.17 0.37 0.55  0.07 0.13 0.26 0.39   0.10 0.93 1.00 1.00  0.07 0.78 0.96 1.00   0.11 0.19 0.40 0.52  0.11 0.67 0.69 0.70
           4       0.08 0.14 0.25 0.37  0.07 0.11 0.18 0.25   0.10 0.67 0.91 0.98  0.08 0.52 0.77 0.90   0.10 0.16 0.27 0.38  0.11 0.57 0.66 0.68
           5       0.04 0.19 0.39 0.51  0.04 0.12 0.27 0.39   0.05 0.82 0.96 0.99  0.04 0.70 0.90 0.95   0.05 0.21 0.41 0.54  0.06 0.70 0.76 0.77
     2000  1       0.05 0.07 0.12 0.21  0.03 0.05 0.09 0.14   0.07 0.43 0.71 0.87  0.06 0.30 0.54 0.71   0.04 0.07 0.13 0.24  0.08 0.45 0.68 0.78
           2       0.01 0.04 0.12 0.24  0.02 0.04 0.09 0.16   0.03 0.56 0.81 0.92  0.03 0.41 0.67 0.79   0.01 0.02 0.11 0.24  0.03 0.60 0.78 0.84
           3       0.05 0.12 0.32 0.54  0.05 0.08 0.21 0.35   0.06 0.94 1.00 1.00  0.05 0.78 0.97 1.00   0.05 0.12 0.38 0.60  0.06 0.82 0.87 0.87
           4       0.06 0.09 0.17 0.31  0.04 0.07 0.12 0.20   0.06 0.63 0.91 0.98  0.05 0.46 0.76 0.89   0.05 0.10 0.20 0.37  0.06 0.64 0.81 0.84
           5       0.03 0.13 0.35 0.54  0.03 0.09 0.21 0.36   0.03 0.83 0.97 0.99  0.03 0.69 0.88 0.95   0.02 0.13 0.38 0.57  0.02 0.80 0.88 0.90

Altogether, we find that the tEFB-statistic using the MQS kernel should be used in combination with the QLIKE loss function and for smaller samples, since it is the only specification that offers reliable size control. In larger samples and under MSE loss, all of the tests have satisfactory size. In this case, the tEFB-statistic with the Bartlett kernel is preferable in terms of power. However, since the size control of the tMAC-statistic is better than that of the tEFB-statistic with the Bartlett kernel, and its power is better than that using the MQS kernel, it can still be a sensible option in intermediate cases.

6 Applications to Realized Volatility Forecasting

Due to its relevance for risk management and derivative pricing, volatility forecasting is of vital importance; it is also one of the fields in which long memory models are applied most often (cf. Deo, Hurvich, and Lu (2006), Martens, Van Dijk, and De Pooter (2009), and Chiriac and Voev (2011)). Since intraday data on financial transactions have become widely available, the focus has shifted from GARCH-type models to the direct modeling of realized volatility series. In particular, the heterogeneous autoregressive model (HAR-RV) of Corsi (2009) and its extensions have emerged as one of the most popular approaches. We reevaluate some recent results from the related literature, in particular from Bekaert and Hoerova (2014), using traditional DM tests as well as the long memory robust versions from Section 4. We use a data set of 5-minute log-returns of the S&P500 Index from January 2, 1996 to August 31, 2015, and we include close-to-open returns. In total, we have T = 4883 observations in our sample. The raw data are obtained from the Thomson Reuters Tick History Database. Before we turn to the forecast evaluations in Sections 6.1 and 6.2, we use the remainder of this section to define the relevant volatility variables and to introduce the data and the employed time series models.

Define the j-th intraday return on day t by rt,j and let there be N intraday returns per day. The daily realized variance is then defined as RVt = ∑j=1N rt,j^2, see, for example, Andersen et al. (2001) and Barndorff-Nielsen and Shephard (2002).
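The realized variance estimator is straightforward to state in code. The short sketch below is illustrative only; the array shape and the 78 five-minute returns per day are assumptions made for the example, not properties of the data set used here.

```python
import numpy as np

def realized_variance(intraday_returns):
    """RV_t = sum_j r_{t,j}^2 for each day t (one row per day)."""
    return np.sum(np.asarray(intraday_returns) ** 2, axis=1)

# Hypothetical (T, N) array of 5-minute log-returns, T days with N = 78 returns
r = np.random.default_rng(1).standard_normal((10, 78)) * 1e-3
log_rv = np.log(realized_variance(r))  # the models below work with log RV
```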
If rt,j is sampled at an ever-increasing frequency such that N → ∞, RVt provides a consistent estimate of the quadratic variation of the log-price process. Therefore, RVt is usually treated as a direct observation of the stochastic volatility process. The HAR-RV model of Corsi (2009), for example, explains log-realized variance by an autoregression involving overlapping averages of past realized variances. Similar to the notation in Bekaert and Hoerova (2014), the model reads

ln RVt(h) = α + ρ1 ln RVt−h(1) + ρ5 ln RVt−h(5) + ρ22 ln RVt−h(22) + ɛt, (13)

where RVt(h) = (22/h) ∑j=0h−1 RVt−j and ɛt is a white noise process. Although this is formally not a long memory model, this simple process provides a good approximation to the slowly decaying autocorrelation functions of long memory processes in finite samples. Forecast comparisons show that the HAR-RV model performs similarly to ARFIMA models (cf. Corsi (2009)).

Motivated by developments in derivative pricing highlighting the importance of jumps in price processes, Andersen, Bollerslev, and Diebold (2007) extend the HAR-RV model to consider jump components in realized volatility. Here, the underlying model for the continuous-time log-price process p(t) is given by dp(t) = μ(t)dt + σ(t)dW(t) + κ(t)dq(t), where 0 ≤ t ≤ T, μ(t) has locally bounded variation, σ(t) is a strictly positive stochastic volatility process that is càdlàg, and W(t) is a standard Brownian motion. The counting process q(t) takes the value dq(t) = 1 if a jump is realized and is allowed to have time-varying intensity. Finally, the process κ(t) determines the size of discrete jumps in case these are realized. The quadratic variation of the cumulative return process can thus be decomposed into integrated volatility plus the sum of squared jumps: [r]tt+h = ∫tt+h σ²(s)ds + ∑t<s≤t+h κ²(s).

To measure the integrated volatility component, Barndorff-Nielsen and Shephard (2004, 2006) introduce the concept of bipower variation (BPV) as an alternative estimator that is robust to the presence of jumps. Here, we use threshold bipower variation (TBPV) as suggested by Corsi, Pirino, and Renò (2010), who show that BPV can be biased in finite samples. TBPV is defined as

TBPVt = (π/2) ∑j=2N |rt,j| |rt,j−1| I(|rt,j|² ≤ ζj) I(|rt,j−1|² ≤ ζj−1),

where ζj is a strictly positive, random threshold function as specified in Corsi, Pirino, and Renò (2010), and I(·) is an indicator function.10 Since TBPVt →p ∫tt+1 σ²(s)ds for N → ∞, one can decompose the realized volatility into the continuous integrated volatility component Ct and the jump component Jt as

Jt = max{RVt − TBPVt, 0} · I(C-Tz > 3.09), Ct = RVt − Jt.

Following Bekaert and Hoerova (2014), we express all variables in monthly units: Ct(h) = (22/h) ∑j=0h−1 Ct−j and Jt(h) = (22/h) ∑j=0h−1 Jt−j. The indicator function I(C-Tz > 3.09) ensures that the jump component is set to zero if it is insignificant at the nominal 0.1% level, so that Jt is not contaminated by measurement error, see also Corsi and Renò (2012). For details on the C-Tz-statistic, see Corsi, Pirino, and Renò (2010). In contrast to previous studies reporting an insignificant or negative impact of jumps, Corsi, Pirino, and Renò (2010) show that the impact of jumps on future realized volatility is significant and positive. We use the HAR-RV-TCJ model that is studied in Bekaert and Hoerova (2014):

ln RVt(h) = α + ρ1 ln Ct−h(1) + ρ5 ln Ct−h(5) + ρ22 ln Ct−h(22) + ϖ1 ln(1 + Jt−h(1)) + ϖ5 ln(1 + Jt−h(5)) + ϖ22 ln(1 + Jt−h(22)) + ɛt. (14)
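To make the regression structure of equation (13) concrete, the following sketch builds the aggregated regressors and fits the HAR-RV model by OLS. It is a minimal illustration under our own conventions (plain numpy least squares, horizon h = 22); it is not the estimation code used in the paper, and the continuous and jump terms of equation (14) would enter analogously as additional columns.

```python
import numpy as np

def rv_agg(rv, h):
    """Monthly-scaled aggregate RV_t^{(h)} = (22/h) * sum_{j=0}^{h-1} RV_{t-j}."""
    out = np.full(len(rv), np.nan)
    for t in range(h - 1, len(rv)):
        out[t] = 22.0 / h * rv[t - h + 1:t + 1].sum()
    return out

def har_fit(rv, h=22):
    """OLS fit of ln RV_t^{(h)} on ln RV_{t-h}^{(1)}, ln RV_{t-h}^{(5)}, ln RV_{t-h}^{(22)}."""
    y = np.log(rv_agg(rv, h))
    X = np.column_stack([np.log(rv_agg(rv, k)) for k in (1, 5, 22)])
    rows = np.arange(h + 21, len(rv))      # rows where all lagged aggregates exist
    Y, X_lag = y[rows], X[rows - h]        # regressors lagged by the horizon h
    X_lag = np.column_stack([np.ones(len(rows)), X_lag])
    beta, *_ = np.linalg.lstsq(X_lag, Y, rcond=None)
    return beta                            # [alpha, rho1, rho5, rho22]

beta = har_fit(np.exp(np.random.default_rng(2).standard_normal(1500)))  # toy data
```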
The daily log-realized variance series (ln RVt) is depicted in Figure 2. It is common to use log-realized variance to avoid nonnegativity constraints on the parameters and to obtain a better approximation to the normal distribution, as advocated by Andersen et al. (2001). As can be seen from Figure 2, the series shows the typical features of a long memory time series, namely strong serial dependence as well as local trends.

Figure 2. Daily log-realized volatility of the S&P500 index and its autocorrelation function.

Estimates of the memory parameter are shown in Table 9. Local Whittle estimates (d̂LW) exceed 0.5 slightly and thus indicate a mild form of nonstationarity. Since there is a large literature on the potential of spurious long memory in volatility time series, we carry out the test of Qu (2011). To avoid issues due to nonstationarity and to increase the power of the test, we follow Kruse (2015) and apply the test to the fractional difference of the data. The necessary degree of differencing is determined using the estimator of Hou and Perron (2014) (d̂HP) that is robust to low-frequency contaminations. The comparison of the W̃z statistic to its critical values reveals that the test fails to reject the null hypothesis of true long memory.

Table 9. Long memory estimation and testing results for S&P500 log-realized volatility

q      d̂LW     d̂HP (SE)        W̃z      d̂(0,0) (SE)     d̂(1,0) (SE)     d̂(1,1) (SE)
0.55   0.554    0.493 (0.048)    0.438    0.613 (0.088)    0.612 (0.132)    0.689 (0.163)
0.60   0.553    0.522 (0.039)    0.568    0.567 (0.074)    0.577 (0.110)    0.692 (0.131)
0.65   0.573    0.573 (0.032)    0.544    0.573 (0.059)    0.570 (0.089)    0.570 (0.118)
0.70   0.549    0.532 (0.026)    0.449    0.573 (0.048)    0.578 (0.072)    0.588 (0.093)
0.75   0.539    0.518 (0.021)    0.515    0.564 (0.039)    0.574 (0.058)    0.593 (0.075)

Notes: Local Whittle estimates for the d parameter and results of the Qu (2011) test (W̃z, modified statistic by Kruse (2015)) for true versus spurious long memory are reported for various bandwidth choices md = ⌊T^q⌋. Critical values are 1.118, 1.252, and 1.517 at the nominal significance level of 10%, 5%, and 1%, respectively. Asymptotic standard errors are given in parentheses. The indices of the LPWN estimators indicate the orders of the polynomials used.
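For reference, a minimal sketch of the local Whittle estimator behind the d̂LW column of Table 9 is given below; it follows the standard concentrated objective with bandwidth m = ⌊T^q⌋. The LPWN variant, which adds low-order polynomial terms for the perturbation noise, is not reproduced here, and the optimization bounds on d are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def local_whittle(x, q=0.65):
    """Local Whittle estimate of the memory parameter d with m = floor(T**q)."""
    T = len(x)
    m = int(np.floor(T ** q))
    lam = 2 * np.pi * np.arange(1, m + 1) / T
    # Periodogram at the first m Fourier frequencies
    I = np.abs(np.fft.fft(x - x.mean())[1:m + 1]) ** 2 / (2 * np.pi * T)
    def R(d):  # concentrated local Whittle objective
        return np.log(np.mean(lam ** (2 * d) * I)) - 2 * d * np.mean(np.log(lam))
    return minimize_scalar(R, bounds=(-0.49, 0.99), method="bounded").x
```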
Since N is finite in practice, ln RVt might contain a measurement error and is therefore often modeled as the sum of the quadratic variation and an iid perturbation process such that ln RVt = ln [r]tt+1 + ut, where ut ∼ iid(0, σu²). Furthermore, it is well known that local Whittle estimates can be biased in the presence of short-run dynamics. To this end, we report results of the LPWN estimator applied to ln RVt. The estimates remain remarkably stable, irrespective of the choice of the estimator. The downward bias of the local Whittle estimator due to the measurement error in realized variance is therefore moderate. Altogether, the realized variance series appears to be a long memory process. Consequently, if forecasts of the series are evaluated, a transmission of long-range dependence to the loss differentials as implied by Propositions 2–5 may occur. This would invalidate conventional DM tests, as shown in Proposition 6 and Figure 1, highlighting the importance of the robust tMAC and tEFB-statistics discussed in Section 4.

6.1 Predictive Ability of the VIX for Quadratic Variation

The predictive ability of implied volatility for future realized volatility is an issue that has received a lot of attention in the related literature. The CBOE VIX represents the market expectation of the quadratic variation of the S&P500 over the next month, derived under the assumption of risk-neutral pricing. Both ln(VIXt²/12) and ln RVt+22(22) are depicted in Figure 3. The two series behave fairly similarly and are quite persistent. As for the log-realized volatility series, the Qu (2011) test does not reject the null hypothesis of true long memory for the VIX after appropriate fractional differencing following Kruse (2015).

Figure 3. Log squared implied volatility and log cumulative realized volatility of the S&P500 (left panel) and variance difference VDt = ln(VIXt²/12) − ln RVt+22(22) (right panel).
Chernov (2007) investigates the role of a variance risk premium in the market for volatility forecasting. The variance risk premium is given by VPt = VIXt²/12 − RVt+22(22), where VIXt²/12 is expressed in monthly percentages squared, as VIXt is given in annualized percent (as in Bekaert and Hoerova (2014)). A related variable that is used, for example, by Bollerslev et al. (2013) is the variance difference VDt = ln(VIXt²/12) − ln(RVt+22(22)) that is displayed in the right panel of Figure 3. The graph clearly suggests that the VIX tends to overestimate the realized variance; the sample average of the variance difference is 0.623. Furthermore, the linear combination of log-realized and log-implied volatility is rather persistent and has a significant memory of d̂LPWN = 0.2. This is consistent with the existence of a fractional cointegration relationship between ln(VIXt²/12) and ln RVt+22(22), which has been considered in several contributions including Christensen and Nielsen (2006), Nielsen (2007), and Bollerslev et al. (2013). Bollerslev, Tauchen, and Zhou (2009), Bekaert and Hoerova (2014), and Bollerslev et al. (2013) additionally extend the analysis toward the predictive ability of VDt for stock returns.

While the aforementioned articles test the predictive ability of the VIX itself and the "implied-realized parity", there has also been a series of studies that analyze whether the inclusion of implied volatility can improve model-based forecasts. On the one hand, Becker, Clements, and White (2007) conclude that the VIX does not contain any incremental information on future volatility relative to an array of forecasting models. On the other hand, Becker, Clements, and McClelland (2009) show that the VIX subsumes information on past jump activity and contains incremental information on future jumps if continuous components and jump components are considered separately. Similarly, Busch, Christensen, and Nielsen (2011) study a HAR-RV model with continuous components and jumps and propose a VecHAR-RV model. They find that the VIX has incremental information and partially predicts jumps.

Motivated by these findings, we test whether the inclusion of ln(VIXt²/12) improves model-based forecasts from HAR-RV-type models, using DM statistics. Since the VIX can be seen as a forecast of future quadratic variation over the next month, we consider a 22-step forecast horizon. Consecutive observations of multistep forecasts of stock variables, such as integrated realized volatility, can be expected to exhibit relatively persistent short memory dynamics. The empirical autocorrelations of these loss differentials reveal an MA structure with linearly decaying coefficients. We therefore base all our robust statistics on the LPWN estimator discussed above.11 Since Chen and Ghysels (2011) and Corsi and Renò (2012) show that the inclusion of leverage effects improves forecasts, we also include a comparison of the HAR-RV-TCJ-L model and the HAR-RV-TCJ-L-VIX model. For details on the HAR-RV-TCJ-L model, see Corsi and Renò (2012) and Equation (2) in Bekaert and Hoerova (2014). Furthermore, as in Bekaert and Hoerova (2014), log-normality is assumed for the logarithmic realized volatility forecasts when applying the exponential transformation in the following comparison of realized volatility forecasts, see their Equation (6) for further details. Table 10 reports the results on forecast evaluation.
Models are estimated using a rolling window of T_w = 1000 observations.12 This implies that the forecast window contains 3883 observations. All DM tests are conducted against one-sided alternatives; in each case, the alternative is that the more complex model outperforms its parsimonious version.

Table 10. Predictive ability of the VIX for future RV (evaluated under (i) MSE loss in Panel A and (ii) QLIKE loss in Panel B)

                 Summary statistics               Short memory          tMAC (q_d)           tEFB (b)
Model            z̄/σ̂_z  g_1   g_2   d̂_LW   d̂_LPWN   tDM    tHAC   tFB     0.7   0.75  0.8     0.2     0.4     0.6     0.8

Panel A: MSE
HAR-RV           0.14   0.29  0.27  0.22*  0.24*    2.97   3.03   2.49    0.93  1.04  1.18    2.49    2.75    2.99    2.85
                                                                                              (3.40)  (4.06)  (4.75)  (5.39)
HAR-RV-TCJ       0.11   0.26  0.27  0.18*  0.14     2.42   2.46   2.10    1.40  1.61  1.89    2.10    2.50    2.89    2.72
                                                                                              (2.61)  (3.15)  (3.69)  (4.23)
HAR-RV-TCJ-L     0.08   0.28  0.27  0.18*  0.16     1.78   1.79   1.82    0.90  1.03  1.20    1.82    2.15    2.43    2.32
                                                                                              (3.40)  (4.06)  (4.75)  (5.39)
5% crit. values                                    (1.65) (1.65) (2.09)  (1.65)

Panel B: QLIKE
HAR-RV           0.15   2.03  2.02  0.23*  0.20     3.07   3.25   2.77    1.30  1.46  1.66    2.71    3.29    3.29    3.13
                                                                                              (4.23)  (5.96)  (8.50)  (11.34)
HAR-RV-TCJ       0.13   2.03  2.02  0.19*  0.13     2.72   2.81   3.00    1.73  1.98  2.30    2.98    3.92    3.62    3.59
                                                                                              (3.35)  (4.87)  (7.01)  (9.41)
HAR-RV-TCJ-L     0.10   2.02  2.02  0.19*  0.10     2.07   2.05   2.86    1.58  1.85  2.18    2.90    3.34    3.21    3.47
                                                                                              (3.35)  (4.87)  (7.01)  (9.41)
5% crit. values                                    (1.65) (1.65) (2.37)  (1.65)

Notes: Models excluding the VIX are tested against models including the VIX. Reported are the standardized mean (z̄/σ̂_z) and the estimated memory parameters (d̂_LW, d̂_LPWN) of the forecast error loss differential, the respective out-of-sample losses of the models (g_1 and g_2), and the values of the various DM test statistics. Bold-faced values indicate significance at the nominal 5% level; an additional star indicates significance at the nominal 1% level. Critical values at the nominal 5% level are given in parentheses; for the tEFB-statistic they depend on d̂ and are therefore reported beneath each row.

In accordance with our recommendations from Section 5, the tEFB tests are carried out using the Bartlett kernel if MSE loss is considered and the MQS kernel if QLIKE loss is used. For better comparability, we choose the Bartlett or the quadratic spectral kernel for the tFB-statistic accordingly. As in the previous literature, the tDM-statistic is implemented using an MA approximation with 44 lags for the forecast horizon of 22 days (cf., for instance, Bekaert and Hoerova (2014)). For the tHAC-statistic, we use an automatic bandwidth selection procedure, and the tFB-statistic is computed using b = 0.2, which offers a good trade-off between size control and power, as confirmed in the simulation studies of Sun, Phillips, and Jin (2008). Panel A of Table 10 reveals that the forecast error loss differentials have long memory, with d parameters between 0.14 and 0.24. The results are very similar for the local Whittle and the LPWN estimators.
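To make the construction of the short memory statistics concrete, the following Python sketch computes MSE and QLIKE loss differentials and a DM t-statistic with a truncated long-run variance estimate. The rectangular truncation at `n_lags` autocovariances is our reading of the "MA approximation"; all function names are our own and the sketch is not the authors' code.

```python
import numpy as np

def mse_loss(proxy, fcst):
    # squared error loss
    return (proxy - fcst) ** 2

def qlike_loss(proxy, fcst):
    # QLIKE loss in the form L = proxy/fcst - log(proxy/fcst) - 1 (cf. Patton (2011))
    r = proxy / fcst
    return r - np.log(r) - 1.0

def dm_statistic(z, n_lags):
    """DM t-statistic for H0: E[z] = 0 with a rectangular (truncated)
    long-run variance estimate using n_lags sample autocovariances."""
    z = np.asarray(z, dtype=float)
    T = z.size
    zc = z - z.mean()
    lrv = np.dot(zc, zc) / T                       # gamma_hat(0)
    for j in range(1, n_lags + 1):
        lrv += 2.0 * np.dot(zc[j:], zc[:-j]) / T   # 2 * gamma_hat(j)
    return z.mean() / np.sqrt(lrv / T)

# 22-step horizon: z = mse_loss(rv, fcst1) - mse_loss(rv, fcst2); dm_statistic(z, 44)
```

Note that the rectangular kernel does not guarantee a positive variance estimate in finite samples; the kernel-based (tHAC) and fixed-b (tFB) versions replace the truncation with Bartlett or quadratic spectral weights.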
This is a clear indication that memory transmission to the loss differential is taking place, as predicted by Propositions 2–5 and the simulations in Section 5.1. The standard DM statistics (tDM, tHAC, and tFB) reject the null hypothesis of equal predictive accuracy, thereby confirming the findings in the previous literature. However, if the memory robust statistics in the right panel of Table 10 are taken into account, all evidence for a superior predictive ability of the models including the VIX vanishes. The previous rejections might therefore be spurious, reflecting the theoretical findings in Proposition 6: given the persistence in the loss differential series, the improvements are too small to be considered significant. These findings highlight the importance of long memory robust tests for forecast comparisons in practice. As a further comparison, we also consider the QLIKE loss function from Equation (10) in addition to the MSE; see the lower Panel B of Table 10. The motivation is that realized volatility is generally considered to be an unbiased, but perturbed, proxy of the underlying latent volatility process. Patton (2011) shows that, among the commonly employed loss functions, only MSE and QLIKE preserve the true ranking of competing forecasts when the evaluation is based on a perturbed proxy. Even though the propositions presented above only apply to the MSE loss function, the simulations in Section 5.1 clearly show that, in the same setting, memory transmission of a magnitude similar to the MSE case occurs. The results in Panel B suggest that the average standardized forecast error loss differentials are positive and slightly larger in magnitude compared with those for the MSE case. Moreover, they have a similar memory structure. From this descriptive viewpoint, the results are not sensitive to the choice between the QLIKE and the MSE loss function. When using short memory inference, the null hypothesis of pairwise equal predictive accuracy among the models is rejected in all cases. This is also in line with the previous results. However, when turning to the long memory robust statistics, we obtain somewhat different results. On the one hand, the tMAC-statistic rejects for the majority of models and bandwidth parameters. On the other hand, the tEFB-statistic does not generate any rejections. We observe in Table 8 that the tests using QLIKE loss generate considerably more power, and that the power of the tEFB-statistic using the MQS kernel is lower than that of the tMAC-statistic. One could therefore conclude that the inclusion of the VIX improves the accuracy of the forecasts. However, we also find that the tMAC-statistic can be liberal in some situations if the QLIKE loss function is used. Taking these issues into account, the evidence for a superior predictive ability of the models including the VIX is weak, and considerably weaker than tests that do not allow for the long memory property of the loss differentials would suggest.

6.2 Separation of Continuous Components and Jump Components

As a second empirical application, we consider the role of jumps. We revisit the question of whether the extended HAR-RV-TCJ model from Equation (14) leads to a significant improvement in forecast performance compared with the standard HAR-RV model defined in Equation (13). The continuous and jump components, separated using the approach described above, are shown in Figure 4. The occurrence of jumps is often associated with macroeconomic events (cf.
Barndorff-Nielsen and Shephard (2006) and Andersen, Bollerslev, and Diebold (2007)), and they are observed relatively frequently, on about 40% of the days in the sample. The trajectory of the integrated variance closely follows that of the log-realized volatility series.

Figure 4. Log continuous component ln C_t and jump component ln(1+J_t) of ln RV_t.

Table 11 shows the results of our forecasting exercise for h ∈ {1, 5, 22} steps. Similar to the previous analysis, the tDM-statistic is implemented using an MA approximation with 5, 10, or 44 lags for the forecast horizons h = 1, 5, and 22, respectively, as is customary in the literature. All other specifications are the same as before. The standard tests (tDM, tHAC, and tFB) agree on rejecting the null hypothesis of equal predictive accuracy in favor of a better performance of the HAR-RV-TCJ model for h = 1 and h = 5, but not for h = 22.

Table 11. Separation of continuous and jump components (evaluated under (i) MSE loss in Panel A and (ii) QLIKE loss in Panel B)

                 Summary statistics               Short memory          tMAC (q_d)             tEFB (b)
Horizon          z̄/σ̂_z  g_1   g_2   d̂_LW   d̂_LPWN   tDM    tHAC   tFB     0.7    0.75   0.8     0.2     0.4     0.6     0.8

Panel A: MSE
h = 1            0.12   0.41  0.38  0.09*  0.13     6.93   7.63   4.00    3.40   3.31   3.26    4.00    4.07    4.47    4.95
                                                                                                (2.61)  (3.15)  (3.69)  (4.23)
h = 5            0.09   0.26  0.25  0.07   0.01     3.67   3.79   2.79    3.60   3.83   4.25    2.79    3.98    5.10    5.85
                                                                                                (2.05)  (2.52)  (2.98)  (3.39)
h = 22           0.05   0.29  0.29  0.36*  0.34*    0.78   0.91   0.67    0.14   0.15   0.17    0.67    0.93    1.06    1.16
                                                                                                (4.70)  (5.55)  (6.41)  (7.28)
5% crit. values                                    (1.65) (1.65) (2.09)  (1.65)

Panel B: QLIKE
h = 1            0.07   1.93  1.93  0.02   0.01     4.41   4.24   2.68    4.07   3.83   3.85    2.60    2.63    3.00    3.76
                                                                                                (2.77)  (4.10)  (5.97)  (7.99)
h = 5            0.04   1.99  1.99  0.09*  0.02     1.38   1.42   1.36    1.32   1.38   1.51    1.50    4.22    3.05    11.27
                                                                                                (2.77)  (4.10)  (5.97)  (7.99)
h = 22          –0.01   2.03  2.03  0.43*  0.38*   –0.12  –0.14  –0.12   –0.02  –0.02  –0.02   –0.12   –0.21   –0.17   –0.21
                                                                                                (8.85)  (11.77) (16.17) (21.60)
5% crit. values                                    (1.65) (1.65) (2.37)  (1.65)

Notes: The forecast performance of the HAR-RV-TCJ model is tested against the HAR-RV model for different forecast horizons. Reported are the standardized mean (z̄/σ̂_z) and the estimated memory parameters (d̂_LW, d̂_LPWN) of the forecast error loss differential, the respective out-of-sample losses of the models (g_1 and g_2), and the values of the various DM test statistics. Bold-faced values indicate significance at the nominal 5% level; an additional star indicates significance at the nominal 1% level. Critical values at the nominal 5% level are given in parentheses; for the tEFB-statistic they depend on d̂ and are therefore reported beneath each row.
If we consider the estimates of the memory parameter (for MSE loss in Panel A of Table 11), strong (stationary) long memory of 0.34 is only found for h = 22. For the smaller forecast horizons h = 1 and h = 5, the LPWN estimates are no longer significantly different from zero, since their asymptotic variance is inflated by a multiplicative constant that is larger for smaller values of d. However, the local Whittle estimates remain significant at d̂_LW = 0.09 and d̂_LW = 0.07, which is qualitatively similar to the results obtained using the LPWN estimator. Therefore, there is again evidence for a transmission of memory to the forecast error loss differential, and the rejections of equal predictive accuracy obtained using standard tests might be spurious. Nevertheless, the improvement in forecast accuracy is large enough that the long memory robust tMAC- and tEFB-statistics reject across the board for h = 1 and h = 5.
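For completeness, a minimal sketch of how a MAC-based t-statistic can be formed is given below. It reflects our reading of Robinson (2005) and Abadir, Distaso, and Giraitis (2009): the spectral constant G is estimated from the first m periodogram ordinates and rescaled by a constant p(d). Treat the exact form of p(d), the bandwidth default, and all names as assumptions of this sketch rather than as the authors' implementation.

```python
import numpy as np
from scipy.special import gamma as gamma_fn

def t_mac(z, d_hat, q=0.8):
    """MAC-based t-statistic for H0: E[z] = 0 when z has memory parameter d_hat.
    Bandwidth m = floor(T^q); the table columns above use q in {0.7, 0.75, 0.8}."""
    z = np.asarray(z, dtype=float)
    T = z.size
    m = int(T ** q)
    lam = 2.0 * np.pi * np.arange(1, m + 1) / T
    I = np.abs(np.fft.fft(z - z.mean())[1:m + 1]) ** 2 / (2.0 * np.pi * T)
    G_hat = np.mean(lam ** (2.0 * d_hat) * I)   # local estimate of the spectral constant
    if abs(d_hat) < 1e-10:
        p = 2.0 * np.pi                         # p(0) = 2*pi recovers the short memory case
    else:
        p = (2.0 * gamma_fn(1.0 - 2.0 * d_hat) * np.sin(np.pi * d_hat)
             / (d_hat * (1.0 + 2.0 * d_hat)))
    # T^{1/2-d} scaling replaces the usual sqrt(T) rate under long memory
    return T ** (0.5 - d_hat) * z.mean() / np.sqrt(G_hat * p)
```

Under the null and for the linear long memory processes considered here, the statistic can be compared with standard normal critical values (1.65 at the one-sided 5% level, as in the tables), in contrast to the divergent behavior of the unscaled tHAC-statistic established in Proposition 6.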
When considering the QLIKE loss function as an alternative to the MSE in the lower Panel B of Table 11, we find that the memory of the loss differential increases with the forecast horizon, similar to the MSE case. However, the standardized mean of the loss differential is much smaller, which indicates that the improvement achieved by allowing for separate dynamics of the continuous and jump components depends on the loss metric. The conventional tests now only reject for h = 1. This is confirmed by the tMAC-statistics, but not by the tEFB-statistic, which provides two rejections for h = 5. Since the estimated memory parameter for h = 1 is nearly zero in the QLIKE case, it is likely that the nonrejections of the tEFB-statistic can be attributed to the lower power of the more flexible procedure. Taken together, these results confirm that the separation of continuous and jump components indeed improves the forecast performance under MSE loss at daily and weekly horizons. Under QLIKE loss, a significant improvement is only found for h = 1. The results obtained so far are based on realized variances calculated including overnight returns (following Bekaert and Hoerova (2014)). Other authors use only intraday returns. As a robustness check, we repeat the analysis excluding overnight returns. In this case, jumps are only detected on about 21% of the days. However, the exclusion of overnight returns affects not only the jump component, but also the RV measure itself. The results of our forecast comparison are shown in Tables A.1 and A.2 in the Appendix and are qualitatively similar. If the VIX is included in the HAR-RV models and the forecasts are evaluated using MSE loss, neither the conventional DM tests nor the robust versions reject. Under QLIKE loss, we observe that the tDM- and tHAC-statistics reject, but not the tFB-statistic and the long memory robust versions. This confirms our finding that there is, if at all, only weak statistical evidence that the inclusion of the VIX in HAR-RV models improves the predictive accuracy. The memory of the loss differential, however, is found to be even stronger than when overnight returns are included. This again highlights the need for memory robust methods. For the separation of continuous and jump components in the HAR-RV model, the results are also similar to our previous ones. The memory parameters of the loss differentials are a bit lower, the tEFB-statistic under MSE loss rejects only for small bandwidths, and the tMAC-statistic now also provides some evidence for predictability at h = 5. Overall, our previous finding is robust: the null hypothesis of equal predictive accuracy is rejected for shorter forecast horizons using both conventional and robust tests, but not for h = 22.

7 Conclusion

This article deals with forecast evaluation under long memory. We show that long memory can be transmitted from the forecasts ŷ_it and the forecast objective y_t to the forecast error loss differential series z_t in various settings. We demonstrate that standard implementations of the popular test of Diebold and Mariano (1995) work poorly in these cases. Rejections of the null hypothesis of equal predictive accuracy might therefore be spurious if the series of interest exhibit long memory. To address these problems, the MAC estimator of Robinson (2005) and Abadir, Distaso, and Giraitis (2009), as well as the extended fixed-b approach of McElroy and Politis (2012), are discussed.
Simulations verify our theoretical results and demonstrate that the memory transmission extends to other loss functions such as QLIKE, to non-Gaussian processes, and to nonstationary processes. Furthermore, empirical forecast scenarios underline the practical relevance of these issues. Finally, when studying the finite-sample performance of the tMAC- and tEFB-statistics, we find that the tEFB-statistic with the MQS kernel provides the best size control, whereas the tEFB-statistic using the Bartlett kernel has the best power. Under MSE loss, the size control of all tests is satisfactory in large samples. However, under QLIKE loss, we find that only the MQS kernel allows for reliable size control. We therefore recommend using the Bartlett kernel under MSE loss and in large samples, whereas in smaller samples and under QLIKE loss the MQS kernel should be preferred. An important example of a long memory time series is the realized variance of the S&P500, which has been the subject of various forecasting exercises. In contrast to previous studies, we only find weak statistical evidence for the hypothesis that the inclusion of the VIX index in HAR-RV-type models leads to an improved forecast performance. Taking the memory of the loss differentials into account reverses the test decisions and suggests that the corresponding findings might be spurious. With regard to the separation of continuous components and jump components, as suggested by Andersen, Bollerslev, and Diebold (2007), on the other hand, the improvements in forecast accuracy remain significant at a daily horizon. These examples stress the importance of long memory robust statistics in practice. Other time series that are routinely found to exhibit long memory include exchange rates, inflation rates, and interest rates (for a recent survey, cf. Gil-Alana and Hualde (2009)). The robust test statistics considered here can therefore be helpful in a wider range of applications beyond volatility.

Footnotes

1 See, for example, Davidson (1994), chapter 24, for sets of suitable assumptions.

2 Sometimes the terms long memory and fractional integration are used interchangeably. However, a stationary fractionally integrated process a_t has spectral density f_a(λ) = |1 − e^{iλ}|^{−2d_a} G_a(λ), so that f_a(λ) ∼ G_a(λ)|λ|^{−2d_a} as λ → 0, since |1 − e^{iλ}| ∼ λ as λ → 0. Therefore, fractional integration is a special case of long memory, but many other processes satisfy Definition 1, too. Examples include non-causal processes and processes with trigonometric power law coefficients, as recently discussed in Kechagias and Pipiras (2015).

3 The bandwidth parameter of the fixed-b estimator is set to b = 0.8, since using a larger fraction of the autocorrelations places a higher emphasis on size control (cf. Kiefer and Vogelsang (2005)). Other bandwidth choices lead to similar results.

4 Abadir et al. (2009) also consider long memory versions of the classic HAC estimators. However, these extensions have some important shortcomings. First, the extended HAC estimators are very sensitive toward the bandwidth choice, as the MSE-optimal rate depends on d. Second, this MSE-optimal rate (similar to the distribution of the long-run variance estimators) has a discontinuity at d = 1/4 and differs for d ∈ (0, 1/4) and d ∈ (1/4, 1/2); see Equation (2.14) in Abadir et al. (2009). The situation is further complicated by the fact that d is generally unknown. By contrast, these criticisms do not apply to the MAC estimator. In particular, the MSE-optimal bandwidth choice m = ⌊T^{4/5}⌋ is independent of d.
Thus, we focus on the MAC estimator and do not consider the extended HAC estimators further.

5 All common kernels (e.g., Bartlett, Parzen), as well as others considered in Kiefer and Vogelsang (2005), can be used. In addition to the aforementioned, McElroy and Politis (2012) use the Daniell, the Trapezoid, the Modified Quadratic Spectral, the Tukey-Hanning, and the Bohman kernel.

6 To keep the implied degree of spurious long memory constant as the sample size increases, we set p = 0.02.

7 To decrease the computational burden, we take the actual mean shift process as a forecast. We thus abstract from estimation error in the (unconditional) mean component.

8 Additional simulations with q_d = 0.80 yield nearly identical results.

9 This is in line with the short memory simulations in Patton and Sheppard (2009).

10 To calculate ζ_j, we follow Corsi, Pirino, and Renò (2010).

11 We choose R_y = 1 and R_w = 0 for the polynomial degrees and a bandwidth m_d = ⌊T^{0.8}⌋.

12 As a robustness check, we repeat the analysis for a larger window of 2500 observations and obtain qualitatively similar results.

* We would like to thank the editor Andrew Patton and two anonymous referees for their helpful remarks, which improved the quality of the article significantly. Moreover, we thank Philipp Sibbertsen, Karim Abadir, Guillaume Chevillion, Mauro Costantini, Matei Demetrescu, Niels Haldrup, Uwe Hassler, Michael Massmann, Tucker McElroy, and Uta Pigorsch, as well as the participants of the 3rd Time Series Workshop in Rimini, the 2nd IAAE conference in Thessaloniki, the Statistische Woche 2015 in Hamburg, the 4th Long-Memory Symposium in Aarhus, the 16th IWH-CIREQ Workshop in Halle, and the CFE 2015 in London for their helpful comments. R.K. gratefully acknowledges support from CREATES - Center for Research in Econometric Analysis of Time Series (DNRF78), funded by the Danish National Research Foundation.

References

Abadir K. M., Distaso W., Giraitis L. 2009. Two Estimators of the Long-Run Variance: Beyond Short Memory. Journal of Econometrics 150(1): 56–70.
Andersen T. G., Bollerslev T., Diebold F. X. 2007. Roughing It Up: Including Jump Components in the Measurement, Modeling, and Forecasting of Return Volatility. The Review of Economics and Statistics 89(4): 701–720.
Andersen T. G., Bollerslev T., Diebold F. X., Labys P. 2001. The Distribution of Realized Exchange Rate Volatility. Journal of the American Statistical Association 96(453): 42–55.
Andrews D. W. 1991. Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation. Econometrica 59(3): 817–858.
Andrews D. W., Monahan J. C. 1992. An Improved Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimator. Econometrica 60(4): 953–966.
Barndorff-Nielsen O. E., Shephard N. 2002. Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64(2): 253–280.
Barndorff-Nielsen O. E., Shephard N. 2004. Power and Bipower Variation with Stochastic Volatility and Jumps. Journal of Financial Econometrics 2(1): 1–37.
Barndorff-Nielsen O. E., Shephard N. 2006. Econometrics of Testing for Jumps in Financial Economics Using Bipower Variation. Journal of Financial Econometrics 4(1): 1–30.
Becker R., Clements A. E., McClelland A. 2009. The Jump Component of S&P 500 Volatility and the VIX Index. Journal of Banking & Finance 33(6): 1033–1038.
Becker R., Clements A. E., White S. I. 2007. Does Implied Volatility Provide Any Information Beyond That Captured in Model-Based Volatility Forecasts? Journal of Banking & Finance 31(8): 2535–2549.
Bekaert G., Hoerova M. 2014. The VIX, the Variance Premium and Stock Market Volatility. Journal of Econometrics 183(2): 181–192.
Beran J., Feng Y., Ghosh S., Kulik R. 2013. Long Memory Processes: Probabilistic Properties and Statistical Methods. London: Springer.
Berkes I., Horváth L., Kokoszka P., Shao Q.-M. 2006. On Discriminating between Long-Range Dependence and Changes in Mean. The Annals of Statistics 34(3): 1140–1165.
Bollerslev T., Osterrieder D., Sizova N., Tauchen G. 2013. Risk and Return: Long-Run Relations, Fractional Cointegration, and Return Predictability. Journal of Financial Economics 108(2): 409–424.
Bollerslev T., Tauchen G., Zhou H. 2009. Expected Stock Returns and Variance Risk Premia. Review of Financial Studies 22(11): 4463–4492.
Busch T., Christensen B. J., Nielsen M. Ø. 2011. The Role of Implied Volatility in Forecasting Future Realized Volatility and Jumps in Foreign Exchange, Stock, and Bond Markets. Journal of Econometrics 160(1): 48–57.
Chambers M. J. 1998. Long Memory and Aggregation in Macroeconomic Time Series. International Economic Review 39(4): 1053–1072.
Chen X., Ghysels E. 2011. News–Good or Bad–and Its Impact on Volatility Predictions over Multiple Horizons. Review of Financial Studies 24(1): 46–81.
Chernov M. 2007. On the Role of Risk Premia in Volatility Forecasting. Journal of Business & Economic Statistics 25(4): 411–426.
Chiriac R., Voev V. 2011. Modelling and Forecasting Multivariate Realized Volatility. Journal of Applied Econometrics 26(6): 922–947.
Choi H.-S., Kiefer N. M. 2010. Improving Robust Model Selection Tests for Dynamic Models. The Econometrics Journal 13(2): 177–204.
Christensen B. J., Nielsen M. Ø. 2006. Asymptotic Normality of Narrow-Band Least Squares in the Stationary Fractional Cointegration Model and Volatility Forecasting. Journal of Econometrics 133(1): 343–371.
Clark T. E. 1999. Finite-Sample Properties of Tests for Equal Forecast Accuracy. Journal of Forecasting 18(7): 489–504.
Corsi F. 2009. A Simple Approximate Long-Memory Model of Realized Volatility. Journal of Financial Econometrics 7(2): 174–196.
Corsi F., Pirino D., Renò R. 2010. Threshold Bipower Variation and the Impact of Jumps on Volatility Forecasting. Journal of Econometrics 159(2): 276–288.
Corsi F., Renò R. 2012. Discrete-Time Volatility Forecasting with Persistent Leverage Effect and the Link with Continuous-Time Volatility Modeling. Journal of Business & Economic Statistics 30(3): 368–380.
Davidson J. 1994. Stochastic Limit Theory: An Introduction for Econometricians. Oxford: Oxford University Press.
Deo R., Hurvich C., Lu Y. 2006. Forecasting Realized Volatility Using a Long-Memory Stochastic Volatility Model: Estimation, Prediction and Seasonal Adjustment. Journal of Econometrics 131(1): 29–58.
Diebold F. X. 2015. Comparing Predictive Accuracy, Twenty Years Later: A Personal Perspective on the Use and Abuse of Diebold–Mariano Tests. Journal of Business & Economic Statistics 33(1): 1–8.
Diebold F. X., Inoue A. 2001. Long Memory and Regime Switching. Journal of Econometrics 105: 131–159.
Diebold F. X., Mariano R. S. 1995. Comparing Predictive Accuracy. Journal of Business & Economic Statistics 13(3): 253–263.
Dittmann I., Granger C. W. 2002. Properties of Nonlinear Transformations of Fractionally Integrated Processes. Journal of Econometrics 110(2): 113–133.
Fitzsimmons P., McElroy T. 2010. On Joint Fourier–Laplace Transforms. Communications in Statistics - Theory and Methods 39(10): 1883–1885.
Frederiksen P., Nielsen F. S., Nielsen M. Ø. 2012. Local Polynomial Whittle Estimation of Perturbed Fractional Processes. Journal of Econometrics 167(2): 426–447.
Giacomini R., White H. 2006. Tests of Conditional Predictive Ability. Econometrica 74(6): 1545–1578.
Gil-Alana L. A., Hualde J. 2009. "Fractional Integration and Cointegration: An Overview and an Empirical Application." In Palgrave Handbook of Econometrics, 434–469. London: Springer.
Granger C. W., Hyung N. 2004. Occasional Structural Breaks and Long Memory with an Application to the S&P 500 Absolute Stock Returns. Journal of Empirical Finance 11(3): 399–421.
Harvey D., Leybourne S., Newbold P. 1997. Testing the Equality of Prediction Mean Squared Errors. International Journal of Forecasting 13(2): 281–291.
Hou J., Perron P. 2014. Modified Local Whittle Estimator for Long Memory Processes in the Presence of Low Frequency (and Other) Contaminations. Journal of Econometrics 182(2): 309–328.
Hualde J., Velasco C. 2008. Distribution-Free Tests of Fractional Cointegration. Econometric Theory 24(1): 216–255.
Kechagias S., Pipiras V. 2015. Definitions and Representations of Multivariate Long-Range Dependent Time Series. Journal of Time Series Analysis 36(1): 1–25.
Kiefer N. M., Vogelsang T. J. 2005. A New Asymptotic Theory for Heteroskedasticity-Autocorrelation Robust Tests. Econometric Theory 21(6): 1130–1164.
Kruse R. 2015. A Modified Test against Spurious Long Memory. Economics Letters 135: 34–38.
Leschinski C. 2017. On the Memory of Products of Long Range Dependent Time Series. Economics Letters 153: 72–76.
Li J., Patton A. 2018. Asymptotic Inference about Predictive Accuracy Using High Frequency Data. Journal of Econometrics 203(2): 223–240.
Lu Y. K., Perron P. 2010. Modeling and Forecasting Stock Return Volatility Using a Random Level Shift Model. Journal of Empirical Finance 17(1): 138–156.
McElroy T., Politis D. N. 2012. Fixed-b Asymptotics for the Studentized Mean from Time Series with Short, Long, or Negative Memory. Econometric Theory 28(2): 471–481.
Mariano R. S., Preve D. 2012. Statistical Tests for Multiple Forecast Comparison. Journal of Econometrics 169(1): 123–130.
Martens M., van Dijk D., de Pooter M. 2009. Forecasting S&P 500 Volatility: Long Memory, Level Shifts, Leverage Effects, Day-of-the-Week Seasonality, and Macroeconomic Announcements. International Journal of Forecasting 25(2): 282–303.
Newey W. K., West K. D. 1987. A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica 55(3): 703–708.
Nielsen M. Ø. 2007. Local Whittle Analysis of Stationary Fractional Cointegration and the Implied–Realized Volatility Relation. Journal of Business & Economic Statistics 25(4): 427–446.
Patton A. J. 2011. Volatility Forecast Comparison Using Imperfect Volatility Proxies. Journal of Econometrics 160(1): 246–256.
Patton A. J., Sheppard K. 2009. "Evaluating Volatility and Correlation Forecasts." In Handbook of Financial Time Series, 801–838. Berlin, Heidelberg: Springer.
Perron P., Qu Z. 2010. Long-Memory and Level Shifts in the Volatility of Stock Market Return Indices. Journal of Business & Economic Statistics 28(2): 275–290.
Phillips P. C. B., Kim C. S. 2007. Long-Run Covariance Matrices for Fractionally Integrated Processes. Econometric Theory 23(6): 1233–1247.
Qu Z. 2011. A Test against Spurious Long Memory. Journal of Business & Economic Statistics 29(3): 423–438.
Robinson P. M. 2005. Robust Covariance Matrix Estimation: HAC Estimates with Long Memory/Antipersistence Correction. Econometric Theory 21(1): 171–180.
Rossi B. 2005. Testing Long-Horizon Predictive Ability with High Persistence, and the Meese–Rogoff Puzzle. International Economic Review 46(1): 61–92.
Sun Y., Phillips P. C., Jin S. 2008. Optimal Bandwidth Selection in Heteroskedasticity–Autocorrelation Robust Testing. Econometrica 76(1): 175–194.
Varneskov R. T., Perron P. 2017. Combining Long Memory and Level Shifts in Modelling and Forecasting the Volatility of Asset Returns. Quantitative Finance 18(3): 371–393.
West K. D. 1996. Asymptotic Inference about Predictive Ability. Econometrica 64(5): 1067–1084.
Appendix

Proofs

Proof (Proposition 2). By defining $a_t^* = a_t - \mu_a$ for $a_t \in \{y_t, \hat{y}_{1t}, \hat{y}_{2t}\}$, the loss differential $z_t$ in (7) can be re-expressed as

$$\begin{aligned}
z_t &= -2y_t(\hat{y}_{1t}-\hat{y}_{2t}) + \hat{y}_{1t}^2 - \hat{y}_{2t}^2\\
&= -2(y_t^*+\mu_y)\left[\hat{y}_{1t}^*+\mu_1-\hat{y}_{2t}^*-\mu_2\right] + (\hat{y}_{1t}^*+\mu_1)^2 - (\hat{y}_{2t}^*+\mu_2)^2\\
&= -2\left\{y_t^*\hat{y}_{1t}^* + \mu_1 y_t^* - y_t^*\hat{y}_{2t}^* - y_t^*\mu_2 + \mu_y\hat{y}_{1t}^* + \mu_y\mu_1 - \hat{y}_{2t}^*\mu_y - \mu_2\mu_y\right\}\\
&\qquad + \hat{y}_{1t}^{*2} + 2\hat{y}_{1t}^*\mu_1 + \mu_1^2 - \hat{y}_{2t}^{*2} - 2\hat{y}_{2t}^*\mu_2 - \mu_2^2\\
&= \underbrace{-2\left[y_t^*(\mu_1-\mu_2) + \hat{y}_{1t}^*(\mu_y-\mu_1) - \hat{y}_{2t}^*(\mu_y-\mu_2)\right]}_{\mathrm{I}} \;\underbrace{-\,2\left[y_t^*(\hat{y}_{1t}^*-\hat{y}_{2t}^*)\right]}_{\mathrm{II}} + \underbrace{\hat{y}_{1t}^{*2} - \hat{y}_{2t}^{*2}}_{\mathrm{III}} + \text{const}. \qquad (15)
\end{aligned}$$

Proposition 3 in Chambers (1998) states that the memory of a linear combination of fractionally integrated processes is equal to the maximum of the memory orders of the components. As discussed in Leschinski (2017), this result also applies to long memory processes in general, since the proof is only based on the long memory properties of the fractionally integrated processes. We can therefore also apply it to (15). To determine the memory of the forecast error loss differential $z_t$, we have to determine the memory orders of the three individual components I, II, and III in the linear combination. Regarding I, we have $y_t^* \sim LM(d_y)$, $\hat{y}_{1t}^* \sim LM(d_1)$, and $\hat{y}_{2t}^* \sim LM(d_2)$. For terms II and III, we refer to Proposition 1 from Leschinski (2017). We thus have, for $i \in \{1,2\}$,

$$y_t^*\hat{y}_{it}^* \sim \begin{cases} LM(\max\{d_y+d_i-1/2,\,0\}), & \text{if } S_{y,\hat{y}_i} \neq 0\\ LM(d_y+d_i-1/2), & \text{if } S_{y,\hat{y}_i} = 0 \end{cases} \qquad (16)$$

and

$$\hat{y}_{it}^{*2} \sim LM(\max\{2d_i-1/2,\,0\}). \qquad (17)$$

Further note that

$$d_y > d_y+d_i-1/2 \quad \text{and} \quad d_i > d_y+d_i-1/2 \qquad (18)$$

and

$$d_i > 2d_i-1/2, \qquad (19)$$

since $0 \leq d_a < 1/2$ for $a \in \{y,1,2\}$. Using these properties, we can determine the memory $d_z$ in (15) via a case-by-case analysis. First, if $\mu_1 \neq \mu_2 \neq \mu_y$, the memory of the original terms dominates because of (18) and (19), and we obtain $d_z = \max\{d_y, d_1, d_2\}$. Second, if $\mu_1 = \mu_2 \neq \mu_y$, then $y_t^*$ drops out from (15), but the two forecasts $\hat{y}_{1t}^*$ and $\hat{y}_{2t}^*$ remain. From (18) and (19), we have that $d_1$ and $d_2$ dominate their transformations, leading to the result $d_z = \max\{d_1, d_2\}$. Third, if $\mu_1 = \mu_y \neq \mu_2$, the forecast $\hat{y}_{1t}^*$ vanishes and $d_2$ and $d_y$ dominate their reduced counterparts by (18) and (19), so that $d_z = \max\{2d_1-1/2,\, d_2,\, d_y\}$. Fourth, by the same arguments as before, $d_z = \max\{2d_2-1/2,\, d_1,\, d_y\}$ if $\mu_2 = \mu_y \neq \mu_1$. Finally, if $\mu_1 = \mu_2 = \mu_y$, the forecast objective $y_t^*$ as well as both forecasts $\hat{y}_{1t}^*$ and $\hat{y}_{2t}^*$ drop from (15). The memory of the loss differential is therefore the maximum of the memory orders of the remaining four terms in II and III that are given in (16) and (17). Furthermore, the memory of the squared series given in (17) is always non-negative from Corollary 1 in Leschinski (2017), and a linear combination of an antipersistent process with an $LM(0)$ series is $LM(0)$, from Proposition 3 of Chambers (1998). Therefore, the lower bound for $d_z$ is zero and $d_z = \max\{2\max\{d_1,d_2\}-1/2,\; d_y+\max\{d_1,d_2\}-1/2,\; 0\}$. □

Proof (Proposition 3). For the case that common long memory is permitted, we consider three possible situations: CLM between the forecasts $\hat{y}_{1t}$ and $\hat{y}_{2t}$, CLM between the forecast objective $y_t$ and one of the forecasts $\hat{y}_{1t}$ or $\hat{y}_{2t}$, and finally CLM between $y_t$ and each of $\hat{y}_{1t}$ and $\hat{y}_{2t}$. First, note that as a direct consequence of Assumption 3, we have

$$\mu_i = \beta_i + \xi_i\mu_x \qquad (20)$$

and

$$\mu_y = \beta_y + \xi_y\mu_x. \qquad (21)$$

We can now re-express the forecast error loss differential $z_t$ in (15) for each possible CLM relationship. In all cases, tedious algebraic steps are not reported to save space. In the case of CLM between $\hat{y}_{1t}$ and $\hat{y}_{2t}$, we have

$$\begin{aligned}
z_t &= -2\big\{y_t^*(\mu_1-\mu_2) + x_t^*\left[\xi_1(\mu_y-\mu_1)-\xi_2(\mu_y-\mu_2)\right] + x_t^*y_t^*(\xi_1-\xi_2) - x_t^*(\xi_1\varepsilon_{1t}-\xi_2\varepsilon_{2t})\\
&\qquad + \varepsilon_{1t}(\mu_y-\mu_1) - \varepsilon_{2t}(\mu_y-\mu_2) + \mu_x(\varepsilon_{1t}\xi_1-\varepsilon_{2t}\xi_2) + y_t^*(\varepsilon_{1t}-\varepsilon_{2t})\big\}\\
&\qquad + x_t^{*2}(\xi_1^2-\xi_2^2) + \varepsilon_{1t}^2 - \varepsilon_{2t}^2 + 2\mu_x(\varepsilon_{1t}\xi_1-\varepsilon_{2t}\xi_2) + \text{const}. \qquad (22)
\end{aligned}$$

If the forecast objective $y_t$ and one of the $\hat{y}_{it}$ have CLM, we have for $\hat{y}_{1t}$:

$$\begin{aligned}
z_t &= -2\big\{x_t^*\left[(\mu_y-\mu_1)\xi_1+\xi_y(\mu_1-\mu_2)\right] - \hat{y}_{2t}^*\left[\mu_y-\mu_2\right] - \xi_y x_t^*\hat{y}_{2t}^* + x_t^*\left[\varepsilon_{1t}(\xi_y-\xi_1)+\xi_1\eta_t\right]\\
&\qquad + \varepsilon_{1t}(\xi_y\mu_x-\mu_1) + \eta_t(\mu_1-\mu_2) + \varepsilon_{1t}\eta_t - \hat{y}_{2t}^*\eta_t\big\}\\
&\qquad - (2\xi_1\xi_y-\xi_1^2)x_t^{*2} + \varepsilon_{1t}^2 - \hat{y}_{2t}^{*2} - 2\beta_y\varepsilon_{1t} + \text{const}. \qquad (23)
\end{aligned}$$

The result for CLM between $y_t$ and $\hat{y}_{2t}$ is entirely analogous, with index "1" replaced by "2". Finally, if $y_t$ has CLM with both $\hat{y}_{1t}$ and $\hat{y}_{2t}$, we have:

$$\begin{aligned}
z_t &= -2\big\{x_t^*\left[\xi_1(\mu_y-\mu_1)-\xi_2(\mu_y-\mu_2)+\xi_y(\mu_1-\mu_2)\right]\\
&\qquad + x_t^*\left[(\xi_y-\xi_1)\varepsilon_{1t}-(\xi_y-\xi_2)\varepsilon_{2t}+(\xi_1-\xi_2)\eta_t\right]\\
&\qquad + x_t^{*2}\left[\xi_y(\xi_1-\xi_2)-\tfrac{1}{2}(\xi_1^2-\xi_2^2)\right]\\
&\qquad + \varepsilon_{1t}(\mu_y-\mu_1) - \varepsilon_{2t}(\mu_y+\mu_2) + \mu_x(\xi_1\varepsilon_{1t}+\xi_2\varepsilon_{2t}) + \eta_t(\varepsilon_{1t}-\varepsilon_{2t}) + \eta_t\left[\mu_1-\mu_2\right]\big\}\\
&\qquad + \varepsilon_{1t}^2 - \varepsilon_{2t}^2 + 2\mu_x(\xi_1\varepsilon_{1t}-\xi_2\varepsilon_{2t}) + \text{const}. \qquad (24)
\end{aligned}$$

As in the proof of Proposition 2, we can now determine the memory orders of $z_t$ in (22), (23), and (24) by first considering the memory of each term in each of the linear combinations and then applying Proposition 3 of Chambers (1998). However, note that $y_t^*(\mu_1-\mu_2) + x_t^*[\xi_1(\mu_y-\mu_1)-\xi_2(\mu_y-\mu_2)]$ in (22), $x_t^*[(\mu_y-\mu_1)\xi_1+\xi_y(\mu_1-\mu_2)] - \hat{y}_{2t}^*(\mu_y-\mu_2)$ in (23), and $x_t^*[\xi_1(\mu_y-\mu_1)-\xi_2(\mu_y-\mu_2)+\xi_y(\mu_1-\mu_2)]$ in (24) have the same structure as $y_t^*(\mu_1-\mu_2) + \hat{y}_{1t}^*(\mu_y-\mu_1) - \hat{y}_{2t}^*(\mu_y-\mu_2)$ in (15), and that all of the other nonconstant terms in (22), (23), and (24) are either squares or products of demeaned series, so that their memory is reduced according to Proposition 1 from Leschinski (2017). From Assumption 3, $x_t^*$ is the common factor driving the series with CLM, and from $d_x > d_{\varepsilon_1}, d_{\varepsilon_2}, d_\eta$ and the dominance of the largest memory in a linear combination from Proposition 3 in Chambers (1998), $x_t^*$ has the same memory as the series involved in the CLM relationship. Now, from (18) and (19), the reduced memory of the product series and the squared series is dominated by that of either $x_t^*$, $y_t^*$, $\hat{y}_{1t}^*$, or $\hat{y}_{2t}^*$. Therefore, whenever a bias term is nonzero, the memory of the linear combination can be no smaller than that of the respective original series. To obtain the results in Proposition 3, set the terms in square brackets in Equations (22), (23), and (15) equal to zero and solve for the quotient of the factor loadings. This determines the transmission of the memory of $x_t^*$. For the effect of the series that is not involved in the CLM relationship, we impose the restrictions $\mu_1 \neq \mu_2$ and $\mu_i \neq \mu_y$, as stated in the proposition. □

Proof (Proposition 4). The results in Proposition 4 follow directly from Equations (22), (23), and (24) above. For (22), the term in square brackets can be re-expressed as $[\xi_1(\mu_y-\mu_1)-\xi_2(\mu_y-\mu_2)] = (\xi_1-\xi_2)\mu_y + \xi_2\mu_2 - \xi_1\mu_1$. Obviously, for $\xi_1 = \xi_2$, this reduces to $\xi_2\mu_2 - \xi_1\mu_1$, which does not vanish since $\mu_1 \neq \mu_2$. The other cases are treated entirely analogously. For (23) we have $[(\mu_y-\mu_1)\xi_1+\xi_y(\mu_1-\mu_2)] = \xi_1\mu_y - \xi_y\mu_2$, and in (24) $[\xi_1(\mu_y-\mu_1)-\xi_2(\mu_y-\mu_2)+\xi_y(\mu_1-\mu_2)] = (\xi_1-\xi_2)\mu_y - (\xi_1-\xi_y)\mu_1 + (\xi_2-\xi_y)\mu_2 = 0$, so that $x_t^*$ drops out and the memory is reduced. □

Proof (Proposition 5). Under the assumptions of Proposition 3, (22) reduces to

$$\begin{aligned}
z_t &= -2\left\{-x_t^*(\xi_1\varepsilon_{1t}-\xi_2\varepsilon_{2t}) + y_t^*(\varepsilon_{1t}-\varepsilon_{2t})\right\} + \varepsilon_{1t}^2 - \varepsilon_{2t}^2 + \text{const}\\
&= -2\big\{\underbrace{-\xi_1 x_t^*\varepsilon_{1t}}_{\mathrm{I}} + \underbrace{\xi_2 x_t^*\varepsilon_{2t}}_{\mathrm{II}} + \underbrace{y_t^*\varepsilon_{1t}}_{\mathrm{III}} - \underbrace{y_t^*\varepsilon_{2t}}_{\mathrm{IV}}\big\} + \underbrace{\varepsilon_{1t}^2}_{\mathrm{V}} - \underbrace{\varepsilon_{2t}^2}_{\mathrm{VI}} + \text{const}, \qquad (25)
\end{aligned}$$

(23) becomes

$$\begin{aligned}
z_t &= -2\left\{-x_t^*(\xi_y\hat{y}_{2t}^*-\xi_1\eta_t) + (\varepsilon_{1t}-\hat{y}_{2t}^*)\eta_t + \varepsilon_{1t}(\xi_y\mu_x-\mu_1)\right\} + \varepsilon_{1t}^2 - \hat{y}_{2t}^{*2} - 2\beta_y\varepsilon_{1t} - \xi_1\xi_y x_t^{*2} + \text{const}\\
&= -2\big\{\underbrace{-\xi_y x_t^*\hat{y}_{2t}^*}_{\mathrm{I}} + \underbrace{\xi_1 x_t^*\eta_t}_{\mathrm{II}} + \underbrace{\varepsilon_{1t}\eta_t}_{\mathrm{III}} - \underbrace{\hat{y}_{2t}^*\eta_t}_{\mathrm{IV}} + \underbrace{\varepsilon_{1t}(\xi_y\mu_x-\mu_1)}_{\mathrm{V}}\big\} + \underbrace{\varepsilon_{1t}^2}_{\mathrm{VI}} - \underbrace{\hat{y}_{2t}^{*2}}_{\mathrm{VII}} - \underbrace{2\beta_y\varepsilon_{1t}}_{\mathrm{VIII}} - \underbrace{\xi_1\xi_y x_t^{*2}}_{\mathrm{IX}} + \text{const}, \qquad (26)
\end{aligned}$$

and finally (24) is

$$z_t = -2(\varepsilon_{1t}-\varepsilon_{2t})\eta_t + \varepsilon_{1t}^2 - \varepsilon_{2t}^2 + \text{const} = \underbrace{-2\varepsilon_{1t}\eta_t}_{\mathrm{I}} + \underbrace{2\varepsilon_{2t}\eta_t}_{\mathrm{II}} + \underbrace{\varepsilon_{1t}^2}_{\mathrm{III}} - \underbrace{\varepsilon_{2t}^2}_{\mathrm{IV}} + \text{const}. \qquad (27)$$

We can now proceed as in the proof of Proposition 2 and infer the memory orders of each term in the respective linear combination from Proposition 1, and then determine the maximum as in Proposition 3 of Chambers (1998). In the following, we label the terms appearing in each of the equations by consecutive Roman numerals, with the equation number as an index. For the terms in (25), we have

$$\begin{aligned}
\mathrm{I}_{25} &\sim \begin{cases} LM(\max\{d_x+d_{\varepsilon_1}-1/2,\,0\}), & \text{if } S_{x,\varepsilon_1} \neq 0\\ LM(d_x+d_{\varepsilon_1}-1/2), & \text{if } S_{x,\varepsilon_1} = 0 \end{cases}\\
\mathrm{II}_{25} &\sim \begin{cases} LM(\max\{d_x+d_{\varepsilon_2}-1/2,\,0\}), & \text{if } S_{x,\varepsilon_2} \neq 0\\ LM(d_x+d_{\varepsilon_2}-1/2), & \text{if } S_{x,\varepsilon_2} = 0 \end{cases}\\
\mathrm{III}_{25} &\sim \begin{cases} LM(\max\{d_y+d_{\varepsilon_1}-1/2,\,0\}), & \text{if } S_{y,\varepsilon_1} \neq 0\\ LM(d_y+d_{\varepsilon_1}-1/2), & \text{if } S_{y,\varepsilon_1} = 0 \end{cases}\\
\mathrm{IV}_{25} &\sim \begin{cases} LM(\max\{d_y+d_{\varepsilon_2}-1/2,\,0\}), & \text{if } S_{y,\varepsilon_2} \neq 0\\ LM(d_y+d_{\varepsilon_2}-1/2), & \text{if } S_{y,\varepsilon_2} = 0 \end{cases}\\
\mathrm{V}_{25} &\sim LM(\max\{2d_{\varepsilon_1}-1/2,\,0\}) \quad \text{and} \quad \mathrm{VI}_{25} \sim LM(\max\{2d_{\varepsilon_2}-1/2,\,0\}).
\end{aligned}$$

Since by definition $d_x > d_{\varepsilon_i}$, the memory of $\mathrm{V}_{25}$ and $\mathrm{VI}_{25}$ is always of a lower order than that of $\mathrm{I}_{25}$ and $\mathrm{II}_{25}$. As in the proof of Proposition 2, the squares in terms $\mathrm{V}_{25}$ and $\mathrm{VI}_{25}$ establish zero as the lower bound of $d_z$. Therefore, we have $d_z = \max\{\max\{d_x,d_y\} + \max\{d_{\varepsilon_1},d_{\varepsilon_2}\} - 1/2,\; 0\}$. Similarly, in (26), we have

$$\begin{aligned}
\mathrm{I}_{26} &\sim \begin{cases} LM(\max\{d_x+d_2-1/2,\,0\}), & \text{if } S_{x,\hat{y}_2} \neq 0\\ LM(d_x+d_2-1/2), & \text{if } S_{x,\hat{y}_2} = 0 \end{cases}\\
\mathrm{II}_{26} &\sim \begin{cases} LM(\max\{d_x+d_\eta-1/2,\,0\}), & \text{if } S_{x,\eta} \neq 0\\ LM(d_x+d_\eta-1/2), & \text{if } S_{x,\eta} = 0 \end{cases}\\
\mathrm{III}_{26} &\sim \begin{cases} LM(\max\{d_{\varepsilon_1}+d_\eta-1/2,\,0\}), & \text{if } S_{\varepsilon_1,\eta} \neq 0\\ LM(d_{\varepsilon_1}+d_\eta-1/2), & \text{if } S_{\varepsilon_1,\eta} = 0 \end{cases}\\
\mathrm{IV}_{26} &\sim \begin{cases} LM(\max\{d_2+d_\eta-1/2,\,0\}), & \text{if } S_{\hat{y}_2,\eta} \neq 0\\ LM(d_2+d_\eta-1/2), & \text{if } S_{\hat{y}_2,\eta} = 0 \end{cases}\\
\mathrm{V}_{26} &\sim LM(d_{\varepsilon_1}), \quad \mathrm{VI}_{26} \sim LM(\max\{2d_{\varepsilon_1}-1/2,\,0\}), \quad \mathrm{VII}_{26} \sim LM(\max\{2d_2-1/2,\,0\}),\\
\mathrm{VIII}_{26} &\sim LM(d_{\varepsilon_1}), \quad \text{and} \quad \mathrm{IX}_{26} \sim LM(\max\{2d_x-1/2,\,0\}).
\end{aligned}$$

Here, $\mathrm{V}_{26}$ can be disregarded, since it is of the same order as $\mathrm{VIII}_{26}$. $\mathrm{VIII}_{26}$ dominates $\mathrm{VI}_{26}$, because $d_{\varepsilon_1} < 1/2$. Finally, as $d_{\varepsilon_1} < d_x$ holds by assumption, $\mathrm{III}_{26}$ is dominated by $\mathrm{II}_{26}$, and $d_\eta < d_x$, so that $\mathrm{IX}_{26}$ dominates $\mathrm{II}_{26}$. Therefore, $d_z = \max\{d_2+\max\{d_x,d_\eta\}-1/2,\; 2\max\{d_x,d_2\}-1/2,\; d_{\varepsilon_1}\}$. As before, for the case of CLM between $y_t$ and $\hat{y}_{2t}$, the proof is entirely analogous, with index "1" replaced by "2" and vice versa. Finally, in (27), we have

$$\begin{aligned}
\mathrm{I}_{27} &\sim \begin{cases} LM(\max\{d_\eta+d_{\varepsilon_1}-1/2,\,0\}), & \text{if } S_{\eta,\varepsilon_1} \neq 0\\ LM(d_\eta+d_{\varepsilon_1}-1/2), & \text{if } S_{\eta,\varepsilon_1} = 0 \end{cases}\\
\mathrm{II}_{27} &\sim \begin{cases} LM(\max\{d_\eta+d_{\varepsilon_2}-1/2,\,0\}), & \text{if } S_{\eta,\varepsilon_2} \neq 0\\ LM(d_\eta+d_{\varepsilon_2}-1/2), & \text{if } S_{\eta,\varepsilon_2} = 0 \end{cases}\\
\mathrm{III}_{27} &\sim LM(\max\{2d_{\varepsilon_1}-1/2,\,0\}), \quad \mathrm{IV}_{27} \sim LM(\max\{2d_{\varepsilon_2}-1/2,\,0\}).
\end{aligned}$$

Here, no further simplifications can be made, since we do not impose restrictions on the relationship between $d_\eta$, $d_{\varepsilon_1}$, and $d_{\varepsilon_2}$, so that $d_z = \max\{d_\eta+\max\{d_{\varepsilon_1},d_{\varepsilon_2}\}-1/2,\; 2\max\{d_{\varepsilon_1},d_{\varepsilon_2}\}-1/2,\; 0\}$, where again zero is established as the lower bound by the squares in $\mathrm{III}_{27}$ and $\mathrm{IV}_{27}$. □

Proof (Proposition 6). Under short memory, the $t_{HAC}$-statistic is given by

$$t_{HAC} = \frac{T^{1/2}\bar{z}}{\sqrt{\hat{V}_{HAC}}}, \quad \text{with} \quad \hat{V}_{HAC} = \sum_{j=-T+1}^{T-1} k\!\left(\frac{j}{B}\right)\hat{\gamma}_z(j)$$

and $B$ being the bandwidth satisfying $B \to \infty$ and $B = O(T^{1-\varepsilon})$ for some $\varepsilon > 0$. From Abadir et al. (2009), the appropriately scaled long-run variance estimator for a long memory process is given by $B^{-1-2d}\sum_{i,j=1}^{B}\hat{\gamma}_z(|i-j|)$; see Equation (2.2) in Abadir et al. (2009). Corresponding long memory robust HAC-type estimators (with a Bartlett kernel, for instance) take the form

$$\hat{V}_{HAC,d} = B^{-2d}\left(\hat{\gamma}_z(0) + 2\sum_{j=1}^{B}(1-j/B)\hat{\gamma}_z(j)\right).$$

The long memory robust $t_{HAC,d}$-statistic is then given by $t_{HAC,d} = T^{1/2-d}\bar{z}/\sqrt{\hat{V}_{HAC,d}}$. We can therefore write

$$t_{HAC,d} = \frac{T^{1/2}T^{-d}\bar{z}}{B^{-d}\sqrt{\hat{V}_{HAC}}} = T^{-d}B^{d}\,t_{HAC}, \quad \text{and thus} \quad t_{HAC} = T^{d}B^{-d}\,t_{HAC,d}.$$

The short memory $t_{HAC}$-statistic is therefore inflated by the scaling factor $T^d/B^d = O(T^{d\varepsilon})$. This leads directly to the divergence of the $t_{HAC}$-statistic ($t_{HAC} \to \infty$ as $T \to \infty$), which implies that $\lim_{T\to\infty} P(|t_{HAC}| > c_{1-\alpha/2}) = 1$ for all values of $d \in (0,1/2)$, where $c_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of the $N(0,1)$ distribution. The proof is analogous for other kernels and is thus omitted. □
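To illustrate the transmission mechanism established above numerically, the following self-contained Python sketch simulates two forecasts that load on a common long memory factor and estimates the memory of the resulting MSE loss differential with a simple log-periodogram (GPH) regression. The GPH estimator is used here instead of the local Whittle estimator purely for brevity, and all parameter values and names are arbitrary choices for this illustration, not the article's simulation design.

```python
import numpy as np

rng = np.random.default_rng(42)

def frac_noise(T, d, rng):
    """ARFIMA(0, d, 0) series via the truncated MA(inf) expansion of (1-L)^{-d}."""
    k = np.arange(1, T)
    psi = np.concatenate(([1.0], np.cumprod((k - 1.0 + d) / k)))  # psi_k coefficients
    return np.convolve(rng.standard_normal(T), psi)[:T]

def gph(x, m):
    """Log-periodogram (GPH) regression estimate of the memory parameter d."""
    T = x.size
    lam = 2.0 * np.pi * np.arange(1, m + 1) / T
    I = np.abs(np.fft.fft(x - x.mean())[1:m + 1]) ** 2 / (2.0 * np.pi * T)
    reg = -2.0 * np.log(lam)          # the slope on this regressor equals d
    reg = reg - reg.mean()
    dep = np.log(I) - np.log(I).mean()
    return float(reg @ dep) / float(reg @ reg)

T, d = 5000, 0.4
x = frac_noise(T, d, rng)                      # common long memory factor
y = 0.5 + x + 0.3 * rng.standard_normal(T)     # forecast objective (nonzero mean)
y1 = x + 0.5 * rng.standard_normal(T)          # forecast 1: biased for the mean of y
y2 = 0.8 * x + 0.5 * rng.standard_normal(T)    # forecast 2: different factor loading
z = (y - y1) ** 2 - (y - y2) ** 2              # MSE loss differential

m = int(T ** 0.6)
print("d of objective:        ", round(gph(y, m), 3))
print("d of loss differential:", round(gph(z, m), 3))
```

Because the forecasts are biased and load differently on the common factor x_t, the linear term in x_t does not cancel from the loss differential, so z_t inherits memory close to d, consistent with the transmission channels derived in the proofs above.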
Additional Material for the Empirical Applications

Table A.1. Predictive ability of models including the VIX for future RV calculated excluding overnight returns (evaluated under (i) MSE loss in Panel A and (ii) QLIKE loss in Panel B)

                 Summary statistics               Short memory          tMAC (q_d)           tEFB (b)
Model            z̄/σ̂_z  g_1   g_2   d̂_LW   d̂_LPWN   tDM    tHAC   tFB     0.7   0.75  0.8     0.2     0.4     0.6     0.8

Panel A: MSE
HAR-RV           0.08   0.28  0.27  0.36*  0.34*    1.23   1.36   1.08    0.25  0.28  0.33    1.08    1.40    2.10    2.33
                                                                                              (4.70)  (5.55)  (6.41)  (7.28)
HAR-RV-TCJ       0.07   0.28  0.27  0.36*  0.31*    1.10   1.22   0.97    0.27  0.31  0.36    0.97    1.35    2.04    2.02
                                                                                              (4.70)  (5.55)  (6.41)  (7.28)
HAR-RV-TCJ-L     0.05   0.27  0.26  0.30*  0.27*    0.79   0.87   0.78    0.24  0.27  0.32    0.78    1.10    1.61    1.56
                                                                                              (4.70)  (5.55)  (6.41)  (7.28)
5% crit. values                                    (1.65) (1.65) (2.09)  (1.65)

Panel B: QLIKE
HAR-RV           0.13   1.92  1.92  0.30*  0.21     2.27   2.40   1.93    1.00  1.15  1.32    2.00    4.91    6.99    3.66
                                                                                              (4.23)  (5.96)  (8.50)  (11.34)
HAR-RV-TCJ       0.12   1.92  1.92  0.25*  0.23*    2.22   2.40   1.76    0.85  0.94  1.06    1.82    3.95    3.34    2.83
                                                                                              (4.23)  (5.96)  (8.50)  (11.34)
HAR-RV-TCJ-L     0.10   1.92  1.92  0.25*  0.20     1.92   2.01   1.72    0.85  0.97  1.11    1.79    3.47    2.95    2.68
                                                                                              (4.23)  (5.96)  (8.50)  (11.34)
5% crit. values                                    (1.65) (1.65) (2.37)  (1.65)

Notes: Models excluding the VIX are tested against models including the VIX. Reported are the standardized mean (z̄/σ̂_z) and the estimated memory parameters (d̂_LW, d̂_LPWN) of the forecast error loss differential, the respective out-of-sample losses of the models (g_1 and g_2), and the values of the various DM test statistics. Bold-faced values indicate significance at the nominal 5% level; an additional star indicates significance at the nominal 1% level. Critical values at the nominal 5% level are given in parentheses; for the tEFB-statistic they depend on d̂ and are therefore reported beneath each row.
Table A.2. Separation of continuous and jump components for RV calculated excluding overnight returns (evaluated under (i) MSE loss in Panel A and (ii) QLIKE loss in Panel B)

                 Summary statistics                 Short memory           tMAC (q_d)              tEFB (b)
Horizon          z̄/σ̂_z  g_1    g_2    d̂_LW   d̂_LPWN   tDM    tHAC   tFB     0.7    0.75   0.8      0.2     0.4     0.6     0.8

Panel A: MSE
h = 1            0.07   0.31   0.30   0.08   0.11     4.16   4.59   2.79    2.27   2.30   2.35     2.79    2.54    2.66    2.96
                                                                                                   (2.61)  (3.15)  (3.69)  (4.23)
h = 5            0.069  0.229  0.222  0.057  0.000    2.368  2.398  2.134   2.474  2.704  3.155    2.134   2.267   2.671   3.093
                                                                                                   (2.05)  (2.52)  (2.98)  (3.39)
h = 22           0.05   0.28   0.28   0.23*  0.16     1.19   1.18   1.48    0.55   0.63   0.75     1.48    1.64    1.96    2.12
                                                                                                   (3.40)  (4.06)  (4.75)  (5.39)
5% crit. values                                      (1.65) (1.65) (3.40)  (1.65)

Panel B: QLIKE
h = 1            0.03   1.82   1.82   0.04   0.05     2.38   2.24   1.95    1.76   1.68   1.62     1.97    2.83    5.80    6.08
                                                                                                   (2.77)  (4.10)  (5.97)  (7.99)
h = 5            0.04   1.88   1.88   0.05   0.01     1.77   1.79   1.92    1.69   1.72   1.88     2.10    2.51    4.65    4.41
                                                                                                   (2.77)  (4.10)  (5.97)  (7.99)
h = 22           0.06   1.92   1.92   0.22*  0.16     1.53   1.39   4.32    0.66   0.75   0.86     26.03   3.68    5.36    10.29
                                                                                                   (4.23)  (5.96)  (8.50)  (11.34)
5% crit. values                                      (1.65) (1.65) (2.37)  (1.65)

Notes: The forecast performance of the HAR-RV-TCJ model is tested against the HAR-RV model for different forecast horizons. Reported are the standardized mean (z̄/σ̂_z) and the estimated memory parameters (d̂_LW, d̂_LPWN) of the forecast error loss differential, the respective out-of-sample losses of the models (g_1 and g_2), and the values of the various DM test statistics. Bold-faced values indicate significance at the nominal 5% level; an additional star indicates significance at the nominal 1% level. Critical values at the nominal 5% level are given in parentheses; for the tEFB-statistic they depend on d̂ and are therefore reported beneath each row.

We give conditions under which the transmission occurs and characterize the memory properties of the forecast error loss differential. The memory transmission for non-Gaussian processes and other loss functions is demonstrated by means of Monte Carlo simulations resembling typical forecast scenarios. As a second contribution, we show (both theoretically and via simulations) that the original DM test is invalidated under long memory and suffers from severe upward size distortions. Third, we study two simple extensions of the DM statistic that permit valid inference under long and short memory. These extensions are based on the memory and autocorrelation consistent (MAC) estimator of Robinson (2005) (see also Abadir, Distaso, and Giraitis (2009)) and the extended fixed-b asymptotics (EFB) of McElroy and Politis (2012). The performance of these modified statistics is analyzed in a Monte Carlo study that is specifically tailored to reflect the properties likely to occur in the loss differentials. We compare several bandwidth and kernel choices, which allows us to make recommendations for practical applications.

Our fourth contribution is an empirical application in which we reconsider two recent extensions of the heterogeneous autoregressive model for realized volatility (HAR-RV) by Corsi (2009). First, we test whether forecasts obtained from HAR-RV type models can be improved by including information on model-free risk-neutral implied volatility, as measured by the CBOE volatility index (VIX). We find that short memory approaches (the classic DM test and fixed-b versions) reject the null hypothesis of equal predictive accuracy in favor of models including implied volatility. On the contrary, our long memory robust statistics do not indicate a significant improvement in forecast performance, which implies that previous rejections might be spurious due to neglected long memory. The second issue we tackle in our empirical applications relates to earlier work by, inter alia, Andersen, Bollerslev, and Diebold (2007) and Corsi, Pirino, and Renò (2010), who consider the decomposition of the quadratic variation of the log-price process into a continuous integrated volatility component and a discrete jump component. Here, we find that the separate treatment of continuous components and jump components significantly improves forecasts of realized variance for short forecast horizons, even if the memory in the loss differentials is accounted for.

The rest of this article is organized as follows. Section 2 reviews the classic DM test and presents the fixed-b approach for the short memory case. Section 3 covers the case of long-range dependence and contains our theoretical results on the transmission of long memory to the loss differential series. Two distinct approaches to design a robust t-statistic are discussed in Section 4. Section 5 contains our Monte Carlo study, and in Section 6 we present our empirical results. Conclusions are drawn in Section 7. All proofs are contained in the Appendix.

2 DM Test

Diebold and Mariano (1995) construct a test for H_0: E[g(y_t, ŷ_1t) − g(y_t, ŷ_2t)] = E(z_t) = 0, based solely on assumptions on the loss differential series z_t. Suppose that z_t follows the weakly stationary linear process

z_t = μ_z + ∑_{j=0}^{∞} θ_j v_{t−j},   (2)

where it is required that |μ_z| < ∞ and ∑_{j=0}^{∞} θ_j² < ∞ hold. For simplicity of the exposition, we additionally assume that v_t ∼ iid(0, σ_v²). If ŷ_1t and ŷ_2t perform equally well according to the loss function g(·), then μ_z = 0 holds; otherwise μ_z ≠ 0.
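To fix ideas, the loss differential in (1) is straightforward to compute once the forecasts and the objective are given. The following is a minimal sketch in Python; the series y, f1, and f2 are purely illustrative toy inputs, not the article's data.

```python
import numpy as np

def loss_differential(y, f1, f2, loss="mse"):
    """Forecast error loss differential z_t = g(y_t, yhat_1t) - g(y_t, yhat_2t)."""
    if loss == "mse":
        g1, g2 = (y - f1) ** 2, (y - f2) ** 2
    elif loss == "qlike":
        # QLIKE(yhat, y) = log(yhat) + y / yhat; requires positive forecasts
        g1, g2 = np.log(f1) + y / f1, np.log(f2) + y / f2
    else:
        raise ValueError("unknown loss function")
    return g1 - g2

# Toy example with two equally noisy (hence equally accurate) forecasts.
rng = np.random.default_rng(0)
y = rng.normal(size=500)
f1 = y + rng.normal(size=500)   # hypothetical forecast 1
f2 = y + rng.normal(size=500)   # hypothetical forecast 2
z = loss_differential(y, f1, f2)
print(z.mean())                  # close to zero under equal predictive accuracy
```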
The corresponding t-statistic is based on the sample mean z̄ = T^{−1} ∑_{t=1}^{T} z_t and an estimate V̂ of the long-run variance V = lim_{T→∞} Var(T^{τ}(z̄ − μ_z)). The DM statistic is given by

t_DM = T^{τ} z̄ / √V̂.   (3)

Under stationary short memory, we have τ = 1/2, while the rate changes to τ = 1/2 − d under stationary long memory, with 0 < d < 1/2 being the long memory parameter. The (asymptotic) distribution of this t-statistic hinges on the autocorrelation properties of the loss differential series z_t. In the following, we distinguish two cases: (i) z_t is a stationary short memory process with d = 0, and (ii) strong dependence in the form of a long memory process (with 0 < d < 1/2) is present in z_t, as discussed in Section 3.

2.1 Conventional Approach: HAC

For the estimation of the long-run variance V, Diebold and Mariano (1995) suggest using the truncated long-run variance of an MA(h − 1) process for an h-step-ahead forecast. This is motivated by the fact that optimal h-step-ahead forecast errors of a linear time series process follow an MA(h − 1) process. Nevertheless, as pointed out by Diebold (2015), among others, the test is readily extendable to more general situations if, for example, HAC estimators are used (see also Clark (1999) for some early simulation evidence). The latter have become the standard class of estimators for the long-run variance. In particular,

V̂_HAC = ∑_{j=−T+1}^{T−1} k(j/B) γ̂_z(j),   (4)

where k(·) is a user-chosen kernel function, B denotes the bandwidth, and γ̂_z(j) = T^{−1} ∑_{t=|j|+1}^{T} (z_t − z̄)(z_{t−|j|} − z̄) is the usual estimator of the autocovariance of the process z_t at lag j. The corresponding DM statistic is given by

t_HAC = T^{1/2} z̄ / √V̂_HAC.   (5)

If z_t is weakly stationary with absolutely summable autocovariances γ_z(j), it holds that V = ∑_{j=−∞}^{∞} γ_z(j). Under standard regularity conditions,1 a central limit theorem applies to z̄. Consistency of the long-run variance estimator V̂_HAC requires some additional regularity conditions (see, for instance, Andrews (1991) for technical details), in particular the assumption that the ratio b = B/T converges to zero as T → ∞. It follows that the t_HAC-statistic is asymptotically standard normal under the null hypothesis, that is, t_HAC ⇒ N(0, 1). For notation comparable to the long memory case, note that V = 2π f_z(0), where f_z(0) is the spectral density function of z_t at frequency zero.

2.2 Fixed-bandwidth Approach

Even though the application of HAC estimators is nowadays standard practice, related tests are often found to be seriously size-distorted in finite samples, especially under strong persistence. Kiefer and Vogelsang (2005) develop an alternative asymptotic framework in which the ratio B/T approaches a fixed constant b ∈ (0, 1] as T → ∞. It is therefore called fixed-b inference, as opposed to the classical small-b HAC approach where b → 0. Under fixed-b (FB) asymptotics, the estimator V̂(k, b) no longer converges to V. Instead, V̂(k, b) converges to V multiplied by a functional of a Brownian bridge process; hence V̂(k, b) ⇒ V·Q(k, b). The corresponding t-statistic

t_FB = T^{1/2} z̄ / √V̂(k, b)   (6)

has a nonnormal and nonstandard limiting distribution, namely t_FB ⇒ W(1)/√Q(k, b). Here, W(r) is a standard Brownian motion on r ∈ [0, 1]. Both the choice of the bandwidth parameter b and the (twice continuously differentiable) kernel k appear in the limit distribution. For example, for the Bartlett kernel we have

Q(k, b) = (2/b) ( ∫_0^1 W̃(r)² dr − ∫_0^{1−b} W̃(r + b) W̃(r) dr ),

with W̃(r) = W(r) − r W(1) denoting a standard Brownian bridge. Thus, critical values reflect the user choices on the kernel and the bandwidth even in the limit.
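As an illustration, a minimal sketch of the t_HAC-statistic in (5) with the Bartlett kernel follows; the Newey-West rule-of-thumb bandwidth used as a default is only an illustrative assumption. Under fixed-b asymptotics, the same statistic would be computed with B = ⌊bT⌋ but compared against the nonstandard critical values just described.

```python
import numpy as np

def bartlett_lrv(z, B):
    """HAC long-run variance estimate (Equation (4)) with the Bartlett kernel."""
    z = np.asarray(z, dtype=float)
    T = len(z)
    zc = z - z.mean()
    v = zc @ zc / T                       # gamma_hat_z(0)
    for j in range(1, min(B, T - 1) + 1):
        w = 1.0 - j / (B + 1.0)           # Bartlett weights
        v += 2.0 * w * (zc[j:] @ zc[:-j]) / T
    return v

def t_hac(z, B=None):
    """t_HAC = sqrt(T) * zbar / sqrt(V_hat); N(0,1) under short memory."""
    T = len(z)
    if B is None:
        B = int(4 * (T / 100.0) ** (2.0 / 9.0))  # rule-of-thumb bandwidth (assumption)
    return np.sqrt(T) * np.mean(z) / np.sqrt(bartlett_lrv(z, B))
```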
In many settings, fixed-b inference is more accurate than the conventional HAC estimation approach. Examples of its application to forecast comparisons are the aforementioned articles of Choi and Kiefer (2010) and Li and Patton (2018), who apply both techniques (HAC and fixed-b) to compare exchange rate forecasts. Our Monte Carlo simulation study sheds additional light on their relative empirical performance.

3 Long Memory in Forecast Error Loss Differentials

3.1 Preliminaries

Under long-range dependence in z_t, one has to expect that neither conventional HAC estimators nor the fixed-b approach can be applied without further modification, since strong dependence such as fractional integration is ruled out by the assumption of a weakly stationary linear process. Given that z_t has long memory, we show that HAC-based tests reject with probability one in the limit (as T → ∞) even under the null. This result is stated in our Proposition 6 (at the end of this section). As our finite-sample simulations clearly demonstrate, this implies strong upward size distortions and invalidates the use of the classic DM test statistic. Before we state these results formally, we first show that the loss differential z_t may exhibit long memory in various situations. We start with a basic definition of stationary long memory time series, c.f. Definition 1.2 of Beran et al. (2013).

Definition 1. A time series a_t with spectral density f_a(λ), λ ∈ [−π, π], has long memory with memory parameter d_a ∈ (0, 1/2) if f_a(λ) ∼ L_f |λ|^{−2d_a} as λ → 0, where the symmetric function L_f(·) is slowly varying at the origin. We then write a_t ∼ LM(d_a).

This is the usual definition of a stationary long memory process, and Theorem 1.3 of Beran et al. (2013) states that under this restriction and mild regularity conditions, Definition 1 is equivalent to γ_a(j) ∼ L_γ |j|^{2d_a−1} as j → ∞, where γ_a(j) is the autocovariance function of a_t at lag j and L_γ(·) is slowly varying at infinity. If d_a = 0 holds, the process has short memory. Our results build on the asymptotic behavior of the autocovariances of processes that have the long memory property from Definition 1. Whether this memory is generated by fractional integration cannot be inferred. However, this does not affect the validity of the test statistics introduced in Section 4. We therefore adopt Definition 1, which covers fractional integration as a special case. A similar approach is taken by Dittmann and Granger (2002).2

Given Definition 1, we now state some assumptions regarding the long memory structure of the forecast objective and the forecasts.

Assumption 1 (Long Memory). The time series y_t, ŷ_1t, ŷ_2t with expectations E(y_t) = μ_y, E(ŷ_1t) = μ_1, and E(ŷ_2t) = μ_2 are causal Gaussian long memory processes (according to Definition 1) of orders d_y, d_1, and d_2, respectively.

Similar to Dittmann and Granger (2002), we rely on the assumption of Gaussianity since no results on the memory structure of squares and cross-products of non-Gaussian long memory processes are available in the existing literature. It should be noted that Gaussianity is only assumed for the derivation of the memory transmission from the forecasts and the forecast objective to the loss differential, but not for the subsequent results. In the following, we make use of the concept of common long memory, in which a linear combination of long memory series has reduced memory. The amount of this reduction is denoted by δ.
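Fractional integration is the leading special case of Definition 1. A minimal sketch of simulating a (truncated) fractionally integrated series x_t = (1 − L)^{−d} ε_t follows, computing the MA coefficients by the standard recursion ψ_j = ψ_{j−1}(j − 1 + d)/j; the sample size and value of d are illustrative.

```python
import numpy as np

def frac_integrate(eps, d):
    """Truncated fractional integration: x_t = sum_{j <= t} psi_j * eps_{t-j}."""
    T = len(eps)
    psi = np.empty(T)
    psi[0] = 1.0
    for j in range(1, T):
        psi[j] = psi[j - 1] * (j - 1 + d) / j  # psi_j = Gamma(j+d)/(Gamma(j+1)Gamma(d))
    return np.convolve(eps, psi)[:T]

x = frac_integrate(np.random.default_rng(1).standard_normal(1000), d=0.3)  # x ~ LM(0.3)
```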
Definition 2 (Common Long Memory). The time series a_t and b_t have common long memory (CLM) if both a_t and b_t are LM(d) and there exists a linear combination c_t = a_t − ψ_0 − ψ_1 b_t with ψ_0 ∈ R and ψ_1 ∈ R∖{0} such that c_t ∼ LM(d − δ) for some d ≥ δ > 0. We write a_t, b_t ∼ CLM(d, d − δ).

For simplicity and ease of exposition, we first exclude the possibility of common long memory among the series. This assumption is relaxed later on.

Assumption 2 (Absence of Common Long Memory). If a_t, b_t ∼ LM(d), then a_t − ψ_0 − ψ_1 b_t ∼ LM(d) for all ψ_0 ∈ R, ψ_1 ∈ R, and a_t, b_t ∈ {y_t, ŷ_1t, ŷ_2t}.

To derive the long memory properties of the forecast error loss differential, we make use of a result in Leschinski (2017) that characterizes the memory structure of the product series a_t b_t for two long memory time series a_t and b_t. Such products play an important role in the following analysis. The result is therefore restated as Proposition 1 below, for convenience.

Proposition 1 (Leschinski (2017), Memory of Products). Let a_t and b_t be long memory series according to Definition 1 with memory parameters d_a and d_b, and means μ_a and μ_b, respectively. Then

a_t b_t ∼ LM(max{d_a, d_b})            if μ_a, μ_b ≠ 0,
          LM(d_a)                      if μ_a = 0 and μ_b ≠ 0,
          LM(d_b)                      if μ_b = 0 and μ_a ≠ 0,
          LM(max{d_a + d_b − 1/2, 0})  if μ_a = μ_b = 0 and S_{a,b} ≠ 0,
          LM(d_a + d_b − 1/2)          if μ_a = μ_b = 0 and S_{a,b} = 0,

where S_{a,b} = ∑_{j=−∞}^{∞} γ_a(j) γ_b(j), with γ_a(·) and γ_b(·) denoting the autocovariance functions of a_t and b_t, respectively.

Proposition 1 shows that the memory of products of long memory time series critically depends on the means μ_a and μ_b of the series a_t and b_t. If both series have mean zero, the memory of the product is the sum of the memory parameters of the factor series minus one half, truncated at zero or not depending on the sum of autocovariances S_{a,b}. Since d_a, d_b < 1/2, this is always smaller than either of the original memory parameters. If only one of the series has mean zero, the memory of the product a_t b_t is determined by the memory of that particular series. Finally, if both series have nonzero means, the memory of the product is equal to the maximum of the memory orders of the two series. Furthermore, Proposition 1 distinguishes between antipersistent series and short memory series if the processes have zero means and d_a + d_b − 1/2 < 0. Our results below, however, do not require this distinction. The reason is that a linear combination involving the square of at least one of the series appears in each case, and these squares cannot be antipersistent long memory processes (see the proofs of Propositions 2 and 5 for details). As discussed in Leschinski (2017), Proposition 1 is related to the results in Dittmann and Granger (2002), who consider the memory of nonlinear transformations of zero-mean long memory time series that can be represented through a finite sum of Hermite polynomials. Their results include the square a_t² of a time series, which is also covered by Proposition 1 if a_t = b_t. If the mean is zero (μ_a = 0), we have a_t² ∼ LM(max{2d_a − 1/2, 0}). The memory is therefore reduced to zero if d_a ≤ 1/4. However, as can be seen from Proposition 1, this behavior depends critically on the expectation of the series.

Since it is the most widely used loss function in practice, we focus on the MSE loss function g(y_t, ŷ_it) = (y_t − ŷ_it)² for i = 1, 2. The quadratic forecast error loss differential is then given by

z_t = (y_t − ŷ_1t)² − (y_t − ŷ_2t)² = ŷ_1t² − ŷ_2t² − 2 y_t (ŷ_1t − ŷ_2t).   (7)

As is usual for DM tests, we do not need to know, or assume the form of, the forecasting models or methods used to generate the forecasts. The forecasts are taken as "primitives" in this analysis.
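The algebraic identity behind (7) is easy to verify numerically; the following sketch checks it on arbitrary toy vectors.

```python
import numpy as np

rng = np.random.default_rng(42)
y, f1, f2 = rng.normal(size=(3, 10))
lhs = (y - f1) ** 2 - (y - f2) ** 2            # z_t as a loss differential
rhs = f1 ** 2 - f2 ** 2 - 2 * y * (f1 - f2)    # right-hand side of Equation (7)
assert np.allclose(lhs, rhs)
```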
3.2 Transmission of Long Memory to the Loss Differential

Following the introduction of the necessary definitions and a preliminary result, we now present the result for the memory order of z_t defined via (7) in Proposition 2. It is based on the memory of y_t, ŷ_1t, and ŷ_2t and assumes the absence of common long memory for simplicity.

Proposition 2 (Memory Transmission in the Absence of Common Long Memory). Under Assumptions 1 and 2, the forecast error loss differential in (7) is z_t ∼ LM(d_z), where

d_z = max{d_y, d_1, d_2}                                           if μ_1 ≠ μ_2 ≠ μ_y,
      max{d_1, d_2}                                                if μ_1 = μ_2 ≠ μ_y,
      max{2d_1 − 1/2, d_2, d_y}                                    if μ_1 = μ_y ≠ μ_2,
      max{2d_2 − 1/2, d_1, d_y}                                    if μ_1 ≠ μ_y = μ_2,
      max{2 max{d_1, d_2} − 1/2, d_y + max{d_1, d_2} − 1/2, 0}     if μ_1 = μ_2 = μ_y.

Proof. See the Appendix.

The basic idea of the proof relates to Proposition 3 of Chambers (1998), which shows that the behavior of a linear combination of long memory series is dominated by the series with the strongest memory. Since we know from Proposition 1 that μ_1, μ_2, and μ_y play an important role for the memory of a squared long memory series, we set y_t = y_t* + μ_y and ŷ_it = ŷ_it* + μ_i, so that the starred series denote the demeaned series and μ_i denotes the expected value of the respective series. Straightforward algebra yields

z_t = ŷ_1t*² − ŷ_2t*² − 2[y_t*(μ_1 − μ_2) + ŷ_1t*(μ_y − μ_1) − ŷ_2t*(μ_y − μ_2)] − 2 y_t*(ŷ_1t* − ŷ_2t*) + const.   (8)

From (8), it is apparent that z_t is a linear combination of (i) the squared forecasts ŷ_1t*² and ŷ_2t*², (ii) the forecast objective y_t*, (iii) the forecast series ŷ_1t* and ŷ_2t*, and (iv) products of the forecast objective with the forecasts, that is, y_t*ŷ_1t* and y_t*ŷ_2t*. The memory of the squared series and the product series is determined in Proposition 1, from which the zero-mean product series y_t*ŷ_it* is LM(max{d_y + d_i − 1/2, 0}) or LM(d_y + d_i − 1/2). Moreover, the memory of the squared zero-mean series ŷ_it*² is max{2d_i − 1/2, 0}. By combining these results with that of Chambers (1998), the memory of the loss differential z_t is the maximum of the memory parameters of all components in (8). Proposition 2 then follows from a case-by-case analysis.

Proposition 2 demonstrates the transmission of long memory from the forecasts ŷ_1t, ŷ_2t and the forecast objective y_t to the loss differential z_t. The nature of this transmission, however, critically hinges on the (un)biasedness of the forecasts. If both forecasts are unbiased (i.e., if μ_1 = μ_2 = μ_y), the memory from all three input series is reduced, and the memory of the loss differential z_t is equal to the maximum of (i) these reduced orders and (ii) zero. Therefore, only if the memory parameters are small enough that d_y + max{d_1, d_2} < 1/2 is the memory of the loss differential z_t reduced to zero. In all other cases, there is a transmission of dependence from the forecasts and/or the forecast objective to the loss differential. The reason can immediately be seen from (8): the terms in the first bracket have larger memory than the remaining ones, because d_i > 2d_i − 1/2 and max{d_y, d_i} > d_y + d_i − 1/2. Therefore, these terms dominate the memory of the products and squares whenever a bias is present, that is, whenever μ_i − μ_y ≠ 0. Interestingly, the transmission of memory from the forecast objective y_t is prevented if both forecasts have equal bias, that is, μ_1 = μ_2. On the contrary, if μ_1 ≠ μ_2, d_z is at least as high as d_y.
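The contrast between the unbiased and the biased case of Proposition 2 can be illustrated in a few lines. The sketch below compares the autocorrelation decay of z_t for mutually independent fractionally integrated inputs (so that Assumption 2 holds); the sample size and parameter values are illustrative choices.

```python
import numpy as np

def frac_integrate(eps, d):
    T = len(eps)
    psi = np.empty(T); psi[0] = 1.0
    for j in range(1, T):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    return np.convolve(eps, psi)[:T]

def acf(z, lags):
    zc = z - z.mean()
    return np.array([zc[l:] @ zc[:-l] for l in lags]) / (zc @ zc)

rng = np.random.default_rng(0)
T, dy, d1, d2 = 10000, 0.4, 0.3, 0.3
y = frac_integrate(rng.standard_normal(T), dy)
f1 = frac_integrate(rng.standard_normal(T), d1)      # independent of y and f2
f2 = frac_integrate(rng.standard_normal(T), d2)
z_unb = (y - f1) ** 2 - (y - f2) ** 2                # mu_1 = mu_2 = mu_y = 0: d_z = 0.2
z_bia = (y - f1 - 1) ** 2 - (y - f2 + 1) ** 2        # mu_1 = 1, mu_2 = -1:   d_z = 0.4
print(acf(z_unb, [10, 50, 100]))                     # decays faster
print(acf(z_bia, [10, 50, 100]))                     # decays more slowly
```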
3.3 Memory Transmission under Common Long Memory

The results in Proposition 2 are based on Assumption 2, which precludes common long memory among the series. In practice, of course, it is likely that this assumption is violated. In fact, it can be argued that reasonable forecasts of long memory time series should have common long memory with the forecast objective. Therefore, we relax this assumption and replace it with Assumption 3, below.

Assumption 3 (Common Long Memory). The causal Gaussian process x_t has long memory according to Definition 1 of order d_x with expectation E(x_t) = μ_x. If a_t, b_t ∼ CLM(d_x, d_x − δ), then they can be represented as

y_t = β_y + ξ_y x_t + η_t    for a_t, b_t = y_t, and
ŷ_it = β_i + ξ_i x_t + ε_it   for a_t, b_t = ŷ_it,

with ξ_y, ξ_i ≠ 0. Both η_t and ε_it are mean-zero causal Gaussian long memory processes with parameters d_η and d_ε_i fulfilling 1/2 > d_x > d_η, d_ε_i ≥ 0, for i = 1, 2.

Assumption 3 restricts the common long memory to a form in which both series a_t and b_t can be represented as linear functions of their joint factor x_t. This excludes more complicated forms of dependence that are sometimes considered in the cointegration literature, such as nonlinear or time-varying cointegration. We know from Proposition 2 that the transmission of memory critically depends on the biasedness of the forecasts, which leads to a complicated case-by-case analysis. If common long memory according to Assumption 3 is allowed for, the situation becomes even more complex, since several relationships are possible: CLM of y_t with one of the ŷ_it; CLM of both ŷ_it with each other, but not with y_t; and CLM of each ŷ_it with y_t. Each of these situations has to be considered with all possible combinations of the ξ_j and the μ_j for all j ∈ {y, 1, 2}. To deal with this complexity, we focus on three important special cases: (i) the forecasts are biased and the ξ_j differ from each other, (ii) the forecasts are biased, but the ξ_j are equal, and (iii) the forecasts are unbiased and ξ_a = ξ_b whenever a_t and b_t are in a common long memory relationship.

To understand the role of the coefficients ξ_a and ξ_b for series that are subject to CLM, note that the forecast errors y_t − ŷ_it impose a cointegrating vector of (1, −1). A different scaling of the forecast objective and the forecasts is not possible. In the case of CLM between y_t and ŷ_it, for example, we have from Assumption 3 that

y_t − ŷ_it = β_y − β_i + x_t(ξ_y − ξ_i) + η_t − ε_it,   (9)

so that x_t(ξ_y − ξ_i) does not disappear from the linear combination if the scaling parameters ξ_y and ξ_i differ from each other. We refer to a situation where ξ_a = ξ_b as "balanced CLM," whereas CLM with ξ_a ≠ ξ_b is referred to as "unbalanced CLM."

In the special case (i), both forecasts are biased and the presence of CLM does not lead to a cancellation of the memory of x_t in the loss differential. This can of course be seen as an extreme case, but it serves to illuminate the mechanisms at work, especially in contrast to the results in Propositions 4 and 5, below. By substituting the linear relations from Assumption 3 for those series involved in the CLM relationship in the loss differential z_t = ŷ_1t² − ŷ_2t² − 2 y_t(ŷ_1t − ŷ_2t), and again setting a_t = a_t* + μ_a for those series that are not involved in the CLM relationship, it is possible to find expressions analogous to (8). Since terms analogous to those in the first bracket of (8) appear in each case, it is possible to focus on the transmission of memory from the forecasts and the objective to the loss differential. We obtain the following result.

Proposition 3 (Memory Transmission with Biased Forecasts and Unbalanced CLM). Let ξ_i ≠ ξ_y, ξ_1 ≠ ξ_2, μ_i ≠ μ_y, and μ_1 ≠ μ_2, for i = 1, 2.
Then under Assumptions 1 and 3, the forecast error loss differential in (7) is z_t ∼ LM(d_z), where

d_z = max{d_y, d_x}  if ŷ_1t, ŷ_2t ∼ CLM(d_x, d_x − δ), except if ξ_1/ξ_2 = (μ_y − μ_2)/(μ_y − μ_1),
      max{d_2, d_x}  if ŷ_1t, y_t ∼ CLM(d_x, d_x − δ), except if ξ_1/ξ_y = −(μ_1 − μ_2)/(μ_y − μ_1),
      max{d_1, d_x}  if ŷ_2t, y_t ∼ CLM(d_x, d_x − δ), except if ξ_2/ξ_y = −(μ_1 − μ_2)/(μ_y − μ_2),
      d_x            if ŷ_1t, ŷ_2t, y_t ∼ CLM(d_x, d_x − δ), except if ξ_1(μ_y − μ_1) + ξ_y(μ_1 − μ_2) = ξ_2(μ_y − μ_2).

Proof. See the Appendix.

In the absence of common long memory, we observe in Proposition 2 that the memory is given by max{d_1, d_2, d_y} if the means differ from each other. Now, if two of the series share common long memory, they both have memory d_x. Hence, Proposition 3 shows that the transmission mechanism is essentially unchanged and the memory of the loss differential is still dominated by the largest memory parameter. The only exception to this rule is the knife-edge case in which the differences in the means and the memory parameters offset each other.

Similar to (i), case (ii) refers to a situation of biasedness, but now with balanced CLM, so that the underlying long memory factor x_t cancels out in the forecast error loss differential. The memory transmission can thus be characterized by the following proposition.

Proposition 4 (Memory Transmission with Biased Forecasts and Balanced CLM). Let ξ_1 = ξ_2 = ξ_y. Then under Assumptions 1 and 3, the forecast error loss differential in (7) is z_t ∼ LM(d_z), where

d_z = max{d_y, d_x}  if ŷ_1t, ŷ_2t ∼ CLM(d_x, d_x − δ) and μ_1 ≠ μ_2,
      max{d_2, d_x}  if ŷ_1t, y_t ∼ CLM(d_x, d_x − δ) and μ_y ≠ μ_2,
      max{d_1, d_x}  if ŷ_2t, y_t ∼ CLM(d_x, d_x − δ) and μ_y ≠ μ_1,
      d̃             if ŷ_1t, ŷ_2t, y_t ∼ CLM(d_x, d_x − δ), for some 0 ≤ d̃ < d_x.

Proof. See the Appendix.

We refer to the first three cases in Propositions 3 and 4 as "partial CLM," since one of the ŷ_it or y_t is always excluded from the CLM relationship, and to the fourth case as "full CLM." We can observe that the dominance of the memory of the most persistent series under partial CLM is preserved for both balanced and unbalanced CLM. We therefore conclude that this effect is generated by the interaction with the series that is not involved in the CLM relationship. This can also be seen from Equations (22) to (24) in the proof. Only in the fourth case, with full CLM, does the memory transmission change between Propositions 3 and 4. In this case, the memory in the loss differential is reduced to d_z < d_x.

The third special case (iii) refers to a situation of unbiasedness similar to the last case in Proposition 2. In addition, it assumes balanced CLM as in Proposition 4, with ξ_a = ξ_b whenever a_t and b_t are in a common long memory relationship. Compared with the settings of the previous propositions, this is the most ideal situation in terms of forecast accuracy. Here, we have the following result.

Proposition 5 (Memory Transmission with Unbiased Forecasts and Balanced CLM). Under Assumptions 1 and 3, and if μ_y = μ_1 = μ_2 and ξ_y = ξ_a = ξ_b, then z_t ∼ LM(d_z), with

d_z = max{d_2 + max{d_x, d_η} − 1/2, 2 max{d_x, d_2} − 1/2, d_ε1}     if y_t, ŷ_1t ∼ CLM(d_x, d_x − δ̃),
      max{d_1 + max{d_x, d_η} − 1/2, 2 max{d_x, d_1} − 1/2, d_ε2}     if y_t, ŷ_2t ∼ CLM(d_x, d_x − δ̃),
      max{max{d_x, d_y} + max{d_ε1, d_ε2} − 1/2, 0}                   if ŷ_1t, ŷ_2t ∼ CLM(d_x, d_x − δ̃),
      max{d_η + max{d_ε1, d_ε2} − 1/2, 2 max{d_ε1, d_ε2} − 1/2, 0}    if y_t, ŷ_1t ∼ CLM(d_x, d_x − δ̃) and y_t, ŷ_2t ∼ CLM(d_x, d_x − δ̃).

Here, 0 < δ̃ ≤ 1/2 denotes a generic constant for the reduction in memory.

Proof. See the Appendix.

Proposition 5 shows that the memory of the forecasts and the objective variable can indeed cancel out if the forecasts are unbiased and have the same factor loading on x_t (i.e., if ξ_1 = ξ_2 = ξ_y).
However, in the first two cases, the memory of the error series ε_1t and ε_2t imposes a lower bound on the memory of the loss differential. Furthermore, even though the memory can be reduced to zero in the third and fourth cases, this only occurs if the memory orders of x_t, y_t, and the error series are sufficiently small. Otherwise, the memory is reduced, but does not vanish.

Overall, the results in Propositions 2, 3, 4, and 5 show that long memory can be transmitted from the forecasts or the forecast objective to the forecast error loss differentials. Our results also show that the biasedness of the forecasts plays an important role for the transmission of dependence to the loss differentials. To gain further insight into the mechanisms found in Propositions 2, 3, 4, and 5, consider a situation in which two forecasts with different nonzero biases are compared. In the absence of CLM, it is obvious from Proposition 2 that the memory of the loss differential is determined by the maximum of the memory orders of the forecasts and the forecast objective. If one of the forecasts has common long memory with the objective, the same holds true, irrespective of the loadings ξ_a on the common factor. As can be seen from Proposition 3, even if both forecasts have CLM with the objective, the maximal memory order is transmitted to z_t if the factor loadings ξ_a differ. Only if the factor loadings are equal is the memory reduced, as stated in Proposition 4. If we consider two unbiased forecasts in the absence of CLM, it can be seen from Proposition 2 that the memory of the loss differential is lower than that of the original series. The same holds true in the presence of CLM, as covered by Proposition 5.

In practical situations, it might be overly restrictive to impose exact unbiasedness (under which memory would be reduced according to Proposition 5). Our empirical application regarding the predictive ability of the VIX serves as an example, since the VIX is a biased forecast of future quadratic variation due to the existence of a variance risk premium (see Section 6). Biases can also be caused by estimation errors. This issue might be of less importance in a setup where the estimation period grows at a faster rate than the (pseudo-)out-of-sample period used for forecast evaluation. For the DM test, however, it is usually assumed that this is not the case. Otherwise, the test could not be used for the comparison of forecasts from nested models, due to a degenerate limiting distribution (cf. Giacomini and White (2006) for a discussion). Instead, the sample of size T* is split into an estimation period T_E and a forecasting period T such that T* = T_E + T, and it is assumed that T grows at a faster rate than T_E, so that T_E/T → 0 as T* → ∞. Therefore, the estimation error shrinks at a slower rate than the growth rate of the evaluation period and remains relevant asymptotically.

3.4 Asymptotic and Finite-Sample Behaviour under Long Memory

After establishing that forecast error loss differentials may exhibit long memory in various situations, we now consider the effect of long memory on the HAC-based DM test. The following proposition establishes that the size of the test approaches unity as T → ∞. Thus, the test indicates with probability one that one of the forecasts is superior to the other, even if both forecasts perform equally well according to g(·). Note that the test also has an asymptotic rejection probability of one under the alternative.
Proposition 6 (DM under Long Memory). For z_t ∼ LM(d) with d ∈ (0, 1/2), the asymptotic size (under H_0) of the t_HAC-statistic equals unity as T → ∞.

Proof. See the Appendix.

This result shows that inference based on HAC estimators is asymptotically invalid under long memory. To explore to what extent this finding also affects the finite-sample performance of the t_HAC- and t_FB-statistics, we conduct a small-scale Monte Carlo experiment as an illustration. The results shown in Figure 1 are obtained with M = 5000 Monte Carlo repetitions. We simulate samples of T = 50 and T = 2000 observations from a fractionally integrated process using different values of the memory parameter d in the range from 0 to 0.4. The HAC estimator and the fixed-b approach are implemented with the commonly used Bartlett and Quadratic Spectral (QS) kernels.3

[Figure 1. Size of the t_HAC- and t_FB-tests with T ∈ {50, 2000} for different values of the memory parameter d.]

We start by commenting on the results for the small sample size of T = 50 in the left panel of Figure 1. As demonstrated by Kiefer and Vogelsang (2005), the fixed-b approach works exceptionally well in the short memory case of d = 0, with the Bartlett and QS kernels achieving approximately equal size control. The t_HAC-statistic over-rejects more than the fixed-b approach and, as stated in Andrews (1991), better size control is obtained with the Quadratic Spectral kernel. If the memory parameter d is positive, we observe that all tests severely over-reject the null hypothesis. For d = 0.4, the size of the HAC-based test is approximately 65% and that of the fixed-b version using the Bartlett kernel is around 40%. We therefore find that the size distortions are not only an asymptotic phenomenon; they are already severe in samples of just T = 50 observations. Moreover, even for small deviations of d from zero, all tests are over-sized. These findings motivate the use of long memory robust procedures.

Continuing with the results for T = 2000 in the right panel of Figure 1, we observe broadly similar findings. In the short memory case, the size distortions observed in small samples vanish: all test statistics are well behaved for d = 0. On the contrary, for d > 0 the size distortions are stronger than for T = 50, although the magnitude of the additional distortion is moderate. This feature can be attributed to the slow divergence rate (as given in the proof of Proposition 6) of the test statistic under long memory.
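The over-rejection documented in Figure 1 is easy to reproduce in outline. The following sketch simulates the empirical size of the t_HAC test under fractional integration; the sample size, replication count, and bandwidth rule are illustrative assumptions rather than the exact settings of the study.

```python
import numpy as np

def frac_integrate(eps, d):
    T = len(eps)
    psi = np.empty(T); psi[0] = 1.0
    for j in range(1, T):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    return np.convolve(eps, psi)[:T]

def t_hac(z, B):
    T = len(z); zc = z - z.mean()
    v = zc @ zc / T
    for j in range(1, B + 1):
        v += 2.0 * (1.0 - j / (B + 1.0)) * (zc[j:] @ zc[:-j]) / T
    return np.sqrt(T) * z.mean() / np.sqrt(v)

rng = np.random.default_rng(1)
T, M = 500, 2000
B = int(4 * (T / 100.0) ** (2.0 / 9.0))
for d in (0.0, 0.2, 0.4):
    rej = np.mean([abs(t_hac(frac_integrate(rng.standard_normal(T), d), B)) > 1.96
                   for _ in range(M)])
    print(f"d = {d}: empirical size {rej:.3f} at nominal 5%")  # grows with d
```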
4 Long-Run Variance Estimation under Long Memory

Since conventional HAC estimators lead to spurious rejections under long memory, we consider memory robust long-run variance estimators. To the best of our knowledge, only two extensions of this kind are available in the literature: the MAC estimator of Robinson (2005) and the extension of the fixed-b estimator by McElroy and Politis (2012). We do not assume that the forecasts are obtained from some specific class of models. We merely extend the typical assumptions of Diebold and Mariano (1995) on the loss differentials so that long memory is allowed.

4.1 MAC Estimator

The MAC estimator is developed by Robinson (2005) and further explored and extended by Abadir, Distaso, and Giraitis (2009). Albeit stated in a somewhat different form, the same result is derived independently by Phillips and Kim (2007), who consider the long-run variance of a multivariate fractionally integrated process. Robinson (2005) assumes that z_t is linear (in the sense of our Equation (2); see also Assumption L in Abadir, Distaso, and Giraitis (2009)) and that for λ → 0 its spectral density fulfills

f(λ) = b_0 |λ|^{−2d} + o(|λ|^{−2d}),

with b_0 > 0, |λ| ≤ π, d ∈ (−1/2, 1/2), and b_0 = lim_{λ→0} |λ|^{2d} f(λ). Among others, this assumption covers stationary and invertible ARFIMA processes. For notational convenience, we drop the index z from the spectral density and the memory parameter here. A key result for the MAC estimator is that, as T → ∞,

Var(T^{1/2−d} z̄) → b_0 p(d),   with p(d) = 2Γ(1 − 2d) sin(πd) / (d(1 + 2d)) if d ≠ 0, and p(d) = 2π if d = 0.

The case of short memory (d = 0) yields the familiar result that the long-run variance of the sample mean equals 2π b_0 = 2π f(0). Hence, estimation of the long-run variance requires estimation of f(0) in the case of short memory. If long memory is present in the data generating process (DGP), estimation of the long-run variance additionally hinges on the estimation of d. The MAC estimator is therefore given by

V̂(d̂, m_d, m) = b̂_m(d̂) p(d̂).

In more detail, the estimation of V works as follows. First, if the estimator of d fulfills the condition d̂ − d = o_p(1/log T), plug-in estimation is valid (cf. Abadir, Distaso, and Giraitis (2009)), so that p(d) can simply be estimated by p(d̂). A popular estimator that fulfills this rather weak requirement is the local Whittle estimator with bandwidth m_d = ⌊T^{q_d}⌋, where 0 < q_d < 1 denotes a generic bandwidth parameter and ⌊·⌋ denotes the largest integer not exceeding its argument. This estimator is given by

d̂_LW = argmin_{d ∈ (−1/2, 1/2)} R_LW(d),   where R_LW(d) = log( (1/m_d) ∑_{j=1}^{m_d} j^{2d} I_T(λ_j) ) − (2d/m_d) ∑_{j=1}^{m_d} log j,

I_T(λ_j) = (2πT)^{−1} |∑_{t=1}^{T} exp(itλ_j) z_t|² is the periodogram (which does not depend on d̂), and the λ_j = 2πj/T are the Fourier frequencies for j = 1, ..., ⌊T/2⌋. Many other estimation approaches (e.g., log-periodogram estimation) would be possible as well. Since the loss differential in (7) is a linear combination of processes with different memory orders, the local polynomial Whittle plus noise (LPWN) estimator of Frederiksen, Nielsen, and Nielsen (2012) is a particularly useful alternative. This estimator extends the local Whittle estimator by approximating the log-spectrum of possible short memory components and perturbation terms in the vicinity of the origin by polynomials. This leads to a reduction of the finite-sample bias. The estimator is consistent for d ∈ (0, 1) and asymptotically normal in the presence of perturbations for d ∈ (0, 0.75), although with a variance that is inflated by a multiplicative constant compared with the local Whittle estimator.

Based on a consistent estimator d̂, such as those discussed above, b_0 can be estimated consistently by

b̂_m(d̂) = m^{−1} ∑_{j=1}^{m} λ_j^{2d̂} I_T(λ_j).

The bandwidth m is determined according to m = ⌊T^q⌋ such that m → ∞ and m = o(T/(log T)²). The MAC estimator is consistent as long as d̂ →_p d and b̂_m(d̂) →_p b_0. These results hold under very weak assumptions: neither linearity of z_t nor Gaussianity is required. Under somewhat stronger assumptions, the t_MAC-statistic is also normally distributed (see Theorem 3.1 of Abadir, Distaso, and Giraitis (2009)): t_MAC ⇒ N(0, 1). The t-statistic using the feasible MAC estimator can be written as

t_MAC = T^{1/2−d̂} z̄ / √V̂(d̂, m_d, m),

with m_d and m being the bandwidths for the estimation of d and b_0, respectively.4
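A minimal sketch of the feasible t_MAC-statistic based on the formulas above follows, using the local Whittle estimator for d; the bandwidth exponents q_d and q are user choices, and the values below are illustrative assumptions rather than the article's recommendations.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gamma as Gamma

def periodogram(z):
    """I_T(lambda_j) at Fourier frequencies lambda_j = 2*pi*j/T, j = 1, ..., T//2."""
    T = len(z)
    lam = 2 * np.pi * np.arange(1, T // 2 + 1) / T
    I = np.abs(np.fft.fft(np.asarray(z, float))[1:T // 2 + 1]) ** 2 / (2 * np.pi * T)
    return lam, I

def local_whittle(z, m):
    """d_hat_LW = argmin R_LW(d) over (-1/2, 1/2) using the first m frequencies."""
    lam, I = periodogram(z)
    lam, I = lam[:m], I[:m]
    def R(d):
        # equivalent to the j^(2d) form, since lambda_j is proportional to j
        return np.log(np.mean(lam ** (2 * d) * I)) - 2 * d * np.mean(np.log(lam))
    return minimize_scalar(R, bounds=(-0.49, 0.49), method="bounded").x

def t_mac(z, qd=0.7, q=0.5):
    """t_MAC = T^(1/2 - d_hat) * zbar / sqrt(b_hat * p(d_hat)); compare with N(0,1)."""
    T = len(z)
    lam, I = periodogram(z)
    d = local_whittle(z, int(T ** qd))
    m = int(T ** q)
    b0 = np.mean(lam[:m] ** (2 * d) * I[:m])            # b_hat_m(d_hat)
    p = 2 * np.pi if d == 0 else 2 * Gamma(1 - 2 * d) * np.sin(np.pi * d) / (d * (1 + 2 * d))
    return T ** (0.5 - d) * np.mean(z) / np.sqrt(b0 * p)
```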
4.2 Extended Fixed-Bandwidth Approach

Following up on the work of Kiefer and Vogelsang (2005), McElroy and Politis (2012) extend the fixed-bandwidth approach to long-range dependence. Their approach is similar to that of Kiefer and Vogelsang (2005) in many respects, as can be seen below. The test statistic suggested by McElroy and Politis (2012) is given by

t_EFB = T^{1/2} z̄ / √V̂(k, b).

In contrast to the t_MAC-statistic, the t_EFB-statistic involves a scaling of T^{1/2}. This affects the limit distribution, which depends on the memory parameter d. Analogously to the short memory case, the limiting distribution is derived by assuming that a functional central limit theorem applies to the partial sums of z_t, so that

t_EFB ⇒ W_d(1) / √Q(k, b, d),

where W_d(r) is a fractional Brownian motion and Q(k, b, d) depends on the fractional Brownian bridge W̃_d(r) = W_d(r) − r W_d(1). Furthermore, Q(k, b, d) depends on the first and second derivatives of the kernel k(·). In more detail, for the Bartlett kernel we have

Q(k, b, d) = (2/b) ( ∫_0^1 W̃_d(r)² dr − ∫_0^{1−b} W̃_d(r + b) W̃_d(r) dr ),

and thus a similar structure as in the short memory case. Further details and examples can be found in McElroy and Politis (2012). The joint distribution of W_d(1) and Q(k, b, d) is found through their joint Fourier-Laplace transform; see Fitzsimmons and McElroy (2010). It is symmetric around zero and has a cumulative distribution function that is continuous in d.

Besides the similarities to the short memory case, there are some important conceptual differences from the MAC estimator. First, the MAC estimator belongs to the class of "small-b" estimators in the sense that it estimates the long-run variance directly, whereas the fixed-b approach leads, also in the long memory case, to an estimate of the long-run variance multiplied by a functional of a fractional Brownian bridge. Second, the limiting distribution of the t_EFB-statistic is not standard normal, but rather depends on the chosen kernel k, the fixed-bandwidth parameter b, and the long memory parameter d. While the first two are user-specific, the latter requires a plug-in estimator, as does the MAC estimator. As a consequence, the critical values depend on d. McElroy and Politis (2012) offer response curves for various kernels.5

5 Monte Carlo Study

This section presents further results on the memory transmission to the forecast error loss differentials and on the relative performance of the t_MAC- and t_EFB-statistics by means of extensive Monte Carlo simulations. It is divided into three parts. First, we conduct Monte Carlo experiments to verify the results obtained in Propositions 2-5 and to explore whether similar results apply to non-Gaussian processes and under the QLIKE loss function. The second part studies the memory properties of the loss differential in a number of empirically motivated forecasting scenarios. Finally, in the third part we explore the finite-sample size and power properties of the robustified tests discussed above and make recommendations for their practical application.

5.1 Memory Transmission to the Forecast Error Loss Differentials: Beyond MSE and Gaussianity

The results on the transmission of long memory from the forecasts or the forecast objective to the loss differentials in Propositions 2-5 are restricted to stationary Gaussian processes and forecasts evaluated under the MSE loss function. In this section, we first verify the validity of the predictions from our propositions. Furthermore, we study how these results translate to non-Gaussian processes, nonstationary processes, and the QLIKE loss function, which we use in our empirical application on volatility forecasting in Section 6. The QLIKE loss is given by

QLIKE(ŷ_it, y_t) = log ŷ_it + y_t/ŷ_it.   (10)
For a discussion of the role and importance of this loss function in the evaluation of volatility forecasts, see Patton (2011). All DGPs are based on fractional integration. Due to the large number of cases in Propositions 2-5, we restrict ourselves to representative situations. The first two DGPs are based on cases (i) and (v) of Proposition 2, which cover situations in which the forecasts and the forecast objective are generated from a system without common long memory. We simulate processes of the form

a_t = μ_a + a_t*/σ̂_{a*},   (11)

where a_t ∈ {y_t, ŷ_1t, ŷ_2t} and a_t* = (1 − L)^{−d_a} ε_at. As in Section 3, the starred variable a_t* is a zero-mean process, whereas a_t has mean μ_a, and the ε_at are iid. The innovation sequences are either standard normal or t(5)-distributed. The standardization of a_t* neutralizes the effect of increasing values of the memory parameter d on the process variance and controls the scaling of the mean relative to the variance. The loss differential series z_t is then calculated as in (1). We use 5000 Monte Carlo replications and consider sample sizes of T ∈ {250, 2000}. The first two DGPs for z_t are obtained by setting the means μ_a in (11) as follows:

DGP1: (μ_1, μ_2, μ_y) = (1, −1, 0)
DGP2: (μ_1, μ_2, μ_y) = (0, 0, 0).

The other DGPs represent the last cases of Propositions 3-5. These are based on the fractionally cointegrated system

( y_t*   )   ( 1  0  0  ξ_y ) ( η_t  )
( ŷ_1t*  ) = ( 0  1  0  ξ_1 ) ( ε_1t )
( ŷ_2t*  )   ( 0  0  1  ξ_2 ) ( ε_2t )
( x_t    )   ( 0  0  0  1   ) ( x_t  )

where η_t, ε_1t, ε_2t, and x_t are mutually independent and fractionally integrated with parameters d_η, d_ε1, d_ε2, and d_x. DGPs 3 to 5 are then obtained by selecting the following parameter constellations:

DGP3: (μ_1, μ_2, μ_y, ξ_1, ξ_2, ξ_y) = (1, −1, 0, 1, 2, 1.5)
DGP4: (μ_1, μ_2, μ_y, ξ_1, ξ_2, ξ_y) = (1, −1, 0, 1, 1, 1)
DGP5: (μ_1, μ_2, μ_y, ξ_1, ξ_2, ξ_y) = (0, 0, 0, 1, 1, 1).

Each of our DGPs 2 to 5 is formulated such that the reduction in the memory parameter is the strongest among all cases covered by the respective proposition. Simulation results for other cases would therefore show an even stronger transmission of memory to the loss differentials. Since the QLIKE criterion is only defined for positive forecasts, we consider a long memory stochastic volatility (LMSV) specification when QLIKE is used and simulate forecasts and the forecast objective of the form exp(a_t/2), whereas the MSE is calculated directly for the a_t.

It should be noted that the loss differential z_t is a linear combination of several persistent and antipersistent component series. This is a very challenging setup for the empirical estimation of the memory parameter. We therefore resort to the aforementioned LPWN estimator of Frederiksen, Nielsen, and Nielsen (2012) with a bandwidth of m_d = ⌊T^{0.65}⌋ and a polynomial of degree one for the noise term, which can be expected to have the lowest bias in this setup among the available estimation methods. Nevertheless, the estimation remains difficult, and any mismatch between the theoretical predictions from our propositions and the finite-sample results reported here is likely due to the finite-sample bias of the semiparametric estimators.
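Before turning to the results, a sketch of generating series according to Equation (11) and DGP1 follows (DGP2 only changes the means); the distribution switch and parameter values mirror the design described above, while the random seed and any defaults are illustrative assumptions.

```python
import numpy as np

def frac_integrate(eps, d):
    T = len(eps)
    psi = np.empty(T); psi[0] = 1.0
    for j in range(1, T):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    return np.convolve(eps, psi)[:T]

def make_series(mu, d, T, rng, dist="gaussian"):
    """a_t = mu_a + a*_t / sd(a*_t), with a*_t = (1-L)^(-d) eps_t as in Equation (11)."""
    eps = rng.standard_t(5, size=T) if dist == "t5" else rng.standard_normal(T)
    a = frac_integrate(eps, d)
    return mu + a / a.std()

rng = np.random.default_rng(7)
T, dy, d1, d2 = 2000, 0.25, 0.2, 0.4
mu1, mu2, muy = 1.0, -1.0, 0.0                    # DGP1; DGP2 sets all means to 0
y = make_series(muy, dy, T, rng)
f1 = make_series(mu1, d1, T, rng)
f2 = make_series(mu2, d2, T, rng)
z = (y - f1) ** 2 - (y - f2) ** 2                 # MSE loss differential, Equation (1)
```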
The results for DGPs 1 and 2 are given in Table 1. We start with the discussion of simulation results for cases covered by our theoretical results. Under MSE loss and with Gaussian innovations, Proposition 2 states that for DGP1 (top left panel) we have d_z = 0.25 if d_1, d_2 ∈ {0, 0.2} and d_z = 0.4 if either d_1 or d_2 equals 0.4. The bottom left panel reports the results for DGP2, for which Proposition 2 states that d_z = 0 if d_1, d_2 ∈ {0, 0.2} and d_z = 0.3 if d_1 = 0.4 or d_2 = 0.4. We can observe that the estimated memory tends to be slightly larger than predicted for small d_1 and d_2 and slightly smaller for d_z = 0.4. In general, however, the results closely mirror the theoretical predictions of Proposition 2.

Table 1. Monte Carlo averages of estimated memory in the loss differential z_t for DGP1 and DGP2 with d_y = 0.25

                         MSE                                           QLIKE
                         Gaussian                t(5)                  Gaussian                t(5)
DGP  T     d_1/d_2  0     0.2   0.4   0.6   0     0.2   0.4   0.6   0     0.2   0.4   0.6   0     0.2   0.4   0.6
1    250   0        0.32  0.32  0.35  0.43  0.30  0.31  0.34  0.41  0.29  0.32  0.38  0.44  0.22  0.25  0.32  0.41
           0.2      0.33  0.34  0.36  0.43  0.31  0.32  0.34  0.42  0.30  0.31  0.37  0.45  0.23  0.26  0.32  0.40
           0.4      0.36  0.36  0.38  0.45  0.33  0.34  0.37  0.43  0.31  0.32  0.37  0.45  0.24  0.27  0.33  0.41
           0.6      0.43  0.43  0.45  0.50  0.42  0.41  0.44  0.49  0.36  0.37  0.41  0.48  0.29  0.31  0.36  0.44
     2000  0        0.30  0.29  0.32  0.42  0.29  0.28  0.31  0.42  0.29  0.28  0.35  0.46  0.18  0.20  0.28  0.40
           0.2      0.29  0.29  0.32  0.41  0.28  0.28  0.31  0.41  0.29  0.28  0.35  0.46  0.18  0.20  0.27  0.40
           0.4      0.32  0.32  0.35  0.42  0.32  0.31  0.34  0.41  0.29  0.28  0.35  0.46  0.20  0.21  0.28  0.41
           0.6      0.42  0.42  0.43  0.48  0.42  0.41  0.42  0.47  0.35  0.34  0.38  0.48  0.26  0.25  0.31  0.44
2    250   0        0.13  0.15  0.26  0.43  0.10  0.14  0.23  0.41  0.11  0.14  0.26  0.41  0.10  0.13  0.20  0.35
           0.2      0.15  0.17  0.26  0.43  0.14  0.17  0.24  0.41  0.15  0.18  0.27  0.41  0.12  0.15  0.21  0.35
           0.4      0.26  0.27  0.31  0.43  0.23  0.24  0.29  0.42  0.25  0.27  0.31  0.41  0.20  0.21  0.25  0.36
           0.6      0.42  0.42  0.43  0.48  0.41  0.40  0.42  0.48  0.41  0.40  0.41  0.47  0.35  0.35  0.36  0.43
     2000  0        0.07  0.11  0.23  0.43  0.07  0.09  0.21  0.41  0.06  0.13  0.27  0.41  0.05  0.08  0.15  0.32
           0.2      0.11  0.13  0.23  0.42  0.09  0.11  0.20  0.40  0.13  0.17  0.26  0.40  0.07  0.09  0.17  0.31
           0.4      0.23  0.23  0.25  0.41  0.21  0.21  0.24  0.39  0.27  0.26  0.29  0.40  0.16  0.17  0.22  0.33
           0.6      0.43  0.42  0.41  0.46  0.41  0.40  0.39  0.45  0.41  0.41  0.40  0.46  0.32  0.31  0.33  0.41

With regard to the cases not covered by the theoretical derivations, we can observe that the results for t-distributed innovations are nearly identical to those obtained for the Gaussian distribution. The same holds true for the Gaussian long memory stochastic volatility model and the QLIKE loss function. If the innovations of the LMSV model are t-distributed, the memory in the loss differential is slightly lower, but still substantial.
Finally, in the presence of nonstationary long memory with d_1 or d_2 equal to 0.6, we can observe that the loss differential exhibits long memory with an estimated degree between 0.4 and 0.5. The only exception arises when the QLIKE loss function is used for DGP1. Here, we observe some asymmetry in the results, in the sense that the estimated memory parameter of the loss differential is slightly lower if d_2 is low relative to d_1. However, the memory transmission is still substantial.

The results for DGP3 to DGP5, where the forecasts and the forecast objective have common long memory, are shown in Table 2. If we again consider the left column, which displays the results for MSE loss and Gaussian innovations, Proposition 3 states for DGP3 that the memory for all d in the stationary range should be d_x = 0.45. Proposition 4 does not give an exact prediction for DGP4, but states that the memory in the loss differential should be reduced compared with DGP3. Finally, for DGP5, Proposition 5 implies that d_z = 0 for d_ε1, d_ε2 ∈ {0, 0.2} and d_z = 0.3 if d_ε1 or d_ε2 equals 0.4.

Table 2. Monte Carlo averages of estimated memory in the loss differential z_t for DGP3, DGP4, and DGP5 with d_η = 0.2

                           MSE                       QLIKE
                           Gaussian          t(5)            Gaussian          t(5)
DGP  T     d_ε1/d_ε2  0     0.2   0.4   0     0.2   0.4   0     0.2   0.4   0     0.2   0.4
3    250   0          0.31  0.32  0.38  0.29  0.31  0.37  0.29  0.33  0.40  0.27  0.31  0.39
           0.2        0.34  0.35  0.39  0.33  0.34  0.38  0.32  0.33  0.41  0.29  0.32  0.39
           0.4        0.41  0.42  0.44  0.40  0.40  0.43  0.37  0.38  0.43  0.36  0.37  0.42
     2000  0          0.34  0.32  0.38  0.34  0.32  0.38  0.32  0.30  0.39  0.31  0.29  0.38
           0.2        0.30  0.30  0.36  0.30  0.30  0.35  0.30  0.29  0.38  0.29  0.29  0.37
           0.4        0.39  0.39  0.41  0.38  0.38  0.40  0.37  0.36  0.40  0.36  0.35  0.39
4    250   0          0.29  0.31  0.35  0.27  0.30  0.35  0.28  0.30  0.37  0.24  0.27  0.35
           0.2        0.30  0.32  0.36  0.29  0.31  0.35  0.28  0.30  0.38  0.25  0.28  0.35
           0.4        0.35  0.36  0.39  0.34  0.35  0.38  0.31  0.32  0.39  0.27  0.30  0.37
     2000  0          0.26  0.26  0.33  0.25  0.26  0.33  0.25  0.25  0.35  0.23  0.24  0.33
           0.2        0.26  0.26  0.33  0.26  0.25  0.32  0.26  0.26  0.35  0.24  0.24  0.33
           0.4        0.33  0.33  0.36  0.33  0.32  0.36  0.29  0.29  0.36  0.27  0.27  0.34
5    250   0          0.12  0.14  0.25  0.11  0.13  0.23  0.10  0.13  0.25  0.10  0.12  0.21
           0.2        0.14  0.16  0.26  0.13  0.15  0.23  0.14  0.16  0.25  0.12  0.13  0.22
           0.4        0.26  0.25  0.30  0.23  0.24  0.28  0.25  0.25  0.30  0.22  0.22  0.26
     2000  0          0.06  0.10  0.23  0.07  0.10  0.21  0.05  0.10  0.25  0.04  0.07  0.19
           0.2        0.09  0.11  0.23  0.09  0.11  0.21  0.09  0.12  0.25  0.06  0.09  0.19
           0.4        0.23  0.23  0.26  0.22  0.21  0.24  0.25  0.24  0.27  0.19  0.19  0.23

As for DGP1 and DGP2, our estimates of d_z are roughly in line with the theoretical predictions. For DGP3, where d_z should be large, the estimates are somewhat lower, but the estimated degree of memory is still considerable. The results for DGP4 are indeed slightly lower than those for DGP3, as predicted by Proposition 4. Finally, for DGP5 we again observe that d_z is somewhat overestimated if the true value is low, and vice versa. As in Table 1, the results are qualitatively the same if we consider t-distributed innovation sequences and the QLIKE loss function. Additional simulations with d_x = 0.65 show that the results are virtually identical for d_1, d_2 ≤ 0.4.
5.2 Empirical Forecast Scenarios

The practical relevance of memory transmission to forecast error loss differentials is further examined by means of a number of simple forecast scenarios motivated by typical empirical examples. To ensure that the null hypothesis of equal predictive accuracy holds, we have to construct two competing forecasts that differ from each other but perform equally well in terms of a loss function, here the MSE. The length of the estimation period equals TE = 250 and the memory parameter estimates are obtained by the LPWN estimator.

The first scenario is motivated by the spurious long memory literature. The DGP is a fractionally integrated process with a time-varying mean that is generated by a random level shift process as in Perron and Qu (2010) or Qu (2011). In detail,
\[
y_t = x_t + \mu_t, \qquad x_t = (1-L)^{-1/4}\,\varepsilon_{x,t}, \qquad \mu_t = \mu_{t-1} + \pi_t\,\varepsilon_{\mu,t},
\]
where ε_{x,t} ∼ iid N(0, 1), ε_{μ,t} ∼ iid N(0, 1), π_t ∼ iid Bernoulli(p), and ε_{x,t}, ε_{μ,t}, and π_t are mutually independent.⁶ A simulation sketch of this DGP is given below, after Table 3. It is well known that it can be difficult to distinguish long memory from low-frequency contaminations such as structural breaks (cf. Diebold and Inoue (2001) or Granger and Hyung (2004)). Therefore, it is often assumed that the process is driven by either the one or the other; see, for example, Berkes et al. (2006), who suggest a test of the null hypothesis of a weakly dependent process with breaks against the alternative of long-range dependence, or Lu and Perron (2010), who demonstrate that a pure level shift process has superior predictive performance compared with ARFIMA and HAR models for the log-absolute returns of the S&P500. See also Varneskov and Perron (2017) for a related recent contribution. In the spirit of this dichotomy, we compare forecasts that solely consider the breaks with forecasts that assume the absence of breaks and predict the process based on a fractionally integrated model (with the memory estimated by the local Whittle method).⁷

Table 3 shows the results of this exercise. The average loss differential is clearly close to zero. The estimated memory of the loss differentials is around 0.17 for larger sample sizes. While the classical DM test based on a HAC estimator over-rejects, both the tMAC- and tEFB-statistics control the size well, at least in larger samples.

Table 3. Estimated memory of the loss differentials d̂z, mean loss differential z̄, and rejection frequencies of the t-statistics for a spurious long memory scenario

T      d̂z      z̄        tHAC    tMAC    tEFB
250    0.113    0.012    0.201   0.110   0.098
500    0.144    0.003    0.227   0.080   0.073
1000   0.171    0.000    0.270   0.057   0.058
2000   0.172   −0.001    0.324   0.046   0.057

Note: The true DGP is fractionally integrated with random level shifts and the forecasts assume either a pure shift process or a pure long memory process.
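The level-shift component of this DGP is simple to simulate. The sketch below reuses the arfima helper from the previous listing; the shift probability p = 0.005 is an illustrative value, not the one used in our simulations.

```python
import numpy as np

def random_level_shift_series(T, p, rng):
    """y_t = x_t + mu_t: stationary long memory plus a random level shift mean,
    mu_t = mu_{t-1} + pi_t * eps_{mu,t} with pi_t ~ iid Bernoulli(p)."""
    x = arfima(0.25, T, rng)                    # (1 - L)^{-1/4} eps_{x,t}, see above
    shifts = (rng.random(T) < p) * rng.standard_normal(T)
    return x + np.cumsum(shifts)                # mu_t accumulates the rare shifts

rng = np.random.default_rng(0)
y = random_level_shift_series(T=2000, p=0.005, rng=rng)   # p is illustrative
```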
As a second scenario, we consider simple predictive regressions based on two regressors that are fractionally cointegrated with the forecast objective. Here, xt is fractionally integrated of order d. Then
\[
y_t = x_t + (1-L)^{-(d-\delta)}\,\eta_t, \qquad x_{i,t} = x_t + (1-L)^{-(d-\delta)}\,\varepsilon_{i,t}, \qquad \hat y_{it} = \hat\beta_{0i} + \hat\beta_{1i}\,x_{i,t-1},
\]
where ηt and the εi,t are mutually independent and normally distributed with unit variances, β̂0i and β̂1i are the OLS estimators, and 0 < δ < d. To resemble processes in the lower nonstationary long memory region (such as the realized volatilities in our empirical application), we set d = 0.6. This corresponds to a situation in which we forecast the realized volatility of the S&P500 with either past values of the VIX or another past realized volatility, such as that of a sector index. The cointegration strength is set to δ = 0.3.

The results are shown in Table 4. Again, the Monte Carlo averages (z̄) are close to zero. The tMAC- and tEFB-statistics tend to be conservative in larger samples, whereas the tHAC test rejects far too often. The strength of the memory in the loss differential lies roughly at 0.24.

Table 4. Estimated memory of the loss differentials d̂z, mean loss differential z̄, and rejection frequencies of the t-statistics for the comparison of forecasts obtained from predictive regressions where the regressor variables are fractionally cointegrated with the forecast objective

T      d̂z      z̄        tHAC    tMAC    tEFB
250    0.182    0.011    0.289   0.063   0.059
500    0.207   −0.011    0.336   0.040   0.041
1000   0.228   −0.001    0.378   0.023   0.025
2000   0.238   −0.009    0.441   0.016   0.020

Our third scenario is closely related to the previous one. In practice, it is hard to distinguish fractionally cointegrated series from fractionally integrated series with highly correlated short-run components (cf. the simulation studies in Hualde and Velasco (2008)). Our third scenario is therefore similar to the second, but with correlated innovations:
\[
y_t = (1-L)^{-d}\,\eta_t, \qquad x_{i,t} = (1-L)^{-d}\,\varepsilon_{i,t}, \qquad \hat y_{it} = \hat\beta_{0i} + \hat\beta_{1i}\,x_{i,t-1}.
\]
Here, all pairwise correlations between ηt and the εi,t are ρ = 0.4. Furthermore, we set d = 0.4, so that we operate in the stationary long memory region. A sketch of this setup follows below.
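For concreteness, here is a sketch of this third scenario, assuming all three innovation series are pairwise correlated at ρ (our reading of the correlation structure), with the memory generated by a truncated expansion without burn-in and the OLS predictive regressions fitted on the first TE observations:

```python
import numpy as np

rng = np.random.default_rng(0)
T, TE, d, rho = 2000, 250, 0.4, 0.4

# innovations with all pairwise correlations equal to rho
C = np.full((3, 3), rho)
np.fill_diagonal(C, 1.0)
u = rng.standard_normal((T + TE, 3)) @ np.linalg.cholesky(C).T
eta, eps1, eps2 = u[:, 0], u[:, 1], u[:, 2]

def frac_int(e, d):
    """Apply (1 - L)^{-d} via the truncated MA(inf) expansion (no burn-in)."""
    psi = np.empty(len(e))
    psi[0] = 1.0
    for j in range(1, len(e)):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    return np.convolve(e, psi)[:len(e)]

y, x1, x2 = frac_int(eta, d), frac_int(eps1, d), frac_int(eps2, d)

def ols_forecast(x, y, TE):
    """Fit y_t = b0 + b1 x_{t-1} by OLS on the first TE observations and
    return the one-step forecasts for t = TE, ..., T + TE - 1."""
    X = np.column_stack([np.ones(TE - 1), x[:TE - 1]])
    b0, b1 = np.linalg.lstsq(X, y[1:TE], rcond=None)[0]
    return b0 + b1 * x[TE - 1:-1]

z = (y[TE:] - ols_forecast(x1, y, TE)) ** 2 - (y[TE:] - ols_forecast(x2, y, TE)) ** 2
print(z.mean())   # near zero: the competing forecasts perform equally well
```

Averaging z over many such replications illustrates the near-zero mean loss differentials in Table 5, while the persistence of z is what invalidates the HAC-based test.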
The situation is the same as in the previous scenarios, with strong long memory of d̂z ≈ 0.3 in the loss differentials (see Table 5). The tests are quite conservative for this DGP, which can be attributed to the difficulty of estimating the memory of the forecast error loss differential with the standard local Whittle estimator; see also our discussion in Section 5.1.

Table 5. Estimated memory of the loss differentials d̂z, mean loss differential z̄, and rejection frequencies of the t-statistics for the comparison of forecasts obtained from predictive regressions where the regressor variables have correlated innovations with the forecast objective

T      d̂z      z̄       tHAC    tMAC    tEFB
250    0.275    0.001   0.364   0.028   0.026
500    0.288    0.001   0.417   0.017   0.020
1000   0.288    0.005   0.445   0.008   0.012
2000   0.287    0.003   0.485   0.004   0.009

Altogether, these results demonstrate that memory transmission can occur in a variety of situations, whether due to level shifts, fractional cointegration, or correlated innovations, and in nonstationary as well as stationary series.

5.3 Size and Power of Long Memory Robust t-Statistics

We now turn our attention to the empirical size and power properties of the memory robust tEFB- and tMAC-statistics, using the same DGPs as in Section 5.1. These DGPs reflect the situations covered by our propositions and generate distributional properties that are realistic for the forecast error loss differential zt. However, we have to ensure that the loss differentials have zero expectation (size) and that the distance from the null is comparable across DGPs (power). As DGP1, DGP2, DGP4, and DGP5 are constructed in a symmetric way, we have E[MSE(yt, ŷ1t) − MSE(yt, ŷ2t)] = 0, but E[QLIKE(yt, ŷ1t) − QLIKE(yt, ŷ2t)] ≠ 0, due to the asymmetry of the QLIKE loss function. Furthermore, DGP3 is not constructed in a symmetric way, so that E[z̃t] ≠ 0 irrespective of the loss function. We therefore have to correct the means of the loss differentials. In addition, different DGPs generate different degrees of long memory. Since sample means of long memory processes with memory d converge at the rate T^(1/2−d), we consider local power to achieve comparable results across DGPs. Let z̃t be generated as in (1), with the ŷit and yt as described in Section 5.1, and let
\[
\bar z = (MT)^{-1}\sum_{i=1}^{M}\sum_{t=1}^{T}\tilde z_{i,t},
\]
where M denotes the number of Monte Carlo repetitions and z̃_{i,t} is the loss differential in repetition i. The loss differentials are then obtained via
\[
z_t = \tilde z_t - \bar z + c\,\frac{\mathrm{SD}(\tilde z_t)}{T^{1/2-d_z}}. \tag{12}
\]
The parameter c controls the distance from the null hypothesis (c = 0). Each realization of z̃t is thus centered with the average sample mean from M = 5000 simulations of the respective DGP. Similarly, dz is determined as the Monte Carlo average of the LPWN estimates for the respective setup.
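In code, the adjustment in Equation (12) is a one-liner. The sketch below assumes a hypothetical array z_tilde_all of shape (M, T) holding the simulated loss differentials and approximates SD(z̃t) by the pooled standard deviation; both the array name and this approximation are ours.

```python
import numpy as np

def adjust_loss_differentials(z_tilde_all, c, d_z):
    """Equation (12): center by the grand Monte Carlo mean and add a mean
    shift that shrinks at the rate T**(1/2 - d_z) at which the sample mean
    converges under long memory, keeping power local and comparable across DGPs."""
    M, T = z_tilde_all.shape
    z_bar = z_tilde_all.mean()                  # (MT)^{-1} double sum
    sd = z_tilde_all.std(ddof=1)                # pooled stand-in for SD(z_tilde_t)
    return z_tilde_all - z_bar + c * sd / T ** (0.5 - d_z)

# c = 0 gives the size experiments; c > 0 traces out the local power curve.
```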
In the power simulations, the memory parameters are set to d1 = d2 = d and dɛ1 = dɛ2 = d to keep the tables reasonably concise.

Table 6 presents the size results for the tMAC-statistic. The test tends to be liberal for small T, but generally controls the size well in larger samples. There are, however, two exceptions. First, the test remains liberal for DGP3 even as the sample size increases. This effect is particularly pronounced for d = 0 and if zt is based on the QLIKE loss function. Second, the test is conservative for DGP2, particularly for increasing values of d. With regard to the bandwidth parameters, we find that the size is slightly better controlled with qd = 0.65 and q = 0.6. However, the bandwidth choice seems to have limited effects on the size of the test, especially in larger samples.

Table 6. Size results of the tMAC-statistic for the DGPs described in Section 5.1 and (12) with Gaussian innovations

qd    T     DGP |  q = 0.5, MSE   | q = 0.5, QLIKE  |  q = 0.6, MSE   | q = 0.6, QLIKE
0.65  250   1   | 0.10 0.08 0.07  | 0.10 0.09 0.13  | 0.09 0.07 0.07  | 0.10 0.10 0.13
            2   | 0.01 0.02 0.03  | 0.02 0.04 0.05  | 0.01 0.03 0.03  | 0.02 0.04 0.04
            3   | 0.10 0.09 0.09  | 0.13 0.10 0.09  | 0.10 0.08 0.09  | 0.13 0.11 0.10
            4   | 0.10 0.08 0.09  | 0.10 0.09 0.10  | 0.09 0.09 0.09  | 0.10 0.08 0.10
            5   | 0.04 0.05 0.05  | 0.04 0.06 0.05  | 0.03 0.04 0.05  | 0.03 0.05 0.06
      2000  1   | 0.05 0.05 0.04  | 0.06 0.05 0.08  | 0.05 0.05 0.03  | 0.06 0.05 0.07
            2   | 0.02 0.02 0.01  | 0.02 0.05 0.02  | 0.01 0.02 0.01  | 0.02 0.04 0.02
            3   | 0.07 0.05 0.04  | 0.15 0.11 0.06  | 0.06 0.05 0.04  | 0.15 0.11 0.05
            4   | 0.06 0.05 0.04  | 0.07 0.06 0.06  | 0.05 0.05 0.04  | 0.06 0.05 0.05
            5   | 0.04 0.05 0.02  | 0.04 0.06 0.02  | 0.03 0.05 0.02  | 0.03 0.06 0.02
0.8   250   1   | 0.08 0.07 0.06  | 0.10 0.09 0.12  | 0.09 0.06 0.05  | 0.09 0.08 0.11
            2   | 0.02 0.02 0.02  | 0.02 0.03 0.04  | 0.02 0.02 0.02  | 0.02 0.04 0.04
            3   | 0.10 0.06 0.07  | 0.12 0.09 0.08  | 0.09 0.07 0.09  | 0.12 0.08 0.08
            4   | 0.09 0.07 0.08  | 0.10 0.08 0.10  | 0.08 0.06 0.07  | 0.10 0.08 0.10
            5   | 0.04 0.05 0.03  | 0.04 0.05 0.04  | 0.04 0.05 0.03  | 0.04 0.05 0.03
      2000  1   | 0.06 0.05 0.04  | 0.06 0.06 0.08  | 0.05 0.04 0.03  | 0.06 0.06 0.07
            2   | 0.02 0.01 0.00  | 0.02 0.04 0.02  | 0.02 0.01 0.00  | 0.02 0.04 0.02
            3   | 0.07 0.05 0.06  | 0.16 0.11 0.07  | 0.06 0.04 0.05  | 0.14 0.11 0.06
            4   | 0.06 0.04 0.06  | 0.06 0.05 0.07  | 0.05 0.04 0.04  | 0.06 0.05 0.06
            5   | 0.04 0.04 0.01  | 0.04 0.05 0.02  | 0.04 0.04 0.01  | 0.04 0.05 0.01

Note: Within each block, the three columns correspond to d = 0, 0.2, 0.4.

Size results for the tEFB-statistic are displayed in Table 7. To analyze the impact of the bandwidth b and the kernel choice, we set qd = 0.65.⁸ Size performance is more favorable with the MQS kernel than with the Bartlett kernel. Furthermore, it is positively affected by using a large value of b (0.6 or 0.9). Similar to the tMAC-statistic, we observe that the test is liberal with T = 250, but the overall performance is very satisfactory for T = 2000. Again, the test tends to be liberal for DGP3, especially for QLIKE. However, if the MQS kernel and a larger b are used, this effect disappears almost completely.
The conservative behavior of the test for DGP2 and large values of d is also the same as for the tMAC-statistic. The tMAC-statistic tends to perform better than the tEFB-statistic with the Bartlett kernel, but worse than the tEFB-statistic with the MQS kernel (cf. Tables 6 and 7).

Table 7. Size results of the tEFB-statistic for the DGPs described in Section 5.1 and (12) with Gaussian innovations and m = ⌊T^0.65⌋

d     T     DGP |    MSE, MQS     |  MSE, Bartlett  |   QLIKE, MQS    | QLIKE, Bartlett
0     250   1   | 0.08 0.06 0.06  | 0.10 0.10 0.08  | 0.07 0.06 0.06  | 0.10 0.10 0.09
            2   | 0.03 0.03 0.03  | 0.02 0.02 0.02  | 0.04 0.03 0.03  | 0.02 0.02 0.02
            3   | 0.12 0.10 0.09  | 0.16 0.14 0.14  | 0.11 0.09 0.08  | 0.15 0.14 0.14
            4   | 0.07 0.06 0.06  | 0.10 0.09 0.09  | 0.08 0.06 0.07  | 0.10 0.10 0.09
            5   | 0.05 0.04 0.04  | 0.04 0.04 0.04  | 0.04 0.04 0.05  | 0.04 0.04 0.04
      2000  1   | 0.06 0.05 0.05  | 0.07 0.05 0.06  | 0.06 0.04 0.05  | 0.07 0.06 0.06
            2   | 0.03 0.03 0.03  | 0.02 0.02 0.03  | 0.04 0.03 0.03  | 0.03 0.03 0.03
            3   | 0.07 0.06 0.06  | 0.10 0.10 0.09  | 0.08 0.06 0.07  | 0.10 0.09 0.10
            4   | 0.06 0.05 0.05  | 0.06 0.06 0.06  | 0.07 0.06 0.06  | 0.08 0.07 0.07
            5   | 0.04 0.04 0.04  | 0.05 0.05 0.04  | 0.05 0.04 0.04  | 0.05 0.05 0.05
0.2   250   1   | 0.06 0.05 0.05  | 0.08 0.08 0.07  | 0.08 0.06 0.06  | 0.09 0.08 0.08
            2   | 0.04 0.02 0.03  | 0.03 0.03 0.03  | 0.04 0.03 0.03  | 0.04 0.04 0.04
            3   | 0.09 0.07 0.08  | 0.11 0.10 0.10  | 0.09 0.07 0.07  | 0.11 0.10 0.10
            4   | 0.07 0.06 0.06  | 0.08 0.08 0.07  | 0.07 0.06 0.05  | 0.09 0.09 0.08
            5   | 0.05 0.03 0.04  | 0.04 0.05 0.05  | 0.05 0.05 0.04  | 0.05 0.05 0.06
      2000  1   | 0.05 0.04 0.04  | 0.06 0.05 0.05  | 0.05 0.04 0.04  | 0.07 0.05 0.05
            2   | 0.03 0.03 0.03  | 0.02 0.03 0.03  | 0.04 0.04 0.04  | 0.04 0.04 0.05
            3   | 0.07 0.05 0.06  | 0.08 0.07 0.08  | 0.07 0.06 0.06  | 0.07 0.07 0.08
            4   | 0.05 0.04 0.04  | 0.06 0.06 0.05  | 0.06 0.05 0.05  | 0.06 0.06 0.06
            5   | 0.06 0.04 0.05  | 0.05 0.05 0.06  | 0.05 0.05 0.05  | 0.07 0.07 0.06
0.4   250   1   | 0.05 0.05 0.05  | 0.07 0.07 0.06  | 0.10 0.09 0.08  | 0.14 0.13 0.11
            2   | 0.03 0.03 0.03  | 0.03 0.03 0.04  | 0.05 0.04 0.04  | 0.05 0.04 0.05
            3   | 0.08 0.07 0.07  | 0.09 0.09 0.08  | 0.06 0.06 0.05  | 0.09 0.08 0.08
            4   | 0.08 0.06 0.07  | 0.09 0.09 0.09  | 0.09 0.07 0.06  | 0.11 0.11 0.11
            5   | 0.04 0.03 0.03  | 0.05 0.05 0.05  | 0.04 0.04 0.04  | 0.05 0.05 0.05
      2000  1   | 0.04 0.04 0.04  | 0.04 0.04 0.04  | 0.08 0.05 0.06  | 0.08 0.08 0.07
            2   | 0.02 0.02 0.02  | 0.02 0.02 0.01  | 0.03 0.03 0.03  | 0.03 0.03 0.03
            3   | 0.04 0.04 0.04  | 0.04 0.04 0.04  | 0.04 0.04 0.04  | 0.04 0.04 0.04
            4   | 0.05 0.05 0.04  | 0.05 0.05 0.05  | 0.05 0.05 0.05  | 0.06 0.06 0.06
            5   | 0.03 0.03 0.03  | 0.02 0.02 0.02  | 0.03 0.03 0.03  | 0.03 0.03 0.03

Note: Within each block, the three columns correspond to b = 0.3, 0.6, 0.9.
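Finally, rejection frequencies such as those in Tables 3–7 are obtained by tallying t-statistics across Monte Carlo repetitions. The following schematic harness uses the textbook Bartlett-kernel HAC estimator and standard normal critical values, that is, the tHAC variant that the tables show to be size-distorted under long memory; the memory-robust tMAC and tEFB entries are produced analogously, with the long-run variance estimator and critical values replaced by their robust counterparts described in this article. The rule-of-thumb bandwidth and the simulate_z callback are illustrative.

```python
import numpy as np

def hac_lrv(z, bw):
    """Bartlett-kernel HAC estimate of the long-run variance of z_t."""
    z = z - z.mean()
    T = len(z)
    lrv = z @ z / T                               # gamma_0
    for k in range(1, bw + 1):
        lrv += 2.0 * (1.0 - k / (bw + 1.0)) * (z[k:] @ z[:-k]) / T
    return lrv

def dm_t(z):
    """Diebold-Mariano t-statistic for H0: E(z_t) = 0 with HAC variance."""
    T = len(z)
    bw = int(4 * (T / 100) ** (2 / 9))            # rule-of-thumb bandwidth
    return np.sqrt(T) * z.mean() / np.sqrt(hac_lrv(z, bw))

def rejection_rate(simulate_z, M=5000, crit=1.96):
    """Share of repetitions with |t| above the 5% standard normal value."""
    return np.mean([abs(dm_t(simulate_z())) > crit for _ in range(M)])
```

Feeding this harness a long memory loss differential, such as the z series from the sketches above, reproduces the characteristic over-rejection of the tHAC columns, since the HAC estimator is not consistent for the long-run variance when zt has memory d > 0.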