Journal of Financial Econometrics, Volume Advance Article – Apr 16, 2018

42 pages


- Publisher
- Oxford University Press
- Copyright
- © The Author(s), 2018. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com
- ISSN
- 1479-8409
- eISSN
- 1479-8417
- D.O.I.
- 10.1093/jjfinec/nby008

Abstract

Existing studies find conflicting estimates of the risk–return relation. We show that the trade-off parameter is inconsistently estimated when observed or estimated conditional variances measure risk. The inconsistency arises from misspecified, unbalanced, and endogenous return regressions. These problems are eliminated if risk is captured by the variance premium (VP) instead; it is unobservable, however. We propose a 2SLS estimator that produces consistent estimates without observing the VP. Using this method, we find a positive risk–return trade-off and long-run return predictability. Our approach outperforms commonly used risk–return estimation methods, and reveals a significant link between the VP and economic uncertainty.

The risk–return trade-off is a central concept in modern finance theory.¹ Yet, there is, at best, only weak empirical evidence for the positive risk premium implied by the risk–return trade-off.² Often either the objective conditional variance of the stock market or the option-implied volatility is viewed as a measure for time-varying economic uncertainty or the aggregate risk level (see, e.g., Bloom, 2009 and Bali and Peng, 2006). If risk or uncertainty were indeed captured by volatility, the mainstream risk–return trade-off theory would suggest that (at fixed levels of risk aversion) a higher level of variance corresponds to higher expected excess returns. Alternatively, market variance and in particular the Chicago Board Options Exchange (CBOE) volatility index, VIX, is commonly referred to as the “investor fear gauge” (Whaley, 2000), and may thus be viewed as a popular indicator of aggregate risk aversion (Bekaert, Hoerova, and Duca, 2013). In this case, the risk–return trade-off theory would again suggest a positive relation between variance and expected excess returns at fixed levels of risk.
Recently, conditional variances formed under the objective or the risk-neutral expectations measure have fallen on hard times, in the sense that empirical studies strongly challenge their role as variables that help gauge the risk–return trade-off. Given the lack of consensus, new theories have been brought forward suggesting that it is rather the variance premium (VP) that is positively related to the market risk premium. Structural models of Drechsler and Yaron (2011) and Bollerslev, Tauchen, and Zhou (2009) and Bollerslev, Tauchen, and Sizova (2012) show that VP is linked to economic uncertainty and that the latter commands a nonnegligible equity risk premium. Conversely, the model of Bekaert and Engstrom (2017) shows that VP is an indicator of aggregate risk aversion and hence is positively related to the equity premium. Financial market data strongly support the suggested relation.3 In this study, we contribute to the literature in three respects. Motivated by recent theories and empirical facts, we set up a reduced-form data generating process (DGP) in which the VP is the variable that truly drives conditionally expected future excess returns. We show that traditional empirical analyses where VIX or estimated measures of conditionally expected market variance are used to evaluate the risk–return trade-off result in a misspecified, unbalanced, and endogenous regression. The new result that we provide in this article demonstrates that even in this very “unfavorable” regression specification, the researcher can still use standard techniques of statistical inference to test for the significance of a risk–return trade-off under certain conditions. However, the ordinary least squares (OLS) estimator of the trade-off parameter is inconsistent, meaning that observed market variance measures cannot be used to gauge the magnitude of the risk–return trade-off. 
We view these results as a possible explanation for the fact that existing empirical studies of the trade-off have largely produced relationships of either sign and magnitude. An obvious solution to avoid the inconsistency above would be to rely on VP as a predictor instead of observed variance measures and to estimate the trade-off by OLS. Yet VP, that is the difference between the risk-neutral and the physical expectation of future quadratic variation, is latent since the latter term cannot be observed. This is the crux of the literature on VP and risk–return modeling.4 A consensus on modeling the objective expectation of the market’s quadratic variation is largely absent from the literature. The model uncertainty, as well as the estimation error in the resulting estimate for the unobserved VP, will directly affect the estimation of the risk–return trade-off parameter, likely biasing the results. To avoid these consequences, our second contribution is to suggest a two-stage least squares (2SLS) estimator that consistently estimates the relation between VP and the equity premium, without observing VP itself. To the best of our knowledge, we are the first to show that the risk–return trade-off parameter can be estimated without the necessity of observing, measuring, or estimating risk itself. The proposed 2SLS estimation approach allows for standard statistical inference on the parameters. We further develop methods to establish the validity and the relevance of the instruments. Our third contribution is empirical. Using data on the S&P 500, we demonstrate that there is ample empirical support for the assumed DGP. We find evidence for a positive significant risk–return trade-off relation. To that end, we identify two valid and relevant instruments that are closely related to the ex-post variance risk premium of Bollerslev, Tauchen, and Zhou (2009) and the jump component of the stock price process. The uncovered risk–return trade-off is of sizable magnitude. 
We find that for a unit increase in risk, investors demand a 2% annual increase in the equity premium. Further, there is significantly positive excess return predictability from VP at different horizons, from 1 day to half a year. Even though VP remains latent throughout our study, we confirm that return predictability in VP is maximized at a 4-month investment horizon, which is in line with the studies that estimate VP (e.g., Bollerslev, Tauchen, and Zhou, 2009 and Bollerslev et al., 2014). We show that our estimation technique leads to stronger predictability of excess returns over these horizons compared with models that estimate VP, both in sample and out of sample. We argue that the main reason for this improvement is that the 2SLS approach avoids the estimation error in the estimate for VP that traditional OLS approaches produce. Finally, we inspect the degree of correlation between the latent VP measure uncovered here and popular indicators for economic uncertainty and risk aversion. Our empirical results tend to favor models that relate VP to either the volatility or the jump component of economic uncertainty. In contrast, the correlation results between VP and risk aversion are weak at best.

1 DGP and Initial Data Statistics

We propose a simple framework for the DGP of excess returns and risk.⁵

$$VP_t = \varphi(L)\varepsilon_t, \qquad \sum_{i=0}^{\infty} i\,|\varphi_i| < \infty, \qquad \varphi(1) \neq 0 \tag{1}$$

$$VIX_t^2 \equiv VP_t + E_t^P(QV_{t,t+\tau}) \tag{2}$$

$$r_{t+1}^{(e)} = \alpha + \beta\, VP_t + \xi_{t+1} \tag{3}$$

$$E_t^P(QV_{t,t+\tau}) = (1-L)^{-d}\eta_t, \tag{4}$$

where $t = 1, 2, \ldots, T$ and $0 \le d < 1/2$. The conditional expectation of the market’s quadratic variation taken under the objective probability measure, $E_t^P(QV_{t,t+\tau})$, where the horizon $\tau$ equals 30 days, is latent, whereas its counterpart under the equivalent martingale measure, $E_t^Q(QV_{t,t+\tau}) = VIX_t^2$, is observable for the U.S. market. The vector consisting of the noise processes $\varepsilon_t$, $\xi_t$, $\eta_t$, and the additional shocks $\mu_t$ and $\upsilon_{k,t}$, $k = 1, 2, \ldots, K$, is independently and identically distributed (i.i.d.) with mean zero and a diagonal variance matrix with elements $\sigma_\varepsilon^2$, $\sigma_\xi^2$, $\sigma_\eta^2$, $\sigma_\mu^2$, and $\sigma_{\upsilon_k}^2$.
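To make the DGP (1)–(4) concrete, the following is a minimal simulation sketch, not the authors' code. It rests on several assumptions that are not in the text: $\varphi(L)$ is taken to be an AR(1) filter with a hypothetical coefficient `phi`, all shocks are standard normal, the fractional filter $(1-L)^{-d}$ is truncated after `n_lags` terms, and the function names `frac_noise` and `simulate_dgp` are illustrative. As in (1)–(4), every series is mean zero, so the simulated "variances" should be read as demeaned quantities.

```python
import numpy as np

def frac_noise(eta, d, n_lags=1000):
    """Truncated MA(infinity) form of (1 - L)^(-d) eta_t, as in equation (4)."""
    # MA weights: psi_0 = 1, psi_j = psi_{j-1} * (j - 1 + d) / j
    j = np.arange(1, n_lags + 1)
    psi = np.concatenate(([1.0], np.cumprod((j - 1 + d) / j)))
    # Full convolution, keeping the first len(eta) terms (pre-sample shocks set to zero).
    return np.convolve(eta, psi)[: len(eta)]

def simulate_dgp(T, alpha=0.0, beta=2.0, d=0.3, phi=0.5, seed=0):
    """Simulate (1)-(4); the AR(1) choice for phi(L) is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    eps, xi, eta = rng.standard_normal((3, T))
    vp = np.empty(T)
    vp[0] = eps[0]
    for t in range(1, T):
        vp[t] = phi * vp[t - 1] + eps[t]   # equation (1), AR(1) special case
    ep = frac_noise(eta, d)                # equation (4): latent physical expectation
    vix2 = vp + ep                         # equation (2): observable VIX^2
    r_next = alpha + beta * vp + xi        # equation (3): r_next[t] plays the role of r_{t+1}
    return vp, ep, vix2, r_next

vp, ep, vix2, r_next = simulate_dgp(5000)
```

In such a simulation `vix2` inherits the long memory of `ep`, while `r_next` depends only on the short-memory `vp`, which is the source of the unbalancedness discussed in Section 2.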
The variance of $VP_t$ is $\sigma_{VP}^2 = \sigma_\varepsilon^2 \sum_{i=0}^{\infty} \varphi_i^2$. The variance of $E_t^P(QV_{t,t+\tau})$ is $\sigma_P^2 = \sigma_\eta^2\, \Gamma(1-2d) / (\Gamma(1-d))^2$. The DGP (1)–(4) incorporates many stylized empirical facts as well as theoretical results. Expected excess returns are time-varying and are positively related to risk if $\beta > 0$, which is in line with the empirical findings as well as new theoretical underpinnings (see references in the introduction). $r_t^{(e)}$ is generated as a stationary I(0) process that exhibits some short-memory dynamics, but the impact of shocks decays quickly, which is consistent with the empirical regularities of observed excess returns. In contrast, the conditional variance series $VIX_t^2$ and $E_t^P(QV_{t,t+\tau})$ are strongly persistent, an often observed property of financial data. It is well documented in the literature that fractionally integrated models with $d \in (0, 1/2)$ closely fit the dynamics of observed and model-implied conditional variances (see, e.g., Bollerslev et al., 2013 and references therein). Finally, the difference between our two variance series, $VP_t$, is I(0). It possesses less memory than its two components, suggesting that the two variance series fractionally cointegrate, as found by, for example, Christensen and Nielsen (2006) and Bandi and Perron (2006).

Our data support the proposed DGP. We consider daily data for the S&P 500 stock market index from February 3, 2000 until June 30, 2014, resulting in $T = 3622$ observations. We obtain the CBOE volatility index, $VIX_{CBOE,t}$, from the Wharton Research Data Services (WRDS) database. We transform the data series into maturity-scaled variance units by $VIX_t^2 = \frac{30}{365} VIX_{CBOE,t}^2$. We further obtain two variance measures that contain information about the unobserved quadratic variation, $QV_{t,t+\tau}$: (i) the realized variance, $RV_{RL,t}$, computed from five-minute returns within a day, subsampled at a one-minute frequency. Under certain regularity conditions, $RV_{RL,t}$ converges to the daily quadratic variation of returns, as shown by Andersen et al.
(2001), Barndorff-Nielsen and Shephard (2002), and Meddahi (2002). (ii) the bipower variation, $BV_{RL,t}$, of Barndorff-Nielsen and Shephard (2004), which converges to the daily integrated variance of returns. The latter equals the quadratic variation minus daily jumps.⁶

Whereas $VIX_t^2$ is related to the return variation over the next month, the raw series $RV_{RL,t}$ and $BV_{RL,t}$ measure daily variation. To align the three measures, we modify the latter two as follows:

$$RV_t = \sum_{i=1}^{22} \left( RV_{RL,t+i} \times 100^2 + \left\{ \left[ \ln \frac{P_{t+i+1}^{(open)}}{P_{t+i}^{(close)}} \right] \times 100 \right\}^2 \right) \tag{5}$$

$$BV_t = \sum_{i=1}^{22} \left( BV_{RL,t+i} \times 100^2 + \left\{ \left[ \ln \frac{P_{t+i+1}^{(open)}}{P_{t+i}^{(close)}} \right] \times 100 \right\}^2 \right). \tag{6}$$

Finally, we measure $r_{t+1}^{(e)}$ as daily annualized continuously compounded excess returns

$$r_{t+1}^{(e)} = 100 \times \ln \left[ \left( \frac{P_{t+1}^{(close)}}{P_t^{(close)}} \right)^{252} \right] - r_t^{(f)}. \tag{7}$$

We obtain the daily 3-month T-Bill rate from the FRED database and convert it into annualized continuously compounded rates $r_t^{(f)}$.

Our DGP (1)–(4) implies that $r_t^{(e)}$ is a short-memory I(0) process, whereas $VIX_t^2$ is a long-memory I($d$) process. Table 1 shows summary statistics for our data. For returns we find that the autocorrelation estimates are very close to zero, suggesting that there is very little persistence in the series. Conversely, for $VIX_t^2$ we find that the first three autocorrelation estimates are very close to 1. Even after 22 trading days, the serial correlation is still very strong; roughly 75% of a shock’s impact remains. If physical expectations are taken under the rational information set, it follows that the temporal dependence of the realized $QV_{t,t+\tau}$ proxies in (5)–(6) gives us an indication of the unknown dynamics of $E_t^P(QV_{t,t+\tau})$. Autocorrelation estimates for $RV_t$ and $BV_t$ in Table 1 are very similar to the estimates for $VIX_t^2$, suggesting that the variance series share similar strongly persistent dynamics.

Table 1. Summary statistics

| Series | Average | Std. Dev. | Autocorr. lag 1 | Autocorr. lag 2 | Autocorr. lag 3 | Autocorr. lag 22 |
|---|---|---|---|---|---|---|
| $r_t^{(e)}$ | 0.2447 | 20.4558 | −0.0770 | −0.0547 | 0.0233 | 0.0328 |
| $RV_t$ | 25.1492 | 40.4750 | 0.9972 | 0.9920 | 0.9849 | 0.6989 |
| $BV_t$ | 22.7706 | 37.8044 | 0.9971 | 0.9918 | 0.9844 | 0.6917 |
| $VIX_t^2$ | 43.7903 | 47.3600 | 0.9697 | 0.9481 | 0.9337 | 0.7473 |
| $q_{1,t}$ | 0.7000 | 1.5949 | 0.9742 | 0.9505 | 0.9194 | 0.3404 |
| $q_{2,t}$ | 16.6596 | 20.4266 | 0.8378 | 0.7153 | 0.6320 | 0.0888 |

The table reports summary statistics of the three variance series, excess returns on the S&P 500, and the instruments (from February 3, 2000 to May 28, 2014). All variance series are in squared percentage form and scaled by maturity. The statistics for excess returns are annualized percentages. $q_{1,t}$ denotes the jump instrument and $q_{2,t}$ is the variance risk premium instrument.

We estimate the respective fractional integration order, $d_i$, $i \in \{RV, BV, VIX, r\}$, of the four series, $RV_t$, $BV_t$, $VIX_t^2$, and $r_t^{(e)}$, jointly for efficiency. It is common to rely on semiparametric techniques for the estimation of $d_i$. We apply a multivariate version of Shimotsu and Phillips’ (2005) exact local Whittle (EW) estimator derived by Nielsen and Shimotsu (2007).⁷ Table 2 summarizes our results. The realized variance and the bipower variation are integrated of the order I(0.32). At a 5% significance level, we reject that $d_i = 0$ and $d_i = 1$ for both series, yet we fail to reject that $d_i = 0.5$. The point estimate for the memory of the variance index, $VIX_t^2$, is somewhat higher, $\hat d_{VIX} = 0.40$. According to the t-test of Nielsen and Shimotsu (2007) for the equality of $d_i$, however, we cannot reject that the three variance series are integrated of the same order. Excess returns are integrated of the approximate order zero, and we fail to reject $d_i = 0$, but reject $d_i = 0.5$ and $d_i = 1$. Our data thus lend support to the proposed DGP.⁸

Table 2. Long-memory estimates

Estimates for $d$

| | $RV_t$ | $BV_t$ | $VIX_t^2$ | $r_t^{(e)}$ |
|---|---|---|---|---|
| $\hat d$ | 0.3234 | 0.3170 | 0.3961 | 0.1107 |
| $t_{d=0}$ | 2.5872 | 2.5359 | 3.1684 | 0.8854 |
| $t_{d=0.5}$ | −1.4128 | −1.4641 | −0.8316 | −3.1146 |
| $t_{d=1}$ | −5.4128 | −5.4641 | −4.8316 | −7.1146 |

$t_{d_i=d_j}$ statistics with $h(T) = 0.0721$

| | $RV_t$ | $BV_t$ | $VIX_t^2$ | $r_t^{(e)}$ |
|---|---|---|---|---|
| $RV_t$ | – | 0.3134 | −0.9900 | 2.1295 |
| $BV_t$ | | – | −1.0559 | 2.0591 |
| $VIX_t^2$ | | | – | 2.3276 |
| $r_t^{(e)}$ | | | | – |

The upper panel of the table reports estimates of $d$ using the multivariate EW estimator of Nielsen and Shimotsu (2007) for $Y_t = [RV_t, BV_t, VIX_t^2, r_t^{(e)}]'$. The size of the spectral window is set to $m = T^{0.35}$; the choice is based on a graphical analysis of the slope of the log periodograms as suggested by Beran (1994). $t_{d=0}$, $t_{d=0.5}$, and $t_{d=1}$ denote the respective t-statistics of element $i$ of $Y_t$ given by $2\sqrt{m}(\hat d_i - d)$. The lower panel of the table summarizes the t-statistics corresponding to the null hypothesis $d_i = d_j$ for $i \neq j$. Nielsen and Shimotsu (2007) define the t-statistic as

$$t_{d_i=d_j} = \frac{\sqrt{m}\,(\hat d_i - \hat d_j)}{\sqrt{\frac{1}{2}\left(1 - \frac{\hat\iota_{i,j}^2}{\hat\iota_{i,i}\,\hat\iota_{j,j}}\right)} + h(T)},$$

where $\hat\iota_{i,j} = \frac{1}{m}\sum_{l=1}^{m} \mathrm{real}\{I(\lambda_l)\}$ and $I(\lambda_l)$ is the periodogram of a $(4 \times 1)$ vector with elements $\Delta^{\hat d_i} Y_{t,i}$ at frequency $\lambda_l$. $h(T)$ is a tuning parameter, which we set equal to $(\ln(T))^{-1.3}$. The resulting statistic $t_{d_i=d_j}$ should be compared with the critical values from a standard normal distribution.

2 Estimating the Risk–Return Trade-Off by OLS

The correct specification to estimate the risk–return trade-off would be to regress $r_{t+1}^{(e)}$ on $VP_t$, given our DGP (1)–(4). Yet $VP_t$ is not observed by the researcher, whereas $VIX_t^2$ is. The researcher may be inclined to evaluate the following regression

$$r_{t+1}^{(e)} = a + b\, VIX_t^2 + e_{t+1}, \tag{8}$$

which is unbalanced since the integration orders of the regressor and the regressand differ (Banerjee et al., 1993).
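The unbalancedness of (8) rests on the memory estimates of Table 2, and the upper-panel t-statistics there can be checked by simple arithmetic. The sketch below is only a plausibility check: it assumes a scaling factor $2\sqrt{m} = 8$ (that is, $m = 16$), which is the value implied by the reported numbers (for instance $2.5872 / 0.3234$) and is not stated in the text. The names `d_hat`, `two_sqrt_m`, and `t_stat` are illustrative.

```python
import math

# Point estimates from the upper panel of Table 2.
d_hat = {"RV": 0.3234, "BV": 0.3170, "VIX2": 0.3961, "r": 0.1107}

# Assumption: 2*sqrt(m) = 8, inferred from the reported t-statistics.
two_sqrt_m = 2.0 * math.sqrt(16)

def t_stat(d_est, d_null, scale=two_sqrt_m):
    """t-statistic 2*sqrt(m)*(d_hat - d) for the null hypothesis d = d_null."""
    return scale * (d_est - d_null)

# t-statistics for the nulls d = 0, d = 0.5, and d = 1, per series.
table = {name: [round(t_stat(d, d0), 4) for d0 in (0.0, 0.5, 1.0)]
         for name, d in d_hat.items()}

# Two-sided 5% test against the standard normal critical value 1.96.
reject = {name: [abs(t) > 1.96 for t in ts] for name, ts in table.items()}

for name in d_hat:
    print(name, table[name], reject[name])
```

Under this assumption the realized variance row is reproduced, and the other rows agree with Table 2 up to rounding of the reported $\hat d$. The rejection pattern matches the text: for the variance series $d = 0$ and $d = 1$ are rejected but $d = 0.5$ is not, and for returns only $d = 0$ is not rejected.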
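Regression (8) is further misspecified, since the predictor is imperfect. Similar to Pastor and Stambaugh (2009) and Binsbergen and Koijen (2010), we assume that the observed variable $VIX_t^2$ contains relevant information about the expected return, but it is imperfectly correlated with the latter. Finally, the econometrician’s model (8) is endogenous. The regression residuals of (8) are composed of two elements, that is, $e_{t+1} = -\beta E_t^P(QV_{t,t+\tau}) + \xi_{t+1}$. Thus, $e_{t+1}$ will be naturally correlated with the observed regressor $VIX_t^2$.⁹

The results from the empirical literature on the risk–return trade-off using the observable measure $VIX_t^2$ are largely inconclusive to date (see references in the introduction). Analyses in the field typically evaluate a predictive regression such as (8) by OLS. Theorem 1 aids our understanding of the likely causes of finding a risk–return trade-off of either sign and magnitude. Define two matrices by

$$X_{(T-1)\times 2} \equiv \begin{pmatrix} 1 & 1 & \cdots & 1 \\ VIX_1^2 & VIX_2^2 & \cdots & VIX_{T-1}^2 \end{pmatrix}', \qquad y_{(T-1)\times 1} \equiv \begin{pmatrix} r_2^{(e)} & r_3^{(e)} & \cdots & r_T^{(e)} \end{pmatrix}'. \tag{9}$$

Theorem 1. Let $VP_t$, $VIX_t^2$, $r_t^{(e)}$, and $E_t^P(QV_{t,t+\tau})$ be generated by (1)–(4). Estimate Regression (8) by OLS, resulting in

$$\hat{\mathbf{b}}_{OLS} \equiv (\hat a, \hat b)' = (X'X)^{-1}(X'y). \tag{10}$$

1. If $\beta = 0$,

$$\hat a \xrightarrow{P} \alpha, \qquad T^{1/2}\hat b \xrightarrow{D} N\!\left(0, \frac{\sigma_\xi^2}{\sigma_{VP}^2 + \sigma_P^2}\right), \qquad T^{-1/2} t_a \xrightarrow{P} \frac{\alpha}{\sigma_\xi}, \qquad t_b \xrightarrow{D} N(0,1).$$

In addition, $s^2 \xrightarrow{P} \sigma_\xi^2$, where $s^2 = T^{-1}\sum_{t=2}^{T} \hat e_t^2$. $t_a = \hat a / \sqrt{\mathrm{Var}(\hat a)}$ and $t_b = \hat b / \sqrt{\mathrm{Var}(\hat b)}$ denote the t-statistics, where $\mathrm{Var}(\hat{\mathbf{b}}_{OLS}) = s^2 (X'X)^{-1}$.

2. If $\beta \neq 0$,

$$\hat a \xrightarrow{P} \alpha, \qquad \hat b \xrightarrow{P} \frac{\beta \sigma_{VP}^2}{\sigma_{VP}^2 + \sigma_P^2},$$

$$T^{-1/2} t_a \xrightarrow{P} \frac{\alpha}{\left(\sigma_\xi^2 + \beta^2 \frac{\sigma_{VP}^2 \sigma_P^2}{\sigma_{VP}^2 + \sigma_P^2}\right)^{1/2}}, \qquad T^{-1/2} t_b \xrightarrow{P} \frac{\beta \sigma_{VP}^2}{\left(\beta^2 \sigma_{VP}^2 \sigma_P^2 + \sigma_\xi^2 (\sigma_{VP}^2 + \sigma_P^2)\right)^{1/2}},$$

where $s^2 \xrightarrow{P} \sigma_\xi^2 + \beta^2 \frac{\sigma_{VP}^2 \sigma_P^2}{\sigma_{VP}^2 + \sigma_P^2}$.

A proof of Theorem 1 can be found in Appendix B. Small sample simulations supporting the results in this article can be found in the Online Appendix. The first part of Theorem 1 summarizes the case where there is no risk–return trade-off, $\beta = 0$. In this situation, the OLS slope estimate $\hat b$ correctly converges to zero and to a normal distribution at the usual rate $T^{-1/2}$.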
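The attenuation in the second part of Theorem 1 can be illustrated with a small Monte Carlo sketch. This is not the simulation design of the Online Appendix; it assumes i.i.d. standard normal $VP_t$ (a special case of (1)), unit shock variances, hypothetical values for $\alpha$, $\beta$, and $d$, and a truncated fractional filter. The helper `frac_noise` and all variable names are illustrative.

```python
import numpy as np

def frac_noise(eta, d, n_lags=1000):
    """Truncated MA(infinity) form of (1 - L)^(-d) eta_t, as in equation (4)."""
    j = np.arange(1, n_lags + 1)
    psi = np.concatenate(([1.0], np.cumprod((j - 1 + d) / j)))
    return np.convolve(eta, psi)[: len(eta)]

rng = np.random.default_rng(1)
T, alpha, beta, d = 50_000, 0.5, 2.0, 0.3   # hypothetical parameter values

vp = rng.standard_normal(T)                  # latent VP_t (i.i.d. special case)
ep = frac_noise(rng.standard_normal(T), d)   # latent E_t^P(QV), long memory
xi = rng.standard_normal(T)                  # return innovation
vix2 = vp + ep                               # observable regressor, equation (2)
r_next = alpha + beta * vp + xi              # equation (3)

# OLS of regression (8): returns on a constant and VIX^2.
X = np.column_stack([np.ones(T), vix2])
a_hat, b_hat = np.linalg.lstsq(X, r_next, rcond=None)[0]

# Limit from Theorem 1, evaluated at the sample variances of the latent components.
b_limit = beta * vp.var() / (vp.var() + ep.var())

print(f"beta = {beta}, OLS slope = {b_hat:.3f}, Theorem 1 limit = {b_limit:.3f}")
```

In this setup the OLS slope settles near $\beta\sigma_{VP}^2/(\sigma_{VP}^2+\sigma_P^2)$ rather than near $\beta$, so the estimate understates the trade-off even in a large sample.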
More importantly, under the premise that there is a risk premium in the market, the second part of Theorem 1 shows that OLS produces an inconsistent estimate for $\beta$. In finite sample simulations, the estimate $\hat b$ is of either sign and value, which is in line with the findings in empirical studies that use $VIX_t^2$ as a predictor. Asymptotically, the OLS slope estimate $\hat b$ is biased towards zero, implying that in large samples the researcher would underestimate the implied predictive power from $VP_t$ on $r_{t+1}^{(e)}$. Given the unbalancedness and endogeneity issues in (8), it may not be too surprising to the reader that the OLS estimator for $\beta$ is inconsistent. The asymptotic bias of the estimator towards zero is in line with, for example, Maynard and Phillips (2001).

What is truly new and largely different from the extant literature on return predictions with persistent regressors is the finding that standard statistical inference can be carried out. In particular, Theorem 1 shows that the t-statistic associated with $\hat b$ converges asymptotically to a standard normal limiting distribution that is free of nuisance parameters under the null hypothesis that $\beta = 0$. In small sample simulations, we find that the size of a simple t-test on the parameter $\beta$ is always very close to the nominal size of 5%. Under the alternative hypothesis, the t-statistic $t_b$ diverges asymptotically at rate $T^{1/2}$. Simulations suggest that a t-test generally has very good power in finite samples. The implication of these results is that one can draw valid statistical inference on the significance of the risk–return trade-off parameter $\beta$, even in the unbalanced, misspecified, and endogenous regression framework considered here.

The empirical evidence in our data indeed lends support to the unbalancedness of Regression (8). The t-tests for $H_0: d_i = d_j$ in Table 2 indicate that we reject the hypothesis that variance series and returns are integrated of the same order.
Nevertheless, we demonstrate the results of estimating (8) by OLS with our data set; see Table 3.¹⁰ The estimated risk–return trade-off parameter of 0.27 is statistically different from zero. Since we know from Theorem 1 that valid inference can be carried out, we conclude that the latent variance premium $VP_t$ significantly predicts returns. The estimated coefficient is rather small, however, and we deduce from Theorem 1 that the estimate is inconsistent and asymptotically biased towards zero. The researcher could be tempted to make the erroneous conclusion that an increase in yesterday’s perceived risk by one standard deviation leads to an increase of roughly 12.7% in today’s annualized excess return expectations.

Table 3. OLS and 2SLS estimation results

| | OLS regression of (8) | 2SLS regression of (8) |
|---|---|---|
| $\hat a$ | 0.3544 | 0.3544 |
| SE($\hat a$) | 5.3775 | 5.5338 |
| HAC-SE($\hat a$) | 6.6770 | 19.2892 |
| $\hat b$ | 0.2678 | 1.9292 |
| SE($\hat b$) | 0.1137 | 0.2883 |
| HAC-SE($\hat b$) | 0.0960 | 0.8894 |
| J-statistic | | 1.8536 |
| p-value(J) | | 0.4380 |
| Std. Coeff. | 12.6612 | 91.2129 |

The table summarizes the estimation results when the misspecified, unbalanced, and endogenous regression (8) is evaluated by OLS and 2SLS. SE denotes the usual standard error of the estimates that is not robust to heteroskedasticity or serial correlation. HAC-SE reports standard errors based on HAC covariance estimation using a Bartlett kernel that are robust to serial correlation and heteroskedasticity. J is Sargan’s statistic from Theorem 4. The corresponding p-value is obtained from 200,000 simulations of independent $\chi^2(1)$ variables multiplied by the eigenvalue estimate 5.513. The last row reports a Standardized Coefficient; it measures the expected percentage change in tomorrow’s annualized excess return, given a one standard deviation increase in perceived risk today.

2.1 Alternative Risk–Return Regression

Besides relying on the implied variance, measures for the objective conditional variance are a popular alternative predictor in risk–return regressions. Since $E_t^P(QV_{t,t+\tau})$ is unobserved, estimates must be used. It is common in the literature to first find an estimate for the quadratic variation $QV_{t,t+\tau}$ (or the integrated variance); realized measures such as $RV_t$ and $BV_t$ are particularly popular.
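Then a model for the dynamics of $QV_{t,t+\tau}$ is specified and subsequently estimated, producing estimates for the latent $E_t^P(QV_{t,t+\tau})$. For the latter, we assume

$$\widehat{E_t^P(QV_{t,t+\tau})} = E_t^P(QV_{t,t+\tau}) + u_t \tag{11}$$

$$u_t = \psi(L)\mu_t, \qquad \sum_{i=0}^{\infty} i\,|\psi_i| < \infty, \qquad \psi(1) \neq 0. \tag{12}$$

The estimation error $u_t$ is assumed to be I(0), and its variance is $\sigma_u^2 = \sigma_\mu^2 \sum_{i=0}^{\infty} \psi_i^2$. The researcher may be tempted to evaluate the following misspecified, unbalanced, and endogenous predictive regression

$$r_{t+1}^{(e)} = a^* + b^*\, \widehat{E_t^P(QV_{t,t+\tau})} + e_{t+1}^*. \tag{13}$$

The endogenous regression residuals of (13) are $e_{t+1}^* = \beta VP_t - \beta E_t^P(QV_{t,t+\tau}) - \beta u_t + \xi_{t+1}$. Define the matrix

$$X^*_{(T-1)\times 2} \equiv \begin{pmatrix} 1 & 1 & \cdots & 1 \\ \widehat{E_1^P(QV_{1,1+\tau})} & \widehat{E_2^P(QV_{2,2+\tau})} & \cdots & \widehat{E_{T-1}^P(QV_{T-1,T-1+\tau})} \end{pmatrix}'. \tag{14}$$

Theorem 2 provides an explanation for the inconclusive outcome of risk–return OLS regressions of the form (13). A proof of Theorem 2 can be found in Appendix B.

Theorem 2. Let $VP_t$, $VIX_t^2$, $r_t^{(e)}$, and $E_t^P(QV_{t,t+\tau})$ be generated by (1)–(4). In addition, let $\widehat{E_t^P(QV_{t,t+\tau})}$ and $u_t$ be given by (11)–(12). Estimate Regression (13) with $\widehat{E_t^P(QV_{t,t+\tau})}$ as a predictor by OLS, resulting in

$$\hat{\mathbf{b}}^*_{OLS} \equiv (\hat a^*, \hat b^*)' = (X^{*\prime} X^*)^{-1}(X^{*\prime} y). \tag{15}$$

1. If $\beta = 0$,

$$\hat a^* \xrightarrow{P} \alpha, \qquad T^{1/2}\hat b^* \xrightarrow{D} N\!\left(0, \frac{\sigma_\xi^2}{\sigma_P^2 + \sigma_u^2}\right), \qquad T^{-1/2} t_{a^*} \xrightarrow{P} \frac{\alpha}{\sigma_\xi}, \qquad t_{b^*} \xrightarrow{D} N(0,1).$$

In addition, $s^{*2} \xrightarrow{P} \sigma_\xi^2$, where $s^{*2} = T^{-1}\sum_{t=2}^{T} \hat e_t^{*2}$. $t_{a^*} = \hat a^* / \sqrt{\mathrm{Var}(\hat a^*)}$ and $t_{b^*} = \hat b^* / \sqrt{\mathrm{Var}(\hat b^*)}$ denote the t-statistics, where $\mathrm{Var}(\hat{\mathbf{b}}^*_{OLS}) = s^{*2}(X^{*\prime} X^*)^{-1}$.

2. If $\beta \neq 0$,

$$\hat a^* \xrightarrow{P} \alpha, \qquad \hat b^* \xrightarrow{P} 0, \qquad T^{-1/2} t_{a^*} \xrightarrow{P} \frac{\alpha}{(\sigma_\xi^2 + \beta^2 \sigma_{VP}^2)^{1/2}}, \qquad t_{b^*} \xrightarrow{D} N(0,1),$$

where $s^{*2} \xrightarrow{P} \sigma_\xi^2 + \beta^2 \sigma_{VP}^2$.

The findings in the first part of Theorem 2, when $\beta = 0$, are very comparable to the conclusions drawn from Theorem 1. However, under the alternative hypothesis of return predictability, the asymptotic properties of Regression (13) are worse than those of (8). OLS produces an inconsistent estimate for $\beta$ that converges to zero, implying that in large samples the researcher would estimate a zero trade-off between risk and return. Similarly, the t-statistic for $\hat b^*$ is standard normal, centering around zero in the limit. The risk–return trade-off can thus neither be estimated nor tested within regression framework (13).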
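The contrast between the two OLS limits can be seen from population moments alone. The following is a heuristic sketch, not the proof in Appendix B: it assumes that sample moments converge to their population counterparts (plausible here since all series are stationary for $d < 1/2$) and uses only the mutual independence of the shocks in (1)–(4) and (11)–(12).

```latex
\begin{align*}
\operatorname{plim}\hat b^{*}
  &= \frac{\operatorname{Cov}\!\big(\alpha + \beta VP_t + \xi_{t+1},\;
           E_t^{P}(QV_{t,t+\tau}) + u_t\big)}
          {\operatorname{Var}\!\big(E_t^{P}(QV_{t,t+\tau}) + u_t\big)}
   = \frac{0}{\sigma_P^2 + \sigma_u^2} = 0, \\[4pt]
\operatorname{plim}\hat b
  &= \frac{\operatorname{Cov}\!\big(\alpha + \beta VP_t + \xi_{t+1},\;
           VP_t + E_t^{P}(QV_{t,t+\tau})\big)}
          {\operatorname{Var}\!\big(VP_t + E_t^{P}(QV_{t,t+\tau})\big)}
   = \frac{\beta\,\sigma_{VP}^2}{\sigma_{VP}^2 + \sigma_P^2}.
\end{align*}
```

The regressor in (13) contains no $VP_t$ component at all, so its covariance with next-period returns vanishes, whereas $VIX_t^2$ in (8) still carries $VP_t$ and therefore retains an attenuated but nonzero slope.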
To analyze Regression (13) in our data, we compute several estimates for $E_t^P(QV_{t,t+\tau})$ in an attempt to replicate the many different proxies used in the extant literature. One of the most popular models for variance dynamics is the martingale model, which has been employed by Bollerslev, Tauchen, and Zhou (2009), Bollerslev et al. (2014), and Kelly and Jiang (2014). In this case, the latent $E_t^P(QV_{t,t+\tau})$ is simply replaced by $\widetilde{RV}_t$ or $\widetilde{BV}_t$, which are the $RV_t$ and $BV_t$ series shifted backwards to capture the quadratic variation or integrated variance over the past month. This model has been criticized since the dynamics of the realized variance do not seem to resemble martingales. Instead, the HAR-RV model of Corsi (2009) has been found to fit the realized variance dynamics particularly well (see, e.g., Bollerslev, Tauchen, and Sizova, 2012 and Bollerslev et al., 2014). It is an autoregressive model of order 22 for daily realized variance measures with restrictions on the parameters. For our data, we estimate the HAR-RV for the 1-day realized variance measures (including the overnight return), and form expectations for the variance over the next month from this model and the estimated parameters. Bali and Zhou (2016), Bekaert, Hoerova, and Duca (2013), and Drechsler and Yaron (2011) follow a different approach, suggesting that realized monthly variances can be described as linear functions of the previous month’s variance and $VIX_t^2$. We replicate this approach with our data, estimating the regression $\widetilde{RV}_{t+22} = \gamma_0 + \gamma_1 VIX_t^2 + \gamma_2 \widetilde{RV}_t + v_{t+22}$ and forming corresponding expectations. Given that long-memory models seem to fit the variance dynamics well, we finally also estimate an ARFIMA model for the realized series. Again, we estimate the model for the 1-day realized variance and bipower variation, and subsequently form expectations for the variance over the next month.
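As an illustration of the HAR-RV step described above, the following sketch fits the model by OLS and iterates it forward to a 22-day variance expectation. It is not the authors' implementation. It assumes the usual daily/weekly/monthly (1, 5, 22 day) specification associated with Corsi (2009), uses synthetic data with hypothetical coefficients, and the function names `har_design`, `forecast_month`, and `simulate_har` are illustrative.

```python
import numpy as np

def har_design(rv):
    """Regressors [1, RV_t, 5-day mean, 22-day mean] and target RV_{t+1}."""
    rv = np.asarray(rv, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(rv)))
    t = np.arange(21, len(rv) - 1)
    weekly = (csum[t + 1] - csum[t - 4]) / 5.0      # mean of RV_{t-4}, ..., RV_t
    monthly = (csum[t + 1] - csum[t - 21]) / 22.0   # mean of RV_{t-21}, ..., RV_t
    X = np.column_stack([np.ones(len(t)), rv[t], weekly, monthly])
    return X, rv[t + 1]

def forecast_month(coef, history, horizon=22):
    """Iterate one-step HAR forecasts and sum them over the horizon."""
    path = list(np.asarray(history, dtype=float)[-22:])
    total = 0.0
    for _ in range(horizon):
        nxt = (coef[0] + coef[1] * path[-1]
               + coef[2] * np.mean(path[-5:]) + coef[3] * np.mean(path[-22:]))
        path.append(nxt)
        total += nxt
    return total

def simulate_har(T, coef=(0.1, 0.4, 0.3, 0.2), seed=0):
    """Synthetic HAR series with hypothetical coefficients (illustration only)."""
    c, b_d, b_w, b_m = coef
    rng = np.random.default_rng(seed)
    rv = np.full(T + 22, c / (1.0 - b_d - b_w - b_m))   # start at the unconditional mean
    shocks = 0.1 * rng.standard_normal(T + 22)
    for t in range(21, T + 21):
        rv[t + 1] = (c + b_d * rv[t] + b_w * rv[t - 4:t + 1].mean()
                     + b_m * rv[t - 21:t + 1].mean() + shocks[t + 1])
    return rv[22:]

rv = simulate_har(10_000)
X, y = har_design(rv)
coef_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS estimates of the HAR coefficients
ev_month = forecast_month(coef_hat, rv)           # expected variance over the next 22 days
```

Because the weekly and monthly regressors are averages of daily lags, the model is a restricted AR(22), which is the sense in which the text describes it. The resulting `ev_month` would play the role of $\widehat{E_t^P(QV_{t,t+\tau})}$ in Regression (13).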
The information criteria (BIC and AIC) both support a pure fractional noise specification: ARFIMA(0,0.39,0) for daily realized variances and ARFIMA(0,0.38,0) for the daily bipower variation. Table 4 outlines the results from estimating (13) by OLS. First, we observe slope estimates of either sign as is common in the literature. In line with the Theorem 2, we find the magnitude of the coefficient estimates to be very small; substantially smaller than for example the estimates from Regression (8). The predicted percentage changes in tomorrow’s annualized excess returns resulting from a standard deviation change in the predictor today range within [−3.14%,5.42%]. The results from Theorem 2 are further confirmed by the very small magnitude of the t-statistics in Table 4. Table 4. Alternative OLS estimation results EtP(QVt,t+τ)̂ b^* SE( b^*) HAC-SE( b^*) Std. Coeff. Martingale for RV −0.0778 0.1334 0.1201 −3.1392 Martingale for BV −0.0824 0.1428 0.1248 −3.1057 HAR-RV for RV 0.1463 0.1794 0.1668 4.3880 HAR-RV for BV 0.1702 0.1949 0.1843 4.7003 Drechsler & Yaron for RV 0.1864 0.1849 0.1712 5.4246 Drechsler & Yaron for BV 0.1981 0.1999 0.1827 5.3333 ARFIMA for RV 0.1615 0.2082 0.1653 4.1739 ARFIMA for BV 0.1690 0.2248 0.1768 4.0472 EtP(QVt,t+τ)̂ b^* SE( b^*) HAC-SE( b^*) Std. Coeff. Martingale for RV −0.0778 0.1334 0.1201 −3.1392 Martingale for BV −0.0824 0.1428 0.1248 −3.1057 HAR-RV for RV 0.1463 0.1794 0.1668 4.3880 HAR-RV for BV 0.1702 0.1949 0.1843 4.7003 Drechsler & Yaron for RV 0.1864 0.1849 0.1712 5.4246 Drechsler & Yaron for BV 0.1981 0.1999 0.1827 5.3333 ARFIMA for RV 0.1615 0.2082 0.1653 4.1739 ARFIMA for BV 0.1690 0.2248 0.1768 4.0472 The table summarizes the estimation results for the slope coefficient when the misspecified unbalanced and endogenous regression (13) is evaluated by OLS. As regressors, we use six different estimates for EtP(QVt,t+τ), which are described in Section 2.1. 
SE denotes the usual standard error of the estimates that is not robust to heteroskedasticity or serial correlation. HAC-SE reports standard errors based on HAC covariance estimation using a Bartlett kernel that are robust to serial correlation and heteroskedasticity. The last column reports a Standardized Coefficient; it measures the expected percentage change in tomorrow’s annualized excess return given a one standard deviation increase in perceived risk today.
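3 Estimating the Risk–Return Trade-Off by 2SLS

The risk–return trade-off parameter cannot be estimated by an OLS regression of (8) or (13). A possible solution could be to make $VP_t$ observable, that is, replacing it by an estimate $\widehat{VP}_t$, as is commonly done in the literature. However, the model uncertainty and estimation error in $\widehat{VP}_t$ would directly impact the OLS estimator $\hat{b}$, implying that the estimation of the risk–return trade-off with this approach would be prone to error. Instead, we suggest resolving the problems of the OLS regression by estimating (8) by a 2SLS approach. Assume that the researcher has access to a valid and relevant I(0) instrument, that is, a variable that is strongly correlated with $VP_t$, but not with the variance $E_t^P(QV_{t,t+\tau})$ or the innovation $\xi_{t+1}$. Theorem 3 summarizes the asymptotic properties of the 2SLS estimates of (8). The proof of Theorem 3 can be found in Appendix B.

Theorem 3 Let $VP_t$, $VIX_t^2$, $r_t^{(e)}$, and $E_t^P(QV_{t,t+\tau})$ be generated by (1)–(4). Assume there exist K instruments

$$q_{k,t} = \rho_k VP_t + \upsilon_{k,t}, \quad k = 1, 2, \ldots, K, \qquad (16)$$

where $\rho_k \neq 0 \ \forall k$. Define

$$Q \equiv \begin{pmatrix} 1 & 1 & \cdots & 1 \\ q_{1,1} & q_{1,2} & \cdots & q_{1,T-1} \\ \vdots & \ddots & \ddots & \vdots \\ q_{K,1} & q_{K,2} & \cdots & q_{K,T-1} \end{pmatrix}'. \qquad (17)$$

Estimate Regression (8) by 2SLS using $q_{k,t}$ as instruments. The 2SLS estimate is given by

$$\hat{b}_{2SLS} \equiv (\hat{a}, \hat{b})' = \left(X'Q[Q'Q]^{-1}Q'X\right)^{-1}\left(X'Q[Q'Q]^{-1}Q'y\right). \qquad (18)$$

1. If $\beta = 0$:

$$\hat{a} \xrightarrow{P} \alpha, \qquad T^{1/2}\hat{b} \xrightarrow{D} N\left(0, \frac{\sigma_\xi^2\left(\sigma_{VP}^2 \sum_{k=1}^{K} \frac{\rho_k^2}{\sigma_{\upsilon_k}^2} + 1\right)}{\sigma_{VP}^4 \sum_{k=1}^{K} \frac{\rho_k^2}{\sigma_{\upsilon_k}^2}}\right), \qquad T^{-1/2} t_a \xrightarrow{P} \frac{\alpha}{\sigma_\xi}, \qquad t_b \xrightarrow{D} N(0,1).$$

In addition, $s^2 \xrightarrow{P} \sigma_\xi^2$, where $s^2 = T^{-1}\sum_{t=2}^{T} \hat{e}_t^2$. $t_a = \hat{a}/\sqrt{Var(\hat{a})}$ and $t_b = \hat{b}/\sqrt{Var(\hat{b})}$ denote the t-statistics, where $Var(\hat{b}_{2SLS}) = s^2\left(X'Q[Q'Q]^{-1}Q'X\right)^{-1}$.

2. If $\beta \neq 0$:

$$\hat{a} \xrightarrow{P} \alpha, \qquad \hat{b} \xrightarrow{P} \beta, \qquad T^{-1/2} t_a \xrightarrow{P} \frac{\alpha}{\left(\sigma_\xi^2 + \beta^2\sigma_P^2\right)^{1/2}}, \qquad T^{-1/2} t_b \xrightarrow{P} \beta\left(\frac{\sigma_{VP}^4 \sum_{k=1}^{K} \frac{\rho_k^2}{\sigma_{\upsilon_k}^2}}{\left(\sigma_\xi^2 + \beta^2\sigma_P^2\right)\left(\sigma_{VP}^2 \sum_{k=1}^{K} \frac{\rho_k^2}{\sigma_{\upsilon_k}^2} + 1\right)}\right)^{1/2},$$

where $s^2 \xrightarrow{P} \sigma_\xi^2 + \beta^2\sigma_P^2$.

Theorem 3 shows that in the absence of a risk–return trade-off, the 2SLS estimator $\hat{b}$ converges to a normal distribution with zero mean at the standard rate $T^{-1/2}$.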
More importantly, Theorem 3 demonstrates that 2SLS estimation results in a consistent estimator for $\beta$. In finite sample simulations the average relative bias, $\hat{b}/\beta$, is very small, bounded between 1 and 1.05 across the set of chosen parameter values. Intuitively, the 2SLS approach to estimation works for two reasons. First, the use of a relevant but exogenous instrument resolves the endogeneity issue. Second, in computing the 2SLS estimator we multiply the I(d) regressor with an I(0) instrument. Lemma 1 in Appendix A shows that this multiplication destroys the long memory and the resulting process has standard I(0) dynamics. Hence, under the maintained assumption that the DGP follows (1)–(4), the predictive power of the latent variable $VP_t$ on $r_t^{(e)}$ can be correctly estimated if the researcher finds a relevant and valid I(0) instrument as in (16).

Theorem 3 further implies that the statistical significance of $\beta$ can be correctly inferred from a simple t-test. Under the null hypothesis $H_0: \beta = 0$, the t-statistic of the 2SLS estimate $\hat{b}$, $t_b$, converges to a standard normal distribution. Simulations under $H_0: \beta = 0$ show that the size of the test is close to the nominal level of 5%, albeit marginally undersized for very small T. The statistic $t_b$ diverges at rate $T^{1/2}$ under $H_1: \beta \neq 0$. The finite sample power of the t-test is very close to 100% across the scenarios that we consider in the simulations. The researcher will thus be very likely to detect predictability and the risk–return trade-off, if it is present.

Inspired by the results in Theorem 3, we identify a set of I(0) instruments for 2SLS estimation in our data.
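The mechanics of the estimator in (18) can be sketched in a few lines of Python. This is a minimal illustration under strong simplifying assumptions: all series are drawn i.i.d. (so the long-memory and unbalancedness features of the DGP (1)–(4) are not reproduced), the parameter values are hypothetical, and the helper names are not from the article. It only shows how instruments of the form (16) remove the bias caused by using the noisy regressor $VIX_t^2$ in place of the latent $VP_t$.

```python
import numpy as np


def ols(y, x):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.solve(X.T @ X, X.T @ y)


def two_sls(y, x, instruments):
    """2SLS of y on [1, x], mirroring the form of Equation (18)."""
    X = np.column_stack([np.ones(len(x)), x])
    Q = np.column_stack([np.ones(len(x)), instruments])
    X_hat = Q @ np.linalg.solve(Q.T @ Q, Q.T @ X)     # Q (Q'Q)^{-1} Q'X
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)


rng = np.random.default_rng(0)
T, alpha, beta = 20_000, 0.1, 2.0             # hypothetical parameter values
vp = rng.standard_normal(T)                   # latent premium (i.i.d. here: an assumption)
eqv = 2.0 * rng.standard_normal(T)            # expected variance, independent of vp
vix2 = vp + eqv                               # observed regressor
ret = alpha + beta * vp + rng.standard_normal(T)   # ret[t] plays the role of r_{t+1}
q = np.column_stack([0.8 * vp + rng.standard_normal(T),
                     -0.5 * vp + rng.standard_normal(T)])   # q_k = rho_k * VP + noise

a_ols, b_ols = ols(ret, vix2)
a_iv, b_iv = two_sls(ret, vix2, q)
```

In this design the population OLS slope equals $\beta \cdot Var(VP)/Var(VIX^2) = 2 \cdot 1/5 = 0.4$, whereas the 2SLS slope targets $\beta = 2$.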
The existing literature provides substantial evidence that there is a linear long-run relation between $\widetilde{RV}_t$ and $VIX_t^2$ that is I(0) (see, for instance, Bandi and Perron, 2006 and Christensen and Nielsen, 2006).11 Furthermore, if the fractional cointegrating vector is equal to $[-1, 1]'$, then the resulting cointegrating series corresponds to the monthly ex-post realized variance risk premium, $VRP_t$, as defined by Bollerslev, Tauchen, and Zhou (2009).12 The latter argue that $VRP_t$ may be viewed as a bet on pure volatility; as such we expect the measure to be closely linked to risk $VP_t$. Bollerslev, Tauchen, and Zhou (2009) and Bollerslev et al. (2013) also present evidence that $VRP_t$ can predict aggregate market returns, which is further motivation for considering the measure to be a relevant instrument.

We further expect there to be a long-run relation between $\widetilde{RV}_t$ and $\widetilde{BV}_t$, as both series capture the monthly variation in stock returns over the past month. Following the arguments in Barndorff-Nielsen and Shephard (2004), Andersen, Bollerslev, and Diebold (2007), and Huang and Tauchen (2005), the cointegrating relation between $\widetilde{RV}_t$ and $\widetilde{BV}_t$ represents the contribution of price jumps to the variance if the cointegrating vector is equal to $[1, -1]'$. For instance, Andersen, Bollerslev, and Diebold (2007) find that the jump component exhibits a much lower degree of persistence than the two series $\widetilde{RV}_t$ and $\widetilde{BV}_t$, providing evidence for a fractional cointegration relation. Jumps are closely related to $VP_t$; for instance, Bollerslev and Todorov (2011) demonstrate that the VP can be decomposed into a diffusive part and a discontinuous (jump) element. Therefore, we anticipate jumps to be a relevant instrument for risk.

We investigate the potential cointegration relation by a restricted version of the co-fractional vector autoregressive model of Johansen (2008, 2009) and Johansen and Nielsen (2012), given by

$$\Delta^d \tilde{X}_t = \phi\left[\theta'(1-\Delta^d)\tilde{X}_t\right] + \sum_{i=1}^{n} \Gamma_i \Delta^d (1-\Delta^d)^i \tilde{X}_t + w_t, \qquad (19)$$

where $\tilde{X}_t \equiv [\widetilde{RV}_t, \widetilde{BV}_t, VIX_t^2]'$.
We rely on model (19) because it allows us to identify a cointegration relation between the variables. At the same time, we can explicitly account for possible dynamics at higher frequencies, which may be present due to the overlapping nature of $\widetilde{RV}_t$ and $\widetilde{BV}_t$.13 Given the identification problems of the model (see Carlini and Santucci de Magistris, 2013), we initially fix the cointegration rank r = 2 and estimate (19) by restricted maximum likelihood. Subsequently, we test for cointegration. For $\hat{d} = 0.38$ (SE($\hat{d}$) = 0.03) and n = 3 we find the two instruments

$$q_t = \begin{pmatrix} q_{1,t} \\ q_{2,t} \end{pmatrix} = \hat{\theta}'\tilde{X}_t = \begin{pmatrix} 1 & -1.07 & 0 \\ -1.07 & 0 & 1 \end{pmatrix}\tilde{X}_t. \qquad (20)$$

The restrictions that $\theta(2,1) = -1$ and $\theta(1,2) = -1$ are rejected with an LR statistic of 19.99, which implies that the parameters $\theta$ are very precisely estimated. While statistically different, numerically $q_{2,t}$ is very close to the ex-post realized variance risk premium $VRP_t$ of Bollerslev, Tauchen, and Zhou (2009). Similarly, $q_{1,t}$ differs only very marginally from the pure jump contribution, that is, the squared jump sizes over the past month.

Table 3 lists the outcomes of the 2SLS estimation of Regression (8), using $q_{1,t}$ and $q_{2,t}$ from (20) as instruments. If we predict $r_{t+1}^{(e)}$ by $VIX_t^2$ using the two identified instruments and 2SLS estimation, we obtain a statistically significant slope estimate of $\hat{b} = 1.93$. This estimate is more than seven times larger than the corresponding inconsistent OLS estimate. With the 2SLS approach, we find that for a unit increase in risk, investors demand an approximate 2% annual increase in the equity premium. An increase in risk by one VIX-standard deviation (=47.36) implies a 91.21% increase in tomorrow’s annualized predicted excess returns. That is, the equity premium almost doubles in reaction to such large changes in risk. Our results lend strong support to the new theories of risk–return trade-off, in that the VP captures risk and this risk is priced in aggregate markets resulting in sizable equity premiums.
The implied trade-off between risk and return is positive.

4 Robustness

The results in the previous section hinge on the adequacy of the assumptions made in the DGP. We review these assumptions here, and present several robustness checks. For the findings to hold, it is necessary that the instruments $q_{k,t}$ are not irrelevant. Simulations show that estimating (8) by 2SLS with an irrelevant instrument leads to an inconsistent and inefficient estimator of $\beta$. To avoid such an outcome, we suggest a simple testing procedure. Assume that the researcher has identified a candidate instrument with DGP (16). As $VP_t$ is unobserved, the researcher cannot simply regress the instrument on $VP_t$ to conduct inference on the value of $\rho_k$ and thus on the instrument relevance. Instead, $q_{k,t}$ can be regressed on the observed $VIX_t^2$ by OLS. Theorem 1 shows that the slope coefficient of this regression is an inconsistent estimate of $\rho_k$, yet valid statistical inference can be carried out. Thus, relying on a simple OLS t-test, the researcher can infer whether the instrument is statistically irrelevant.

Applying this approach to the two instruments identified in Section 3, we find no reason for concern. Regressing $q_{1,t}$ on $VIX_t^2$, the corresponding t-statistic, $t_{\hat{\rho}_1}$, is equal to 4.49. The jump instrument is thus a relevant instrument. Carrying out the same analysis for $q_{2,t}$, we find the respective value for $t_{\hat{\rho}_2}$ to be equal to 26.56, suggesting that the variance risk premium instrument is also strongly relevant.

Besides being relevant, the instruments $q_{k,t}$ need to be valid. For an instrument to be valid, it must not be correlated with the residuals of the 2SLS regression of Equation (8), $e_{t+1}$. This implies that it must correlate neither with $E_t^P(QV_{t,t+\tau})$ nor with $\xi_{t+1}$. In simulations we generate an instrument with innovations $\upsilon_{k,t} = \kappa_k E_t^P(QV_{t,t+\tau}) + \nu_t$, where $\nu_t$ is an i.i.d. sequence. If $\kappa_k \neq 0$, this instrument is invalid as it violates the former assumption.
We find that relying on such an invalid instrument leads to the same outcome as when estimating Regression (8) by simple OLS, that is, standard inference is valid, but the risk–return trade-off parameter estimator is inconsistent. Alternatively, consider an instrument that violates the latter assumption, that is, it is linearly related to unexpected returns $\xi_{t+1}$. In practice, using such an invalid instrument should be avoided at all costs. From our simulations, we conclude that the size of a t-test on the significance of the risk–return trade-off coefficient is approximately 100%. The power of the test is also close to 100% in most instances, yet in extreme cases it may drop to as low as 31.42%. The 2SLS estimator of (8) is also strongly inconsistent.

A common approach to test for the validity of an instrument is to rely on Sargan’s J test (Sargan, 1958). Theorem 4 summarizes the asymptotic behavior of the J test for our DGP (1)–(4).

Theorem 4 Let $VP_t$, $VIX_t^2$, $r_t^{(e)}$, and $E_t^P(QV_{t,t+\tau})$ be generated by (1)–(4). Assume there exist K instruments, generated by (16). Estimate the following second-stage regression by OLS

$$\hat{e} = Q\varpi + v, \qquad (21)$$

where $\hat{e}$ is the vector of regression residuals from estimating Equation (8) by 2SLS, $\varpi$ is a (K+1)-dimensional OLS coefficient vector, and $v$ is a vector of innovations. Compute the uncentered $R^2$ of Regression (21) as $R_u^2 = 1 - \frac{\hat{v}'\hat{v}}{\hat{e}'\hat{e}}$. Define a test statistic for the validity of the instruments as

$$J \equiv T R_u^2. \qquad (22)$$

Then

$$J \xrightarrow{D} \sum_{j=1}^{K-1} \lambda_j \chi_j^2(1),$$

where $\chi_j^2(1)$ are K − 1 independent $\chi^2(1)$ distributed random variables. The weights $\lambda_j$ are the eigenvalues of the (K × K) matrix $A^{1/2} M A^{1/2\prime}$, which are defined in Appendix C in Equations (E6) and (E13), respectively.

A proof of Theorem 4 can be found in Appendix C. The theorem shows that even though the true predictor $VP_t$ is not observable, we can still test whether $q_{k,t}$ is a valid instrument.
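The statistic in (21)–(22) and a simulated p-value for its weighted chi-square limit can be sketched as follows. The weights $\lambda_j$ depend on the quantities defined in Appendix C, which are not reproduced here, so the sketch simply takes them as an input; the function names and the number of draws are assumptions for illustration.

```python
import numpy as np


def sargan_j(resid, Q):
    """J = T * uncentered R^2 from regressing 2SLS residuals on Q, as in (21)-(22)."""
    resid = np.asarray(resid, dtype=float)
    coef, *_ = np.linalg.lstsq(Q, resid, rcond=None)
    v_hat = resid - Q @ coef
    r2_uncentered = 1.0 - (v_hat @ v_hat) / (resid @ resid)
    return len(resid) * r2_uncentered


def simulated_p_value(j_stat, weights, n_draws=100_000, seed=0):
    """Share of simulated sum_j lambda_j * chi2_j(1) draws that are >= j_stat."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)   # lambda_j, supplied by the user
    draws = rng.chisquare(1, size=(n_draws, len(weights))) @ weights
    return float(np.mean(draws >= j_stat))
```

Here `Q` is the instrument matrix including the column of ones, as in (17). With K = 2 instruments there is a single weight, so the limit is a scaled $\chi^2(1)$ variable.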
The statistical inference on the J-statistic can be based on simulated p-values, following the approach suggested in Jagannathan and Wang (1996). For the two instruments that we identified for our data set in Section 3, the implied J-statistic in Table 3 is equal to 1.85. The corresponding simulated p-value is 0.44. We thus fail to reject the null hypothesis and conclude that the jump instrument and the variance risk premium instrument are valid.

The adequacy of the proposed 2SLS approach further hinges on the assumption that the instruments are I(0). In practice, it is fairly straightforward for the researcher to verify this condition. For example, the integration order of the instruments can be estimated by relying on the semiparametric approaches in Section 1, or one can rely on hypothesis tests such as the KPSS test (Kwiatkowski et al., 1992). Since we identified our instruments in Section 3 by the co-fractional model, we can simply check Johansen’s (2008) three conditions for $q_t = \theta'\tilde{X}_t \sim I(0)$. We confirm that the cointegration rank r is smaller than 3 (LR = 2.76), that $\left|\phi_\perp'\left(I_{3\times 3} - \sum_{i=1}^{n}\Gamma_i\right)\theta_\perp\right| = -1.57 \neq 0$, and that the roots c of the characteristic polynomial $\left|(1-c)I_{3\times 3} - \phi\theta' c - (1-c)\sum_{i=1}^{n}\Gamma_i c^i\right| = 0$ lie outside the complex disk $\mathbb{C}_d$. Hence, $q_t$ are integrated of order zero.

Our DGP further presumes that the VP is the only predictor of excess returns. This may be a rather stylized representation, since the extant literature suggests that other factors, such as the dividend–price ratio or the cay factor (see Lettau and Ludvigson, 2001), offer some return predictability. If there are such omitted factors, they are part of the error term $\xi_{t+1}$ in (3). As a result, $\xi_{t+1}$ may be serially correlated. Our derivations in Appendix D show that as long as $\xi_{t+1}$ remains independent of $VP_t$ and admits a linear representation with one-summable coefficients, our 2SLS estimation results continue to hold.
The estimator for the risk–return trade-off is consistent, and standard inference can be carried out. Only the standard errors need to be adjusted to allow for serial correlation. Table 3 reports the robust HAC standard errors for our data. The risk–return trade-off remains significant.

Alternatively, it is conceivable that the potentially omitted variables in $\xi_{t+1}$ are correlated with $VP_t$. In this situation, the reported 2SLS estimates may be biased. Note that the J-test from Theorem 4 may be viewed as a test of the joint null hypothesis that $VP_t$ is orthogonal to $E_t^P(QV_{t,t+\tau})$, that $VP_t$ is orthogonal to unexpected returns $\xi_{t+1}$, that $\upsilon_{k,t}$ is orthogonal to $E_t^P(QV_{t,t+\tau})$, and that $\upsilon_{k,t}$ is orthogonal to $\xi_{t+1}$, which we fail to reject for our data with a p-value of 0.44. Appendix D shows that the results of the J-test continue to hold, even if $\xi_{t+1}$ is not i.i.d. Thus, we find no evidence in the data that potentially omitted variables affect our results.

5 Long-Horizon Return Predictability

If the relation between excess returns and the lagged VP in (3) holds for daily data, as our results so far suggest, we would expect it to hold also for longer horizon returns. That is, we can assume

$$r_{t+h}^{(e)} = \alpha_h + \beta_h VP_t + \xi_{t+h}. \qquad (23)$$

We find consistent estimates for the parameters of this long-run relation by estimating the regression $r_{t+h}^{(e)} = a_h + b_h VIX_t^2 + e_{t+h}$ by our proposed 2SLS approach, relying on the instruments $q_{1,t}$ and $q_{2,t}$ in (20). We measure cumulative returns $r_{t+h}^{(e)}$ by $\frac{1}{h}\sum_{i=1}^{h} r_{t+i}^{(e)}$, where $r_{t+i}^{(e)}$ are the log excess returns defined in (7). Given the overlapping nature of the cumulative returns, inference is based on Hansen and Hodrick standard errors, as is common in the literature (see, e.g., Campbell, Lo, and MacKinlay, 1997). Figure 1a plots the estimated prediction coefficient $\hat{b}_h$. The estimate shows a steady decline from the initial value of 1.93 as the horizon h increases.
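The cumulative return measure $\frac{1}{h}\sum_{i=1}^{h} r_{t+i}^{(e)}$ used as the dependent variable can be computed with a running sum. The sketch below is illustrative only; the function name is hypothetical and the input is assumed to be a one-dimensional array of daily log excess returns.

```python
import numpy as np


def cumulative_mean_returns(r, h):
    """out[t] = (1/h) * sum_{i=1..h} r[t+i], for every t with a full window."""
    r = np.asarray(r, dtype=float)
    csum = np.concatenate([[0.0], np.cumsum(r)])
    t = np.arange(len(r) - h)
    # sum of r[t+1], ..., r[t+h] equals csum[t+h+1] - csum[t+1]
    return (csum[t + h + 1] - csum[t + 1]) / h


example = cumulative_mean_returns([1.0, 2.0, 3.0, 4.0, 5.0], h=2)
```

Consecutive windows share h − 1 daily returns, which is the source of the serial correlation in the regression errors that motivates the Hansen and Hodrick standard errors mentioned in the text.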
For all horizons of up to 126 days, that is, 6 months, the coefficient remains statistically different from zero at a significance level of 5%. For horizons of 1 month (h = 21), 3 months, and 6 months, respectively, we find $\hat{b}_h$ = 0.57, 0.42, and 0.31. These numbers are qualitatively very similar to Bollerslev, Tauchen, and Zhou (2009), albeit somewhat larger for h = 21. Thus, for a unit increase in today’s risk, the investors demand an immediate increase in tomorrow’s equity premium of roughly 2%, but the effect of the same increase on the equity premium a month later is only 30% of this number; 3 months later it is merely 22%, and 6 months from now only 16% of the initial increase. We conclude that long-run excess return expectations are not strongly impacted by shocks to $VP_t$, but the impact is nevertheless statistically significant.

Figure 1. The figure plots the estimated risk–return trade-off parameter $\hat{b}_h$ over different horizons measured in days. We estimate the unbalanced misspecified and endogenous predictive regression for cumulative returns by 2SLS, using the instruments $q_t$. The dashed lines represent 95% confidence intervals.

We further empirically investigate relation (23) in relatively tranquil periods compared to turbulent times. To that end, we include a dummy variable in the 2SLS regression to capture the Financial Crisis from February 27, 2007 to March 2, 2009.14 First we look at the estimated risk–return trade-off parameter in “normal” periods, where most likely overall market risk is lower.
Compared with the entire sample period, the estimated coefficient drops significantly initially, but it decays more slowly over horizons, as Figure 1b shows. The estimated effect remains small and statistically significant for all horizons from 1 day to 6 months. A possible interpretation of these findings is that markets are generally less nervous during “normal” times. Investors do not demand an immediate high compensation in tomorrow’s returns for higher levels of risk, but such a shock does lead investors of all horizons, up to half a year, to require a modest increase in the equity premium. Conversely, as Figure 1c shows, in crisis periods investors react in the opposite way. Short-term estimates $\hat{b}_h$ are very large and significant up to the horizon of roughly 1 month. That is, for the same increase in risk as during “normal” periods, the immediately required equity premium in turbulent times is much larger, for instance equal to 3.54 for the next day. The effect tapers off relatively quickly, however, becoming insignificant for horizons longer than 1 month and shorter than approximately 3.5 months. In the long run, for h > 75, the effect is very small but significant.

The implied predictability of excess returns at different horizons h is equal to

$$R_h^2 = \frac{\hat{b}_h^2 \hat{\sigma}_{VP}^2}{\hat{\sigma}_{r,h}^2}, \qquad (24)$$

where $\hat{\sigma}_{r,h}^2$ is the sample variance of cumulative returns and $\hat{\sigma}_{VP}^2$ is the sample variance of the VP. As $VP_t$ is latent, $\hat{\sigma}_{VP}^2$ cannot be computed. Nevertheless, we can gauge how predictability evolves over different horizons for hypothetical values of $\hat{\sigma}_{VP}^2$. Figure 2 summarizes the behavior. We find that predictability increases (not entirely monotonically) from horizons of 1 day to h = 82 days, and decreases thereafter, showing a hump-shaped pattern. The initial increase in predictive power is also found by Drechsler and Yaron (2011) for the first 3 months. Interestingly, we find that for any value of $\hat{\sigma}_{VP}^2$, the predictability is maximized at almost exactly 4 months.
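Equation (24) also makes clear why the location of the peak does not depend on the hypothetical value chosen for $\hat{\sigma}_{VP}^2$: it multiplies $R_h^2$ by the same constant at every horizon. The short sketch below illustrates this with the slope estimates quoted in the text; the return standard deviations and the values tried for $\hat{\sigma}_{VP}$ are purely hypothetical placeholders, not figures from the article.

```python
import numpy as np


def implied_r2(b_h, sigma_vp, sigma_r_h):
    """Equation (24): R_h^2 = b_h^2 * sigma_VP^2 / sigma_{r,h}^2."""
    b_h = np.asarray(b_h, dtype=float)
    sigma_r_h = np.asarray(sigma_r_h, dtype=float)
    return b_h ** 2 * sigma_vp ** 2 / sigma_r_h ** 2


# Slope estimates quoted in the text for h = 1, 21, and the 3- and 6-month horizons.
b_hat = np.array([1.93, 0.57, 0.42, 0.31])
# Hypothetical standard deviations of cumulative returns (NOT from the article).
sigma_r = np.array([20.0, 4.0, 2.5, 2.2])

peaks = []
for sigma_vp in (0.5, 1.0, 2.0):              # hypothetical values of sigma_VP
    r2 = implied_r2(b_hat, sigma_vp, sigma_r)
    peaks.append(int(np.argmax(r2)))          # index of the horizon with the largest R^2
```

The level of $R_h^2$ changes with each assumed `sigma_vp`, but the maximizing horizon stored in `peaks` is the same in every iteration.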
This is in line with the findings for the U.S. market in Bollerslev, Tauchen, and Zhou (2009), as well as the international evidence provided by Bollerslev et al. (2014). We conclude that the VP is a good predictor for the equity premium at short and intermediate horizons, but for long-horizon returns its predictive power decays.

Figure 2. The figure plots the implied predictability of the 2SLS regression over different horizons h. The $R^2$ changes with different hypothetical values considered for the sample standard deviation of the latent variance premium, $\hat{\sigma}_{VP}$.

5.1 Comparative Predictability

Much empirical work has been dedicated to the analysis of the return predictability implied by $VP_t$. To the best of our knowledge, the methodology of previous work differs substantially from the methods proposed in this article. It is common in the literature to first obtain $\widehat{E_t^P(QV_{t,t+\tau})}$ and then obtain an estimate for the VP by subtracting it from $VIX_t^2$. This estimated variance premium, $\widehat{VP}_t$, is used as a lagged predictor in a return regression, typically estimated by OLS. Naturally, it is to be expected that the estimation error and the model uncertainty inherent in $\widehat{VP}_t$ will impact the OLS estimate for the risk–return trade-off. Relying on the proposed 2SLS approach instead, we can avoid this problem.

We rely on the same estimates $\widehat{E_t^P(QV_{t,t+\tau})}$ as in Section 2.1 to construct $\widehat{VP}_t$ and estimate a predictive return regression by OLS. Assuming that the sample variance of the true latent $VP_t$ is approximately equal to the sample variance of the estimate $\widehat{VP}_t$, that is, $\hat{\sigma}_{VP} \approx \hat{\sigma}_{\widehat{VP}}$, we can compute a relative $R^2$ measure as

$$RP_h = \frac{R_{h,2SLS}^2}{R_{h,OLS}^2} = \frac{\hat{b}_{h,2SLS}^2\, \hat{\sigma}_{VP}^2 / \hat{\sigma}_{r,h}^2}{\hat{b}_{h,OLS}^2\, \hat{\sigma}_{\widehat{VP}}^2 / \hat{\sigma}_{r,h}^2} \approx \frac{\hat{b}_{h,2SLS}^2}{\hat{b}_{h,OLS}^2}.$$
(25) The measure $RP_h$ is plotted in Figure 3 for different horizons h. The proposed 2SLS estimation approach outperforms the competing models in the sense that it implies a stronger return predictability in sample. At almost all horizons the ARFIMA models imply the lowest comparative $R^2$, resulting in a maximum $RP_h$ = 39.19 at the horizon of roughly 3 months. Somewhat more predictability is implied when variance expectations are derived from the Drechsler and Yaron (2011) regression, but still substantially less than the 2SLS method. The HAR-RV model for most horizons performs better than the two previous approaches, nevertheless still producing $RP_h$ measures that vary from 1.48 to 12.06. Overall, relying on the $BV_t$ measure relative to the $RV_t$ measure for return variances results in a lower predictive power. Investigating the statistical significance of the slope estimate $\hat{b}_{h,OLS}$, we confirm that none of these models produce an estimate $\widehat{VP}_t$ that significantly predicts returns at a 5% level, apart from the initial 1 to 7 days.

Figure 3. The figure plots the relative predictive $R^2$, $RP_h$, for different models. The numerator of the ratio is the squared slope estimate from the 2SLS regression. The denominator is the squared slope estimate from an OLS regression, where the latent $VP_t$ is replaced by different estimates. The y-axis has a logarithmic scale.

The only OLS approach that has a predictive power that comes close to the 2SLS method, especially in the long run of approximately half a year, is when $\widehat{VP}_t$ is the result of a martingale model for realized variances.
Nevertheless, the 2SLS method still produces a 38% increase in the fit relative to the martingale model for $RV_t$ at a short horizon of h = 8, and a 23% increase at the 4 month horizon (h = 82). The martingale models are also the only competing models that result in a statistically significant slope estimate for all horizons from 1 to 126 days. This is an interesting finding given the criticism that the model does not represent variance dynamics well. However, the martingale is the only model that produces a proxy for $E_t^P(QV_{t,t+\tau})$ without estimation. We conclude therefore that model uncertainty does not strongly impact the discovery of predictability. On the other hand, the estimation error that is contained in all of the other competing models seems to impact predictability regressions rather severely.

5.2 Out-of-Sample Predictability

Our results so far suggest that our 2SLS approach produces more precise estimates for the risk–return trade-off parameter $\beta$, resulting in a better in-sample fit relative to the traditional approaches from the literature. We have shown that these estimates do not require observations on risk $VP_t$. However, to generate out-of-sample (OOS) forecasts for future excess returns, we need an observable measure for $VP_t$. We compute cumulative return forecasts as $\hat{r}_{T_{IS}+h}^{(e)} = \hat{a}_h + \hat{b}_h \widehat{VP}_{T_{IS}}$, where $T_{IS}$ is the number of in-sample observations. Estimates $\hat{a}_h$ and $\hat{b}_h$ are obtained by in-sample cumulative return 2SLS regression on $VIX_t^2$, relying only on data up to $T_{IS}$. $\widehat{VP}_{T_{IS}}$ is the variance premium proxy resulting from one of the models described in Section 5.1. Obviously, our “clean” estimate for the risk–return trade-off parameter then scales not only the true latent variance premium $VP_t$, but also the estimation and model error inherent in the proxy $\widehat{VP}_t$. For the first h-step ahead prediction, we consider the trough of the Financial Crisis on March 2, 2009 as the end of the in-sample period.
The remaining OOS forecasts are produced with a rolling-window approach.

We first evaluate the OOS predictions in terms of efficiency, which we measure by the root mean squared error (RMSE). For the majority of the 126 horizons, the lowest RMSE is achieved when $\widehat{VP}_{T_{IS}} = VIX_{T_{IS}}^2 - \widetilde{BV}_{T_{IS}}$. The RMSE ranges anywhere from 17.65 for h = 1 to 1.49 for h = 126, continuously decreasing as h increases. To put these numbers into perspective, we contrast these findings with the RMSE from a historical mean model as in Welch and Goyal (2008). We always achieve a higher efficiency; the reduction in RMSE relative to the historical mean model is between 1% and 8.5%, where the lowest gain is at h = 1 and the largest at 81 days, which again corresponds to a horizon of roughly 4 months.

We compare these forecasts, where $\widehat{VP}_{T_{IS}} = VIX_{T_{IS}}^2 - \widetilde{BV}_{T_{IS}}$, to return predictions from the traditional OLS approach. More precisely, we produce a competing set of OOS forecasts, where the estimates $\hat{a}_h$ and $\hat{b}_h$ are obtained by in-sample OLS estimation, replacing $VP_t$ by a proxy $\widehat{VP}_t$. Figure 4 shows that the suggested 2SLS approach leads to a forecasting efficiency gain at almost all horizons. The gain relative to all OLS models is again maximized at a horizon of 4 months. At h = 81 days, our approach leads to a reduction in RMSE of 11% relative to the OLS model with $\widehat{VP}_t$ resulting from the Drechsler and Yaron (2011) model for $BV_t$. Just like in the in-sample analysis, the only serious competitors from the OLS models are the martingale models. For a few intermediate and the very long horizons, we find a very small improvement in RMSE from the latter two models. Investigating this further, we find that for these horizons the OLS models result in a lower average forecast error, but higher forecast uncertainty.

Figure 4. The figure plots the percentage difference in OOS forecasting efficiency for different forecasting horizons.
$RMSE_{2SLS}$ is the root mean squared error resulting from predictions using the proposed 2SLS approach and replacing the unknown $E_t^P(QV_{t,t+\tau})$ by $\widetilde{BV}_t$ in the forecast. $RMSE_{OLS}$ is the same measure for forecasts from the OLS predictions using different estimates for the unobserved $VP_t$.

As a last step we analyze how much OOS predictability the models imply. That is, how much variation does the forecast produce relative to the variation of cumulative returns? Figure 5 plots the OOS $R^2$, $R_{OOS}^2 = Var(\hat{r}_{T_{IS}+h}^{(e)})/Var(r_{T_{IS}+h}^{(e)})$. We observe that the models where the risk–return trade-off parameter is estimated by in-sample 2SLS produce more volatility in the forecasts, which is necessary to match the variation in realized cumulative returns. For most horizons, the maximal forecast variation is implied when $\widehat{VP}_t$ follows from the ARFIMA models and $\beta$ is estimated by 2SLS; this is closely followed by the HAR-RV and 2SLS estimation. Thus, on the one hand, models that presumably fit the dynamics of realized variances best generate estimates for $VP_t$ that best match the variation in returns. On the other hand, these ARFIMA models for $VP_t$ combined with in-sample OLS estimation of $\beta$ have the worst OOS fit. Hence, in the OLS framework the large predictor volatility, which is needed to produce sufficient variation in the forecast, at the same time harms the in-sample estimation of the risk–return trade-off parameter severely. This can also be seen by looking at the martingale models.
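The two evaluation measures used in this subsection, the RMSE and the variance-ratio $R_{OOS}^2$, are simple to compute once forecasts and realized cumulative returns are aligned. The sketch below is illustrative; the function names are hypothetical, and the sign convention of the percentage difference (negative meaning a lower RMSE than the benchmark) is an assumption, since the exact convention used in Figure 4 is not stated in the text.

```python
import numpy as np


def rmse(forecast, realized):
    """Root mean squared forecast error."""
    err = np.asarray(realized, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def oos_variance_ratio(forecast, realized):
    """R2_OOS as defined in the text: Var(forecast) / Var(realized)."""
    return float(np.var(forecast) / np.var(realized))


def rmse_change_percent(rmse_model, rmse_benchmark):
    """Percentage difference in RMSE relative to a benchmark (negative = improvement)."""
    return 100.0 * (rmse_model - rmse_benchmark) / rmse_benchmark
```

Note that the variance ratio measures how much variation the forecasts produce, not how accurate they are, which is why it is read alongside the RMSE in the discussion above.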
With 2SLS in-sample estimation they produce rather little forecast variation, meaning that $\widehat{VP}_t$ is not volatile enough. Yet, the small variation in $\widehat{VP}_t$ leads to the relatively most accurate OLS estimates of $\beta$. As before, we conclude that the estimation error strongly biases the OLS estimation of the risk–return trade-off, and that the proposed 2SLS estimation approach can help alleviate these shortcomings.

Figure 5. The figure plots the OOS R-squared, $R_{OOS}^2 = Var(\hat{r}_{T_{IS}+h}^{(e)})/Var(r_{T_{IS}+h}^{(e)})$. The solid lines refer to forecasts that use the proposed 2SLS approach for in-sample estimation; the out-of-sample prediction is made by multiplying the slope estimate by different estimates for $VP_t$. The dashed lines refer to forecasts that use the standard OLS approach for in-sample estimation; the out-of-sample prediction is made by multiplying the slope estimate by different estimates for $VP_t$.

6 Risk Aversion or Economic Uncertainty

Up to this point, we simply referred to VP as risk. As mentioned in the introduction, there are disagreeing views in the literature on whether the VP relates to economic uncertainty or risk aversion. The long-run risk type models of Drechsler and Yaron (2011), Bollerslev, Tauchen, and Zhou (2009), and Bollerslev, Tauchen, and Sizova (2012) imply that $VP_t$ is intrinsically linked to certain components of economic uncertainty. More precisely, the latter show that $VP_t$ is a function of the variance of uncertainty (vol-of-vol).
Conversely, in Drechsler and Yaron (2011), $VP_t$ is driven by rare but potentially large jumps in economic uncertainty (or rather the jump intensity). In both models, representative agents with recursive utility are assumed to have a strong preference for an early resolution of uncertainty and thus dislike increases in time-varying economic uncertainty. These two assumptions are necessary to produce a positive time-varying VP.

In contrast, within an external-habit type framework, Bekaert and Engstrom (2017) show that $VP_t$ is linked to aggregate risk aversion. More precisely, they model consumption growth as being driven by good and bad shocks and an increase in the relative importance of the former (latter) shocks decreases (increases) the risk aversion. This time-varying importance of different shocks is also what generates the positive time-varying VP. In a similar spirit, Bekaert and Hoerova (2016) assume that the stock-return distribution has three different states: good, bad, and crash. In their model, an increase in risk aversion implies a higher weight on the crash state, which in turn leads to an increase in the VP.

We contribute to the discussion by correlating our latent measure for $VP_t$ with popular indicators from both fields: economic uncertainty and risk aversion. Strictly speaking, to find evidence in favor of the long-run risk model implications, we should relate VP to the volatility or the jump component of the economic uncertainty, instead of the level of the process. These components are not observable, however. Yet, if $VP_t$ is positively linearly related to economic uncertainty, it also positively covaries with the jump intensity in Drechsler and Yaron (2011).15

15 This is true since Drechsler and Yaron (2011) focus their attention on the case where the jump intensity is affine in economic uncertainty. The loading factor, $l_{1,\sigma}$, is set to be positive (see their Table 5).

Table 5.
Pseudo correlation measure

| $z_t$ | $\widehat{\mathrm{PCorr}}(VP_t,z_t)$ | Confidence interval | T | Start | End |
|---|---|---|---|---|---|
| Different estimates $\widehat{VP}_t$ (see Section 5.1) | | | | | |
| $\widehat{VP}_t$: Martingale for RV | 0.0513*** | [0.0356, 0.1452] | 3622 | 2/3/2000 | 6/30/2014 |
| $\widehat{VP}_t$: Martingale for BV | 0.0496*** | [0.0298, 0.1430] | 3622 | 2/3/2000 | 6/30/2014 |
| $\widehat{VP}_t$: HAR-RV for RV | 0.0343*** | [0.0210, 0.1148] | 3621 | 2/3/2000 | 6/27/2014 |
| $\widehat{VP}_t$: HAR-RV for BV | 0.0325*** | [0.0184, 0.1081] | 3621 | 2/3/2000 | 6/27/2014 |
| $\widehat{VP}_t$: Drechsler & Yaron for RV | 0.0345*** | [0.0234, 0.1160] | 3622 | 2/3/2000 | 6/30/2014 |
| $\widehat{VP}_t$: Drechsler & Yaron for BV | 0.0328*** | [0.0218, 0.1122] | 3622 | 2/3/2000 | 6/30/2014 |
| $\widehat{VP}_t$: ARFIMA for RV | 0.0296*** | [0.0189, 0.1037] | 3621 | 2/3/2000 | 6/27/2014 |
| $\widehat{VP}_t$: ARFIMA for BV | 0.0286*** | [0.0171, 0.1048] | 3621 | 2/3/2000 | 6/27/2014 |
| Different popular measures for economic uncertainty | | | | | |
| EMEUI | 0.0137*** | [0.0048, 0.0498] | 3622 | 2/3/2000 | 6/30/2014 |
| EPU | 0.0145** | [0.0004, 0.0464] | 3622 | 2/3/2000 | 6/30/2014 |
| CVCFNAI | 0.0174 | [−0.0318, 0.0638] | 174 | 2/3/2000 | 6/30/2014 |
| UC | 0.0282* | [−0.0039, 0.0866] | 128 | 2/3/2000 | 8/31/2010 |
| MUS(1) | 0.0149** | [0.0020, 0.0610] | 173 | 2/15/2000 | 6/16/2014 |
| MUS(3) | 0.0146** | [0.0018, 0.0564] | 173 | 2/15/2000 | 6/16/2014 |
| MUS(12) | 0.0133* | [−0.0009, 0.0551] | 173 | 2/15/2000 | 6/16/2014 |
| Different popular measures for risk aversion | | | | | |
| STLFSI | 0.0095** | [0.0035, 0.0407] | 752 | 2/4/2000 | 6/27/2014 |
| GFSI | 0.0166** | [0.0054, 0.0695] | 3622 | 2/3/2000 | 6/30/2014 |
| SSICCONF | −0.0037 | [−0.0401, 0.0281] | 174 | 2/3/2000 | 6/30/2014 |
| RAECB | 0.0197*** | [0.0094, 0.0727] | 3622 | 2/3/2000 | 6/30/2014 |
| CSRAI | 0.0007 | [−0.0175, 0.0174] | 3491 | 2/3/2000 | 6/30/2014 |
| SCGRRAI | 0.0020 | [−0.0179, 0.0222] | 2933 | 11/1/2002 | 6/30/2014 |
| WPRAI | 0.0087*** | [0.0052, 0.0375] | 3502 | 2/3/2000 | 6/30/2014 |
| WPFSI | −0.0017 | [−0.0174, 0.0138] | 3502 | 2/3/2000 | 6/30/2014 |
| RA | −0.0246 | [−0.0672, 0.1869] | 128 | 2/3/2000 | 8/31/2010 |

The table reports the pseudo correlation between $z_t$ and latent $VP_t$ as defined in Equation (27). For $z_t$, we first consider several estimates for the variance premium $\widehat{VP}_t$, which are discussed in Section 5.1. Next, we let $z_t$ denote a number of commonly used indicators for economic uncertainty and risk aversion, all of which are described in Section 6. The latter data is obtained from:

- EMEUI: from https://fred.stlouisfed.org/series/WLEMUINDXD (accessed 4 April 2018)
- EPU: “EPUCNUSD Index” from Bloomberg
- CVCFNAI: GARCH(1,1) prediction on “CFNAI” from https://fred.stlouisfed.org/series/CFNAI (accessed 4 April 2018)
- UC: “uc”-series from http://mariehoerova.net/ (accessed 4 April 2018)
- MUS(i): “Macro Uncertainty Series” ($h=\{1,3,12\}$) from http://www.columbia.edu/~sn2294/pub.html (accessed 4 April 2018)
- STLFSI: from https://fred.stlouisfed.org/series/STLFSI (accessed 4 April 2018)
- GFSI: “GFSI Index” (BofA Merrill Lynch GFSI) from Bloomberg
- SSICCONF: “SSICCONF Index” from Bloomberg
- RAECB: from http://sdw.ecb.europa.eu/quickview.do?SERIES_KEY=280.RDF.D.U2.Z0Z.4F.EC.U2_GRAI.HST (accessed 4 April 2018)
- CSRAI: “RAIIHRVU Index” (CS Risk Appetite HOLT Relative Value USD Index) from Bloomberg
- SCGRRAI: “SCGRRAI Index” from Bloomberg
- WPRAI: “WRAIRISK Index” from Bloomberg
- WPFSI: “WRAISTRS Index” from Bloomberg
- RA: “ra”-series from http://mariehoerova.net/ (accessed 4 April 2018)

For all risk-aversion and economic-uncertainty series, we merge the series with our daily data set by finding our date that is closest to the date stamp in the respective series. MUS(i) are the only series where an exact date stamp is missing; we match it with the observation in our daily data set that is closest to the 15th day of a month. We report 95% confidence intervals in brackets, obtained from 9999 block-bootstrap samples (the length of a block corresponds roughly to half a year for all series). To bootstrap the VIX-series, we first filter the series by $\hat d=0.3961$, create the bootstrap sample, and then apply the inverse filter to the new series. ***, **, * signify that the pseudo correlation is different from zero at a 1%, 5%, and 10% significance level, respectively.
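The block-bootstrap intervals mentioned in the table notes can be illustrated with a minimal sketch. The notes do not state which block scheme or interval type is used, so the moving-block resampling and the percentile interval below are assumptions, and the function name `moving_block_bootstrap_ci` is hypothetical. The fractional pre-filtering of the VIX series is deliberately left out.

```python
import numpy as np

def moving_block_bootstrap_ci(data, stat, block_len, n_boot=999, level=0.95, seed=0):
    """Percentile confidence interval from a moving-block bootstrap.

    data      : array of shape (T,) or (T, k); rows are resampled jointly
    stat      : function mapping a resampled array to a scalar
    block_len : number of consecutive observations per block (must be <= T)
    """
    data = np.asarray(data, dtype=float)
    T = data.shape[0]
    rng = np.random.default_rng(seed)
    n_blocks = int(np.ceil(T / block_len))   # enough blocks to cover T rows
    last_start = T - block_len               # largest admissible block start
    draws = np.empty(n_boot)
    for b in range(n_boot):
        # Draw block start points with replacement and expand them to row indices.
        starts = rng.integers(0, last_start + 1, size=n_blocks)
        idx = (starts[:, None] + np.arange(block_len)[None, :]).ravel()[:T]
        draws[b] = stat(data[idx])
    lo, hi = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi
```

With daily data, a block of roughly half a year as described in the notes would correspond to a `block_len` of about 126 trading days, and `n_boot=9999` would match the number of bootstrap samples reported there.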
Similarly, if $VP_t$ is positively correlated with economic uncertainty, it is also positively related to the vol-of-vol in Bollerslev, Tauchen, and Zhou (2009).16

As $VP_t$ is a latent variable in our study, computing a correlation between the VP and various measures, $z_t$, of either risk aversion or economic uncertainty is not straightforward. However, based on our previously reported econometric results, we can define a pseudo correlation measure, $\widehat{\mathrm{PCorr}}(VP_t,z_t)$. Its properties are summarized in Theorem 5.

Theorem 5 Let $VP_t$, $VIX_t^2$, $r_t^{(e)}$, and $E_t^P(QV_{t,t+\tau})$ be generated by (1)–(4). Assume there exist $K$ instruments, generated by (16). Let

$$z_t=(1-L)^{-\delta}\varsigma_t, \tag{26}$$

where $\varsigma_t$ is i.i.d. with mean zero and variance $\sigma_\varsigma^2$, $\varsigma_t$ is independent of $\upsilon_{k,t}$, and $\delta\in[0,1/2)$. Compute a pseudo correlation measure as

$$\widehat{\mathrm{PCorr}}(VP_t,z_t)=(0\ \ 1)\,T^{1/2}\left(X'Q(Q'Q)^{-1}Q'X\right)^{-1}X'Q(Q'Q)^{-1}Q'z\,(z'z)^{-1/2}, \tag{27}$$

where $z$ is the $(T\times1)$ vector of elements $z_t$. Then

$$\widehat{\mathrm{PCorr}}(VP_t,z_t)\xrightarrow{P}\frac{1}{\sigma_{VP}}\mathrm{Corr}(VP_t,z_t).$$

Under the null hypothesis that $z_t$ is independent of $VP_t$, it further holds that as $T\to\infty$:

$$T^{1/2}\,\widehat{\mathrm{PCorr}}(VP_t,z_t)\xrightarrow{D}N\left(0,\ \frac{1}{\sigma_{VP}^2}+\frac{1}{\sigma_{VP}^4\sum_{k=1}^{K}\rho_k^2/\sigma_{\upsilon_k}^2}\right).$$

A proof of Theorem 5 can be found in Appendix E. The theorem suggests that we can consistently estimate the scaled correlation between latent $VP_t$ and $z_t$.17 The correlation is bounded between −1 and 1; asymptotically our measure is hence bounded within $[-1/\sigma_{VP},1/\sigma_{VP}]$. Since the sign of $\sigma_{VP}$ is always positive, the pseudo correlation has the same sign as the correlation itself in large samples, which helps us in interpreting the estimate. The measure does not depend on the measurement units of $z_t$ in the limit; thus we can compare the pseudo correlation across different economic and financial series. Under the null hypothesis that $VP_t$ and $z_t$ are unrelated, it further holds that the pseudo correlation converges to a normal distribution with zero mean at the standard rate $T^{-1/2}$.
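The estimator in Equation (27) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `pseudo_corr` and its argument layout are hypothetical, both $X$ and $Q$ are assumed to contain a constant as their first column (as the appendix matrices suggest), and $z_t$ is taken to be mean zero as in (26).

```python
import numpy as np

def pseudo_corr(x, q, z):
    """Pseudo correlation in the spirit of Eq. (27).

    x : (T,) observed predictor, e.g. squared VIX
    q : (T,) or (T, K) instruments
    z : (T,) indicator series, assumed mean zero as in Eq. (26)
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    T = x.shape[0]
    q = np.asarray(q, dtype=float).reshape(T, -1)
    X = np.column_stack([np.ones(T), x])   # regressor matrix with constant
    Q = np.column_stack([np.ones(T), q])   # instrument matrix with constant
    # Projection of X on the instrument space: Q (Q'Q)^{-1} Q'X
    PX = Q @ np.linalg.lstsq(Q, X, rcond=None)[0]
    # 2SLS coefficients of z on X: (X'PX)^{-1} X'Pz
    coef = np.linalg.solve(PX.T @ X, PX.T @ z)
    # Select the slope with (0 1) and rescale by T^{1/2} (z'z)^{-1/2}
    return np.sqrt(T) * coef[1] / np.sqrt(z @ z)
```

Because the slope is rescaled by $(z'z)^{-1/2}$, multiplying `z` by a positive constant leaves the value unchanged, which is the unit-invariance property noted above.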
However, the asymptotic variance depends on unknown nuisance parameters, which is why we conduct inference based on bootstrap confidence intervals.

As a point of reference, we first compute $\widehat{\mathrm{PCorr}}(VP_t,z_t)$, where $z_t$ are all the commonly used estimates for $VP_t$ that we discussed in the previous section. Table 5 reports the findings. All pseudo correlations are positive and strongly statistically significant. We find the highest correlation estimate of $5.13\times10^{-2}$ for $\widehat{\mathrm{PCorr}}(VP_t,\widehat{VP}_t)$ if $\widehat{VP}_t$ results from the martingale model for the realized variance, that is, when $E_t^P(QV_{t,t+\tau})$ is merely replaced by $\widetilde{RV}_t$. Given our previous findings and the fact that one of our instruments is closely related to this estimate for $VP_t$, this result is not surprising. In what follows, we refer to this value as the benchmark correlation. The lowest pseudo correlation of $2.86\times10^{-2}$ is found if $E_t^P(QV_{t,t+\tau})$ is replaced by the ARFIMA model for $BV_t$.

We now turn to popular indicator series for economic uncertainty. Bali and Zhou (2016) propose a measure designed to capture the uncertainty in overall economic activity, the conditional variance of the Chicago Fed National Activity Index, CVCFNAI. Positive (negative) values of the index signify that the U.S. economy is growing at a faster (slower) rate relative to its historical trend. As in Bekaert and Hoerova (2016), we compute the conditional variance as a GARCH(1,1) prediction of the index. The pseudo correlation between CVCFNAI and $VP_t$ is $1.74\times10^{-2}$. Although the estimate is positive, we cannot reject the null hypothesis that the correlation is zero.

Whereas CVCFNAI is based on one underlying economic indicator, the macroeconomic uncertainty series MUS(i), where $i\in\{1,3,12\}$ months, of Jurado, Ludvigson, and Ng (2015) merge the information of 132 i-period conditional volatilities of mostly macroeconomic indicators. We find $\widehat{\mathrm{PCorr}}(VP_t,\mathrm{MUS}(i))=1.49\times10^{-2}$, $1.46\times10^{-2}$, and $1.33\times10^{-2}$, respectively.
The correlations are positive, of considerable magnitude relative to the benchmark correlation, and statistically significant. The fact that the correlation is strongest for the 1-month series is to be expected, since our VP is the difference between risk neutral and objective expectations of quadratic variation over the next 30 days.

Bekaert, Hoerova, and Duca (2013) define a further uncertainty measure, UC, by isolating the objective conditional variance component from the VIX. Thus, this measure is specifically related to the uncertainty in financial markets. We find a fairly high positive correlation between this measure of economic uncertainty and $VP_t$, with $\widehat{\mathrm{PCorr}}(VP_t,\mathrm{UC})=2.82\times10^{-2}$. The estimate is statistically different from zero.

Finally, economic uncertainty may also be closely related to the uncertainty about economic policy. The U.S. Economic Policy Uncertainty News-Based Index, EPU, computed by Baker, Bloom, and Davis, is based on small and large national newspaper archives. We find a statistically significant positive pseudo correlation of $1.45\times10^{-2}$. In contrast to EPU, which focuses on the overall economic policy uncertainty, the Equity Market-related Economic Uncertainty Index, EMEUI, computed by the same authors, is based on news pertaining to equity markets. The pseudo correlation between $VP_t$ and EMEUI is $1.37\times10^{-2}$ and is strongly statistically significant.

We conclude that there is considerable evidence that the latent measure $VP_t$ positively covaries with economic uncertainty. All pseudo correlations are positive and of sizable magnitude relative to the benchmark correlation, amounting to 26–55% of the benchmark correlation. All measures of economic uncertainty, with the one exception of CVCFNAI, have a statistically significant correlation with $VP_t$.

Next we look at common indicators for aggregate risk aversion.
From Bekaert, Hoerova, and Duca (2013) we rely on their risk aversion series, RA, which is the difference between VIX and their uncertainty component UC. The pseudo correlation is of substantial magnitude, equal to $-2.46\times10^{-2}$, but it has the wrong sign and it is statistically insignificant. The RA series is monthly and ends in August 2010. Interestingly, if we compute the benchmark correlation for this first part of the sample at the same monthly frequency, we also find a negative pseudo correlation. This shows that our latent $VP_t$ does not simply replicate the information contained in $\widehat{VP}_t$ from the martingale model for $RV_t$.

Froot and O’Connell (2003) compute the State Street Investor Confidence Index, SSICCONF,18 a measure of investors’ risk tolerance or sentiment. If $VP_t$ represents risk aversion, we expect its pseudo correlation with SSICCONF to be negative. The estimate in Table 5 is indeed negative, but its value of $-3.70\times10^{-3}$ is very small relative to the benchmark value and it is not statistically different from zero.

The Credit Suisse Risk Appetite Index, CSRAI, the Standard Chartered Risk Appetite Index, SCGRRAI, and the Westpac US Risk Aversion Index, WPFSI, are all examples of practitioners’ indices for risk aversion computed by aggregating information from financial markets. The pseudo correlations with these three series are $7.11\times10^{-4}$, $1.95\times10^{-3}$, and $-1.70\times10^{-3}$, respectively. Thus, each correlation has the wrong sign, is very small relative to the benchmark, and is statistically insignificant.

The Westpac Risk Aversion Index, WPRAI, is another indicator that takes a global perspective based inter alia on movements in major currency exchange rate markets and bond spreads in emerging economies. Here we find the expected positive significant pseudo correlation of $8.70\times10^{-3}$. Note, however, that the estimate is decidedly small, amounting to only 17% of the benchmark correlation.
The pseudo correlation between the Global Risk Aversion Indicator from the European Central Bank, RAECB, and $VP_t$ is positive, strongly statistically significant, and of reasonable magnitude equal to $1.97\times10^{-2}$.

Finally, financial market stress has been linked to the concept of risk aversion. In periods of stress, such as the recent Financial Crisis, we tend to observe an increased demand for safe securities. This can be interpreted as a sign that investors are less tolerant towards risk. When correlating $VP_t$ with the Global Financial Stress Index from Bank of America Merrill Lynch, GFSI, we find the expected positive estimate that is significant and, with a value of $1.66\times10^{-2}$, of notable magnitude relative to the benchmark. Another indicator for financial stress with a more U.S.-based focus is the St Louis Fed Financial Stress Index, STLFSI. We find a significantly positive, albeit rather small, pseudo correlation of $9.45\times10^{-3}$. Interpreting the latter two pseudo correlations as evidence in favor of the hypothesis that $VP_t$ captures risk aversion is not without controversy, however. While linked to risk aversion, financial stress can also be viewed as an indicator of economic uncertainty, or even Knightian uncertainty, as argued among others by Bekaert and Hoerova (2016).

To summarize, we do not find strong convincing evidence that our latent $VP_t$ captures risk aversion. Most correlation estimates are either very small, statistically insignificant, have the wrong sign, or $z_t$ is not an irrefutable measure of risk aversion. The only exception to this rule is the correlation between $VP_t$ and RAECB.

Our inference based on the pseudo correlation measure in (27) relies on the assumption that $z_t$ is integrated of an order $d<1/2$. Although we find no convincing evidence that the economic-uncertainty indices have an integration order greater than 0.5, the outcomes for some of the risk-aversion series are less clear. In particular, CSRAI, WPFSI, and STLFSI seem to have a $d\ge1/2$.
For robustness, we also compute $\widehat{\mathrm{PCorr}}(VP_t,\Delta^{0.5}z_t)$ and $\widehat{\mathrm{PCorr}}(VP_t,\Delta z_t)$ for these measures.19 The estimates hardly change and their values remain very small relative to the benchmark pseudo correlation. The robustness exercise therefore does not alter our general conclusion that $VP_t$ seems to be related to economic uncertainty, yet not necessarily to risk aversion. Our analysis thus slightly favors the long-run risk type models of Drechsler and Yaron (2011), Bollerslev, Tauchen, and Zhou (2009), and Bollerslev, Tauchen, and Sizova (2012). As a caveat, note that our results are not detailed enough to judge which component of economic uncertainty drives the uncovered positive correlation with VP: vol-of-vol or jump intensity.

7 Concluding Remarks

This article presents a novel reduced-form DGP that accounts for many theoretical and empirical features of the risk–return trade-off literature, such as the persistence in the observed risk measure and the stationary noise-type behavior of excess aggregate market returns. We argue that if the researcher uses an imperfect measure of the true risk measure VP to gauge the risk–return trade-off empirically, it could result in a misspecified, unbalanced, and endogenous predictive regression. We show that OLS estimation in this setting produces an inconsistent estimator for the trade-off parameter. Nevertheless, standard statistical inference based on t-tests remains valid if VIX is the predictor. To avoid the problem of obtaining an inconsistent estimate for the trade-off coefficient, we propose a 2SLS estimation method. If the econometrician has access to a valid and relevant I(0) instrument, 2SLS estimation results in a consistent estimate for the parameter and standard statistical inference on predictability can be carried out.
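The contrast between OLS and 2SLS summarized above can be illustrated with a small simulation. The sketch below is a stylized setup rather than the DGP in (1)–(4): it assumes a zero-mean i.i.d. latent signal, a single instrument, unit innovation variances, and a truncated moving-average approximation to the long-memory noise. All function names and parameter defaults are hypothetical.

```python
import numpy as np

def frac_integrate(eps, d, n_lags):
    """Truncated (1-L)^{-d} filter with weights psi_i = Gamma(i+d)/(Gamma(d)Gamma(i+1))."""
    psi = np.ones(n_lags)
    for i in range(1, n_lags):
        psi[i] = psi[i - 1] * (i - 1 + d) / i   # recursion for the weights
    return np.convolve(eps, psi)[: len(eps)]

def simulate(T=20000, alpha=0.0, beta=1.0, rho=1.0, d=0.4, seed=0):
    rng = np.random.default_rng(seed)
    burn = 1000
    vp = rng.standard_normal(T)                        # latent I(0) risk signal
    y = frac_integrate(rng.standard_normal(T + burn), d, burn)[burn:]  # persistent noise
    x = vp + y                                         # observed predictor = signal + noise
    q = rho * vp + rng.standard_normal(T)              # I(0) instrument
    r = alpha + beta * vp + rng.standard_normal(T)     # next-period excess return
    return x, q, r

def ols_slope(x, r):
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, r, rcond=None)[0][1]

def iv_slope(x, q, r):
    # With one instrument and a constant, 2SLS reduces to the simple IV ratio.
    qc = q - q.mean()
    return (qc @ r) / (qc @ x)
```

In this setup the OLS slope from `ols_slope(x, r)` is attenuated towards zero because the predictor contains the persistent noise component, whereas `iv_slope(x, q, r)` should settle close to the chosen `beta` as `T` grows.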
While we specifically focus on the estimation of the risk–return trade-off parameter, the theoretical developments in this article apply more generally to the prediction literature with persistent imperfect regressors. We believe that we are the first to show that the persistent endogenous predictor problem, where the predictor has long-memory I(d) dynamics, can be readily solved by identifying instruments that only possess short memory, I(0). In particular, whenever the observed predictor may be viewed as the sum of a latent I(0) signal and a latent I(d) noise, we can rely on the proposed 2SLS estimation method. Thus, the 2SLS approach that we suggest is an innovative alternative to filtering, which has been advocated by, for example, Maynard, Smallwood, and Wohar (2013) and Christensen and Nielsen (2007), and one that eliminates the persistence without requiring exact knowledge of the strength of serial dependence. Intuitively, our method works because the multiplication of the persistent regressor with a less persistent instrument destroys the long memory in the series.

Supplementary Data

Supplementary data are available at Journal of Financial Econometrics online.

Appendix A: Useful Lemma

Lemma 1 will prove useful for the derivations of the results in this paper.

Lemma 1 Let $a_t$ and $b_t$ be two independent processes given by $a_t=\varphi(L)\varepsilon_t$ and $b_t=(1-L)^{-d}\eta_t$, where $\varphi(L)=\sum_{i=0}^{\infty}\varphi_iL^i$ with $\sum_{i=0}^{\infty}i|\varphi_i|<\infty$, $\varphi(1)\neq0$, and $(1-L)^{-d}=\sum_{i=0}^{\infty}\gamma_iL^i$ with $\gamma_i=\Gamma(i+d)/(\Gamma(d)\Gamma(i+1))$, $0\le d<\tfrac12$, and $\varepsilon_t\sim\text{i.i.d.}(0,\sigma_\varepsilon^2)$, $\eta_t\sim\text{i.i.d.}(0,\sigma_\eta^2)$. Define $Z_t=a_tb_t$; then,

$$T^{-1/2}\sum_{t=1}^{T}Z_t/\bar\sigma\xrightarrow{D}N(0,1),$$

where $\bar\sigma_T^2:=\mathrm{var}\left[T^{-1/2}\sum_{t=1}^{T}Z_t\right]\to\bar\sigma^2$ as $T\to\infty$.

Proof: Let $a_t$, $b_t$, and $Z_t$ be as above and let $F_t$ be the σ-algebra generated by $\{\varepsilon_t,\eta_t,\varepsilon_{t-1},\eta_{t-1},\dots\}$. Note that, given independence, $Z_t$ is a stationary ergodic process and that $\{Z_t,F_t\}$ is an adapted stochastic sequence with $E[Z_t^2]=E[a_t^2b_t^2]=\sigma_a^2\sigma_b^2<\infty$, where $\sigma_a^2=E[a_t^2]$ and $\sigma_b^2=E[b_t^2]$. The lemma follows from Theorem 5.16 in White (2002), once we show directly that $\sum_{m=1}^{\infty}\left(E\left[E[Z_0|F_{-m}]^2\right]\right)^{1/2}<\infty$.
First note that

$$E[Z_0|F_{-m}]^2=E\left[\left(\sum_{i=0}^{\infty}\varphi_i\varepsilon_{-i}\right)\left(\sum_{i=0}^{\infty}\gamma_i\eta_{-i}\right)\Big|F_{-m}\right]^2=\left(\sum_{i=m}^{\infty}\varphi_i\varepsilon_{-i}\right)^2\left(\sum_{i=m}^{\infty}\gamma_i\eta_{-i}\right)^2.$$

Thus,

$$\begin{aligned}
\sum_{m=1}^{\infty}\left(E\left[E[Z_0|F_{-m}]^2\right]\right)^{1/2}
&=\sum_{m=1}^{\infty}\left(\sigma_\varepsilon^2\sigma_\eta^2\sum_{i=m}^{\infty}\varphi_i^2\sum_{i=m}^{\infty}\gamma_i^2\right)^{1/2}
\le\sum_{m=1}^{\infty}\left(\sigma_\varepsilon^2\sigma_b^2\sum_{i=m}^{\infty}\varphi_i^2\right)^{1/2}\\
&\le\sigma_\varepsilon\sigma_b\sum_{m=1}^{\infty}\left(\sum_{i=m}^{\infty}|\varphi_i|\right)
=\sigma_\varepsilon\sigma_b\left(\sum_{i=0}^{\infty}i|\varphi_i|\right)<\infty.
\end{aligned}$$

Note in particular that Lemma 1 proves that multiplying the long-memory process by an I(0) process reduces the order of convergence to the one of a short-memory process.

Appendix B: Proofs of Theorems 1, 2, and 3

Throughout the appendices, we rely on the same notation as in the main text. Let $(\hat\cdot)$ denote an estimator and introduce the following additional notation.

- $x_t=VIX_t^2$
- $y_t=E_t^P(QV_{t,t+\tau})$
- $e$: a $(T-1)\times1$ vector given by $[e_2,e_3,\dots,e_T]'$
- $\iota_{(i)}$: the $i$th unit vector
- $\hat{\mathbf b}_{OLS}=[\hat a_{OLS},\hat b_{OLS}]'$ and $\mathbf b=[\alpha,\beta]'$
- $n$: a normally distributed random vector or scalar
- $\rho$: a $K\times1$ vector given by $[\rho_1,\rho_2,\dots,\rho_K]'$
- $\Sigma_\upsilon$: a $K\times K$ diagonal matrix, where the diagonal elements are equal to $\sigma_{\upsilon_k}^2$
- $\Gamma_j$: the autocovariance matrix at lag $j$

The proofs for Theorems 1, 2, and 3 follow the same overall structure. To conserve space, we show here how to derive the asymptotic results of the 2SLS method, Theorem 3. The proofs for the OLS alternatives, Theorems 1 and 2, are analogous while involving fewer terms. More precisely, by replacing $Q$ by $X$ below, the proof of Theorem 1 is obtained. Similarly, replacing $Q$ and $X$ by $X^*$, and $e$ by $e^*$ is necessary for the proof of Theorem 2. Complete step-by-step proofs are available in the Online Appendix.

To obtain the 2SLS estimators, along with the associated t-statistics, it is necessary to obtain the limit expression of the sums that define them. These are summarized in Table B.1, along with their respective convergence rates. All of the convergence rates (see the underbraced expressions) can be found in Tsay and Chung (2000) or Hayashi (2000) except for the normalization ratios of $\sum VP_ty_t$ and $\sum\xi_{t+1}y_t$, which follow from Lemma 1.

Table B.1.
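Expressions for sums in Theorem 3 with $j\neq k$; $k=1,\dots,K$

$$\begin{aligned}
\sum x_t &= \underbrace{\sum VP_t}_{O_p(T^{1/2})}+\underbrace{\sum y_t}_{O_p(T^{d+1/2})}\\
\sum x_t^2 &= \underbrace{\sum VP_t^2}_{O_p(T)}+\underbrace{\sum y_t^2}_{O_p(T)}+\underbrace{2\sum VP_ty_t}_{O_p(T^{1/2})}\\
\sum e_{t+1} &= -\beta\sum y_t+\underbrace{\sum\xi_{t+1}}_{O_p(T^{1/2})}\\
\sum e_{t+1}^2 &= \beta^2\sum y_t^2+\underbrace{\sum\xi_{t+1}^2}_{O_p(T)}-\underbrace{2\beta\sum\xi_{t+1}y_t}_{O_p(T^{1/2})}\\
\sum x_te_{t+1} &= -\beta\sum y_t^2-\beta\sum VP_ty_t+\sum\xi_{t+1}y_t+\underbrace{\sum\xi_{t+1}VP_t}_{O_p(T^{1/2})}\\
\sum q_{k,t} &= \rho_k\sum VP_t+\underbrace{\sum\upsilon_{k,t}}_{O_p(T^{1/2})}\\
\sum q_{k,t}^2 &= \rho_k^2\sum VP_t^2+\underbrace{\sum\upsilon_{k,t}^2}_{O_p(T)}+\underbrace{2\rho_k\sum VP_t\upsilon_{k,t}}_{O_p(T^{1/2})}\\
\sum q_{k,t}q_{j,t} &= \rho_k\rho_j\sum VP_t^2+\rho_k\sum VP_t\upsilon_{j,t}+\rho_j\sum VP_t\upsilon_{k,t}+\underbrace{\sum\upsilon_{k,t}\upsilon_{j,t}}_{O_p(T^{1/2})}\\
\sum e_{t+1}q_{k,t} &= -\beta\rho_k\sum VP_ty_t+\rho_k\sum\xi_{t+1}VP_t-\underbrace{\beta\sum y_t\upsilon_{k,t}}_{O_p(T^{1/2})}+\underbrace{\sum\xi_{t+1}\upsilon_{k,t}}_{O_p(T^{1/2})}\\
\sum x_tq_{k,t} &= \rho_k\sum VP_t^2+\sum VP_t\upsilon_{k,t}+\rho_k\sum VP_ty_t+\sum y_t\upsilon_{k,t}
\end{aligned}$$

We start by showing the convergence results for some linear combinations. First we consider the asymptotic properties of $\left(\frac1TQ'Q\right)^{-1}$.

$$\begin{aligned}
\mathrm{plim}\left(\frac1TQ'Q\right)^{-1}&=\left(\mathrm{plim}\frac1TQ'Q\right)^{-1}
=\left(\mathrm{plim}\frac1T\begin{bmatrix}
T-1&\sum q_{1,t}&\sum q_{2,t}&\dots&\sum q_{K,t}\\
\sum q_{1,t}&\sum q_{1,t}^2&\sum q_{1,t}q_{2,t}&\dots&\sum q_{1,t}q_{K,t}\\
\sum q_{2,t}&\sum q_{2,t}q_{1,t}&\sum q_{2,t}^2&\dots&\sum q_{2,t}q_{K,t}\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
\sum q_{K,t}&\sum q_{K,t}q_{1,t}&\sum q_{K,t}q_{2,t}&\dots&\sum q_{K,t}^2
\end{bmatrix}\right)^{-1}\\
&=\begin{bmatrix}1&\mathbf 0'\\ \mathbf 0&\rho\rho'\sigma_{VP}^2+\Sigma_\upsilon\end{bmatrix}^{-1}
=\begin{bmatrix}1&\mathbf 0'\\ \mathbf 0&\Sigma_\upsilon^{-1}-\dfrac{\sigma_{VP}^2\Sigma_\upsilon^{-1}\rho\rho'\Sigma_\upsilon^{-1}}{1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}\end{bmatrix},
\end{aligned}\tag{B1}$$

where the last step follows by the Sherman–Morrison formula (see, e.g., Hager, 1989). Next, we focus on the dynamics of $\frac1TQ'X$. Note that

$$\mathrm{plim}\frac1TQ'X=\mathrm{plim}\frac1T\begin{bmatrix}T-1&\sum x_t\\ \sum q_{1,t}&\sum q_{1,t}x_t\\ \sum q_{2,t}&\sum q_{2,t}x_t\\ \vdots&\vdots\\ \sum q_{K,t}&\sum q_{K,t}x_t\end{bmatrix}=\begin{bmatrix}1&0\\ \mathbf 0&\sigma_{VP}^2\rho\end{bmatrix}.\tag{B2}$$

Finally, we show the asymptotic behavior of $\frac1TQ'e$.

$$\frac1TQ'e=\frac1T\begin{bmatrix}\sum e_{t+1}\\ \sum q_{1,t}e_{t+1}\\ \sum q_{2,t}e_{t+1}\\ \vdots\\ \sum q_{K,t}e_{t+1}\end{bmatrix}=\begin{bmatrix}-\beta\frac1T\sum y_t+\frac1T\sum\xi_{t+1}\\ -\beta\frac1T\sum q_{1,t}y_t+\frac1T\sum q_{1,t}\xi_{t+1}\\ -\beta\frac1T\sum q_{2,t}y_t+\frac1T\sum q_{2,t}\xi_{t+1}\\ \vdots\\ -\beta\frac1T\sum q_{K,t}y_t+\frac1T\sum q_{K,t}\xi_{t+1}\end{bmatrix}.\tag{B3}$$

If $\beta\neq0$, it follows that $\mathrm{plim}\frac1TQ'e=\mathbf 0$, since all terms are of order $O_p(T^{d-1/2})$ or lower. Hence, the terms converge to their corresponding population moments, where $E(e_{t+1})=0$ and $E(q_{k,t}e_{t+1})=0$ for valid instruments.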
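Conversely, if $\beta=0$, we find that

$$\frac1TQ'e=T^{-1/2}\begin{bmatrix}T^{-1/2}\sum\xi_{t+1}\\ T^{-1/2}\sum q_{1,t}\xi_{t+1}\\ T^{-1/2}\sum q_{2,t}\xi_{t+1}\\ \vdots\\ T^{-1/2}\sum q_{K,t}\xi_{t+1}\end{bmatrix}=T^{-1/2}\begin{bmatrix}T^{-1/2}\sum\xi_{t+1}\\ T^{-1/2}\sum\rho_1VP_t\xi_{t+1}+T^{-1/2}\sum\upsilon_{1,t}\xi_{t+1}\\ T^{-1/2}\sum\rho_2VP_t\xi_{t+1}+T^{-1/2}\sum\upsilon_{2,t}\xi_{t+1}\\ \vdots\\ T^{-1/2}\sum\rho_KVP_t\xi_{t+1}+T^{-1/2}\sum\upsilon_{K,t}\xi_{t+1}\end{bmatrix}.\tag{B4}$$

The expression (B4) involves the random variables $\xi_{t+1}$ and $\xi_{t+1}(\rho_kVP_t+\upsilon_{k,t})$, both of which are strictly stationary and ergodic and fulfill the conditions outlined in Lemma 1. The term thus has a zero mean and a constant variance. Hence, by the CLT in Lemma 1 the term converges in distribution to

$$N\left(\mathbf 0,\ \sum_{j=-\infty}^{\infty}\Gamma_j=\sigma_\xi^2\begin{bmatrix}1&\mathbf 0'\\ \mathbf 0&\rho\rho'\sigma_{VP}^2+\Sigma_\upsilon\end{bmatrix}\right)\tag{B5}$$

at rate $T^{1/2}$.

B.1. Asymptotic Properties of the 2SLS Estimator

Note that

$$\hat{\mathbf b}_{2SLS}-\mathbf b=\left(\frac1TX'Q\left(\frac1TQ'Q\right)^{-1}\frac1TQ'X\right)^{-1}\frac1TX'Q\left(\frac1TQ'Q\right)^{-1}\frac1TQ'e.$$

Using the results above we find that if $\beta\neq0$

$$\begin{aligned}
\hat{\mathbf b}_{2SLS}-\mathbf b\xrightarrow{P}&\left(\begin{bmatrix}1&\mathbf 0'\\ 0&\rho'\sigma_{VP}^2\end{bmatrix}\begin{bmatrix}1&\mathbf 0'\\ \mathbf 0&\Sigma_\upsilon^{-1}-\dfrac{\sigma_{VP}^2\Sigma_\upsilon^{-1}\rho\rho'\Sigma_\upsilon^{-1}}{1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}\end{bmatrix}\begin{bmatrix}1&0\\ \mathbf 0&\rho\sigma_{VP}^2\end{bmatrix}\right)^{-1}\\
&\times\begin{bmatrix}1&\mathbf 0'\\ 0&\rho'\sigma_{VP}^2\end{bmatrix}\begin{bmatrix}1&\mathbf 0'\\ \mathbf 0&\Sigma_\upsilon^{-1}-\dfrac{\sigma_{VP}^2\Sigma_\upsilon^{-1}\rho\rho'\Sigma_\upsilon^{-1}}{1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}\end{bmatrix}\begin{bmatrix}0\\ \mathbf 0\end{bmatrix}\\
=&\begin{bmatrix}1&0\\ 0&\dfrac{1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}{\sigma_{VP}^4\rho'\Sigma_\upsilon^{-1}\rho}\end{bmatrix}\begin{bmatrix}0\\ 0\end{bmatrix}=\begin{bmatrix}0\\ 0\end{bmatrix},
\end{aligned}\tag{B6}$$

and therefore $\hat a_{2SLS}\xrightarrow{P}\alpha$ and $\hat b_{2SLS}\xrightarrow{P}\beta$. Conversely, if $\beta=0$ we find

$$\begin{aligned}
T^{1/2}(\hat{\mathbf b}_{2SLS}-\mathbf b)=&\left(\frac1TX'Q\left(\frac1TQ'Q\right)^{-1}\frac1TQ'X\right)^{-1}\frac1TX'Q\left(\frac1TQ'Q\right)^{-1}\frac{1}{T^{1/2}}Q'e\\
\xrightarrow{D}&\left(\begin{bmatrix}1&\mathbf 0'\\ 0&\rho'\sigma_{VP}^2\end{bmatrix}\begin{bmatrix}1&\mathbf 0'\\ \mathbf 0&\Sigma_\upsilon^{-1}-\dfrac{\sigma_{VP}^2\Sigma_\upsilon^{-1}\rho\rho'\Sigma_\upsilon^{-1}}{1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}\end{bmatrix}\begin{bmatrix}1&0\\ \mathbf 0&\rho\sigma_{VP}^2\end{bmatrix}\right)^{-1}\\
&\times\begin{bmatrix}1&\mathbf 0'\\ 0&\rho'\sigma_{VP}^2\end{bmatrix}\begin{bmatrix}1&\mathbf 0'\\ \mathbf 0&\Sigma_\upsilon^{-1}-\dfrac{\sigma_{VP}^2\Sigma_\upsilon^{-1}\rho\rho'\Sigma_\upsilon^{-1}}{1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}\end{bmatrix}n\\
=&\begin{bmatrix}1&0\\ 0&\dfrac{1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}{\sigma_{VP}^4\rho'\Sigma_\upsilon^{-1}\rho}\end{bmatrix}\begin{bmatrix}1&\mathbf 0'\\ 0&\dfrac{\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}}{1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}\end{bmatrix}n
=\begin{bmatrix}1&\mathbf 0'\\ 0&\dfrac{\rho'\Sigma_\upsilon^{-1}}{\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}\end{bmatrix}n\\
\sim&\ N\left(\mathbf 0,\ \sigma_\xi^2\begin{bmatrix}1&0\\ 0&\dfrac{1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}{\sigma_{VP}^4\rho'\Sigma_\upsilon^{-1}\rho}\end{bmatrix}\right).
\end{aligned}\tag{B7}$$

B.2. Asymptotic Properties of the Estimator of the Error Variance

Note that

$$s^2=\frac1T\hat e'\hat e=\frac1Te'e-\frac1Te'X(\hat{\mathbf b}_{2SLS}-\mathbf b)-(\hat{\mathbf b}_{2SLS}-\mathbf b)'\frac1TX'e+(\hat{\mathbf b}_{2SLS}-\mathbf b)'\frac1TX'X(\hat{\mathbf b}_{2SLS}-\mathbf b).$$

Using the results above and in Appendix B we find that if $\beta\neq0$

$$s^2\xrightarrow{P}(\beta^2\sigma_P^2+\sigma_\xi^2)-\left([0,\ -\beta\sigma_P^2]\begin{bmatrix}0\\ 0\end{bmatrix}\right)-\left([0,\ 0]\begin{bmatrix}0\\ -\beta\sigma_P^2\end{bmatrix}\right)+\left([0,\ 0]\begin{bmatrix}1&0\\ 0&\sigma_{VP}^2+\sigma_P^2\end{bmatrix}\begin{bmatrix}0\\ 0\end{bmatrix}\right)=\beta^2\sigma_P^2+\sigma_\xi^2.\tag{B8}$$

Conversely, if $\beta=0$ we find that $s^2\xrightarrow{P}\sigma_\xi^2$.

B.3. Asymptotic Properties of the t-statistics

Note that the t-statistic can be written as

$$t_{(i)}=\iota_{(i)}'\hat{\mathbf b}_{2SLS}\left(s^2\frac1T\iota_{(i)}'\left(\frac1TX'Q\left(\frac1TQ'Q\right)^{-1}\frac1TQ'X\right)^{-1}\iota_{(i)}\right)^{-1/2}$$

for a test of either hypothesis $H_0:\alpha=0$ or $H_0:\beta=0$.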
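Using the results above we find that if $\beta\neq 0$

$$
T^{-1/2}t_a \to_P \alpha(\sigma_\xi^2+\beta^2\sigma_P^2)^{-1/2}\left(\begin{bmatrix} 1, & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & \dfrac{1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}{\sigma_{VP}^4\rho'\Sigma_\upsilon^{-1}\rho} \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right)^{-1/2}
= \frac{\alpha}{(\sigma_\xi^2+\beta^2\sigma_P^2)^{1/2}}
\tag{B9}
$$

$$
\begin{aligned}
T^{-1/2}t_b &\to_P \beta(\sigma_\xi^2+\beta^2\sigma_P^2)^{-1/2}\left(\begin{bmatrix} 0, & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & \dfrac{1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}{\sigma_{VP}^4\rho'\Sigma_\upsilon^{-1}\rho} \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right)^{-1/2} \\
&= \beta\left(\frac{\sigma_{VP}^4\rho'\Sigma_\upsilon^{-1}\rho}{(\sigma_\xi^2+\beta^2\sigma_P^2)(1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho)}\right)^{1/2}.
\end{aligned}
\tag{B10}
$$

Conversely, if $\beta = 0$ we find that $T^{-1/2}t_a\to_P\alpha/(\sigma_\xi^2)^{1/2}$, which readily follows from (B9) above. For $t_b$ we find that

$$
\begin{aligned}
t_b &= T^{1/2}\iota_{(2)}'(\hat b_{2SLS}-b)\left(s^2\,\iota_{(2)}'\left(\tfrac{1}{T}X'Q\left(\tfrac{1}{T}Q'Q\right)^{-1}\tfrac{1}{T}Q'X\right)^{-1}\iota_{(2)}\right)^{-1/2}
\to_D n\left(\frac{\sigma_{VP}^4\rho'\Sigma_\upsilon^{-1}\rho}{\sigma_\xi^2(1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho)}\right)^{1/2} \\
&\sim N\left(0,\ \frac{\sigma_{VP}^4\rho'\Sigma_\upsilon^{-1}\rho}{\sigma_\xi^2(1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho)}\,\sigma_\xi^2\,\frac{1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}{\sigma_{VP}^4\rho'\Sigma_\upsilon^{-1}\rho}\right) = N(0,1),
\end{aligned}
\tag{B11}
$$

where $n$ here denotes the second element of the limiting vector in (B7).

Appendix C: Proof of Theorem 4

For convenience, introduce the following additional notation for Appendices C, D, and E:

$$
\begin{aligned}
q &= \text{a } K\times 1 \text{ vector given by } \left[\sum q_{1,t},\ \sum q_{2,t},\ \dots,\ \sum q_{K,t}\right]' \\
B &= \text{a } K\times K \text{ matrix given by } \begin{bmatrix}
\sum q_{1,t}^2 & \sum q_{1,t}q_{2,t} & \dots & \sum q_{1,t}q_{K,t} \\
\sum q_{2,t}q_{1,t} & \sum q_{2,t}^2 & \dots & \sum q_{2,t}q_{K,t} \\
\vdots & \vdots & \ddots & \vdots \\
\sum q_{K,t}q_{1,t} & \sum q_{K,t}q_{2,t} & \dots & \sum q_{K,t}^2
\end{bmatrix} \\
S &= B - qq'/(T-1) \\
\overrightarrow{\sum qx} &= \text{a } K\times 1 \text{ vector given by } \left[\sum q_{1,t}x_t,\ \sum q_{2,t}x_t,\ \dots,\ \sum q_{K,t}x_t\right]' \\
\overrightarrow{\sum qe} &= \text{a } K\times 1 \text{ vector given by } \left[\sum q_{1,t}e_{t+1},\ \sum q_{2,t}e_{t+1},\ \dots,\ \sum q_{K,t}e_{t+1}\right]' \\
\gamma_j(a\times b) &= \text{the autocovariance of the series } a_t\times b_t \text{ at lag } j.
\end{aligned}
$$

Recall that the test statistic in Theorem 4 is given by $J = T(1-\hat v'\hat v/\hat e'\hat e)$, where $\hat e = y - X\hat b_{2SLS}$ and $\hat v = \hat e - Q\hat\varpi$. Note that we can rewrite the J-statistic as follows:

$$
J = T\,\frac{\hat e'\hat e - \hat v'\hat v}{\hat e'\hat e} = \frac{\hat e'Q(Q'Q)^{-1}Q'\hat e}{\hat e'\hat e/T}
= \frac{e'QL}{\sqrt{s^2}}\left[I - L'Q'X(X'QLL'Q'X)^{-1}X'QL\right]\frac{L'Q'e}{\sqrt{s^2}},
\tag{E1}
$$

where $L$ is a $(K+1)\times(K+1)$ matrix such that $LL' = (Q'Q)^{-1}$. Since it holds that

$$
(Q'Q)^{-1} = \begin{bmatrix} T-1 & q' \\ q & B \end{bmatrix}^{-1}
= \begin{bmatrix} \dfrac{1}{T-1} + \dfrac{1}{(T-1)^2}q'S^{-1}q & -\dfrac{1}{T-1}q'S^{-1} \\ -\dfrac{1}{T-1}S^{-1}q & S^{-1} \end{bmatrix},
$$

we can write $L$ as

$$
L = \begin{bmatrix} \dfrac{1}{(T-1)^{1/2}} & -\dfrac{1}{T-1}q'S^{-1/2} \\ 0 & S^{-1/2} \end{bmatrix}.
$$

Hence, $J$ in (E1) is a quadratic form in a $(K+1)\times 1$ vector, weighted by a $(K+1)\times(K+1)$ symmetric and idempotent matrix. We note that

$$
L'Q'X = \begin{bmatrix} \dfrac{1}{(T-1)^{1/2}} & 0' \\ -\dfrac{1}{T-1}S^{-1/2}q & S^{-1/2} \end{bmatrix}
\begin{bmatrix} T-1 & \sum x_t \\ q & \overrightarrow{\sum qx} \end{bmatrix}
= \begin{bmatrix} (T-1)^{1/2} & \dfrac{\sum x_t}{(T-1)^{1/2}} \\ 0 & S^{-1/2}\left(-\dfrac{\sum x_t}{T-1}q + \overrightarrow{\sum qx}\right) \end{bmatrix}.
\tag{E2}
$$

Using (E2), we can rewrite the idempotent and symmetric matrix in the definition of $J$ in (E1) as follows:

$$
I - L'Q'X(X'QLL'Q'X)^{-1}X'QL = \begin{bmatrix} 0 & 0' \\ 0 & I - \dfrac{S^{-1/2}\left(-\frac{\sum x_t}{T-1}q + \overrightarrow{\sum qx}\right)\left(-\frac{\sum x_t}{T-1}q' + \overrightarrow{\sum qx}'\right)S^{-1/2}}{\left(-\frac{\sum x_t}{T-1}q' + \overrightarrow{\sum qx}'\right)S^{-1}\left(-\frac{\sum x_t}{T-1}q + \overrightarrow{\sum qx}\right)} \end{bmatrix}.
\tag{E3}
$$

Next we need to find the probability limit of the matrix $I - L'Q'X(X'QLL'Q'X)^{-1}X'QL$.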
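To that end, note that it holds that

$$
\operatorname{plim}\frac{1}{T}S = \operatorname{plim}\frac{1}{T}B - \operatorname{plim}\frac{1}{T(T-1)}qq' = \sigma_{VP}^2\rho\rho' + \Sigma_\upsilon.
\tag{E4}
$$

Thus, we find the following probability limit:

$$
\begin{aligned}
I - L'Q'X(X'QLL'Q'X)^{-1}X'QL
&\to_P I - \operatorname{plim}\frac{1}{(T-1)^{1/2}}L'Q'X\ \operatorname{plim}\left(\frac{1}{T-1}X'QLL'Q'X\right)^{-1}\operatorname{plim}\frac{1}{(T-1)^{1/2}}X'QL \\
&\to_P I - \begin{bmatrix} 1 & 0 \\ 0 & (\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)^{-1/2}\rho\sigma_{VP}^2 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & \dfrac{1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}{\sigma_{VP}^4\rho'\Sigma_\upsilon^{-1}\rho} \end{bmatrix}
\begin{bmatrix} 1 & 0' \\ 0 & \sigma_{VP}^2\rho'(\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)^{-1/2} \end{bmatrix} \\
&\to_P \begin{bmatrix} 0 & 0' \\ 0 & I - (\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)^{-1/2}\rho\,\dfrac{1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}{\rho'\Sigma_\upsilon^{-1}\rho}\,\rho'(\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)^{-1/2} \end{bmatrix},
\end{aligned}
\tag{E5}
$$

where we have used result (B6) above. Note that the lower-right submatrix

$$
M \equiv I - (\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)^{-1/2}\rho\,\frac{1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}{\rho'\Sigma_\upsilon^{-1}\rho}\,\rho'(\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)^{-1/2}
\tag{E6}
$$

in (E5) is a symmetric and idempotent matrix of size $K\times K$. It therefore holds that

$$
\begin{aligned}
\operatorname{rank}(M) = \operatorname{tr}(M) &= \operatorname{tr}(I) - \operatorname{tr}\left((\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)^{-1/2}\rho\,\frac{1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}{\rho'\Sigma_\upsilon^{-1}\rho}\,\rho'(\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)^{-1/2}\right) \\
&= K - \operatorname{tr}\left(\frac{1+\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}{\rho'\Sigma_\upsilon^{-1}\rho}\,\rho'(\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)^{-1}\rho\right) = K-1.
\end{aligned}
\tag{E7}
$$

The second part of the proof shows that $(L'Q'e)/\sqrt{s^2}$ converges to a normal distribution. Note that

$$
L'Q'e = \begin{bmatrix} \dfrac{1}{(T-1)^{1/2}} & 0' \\ -\dfrac{1}{T-1}S^{-1/2}q & S^{-1/2} \end{bmatrix}
\begin{bmatrix} \sum e_{t+1} \\ \overrightarrow{\sum qe} \end{bmatrix}
= \begin{bmatrix} \dfrac{1}{(T-1)^{1/2}}\sum e_{t+1} \\ -\left(\frac{1}{T}S\right)^{-1/2}\dfrac{1}{T^{1/2}}q\,\dfrac{1}{T-1}\sum e_{t+1} + \left(\frac{1}{T}S\right)^{-1/2}\dfrac{1}{T^{1/2}}\overrightarrow{\sum qe} \end{bmatrix}.
\tag{E8}
$$

Only the lower-right $K\times K$ submatrix of $I - L'Q'X(X'QLL'Q'X)^{-1}X'QL$ is nonzero, as Equation (E3) demonstrates. Thus the first scalar element of the vector in (E8), $\sum e_{t+1}/(T-1)^{1/2}$, cancels out from $J$ in (E1). We focus on the bottom $K$ elements of the vector in (E8). The first part of the sum converges to zero; that is,

$$
-\left(\tfrac{1}{T}S\right)^{-1/2}\frac{1}{T^{1/2}}q\,\frac{1}{T-1}\sum e_{t+1} \to_P 0.
$$

The second part has an asymptotic normal distribution by the CLT in Lemma 1. That is,

$$
\frac{1}{T^{1/2}}\overrightarrow{\sum qe} \to_D N\left(0,\ \rho\rho'\beta^2\sum_{j=-\infty}^{\infty}\gamma_j(VP\times y) + \rho\rho'\sigma_{VP}^2\sigma_\xi^2 + \beta^2\sigma_P^2\Sigma_\upsilon + \sigma_\xi^2\Sigma_\upsilon\right),
\tag{E9}
$$

and hence it follows that

$$
\begin{aligned}
\left(\tfrac{1}{T}S\right)^{-1/2}\frac{1}{T^{1/2}}\overrightarrow{\sum qe} &\to_D (\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)^{-1/2}n \\
&\sim N\left(0,\ (\sigma_\xi^2+\beta^2\sigma_P^2)I + \beta^2(\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)^{-1/2}\rho\rho'\,2\sum_{j=1}^{\infty}\gamma_j(VP\times y)\,(\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)^{-1/2}\right).
\end{aligned}
\tag{E10}
$$

Finally, from Section B it follows that $\sqrt{s^2}\to_P(\beta^2\sigma_P^2+\sigma_\xi^2)^{1/2}$ for any value of $\beta$. Hence, the bottom $K$ elements of the vector $(L'Q'e)/\sqrt{s^2}$ converge to the normal distribution

$$
\to_D N\left(0,\ I + \frac{1}{\sigma_\xi^2+\beta^2\sigma_P^2}\,\beta^2(\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)^{-1/2}\rho\rho'\,2\sum_{j=1}^{\infty}\gamma_j(VP\times y)\,(\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)^{-1/2}\right).
\tag{E11}
$$

$J$ then is asymptotically distributed as follows:

$$
J \to_D n'Mn,
\tag{E12}
$$

where $n$ has a $K$-variate normal distribution with asymptotic variance matrix

$$
A \equiv I + \frac{1}{\sigma_\xi^2+\beta^2\sigma_P^2}\,\beta^2(\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)^{-1/2}\rho\rho'\,2\sum_{j=1}^{\infty}\gamma_j(VP\times y)\,(\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)^{-1/2}.
\tag{E13}
$$

Let $m$ be a $K$-dimensional standard normal random vector. Then $n = (A^{1/2})'m$. It follows that

$$
J \to_D m'A^{1/2}MA^{1/2\prime}m.
\tag{E14}
$$

Recall that $M$ has rank $K-1$ as shown above. The matrix $A^{1/2}MA^{1/2\prime}$ is symmetric and positive semidefinite, and also has rank $K-1$. Following the arguments in Jagannathan and Wang (1996) we know that $A^{1/2}MA^{1/2\prime}$ has $K-1$ positive eigenvalues, $\lambda_1,\lambda_2,\dots,\lambda_{K-1}$. There exists a diagonal $(K\times K)$ matrix $\Lambda = \operatorname{diag}(\lambda_1,\lambda_2,\dots,\lambda_{K-1},0)$, and an orthogonal matrix $\mathbf{J}$ (distinct from the test statistic $J$), such that we can write

$$
A^{1/2}MA^{1/2\prime} = \mathbf{J}'\Lambda\mathbf{J}.
\tag{E15}
$$

Finally, let $o = \mathbf{J}m$. Then $o$ is standard normally distributed and hence it follows that

$$
J \to_D o'\Lambda o = \sum_{j=1}^{K-1}\lambda_j\chi_j^2(1),
\tag{E16}
$$

where $\chi_j^2(1)$ are $K-1$ independent $\chi^2(1)$ distributed random variables. The asymptotic distribution of $J$ is nonstandard, but we can simulate p-values as suggested by Jagannathan and Wang (1996) once the eigenvalues $\lambda_j$ are estimated. To that end, we require a consistent estimator of $A^{1/2}MA^{1/2\prime}$. Above, we show that the lower-right $(K\times K)$ submatrix of $I - L'Q'X(X'QLL'Q'X)^{-1}X'QL$ is consistent for $M$. An estimate for $A^{1/2}$ is the upper triangular matrix following from a Cholesky decomposition of $\frac{1}{s^2}\left(\frac{1}{T}S\right)^{-1/2}\hat\Omega\left(\frac{1}{T}S\right)^{-1/2\prime}$, where $\hat\Omega$ is a consistent estimator of size $(K\times K)$ for the asymptotic variance in (E9). We can compute $\hat\Omega$ as the lower-right $(K\times K)$ submatrix of the usual HAC estimator, that is, of the $(K+1)\times(K+1)$ estimator

$$
\hat{\tilde\Omega} = \sum_{j=-T}^{T}\kappa\!\left(\frac{j}{n(T)}\right)\hat\Gamma_j,
$$

where $\kappa$ is the kernel and $n(T)$ is the bandwidth. We define the autocovariance estimates $\hat\Gamma_j$ as

$$
\hat\Gamma_j = \frac{1}{T}\sum_t\begin{bmatrix} 1 \\ q_{1,t} \\ \vdots \\ q_{K,t} \end{bmatrix}
\begin{bmatrix} 1 & q_{1,t-j} & \dots & q_{K,t-j} \end{bmatrix}\hat e_{t+1}\hat e_{t+1-j}.
\tag{E17}
$$

Following the derivations above, it is then straightforward to show that the long-run variance in (E9) can be consistently estimated from the kernel estimator $\hat{\tilde\Omega}$, assuming that $E(q_{k,t}q_{k,t-j}x_tx_{t-j})$ exists.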
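The limit in (E16) is a weighted sum of independent $\chi^2(1)$ variables, so a p-value can be approximated by Monte Carlo once eigenvalue estimates are available. The sketch below illustrates that idea under stated assumptions: the function name `simulated_p_value`, the number of draws, and the example eigenvalues are hypothetical, and the estimation of the $\lambda_j$ themselves is not shown.

```python
import numpy as np

def simulated_p_value(j_stat, eigenvalues, n_draws=100_000, seed=0):
    """Monte Carlo p-value for J against sum_j lambda_j * chi2_j(1)."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(eigenvalues, dtype=float)
    # Each row holds K-1 independent chi-square(1) draws.
    chi2_draws = rng.chisquare(df=1, size=(n_draws, lam.size))
    weighted_sums = chi2_draws @ lam
    # Share of simulated limits at least as large as the observed statistic.
    return float(np.mean(weighted_sums >= j_stat))

# Hypothetical inputs: an observed J-statistic and K-1 = 2 estimated eigenvalues.
p_weighted = simulated_p_value(j_stat=4.2, eigenvalues=[1.3, 0.8])

# Sanity check: with unit weights the limit is chi-square(2), whose
# 5% critical value is 2*ln(20), roughly 5.991.
p_unit = simulated_p_value(j_stat=2.0 * np.log(20.0), eigenvalues=[1.0, 1.0])
```

With unit eigenvalues the simulation reproduces a standard $\chi^2(K-1)$ test, which is a convenient way to validate the routine before plugging in estimated weights.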
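Appendix D: Allowing $\xi_{t+1}$ to be serially correlated

In this section we relax one of the assumptions of the DGP, and let $\xi_t = \psi(L)\mu_t$, where $\mu_t$ is i.i.d. with mean zero and constant variance. Let the coefficients of the moving average filter, $\psi_i$, be one-summable. It is clear that this modification does not affect the representation of $\operatorname{plim}\left(\frac{1}{T}Q'Q\right)^{-1}$ and $\operatorname{plim}\frac{1}{T}Q'X$ in (B1) and (B2). If $\beta\neq 0$, it further continues to hold that $\frac{1}{T}Q'e\to_P 0$, and hence $\hat b_{2SLS}-b\to_P 0$. Yet, if $\beta = 0$, we find that

$$
\frac{1}{T^{1/2}}Q'e \to_D N\left(0,\ \begin{bmatrix} \sum_{j=-\infty}^{\infty}\gamma_j(\xi) & 0' \\ 0 & \rho\rho'\sum_{j=-\infty}^{\infty}\gamma_j(\xi\times VP) + \sigma_\xi^2\Sigma_\upsilon \end{bmatrix}\right),
\tag{F1}
$$

which implies that

$$
T^{1/2}(\hat b_{2SLS}-b) \to_D N\left(0,\ \begin{bmatrix} \sum_{j=-\infty}^{\infty}\gamma_j(\xi) & 0' \\ 0 & \dfrac{\rho'\Sigma_\upsilon^{-1}\rho\sum_{j=-\infty}^{\infty}\gamma_j(\xi\times VP) + \sigma_\xi^2}{\sigma_{VP}^4\rho'\Sigma_\upsilon^{-1}\rho} \end{bmatrix}\right).
\tag{F2}
$$

A consistent estimator for the asymptotic variance in (F2) is given by

$$
\hat H \equiv \left(\tfrac{1}{T}X'Q\left(\tfrac{1}{T}Q'Q\right)^{-1}\tfrac{1}{T}Q'X\right)^{-1}\tfrac{1}{T}X'Q\left(\tfrac{1}{T}Q'Q\right)^{-1}\hat{\tilde\Omega}\left(\tfrac{1}{T}Q'Q\right)^{-1}\tfrac{1}{T}Q'X\left(\tfrac{1}{T}X'Q\left(\tfrac{1}{T}Q'Q\right)^{-1}\tfrac{1}{T}Q'X\right)^{-1},
\tag{F3}
$$

where $\hat{\tilde\Omega}$ is the consistent HAC estimator in Appendix C. Replace the t-statistic in Appendix B.3 for the slope by the robust t-statistic

$$
t_b = T^{1/2}\iota_{(2)}'\hat b_{2SLS}\left(\iota_{(2)}'\hat H\iota_{(2)}\right)^{-1/2}.
$$

Then, under the null hypothesis that $\beta = 0$, this robust statistic converges to a standard normal distribution. Note that it further continues to hold that $s^2\to_P\beta^2\sigma_P^2+\sigma_\xi^2$. The serial correlation in $\xi_t$ affects only the asymptotic variance of $\frac{1}{T^{1/2}}\overrightarrow{\sum qe}$ in the derivation of the large-sample behavior of the J-statistic in Appendix C. In particular, (E9) becomes

$$
\frac{1}{T^{1/2}}\overrightarrow{\sum qe} \to_D N\left(0,\ \rho\rho'\beta^2\sum_{j=-\infty}^{\infty}\gamma_j(VP\times y) + \rho\rho'\sum_{j=-\infty}^{\infty}\gamma_j(VP\times\xi) + \beta^2\sigma_P^2\Sigma_\upsilon + \sigma_\xi^2\Sigma_\upsilon\right),
\tag{F4}
$$

for which $\hat\Omega$ remains a consistent estimator. The asymptotic distribution of $J$ continues to be the sum of $K-1$ weighted $\chi^2(1)$ variables.

Appendix E: Proof of Theorem 5

For convenience, introduce the following additional notation for Appendix E:

$$
\overrightarrow{\sum qz} = \text{a } K\times 1 \text{ vector given by } \left[\sum q_{1,t}z_t,\ \sum q_{2,t}z_t,\ \dots,\ \sum q_{K,t}z_t\right]'.
$$

We assume that the series $z_t$ can be represented as

$$
z_t = (1-L)^{-\delta}\varsigma_t,
\tag{H1}
$$

where $\delta\in[0,1/2)$ and $\varsigma_t$ is i.i.d. and independent of the innovations $\upsilon_{k,t}$. By Lemma 1 it follows that $z_t$ is stationary and ergodic.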
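The representation in (H1) can be made concrete with the moving-average weights of the fractional filter, $\psi_0 = 1$ and $\psi_k = \psi_{k-1}(k-1+\delta)/k$. The following is a minimal sketch, assuming Gaussian innovations and a filter truncated at the start of the sample; both choices, and the function names, are illustrative assumptions rather than part of the article.

```python
import numpy as np

def frac_integration_weights(delta, n):
    """First n MA coefficients of (1 - L)^(-delta)."""
    psi = np.empty(n)
    psi[0] = 1.0
    for k in range(1, n):
        psi[k] = psi[k - 1] * (k - 1 + delta) / k
    return psi

def simulate_fractional_noise(delta, n, rng):
    """Simulate z_t = (1 - L)^(-delta) * innovation_t, truncated at t = 0."""
    innovations = rng.normal(size=n)   # assumed Gaussian, for illustration only
    psi = frac_integration_weights(delta, n)
    # z_t = sum_{k=0}^{t} psi_k * innovation_{t-k}
    return np.convolve(innovations, psi)[:n]

rng = np.random.default_rng(7)
z = simulate_fractional_noise(delta=0.3, n=1_000, rng=rng)
z_demeaned = z - z.mean()              # work with the demeaned series
```

For $\delta = 0$ the weights collapse to $(1, 0, 0, \dots)$ and the series reduces to the innovations themselves, while larger $\delta$ below $1/2$ produces slowly decaying autocorrelations.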
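Note that

$$
(X'Q(Q'Q)^{-1}Q'X)^{-1}X'Q(Q'Q)^{-1} = \frac{\left(\frac{\sum x_t}{T}q' - \overrightarrow{\sum qx}'\right)S^{-1}}{T\left(\frac{\sum x_t}{T}q - \overrightarrow{\sum qx}\right)'S^{-1}\left(\frac{\sum x_t}{T}q - \overrightarrow{\sum qx}\right)}
\times\begin{pmatrix} -\overrightarrow{\sum qx} & \sum x_t \\ q & -T \end{pmatrix},
\tag{H2}
$$

and

$$
Q'z = \begin{pmatrix} \sum z_t & \overrightarrow{\sum qz} \end{pmatrix}'.
\tag{H3}
$$

It follows that

$$
\begin{aligned}
\begin{pmatrix} 0 & 1 \end{pmatrix}(X'Q(Q'Q)^{-1}Q'X)^{-1}X'Q(Q'Q)^{-1}Q'z
&= \begin{pmatrix} 0 & 1 \end{pmatrix}\frac{\left(\frac{\sum x_t}{T}q' - \overrightarrow{\sum qx}'\right)S^{-1}}{T\left(\frac{\sum x_t}{T}q - \overrightarrow{\sum qx}\right)'S^{-1}\left(\frac{\sum x_t}{T}q - \overrightarrow{\sum qx}\right)}
\begin{pmatrix} -\overrightarrow{\sum qx}\sum z_t + \sum x_t\overrightarrow{\sum qz} \\ q\sum z_t - T\overrightarrow{\sum qz} \end{pmatrix} \\
&= \frac{T\left(\frac{\sum x_t}{T}q' - \overrightarrow{\sum qx}'\right)S^{-1}\left(\frac{\sum z_t}{T}q - \overrightarrow{\sum qz}\right)}{T\left(\frac{\sum x_t}{T}q - \overrightarrow{\sum qx}\right)'S^{-1}\left(\frac{\sum x_t}{T}q - \overrightarrow{\sum qx}\right)}.
\end{aligned}
\tag{H4}
$$

From our previous derivations, we can then conclude the following:

$$
\begin{aligned}
\widehat{\mathrm{Corr}}(VP_t,z_t) &= \begin{pmatrix} 0 & 1 \end{pmatrix}T^{1/2}(X'Q(Q'Q)^{-1}Q'X)^{-1}X'Q(Q'Q)^{-1}Q'z\,(z'z)^{-1/2} \\
&\to_P \frac{-\sigma_{VP}^2\rho'(\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)^{-1}\left(\operatorname{plim}\left(\frac{\sum z_t}{T}\right)\times 0 - \operatorname{plim}\left(\frac{1}{T}\overrightarrow{\sum qz}\right)\right)}{\sigma_{VP}^2\rho'(\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)^{-1}\sigma_{VP}^2\rho}\operatorname{plim}\left(\tfrac{1}{T}z'z\right)^{-1/2} \\
&= \frac{\rho'\Sigma_\upsilon^{-1}\left(-\operatorname{plim}\left(\frac{\sum z_t}{T}\right)\times 0 + \operatorname{plim}\left(\frac{1}{T}\overrightarrow{\sum qz}\right)\right)}{\sigma_{VP}^2\rho'\Sigma_\upsilon^{-1}\rho}\operatorname{plim}\left(\tfrac{1}{T}\sum z_t^2\right)^{-1/2}.
\end{aligned}
\tag{H5}
$$

As long as $z_t$ is such that $[z_t, VP_t]'$ is a jointly stationary ergodic process, it holds by the Ergodic Theorem (see, e.g., Hayashi, 2000, pp. 101–102) that $(1/T)\sum z_t\to_P E(z_t)$, which is zero by assumption. In addition, $(1/T)\sum z_t^2\to_P E(z_t^2)=\sigma_z^2$ and $(1/T)\sum VP_tz_t\to_P E(VP_tz_t)$, provided that the expectations exist. Finally, $(1/T)\sum\upsilon_{k,t}z_t\to_P E(\upsilon_{k,t}z_t)$, which equals zero since $\upsilon_{k,t}$ and $\varsigma_t$ are independent by assumption. The latter two convergence results are necessary to describe the asymptotic behavior of the vector $\overrightarrow{\sum qz}$ in (H5), which has elements equal to $\rho_k\sum VP_tz_t + \sum\upsilon_{k,t}z_t$. Hence,

$$
\widehat{\mathrm{Corr}}(VP_t,z_t) \to_P \frac{E(VP_tz_t)}{\sigma_{VP}^2}\frac{1}{\sigma_z} = \frac{1}{\sigma_{VP}}\mathrm{Corr}(VP_t,z_t).
\tag{H6}
$$

Now impose the null hypothesis that $VP_t$ and $z_t$ are independent. Using (H4), we can rewrite the correlation estimator as

$$
\begin{aligned}
T^{1/2}\,\widehat{\mathrm{Corr}}(VP_t,z_t) &= \frac{\left(\frac{\sum x_t}{T}\frac{1}{T}q' - \frac{1}{T}\overrightarrow{\sum qx}'\right)\left(\frac{1}{T}S\right)^{-1}\left(\frac{\sum z_t}{T}\frac{1}{T^{1/2}}q - \frac{1}{T^{1/2}}\overrightarrow{\sum qz}\right)}{\left(\frac{\sum x_t}{T}\frac{1}{T}q - \frac{1}{T}\overrightarrow{\sum qx}\right)'\left(\frac{1}{T}S\right)^{-1}\left(\frac{\sum x_t}{T}\frac{1}{T}q - \frac{1}{T}\overrightarrow{\sum qx}\right)}\left(\tfrac{1}{T}z'z\right)^{-1/2} \\
&\to_D 0 + \frac{\rho'\Sigma_\upsilon^{-1}}{\sigma_{VP}^2\sigma_z\rho'\Sigma_\upsilon^{-1}\rho}\,n,
\end{aligned}
\tag{H7}
$$

where $n$ is a random vector that is normally distributed with mean zero and variance $\sigma_z^2(\sigma_{VP}^2\rho\rho'+\Sigma_\upsilon)$, to which $T^{-1/2}\overrightarrow{\sum qz}$ converges asymptotically. This follows by Lemma 1. Thus, under $H_0$ it holds that

$$
T^{1/2}\,\widehat{\mathrm{Corr}}(VP_t,z_t) \to_D N\left(0,\ \frac{1}{\sigma_{VP}^2} + \frac{1}{\sigma_{VP}^4\rho'\Sigma_\upsilon^{-1}\rho}\right).
\tag{H8}
$$

Footnotes

* The authors appreciate helpful comments from an anonymous referee and Federico Bandi, editor of the journal.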
We thank the organizers and attendees of the World Congress of the Econometric Society in Montréal, the Financial Econometrics and Risk Management meeting at Western University, the Barcelona GSE Summer Forum (High Frequency Financial Econometrics), the Long-Memory Symposium at CREATES in Aarhus, the CREATES 10-Year Anniversary Conference at Sandbjerg Manor, the Triple Crown Conference at Fordham University, the Royal Economic Society Annual Conference at University of Sussex, and the Nordic Econometric Meeting at University of Helsinki for giving us the opportunity to present our work. The authors thank the participants of the Applied Economics Seminar at the CUNY Graduate Center (New York), the Finance & Economics Seminar at Rutgers Business School, and the Econometric Institute Research Meeting at Erasmus University Rotterdam and Tinbergen Institute for their valuable feedback. This work was supported by CREATES (Center for Research in Econometric Analysis of Time Series), funded by the Danish National Research Foundation [DNRF78 to D.O. and J.E.V.-V.].

1 See, for example, Shim and Siegel (2008) for a textbook reference.

2 A far from complete list of advocates of the positive risk–return trade-off includes, for instance, Pastor, Sinha, and Swaminathan (2008), Lundblad (2007), and Ludvigson and Ng (2007). In contrast, the trade-off is found to be negative in, for example, Brandt and Kang (2004) and Whitelaw (1994, 2000), among many others.

3 See, for example, Bali and Zhou (2016), Eraker and Wang (2015), Bollerslev et al. (2014), Bekaert and Hoerova (2014), Kelly and Jiang (2014), Bollerslev et al. (2013), Drechsler and Yaron (2011), and Bollerslev, Tauchen, and Zhou (2009).

4 To emphasize its importance, Bekaert and Hoerova (2014) dedicate an entire research article to the issue, analyzing an abundance of “state-of-the-art” dynamic variance models.

5 We assume that $E(VIX_t^2)=E(E_t^P(QV_{t,t+\tau}))=E(VP_t)=0$. In practice, we think about these processes as the demeaned true series. Under the maintained assumptions, the mean can be consistently estimated by the sample average.

6 The series $RV_{RL,t}$ and $BV_{RL,t}$, as well as daily prices on the S&P 500, $P_t^{(\text{open})}$ and $P_t^{(\text{close})}$, are obtained from the Oxford-Man Institute’s “Realised Library.”

7 The consistency and asymptotic normality of the EW estimator rely on the knowledge of the true mean of the DGP. As this value is not known in practical applications, we modify the EW to account for this uncertainty, relying on the two-step feasible EW estimator of Shimotsu (2010).

8 Results did not change when relying on estimators that are robust to additive perturbation in the DGP. These robustness results are available from the authors upon request.

9 Note that the econometrician’s model gives rise to the well-known leverage effect. If $\beta>0$, $\mathrm{Cov}(e_{t-j},VIX_t^2)$, $j\geq 1$, is negative; that is, stock variances tend to increase in reaction to perceived “bad news” (see Nelson, 1991), and vice versa. The leverage effect only slowly tapers off; in our case at hyperbolic rate $j^{2d-1}$, thus replicating the persistent dynamic effect found by, for example, Bollerslev, Tauchen, and Sizova (2012), and Corsi and Reno (2012).

10 Our DGP assumes that $VP_t$, $E_t^P(QV_{t,t+\tau})$, and hence also $VIX_t^2$ have a mean of zero. Henceforth, we therefore consider all variables, except the excess return series, in deviation from their sample averages.

11 We rely on the backward-shifted series $\widetilde{RV}_t$ and $\widetilde{BV}_t$ to avoid problems that could arise in the successive estimation due to a look-ahead bias.

12 Note that $VRP_t$ is different from the true variance premium $VP_t$ in (1)–(4), unless $d = 1$ and $RV_t=QV_{t,t+\tau}$.

13 The Matlab code for estimation of (19) has been provided by Nielsen and Morin (2012).
14 2007/02/27 is the start of the official crisis timeline of the Federal Reserve Bank of St Louis FED (https://www.stlouisfed.org/financial-crisis/full-timeline; accessed 4 April 2018) corresponding to the Freddie Mac Press Release. 2009/03/02 corresponds to the U.S. Treasury’s and Federal Reserve Board’s announcement to participate in the AIG restructuring plan.

16 To see this, let the proxy $z_t$ be the demeaned economic uncertainty series in Bollerslev, Tauchen, and Zhou (2009), $\sigma_{g,t+1}^2-E(\sigma_{g,t+1}^2)$. It can be shown that $\mathrm{Cov}(VP_t,z_t)\leq\mathrm{Cov}(VP_t,q_t)$, where $q_t$ is the conditional volatility of economic uncertainty (vol-of-vol). Hence, if we find that the left-hand side term is $>0$, then so is the right-hand side term. The only necessary assumption is that $VP_t$ is independent of the Gaussian independent white noise process, $z_{\sigma,t}$, in Bollerslev, Tauchen, and Zhou (2009).

17 Theorem 5 implies that $E(z_t)=0$. In practice, we subtract the time-series average from all measures $z_t$.

18 For an overview of some of these indices, see also Illing and Aaron (2005).

19 For the sake of conciseness, we omit the robustness results here, but they are available from the authors upon request.

References

Andersen T. G., Bollerslev T., Diebold F. X., Ebens H. 2001. The Distribution of Realized Stock Return Volatility. Journal of Financial Economics 61(1): 43–76.

Andersen T. G., Bollerslev T., Diebold F. X. 2007. Roughing It Up: Including Jump Components in the Measurement, Modeling, and Forecasting of Return Volatility. Review of Economics and Statistics 89(4): 701–720.

Bali T. G., Peng L. 2006. Is there a Risk-Return Trade-Off? Evidence from High-Frequency Data. Journal of Applied Econometrics 21(8): 1169–1198.

Bali T. G., Zhou H. 2016. Risk, Uncertainty, and Expected Returns. Journal of Financial and Quantitative Analysis 51(3): 707–735.

Bandi F. M., Perron B. 2006. Long Memory and the Relation between Implied and Realized Volatility. Journal of Financial Econometrics 4(4): 636–670.

Banerjee A., Dolado J., Galbraith J. W., Hendry D. F. 1993. Co-Integration, Error-Correction, and the Econometric Analysis of Non-Stationary Data. New York: Oxford University Press.

Barndorff-Nielsen O. E., Shephard N. 2002. Estimating Quadratic Variation Using Realised Variance. Journal of Applied Econometrics 17: 457–477.

Barndorff-Nielsen O. E., Shephard N. 2004. Power and Bipower Variation with Stochastic Volatility and Jumps. Journal of Financial Econometrics 2(1): 1–37.

Bekaert G., Engstrom E. 2017. Asset Return Dynamics under Habits and Bad Environment-Good Environment Fundamentals. Journal of Political Economy 125(3): 713–760.

Bekaert G., Hoerova M. 2014. The VIX, the Variance Premium and Stock Market Volatility. Journal of Econometrics 183(2): 181–192.

Bekaert G., Hoerova M. 2016. What Do Asset Prices Have to Say About Risk Appetite and Uncertainty? Journal of Banking and Finance 67: 103–118.

Bekaert G., Hoerova M., Duca M. Lo. 2013. Risk, Uncertainty and Monetary Policy. Journal of Monetary Economics 60(7): 771–788.

Beran J. 1994. Statistics for Long-Memory Processes. USA: Chapman and Hall.

Binsbergen J. H., Koijen R. S. J. 2010. Predictive Regressions: A Present-Value Approach. Journal of Finance 65(4): 1439–1471.

Bloom N. 2009. The Impact of Uncertainty Shocks. Econometrica 77(3): 623–685.

Bollerslev T., Marrone J., Xu L., Zhou H. 2014. Stock Return Predictability and Variance Risk Premia: Statistical Inference and International Evidence. Journal of Financial and Quantitative Analysis 49(3): 633–661.

Bollerslev T., Osterrieder D., Sizova N., Tauchen G. 2013. Risk and Return: Long-Run Relations, Fractional Cointegration, and Return Predictability. Journal of Financial Economics 108: 409–424.

Bollerslev T., Tauchen G., Sizova N. 2012. Volatility in Equilibrium: Asymmetries and Dynamic Dependencies. Review of Finance 16(1): 31–80.

Bollerslev T., Tauchen G., Zhou H. 2009. Expected Stock Returns and Variance Risk Premia. Review of Financial Studies 22(11): 4463–4492.

Bollerslev T., Todorov V. 2011. Tails, Fears, and Risk Premia. Journal of Finance 66(6): 2165–2211.

Brandt M. W., Kang Q. 2004. On the Relationship between the Conditional Mean and Volatility of Stock Returns: A Latent VAR Approach. Journal of Financial Economics 72(2): 217–257.

Campbell J. Y., Lo A. W., MacKinlay A. C. 1997. The Econometrics of Financial Markets. Princeton, NJ: Princeton University Press.

Carlini F., Santucci de Magistris P. 2013. “On the Identification of Fractionally Cointegrated VAR Models with the F(d) Condition.” CREATES Research Paper 2013-44.

Christensen B. J., Nielsen M. O. 2006. Asymptotic Normality of Narrow-Band Least Squares in the Stationary Fractional Cointegration Model and Volatility Forecasting. Journal of Econometrics 133: 343–371.

Christensen B. J., Nielsen M. O. 2007. The Effect of Long Memory in Volatility on Stock Market Fluctuations. Review of Economics and Statistics 89(4): 684–700.

Corsi F. 2009. A Simple Approximate Long-Memory Model of Realized Volatility. Journal of Financial Econometrics 7(2): 174–196.

Corsi F., Reno R. 2012. Discrete-Time Volatility Forecasting With Persistent Leverage Effect and the Link with Continuous-Time Volatility Modeling. Journal of Business and Economic Statistics 30(3): 368–380.

Drechsler I., Yaron A. 2011. What’s Vol Got to Do with It. Review of Financial Studies 24(1): 1–45.

Eraker B., Wang J. 2015. A Non-linear Dynamic Model of the Variance Risk Premium. Journal of Econometrics 187(2): 547–556.

Froot K. A., O’Connell P. G. J. 2003. “The Risk Tolerance of International Investors.” NBER Working Paper Series 10157.

Hager W. W. 1989. Updating the Inverse of a Matrix. SIAM Review 31(2): 221–239.

Hayashi F. 2000. Econometrics. Princeton: Princeton University Press.

Huang X., Tauchen G. 2005. The Relative Contribution of Jumps to Total Price Variance. Journal of Financial Econometrics 3: 456–499.

Illing M., Aaron M. 2005. A Brief Survey of Risk-Appetite Indexes. Bank of Canada Financial System Review, 37–43.

Jagannathan R., Wang Z. 1996. The Conditional CAPM and the Cross-Section of Expected Returns. Journal of Finance 51(1): 3–53.

Johansen S. 2008. A Representation Theory for a Class of Vector Autoregressive Models for Fractional Processes. Econometric Theory 24: 651–676.

Johansen S. 2009. Representation of Cointegrated Autoregressive Processes with Application to Fractional Processes. Econometric Reviews 28: 121–145.

Johansen S., Nielsen M. O. 2012. Likelihood Inference for a Fractionally Cointegrated Vector Autoregressive Model. Econometrica 80(6): 2667–2732.

Jurado K., Ludvigson S., Ng S. 2015. Measuring Uncertainty. American Economic Review 105(3): 1177–1216.

Kelly B., Jiang H. 2014. Tail Risk and Asset Prices. Review of Financial Studies 27(10): 2841–2871.

Kwiatkowski D., Phillips P. C. B., Schmidt P., Shin Y. 1992. Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root. Journal of Econometrics 54(1–3): 159–178.

Lettau M., Ludvigson S. 2001. Consumption, Aggregate Wealth, and Expected Stock Returns. Journal of Finance 56(3): 815–849.

Ludvigson S., Ng S. 2007. The Empirical Risk-Return Relation: A Factor Analysis Approach. Journal of Financial Economics 83(1): 171–222.

Lundblad C. 2007. The Risk Return Tradeoff in the Long Run: 1836-2003. Journal of Financial Economics 85(1): 123–150.

Maynard A., Phillips P. C. B. 2001. Rethinking an Old Empirical Puzzle: Econometric Evidence on the Forward Discount Anomaly. Journal of Applied Econometrics 16: 671–708.

Maynard A., Smallwood A., Wohar M. E. 2013. Long Memory Regressors and Predictive Testing: A Two-Stage Rebalancing Approach. Econometric Reviews 32(3): 318–360.

Meddahi N. 2002. A Theoretical Comparison between Integrated and Realized Volatility. Journal of Applied Econometrics 17: 479–508.

Nelson D. B. 1991. Conditional Heteroskedasticity in Asset Returns: A New Approach. Econometrica 59: 347–370.

Nielsen M. O., Morin L. 2012. “FCVARmodel.m: A Matlab Software Package for Estimation and Testing in the Fractionally Cointegrated VAR Model.” QED Working Paper 1273.

Nielsen M. O., Shimotsu K. 2007. Determining the Cointegrating Rank in Nonstationary Fractional Systems by the Exact Local Whittle Approach. Journal of Econometrics 141(2): 574–596.

Pastor L., Sinha M., Swaminathan B. 2008. Estimating the Intertemporal Risk Return Trade-Off Using the Implied Cost of Capital. Journal of Finance 63(6): 2859–2897.

Pastor L., Stambaugh R. F. 2009. Predictive Systems: Living with Imperfect Predictors. Journal of Finance 64(4): 1583–1628.

Sargan D. 1958. The Estimation of Economic Relationships Using Instrumental Variables. Econometrica 26: 393–415.

Shim J. K., Siegel J. G. 2008. Financial Management, 3rd edn, Barron’s Business Library, chapter 7, Understanding Return and Risk, pp. 133–134, New York.

Shimotsu K. 2010. Exact Local Whittle Estimation of Fractional Integration with Unknown Mean and Time Trend. Econometric Theory 26(2): 501–540.

Shimotsu K., Phillips P. C. B. 2005. Exact Local Whittle Estimation of Fractional Integration. Annals of Statistics 33: 1890–1933.

Tsay W. J., Chung C. F. 2000. The Spurious Regression of Fractionally Integrated Processes. Journal of Econometrics 96(1): 155–182.

Welch I., Goyal A. 2008. A Comprehensive Look at The Empirical Performance of Equity Premium Prediction. Review of Financial Studies 21(4): 1455–1508.

Whaley R. E. 2000. The Investor Fear Gauge. Journal of Portfolio Management 26(3): 12–17.

White H. 2002. Asymptotic Theory for Econometricians. Orlando: Academic Press.

Whitelaw R. 2000. Stock Market Risk and Return: An Equilibrium Approach. Review of Financial Studies 13(3): 521–547.

Whitelaw R. F. 1994. Time Variations and Covariations in the Expectation and Volatility of Stock Market Returns. Journal of Finance 49(2): 515–541.

© The Author(s), 2018. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Journal of Financial Econometrics – Oxford University Press

**Published: ** Apr 16, 2018
