Risk Premia and Volatilities in a Nonlinear Term Structure Model

Abstract

We introduce a reduced-form term structure model with closed-form solutions for yields where the short rate and market prices of risk are nonlinear functions of Gaussian state variables. The nonlinear model with three factors matches the time-variation in expected excess returns and yield volatilities of US Treasury bonds from 1961 to 2014. Yields and their variances depend on only three factors, yet the model exhibits features consistent with Unspanned Risk Premia (URP) and Unspanned Stochastic Volatility (USV).

1. Introduction

The US Treasury bond market is a large and important financial market. Policy makers, investors, and researchers need models to disentangle market expectations from risk premiums, and estimate expected returns and Sharpe ratios, both across maturity and over time. The most prominent class of models is the affine class. However, there are a number of empirical facts documented in the literature that these models struggle to match simultaneously: (a) excess returns are time-varying, (b) a part of expected excess returns is unspanned by the yield curve, (c) yield variances are time-varying, and (d) a part of yield variances is unspanned by the yield curve.1 Affine models have been shown to match each of these four findings separately, but not simultaneously and only by increasing the number of factors beyond the standard level, slope, and curvature factors.2

We introduce an arbitrage-free dynamic term structure model where the short rate and market prices of risk are nonlinear functions of Gaussian state variables. We provide closed-form solutions for bond prices and, since the factors are Gaussian, our nonlinear model is as tractable as a standard Gaussian model. We show that the model can capture all four findings mentioned above simultaneously and it does so with only three factors driving yields and their variances. 
The value of having few factors is illustrated by Duffee (2010) who estimates a five-factor Gaussian model to capture time variation in expected returns and finds huge Sharpe ratios due to overfitting. We use a monthly panel of five zero-coupon Treasury bond yields and their realized variances from 1961 to 2014 to estimate the nonlinear model with three factors. To compare the implications of the nonlinear model with those from the standard class of affine models, we also estimate three-factor affine models with no or one stochastic volatility factor, the essentially affine A0(3) and A1(3) models. We first assess the ability of the nonlinear model to predict excess bond returns in sample and regress realized excess returns on model-implied expected excess return. The average R2 across bond maturities and holding horizons is 27% for the nonlinear model, 6.5% for the A1(3) model, 8% for the A0(3) model, and no more than 15% for any affine model in which expected excess returns are linear functions of yields. Campbell and Shiller (1991) document a positive relation between the slope of the yield curve and expected excess returns, a finding that affine models with stochastic volatility have difficulty matching (see Dai and Singleton, 2002). In simulations, we show that the nonlinear model can capture this positive relation. There is empirical evidence that a part of expected excess bond returns is not spanned by linear combinations of yields, a phenomenon we refer to as Unspanned Risk Premia (URP).3 URP arises in our model due to a nonlinear relation between expected excess returns and yields. To quantitatively explore this explanation, we regress expected excess returns implied by the nonlinear model on its principal components (PCs) of yields and find that the first three PCs explain 67–72% of the variation in expected excess returns. 
Furthermore, the regression residuals correlate with expected inflation in the data (measured through surveys), not because inflation has any explanatory power in the model but because it happens to correlate with “the amount of nonlinearity.” Duffee (2011b); Wright (2011); and Joslin, Priebsch, and Singleton (2014) use five-factor Gaussian models where one or two factors that are orthogonal to the yield curve explain expected excess returns and are related to expected inflation. We capture the same phenomenon with a nonlinear model that retains a parsimonious three-factor structure to price bonds and yet allows for time variation in volatilities. The nonlinear and A1(3) model can capture the persistent time variation in volatilities and the high volatility during the monetary experiment in the early 80s. However, the two models have different implications for the cross-sectional and predictive distribution of yield volatility. In the nonlinear model more than one factor drives the cross-sectional variation in yield volatilities while by construction the A1(3) model only has one. Moreover, in the nonlinear model, the probability of a high volatility scenario increases with the monetary experiment and remains high during the Greenspan era even though volatilities came down significantly. This finding resembles the appearance and persistence of the equity option smile since the crash of 1987. In contrast, the distribution of future volatility in the A1(3) model is similar before and after the monetary experiment. The volatility in the Gaussian A0(3) model is constant and thus this model overestimates volatility during the Greenspan era and underestimates it during the monetary experiment. There is a large literature suggesting that interest rate volatility risk cannot be hedged by a portfolio consisting solely of bonds; a phenomenon referred to by Collin-Dufresne and Goldstein (2002) as Unspanned Stochastic Volatility (USV). 
The empirical evidence supporting USV typically comes from a low $R^2$ when regressing a measure of volatility on interest rates.4 To test the ability of the nonlinear model to capture the empirical evidence on USV, we use the methodology of Andersen and Benzoni (2010) and regress the model-implied variance of yields on the PCs of model-implied yields. The first three PCs explain 42–44%, which is only slightly higher than in the data where they explain 30–35% of the variation in realized yield variance. If we include the fourth and fifth PC, these numbers increase to 55–62% and 40–43%, respectively. Hence, our nonlinear model quantitatively captures the $R^2$s in USV regressions in the data. In contrast, since there is a linear relation between yield variance and yields in standard affine models, the first three PCs already explain 100% in the A1(3) model.5 The standard procedure in the reduced-form term structure literature is to specify the short rate and the market prices of risk as functions of the state variables. Instead, we model the functional form of the stochastic discount factor (SDF) directly by multiplying the SDF from a Gaussian term structure model with the term $1+\gamma e^{-\beta' X}$, where β and γ are parameters and X is the Gaussian state vector. This functional form is a special case of the SDF that arises in many equilibrium models in the literature. In such models, the SDF can be decomposed into a weighted average of different representative agent models. Importantly, the weights on the different models are time-varying and this is a source of time-varying risk premia and volatility of bond yields. Our paper is not the first to propose a nonlinear term structure model. Dai, Singleton, and Yang (2007) estimate a regime-switching model and show that excluding the monetary experiment in the estimation leads their model to pick up minor variations in volatility. 
In contrast, the nonlinear model can pick up states that did not occur in the sample used to estimate the model. Specifically, we estimate the model using a sample that excludes the monetary experiment and find that it still implies a significant probability of a strong increase in volatility. Furthermore, while the Gaussian model is a special case of both models our nonlinear model only increases the number of parameters from 23 to 27 whereas the regime-switching model in Dai, Singleton, and Yang (2007) has fifty-six parameters. Quadratic term structure models have been proposed by Ahn, Dittmar, and Gallant (2002) and Leippold and Wu (2003) among others, but Ahn, Dittmar, and Gallant (2002) find that quadratic term structure models are not able to generate the level of conditional volatility observed for short- and intermediate-term bond yields. Ahn et al. (2003) propose a class of nonlinear term structure models based on the inverted square-root model of Ahn and Gao (1999), but in contrast to our nonlinear model they do not provide closed-form solutions for bond prices. Dai, Le, and Singleton (2010) develop a class of discrete time models that are affine under the risk neutral measure, but show nonlinear dynamics under the historical measure. They illustrate that the model encompasses many equilibrium models with recursive preferences and habit formation. Carr, Gabaix, and Wu (2009) use the linearity generating framework of Gabaix (2009) to price swaps and interest rate derivatives. Similarly, in concurrent work Filipovic, Larsson, and Trolle (2015) introduce a linear-rational framework to price bonds and interest rate derivatives. Both approaches lead to closed-form solutions of discount bonds, but their pricing framework is based on the potential approach of Rogers (1997) while our approach is based on a large class of equilibrium models discussed in Appendix B.6 The rest of the paper is organized as follows. Section 2 motivates and describes the model. 
Section 3 estimates the model and Section 4 presents the empirical results. In Section 5, we estimate a one-factor version of the nonlinear model and describe how nonlinearity works in this simple case, while Section 6 concludes. 2. A Nonlinear Term Structure Model In this section, we present a nonlinear model of the term structure of interest rates. We first motivate the model by presenting regression evidence for nonlinearities in excess returns and yield variances in Section 2.1 and then we present the model in Section 2.2. 2.1 Motivating Regression Evidence In Panel A of Table I, we regress yearly excess returns measured on a monthly basis for the period 1961–2014 on the first three PCs of yields and product combinations of the PCs. Specifically, the dependent variable is the average 1-year excess return computed over US Treasury bonds with a maturity of 2, 3, 4, and 5 years (we explain the details of the data in Section 3.1). As independent variables, we first include all terms that are a product of up to three terms of the first three PCs (in short PC1, PC2, and PC3). We then exclude terms with the lowest t-statistics one-by-one until only significant terms remain. The first row of Panel A shows the result. There are only three significant terms in the regression and they are all nonlinear. The second row shows the regression when we include only the first three PCs, the linear relation implied by affine models, and we see that the R2 of 16% is substantially lower than the R2 of 29% in the first regression. Finally, the third row shows that the linear terms add almost no explanatory power to the first regression. Table I . 
Nonlinearities in expected excess returns and realized variances This table shows coefficients, standard errors (in brackets), and R2s from regressions of realized 1-year log excess bond returns (Panel A) and realized yield variances (Panel B), averaged over bond maturities two to five in Panel A and one to five in Panel B, on three different sets of yield PCs and powers thereof. The independent variables in the first row of both panels are obtained by first considering all product combinations of the first three PCs up to and including order three and excluding every variable with the lowest t-statistic until only significant variables remain. The monthly excess returns, realized variances, and PCs are calculated using daily zero-coupon bond yield data from 1961:07 to 2014:04. The bond maturities are ranging from 1 to 5 years and the data are obtained from Gurkaynak, Sack, and Wright (2007). The number of observations is 622 for the predictive regressions in Panel A and 634 for the contemporaneous regressions in Panel B. All variables are standardized and standard errors are computed using the Hansen and Hodrick (1980) correction with twelve lags in Panel A and the Newey and West (1987) correction with twelve lags in Panel B. ** and * indicate statistical significance at the 1% and 5% levels, respectively. 
Panel A: 1-Year average excess bond returns

| Row | PC1 | PC2 | PC3 | PC1·PC2 | PC1^3 | PC2^3 | R2 |
|---|---|---|---|---|---|---|---|
| 1 | | | | −0.37** (0.09) | 0.40** (0.11) | 0.33** (0.08) | 0.29 |
| 2 | 0.07 (0.13) | 0.39** (0.12) | −0.05 (0.11) | | | | 0.16 |
| 3 | −0.14 (0.17) | 0.10 (0.14) | −0.04 (0.10) | −0.33** (0.10) | 0.49** (0.16) | 0.26* (0.11) | 0.30 |

Panel B: Realized average yield variance

| Row | PC1 | PC2 | PC3 | PC1^2 | PC1·PC3 | PC2·PC3 | PC1^3 | PC1·PC2·PC3 | R2 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | | | | 0.12** (0.04) | −0.12* (0.05) | −0.18** (0.06) | 0.39** (0.07) | −0.34** (0.05) | 0.55 |
| 2 | 0.48** (0.12) | −0.10 (0.09) | 0.32** (0.09) | | | | | | 0.34 |
| 3 | 0.10 (0.14) | 0.04 (0.05) | 0.04 (0.06) | 0.14 (0.08) | −0.10 (0.05) | −0.16* (0.07) | 0.30* (0.15) | −0.34** (0.08) | 0.55 |

Panel B in Table I shows similar regressions with the average excess return replaced by the average monthly realized yield variance as dependent variable (again, we leave the detailed explanation of how we calculate realized variance to Section 3.1). The first regression in Panel B shows the regression result when the independent variables are products of up to three terms of PC1, PC2, and PC3, after excluding insignificant terms as in Panel A. None of the linear terms are significant and the five significant nonlinear terms generate an $R^2$ of 55%. 
Row 2 shows that a regression with only the first three PCs, the linear relation implied by affine models, yields a substantially lower $R^2$ of 34%, and row 3 shows that the linear terms do not raise the $R^2$ when included in the first regression in Panel B. These regressions show that there is a nonlinear relation both between yields and excess returns and between yields and yield variances. While the $R^2$s in the nonlinear regressions are informative about the importance of nonlinearity, overfitting and collinearity limit the ability to pin down the precise nonlinear relation. In particular, when running the regressions for each bond maturity individually, it is rare that the same set of nonlinear terms is significant. This evidence suggests that we need a parsimonious nonlinear model to study the nonlinearities in the first and second moments of bond returns, which we present in the next section.

2.2 The Model

Uncertainty is represented by a d-dimensional Brownian motion $W(t) = (W_1(t), \ldots, W_d(t))'$. There is a d-dimensional Gaussian state vector $X(t)$ that follows the dynamics

$$dX(t) = \kappa\left(\bar{X} - X(t)\right)dt + \Sigma\, dW(t), \tag{1}$$

where $\bar{X}$ is d-dimensional and $\kappa$ and $\Sigma$ are d × d-dimensional.

2.2.a. The stochastic discount factor

We assume that there is no arbitrage and that the strictly positive SDF is

$$M(t) = M_0(t)\left(1 + \gamma e^{-\beta' X(t)}\right), \tag{2}$$

where $\gamma$ denotes a nonnegative constant, $\beta$ a d-dimensional vector, and $M_0(t)$ a strictly positive stochastic process. Equation (2) is a key departure from standard term structure models (Vasicek, 1977; Cox, Ingersoll, and Ross, 1985; Duffie and Kan, 1996; Dai and Singleton, 2000). Rather than specifying the short rate and the market price of risk, which in turn pins down the SDF, we specify the functional form of the SDF directly.7 This approach is motivated by equilibrium models where the SDF is a function of structural parameters and thus the risk-free rate and market price of risk are interconnected. 
Moreover, we show in Appendix B that the SDF specified in Equation (2) is a special case of the SDF in many popular equilibrium models. To keep the model comparable to the existing literature on affine term structure models, we introduce a base model for which $M_0(t)$ is the SDF. The dynamics of $M_0(t)$ are

$$\frac{dM_0(t)}{M_0(t)} = -r_0(t)\,dt - \Lambda_0(t)'\,dW(t), \tag{3}$$

where $r_0(t)$ and $\Lambda_0(t)$ are affine functions of the state vector $X(t)$. Specifically,

$$r_0(t) = \rho_{0,0} + \rho_{0,X}' X(t), \tag{4}$$

$$\Lambda_0(t) = \lambda_{0,0} + \lambda_{0,X} X(t), \tag{5}$$

where $\rho_{0,0}$ is a scalar, $\rho_{0,X}$ and $\lambda_{0,0}$ are d-dimensional vectors, and $\lambda_{0,X}$ is a d × d-dimensional matrix. It is well known that bond prices in the base model belong to the class of Gaussian term structure models (Dai and Singleton, 2002; Duffee, 2002) with essentially affine risk premia. If $\gamma$ or every element of $\beta$ is zero, then the nonlinear model collapses to the Gaussian base model. We now provide closed-form solutions for bond prices in the nonlinear model.

2.2.b. Closed-form bond prices

Let $P(t,T)$ denote the price at time t of a zero-coupon bond that matures at time T. Specifically,

$$P(t,T) = E_t\left[\frac{M(T)}{M(t)}\right]. \tag{6}$$

We show in the next theorem that the price of a bond is a weighted average of bond prices in artificial economies that belong to the class of essentially affine Gaussian term structure models.

Theorem 1. The price of a zero-coupon bond that matures at time T is

$$P(t,T) = s(t)P_0(t,T) + (1 - s(t))P_1(t,T), \tag{7}$$

where

$$s(t) = \frac{1}{1 + \gamma e^{-\beta' X(t)}} \in (0,1], \tag{8}$$

$$P_n(t,T) = e^{A_n(T-t) + B_n(T-t)' X(t)}. \tag{9}$$

The coefficient $A_n(T-t)$ and the d-dimensional vector $B_n(T-t)$ solve the ordinary differential equations

$$\frac{dA_n(\tau)}{d\tau} = \frac{1}{2} B_n(\tau)'\Sigma\Sigma' B_n(\tau) + B_n(\tau)'\left(\kappa\bar{X} - \Sigma\lambda_{n,0}\right) - \rho_{n,0}, \quad A_n(0) = 0, \tag{10}$$

$$\frac{dB_n(\tau)}{d\tau} = -\left(\kappa + \Sigma\lambda_{n,X}\right)' B_n(\tau) - \rho_{n,X}, \quad B_n(0) = 0_d, \tag{11}$$

where

$$\rho_{n,0} = \rho_{0,0} + n\beta'\kappa\bar{X} - n\beta'\Sigma\lambda_{0,0} - \frac{1}{2} n^2 \beta'\Sigma\Sigma'\beta, \tag{12}$$

$$\rho_{n,X} = \rho_{0,X} - n\kappa'\beta - n\lambda_{0,X}'\Sigma'\beta, \tag{13}$$

$$\lambda_{n,0} = \lambda_{0,0} + n\Sigma'\beta, \tag{14}$$

$$\lambda_{n,X} = \lambda_{0,X}. \tag{15}$$

The proof of this theorem is given in Appendix A, where we provide a proof for a more general class of nonlinear models and also show how our nonlinear model is related to the class of reduced-form asset pricing models presented in Duffie, Pan, and Singleton (2000) and Chen and Joslin (2012). To provide some intuition, we define $M_1(t) = \gamma e^{-\beta' X(t)} M_0(t)$ and rewrite the bond pricing Equation (6) using the fact that $s(t) = M_0(t)/M(t) = 1 - M_1(t)/M(t)$. Specifically,

$$P(t,T) = s(t)\,E_t\left[\frac{M_0(T)}{M_0(t)}\right] + (1 - s(t))\,E_t\left[\frac{M_1(T)}{M_1(t)}\right]. \tag{16}$$

Applying Ito's lemma to $M_1(t)$ leads to

$$\frac{dM_1(t)}{M_1(t)} = -r_1(t)\,dt - \Lambda_1(t)'\,dW(t), \tag{17}$$

where $r_1(t)$ and $\Lambda_1(t)$ are affine functions of the state vector $X(t)$. Specifically,

$$r_1(t) = \rho_{1,0} + \rho_{1,X}' X(t), \tag{18}$$

$$\Lambda_1(t) = \lambda_{1,0} + \lambda_{1,X} X(t), \tag{19}$$

where $\rho_{1,0}$, $\rho_{1,X}$, $\lambda_{1,0}$, and $\lambda_{1,X}$ are given in Equations (12), (13), (14), and (15), respectively. Hence, both expectations in Equation (16) are equal to bond prices in artificial economies with discount factors $M_0(t)$ and $M_1(t)$, respectively. These bond prices belong to the class of essentially affine term structure models and hence $P(t,T)$ can be computed in closed form.

2.2.c. The short rate and the price of risk

Applying Ito's lemma to Equation (2) leads to the dynamics of the SDF:

$$\frac{dM(t)}{M(t)} = -r(t)\,dt - \Lambda(t)'\,dW(t), \tag{20}$$

where both the short rate $r(t)$ and the market price of risk $\Lambda(t)$ are nonlinear functions of the state vector $X(t)$ given in Equations (21) and (22), respectively. The short rate is given by

$$r(t) = s(t)\,r_0(t) + (1 - s(t))\,r_1(t). \tag{21}$$

Our model allows the short rate to be nonlinear in the state variables without losing the tractability of closed-form solutions of bond prices and a Gaussian state space.8 The d-dimensional market price of risk is given by

$$\Lambda(t) = s(t)\,\Lambda_0(t) + (1 - s(t))\,\Lambda_1(t). \tag{22}$$

Equation (22) shows that even if the market prices of risk in the base model are constant, the market prices of risk in the general model are stochastic due to variations in the weight $s(t)$. 
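The pricing recipe in Theorem 1 and the short rate in Equation (21) can be sketched numerically. The following is a minimal illustration for a single factor (d = 1), not the authors' implementation: the function names, the dictionary keys, the RK4 integrator, and all parameter values are assumptions made for this example and are not estimates from the paper.

```python
import math

def artificial_params(n, p):
    """Scalar versions of Equations (12)-(15) for artificial economy n (0 or 1)."""
    k, xbar, sig, beta = p["kappa"], p["xbar"], p["sigma"], p["beta"]
    rho_n0 = (p["rho0"] + n * beta * k * xbar - n * beta * sig * p["lam0"]
              - 0.5 * n ** 2 * beta ** 2 * sig ** 2)
    rho_nx = p["rhoX"] - n * k * beta - n * p["lamX"] * sig * beta
    lam_n0 = p["lam0"] + n * sig * beta
    lam_nx = p["lamX"]
    return rho_n0, rho_nx, lam_n0, lam_nx

def affine_coefficients(n, tau, p, steps=2000):
    """Integrate the scalar ODEs (10)-(11) from 0 to tau with classical RK4."""
    k, xbar, sig = p["kappa"], p["xbar"], p["sigma"]
    rho_n0, rho_nx, lam_n0, lam_nx = artificial_params(n, p)

    def deriv(b):
        da = 0.5 * sig ** 2 * b ** 2 + b * (k * xbar - sig * lam_n0) - rho_n0
        db = -(k + sig * lam_nx) * b - rho_nx
        return da, db

    a, b, h = 0.0, 0.0, tau / steps
    for _ in range(steps):
        a1, b1 = deriv(b)
        a2, b2 = deriv(b + 0.5 * h * b1)
        a3, b3 = deriv(b + 0.5 * h * b2)
        a4, b4 = deriv(b + h * b3)
        a += h / 6.0 * (a1 + 2 * a2 + 2 * a3 + a4)
        b += h / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
    return a, b

def weight(x, p):
    """Weight s(t) from Equation (8)."""
    return 1.0 / (1.0 + p["gamma"] * math.exp(-p["beta"] * x))

def bond_price(x, tau, p):
    """Equation (7): weighted average of two exponential-affine prices."""
    s = weight(x, p)
    prices = []
    for n in (0, 1):
        a, b = affine_coefficients(n, tau, p)
        prices.append(math.exp(a + b * x))
    return s * prices[0] + (1.0 - s) * prices[1]

def short_rate(x, p):
    """Equation (21): weighted average of the two affine short rates."""
    s = weight(x, p)
    rates = []
    for n in (0, 1):
        rho_n0, rho_nx, _, _ = artificial_params(n, p)
        rates.append(rho_n0 + rho_nx * x)
    return s * rates[0] + (1.0 - s) * rates[1]

# Illustrative (made-up) parameter values, chosen only so the example runs.
params = dict(kappa=0.3, xbar=0.1, sigma=1.0, rho0=0.03, rhoX=0.004,
              lam0=0.2, lamX=-0.1, gamma=0.5, beta=0.3)

for maturity in (1, 2, 3, 4, 5):
    y = -math.log(bond_price(0.2, maturity, params)) / maturity
    print(f"{maturity}-year yield: {y:.4%}")
```

Setting `gamma` or `beta` to zero in `params` makes `bond_price` coincide with the base-model price $P_0$, which mirrors the statement that the nonlinear model then collapses to the Gaussian base model.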
When $s(t)$ approaches zero or one, then $\Lambda(t)$ approaches the market price of risk of an essentially affine Gaussian model.

2.2.d. Expected return and volatility

We know that the bond price is a weighted average of exponential affine bond prices (see Equation (7)). Hence, variations of instantaneous bond returns are due to variations in the two artificial bond prices $P_0(t,T)$ and $P_1(t,T)$ and due to variations in the weight $s(t)$. Specifically, the dynamics of the bond price $P(t,T)$ are

$$\frac{dP(t,T)}{P(t,T)} = \left(r(t) + e(t,T)\right)dt + \sigma(t,T)'\,dW(t), \tag{23}$$

where $e(t,T)$ denotes the instantaneous expected excess return and $\sigma(t,T)$ denotes the local volatility vector of a zero-coupon bond that matures at time T. The local volatility vector of the bond is given by

$$\sigma(t,T) = \omega(t,T)\,\sigma_0(T-t) + (1 - \omega(t,T))\,\sigma_1(T-t) + \left(s(t) - \omega(t,T)\right)\beta, \tag{24}$$

where $\sigma_i(T-t) = \Sigma' B_i(T-t)$ denotes the local bond volatility vector in the Gaussian model with SDF $M_i(t)$ and $\omega(t,T)$ denotes the contribution of $P_0(t,T)$ to the bond price $P(t,T)$. Specifically,

$$\omega(t,T) = \frac{P_0(t,T)\,s(t)}{P(t,T)} \in (0,1]. \tag{25}$$

When $s(t)$ approaches zero or one, then $\sigma(t,T)$ approaches the deterministic local volatility of a Gaussian model. However, in contrast to the short rate and the market price of risk, the local volatility can move outside the range of the two local Gaussian volatilities, $\sigma_0(T-t)$ and $\sigma_1(T-t)$, because of the last term in Equation (24). Intuitively, there are two distinct contributions to volatility in Equation (24). The direct term, defined as

$$\sigma_{vol}(t,T) = \omega(t,T)\,\sigma_0(T-t) + (1 - \omega(t,T))\,\sigma_1(T-t), \tag{26}$$

arises because the two artificial Gaussian models have constant but different yield volatilities. The indirect term, defined as

$$\sigma_{lev}(t,T) = \left(s(t) - \omega(t,T)\right)\beta, \tag{27}$$

is due to the Gaussian models having different yield levels. Two special cases illustrate the distinct contributions to volatility. If $P_0(t,T) = P_1(t,T) = P(t,T)$, then $\sigma_{lev}(t,T) = 0$ and the local volatility vector reduces to $\sigma(t,T) = s(t)\,\sigma_0(T-t) + (1 - s(t))\,\sigma_1(T-t)$. 
On the other hand, if $\sigma_0(T-t) = \sigma_1(T-t)$, the first term is constant, but there is still stochastic volatility due to the second term, which becomes more important the bigger the difference between the two artificial bond prices $P_1(t,T)$ and $P_0(t,T)$.9 The instantaneous expected excess return and volatility of the bond are

$$e(t,T) = \Lambda(t)'\sigma(t,T), \tag{28}$$

$$v(t,T) = \sigma(t,T)'\sigma(t,T). \tag{29}$$

Equations (20)–(29) show that the nonlinear term structure model differs from the essentially affine Gaussian base model in two important aspects. First, the volatilities of bond returns and yields are time-varying and hence expected excess returns move with both the price and the quantity of risk.10 Second, the short rate $r(t)$, the instantaneous volatility $v(t,T)$, and the instantaneous expected excess return $e(t,T)$ are nonlinear functions of $X(t)$.

3. Estimation

In this section, we estimate the nonlinear model described in Section 2 and compare it to standard essentially affine A0(3) and A1(3) models. All three models have three factors and the number of parameters is 22 in the A0(3) model, 23 in the A1(3) model, and 26 in the nonlinear model. The A0(3) model is a special case of our nonlinear model where $M_0(t) = M(t)$. The A1(3) model is well known and thus we only present the setup with results in Section 3.2 and defer details to Feldhütter (2016).

3.1 Data

We treat each period as a month and estimate the models using a monthly panel of five zero-coupon Treasury bond yields and their realized variances. Although it is in theory sufficient to use bond yields to estimate the model, we add realized variances in the estimation to improve the identification of model parameters (see Cieslak and Povala [2016] for a similar approach). We use daily (continuously compounded) 1-, 2-, 3-, 4-, and 5-year zero-coupon yields extracted from US Treasury security prices by the method of Gurkaynak, Sack, and Wright (2007). 
The data are available from the Federal Reserve Board's webpage and cover the period 1961:07 to 2014:04. For each bond maturity, we average daily observations within a month to get a time series of monthly yields. We use realized yield variance to measure yield variance. Let $y_t^\tau$ and $rv_t^\tau$ denote the yield and realized yield variance of a τ-year bond in month t based on daily observations within that month. Specifically,

$$y_t^\tau = \frac{1}{N_t}\sum_{i=1}^{N_t} y_{d,t}^\tau(i), \tag{30}$$

$$rv_t^\tau = 12\sum_{i=1}^{N_t}\left(y_{d,t}^\tau(i) - y_{d,t}^\tau(i-1)\right)^2, \tag{31}$$

where $y_{d,t}^\tau(i)$ denotes the yield at day i within month t, $N_t$ denotes the number of trading days within month t, and $y_{d,t}^\tau(0)$ denotes the last observation in month t − 1. The realized variance converges to the quadratic variation as N approaches infinity; see Andersen, Bollerslev, and Diebold (2010) and the references therein for a detailed discussion. To check the accuracy of realized variance based on daily data, we compare realized volatility with option-implied volatility (to be consistent with the options literature we look at implied volatility instead of implied variance). We obtain implied price volatility of 1-month at-the-money options on 5-year Treasury futures from Datastream and convert it to yield volatility.11 We then calculate monthly volatility by averaging over daily volatilities. Figure 1 shows that realized volatility tracks option-implied volatility closely (the correlation is 87%), and thus we conclude that realized variance is a useful measure for yield variance.

Figure 1. Realized and option-implied yield volatility. We use monthly estimates of realized yield variance based on daily squared yield changes. This graph shows that option-implied volatility tracks the realized volatility closely over the last 10 years (the correlation is 87%). Option-implied volatility is obtained from 1-month at-the-money options on 5-year Treasury futures as explained in the text. The data are available from Datastream since October 2003. 
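As a small illustration of Equations (30) and (31), the sketch below computes a monthly average yield and a realized variance from a list of daily yields. The function names and the sample numbers are hypothetical, and the default scale of 12 reflects a reading of the leading factor in Equation (31) as an annualization of the monthly sum; treat that as an assumption.

```python
def monthly_yield(daily_yields):
    """Equation (30): average of the daily yields observed within the month."""
    return sum(daily_yields) / len(daily_yields)

def realized_variance(daily_yields, prev_month_last, scale=12.0):
    """Equation (31): scaled sum of squared daily yield changes.

    prev_month_last plays the role of the day-0 observation, i.e. the last
    yield of the previous month. scale=12 is an assumed annualization factor.
    """
    path = [prev_month_last] + list(daily_yields)
    return scale * sum((b - a) ** 2 for a, b in zip(path, path[1:]))

# Hypothetical daily 5-year yields (in decimals) for one short month.
daily = [0.0500, 0.0510, 0.0490]
y_month = monthly_yield(daily)
rv_month = realized_variance(daily, prev_month_last=0.0500)
print(y_month, rv_month, rv_month ** 0.5)  # yield, variance, volatility
```

Taking the square root of the realized variance gives the realized volatility that is compared with option-implied volatility in the text.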
3.2 The A1(3) Model

We briefly describe the A1(3) model in this section and refer the reader to Feldhütter (2016) for a detailed discussion. The dynamics of the three-dimensional state vector $X(t) = (X_1(t), X_2(t), X_3(t))'$ are

$$dX(t) = \kappa\left(\bar{X} - X(t)\right)dt + S(t)\,dW(t), \tag{32}$$

where $\bar{X} = (\bar{X}_1, 0, 0)'$ is the long run mean,

$$\kappa = \begin{pmatrix} \kappa_{(1,1)} & 0 & 0 \\ \kappa_{(2,1)} & \kappa_{(2,2)} & \kappa_{(2,3)} \\ \kappa_{(3,1)} & \kappa_{(3,2)} & \kappa_{(3,3)} \end{pmatrix} \tag{33}$$

is the positive-definite mean reversion matrix, $W(t)$ is a three-dimensional Brownian motion, and

$$S(t) = \begin{pmatrix} \sqrt{\delta_1 X_1(t)} & 0 & 0 \\ 0 & \sqrt{1 + \delta_2 X_1(t)} & 0 \\ 0 & 0 & \sqrt{1 + \delta_3 X_1(t)} \end{pmatrix} \tag{34}$$

is the local volatility matrix with $\delta = (1, \delta_2, \delta_3)$. The dynamics of the SDF $M(t)$ are

$$\frac{dM(t)}{M(t)} = -r(t)\,dt - \Lambda(t)'\,dW(t), \tag{35}$$

where the short rate $r(t)$ and the three-dimensional vector $S(t)\Lambda(t)$ are affine functions of $X(t)$. Specifically,

$$r(t) = \rho_0 + \rho_X' X(t), \tag{36}$$

where $\rho_0$ is a scalar and $\rho_X$ is a three-dimensional vector. The market price of risk $\Lambda(t)$ is the solution of the equation

$$S(t)\Lambda(t) = \begin{pmatrix} \lambda_{X,(1,1)} X_1(t) \\ \lambda_{0,2} + \lambda_{X,(2,1)} X_1(t) + \lambda_{X,(2,2)} X_2(t) + \lambda_{X,(2,3)} X_3(t) \\ \lambda_{0,3} + \lambda_{X,(3,1)} X_1(t) + \lambda_{X,(3,2)} X_2(t) + \lambda_{X,(3,3)} X_3(t) \end{pmatrix}, \tag{37}$$

where $\lambda_0$ denotes a three-dimensional vector and $\lambda_X$ a three-dimensional matrix. The bond price and the instantaneous yield volatility are

$$P(X(t),T) = e^{A(T-t) + B(T-t)' X(t)}, \tag{38}$$

$$v(X(t),T) = B(T-t)'\,S(X(t))\,S(X(t))\,B(T-t), \tag{39}$$

where $A(\tau)$ and $B(\tau)$ satisfy the ODEs

$$\frac{dA(\tau)}{d\tau} = \left(\kappa\bar{X} - \lambda_0\right)' B(\tau) + \frac{1}{2}\sum_{i=2}^{3} B_i(\tau)^2 - \rho_0, \quad A(0) = 0, \tag{40}$$

$$\frac{dB(\tau)}{d\tau} = \left(\kappa + \lambda_X\right)' B(\tau) + \frac{1}{2}\sum_{i=1}^{3} B_i(\tau)\,\delta_i - \rho_X, \quad B(0) = 0_{3\times 1}. \tag{41}$$

3.3 Estimation Methodology

We use the unscented Kalman filter (UKF) to estimate the nonlinear model, the extended Kalman filter to estimate the A1(3) model, and the Kalman filter to estimate the A0(3) model. Christoffersen et al. (2014) show that the UKF works well in estimating term structure models when highly nonlinear instruments are observed. We briefly discuss the setup but refer to Christoffersen et al. (2014) and Carr and Wu (2009) for a detailed description of this nonlinear filter. When we estimate the nonlinear and A1(3) models, we stack the five yields in month t in the vector $Y_t$, the corresponding five realized yield variances in the vector $RV_t$, and set up the model in state-space form. The measurement equation is

$$\begin{pmatrix} Y_t \\ RV_t \end{pmatrix} = \begin{pmatrix} f(X_t) \\ g(X_t) \end{pmatrix} + \begin{pmatrix} \sigma_y I_5 & 0 \\ 0 & \sigma_{rv} I_5 \end{pmatrix}\epsilon_t, \quad \epsilon_t \sim N(0, I_{10}), \tag{42}$$

where $f(\cdot)$ is the function determining the relation between the latent variables and yields, $g(\cdot)$ is the function determining the relation between the latent variables and the variance of yields, and the positive parameters $\sigma_y$ and $\sigma_{rv}$ are the pricing errors for yields and their variances.12 Specifically, $f = (f_1, \ldots, f_5)'$ and $g = (g_1, \ldots, g_5)'$ where

$$f_\tau(X_t) = -\frac{1}{\tau}\ln\left(P(X_t, t+\tau)\right), \tag{43}$$

$$g_\tau(X_t) = \frac{1}{\tau^2}\,v^2(X_t, t+\tau), \tag{44}$$

with $P(X_t, t+\tau)$ and $v(X_t, t+\tau)$ given in Equations (7) and (29), respectively. In the A0(3) model, yield volatility is constant and we therefore only include yields (and not realized variances) in the estimation. In the nonlinear model, the state space is Gaussian and thus the transition equation for the latent variables is

$$X_{t+1} = C + D X_t + \eta_{t+1}, \quad \eta_t \sim N(0, Q), \tag{45}$$

where C is a vector and D is a matrix that enters the 1-month ahead expectation of $X_t$, that is, $E_t(X_{t+1}) = C + D X_t$. The covariance matrix of $X_{t+1}$ given $X_t$ is constant and equal to Q. In the A1(3) model, we use the Gaussian transition equation in (45) as an approximation because the dynamics of X are non-Gaussian. This is a standard approach in the literature (Feldhütter and Lando, 2008). 
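The text does not spell out how C, D, and Q in the transition Equation (45) follow from the continuous-time dynamics in Equation (1). For a single factor with positive mean reversion, the standard exact discretization of such Gaussian dynamics can be sketched as follows; the function name, the scalar restriction, and the monthly step of 1/12 are assumptions of this example, and the multivariate case would use a matrix exponential instead.

```python
import math

def discretize_ou(kappa, xbar, sigma, dt=1.0 / 12.0):
    """Exact one-step transition of dX = kappa*(xbar - X)dt + sigma*dW.

    Returns (C, D, Q) such that X_{t+dt} = C + D*X_t + eta with
    eta ~ N(0, Q). Scalar case only; requires kappa > 0.
    """
    d = math.exp(-kappa * dt)          # persistence over one step
    c = (1.0 - d) * xbar               # pull toward the long-run mean
    q = sigma ** 2 * (1.0 - math.exp(-2.0 * kappa * dt)) / (2.0 * kappa)
    return c, d, q

# Illustrative values only: unit volatility and zero mean, in the spirit of
# the normalizations used for the nonlinear model; kappa is made up.
C, D, Q = discretize_ou(kappa=0.3, xbar=0.0, sigma=1.0)
print(C, D, Q)
```

Because the conditional mean is linear in the current state and the conditional variance does not depend on it, the transition is exactly Gaussian, which is why only the measurement equation requires a nonlinear filter in the nonlinear model.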
The bond price $P(X_t, t+\tau)$ and volatility $v(X_t, t+\tau)$ in Equations (43) and (44) of the A1(3) model are given in Equations (38) and (39) in Section 3.2. We can use the approximate Kalman filter because both yields and variances are affine in X in the A1(3) model. We use the normalization proposed in Dai and Singleton (2000) to guarantee that the parameters are well identified if $s(X_t)$ is close to zero or one, or if γ and all elements of β are close to zero. In the nonlinear model, we assume in Equation (1) that the mean reversion matrix, κ, is lower triangular, the mean of the state variables, $\bar{X}$, is the zero vector, and that the local volatility, Σ, is the identity matrix. The normalizations in the A1(3) model are given in Section 3.2.

3.4 Estimation Results

Estimated parameters with asymptotic standard errors (in parentheses) are reported in Tables II and III. Columns 2–4 of Table II show parameter estimates based on the whole sample (1961:07–2014:04) that includes the period of the monetary experiments where the 1-year bond yield and its volatility exceeded 15% and 5%, respectively. We re-estimate the nonlinear model using only yield and volatility data for the period 1987:08–2014:04, which excludes the high yield and yield volatility regime during the early 80s.13 Columns 5–7 of Table II show that the estimated parameters for this period are similar to the estimated parameters for the whole sample period. In particular, the nonlinear parameters β and γ have the same sign and are of similar magnitude. The parameter estimates for the A1(3) and the A0(3) model are reported in Table III.

Table II. Parameter estimates of the nonlinear three-factor model

This table contains parameter estimates and asymptotic standard errors (in parentheses) for the nonlinear three-factor model. 
The left column shows parameter estimates based on yield and realized variance data for the whole sample (1961:07–2014:04) and the right column shows parameter estimates based on yield and realized variance data for the Post-Volcker period (1987:08–2014:04). The bond maturities range from 1 to 5 years and the data are obtained from Gurkaynak, Sack, and Wright (2007). The UKF is used to estimate the nonlinear model.

| Parameter | Nonlinear model (1961–2014), col. 1 | col. 2 | col. 3 | Nonlinear model (1987–2014), col. 1 | col. 2 | col. 3 |
|---|---|---|---|---|---|---|
| κ (row 1) | 0.3127 (0.04224) | 0 | 0 | 0.3452 (0.08753) | 0 | 0 |
| κ (row 2) | 0.3063 (0.05601) | 0.002189 (2.246e−05) | 0 | 0.5507 (0.09825) | 0.003245 (0.002091) | 0 |
| κ (row 3) | 1.258 (0.1103) | 0.03804 (0.02125) | 0.4098 (0.0377) | 1.057 (0.2745) | 1.072e−05 (0.0002734) | 0.4449 (0.2494) |
| ρ0 | | −0.001756 (0.01408) | | | −0.001002 (0.02238) | |
| ρX | 0.0002071 (0.0001846) | 0.003061 (0.0002364) | 0.004345 (0.0001742) | 0.0002036 (0.0009384) | 0.005161 (0.0004481) | 0.004939 (0.0005533) |
| λ0 | 0.7569 (0.04302) | −0.01631 (0.5559) | −0.4413 (0.3375) | 0.3814 (0.09227) | −0.02483 (0.09312) | −0.3191 (0.2209) |
| λX (row 1) | −0.2187 (0.04129) | 0.005572 (0.001321) | −0.02053 (0.005609) | −0.2244 (0.06907) | 0.003604 (0.00792) | −0.02491 (0.04552) |
| λX (row 2) | −1.735e−06 (4.238e−05) | 0.001197 (0.03785) | 0.6863 (0.03001) | −1.558e−06 (2.248e−05) | 0.001282 (0.03908) | 0.7165 (0.05695) |
| λX (row 3) | −0.2943 (0.1053) | −0.02387 (0.01562) | 0.04613 (0.05121) | −0.3973 (0.2578) | −0.0237 (0.02542) | 0.05947 (0.2159) |
| γ | | 0.0003857 (0.0004591) | | | 0.0005653 (0.0007368) | |
| β | −1.444 (0.008187) | −0.2376 (0.01831) | 0.2846 (0.02526) | −1.196 (0.0521) | −0.2737 (0.07188) | 0.3483 (0.08285) |
| σy | | 0.0005463 (6.945e−05) | | | 0.0004679 (9.47e−05) | |
| σrv | | 7.281e−05 (8.491e−06) | | | 2.857e−05 (3.381e−06) | |

Table III. Parameter estimates of the A1(3) and the A0(3) model

This table contains parameter estimates and asymptotic standard errors (in parentheses) for two three-factor affine models: the A1(3) model with one stochastic volatility factor and the A0(3) model with only Gaussian factors. The parameter estimates for the A1(3) model are based on yield and realized variance data for the whole sample (1961:07–2014:04) and the parameter estimates for the A0(3) model are based on yield data for the whole sample. The bond maturities range from 1 to 5 years and the data are obtained from Gurkaynak, Sack, and Wright (2007). The extended Kalman filter is used to estimate the A1(3) model and the Kalman filter is used to estimate the A0(3) model.
| Parameter | A1(3) 1961–2014, col. 1 | A1(3) 1961–2014, col. 2 | A1(3) 1961–2014, col. 3 | A0(3) 1961–2014, col. 1 | A0(3) 1961–2014, col. 2 | A0(3) 1961–2014, col. 3 |
|---|---|---|---|---|---|---|
| κ (row 1) | 1.421 (0.1863) | 0 | 0 | 0.7064 (0.1982) | 0 | 0 |
| κ (row 2) | −0.04787 (1.899) | 0.07225 (0.01938) | −0.003283 (4.101) | 0.3558 (0.2189) | 0.06629 (0.06185) | 0 |
| κ (row 3) | 0.283 (0.6523) | −0.009014 (0.07474) | 0.356 (0.01893) | 0.6473 (0.1987) | 0.3549 (0.2011) | 0.8202 (0.1865) |
| ρ0 | | 0.08832 (0.3038) | | | 0.02046 (0.06848) | |
| ρX | 0.0003736 (0.0002645) | 0.001131 (0.0009603) | 1.385e−05 (0.000302) | −0.001232 (0.002566) | 0.01626 (0.002255) | 0.01085 (0.003361) |
| λ0 | 0 | 0.6101 (106.4) | 0.006454 (7.178) | 0.1353 (0.1707) | −0.3741 (0.1998) | 0.1233 (0.4018) |
| λX (row 1) | 6.75e−05 (0.07544) | 0 | 0 | −0.335 (0.1954) | −0.01799 (0.03515) | 0.006627 (0.09816) |
| λX (row 2) | 2.378 (3.64) | −0.0006549 (0.01964) | 3.381 (5.878) | −7.847e−05 (0.001684) | 0.1821 (0.1682) | 0.5751 (0.114) |
| λX (row 3) | 0.01683 (0.7003) | −0.0001671 (0.0733) | 1.302e−05 (0.01966) | 0.183 (0.2063) | −0.09196 (0.08949) | −0.03485 (0.1974) |
| δ | 0 | 491.5 (836.6) | 2.417 (0.3336) | | | |
| (κX̄) | 1.509 (0.1109) | 0 | 0 | | | |
| σy | | 0.0006001 (8.676e−05) | | 0.0001038 (1.698e−05) | | |
| σrv | | 6.18e−05 (6.019e−06) | | | | |

The bond price in the nonlinear model is a weighted average of two Gaussian bond prices (see Theorem 1). Figure 2 shows the weight $s(X_t)$ on the Gaussian base model. If the stochastic weight approaches zero or one, then the bond price approaches the bond price in a Gaussian model where yields are affine functions of the state variables and yield variances are constant. The stochastic weight is distinctly different from one and varies substantially over the sample period, that is, the mean and volatility of $s(X_t)$ are 79.98% and 21.35%, respectively. Moreover, there are both high-frequency and low-frequency movements in $s(X_t)$. The high-frequency movements push $s(X_t)$ away from one during recessions; we see spikes during the 1970, 1973–75, 1980, 2001, and 2007–09 recessions. The low-frequency movement starts in the early 80s where the weight moves significantly below one and slowly returns over the next 30 years.

Figure 2. Stochastic weight on Gaussian base model. The bond price in the nonlinear model is $P(t,T) = s(t)P_0(t,T) + (1 - s(t))P_1(t,T)$, where $P_0(t,T)$ and $P_1(t,T)$ are bond prices that belong to the class of essentially affine Gaussian term structure models and $s(t)$ is a stochastic weight between 0 and 1. This figure shows the stochastic weight and the shaded areas show NBER recessions.

To quantify the impact of nonlinearities in our model, we regress yields and their variances on the three state variables. By construction the R² of these regressions in the A1(3) model is 100%.
In the nonlinear model, the R²s when regressing the 1- to 5-year yields on the three state variables are 89.40%, 89.64%, 90.12%, 90.66%, and 91.14%, respectively, showing a considerable amount of nonlinearity. Nonlinearity shows up even more strongly in the relation between yield variances and the three factors. Specifically, the R²s when regressing the 1- to 5-year yield variances on the three state variables are 29.52%, 27.99%, 28.18%, 29.52%, and 31.67%, respectively. For comparison, regressing the stochastic weight $s(X_t)$ on all three state variables leads to an R² of 80.88%. Overall, these initial results suggest an important role for nonlinearity and we explore this in detail in the next section.

4. Empirical Results

In this section, we show that the nonlinear three-factor model captures time variation in expected excess bond returns and yield volatility. Moreover, the nonlinearity leads to URP and USV, empirical stylized facts that affine models cannot capture without knife-edge restrictions and additional state variables that describe variations in expected excess returns and yield variances but not yields. While nonlinearities help explain time-variation in excess returns and yield variances, we show in Section 4.3 that the amount of nonlinearity in the cross-section is small and thus our model retains the linear relation of US Treasury yields across maturities.

4.1 Expected Excess Returns

Expected excess returns of US Treasury bonds vary over time, as documented by, among others, Fama and Bliss (1987) and Campbell and Shiller (1991) (CS). CS document this by regressing future yield changes on the scaled slope of the yield curve. Specifically, for all bond maturities τ = 2,3,4,5 we have

$$y_{t+1}^{\tau-1} - y_t^{\tau} = \text{const} + \varphi_{\tau}\left(\frac{y_t^{\tau} - y_t^{1}}{\tau - 1}\right) + \text{residual}, \tag{46}$$

where $y_t^{\tau}$ is the (log) yield at time t of a zero-coupon bond maturing at time t+τ.
The slope regression coefficient is one if excess holding period returns are constant, but CS find negative regression coefficients, implying that a steep slope predicts high future excess bond returns. Table IV replicates their findings for the sample period 1961:07–2014:04, that is, slope coefficients are negative, decreasing with maturity, and significantly different from one.

Table IV. Campbell–Shiller regressions. This table shows the coefficients $\varphi_{\tau}$ from the regressions $y_{t+1}^{\tau-1} - y_t^{\tau} = \text{const} + \varphi_{\tau}\left(\frac{y_t^{\tau} - y_t^{1}}{\tau - 1}\right) + \text{residual}$, where $y_t^{\tau}$ is the zero-coupon yield at time t of a bond maturing at time t+τ (τ and t are measured in years). The actual coefficients are calculated using monthly data of 1- to 5-year zero-coupon bond yields from 1961:07 to 2014:04 obtained from Gurkaynak, Sack, and Wright (2007). Standard errors in parentheses are computed using the Hansen and Hodrick (1980) correction with twelve lags. The population coefficients for each model are based on one simulated sample path of 1,000,000 months.

Campbell–Shiller regression coefficients

| Bond maturity | 2-Year | 3-Year | 4-Year | 5-Year |
|---|---|---|---|---|
| Data | −0.63 (0.64) | −0.93 (0.69) | −1.21 (0.73) | −1.47 (0.77) |
| Nonlinear model | −0.61 | −0.61 | −0.63 | −0.65 |
| A1(3) model | −0.01 | 0.01 | 0.04 | 0.07 |
| A0(3) model | −0.18 | −0.37 | −0.54 | −0.71 |

To check whether each model can match this stylized fact, we simulate a sample path of 1,000,000 months for 2-, 3-, 4-, and 5-year excess bond returns and compare the model implied CS regression coefficients with those observed in the data. Table IV shows that the nonlinear model and the A0(3) model capture the negative CS regression coefficients in population.
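The Campbell–Shiller regression (46) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the estimation code behind Table IV: the function names `campbell_shiller_slope` and `synthetic_yields`, the array layout (column j holds the (j+1)-year log yield, sampled monthly, so one year ahead is 12 rows ahead), and the synthetic data are all hypothetical, and plain OLS is used without the Hansen–Hodrick correction.

```python
import numpy as np


def campbell_shiller_slope(yields, tau, lag=12):
    """OLS slope of y_{t+1}^{tau-1} - y_t^{tau} on (y_t^{tau} - y_t^{1}) / (tau - 1).

    yields: array of shape (T, 5); column j holds the (j+1)-year log yield
            (assumed layout), sampled monthly, so `lag` rows equal one year.
    """
    yields = np.asarray(yields, dtype=float)
    y_tau = yields[:-lag, tau - 1]      # tau-year yield at time t
    y_one = yields[:-lag, 0]            # 1-year yield at time t
    y_next = yields[lag:, tau - 2]      # (tau-1)-year yield one year later
    lhs = y_next - y_tau
    rhs = (y_tau - y_one) / (tau - 1)   # scaled slope of the yield curve
    design = np.column_stack([np.ones_like(rhs), rhs])
    coef, *_ = np.linalg.lstsq(design, lhs, rcond=None)
    return float(coef[1])


def synthetic_yields(n_months=600, phi=-0.5, seed=0):
    """Hypothetical yields in which regression (46) holds exactly for tau = 2."""
    rng = np.random.default_rng(seed)
    yields = 0.05 + 0.01 * rng.standard_normal((n_months, 5))
    for t in range(n_months - 12):
        spread = yields[t, 1] - yields[t, 0]
        # Impose y_{t+1}^{1} - y_t^{2} = 0.001 + phi * (y_t^{2} - y_t^{1}).
        yields[t + 12, 0] = yields[t, 1] + 0.001 + phi * spread
    return yields
```

On the synthetic panel, `campbell_shiller_slope(synthetic_yields(), tau=2)` recovers the imposed slope `phi`, which is a quick check that the lags and the scaling by τ−1 are lined up as in equation (46).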
Figure 3 shows that 1-year expected excess returns in the nonlinear model are negative in the early 80s and positive since the mid-80s, while they alternate between positive and negative in the A1(3) model. Expected excess returns in the A0(3) model are also positive since the mid-80s, but neither affine model can capture the very low and high realized excess returns during the monetary experiment. To formally test whether the nonlinear model captures expected excess returns better than the two affine models we run regressions of realized excess returns on model implied expected excess returns in sample. Specifically,

$$rx_{t,t+n}^{\tau} = \alpha^{\tau,n} + \beta^{\tau,n} E_t\left[rx_{t,t+n}^{\tau}\right] + \text{residual}, \quad \forall\, \tau > n = 1,2,3,4,5, \tag{47}$$

where $rx_{t,t+n}^{\tau}$ is the n-year log return on a bond with maturity τ in excess of the n-year yield and $E_t[rx_{t,t+n}^{\tau}]$ is the corresponding model implied expected excess return.14 The estimated expected excess returns for the nonlinear, A1(3), and A0(3) model are based on the sample period 1961:07 to 2014:04. The regression results are reported in Table V. If the model captures expected excess returns well, then the slope coefficient should be one and the constant zero. The slope coefficients are lower than but generally close to one in the nonlinear model. In the A1(3) model, the slope coefficients are close to one at the 1-year horizon but are too low at longer horizons, while in the A0(3) model, the slope coefficients are too high at the 1-year horizon and too low for the 3- and 4-year horizons. The average R² across bond maturity and holding horizon is 27.4% in the nonlinear model while, based on the entries in Table V, it is only roughly 9% in the A1(3) and roughly 10% in the A0(3) model.

Table V. Excess return regressions. This table shows regression coefficients from a regression of realized (log) excess returns on model implied expected (log) excess returns in sample. Monthly data for bonds with maturities ranging from 1 to 5 years are from Gurkaynak, Sack, and Wright (2007) for the period 1961:07 to 2014:04. For n = 1,2,3,4, n-year excess returns are computed by subtracting the n-year (log) yield from the n-year (log) holding-period return of a τ-year bond (τ > n). The fraction of variance explained is defined as

$$\mathrm{FVE} = 1 - \frac{\frac{1}{T}\sum_{t=1}^{T}\left(rx_{t,t+n}^{\tau} - E_t\left[rx_{t,t+n}^{\tau}\right]\right)^2}{\frac{1}{T}\sum_{t=1}^{T}\left(rx_{t,t+n}^{\tau} - \frac{1}{T}\sum_{t=1}^{T} rx_{t,t+n}^{\tau}\right)^2},$$

where $rx_{t,t+n}^{\tau}$ is the n-year excess return on a bond with maturity τ and $E_t[rx_{t,t+n}^{\tau}]$ is the corresponding model implied expected excess return. The last two columns contain the R² from the regression of realized excess returns on the five yields and on the five yields and five yield variances, respectively. Standard errors in parentheses are computed using the Hansen and Hodrick (1980) correction with the number of lags equal to the number of overlapping months.

Regressing realized excess returns on model-implied expected excess returns in sample (1961–2014)

| Maturity | Nonlinear α×10³ | Nonlinear β | Nonlinear R² | Nonlinear FVE | A1(3) α×10³ | A1(3) β | A1(3) R² | A1(3) FVE | A0(3) α×10³ | A0(3) β | A0(3) R² | A0(3) FVE | 5Y R² | 5Y+5VAR R² |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1-Year holding horizon | | | | | | | | | | | | | | |
| τ=2 | −2.58 (2.89) | 0.81 (0.20) | 0.22 | 0.15 | 2.53 (2.62) | 0.96 (0.35) | 0.12 | 0.10 | −3.77 (3.92) | 1.30 (0.45) | 0.12 | 0.11 | 0.13 | 0.15 |
| τ=3 | −4.85 (5.06) | 0.83 (0.20) | 0.23 | 0.16 | 4.05 (4.64) | 0.97 (0.33) | 0.13 | 0.12 | −7.35 (6.71) | 1.35 (0.44) | 0.14 | 0.12 | 0.14 | 0.16 |
| τ=4 | −6.60 (6.74) | 0.85 (0.19) | 0.24 | 0.18 | 5.07 (6.32) | 1.00 (0.32) | 0.15 | 0.13 | −10.29 (8.71) | 1.37 (0.41) | 0.15 | 0.13 | 0.16 | 0.18 |
| τ=5 | −7.51 (8.09) | 0.86 (0.19) | 0.25 | 0.20 | 5.85 (7.77) | 1.03 (0.32) | 0.16 | 0.15 | −12.61 (10.23) | 1.37 (0.38) | 0.17 | 0.15 | 0.17 | 0.20 |
| 2-Year holding horizon | | | | | | | | | | | | | | |
| τ=3 | −3.73 (4.60) | 0.87 (0.22) | 0.29 | 0.23 | 6.31 (4.73) | 0.62 (0.38) | 0.07 | 0.00 | −2.21 (7.99) | 0.91 (0.55) | 0.07 | 0.05 | 0.09 | 0.10 |
| τ=4 | −6.36 (8.08) | 0.89 (0.21) | 0.30 | 0.25 | 10.75 (8.51) | 0.67 (0.37) | 0.08 | 0.03 | −5.19 (13.52) | 0.95 (0.51) | 0.09 | 0.07 | 0.10 | 0.11 |
| τ=5 | −7.72 (10.70) | 0.91 (0.21) | 0.32 | 0.29 | 14.24 (11.71) | 0.73 (0.36) | 0.10 | 0.05 | −8.94 (17.40) | 1.01 (0.47) | 0.11 | 0.09 | 0.12 | 0.13 |
| 3-Year holding horizon | | | | | | | | | | | | | | |
| τ=4 | −2.30 (5.54) | 0.83 (0.23) | 0.29 | 0.24 | 10.30 (6.45) | 0.31 (0.41) | 0.02 | −0.15 | 2.20 (11.54) | 0.57 (0.60) | 0.03 | −0.02 | 0.07 | 0.13 |
| τ=5 | −3.96 (10.05) | 0.87 (0.22) | 0.33 | 0.29 | 17.60 (12.09) | 0.40 (0.40) | 0.03 | −0.11 | 0.43 (19.90) | 0.68 (0.55) | 0.05 | 0.01 | 0.08 | 0.15 |
| 4-Year holding horizon | | | | | | | | | | | | | | |
| τ=5 | 0.53 (6.57) | 0.75 (0.24) | 0.25 | 0.20 | 12.72 (8.12) | 0.25 (0.42) | 0.01 | −0.21 | 1.82 (14.53) | 0.59 (0.62) | 0.04 | −0.03 | 0.08 | 0.23 |

Figure 3. Expected excess returns. The graphs show the expected 1-year log excess returns of zero-coupon Treasury bonds with maturities of 2, 3, 4, and 5 years. The blue, black, and red lines show expected excess returns in the three-factor A0(3), A1(3), and nonlinear model, respectively. The shaded areas show NBER recessions.

To measure how well the nonlinear model predicts excess returns we compare the mean squared error of the predictor to the unconditional variance of excess returns. Specifically, we define the statistic "fraction of variance explained" that measures the explanatory power of the model implied in sample expected excess return as follows:15

$$\mathrm{FVE} = 1 - \frac{\frac{1}{T}\sum_{t=1}^{T}\left(rx_{t,t+n}^{\tau} - E_t\left[rx_{t,t+n}^{\tau}\right]\right)^2}{\frac{1}{T}\sum_{t=1}^{T}\left(rx_{t,t+n}^{\tau} - \frac{1}{T}\sum_{t=1}^{T} rx_{t,t+n}^{\tau}\right)^2}. \tag{48}$$

If the predictor is unbiased, then the R² from the regression of realized on expected excess returns is equal to the FVE and otherwise it is an upper bound. Table V shows the FVEs of the nonlinear, A1(3), and A0(3) model for the sample period 1961:07–2014:04. The in sample FVEs for the nonlinear model are higher than for the A1(3) and A0(3) model. In contrast to the nonlinear and A0(3) model, the performance of the A1(3) model deteriorates as we increase the holding horizon.

To compare the nonlinear model to affine models more generally we regress future excess returns on the five yields.
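Returning briefly to the FVE statistic in equation (48), the relation between the FVE and the regression R² can be checked numerically. The sketch below is an illustration only: the function names `fraction_of_variance_explained` and `regression_r2` are hypothetical, and the inputs are assumed to be equal-length arrays of realized and model-implied expected excess returns.

```python
import numpy as np


def fraction_of_variance_explained(realized, expected):
    """FVE as in equation (48): 1 - MSE of the predictor / variance of realized returns."""
    realized = np.asarray(realized, dtype=float)
    expected = np.asarray(expected, dtype=float)
    mse = np.mean((realized - expected) ** 2)
    variance = np.mean((realized - realized.mean()) ** 2)
    return 1.0 - mse / variance


def regression_r2(realized, expected):
    """R^2 from an OLS regression of realized on expected excess returns."""
    realized = np.asarray(realized, dtype=float)
    expected = np.asarray(expected, dtype=float)
    design = np.column_stack([np.ones_like(expected), expected])
    coef, *_ = np.linalg.lstsq(design, realized, rcond=None)
    residual = realized - design @ coef
    variance = np.mean((realized - realized.mean()) ** 2)
    return 1.0 - np.mean(residual ** 2) / variance
```

Because OLS picks the intercept and slope that minimize the mean squared error, while the FVE fixes them at zero and one, `regression_r2` can never fall below `fraction_of_variance_explained` for the same inputs. Unlike the R², the FVE can be negative when the predictor does worse than the sample mean, as in some A1(3) entries of Table V.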
The R²s from this regression, shown in the second to last column of Table V, are an upper bound for the FVE of any affine model for which expected excess returns are spanned by yields, for example, the Cochrane and Piazzesi (2005) factor.16 The FVEs of the nonlinear model are equal to or higher than the explanatory power of the Cochrane–Piazzesi factor. This implies that no affine model without hidden risk premium factors (see discussion below) can explain more of the variation in realized excess returns than the nonlinear model. The last column of Table V shows that the explanatory power of any estimator for expected excess returns that is spanned by yields and their variances is lower than the FVE of our nonlinear model.

4.1.a. Unspanned Risk Premia

Substantial empirical evidence shows that a part of excess bond returns is explained by macro factors not spanned by linear combinations of yields.17 For example, Bauer and Rudebusch (2017) find that the R² when regressing realized excess returns on the first three PCs of yields along with expected inflation is 85% higher than when regressing on just the first three PCs.18 We refer to this empirical finding as Unspanned Risk Premia or URP. To quantitatively capture URP in a term structure model, Duffee (2011b); Joslin, Priebsch, and Singleton (2014); and Chernov and Mueller (2012) use five-factor Gaussian models. The reason for using five factors is that three factors are needed to explain the cross-section of bond yields and then one or two factors orthogonal to the yield curve explain expected excess returns. An alternative explanation for the spanning puzzle that has not been explored in the literature is that there is a nonlinear relation between yields and expected excess returns. We therefore ask the question: are nonlinearities empirically important for understanding the spanning puzzle?
To answer the question, we start by regressing model-implied 1-year expected excess returns on the first PC, the first and second PC, …, and all five PCs of model-implied yields for the sample period 1961:07–2014:04. Specifically, for all bond maturities τ = 2,3,4,5 we run the in sample URP regressions

$$E_t\left[rx_{t,t+1}^{\tau}\right] = \alpha^{\tau,1:n} + \sum_{i=1}^{n} \beta^{\tau,1:n}\, \mathrm{PC}_{i,t} + \varepsilon_t^{\tau,1:n}, \quad \forall\, n = 1,2,3,4,5, \tag{49}$$

where $\mathrm{PC}_{i,t}$ denotes the i-th PC of all five yields (ordered by decreasing contribution to the total variation in yields). The in sample R²s of these regressions are reported in Panels B, C, and D of Table VI. Panels C and D show that by construction the first three PCs explain all the variation in expected excess returns in the A1(3) and A0(3) model since expected excess returns are linear functions of yields in affine models. Panel B shows that the first three PCs explain on average 69.8% of the variation of expected excess returns in the nonlinear model. That is, almost one-third of the variation of expected excess returns is due to a nonlinear relation between expected excess returns and yields in sample.

Table VI. URP regressions. This table shows R²s (in percent) from regressions of excess returns on the five PCs of yields. Panel A shows R²s from regressions of 1-year actual realized excess returns on PCs of actual yields based on the sample 1961:07–2014:04. Panels B, C, and D show for each model in sample R²s from regressions of model-implied 1-year excess returns on model-implied PCs of yields. Panels E, F, and G show for each model population R²s from regressions of realized 1-year excess returns on PCs of yields based on a simulated data sample of 1,000,000 months. The final column of Panels E–G shows the R²s when using the model-implied excess return instead of the model-implied PCs as independent variable.
| Maturity | PC1 | PC1−PC2 | PC1−PC3 | PC1−PC4 | PC1−PC5 | $E_t[rx_{t,t+1}^{\tau}]$ |
|---|---|---|---|---|---|---|
| Panel A: R² in data (1961–2014) | | | | | | |
| τ=2 | 2.1 | 12.6 | 13.2 | 14.4 | 14.6 | |
| τ=3 | 0.8 | 13.9 | 14.3 | 15.9 | 16.2 | |
| τ=4 | 0.3 | 15.6 | 15.8 | 17.8 | 18.1 | |
| τ=5 | 0.1 | 17.2 | 17.3 | 19.7 | 19.9 | |
| Panel B: In sample R² for nonlinear three-factor model | | | | | | |
| τ=2 | 5.7 | 64.9 | 67.5 | 85.3 | 91.2 | |
| τ=3 | 4.6 | 67.7 | 69.1 | 84.8 | 90.8 | |
| τ=4 | 4.2 | 69.7 | 70.6 | 84.8 | 90.7 | |
| τ=5 | 4.4 | 71.0 | 72.0 | 85.4 | 90.9 | |
| Panel C: In sample R² for A1(3) model | | | | | | |
| τ=2 | 10.8 | 99.8 | 100.0 | | | |
| τ=3 | 10.5 | 99.7 | 100.0 | | | |
| τ=4 | 10.2 | 99.6 | 100.0 | | | |
| τ=5 | 9.9 | 99.5 | 100.0 | | | |
| Panel D: In sample R² for A0(3) model | | | | | | |
| τ=2 | 5.3 | 99.6 | 100.0 | | | |
| τ=3 | 1.4 | 99.9 | 100.0 | | | |
| τ=4 | 0.2 | 100.0 | 100.0 | | | |
| τ=5 | 0.0 | 99.6 | 100.0 | | | |
| Panel E: Population R² for nonlinear three-factor model | | | | | | |
| τ=2 | 0.0 | 10.7 | 14.5 | 15.7 | 15.8 | 28.0 |
| τ=3 | 0.1 | 10.5 | 14.4 | 15.2 | 15.3 | 26.2 |
| τ=4 | 0.1 | 10.6 | 14.5 | 15.2 | 15.3 | 25.3 |
| τ=5 | 0.1 | 11.1 | 14.7 | 15.6 | 15.6 | 25.2 |
| Panel F: Population R² for A1(3) model | | | | | | |
| τ=2 | 3.9 | 4.5 | 4.5 | 4.5 | 4.5 | 4.5 |
| τ=3 | 3.9 | 4.5 | 4.5 | 4.5 | 4.5 | 4.5 |
| τ=4 | 3.9 | 4.5 | 4.5 | 4.5 | 4.5 | 4.5 |
| τ=5 | 3.9 | 4.5 | 4.5 | 4.5 | 4.5 | 4.5 |
| Panel G: Population R² for A0(3) model | | | | | | |
| τ=2 | 0.4 | 9.5 | 9.6 | 9.6 | 9.6 | 9.6 |
| τ=3 | 0.1 | 9.8 | 9.8 | 9.8 | 9.8 | 9.8 |
| τ=4 | 0.0 | 10.6 | 10.6 | 10.6 | 10.6 | 10.6 |
| τ=5 | 0.0 | 11.7 | 11.7 | 11.7 | 11.7 | 11.7 |

Empirically, realized excess returns are invariably used in lieu of expected excess returns as dependent variable. Hence, for all bond maturities τ = 2,3,4,5 we run the URP regressions

$$rx_{t,t+1}^{\tau} = \alpha^{\tau,1:n} + \sum_{i=1}^{n} \beta^{\tau,1:n}\, \mathrm{PC}_{i,t} + \text{residual}, \quad \forall\, n = 1,2,3,4,5. \tag{50}$$

Panel A in Table VI shows R²s from regressions of realized excess returns on PCs of actual yields in the data based on the sample period 1961:07–2014:04. To check whether each model can match the actual R² from the URP regression, we simulate a sample path of 1,000,000 months for 2- to 5-year excess bond returns and 1- to 5-year bond yields and compare the model implied URP regression R²s to those observed in the data. Panels E, F, and G show the population R² for the nonlinear, A1(3), and A0(3) model, respectively. In contrast to both affine models the population R² in the nonlinear model is largely in line with the actual R² observed in the data.

The final column in Panels E, F, and G shows the population R² when we replace the model implied PCs in URP regression (50) with the model implied expected excess return, that is,

$$rx_{t,t+1}^{\tau} = \alpha^{\tau} + \beta^{\tau} E_t\left[rx_{t,t+1}^{\tau}\right] + \text{residual}, \quad \forall\, \tau = 2,3,4,5. \tag{51}$$

In the nonlinear model, the average (over all bond maturities) population R² in regression (51) is 81% higher than in regression (50) when n = 3, that is, 26.2% versus 14.5%. This implies that if there is a macro variable that perfectly tracks expected excess returns, average R²s when regressing realized excess returns on the first three PCs and this macro factor would be 81% higher than when regressing on just the first three PCs, similar to the incremental R² documented in Bauer and Rudebusch (2017). Of course, this is not because this macro factor contains any information not in the yield curve.

Is it plausible that macro factors (partially) pick up nonlinearities? To address this question, we take the in sample residuals from regressing expected excess returns on PCs in the nonlinear model (Panel B in Table VI) and regress them on expected inflation. Specifically, for all bond maturities τ = 2,3,4,5 we run the regression

$$\varepsilon_t^{\tau,1:n} = \alpha^{\tau,n} + \beta^{\tau,n} \pi_t + \text{residual}, \quad \forall\, n = 3,4,5, \tag{52}$$

where $\varepsilon_t^{\tau,1:n}$ is the residual from URP regression (49) and $\pi_t$ is an estimator for expected inflation that is based on the Michigan Survey of Consumers (MSC).19 Table VII shows the R², slope coefficient, and 12-lag Newey–West corrected t-statistics of regression (52). Expected inflation explains about 11% of the variation in sample URP residuals based on the first three PCs and it is statistically significant at the 5% level. The R²s increase to slightly less than 20% when adding the fourth PC. Expected inflation remains statistically significant even when considering in sample URP residuals based on all five PCs. Hence, although all information about expected excess returns is contained in the yield curve, expected inflation appears to contain information about them when running linear regressions.

Table VII.
URP regression residuals and expected inflation. We first run a regression of model-implied 1-year expected excess returns on the PCs of model-implied yields, $E_t[rx_{t,t+1}^{\tau}] = \alpha^{\tau,1:n} + \sum_{i=1}^{n} \beta^{\tau,1:n}\, \mathrm{PC}_{i,t} + \varepsilon_t^{\tau,1:n}$; n = 3,4,5. Then we run a regression of the residual of this regression on expected inflation, $\pi_t$, measured by the cross-sectional average forecasts of the Michigan Surveys of Consumers (MSC), $\varepsilon_t^{\tau,1:n} = \alpha^{\tau,n} + \beta^{\tau,n}\pi_t + \text{residual}$, n = 3,4,5. This table shows the R² (in percent), slope coefficient, and the t-statistic from the second regression. Expected excess returns are measured as the expected 1-year bond return in excess of the 1-year yield. Standard errors are Newey and West (1987) corrected using twelve lags. The data sample is 1978:01–2014:04 as MSC is not available at monthly frequencies before 1978.

| Maturity | PC1−PC3 R² | PC1−PC3 Slope | PC1−PC3 t-Statistic | PC1−PC4 R² | PC1−PC4 Slope | PC1−PC4 t-Statistic | PC1−PC5 R² | PC1−PC5 Slope | PC1−PC5 t-Statistic |
|---|---|---|---|---|---|---|---|---|---|
| τ=2 | 11.52 | −0.0011 | −1.97 | 20.56 | −0.0011 | −4.27 | 17.76 | −0.0008 | −3.88 |
| τ=3 | 11.66 | −0.0020 | −2.09 | 19.55 | −0.0020 | −4.00 | 16.44 | −0.0015 | −3.63 |
| τ=4 | 11.56 | −0.0027 | −2.17 | 18.65 | −0.0027 | −3.79 | 15.28 | −0.0020 | −3.39 |
| τ=5 | 11.09 | −0.0033 | −2.15 | 17.78 | −0.0032 | −3.59 | 14.17 | −0.0024 | −3.17 |

Overall, our nonlinear model highlights an alternative channel that helps explain the spanning puzzle: expected excess returns are nonlinearly related to yields and therefore a part of expected excess returns appears to be "hidden" from a linear combination of yields and this part can be picked up by macro factors.
This is achieved in a parsimonious three-factor model rather than a five-factor model as is common in the literature.

4.2 Stochastic Volatility

Table VIII shows that there is more than one factor in realized yield variances in our data: the first PC of yield variances explains 94.5% of the variation while the first two PCs explain 99.2%. The A1(3) model has by definition only one factor explaining volatilities and therefore the first PC explains all the variation in model-implied realized variances.20 In the nonlinear model, the first PC explains 97.5% of the variation in model-implied variances and the first two PCs explain 99.9%. Hence, yield variances in the nonlinear model exhibit a linear multi-factor structure as in the data.

Table VIII. PC analysis of realized yield variances. PCs are constructed from a panel of realized yield variances of constant-maturity zero-coupon bond yields with maturities ranging from 1 to 5 years. The contributions of the first PC, the first and second PC, and the first, second, and third PC to the total variation in the five realized yield variances are shown for the data, the nonlinear model, and the A1(3) model. Actual PC contributions are computed using monthly realized variance data (based on daily squared yield changes) from 1961:07 to 2014:04 obtained from Gurkaynak, Sack, and Wright (2007). Population PC contributions for the nonlinear and A1(3) model are computed using monthly realized variance data (based on daily squared yield changes) based on one simulated sample path of 1,000,000 months.

| | PC1 | PC1−PC2 | PC1−PC3 |
|---|---|---|---|
| Data | 0.9454 | 0.9922 | 0.9996 |
| Nonlinear model | 0.9750 | 0.9993 | 1.0000 |
| A1(3) model | 1.0000 | 1.0000 | 1.0000 |

The nonlinear and A1(3) model also have significantly different distributions of future yield volatility.
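As an aside on Table VIII, cumulative PC contributions of the kind reported there can be computed along the following lines. This is a sketch under assumptions: the function name `cumulative_pc_contributions` is hypothetical, and the PCs are taken from the sample covariance matrix of the variance panel, which the text does not specify (a correlation-based decomposition would give different numbers).

```python
import numpy as np


def cumulative_pc_contributions(panel):
    """Share of total variation explained by the first k PCs, for k = 1..N.

    panel: array of shape (T, N), e.g. T months of N realized yield variances.
    Assumption: PCs come from the sample covariance matrix of the panel.
    """
    panel = np.asarray(panel, dtype=float)
    covariance = np.cov(panel, rowvar=False)
    # eigvalsh returns eigenvalues in ascending order; reverse to descending.
    eigenvalues = np.linalg.eigvalsh(covariance)[::-1]
    return np.cumsum(eigenvalues) / np.sum(eigenvalues)
```

For a panel driven by a single factor, as in the A1(3) model, the first entry is one up to numerical error; a second independent source of variation lowers the first entry and pushes the second toward one, which is the pattern Table VIII reports for the data and the nonlinear model.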
Figure 4 shows the 1-year ahead conditional distribution of the instantaneous yield volatility for the bond with 3 years to maturity (the distributions for bonds with other maturities are similar).21 The volatility is a linear function of only one factor in the A1(3) model and the distribution of future volatility is fairly symmetric and does not change much over time. In the nonlinear model, volatility is a nonlinear function of three factors and the volatility distribution takes on a variety of shapes that persist over time.

Figure 4. Distribution of 1-year ahead yield volatility. The graphs show quantiles in the 1-year ahead distribution of instantaneous volatility for the bond with a maturity of 3 years. The top graph shows the distribution in the three-factor nonlinear model, while the bottom graph shows the distribution in the three-factor A1(3) model. The data sample is 1961:07 to 2014:04 and the results for July in each year are plotted.

The 97.5% quantiles of the 1-year ahead volatility distribution in the nonlinear model show that the market did not anticipate the possibility of very volatile yields before the monetary experiment in the early 80s, apart from brief periods around the 1970s recessions. However, there is a significant probability of a high yield volatility scenario since the 80s, despite the fact that volatilities have come down to levels similar to those in the 60s and 70s. It is only in the calm 2005–06 period that a high-volatility scenario was unlikely.
This finding suggests that there is information about the risk of a high-volatility regime in Treasury bond data, which is similar to the appearance of the smile in equity options since the stock market crash of 1987. Figure 5 shows the 97.5% quantiles of the 1-year ahead distribution of yield volatility for sample periods with (1961:07–2014:04) and without (1987:08–2014:04) the early 80s. There is a fat right tail in the volatility distribution in both cases and hence the nonlinear model captures the risk of a strong increase in volatility, even when such an event is not in the sample used to estimate the model.

Figure 5. Distribution of 1-year ahead yield volatility for the nonlinear model estimated using the period 1961–2014 and estimated using the period 1987–2014. The graphs show the 97.5% quantiles in the 1-year ahead distribution of instantaneous volatility. The red line shows the 97.5% quantiles in the three-factor nonlinear model, where the model is estimated by using data for the whole sample period 1961–2014. The yellow line shows the 97.5% quantiles in the three-factor nonlinear model, where the model is estimated by using data for the period 1987–2014. The results for September in each year are plotted.
The regime-switching models of Dai, Singleton, and Yang (2007); Bansal and Zhou (2002); and Bansal, Tauchen, and Zhou (2004) capture time variation in the probabilities of high-volatility regimes by adding a state variable that picks up the regime. However, if a high-volatility regime is not in the sample used to estimate the model, then the regimes in the model will pick up minor variations in volatility (see the discussion in Dai, Singleton, and Yang, 2007). Everything works through nonlinearities in our model and therefore the probability of a high-volatility regime can be pinned down in a sample that does not include such an episode.

4.2.a. Unspanned Stochastic Volatility

There is a large literature suggesting that interest rate volatility risk cannot be hedged by a portfolio consisting solely of bonds, a phenomenon referred to by Collin-Dufresne and Goldstein (2002) as Unspanned Stochastic Volatility or USV. The empirical evidence supporting USV typically comes from a low $R^2$ when regressing measures of volatility on interest rates. For instance, Collin-Dufresne and Goldstein (2002) regress straddle returns on changes in swap rates and document $R^2$s as low as 10%. Similarly, Andersen and Benzoni (2010) regress yield variances (measured using high-frequency data) on the first six PCs of yields and find low $R^2$s. Inconsistent with this evidence, standard affine models produce high $R^2$s in USV regressions because there is a linear relation between yield variances and yields in the model.

The nonlinear model provides an alternative explanation for low $R^2$s in USV regressions because the relation between yield variances and yields is nonlinear. However, it is an empirical question whether nonlinearities in the model are strong enough to produce $R^2$s similar to those found in the data. To answer this question, we follow Andersen and Benzoni (2010) and regress realized yield variance on PCs of yields.
Specifically, for each bond maturity $\tau=1,2,3,4,5$ and number $n=1,2,3,4,5$ of PCs we run the following USV regression in the data:

$$rv_t^{\tau}=\alpha^{\tau}+\sum_{i=1}^{n}\beta_i^{\tau}PC_{i,t}+\varepsilon_t^{\tau}, \tag{53}$$

where, as in the previous section, $PC_{i,t}$ denotes the $i$-th PC of all five yields (ordered by decreasing contribution to the total variation in yields). The $R^2$s of these USV regressions in the data are reported in Panel A of Table IX. The average $R^2$ when regressing realized variance on the first three PCs is 32.4%, confirming that the PCs of yields only explain a fraction of the variation in yield variance in the data.22

Table IX. USV regressions
Panel A shows $R^2$s (in percent) from regressing realized variance on the five PCs of yields. Panel B shows in-sample $R^2$s for the nonlinear model from regressing model-implied instantaneous variance on the PCs of model-implied yields. Panel C shows population $R^2$s for the nonlinear model from regressing monthly realized variance (based on daily model-implied yields) on the PCs of monthly yields (based on averages over daily model-implied yields) based on a sample of 1,000,000 simulated months. Panels D and E show corresponding results for the $A_1(3)$ model, where results for only one maturity are shown because $R^2$s are the same for all maturities. Panel F shows the explanatory power of the PCs of residuals from the USV regressions in Panels A and B.
Maturity          PC1     PC1−PC2   PC1−PC3   PC1−PC4   PC1−PC5

Panel A: R2 in the data (1961–2014)
τ=1               24.3    26.8      35.0      35.7      40.2
τ=2               23.2    24.8      33.7      35.4      41.6
τ=3               21.9    22.8      32.6      35.8      42.5
τ=4               20.3    20.7      31.1      35.9      42.6
τ=5               18.8    18.9      29.6      36.0      42.6

Panel B: In-sample R2 for nonlinear three-factor model
τ=1               21.6    21.8      44.0      47.9      55.1
τ=2               19.1    19.1      42.3      49.2      57.4
τ=3               17.5    17.6      41.8      50.9      59.9
τ=4               16.7    16.8      42.0      52.9      61.7
τ=5               16.9    17.2      42.4      54.6      62.1

Panel C: Population R2 for nonlinear three-factor model
τ=1               31.8    32.7      40.8      46.0      56.9
τ=2               32.8    33.8      40.7      48.6      60.8
τ=3               32.9    34.2      40.1      50.3      63.4
τ=4               32.6    34.4      39.2      51.0      65.0
τ=5               31.9    34.7      38.3      51.0      66.1

Panel D: In-sample R2 for A1(3) model
τ=1,…,5           21.5    22.3      100.0     100.0     100.0

Panel E: Population R2 for A1(3) model
τ=1,…,5           0.0     0.0       45.8      45.8      45.8

Panel F: In-sample PC analysis of USV regression residuals
Data              91.8    98.7      99.9      100.0     100.0
Nonlinear model   97.9    99.9      100.0     100.0     100.0

To assess the ability of the nonlinear model to capture USV, we regress model-implied instantaneous yield variance on the PCs of model-implied yields:

$$v(t,t+\tau)^2=\alpha^{\tau}+\sum_{i=1}^{n}\beta_i^{\tau}PC_{i,t}+\varepsilon_t^{\tau}, \tag{54}$$

where $v(t,t+\tau)$ is given in Equation (29). Panel B shows that the average in-sample $R^2$ from USV regression (54) on the first three PCs ($n=3$) is 42.5%, which is not substantially higher than in the data. In contrast, Panel D shows that in the $A_1(3)$ model the in-sample $R^2$ is 100% once the first three PCs are included in the USV regression (54). Hence, the presence of nonlinearities gives rise to low $R^2$s in USV regressions.

To understand why a significant part of variance is (linearly) unspanned by yields, we recall that Equation (24) shows that the local volatility consists of two components, $\sigma_{lev}$ and $\sigma_{vol}$, and thus the instantaneous yield variance is

$$\sigma(t,T)'\sigma(t,T)=\sigma_{vol}(t,T)'\sigma_{vol}(t,T)+\sigma_{lev}(t,T)'\sigma_{lev}(t,T)+2\sigma_{vol}(t,T)'\sigma_{lev}(t,T). \tag{55}$$

While the average (across maturities) in-sample $R^2$ from regressing the yield variance on the first five PCs of model-implied yields is only 59.2% (see Panel B of Table IX), the average in-sample $R^2$s from regressing each component in Equation (55) on the five PCs of yields are 94.4%, 88.2%, and 94.9%, respectively. Hence, each component is close to being linearly spanned, but they partially offset each other.23 When $P_1(t,T)=P_2(t,T)$, the second and third terms in Equation (55) vanish and volatility is largely spanned. Hence, the fraction of volatility that is unspanned varies significantly over time, consistent with findings in Jacobs and Karoui (2009).

The actual $R^2$s of USV regression (53) reported in Panel A of Table IX are not directly comparable to the in-sample $R^2$s of USV regression (54) reported in Panel B for the nonlinear model and Panel D for the $A_1(3)$ model because realized variance based on daily data is a noisy proxy for yield variance.
To check whether the nonlinear model can quantitatively capture USV in the data, we simulate 1,000,000 months of daily data (with 21 days in each month), compute the monthly realized variance and monthly average yield, and run the same USV regression as in the data, that is, regression (53). Panel C shows that the population $R^2$s for the nonlinear model are very similar to the in-sample $R^2$s of Panel B, where we use instantaneous variance instead of realized variance; that is, the average $R^2$ is 39.8% when including the first three PCs. Hence, our results are robust to taking into account that realized variance based on daily data is a noisy proxy for instantaneous variance. Panel E shows that the average population $R^2$ is 45.8% in the $A_1(3)$ model when regressing realized variance on the first three PCs of yields, which brings the population $R^2$s much closer to the $R^2$s in the data. However, the population $R^2$s when using only one or two PCs in the $A_1(3)$ model are zero, which is strongly at odds with the data.24

Bikbov and Chernov (2009) discuss how measurement error due to microstructure effects such as the bid-ask spread in option and bond prices affects the explanatory power of USV regressions. Collin-Dufresne and Goldstein (2002) argue that measurement error cannot be the reason for low $R^2$s in USV regressions because there is a strong factor structure in the regression residuals across bond maturities. Panel F of Table IX confirms the factor structure in the data because the first PC of the residuals $\varepsilon_t^1,\ldots,\varepsilon_t^5$ of the USV regression (53) explains 91.8% of the total variation in the USV residuals. Similarly, the first PC explains 98% of the variation in the residuals of USV regression (54) implied by the nonlinear model. Hence, our nonlinear model can capture the low explanatory power and the strong residual factor structure of the USV regressions that are observed in the data.
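The mechanics of the USV regressions (53) and (54) can be illustrated with a short sketch. It is not the authors' estimation code: the helper names `principal_components` and `usv_r2` are hypothetical, and the data are a toy simulation in which yields are linear in three Gaussian factors while the "variance" series is either affine or nonlinear in the same factors. The example only shows why linear spanning yields an $R^2$ of one with three PCs, while a nonlinear relation can yield a low $R^2$ even though no additional factor is present.

```python
import numpy as np

def principal_components(yields):
    """PC scores of a (T, N) yield panel, ordered by explained variance."""
    demeaned = yields - yields.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(demeaned, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    return demeaned @ eigvecs[:, order]

def usv_r2(variance, pcs, n):
    """R^2 from OLS of variance on a constant and the first n PCs."""
    X = np.column_stack([np.ones(len(variance)), pcs[:, :n]])
    coef, *_ = np.linalg.lstsq(X, variance, rcond=None)
    resid = variance - X @ coef
    return 1.0 - resid.var() / variance.var()

# Toy data (assumption): three Gaussian factors, five yields linear in them.
rng = np.random.default_rng(1)
T = 20000
factors = rng.standard_normal((T, 3))
yields = factors @ rng.standard_normal((3, 5))
pcs = principal_components(yields)

var_affine = 1.0 + factors @ np.array([0.05, -0.02, 0.01])   # affine in factors
var_nonlinear = np.exp(factors[:, 0]) * factors[:, 1] ** 2    # nonlinear in factors

r2_affine = usv_r2(var_affine, pcs, 3)
r2_nonlinear = usv_r2(var_nonlinear, pcs, 3)
print(f"affine: {r2_affine:.3f}, nonlinear: {r2_nonlinear:.3f}")
```

In this toy setting both variance series depend on the same three factors that drive yields, so the gap between the two $R^2$s is due to nonlinearity alone.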
Collin-Dufresne and Goldstein (2002) introduce knife-edge parameter restrictions in affine models such that the volatility state variable(s) do not affect bond yields, the so-called USV models. The most commonly used USV models, the $A_1(3)$ and $A_1(4)$, have one factor driving volatility and this factor does not affect yields. These models generate zero $R^2$s in the above USV regression in population, inconsistent with the empirical evidence. In contrast, the nonlinear model retains a parsimonious three-factor structure and yet can generate $R^2$s in USV regressions which are broadly in line with those in the data.

4.3 Linearity in the Cross-Section of Yields

The nonlinear bond pricing model allows us to capture the observed time variation in the mean and volatility of excess bond returns. However, Balduzzi and Chiang (2012) show that in the cross-section there is an almost linear relation between yields of different maturities. To check whether the nonlinear model captures the cross-sectional linearity, we follow Duffee (2011a) and determine the PCs of zero-coupon bond yield changes with maturities ranging from 1 to 5 years and regress the yield changes of each bond on a constant and the first three PCs. The results for the data (based on 634 observations) and the three models (based on 1 million simulated observations) are shown in Table X.

Table X. PC analysis of yields
Principal Components (PCs) are constructed from a panel of constant-maturity zero-coupon bond yields with maturities ranging from 1 to 5 years. The contributions of the first PC, the first and second PC, and the first, second, and third PC to the total variation in the five bond yields are shown in Panel A. In Panel B, yields for each bond are then regressed on the first three PCs and a constant (omitted). Actual PC contributions, slope coefficients, and $R^2$s are computed using monthly data of 1- to 5-year zero-coupon bond yields from 1961:07 to 2014:04 obtained from Gurkaynak, Sack, and Wright (2007).
For all three models, population PC contributions, population slope coefficients, and population $R^2$s are based on one simulated sample path of 1,000,000 months.

Panel A: PCs of yields
                    PC1        PC1−PC2    PC1−PC3
Data                99.1909    99.9779    99.9996
Nonlinear model     99.6866    99.9977    100.0000
A1(3) model         99.9738    100.0000   100.0000
A0(3) model         99.3788    99.9819    100.0000

Panel B: Linearity in the cross-section of yields
Maturity    PC1     PC2      PC3      R2

Data (1961–2014)
τ = 1       0.47    −0.72    0.48     1.00
τ = 2       0.46    −0.22    −0.52    1.00
τ = 3       0.45    0.12     −0.46    1.00
τ = 4       0.43    0.36     −0.02    1.00
τ = 5       0.42    0.54     0.54     1.00

Nonlinear three-factor model in population
τ = 1       0.45    −0.67    0.52     1.00
τ = 2       0.45    −0.29    −0.37    1.00
τ = 3       0.45    0.04     −0.52    1.00
τ = 4       0.45    0.33     −0.17    1.00
τ = 5       0.44    0.59     0.54     1.00

A1(3) model in population
τ = 1       0.51    −0.66    0.53     1.00
τ = 2       0.48    −0.21    −0.58    1.00
τ = 3       0.44    0.14     −0.41    1.00
τ = 4       0.41    0.39     0.04     1.00
τ = 5       0.38    0.58     0.46     1.00

A0(3) model in population
τ = 1       0.47    −0.72    0.47     1.00
τ = 2       0.46    −0.21    −0.52    1.00
τ = 3       0.45    0.13     −0.46    1.00
τ = 4       0.43    0.36     −0.01    1.00
τ = 5       0.42    0.54     0.54     1.00

Panel A of Table X shows that the first three PCs describe almost all the variation of bond yield changes in the nonlinear model, which is consistent with the data. Moreover, Panel B of Table X shows that the population loading for each yield on the level, slope, and curvature factor in the nonlinear model is similar to the data. We conclude that the cross-sectional variation of bond yields implied by the nonlinear model is well explained by the first three PCs and no yield breaks this linear relation.

5. One Factor Model: an Illustration

In this section, we estimate a one-factor nonlinear model to highlight the role of nonlinearity in a simple setting. Table XI shows the estimated parameters with asymptotic standard errors (in parentheses) based on the sample period 1961:07–2014:04. Panel A of Figure 6 shows the stochastic weight $s(X)$, defined in Equation (8), over the sample period. The dynamics of $s(X)$ in the one-factor model are similar to the dynamics in the three-factor model (shown in Figure 2), although $s(X)$ moves closer to zero in the three-factor model.

Table XI. Parameter estimates of the one-factor nonlinear model
This table contains parameter estimates and asymptotic standard errors (in parentheses) for the nonlinear one-factor model. The parameter estimates are based on yield and realized variance data for the sample period 1961:07–2014:04. The bond maturities range from 1 to 5 years and the data are obtained from Gurkaynak, Sack, and Wright (2007). The UKF is used to estimate the nonlinear model.
Parameter   Estimate     Standard error
κ           0.04027      (0.03695)
ρ0          0.03061      (0.06164)
ρX          0.01093      (0.0001309)
λ0          −0.6473      (0.5656)
λX          0.05966      (0.03703)
γ           0.01456      (0.03452)
β           −0.4206      (0.003205)
σy          0.003122     (0.0004019)
σrv         0.0001671    (1.161e−05)

Table XII. URP and USV regressions in the one-factor nonlinear model
Panel A shows in-sample $R^2$s from regressions of model-implied 1-year excess returns on the PCs of model-implied yields. Panel B shows in-sample $R^2$s from regressing model-implied instantaneous variance on the PCs of model-implied yields. Model-implied PCs are constructed from a panel of constant-maturity zero-coupon bond yields with maturities ranging from 1 to 5 years. The in-sample results are based on the sample period 1961:07–2014:04.

Maturity    PC1     PC1−PC2   PC1−PC3   PC1−PC4   PC1−PC5

Panel A: In-sample R2 for nonlinear one-factor model (excess returns)
τ=2         13.0    69.3      98.1      98.4      100.0
τ=3         14.4    73.2      98.1      98.4      100.0
τ=4         16.7    76.7      98.1      98.4      100.0
τ=5         19.9    79.9      98.1      98.4      100.0

Panel B: In-sample R2 for nonlinear one-factor model (instantaneous variance)
τ=1         52.0    64.8      96.1      96.8      99.9
τ=2         55.1    68.2      96.6      97.2      99.9
τ=3         57.9    71.3      97.0      97.5      100.0
τ=4         60.4    73.9      97.4      97.8      100.0
τ=5         62.7    76.2      97.7      98.0      100.0

Figure 6. Stochastic weight, yields, volatilities, and excess returns in a one-factor nonlinear model. Panel A shows the estimated stochastic weight on the Gaussian base model for the sample period 1961–2014 and Panel C shows it as a function of the factor X. Panels B and F show yields as a function of the factor X and the 1-year yield, respectively. Panels D and E show yield variance and expected excess returns as a function of the 1-year yield. The parameters for the one-factor nonlinear model are estimated using yields and realized yield variance of zero-coupon Treasury bonds with maturities ranging from 1 to 5 years. The range of X on the x-axis equals the range of X in the sample period 1961–2014.

Panel B of Figure 6 shows bond yields as a function of the state variable X. The relation between yields and X is close to linear for low values of X, while for high values of X the rate of change picks up and yields increase more rapidly with X. The reason is that $s(X)$ starts to move away from one as X increases, as seen in Panel C, and moreover, the speed with which $s(X)$ moves away from one increases for high values of X. Hence, for a given change in X, yields respond more for a high X, which corresponds to a high-yield environment, than for a low X, which corresponds to a low-yield environment.
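The behavior of the stochastic weight described above can be illustrated numerically. The sketch below assumes that $s(X)$ in Equation (8) takes the logistic form $s(X)=1/(1+\gamma e^{-\beta X})$, which is the form implied by Equation (59) with $\alpha=1$ and $n=0$; Equation (8) itself is not reproduced in this section, so this is an assumption. The values of $\gamma$ and $\beta$ are the point estimates from Table XI, while the grid for X is hypothetical and is not the sample range of X. The function name `stochastic_weight` is likewise hypothetical.

```python
import numpy as np

def stochastic_weight(x, gamma, beta):
    """Logistic weight s(X) = 1 / (1 + gamma * exp(-beta * X)), scalar factor."""
    return 1.0 / (1.0 + gamma * np.exp(-beta * x))

gamma, beta = 0.01456, -0.4206      # point estimates from Table XI
grid = np.linspace(0.0, 8.0, 9)     # hypothetical grid for X
s = stochastic_weight(grid, gamma, beta)
slope = s * (1.0 - s) * beta        # analytic derivative ds/dX

for x, w, d in zip(grid, s, slope):
    print(f"X = {x:4.1f}   s(X) = {w:.4f}   ds/dX = {d:+.4f}")
```

Because the estimated $\beta$ is negative, the weight declines from a value close to one as X rises, and on this grid the decline accelerates with X, which is the pattern the text attributes to Panel C of Figure 6.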
Taken together, yield variances must be substantially higher for high-yield environments than for low-yield environments, which Panel D indeed shows. Moreover, the nonlinear relation between yields and their variances shown in Panel D leads to USV. Specifically, Panel B in Table XII shows that the first PC of yields only explains between 52% and 63% of yield variance in sample. In contrast, in any affine one-factor stochastic volatility model the $R^2$ is 100%.

Table XIII. Equilibrium models
The table shows various equilibrium models and how they map into the nonlinear term structure models.

Model                         N   d   X                   α       γ                   β       Stationary
Two trees                     1   2   log(D1(t)/D2(t))    −R      1                   1       No
Multiple consumption goods    1   2   log(D1(t)/D2(t))    −R/b    ((1−φ)/φ)^(1−b)     b       No
External habit formation      1   1   X                   R       1                   β       Yes
Heterogeneous beliefs         1   1   log(λ(t))           R       1                   −1/R    No
HARA utility                  1   1   log(b/C(t))         −R      1                   1       No

Panel E of Figure 6 shows the relation between yields and instantaneous expected excess returns. In a standard affine one-factor model the relation is linear, but we see that in the nonlinear model there is a U-shaped relation. This nonlinearity creates URP in the model. Indeed, Panel A in Table XII shows that the first PC of yields only explains between 13.0% and 19.9% of the variation in expected excess returns. Given the U-shaped relation between excess returns and yields, it is not surprising that the level factor does not have more explanatory power, but it provides a stark contrast to one-factor affine models where the first PC always explains 100%. Finally, Panel F of Figure 6 shows that the relation between the yields themselves is approximately linear.
Thus, although there are significant nonlinear effects in the time series of excess returns and yield volatilities, there is an approximately linear relation between yields in the cross-section, which is consistent with the data.

6. Conclusion

We introduce a new reduced-form term structure model where the short rate and market prices of risk are nonlinear functions of Gaussian state variables and derive closed-form solutions for yields. The nonlinear model with three Gaussian factors matches both the time-variation in expected excess returns and yield volatilities of US Treasury bonds from 1961 to 2014. Because there are nonlinear relations between factors, yields, and variances, the model exhibits features consistent with empirical evidence on URP and USV. We are not aware of any term structure models, in particular a model with only three factors, that have empirical properties consistent with evidence on time-variation in expected excess returns and volatilities, URP, and USV.

Although our empirical analysis has focused on a nonlinear generalization of an affine Gaussian model, it is possible to generalize a wide range of term structure models such as affine models with stochastic volatility and quadratic models. Our generalization introduces new dynamics for bond returns while keeping the new model as tractable as the standard model. Furthermore, the method extends to processes such as jump-diffusions and continuous-time Markov chains. We explore this in Feldhütter, Heyerdahl-Larsen, and Illeditsch (2016).

Appendix A: General Nonlinear Gaussian Model

In this section, we provide closed-form solutions for a more general class of nonlinear term structure models, prove Theorem 1, and relate our results to the class of reduced-form asset pricing models with closed-form solutions discussed in Duffie, Pan, and Singleton (2000) and Chen and Joslin (2012).
A.1 The Stochastic Discount Factor

Let $\gamma$ denote a nonnegative constant and $M_0(t)$ a strictly positive stochastic process with dynamics given in Equation (3). The SDF is defined as

$$M(t)=M_0(t)\left(1+\gamma e^{-\beta'X(t)}\right)^{\alpha}, \tag{56}$$

where $\beta\in\mathbb{R}^d$ and $\alpha\in\mathbb{N}$.

A.2 Closed-Form Bond Prices

We show in the next theorem that the price of a bond is a weighted average of bond prices in artificial economies that belong to the class of essentially affine Gaussian term structure models.

Theorem 2. The price of a zero-coupon bond that matures at time $T$ is

$$P(t,T)=\sum_{n=0}^{\alpha}s_n(t)P_n(t,T), \tag{57}$$

where

$$P_n(t,T)=e^{A_n(T-t)+B_n(T-t)'X(t)}, \tag{58}$$

$$s_n(t)=\frac{\binom{\alpha}{n}\gamma^{n}e^{-n\beta'X(t)}}{\left(1+\gamma e^{-\beta'X(t)}\right)^{\alpha}}. \tag{59}$$

The coefficient $A_n(T-t)$ and the $d$-dimensional vector $B_n(T-t)$ solve the ordinary differential equations given in Equations (10) and (11).

Proof: Using the binomial expansion theorem, the SDF in Equation (56) can be expanded as

$$M(t)=\sum_{n=0}^{\alpha}M_n(t), \tag{60}$$

where

$$M_n(t)=\binom{\alpha}{n}\gamma^{n}e^{-n\beta'X(t)}M_0(t). \tag{61}$$

Each summand can be interpreted as an SDF in an artificial economy.25 The dynamics of the strictly positive stochastic process $M_n(t)$ are

$$\frac{dM_n(t)}{M_n(t)}=-r_n(t)\,dt-\Lambda_n(t)'\,dW(t), \tag{62}$$

where

$$\Lambda_n(t)=\Lambda_0(t)+n\Sigma'\beta, \tag{63}$$

$$r_n(t)=r_0(t)+n\beta'\kappa\left(\bar{X}-X(t)\right)-\frac{n^2}{2}\beta'\Sigma\Sigma'\beta-n\beta'\Sigma\Lambda_0(t). \tag{64}$$

Plugging in for $r_0(t)$ and $\Lambda_0(t)$, it is straightforward to show that $\Lambda_n(t)$ and $r_n(t)$ are affine functions of $X(t)$ with coefficients given in Equations (12)–(15). If $M_n(t)$ is interpreted as an SDF of an artificial economy indexed by $n$, then we know that bond prices in this economy belong to the class of essentially (exponential) affine Gaussian term structure models and hence

$$P_n(t,T)=e^{A_n(T-t)+B_n(T-t)'X(t)}, \tag{65}$$

where the coefficient $A_n(T-t)$ and the $d$-dimensional vector $B_n(T-t)$ solve the ordinary differential equations (10) and (11). Hence, the bond price is

$$P(t,T)=\sum_{n=0}^{\alpha}s_n(t)P_n(t,T), \tag{66}$$

where $s_n(t)$ is given in Equation (59).

Proof of Theorem 1. Set $\alpha=1$ in Theorem 2.
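The weighted-average structure of Theorem 2 can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function names `mixture_weights` and `bond_price` are hypothetical, all numerical inputs are made up, and the coefficients $A_n$ and $B_n$ are passed in as given numbers because the ordinary differential equations (10) and (11) that determine them are not reproduced in this appendix.

```python
import numpy as np
from math import comb

def mixture_weights(x, beta, gamma, alpha):
    """Weights s_n(t), n = 0..alpha, from Equation (59)."""
    z = gamma * np.exp(-beta @ x)
    n = np.arange(alpha + 1)
    binom = np.array([comb(alpha, k) for k in n], dtype=float)
    return binom * z ** n / (1.0 + z) ** alpha

def bond_price(x, beta, gamma, alpha, A, B):
    """Equation (57): P = sum_n s_n * exp(A_n + B_n' x).

    A has length alpha + 1 and B has shape (alpha + 1, d). Both are taken
    as given here; in the paper they solve the ODEs (10)-(11).
    """
    s = mixture_weights(x, beta, gamma, alpha)
    return float(s @ np.exp(A + B @ x))

# Hypothetical inputs, chosen only to exercise the formulas.
x = np.array([0.3, -0.1, 0.2])
beta = np.array([0.5, 0.2, -0.4])
gamma, alpha = 0.8, 3
A = np.array([-0.02, -0.03, -0.04, -0.05])
B = -0.1 * np.ones((alpha + 1, 3))

weights = mixture_weights(x, beta, gamma, alpha)
price = bond_price(x, beta, gamma, alpha, A, B)
print(weights, weights.sum(), price)
```

By the binomial theorem the weights are positive and sum to one, so the bond price lies between the smallest and largest of the artificial-economy prices $P_n(t,T)$.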
A.3 Expected Return and Bond Volatility

Applying Ito's lemma to Equation (56) leads to the dynamics of the SDF:

$$\frac{dM(t)}{M(t)}=-r(t)\,dt-\Lambda(t)'\,dW(t), \tag{67}$$

where

$$r(t)=r_0(t)+\alpha(1-s(t))\beta'\kappa\left(\bar{X}-X(t)\right)-\alpha(1-s(t))\beta'\Sigma\Lambda_0(t)-\frac{\alpha}{2}(1-s(t))\big(\alpha(1-s(t))+s(t)\big)\beta'\Sigma\Sigma'\beta \tag{68}$$

and

$$\Lambda(t)=\Lambda_0(t)+\alpha(1-s(t))\Sigma'\beta. \tag{69}$$

Let $\omega_n(t,T)$ denote the contribution of each artificial exponential affine bond price to the total bond price. Specifically,

$$\omega_n(t,T)=\frac{P_n(t,T)s_n(t)}{P(t,T)}. \tag{70}$$

The dynamics of the bond price $P(t,T)$ are

$$\frac{dP(t,T)}{P(t,T)}=\big(r(t)+\Lambda(t)'\sigma(t,T)\big)\,dt+\sigma(t,T)'\,dW(t), \tag{71}$$

where

$$\sigma(t,T)=\Sigma'\left(\sum_{n=0}^{\alpha}\omega_n(t,T)B_n(T-t)+\beta\left(\sum_{n=0}^{\alpha}n\,\omega_n(t,T)-\alpha(1-s(t))\right)\right). \tag{72}$$

A.4 Link to Reduced-Form Asset Pricing Models

How is this model related to the large literature on reduced-form asset pricing models with closed-form solutions? At first glance it does not seem to be related because the state dynamics of $X(t)$, which are Gaussian under the data-generating or physical measure, are no longer Gaussian under the risk-neutral measure $Q$. Specifically,

$$dX(t)=\big(\kappa\bar{X}-\kappa X(t)-\Sigma\Lambda(t)\big)\,dt+\Sigma\,dW^{Q}(t), \tag{73}$$

where $\Lambda(t)$, given in Equation (69), is a nonlinear function of $X(t)$ and

$$dQ=e^{-\frac{1}{2}\int_0^t\Lambda(a)'\Lambda(a)\,da-\int_0^t\Lambda(a)'\,dW^{P}(a)}\,dP. \tag{74}$$

However, we can compute the state dynamics under the risk-neutral measure in the benchmark model defined as

$$dQ_0=e^{-\frac{1}{2}\int_0^t\Lambda_0(a)'\Lambda_0(a)\,da-\int_0^t\Lambda_0(a)'\,dW^{P}(a)}\,dP, \tag{75}$$

where $\Lambda_0(t)$, which is given in Equation (5), is an affine function of $X(t)$ and thus $X(t)$ is Gaussian under $Q_0$. Specifically,

$$dX(t)=\big(\kappa\bar{X}-\Sigma\lambda_{0,0}-(\kappa+\Sigma\lambda_{0,X})X(t)\big)\,dt+\Sigma\,dW^{Q_0}(t). \tag{76}$$

Define

$$f(X(T))=\frac{\left(1+\gamma e^{-\beta'X(T)}\right)^{\alpha}}{\left(1+\gamma e^{-\beta'X(t)}\right)^{\alpha}} \tag{77}$$

and rewrite the bond price as an expectation under the risk-neutral measure in the benchmark model. Specifically,

$$P(t,T)=E_t\left[\frac{M(T)}{M(t)}\right]=E_t\left[\frac{M_0(T)}{M_0(t)}f(X(T))\right]=E_t^{Q_0}\left[e^{-\int_t^T r_0(a)\,da}f(X(T))\right], \tag{78}$$

where $r_0(t)$, given in Equation (4), is affine in $X(t)$. Duffie, Pan, and Singleton (2000) and Chen and Joslin (2012) show that the expectation in Equation (78) can be solved in closed form if $f(x)=\sum_n(c_n+v_nx)e^{\beta_nx}$, the short rate is affine in $X(t)$, and $X(t)$ is Gaussian under $Q$.
As shown in the proof of Theorem 2, the function $f(X(T))$ can be expanded into the exponential polynomial

$$f(X(T))=\sum_{n=0}^{\alpha}\frac{\binom{\alpha}{n}\gamma^{n}}{\left(1+\gamma e^{-\beta'X(t)}\right)^{\alpha}}e^{-n\beta'X(T)}=\sum_{n=0}^{\alpha}v_ne^{-n\beta'X(T)} \tag{79}$$

using the binomial expansion theorem and hence the bond price is given in closed form.

B. Equilibrium Models

In this section, we show that the functional form of the state price density in Equations (2) and (56) naturally comes out of several equilibrium models.26 We need to allow for state variables that follow arithmetic Brownian motions and hence we rewrite the dynamics of the state vector in Equation (1) in the slightly more general form

$$dX(t)=\big(\theta-\kappa X(t)\big)\,dt+\Sigma\,dW(t), \tag{80}$$

where $\theta$ is $d$-dimensional and $\kappa$ and $\Sigma$ are $d\times d$-dimensional. In what follows, the standard consumption-based asset pricing model with a representative agent with power utility and log-normally distributed consumption will serve as our benchmark model. Specifically, the state price density takes the following form:

$$M_0(t)=e^{-\rho t}C(t)^{-R}, \tag{81}$$

where $R$ is the coefficient of relative risk aversion and $C(t)$ is aggregate consumption with dynamics

$$\frac{dC(t)}{C(t)}=\mu_C\,dt+\sigma_C'\,dW(t). \tag{82}$$

The short rate and the market price of risk are both constant and given by

$$\Lambda_0=R\sigma_C, \tag{83}$$

$$r_0=\rho+R\mu_C-\frac{1}{2}R(R+1)\sigma_C'\sigma_C. \tag{84}$$

Table XIII summarizes the relation between the nonlinear term structure models and the equilibrium models discussed in this section.

B.1 Two Trees

Cochrane, Longstaff, and Santa-Clara (2008) study an economy in which aggregate consumption is the sum of two Lucas trees. In particular, they assume that the dividends of each tree follow a geometric Brownian motion

$$dD_i(t)=D_i(t)\big(\mu_i\,dt+\sigma_i'\,dW(t)\big). \tag{85}$$

Aggregate consumption is $C(t)=D_1(t)+D_2(t)$. There is a representative agent with power utility and risk aversion $R$. Hence, the SDF is

$$M(t)=e^{-\rho t}C(t)^{-R}=e^{-\rho t}\big(D_1(t)+D_2(t)\big)^{-R}=e^{-\rho t}D_1(t)^{-R}\left(1+\frac{D_2(t)}{D_1(t)}\right)^{-R}=M_0(t)\left(1+e^{\log(D_2(t))-\log(D_1(t))}\right)^{-R}, \tag{86}$$

where $M_0(t)=e^{-\rho t}D_1(t)^{-R}$ and $X(t)=\log(D_1(t)/D_2(t))$. Equation (86) has the same form as the SDF in Equation (56) with $\alpha\notin\mathbb{N}$.
Specifically, $\gamma=1$, $\beta=1$, and $\alpha=-R$. Note that in this case the state variable is the log-ratio of two geometric Brownian motions and thus $\kappa=0$. The share $s(X(t))$ and hence yields are not stationary.

B.2 Multiple Consumption Goods

Models with multiple consumption goods and a CES consumption aggregator naturally fall within the functional form of the SDF in Equation (56). Consider a setting with two consumption goods. The aggregate output of each of the two goods is given by

$$dD_i(t)=D_i(t)\big(\mu_i\,dt+\sigma_i'\,dW(t)\big). \tag{87}$$

Assume that the representative agent has the following utility over aggregate consumption $C$:

$$u(C,t)=e^{-\rho t}\frac{1}{1-R}C^{1-R}, \tag{88}$$

where

$$C(C_1,C_2)=\left(\phi^{1-b}C_1^{b}+(1-\phi)^{1-b}C_2^{b}\right)^{\frac{1}{b}}. \tag{89}$$

We use the aggregate consumption bundle as numeraire, and consequently the state price density is

$$M(t)=e^{-\rho t}C(t)^{-R}=\phi^{\frac{bR}{1-b}}e^{-\rho t}D_1(t)^{-R}\left(1+\left(\frac{1-\phi}{\phi}\right)^{1-b}\left(\frac{D_2(t)}{D_1(t)}\right)^{b}\right)^{-\frac{R}{b}}. \tag{90}$$

After normalizing, Equation (90) has the same form as the SDF in Equation (56) with $\alpha\notin\mathbb{N}$. Specifically, $X(t)=\log(D_1(t)/D_2(t))$, $\gamma=\left(\frac{1-\phi}{\phi}\right)^{1-b}$, $\beta=b$, and $\alpha=-\frac{R}{b}$. As in the case with Two Trees, the share $s(X(t))$ and hence yields are not stationary.

B.3 External Habit Formation

The utility function in Campbell and Cochrane (1999) is

$$U(C,H)=e^{-\rho t}\frac{1}{1-R}(C-H)^{1-R}, \tag{91}$$

where $H$ is the habit level. Rather than working directly with the habit level, Campbell and Cochrane (1999) define the surplus consumption ratio $s=\frac{C-H}{C}$. The SDF is

$$M(t)=e^{-\rho t}C(t)^{-R}s(t)^{-R} \tag{92}$$

$$=M_0(t)s(t)^{-R}. \tag{93}$$

Define the state variable

$$dX(t)=\kappa\big(\bar{X}-X(t)\big)\,dt+b\,dW(t), \tag{94}$$

where $\kappa>0$, $\sigma_c>0$, and $b>0$. Now let $s(t)=\frac{1}{1+e^{-\beta X(t)}}$. Note that $s(t)$ is between 0 and 1. In particular, $s(t)$ follows

$$ds(t)=s(t)\big(\mu_s(t)\,dt+\sigma_s(t)\,dW(t)\big), \tag{95}$$

where

$$\mu_s(t)=(1-s(t))\left(\beta\kappa\big(\bar{X}-X(t)\big)+\frac{1}{2}(1-2s(t))\beta^2b^2\right), \tag{96}$$

$$\sigma_s(t)=(1-s(t))\beta b. \tag{97}$$

The functional form of the surplus consumption ratio differs from Campbell and Cochrane (1999).
However, note that the surplus consumption ratio is locally perfectly correlated with consumption shocks, mean-reverting, and bounded between 0 and 1, just as in Campbell and Cochrane (1999). The state price density can be written as

$$M(t)=M_{0}(t)\left(1+e^{-\beta X(t)}\right)^{R}. \tag{98}$$

The above state price density has the same form as Equation (56) with parameters $\gamma=1$, $\beta=\beta$, and $\alpha=R$. Note that the state variable $X$ in this case is mean-reverting and therefore the share $s(X(t))$ and hence yields are stationary.

B.4 Heterogeneous Beliefs

Consider an economy with two agents that have different beliefs. Let both agents have power utility with the same coefficient of relative risk aversion, $R$. Moreover, assume that aggregate consumption follows the dynamics in Equation (82). The agents do not observe the expected growth rate and agree to disagree.²⁷ The equilibrium can be solved by forming the central planner problem with stochastic weight $\lambda$ that captures the agents' initial relative wealth and their differences in beliefs (see, e.g., Basak (2000)),

$$U(C,\lambda)=\max_{\{C_{1}+C_{2}=C\}}\left(\frac{1}{1-R}\,C_{1}^{1-R}+\lambda\,\frac{1}{1-R}\,C_{2}^{1-R}\right). \tag{99}$$

Solving the above problem leads to the optimal consumption of the agents

$$C_{1}(t)=s(t)\,C(t), \tag{100}$$

$$C_{2}(t)=\left(1-s(t)\right)C(t), \tag{101}$$

where $s(t)=\frac{1}{1+\lambda(t)^{1/R}}$ is the consumption share of the first agent and $C$ is aggregate consumption. The state price density as perceived by the first agent is

$$M(t)=e^{-\rho t}C_{1}(t)^{-R}=e^{-\rho t}C(t)^{-R}s(t)^{-R}=M_{0}(t)\left(1+e^{\frac{1}{R}\log(\lambda(t))}\right)^{R}. \tag{102}$$

This has the same form as Equation (56) with $X(t)=\log(\lambda(t))$, $\gamma=1$, $\beta=-\frac{1}{R}$, and $\alpha=R$. The dynamics of the state variable are driven by the log-likelihood ratio of the two agents and consequently the share $s(X(t))$ and hence yields are not stationary.

B.5 HARA Utility

Consider a pure exchange economy with a representative agent with utility $u(t,C)=\frac{e^{-\rho t}}{1-R}\,(C+b)^{1-R}$, where $R>0$ and $b>0$. Since marginal utility is $e^{-\rho t}(C+b)^{-R}$, we can write the SDF as

$$M(t)=e^{-\rho t}\left(C(t)+b\right)^{-R}=e^{-\rho t}C(t)^{-R}\left(1+\frac{b}{C(t)}\right)^{-R}=M_{0}(t)\left(1+e^{\log(b)-\log(C(t))}\right)^{-R}. \tag{103}$$

Equation (103) has the same form as the SDF in Equation (56) with $\alpha\notin\mathbb{N}$. Specifically, $X(t)=\log\left(C(t)/b\right)$, so that $e^{-X(t)}=b/C(t)$, $\gamma=1$, $\beta=1$, and $\alpha=-R$. Similarly to the models with Two Trees and multiple consumption goods, the share $s(X(t))$ and hence yields are nonstationary, as the ratio $b/C(t)$ will eventually converge to zero or infinity depending on the expected growth in the economy.

C. Gauss–Hermite Quadrature

While bond prices and bond yields are given in closed form, conditional moments of yields and bond returns are not. However, it is straightforward to calculate conditional expectations using Gauss–Hermite quadrature because the state vector $X(t)$ is Gaussian.²⁸ In this section, we illustrate how to calculate the expectation of a function of Gaussian state variables.

Let $\mu_{X}$ and $\Sigma_{X}$ denote the conditional mean and variance of $X(u)$ at time $t<u$. Let $f(X(u))$ be a function of the state vector at time $u$. For instance, if you want to calculate at time $t$ the $n$-th uncentered moment of the bond yield with maturity $\tau$ at time $u$, then $f(X(u))=\left(y^{(\tau)}(X(u))\right)^{n}$. Hence, the conditional expectation of $f(X(u))$ at time $t$ is

$$E_{t}\left[f(X(u))\right]=\int_{\mathbb{R}^{d}}f(x)\,\frac{1}{\left((2\pi)^{d}\,|\Sigma_{X}|\right)^{0.5}}\,e^{-\frac{1}{2}(x-\mu_{X})'\Sigma_{X}^{-1}(x-\mu_{X})}\,dx. \tag{104}$$

Define $y=\left(\sqrt{2}\,\sigma_{X}\right)^{-1}(x-\mu_{X})$, where $\sigma_{X}$ is determined by the Cholesky decomposition $\Sigma_{X}=\sigma_{X}\sigma_{X}'$. Hence, we can write Equation (104) as

$$\pi^{-\frac{d}{2}}\int_{\mathbb{R}^{d}}f\left(\sqrt{2}\,\sigma_{X}y+\mu_{X}\right)e^{-y'y}\,dy. \tag{105}$$

Let $g(y)=f\left(\sqrt{2}\,\sigma_{X}y+\mu_{X}\right)$. We set $d=3$ in the empirical section of the paper and thus the integral in Equation (105) can be approximated by the $n$-point Gauss–Hermite quadrature

$$\int_{\mathbb{R}^{d}}f\left(\sqrt{2}\,\sigma_{X}y+\mu_{X}\right)e^{-y'y}\,dy\approx\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}w_{i}w_{j}w_{k}\,g\left(y_{1}(i),y_{2}(j),y_{3}(k)\right), \tag{106}$$

where $w_{i}$ are the weights and $y_{l}(i)$ are the nodes for the $n$-point Gauss–Hermite quadrature for $i=1,\ldots,n$ and $l=1,\ldots,3$. We use $n=4$ in Equation (106).
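The product rule in Equations (104) to (106) can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not the authors' implementation: the function name `gauss_hermite_expectation` and the argument names `mu` and `Sigma` are hypothetical, and the nodes and weights are taken from NumPy's `numpy.polynomial.hermite.hermgauss`, which uses the weight function $e^{-y^{2}}$ assumed in Equation (105).

```python
import itertools

import numpy as np


def gauss_hermite_expectation(f, mu, Sigma, n=4):
    """Approximate E[f(X)] for X ~ N(mu, Sigma) with an n-point product rule.

    Hypothetical helper following Equations (104)-(106); not from the paper.
    """
    mu = np.asarray(mu, dtype=float)
    d = mu.size
    # One-dimensional nodes and weights for the weight function exp(-y^2).
    nodes, weights = np.polynomial.hermite.hermgauss(n)
    # Lower-triangular Cholesky factor sigma with Sigma = sigma sigma'.
    sigma = np.linalg.cholesky(np.asarray(Sigma, dtype=float))
    total = 0.0
    # Loop over all n**d combinations of one-dimensional nodes.
    for idx in itertools.product(range(n), repeat=d):
        pick = list(idx)
        y = nodes[pick]
        w = np.prod(weights[pick])
        # Change of variables x = sqrt(2) * sigma * y + mu from Equation (105).
        total += w * f(np.sqrt(2.0) * sigma @ y + mu)
    # Normalizing constant pi^(-d/2) from Equation (105).
    return total / np.pi ** (d / 2)
```

With $d=3$ and $n=4$ the sum has $4^{3}=64$ terms. Because an $n$-point Gauss–Hermite rule integrates polynomials of degree up to $2n-1$ exactly in each coordinate, a simple sanity check is to pass a quadratic such as `lambda x: (b @ x) ** 2` and compare the result with the closed form $b'\Sigma_{X}b+(b'\mu_{X})^{2}$.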
Footnotes

1 Although the literature is too large to cite in full, examples include Campbell and Shiller (1991) and Cochrane and Piazzesi (2005) on time-varying excess returns, Duffee (2011b) and Joslin, Priebsch, and Singleton (2014) on unspanned expected excess returns, Jacobs and Karoui (2009) and Collin-Dufresne, Goldstein, and Jones (2009) on time-varying volatility, and Collin-Dufresne and Goldstein (2002) and Andersen and Benzoni (2010) on Unspanned Stochastic Volatility.

2 Dai and Singleton (2002) and Tang and Xia (2007) find that the only affine three-factor model that can capture time-variation in expected excess returns is the Gaussian model that has no stochastic volatility. Duffee (2011b), Wright (2011), and Joslin, Priebsch, and Singleton (2014) capture unspanned expected excess returns in four- and five-factor affine models that have no stochastic volatility. Unspanned Stochastic Volatility is typically modeled by adding additional factors to the standard three factors (Collin-Dufresne, Goldstein, and Jones, 2009; Creal and Wu, 2015). See also Dai and Singleton (2003) and Duffee (2010) and the references therein.

3 See Ludvigson and Ng (2009), Cooper and Priestley (2009), Cieslak and Povala (2015), Duffee (2011b), Joslin, Priebsch, and Singleton (2014), Chernov and Mueller (2012), and Bauer and Rudebusch (2017).

4 Papers on this topic include Collin-Dufresne and Goldstein (2002), Heidari and Wu (2003), Fan, Gupta, and Ritchken (2003), Li and Zhao (2006), Carr, Gabaix, and Wu (2009), Andersen and Benzoni (2010), Bikbov and Chernov (2009), Joslin (2014), and Creal and Wu (2015).

5 Collin-Dufresne and Goldstein (2002) introduce knife-edge parameter restrictions in affine models such that volatility state variable(s) do not affect bond pricing, the so-called USV models. The most commonly used USV models, the A1(3) and A1(4) USV models, have one factor driving volatility and this factor is independent of yields. These models generate zero $R^2$s in USV regressions, which is inconsistent with the empirical evidence.

6 It is also possible to combine the general exponential-type SDF in our paper with the affine-type SDF in Filipovic, Larsson, and Trolle (2015) to get an exponential polynomial-type SDF similar to the setting of Chen and Joslin (2012).

7 Constantinides (1992), Rogers (1997), Gabaix (2009), Carr, Gabaix, and Wu (2009), and Filipovic, Larsson, and Trolle (2015) also specify the functional form of the SDF directly and provide closed-form solutions for bond prices.

8 Chan et al. (1992), Ait-Sahalia (1996a, 1996b), Stanton (1997), Pritsker (1998), Chapman and Pearson (2000), Ang and Bekaert (2002), and Jones (2003) study the nonlinearity of the short rate. Jermann (2013) and Richard (2013) study nonlinear term structure models, but they do not obtain closed-form solutions for bond prices.

9 If $\lambda_{0,X}$ and $\kappa$ are zero, then $\sigma_{0}(T-t)=\sigma_{1}(T-t)$.

10 The instantaneous volatility of the bond yield is $\frac{1}{\tau}v(t,t+\tau)$.

11 We calculate yield volatility by dividing price volatility by the bond duration. We calculate bond duration in two steps. We first find the coupon that makes the present value of a five-year bond's cash flow equal to the at-the-money price of the underlying bond the option is written on (available from Datastream). We then calculate the modified duration of this bond.

12 We choose to keep the estimation as parsimonious as possible by letting $\sigma_{rv}$ be the same for all realized variances. An alternative is to use the theoretical result in Barndorff-Nielsen and Shephard (2002) that the variance of the measurement noise is approximately two times the square of the spot variance and allow for different measurement errors across bond maturity.

13 Alan Greenspan became chairman of the Fed on August 11, 1987.

14 Moments of yields and returns in the nonlinear model are easily calculated using Gauss–Hermite quadrature; see Appendix C for details. In the rest of the paper we use Gauss–Hermite quadrature when we do not have closed-form solutions for expectations or variances.

15 Almeida, Graveline, and Joslin (2011) refer to this measure as a modified $R^2$.

16 The average $R^2$ from regressing excess returns onto yields for a 1-year holding horizon is 17%, which is lower than the 37% reported in Cochrane and Piazzesi (2005). There are two reasons for this. First, the data sets are different. If we use the Fama–Bliss data, then the average $R^2$ increases to 25%. Second, Cochrane and Piazzesi (2005) use the period 1964–2003 and $R^2$s are lower outside this sample period as documented in Duffee (2012).

17 See Ludvigson and Ng (2009), Cooper and Priestley (2009), Cieslak and Povala (2015), Duffee (2011b), Joslin, Priebsch, and Singleton (2014), and Chernov and Mueller (2012). Bauer and Rudebusch (2017) argue that this evidence can be explained by measurement error.

18 The $R^2$ is 0.36 in the former and 0.195 in the latter; see Table 3 in Bauer and Rudebusch (2017). Joslin, Priebsch, and Singleton (2014) present similar evidence.

19 Expected inflation is measured as the cross-sectional average of one-year ahead price growth forecasts of consumers surveyed by the University of Michigan. MSC is a survey conducted at a monthly frequency covering a large cross-section of consumers, and Ang, Bekaert, and Wei (2007) show that it is a good unbiased predictor of inflation.

20 Even though realized variances are noisy measures of integrated variances, average yields nevertheless span realized variances; see Andersen and Benzoni (2010).

21 The instantaneous yield volatility is $\frac{1}{\tau}v^{(\tau)}(t)$ with $v^{(\tau)}(t)$ given in Equation (29).

22 The $R^2$s are higher than those found in Andersen and Benzoni (2010) because the sample period includes the monetary experiment; see Jacobs and Karoui (2009) for a discussion of the explanatory power in USV regressions for different time periods.

23 In particular, as $s(t)$ moves toward the high volatility model, the yield difference between the two models tends to decrease. That is, as the first part in Equation (55) increases, the second part in the same equation tends to decrease.

24 Since measurement errors when using realized variance in the A1(3) model result in a drop in $R^2$s from 100% to 45.8%, an interesting question is whether the population $R^2$s in the nonlinear model in Panel C would be substantially higher if instantaneous variance were used instead of realized variance. The answer is no. If instantaneous model-implied variance is used, the average $R^2$ is 48.4% instead of 43.6% in Panel C.

25 Similar expansions of the SDF appear in Yan (2008); Dumas, Kurshev, and Uppal (2009); Bhamra and Uppal (2014); and Ehling et al. (2016).

26 Chen and Joslin (2012) provide an alternative way to solve many of these equilibrium models that is based on a nonlinear transform of processes with tractable characteristic functions.

27 The model can easily be generalized to a setting with disagreement about multiple stochastic processes and learning. For instance, Ehling et al. (2016) show that in a model with disagreement about inflation, the bond prices are weighted averages of quadratic Gaussian term structure models.

28 For more details see Judd (1998).

References

Ahn D. H., Dittmar R., Gallant A. (2002) Quadratic term structure models: theory and evidence, Review of Financial Studies 15, 243–288.
Ahn D. H., Dittmar R. F., Gallant A. R., Gao B. (2003) Purebred or hybrid? Reproducing the volatility in term structure dynamics, Journal of Econometrics 116, 147–180.
Ahn D. H., Gao B. (1999) A parametric non-linear model of term structure dynamics, Review of Financial Studies 12, 721–762.
Ait-Sahalia Y. (1996a) Nonparametric pricing of interest rate derivative securities, Econometrica 64, 527–560.
Ait-Sahalia Y. (1996b) Testing continuous-time models of the spot interest rate, Review of Financial Studies 9, 385–426.
Almeida C., Graveline J. J., Joslin S. (2011) Do interest rate options contain information about excess bond returns?, Journal of Econometrics 164, 35–44.
Andersen T. G., Benzoni L. (2010) Do bonds span volatility risk in the U.S. Treasury market? A specification test for affine term structure models, Journal of Finance 65, 603–653.
Andersen T. G., Bollerslev T., Diebold F. X. (2010) Parametric and nonparametric volatility measurement, in: Ait-Sahalia Y., Hansen L. P. (eds.), Handbook of Financial Econometrics, Vol 1: Tools and Techniques (Handbooks in Finance), Elsevier, pp. 67–137.
Ang A., Bekaert G. (2002) Short rate nonlinearities and regime switches, Journal of Economic Dynamics and Control 26, 1243–1274.
Ang A., Bekaert G., Wei M. (2007) Do macro variables, asset markets, or surveys forecast inflation better?, Journal of Monetary Economics 54, 1163–1212.
Balduzzi P., Chiang I. H. E. (2012) A simple test of the affine class of term structure models, Review of Asset Pricing Studies 2, 203–244.
Bansal R., Tauchen G., Zhou H. (2004) Regime shifts, risk premiums in the term structure, and the business cycle, Journal of Business and Economic Statistics 22, 396–409.
Bansal R., Zhou H. (2002) Term structure of interest rates with regime shifts, Journal of Finance 57, 1997–2043.
Barndorff-Nielsen O., Shephard N. (2002) Econometric analysis of realized volatility and its use in estimating stochastic volatility models, Journal of the Royal Statistical Society B 64, 253–280.
Basak S. (2000) A model of dynamic equilibrium asset pricing with heterogeneous beliefs and extraneous risk, Journal of Economic Dynamics and Control 24, 63–95.
Bauer M., Rudebusch G. (2017) Resolving the spanning puzzle in macro-finance term structure models, Review of Finance 21, 511–553.
Bhamra H., Uppal R. (2014) Asset prices with heterogeneity in preferences and beliefs, Review of Financial Studies 27, 519–580.
Bikbov R., Chernov M. (2009) Unspanned stochastic volatility in affine models: evidence from eurodollar futures and options, Management Science 55, 1292–1305.
Campbell J. Y., Cochrane J. H. (1999) By force of habit: a consumption-based explanation of aggregate stock market behavior, The Journal of Political Economy 107, 205–251.
Campbell J. Y., Shiller R. J. (1991) Yield spread and interest rate movements: a bird's eye view, Review of Economic Studies 58, 495–514.
Carr P., Gabaix X., Wu L. (2009) Linearity-generating processes, unspanned stochastic volatility, and interest-rate option pricing. Working paper.
Carr P., Wu L. (2009) Stock options and credit default swaps: a joint framework for valuation and estimation, Journal of Financial Econometrics 2009, 1–41.
Chan K., Karolyi A., Longstaff F., Sanders A. B. (1992) An empirical comparison of alternative models of the short-term interest rate, Journal of Finance 47, 1209–1227.
Chapman D. A., Pearson N. D. (2000) Is the short rate drift actually nonlinear?, Journal of Finance 55, 355–388.
Chen H., Joslin S. (2012) Generalized transform analysis of affine processes and applications in finance, Review of Financial Studies 25, 2225–2256.
Chernov M., Mueller P. (2012) The term structure of inflation expectations, Journal of Financial Economics 106, 367–394.
Christoffersen P., Dorion C., Jacobs K., Karoui L. (2014) Nonlinear Kalman filtering in affine term structure models, Management Science 60, 2248–2268.
Cieslak A., Povala P. (2015) Expected returns in treasury bonds, Review of Financial Studies 28, 2859–2901.
Cieslak A., Povala P. (2016) Information in the term structure of yield curve volatility, Journal of Finance 71, 1393–1436.
Cochrane J., Piazzesi M. (2005) Bond risk premia, American Economic Review 95, 138–160.
Cochrane J. H., Longstaff F. A., Santa-Clara P. (2008) Two trees, Review of Financial Studies 21, 347–385.
Collin-Dufresne P., Goldstein R. S. (2002) Do bonds span the fixed income markets? Theory and evidence for unspanned stochastic volatility, Journal of Finance 57, 1685–1730.
Collin-Dufresne P., Goldstein R. S., Jones C. S. (2009) Can interest rate volatility be extracted from the cross-section of bond yields?, Journal of Financial Economics 94, 47–66.
Constantinides G. M. (1992) A theory of the nominal term structure of interest rates, Review of Financial Studies 5, 531–552.
Cooper I., Priestley R. (2009) Time-varying risk premiums and the output gap, Review of Financial Studies 22, 2601–2633.
Cox J. C., Ingersoll J. E., Ross S. A. (1985) A theory of the term structure of interest rates, Econometrica 53, 385–408.
Creal D. D., Wu J. C. (2015) Estimation of affine term structure models with spanned or unspanned stochastic volatility, Journal of Econometrics 185, 60–81.
Dai Q., Le A., Singleton K. (2010) Discrete-time affine term structure models with generalized market prices of risk, Review of Financial Studies 23, 2184–2227.
Dai Q., Singleton K. (2000) Specification analysis of affine term structure models, Journal of Finance 55, 1943–1978.
Dai Q., Singleton K. (2003) Term structure dynamics in theory and reality, Review of Financial Studies 16, 631–678.
Dai Q., Singleton K. J. (2002) Expectation puzzles, time-varying risk premia, and affine models of the term structure, Journal of Financial Economics 63, 415–441.
Dai Q., Singleton K. J., Yang W. (2007) Regime shifts in a dynamic term structure model of U.S. treasury bond yields, Review of Financial Studies 20, 1669–1706.
Duffee G. (2002) Term premia and interest rate forecast in affine models, Journal of Finance 57, 405–443.
Duffee G. (2010) Sharpe ratios in term structure models. Working paper, University of California, Berkeley.
Duffee G. (2011a) Forecasting with the term structure: the role of no-arbitrage restrictions. Working paper, Johns Hopkins University.
Duffee G. (2011b) Information in (and not in) the term structure, Review of Financial Studies 24, 2895–2934.
Duffee G. (2012) Forecasting interest rates. Working paper.
Duffie D., Kan R. (1996) A yield-factor model of interest rates, Mathematical Finance 6, 379–406.
Duffie D., Pan J., Singleton K. (2000) Transform analysis and asset pricing for affine jump-diffusions, Econometrica 68, 1343–1376.
Dumas B., Kurshev A., Uppal R. (2009) Equilibrium portfolio strategies in the presence of sentiment risk and excess volatility, The Journal of Finance 64, 579–629.
Ehling P., Gallmeyer M., Heyerdahl-Larsen C., Illeditsch P. (2016) Disagreement about inflation and the yield curve. Working paper.
Fama E. F., Bliss R. R. (1987) The information in long-maturity forward rates, American Economic Review 77, 680–692.
Fan R., Gupta A., Ritchken P. (2003) Hedging in the possible presence of unspanned stochastic volatility: evidence from swaption markets, Journal of Finance 58, 2219–2248.
Feldhütter P. (2016) Can affine models match the moments in bond yields?, Quarterly Journal of Finance 6, 1–56.
Feldhütter P., Heyerdahl-Larsen C., Illeditsch P. (2016) Expanded term structure models. Work in progress.
Feldhütter P., Lando D. (2008) Decomposing swap spreads, Journal of Financial Economics 88, 375–405.
Filipovic D., Larsson M., Trolle A. B. (2015) Linear-rational term structure models, Journal of Finance, forthcoming.
Gabaix X. (2009) Linearity-generating processes: a modelling tool yielding closed forms for asset prices. Working paper, CEPR and NBER.
Gurkaynak R. S., Sack B., Wright J. H. (2007) The U.S. treasury yield curve: 1961 to the present, Journal of Monetary Economics 54, 2291–2304.
Hansen L. P., Hodrick R. J. (1980) Forward exchange rates as optimal predictors of future spot rates: an econometric analysis, Journal of Political Economy 88, 829–853.
Heidari M., Wu L. (2003) Are interest rate derivatives spanned by the term structure of interest rates?, Journal of Fixed Income 12, 75–86.
Jacobs K., Karoui L. (2009) Conditional volatility in affine term-structure models: evidence from Treasury and swap markets, Journal of Financial Economics 91, 288–318.
Jermann U. (2013) A production-based model for the term structure, Journal of Financial Economics 109, 293–306.
Jones C. S. (2003) Nonlinear mean reversion in the short-term interest rate, Review of Financial Studies 16, 793–843.
Joslin S. (2014) Can unspanned stochastic volatility models explain the cross section of bond volatilities?, Management Science, forthcoming.
Joslin S., Priebsch M., Singleton K. (2014) Risk premiums in dynamic term structure models with unspanned macro risks, Journal of Finance 69, 1197–1233.
Judd K. L. (1998) Numerical Methods in Economics, 1st edition, MIT Press.
Leippold M., Wu L. (2003) Design and estimation of quadratic term structure models, European Finance Review 7, 47–73.
Li H., Zhao F. (2006) Unspanned stochastic volatility: evidence from hedging interest rate derivatives, Journal of Finance 61, 341–378.
Ludvigson S., Ng S. (2009) Macro factors in bond risk premia, Review of Financial Studies 22, 5027–5067.
Newey W. K., West K. D. (1987) A simple, positive semi-definite, heteroscedasticity, and autocorrelation consistent covariance matrix, Econometrica 55, 703–708.
Pritsker M. (1998) Nonparametric density estimation and tests of continuous time interest rate models, Review of Financial Studies 11, 449–487.
Richard S. (2013) A non-linear macroeconomic term structure model. Working paper.
Rogers L. (1997) The potential approach to the term structure of interest rates and foreign exchange rates, Mathematical Finance 7, 157–164.
Stanton R. (1997) A nonparametric model of term structure dynamics and the market price of interest rate risk, Journal of Finance 52, 1973–2002.
Tang H., Xia Y. (2007) An international examination of affine term structure models and the expectation hypothesis, Journal of Financial and Quantitative Analysis 42, 41–80.
Vasicek O. (1977) An equilibrium characterization of the term structure, Journal of Financial Economics 5, 177–188.
Wright J. (2011) Term premia and inflation uncertainty: empirical evidence from an international panel dataset, American Economic Review 101, 1514–1534.
Yan H. (2008) Natural selection in financial markets: does it work?, Management Science 54, 1935–1950.

© The Authors 2016. Published by Oxford University Press on behalf of the European Finance Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Review of Finance, Oxford University Press

ISSN
1572-3097
eISSN
1573-692X
D.O.I.
10.1093/rof/rfw052

Abstract

We introduce a reduced-form term structure model with closed-form solutions for yields where the short rate and market prices of risk are nonlinear functions of Gaussian state variables. The nonlinear model with three factors matches the time-variation in expected excess returns and yield volatilities of US Treasury bonds from 1961 to 2014. Yields and their variances depend on only three factors, yet the model exhibits features consistent with Unspanned Risk Premia (URP) and Unspanned Stochastic Volatility (USV).

1. Introduction

The US Treasury bond market is a large and important financial market. Policy makers, investors, and researchers need models to disentangle market expectations from risk premiums, and estimate expected returns and Sharpe ratios, both across maturity and over time. The most prominent class of models are affine models. However, there are a number of empirical facts documented in the literature that these models struggle with matching simultaneously: a) excess returns are time-varying, b) a part of expected excess returns is unspanned by the yield curve, c) yield variances are time varying, and d) a part of yield variances is unspanned by the yield curve.¹ Affine models have been shown to match each of these four findings separately, but not simultaneously and only by increasing the number of factors beyond the standard level, slope, and curvature factors.²

We introduce an arbitrage-free dynamic term structure model where the short rate and market prices of risk are nonlinear functions of Gaussian state variables. We provide closed-form solutions for bond prices and since the factors are Gaussian our nonlinear model is as tractable as a standard Gaussian model. We show that the model can capture all four findings mentioned above simultaneously and it does so with only three factors driving yields and their variances.
The value of having few factors is illustrated by Duffee (2010), who estimates a five-factor Gaussian model to capture time variation in expected returns and finds huge Sharpe ratios due to overfitting.

We use a monthly panel of five zero-coupon Treasury bond yields and their realized variances from 1961 to 2014 to estimate the nonlinear model with three factors. To compare the implications of the nonlinear model with those from the standard class of affine models, we also estimate three-factor affine models with no or one stochastic volatility factor, the essentially affine A0(3) and A1(3) models.

We first assess the ability of the nonlinear model to predict excess bond returns in sample and regress realized excess returns on model-implied expected excess returns. The average $R^2$ across bond maturities and holding horizons is 27% for the nonlinear model, 6.5% for the A1(3) model, 8% for the A0(3) model, and no more than 15% for any affine model in which expected excess returns are linear functions of yields. Campbell and Shiller (1991) document a positive relation between the slope of the yield curve and expected excess returns, a finding that affine models with stochastic volatility have difficulty matching (see Dai and Singleton, 2002). In simulations, we show that the nonlinear model can capture this positive relation.

There is empirical evidence that a part of expected excess bond returns is not spanned by linear combinations of yields, a phenomenon we refer to as Unspanned Risk Premia (URP).³ URP arises in our model due to a nonlinear relation between expected excess returns and yields. To quantitatively explore this explanation, we regress expected excess returns implied by the nonlinear model on the principal components (PCs) of its yields and find that the first three PCs explain 67–72% of the variation in expected excess returns.
Furthermore, the regression residuals correlate with expected inflation in the data (measured through surveys), not because inflation has any explanatory power in the model but because it happens to correlate with “the amount of nonlinearity.” Duffee (2011b); Wright (2011); and Joslin, Priebsch, and Singleton (2014) use five-factor Gaussian models where one or two factors that are orthogonal to the yield curve explain expected excess returns and are related to expected inflation. We capture the same phenomenon with a nonlinear model that retains a parsimonious three-factor structure to price bonds and yet allows for time variation in volatilities. The nonlinear and A1(3) model can capture the persistent time variation in volatilities and the high volatility during the monetary experiment in the early 80s. However, the two models have different implications for the cross-sectional and predictive distribution of yield volatility. In the nonlinear model more than one factor drives the cross-sectional variation in yield volatilities while by construction the A1(3) model only has one. Moreover, in the nonlinear model, the probability of a high volatility scenario increases with the monetary experiment and remains high during the Greenspan era even though volatilities came down significantly. This finding resembles the appearance and persistence of the equity option smile since the crash of 1987. In contrast, the distribution of future volatility in the A1(3) model is similar before and after the monetary experiment. The volatility in the Gaussian A0(3) model is constant and thus this model overestimates volatility during the Greenspan era and underestimates it during the monetary experiment. There is a large literature suggesting that interest rate volatility risk cannot be hedged by a portfolio consisting solely of bonds; a phenomenon referred to by Collin-Dufresne and Goldstein (2002) as Unspanned Stochastic Volatility (USV). 
The empirical evidence supporting USV typically comes from a low $R^2$ when regressing a measure of volatility on interest rates.⁴ To test the ability of the nonlinear model to capture the empirical evidence on USV, we use the methodology of Andersen and Benzoni (2010) and regress the model-implied variance of yields on the PCs of model-implied yields. The first three PCs explain 42–44%, which is only slightly higher than in the data where they explain 30–35% of the variation in realized yield variance. If we include the fourth and fifth PC, these numbers increase to 55–62% and 40–43%, respectively. Hence, our nonlinear model quantitatively captures the $R^2$s in USV regressions in the data. In contrast, since there is a linear relation between yield variance and yields in standard affine models, the first three PCs already explain 100% in the A1(3) model.⁵

The standard procedure in the reduced-form term structure literature is to specify the short rate and the market prices of risk as functions of the state variables. Instead, we model the functional form of the stochastic discount factor (SDF) directly by multiplying the SDF from a Gaussian term structure model with the term $1+\gamma e^{-\beta X}$, where $\beta$ and $\gamma$ are parameters and $X$ is the Gaussian state vector. This functional form is a special case of the SDF that arises in many equilibrium models in the literature. In such models, the SDF can be decomposed into a weighted average of different representative agent models. Importantly, the weights on the different models are time-varying and this is a source of time-varying risk premia and volatility of bond yields.

Our paper is not the first to propose a nonlinear term structure model. Dai, Singleton, and Yang (2007) estimate a regime-switching model and show that excluding the monetary experiment in the estimation leads their model to pick up minor variations in volatility.
In contrast, the nonlinear model can pick up states that did not occur in the sample used to estimate the model. Specifically, we estimate the model using a sample that excludes the monetary experiment and find that it still implies a significant probability of a strong increase in volatility. Furthermore, while the Gaussian model is a special case of both models our nonlinear model only increases the number of parameters from 23 to 27 whereas the regime-switching model in Dai, Singleton, and Yang (2007) has fifty-six parameters. Quadratic term structure models have been proposed by Ahn, Dittmar, and Gallant (2002) and Leippold and Wu (2003) among others, but Ahn, Dittmar, and Gallant (2002) find that quadratic term structure models are not able to generate the level of conditional volatility observed for short- and intermediate-term bond yields. Ahn et al. (2003) propose a class of nonlinear term structure models based on the inverted square-root model of Ahn and Gao (1999), but in contrast to our nonlinear model they do not provide closed-form solutions for bond prices. Dai, Le, and Singleton (2010) develop a class of discrete time models that are affine under the risk neutral measure, but show nonlinear dynamics under the historical measure. They illustrate that the model encompasses many equilibrium models with recursive preferences and habit formation. Carr, Gabaix, and Wu (2009) use the linearity generating framework of Gabaix (2009) to price swaps and interest rate derivatives. Similarly, in concurrent work Filipovic, Larsson, and Trolle (2015) introduce a linear-rational framework to price bonds and interest rate derivatives. Both approaches lead to closed-form solutions of discount bonds, but their pricing framework is based on the potential approach of Rogers (1997) while our approach is based on a large class of equilibrium models discussed in Appendix B.6 The rest of the paper is organized as follows. Section 2 motivates and describes the model. 
Section 3 estimates the model and Section 4 presents the empirical results. In Section 5, we estimate a one-factor version of the nonlinear model and describe how nonlinearity works in this simple case, while Section 6 concludes.

2. A Nonlinear Term Structure Model

In this section, we present a nonlinear model of the term structure of interest rates. We first motivate the model by presenting regression evidence for nonlinearities in excess returns and yield variances in Section 2.1 and then we present the model in Section 2.2.

2.1 Motivating Regression Evidence

In Panel A of Table I, we regress yearly excess returns measured on a monthly basis for the period 1961–2014 on the first three PCs of yields and product combinations of the PCs. Specifically, the dependent variable is the average 1-year excess return computed over US Treasury bonds with a maturity of 2, 3, 4, and 5 years (we explain the details of the data in Section 3.1). As independent variables, we first include all terms that are a product of up to three terms of the first three PCs (in short PC1, PC2, and PC3). We then exclude terms with the lowest t-statistics one-by-one until only significant terms remain. The first row of Panel A shows the result. There are only three significant terms in the regression and they are all nonlinear. The second row shows the regression when we include only the first three PCs, the linear relation implied by affine models, and we see that the $R^2$ of 16% is substantially lower than the $R^2$ of 29% in the first regression. Finally, the third row shows that the linear terms add almost no explanatory power to the first regression.

Table I.
Nonlinearities in expected excess returns and realized variances This table shows coefficients, standard errors (in brackets), and R2s from regressions of realized 1-year log excess bond returns (Panel A) and realized yield variances (Panel B), averaged over bond maturities two to five in Panel A and one to five in Panel B, on three different sets of yield PCs and powers thereof. The independent variables in the first row of both panels are obtained by first considering all product combinations of the first three PCs up to and including order three and excluding every variable with the lowest t-statistic until only significant variables remain. The monthly excess returns, realized variances, and PCs are calculated using daily zero-coupon bond yield data from 1961:07 to 2014:04. The bond maturities are ranging from 1 to 5 years and the data are obtained from Gurkaynak, Sack, and Wright (2007). The number of observations is 622 for the predictive regressions in Panel A and 634 for the contemporaneous regressions in Panel B. All variables are standardized and standard errors are computed using the Hansen and Hodrick (1980) correction with twelve lags in Panel A and the Newey and West (1987) correction with twelve lags in Panel B. ** and * indicate statistical significance at the 1% and 5% levels, respectively. 
Panel A: 1-Year average excess bond returns

| Row | PC1 | PC2 | PC3 | PC1·PC2 | PC1³ | PC2³ | R² |
|---|---|---|---|---|---|---|---|
| 1 |  |  |  | −0.37** (0.09) | 0.40** (0.11) | 0.33** (0.08) | 0.29 |
| 2 | 0.07 (0.13) | 0.39** (0.12) | −0.05 (0.11) |  |  |  | 0.16 |
| 3 | −0.14 (0.17) | 0.10 (0.14) | −0.04 (0.10) | −0.33** (0.10) | 0.49** (0.16) | 0.26* (0.11) | 0.30 |

Panel B: Realized average yield variance

| Row | PC1 | PC2 | PC3 | PC1² | PC1·PC3 | PC2·PC3 | PC1³ | PC1·PC2·PC3 | R² |
|---|---|---|---|---|---|---|---|---|---|
| 1 |  |  |  | 0.12** (0.04) | −0.12* (0.05) | −0.18** (0.06) | 0.39** (0.07) | −0.34** (0.05) | 0.55 |
| 2 | 0.48** (0.12) | −0.10 (0.09) | 0.32** (0.09) |  |  |  |  |  | 0.34 |
| 3 | 0.10 (0.14) | 0.04 (0.05) | 0.04 (0.06) | 0.14 (0.08) | −0.10 (0.05) | −0.16* (0.07) | 0.30* (0.15) | −0.34** (0.08) | 0.55 |

Panel B in Table I shows similar regressions with the average excess return replaced by the average monthly realized yield variance as dependent variable (again, we leave the detailed explanation of how we calculate realized variance to Section 3.1). The first regression in Panel B shows the regression result when the independent variables are products of up to three terms of PC1, PC2, and PC3, after excluding insignificant terms as in Panel A. None of the linear terms are significant and the five significant nonlinear terms generate an $R^2$ of 55%.
Row 2 shows that a regression with only the first three PCs, the linear relation implied by affine models, yields a substantially lower $R^2$ of 34% and row 3 shows that the linear terms do not raise the $R^2$ when included in the first regression in Panel B. These regressions show that there is a nonlinear relation both between yields and excess returns and between yields and yield variances. While the $R^2$s in the nonlinear regressions are informative about the importance of nonlinearity, overfitting and collinearity limit the ability to pin down the precise nonlinear relation. In particular, when running the regressions for each bond maturity individually it is rare that the same set of nonlinear terms is significant. This evidence suggests that we need a parsimonious nonlinear model to study the nonlinearities in the first and second moments of bond returns, which we present in the next section.

2.2 The Model

Uncertainty is represented by a $d$-dimensional Brownian motion $W(t)=(W_1(t),\ldots,W_d(t))'$. There is a $d$-dimensional Gaussian state vector $X(t)$ that follows the dynamics

$$dX(t)=\kappa\left(\bar{X}-X(t)\right)dt+\Sigma\,dW(t), \tag{1}$$

where $\bar{X}$ is $d$-dimensional and $\kappa$ and $\Sigma$ are $d\times d$-dimensional.

2.2.a. The stochastic discount factor

We assume that there is no arbitrage and that the strictly positive SDF is

$$M(t)=M_0(t)\left(1+\gamma e^{-\beta'X(t)}\right), \tag{2}$$

where $\gamma$ denotes a nonnegative constant, $\beta$ a $d$-dimensional vector, and $M_0(t)$ a strictly positive stochastic process. Equation (2) is a key departure from standard term structure models (Vasicek, 1977; Cox, Ingersoll, and Ross, 1985; Duffie and Kan, 1996; Dai and Singleton, 2000). Rather than specifying the short rate and the market price of risk, which in turn pins down the SDF, we specify the functional form of the SDF directly.7 This approach is motivated by equilibrium models where the SDF is a function of structural parameters and thus the risk-free rate and market price of risk are interconnected.
Moreover, we show in Appendix B that the SDF specified in Equation (2) is a special case of the SDF in many popular equilibrium models. To keep the model comparable to the existing literature on affine term structure models, we introduce a base model for which $M_0(t)$ is the SDF. The dynamics of $M_0(t)$ are

$$\frac{dM_0(t)}{M_0(t)}=-r_0(t)\,dt-\Lambda_0(t)'\,dW(t), \tag{3}$$

where $r_0(t)$ and $\Lambda_0(t)$ are affine functions of the state vector $X(t)$. Specifically,

$$r_0(t)=\rho_{0,0}+\rho_{0,X}'X(t), \tag{4}$$

$$\Lambda_0(t)=\lambda_{0,0}+\lambda_{0,X}X(t), \tag{5}$$

where $\rho_{0,0}$ is a scalar, $\rho_{0,X}$ and $\lambda_{0,0}$ are $d$-dimensional vectors, and $\lambda_{0,X}$ is a $d\times d$-dimensional matrix. It is well known that bond prices in the base model belong to the class of Gaussian term structure models (Dai and Singleton, 2002; Duffee, 2002) with essentially affine risk premia. If $\gamma$ or every element of $\beta$ is zero, then the nonlinear model collapses to the Gaussian base model. We now provide closed-form solutions for bond prices in the nonlinear model.

2.2.b. Closed-form bond prices

Let $P(t,T)$ denote the price at time $t$ of a zero-coupon bond that matures at time $T$. Specifically,

$$P(t,T)=E_t\left[\frac{M(T)}{M(t)}\right]. \tag{6}$$

We show in the next theorem that the price of a bond is a weighted average of bond prices in artificial economies that belong to the class of essentially affine Gaussian term structure models.

Theorem 1. The price of a zero-coupon bond that matures at time $T$ is

$$P(t,T)=s(t)P_0(t,T)+(1-s(t))P_1(t,T), \tag{7}$$

where

$$s(t)=\frac{1}{1+\gamma e^{-\beta'X(t)}}\in(0,1], \tag{8}$$

$$P_n(t,T)=e^{A_n(T-t)+B_n(T-t)'X(t)}. \tag{9}$$

The coefficient $A_n(T-t)$ and the $d$-dimensional vector $B_n(T-t)$ solve the ordinary differential equations

$$\frac{dA_n(\tau)}{d\tau}=\frac{1}{2}B_n(\tau)'\Sigma\Sigma'B_n(\tau)+B_n(\tau)'\left(\kappa\bar{X}-\Sigma\lambda_{n,0}\right)-\rho_{n,0},\qquad A_n(0)=0, \tag{10}$$

$$\frac{dB_n(\tau)}{d\tau}=-\left(\kappa+\Sigma\lambda_{n,X}\right)'B_n(\tau)-\rho_{n,X},\qquad B_n(0)=0_d, \tag{11}$$

where

$$\rho_{n,0}=\rho_{0,0}+n\beta'\kappa\bar{X}-n\beta'\Sigma\lambda_{0,0}-\frac{1}{2}n^2\beta'\Sigma\Sigma'\beta, \tag{12}$$

$$\rho_{n,X}=\rho_{0,X}-n\kappa'\beta-n\lambda_{0,X}'\Sigma'\beta, \tag{13}$$

$$\lambda_{n,0}=\lambda_{0,0}+n\Sigma'\beta, \tag{14}$$

$$\lambda_{n,X}=\lambda_{0,X}. \tag{15}$$

The proof of this theorem is given in Appendix A where we provide a proof for a more general class of nonlinear models and also show how our nonlinear model is related to the class of reduced-form asset pricing models presented in Duffie, Pan, and Singleton (2000) and Chen and Joslin (2012). To provide some intuition, we define $M_1(t)=\gamma e^{-\beta'X(t)}M_0(t)$ and rewrite the bond pricing Equation (6) using the fact that $s(t)=M_0(t)/M(t)=1-M_1(t)/M(t)$. Specifically,

$$P(t,T)=s(t)E_t\left[\frac{M_0(T)}{M_0(t)}\right]+(1-s(t))E_t\left[\frac{M_1(T)}{M_1(t)}\right]. \tag{16}$$

Applying Ito’s lemma to $M_1(t)$ leads to

$$\frac{dM_1(t)}{M_1(t)}=-r_1(t)\,dt-\Lambda_1(t)'\,dW(t), \tag{17}$$

where $r_1(t)$ and $\Lambda_1(t)$ are affine functions of the state vector $X(t)$. Specifically,

$$r_1(t)=\rho_{1,0}+\rho_{1,X}'X(t), \tag{18}$$

$$\Lambda_1(t)=\lambda_{1,0}+\lambda_{1,X}X(t), \tag{19}$$

where $\rho_{1,0}$, $\rho_{1,X}$, $\lambda_{1,0}$, and $\lambda_{1,X}$ are given in Equations (12), (13), (14), and (15), respectively. Hence, both expectations in Equation (16) are equal to bond prices in artificial economies with discount factors $M_0(t)$ and $M_1(t)$, respectively. These bond prices belong to the class of essentially affine term structure models and hence $P(t,T)$ can be computed in closed form.

2.2.c. The short rate and the price of risk

Applying Ito’s lemma to Equation (2) leads to the dynamics of the SDF:

$$\frac{dM(t)}{M(t)}=-r(t)\,dt-\Lambda(t)'\,dW(t), \tag{20}$$

where both the short rate $r(t)$ and the market price of risk $\Lambda(t)$ are nonlinear functions of the state vector $X(t)$ given in Equations (21) and (22), respectively. The short rate is given by

$$r(t)=s(t)r_0(t)+(1-s(t))r_1(t). \tag{21}$$

Our model allows the short rate to be nonlinear in the state variables without losing the tractability of closed-form solutions of bond prices and a Gaussian state space.8 The $d$-dimensional market price of risk is given by

$$\Lambda(t)=s(t)\Lambda_0(t)+(1-s(t))\Lambda_1(t). \tag{22}$$

Equation (22) shows that even if the market prices of risk in the base model are constant, the market prices of risk in the general model are stochastic due to variations in the weight $s(t)$.
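To make Theorem 1 concrete, the following is a minimal sketch of the pricing formula for a single factor (d = 1), so that every matrix in Equations (10)–(15) becomes a scalar. The function names, the dictionary of parameters, the fourth-order Runge–Kutta integrator, and all numerical values are illustrative assumptions; they are not the estimates or the implementation used in the paper.

```python
import math

def shifted_params(n, p):
    """Scalar (d = 1) version of Equations (12)-(15) for the artificial economy n."""
    b, k, xb, sg = p["beta"], p["kappa"], p["xbar"], p["sigma"]
    rho0 = p["rho0"] + n * b * k * xb - n * b * sg * p["lam0"] - 0.5 * n**2 * b**2 * sg**2
    rhoX = p["rhoX"] - n * k * b - n * p["lamX"] * sg * b
    lam0 = p["lam0"] + n * sg * b
    return rho0, rhoX, lam0, p["lamX"]

def affine_coeffs(tau, rho0, rhoX, lam0, lamX, kappa, xbar, sigma, steps=2000):
    """Integrate the ODEs (10)-(11) from 0 to tau with classical RK4 (an assumed choice)."""
    def deriv(a, b):
        da = 0.5 * sigma**2 * b**2 + b * (kappa * xbar - sigma * lam0) - rho0
        db = -(kappa + sigma * lamX) * b - rhoX
        return da, db
    h = tau / steps
    a = b = 0.0  # initial conditions A(0) = 0, B(0) = 0
    for _ in range(steps):
        k1a, k1b = deriv(a, b)
        k2a, k2b = deriv(a + 0.5 * h * k1a, b + 0.5 * h * k1b)
        k3a, k3b = deriv(a + 0.5 * h * k2a, b + 0.5 * h * k2b)
        k4a, k4b = deriv(a + h * k3a, b + h * k3b)
        a += h * (k1a + 2 * k2a + 2 * k3a + k4a) / 6
        b += h * (k1b + 2 * k2b + 2 * k3b + k4b) / 6
    return a, b

def bond_price(x, tau, p):
    """Equations (7)-(9): weighted average of two exponential-affine prices."""
    s = 1.0 / (1.0 + p["gamma"] * math.exp(-p["beta"] * x))  # weight, Equation (8)
    prices = []
    for n in (0, 1):
        rho0, rhoX, lam0, lamX = shifted_params(n, p)
        a, b = affine_coeffs(tau, rho0, rhoX, lam0, lamX,
                             p["kappa"], p["xbar"], p["sigma"])
        prices.append(math.exp(a + b * x))  # Equation (9)
    p0, p1 = prices
    return s * p0 + (1.0 - s) * p1, s, p0, p1

# Hypothetical parameter values, chosen only so the example runs.
params = dict(kappa=0.3, xbar=0.0, sigma=1.0, rho0=0.02, rhoX=0.004,
              lam0=0.3, lamX=-0.1, gamma=0.5, beta=-0.5)
price, weight, p0, p1 = bond_price(x=0.5, tau=5.0, p=params)
```

Setting `gamma` to zero in `params` gives a weight of one, so the sketch reproduces the Gaussian base price `p0`, in line with the remark that the nonlinear model then collapses to the base model.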
When $s(t)$ approaches zero or one, then $\Lambda(t)$ approaches the market price of risk of an essentially affine Gaussian model.

2.2.d. Expected return and volatility

We know that the bond price is a weighted average of exponential affine bond prices (see Equation (7)). Hence, variations of instantaneous bond returns are due to variations in the two artificial bond prices $P_0(t,T)$ and $P_1(t,T)$ and due to variations in the weight $s(t)$. Specifically, the dynamics of the bond price $P(t,T)$ are

$$\frac{dP(t,T)}{P(t,T)}=\left(r(t)+e(t,T)\right)dt+\sigma(t,T)'\,dW(t), \tag{23}$$

where $e(t,T)$ denotes the instantaneous expected excess return and $\sigma(t,T)$ denotes the local volatility vector of a zero-coupon bond that matures at time $T$. The local volatility vector of the bond is given by

$$\sigma(t,T)=\omega(t,T)\sigma_0(T-t)+(1-\omega(t,T))\sigma_1(T-t)+\left(s(t)-\omega(t,T)\right)\beta, \tag{24}$$

where $\sigma_i(T-t)=\Sigma'B_i(T-t)$ denotes the local bond volatility vector in the Gaussian model with SDF $M_i(t)$ and $\omega(t,T)$ denotes the contribution of $P_0(t,T)$ to the bond price $P(t,T)$. Specifically,

$$\omega(t,T)=\frac{P_0(t,T)\,s(t)}{P(t,T)}\in(0,1]. \tag{25}$$

When $s(t)$ approaches zero or one, then $\sigma(t,T)$ approaches the deterministic local volatility of a Gaussian model. However, in contrast to the short rate and the market price of risk, the local volatility can move outside the range of the two local Gaussian volatilities, $\sigma_0(T-t)$ and $\sigma_1(T-t)$, because of the last term in Equation (24).

Intuitively, there are two distinct contributions to volatility in Equation (24). The direct term, defined as

$$\sigma_{vol}(t,T)=\omega(t,T)\sigma_0(T-t)+(1-\omega(t,T))\sigma_1(T-t), \tag{26}$$

arises because the two artificial Gaussian models have constant but different yield volatilities. The indirect term, defined as

$$\sigma_{lev}(t,T)=\left(s(t)-\omega(t,T)\right)\beta, \tag{27}$$

is due to the Gaussian models having different yield levels. Two special cases illustrate the distinct contributions to volatility. If $P_0(t,T)=P_1(t,T)=P(t,T)$, then $\sigma_{lev}(t,T)=0$ and the local volatility vector reduces to $\sigma(t,T)=s(t)\sigma_0(T-t)+(1-s(t))\sigma_1(T-t)$.
On the other hand, if $\sigma_0(T-t)=\sigma_1(T-t)$, the first term is constant, but there is still stochastic volatility due to the second term which becomes more important the bigger the difference between the two artificial bond prices $P_1(t,T)$ and $P_0(t,T)$.9 The instantaneous expected excess return and volatility of the bond are

$$e(t,T)=\Lambda(t)'\sigma(t,T), \tag{28}$$

$$v(t,T)=\sqrt{\sigma(t,T)'\sigma(t,T)}. \tag{29}$$

Equations (20)–(29) show that the nonlinear term structure model differs from the essentially affine Gaussian base model in two important aspects. First, the volatilities of bond returns and yields are time-varying and hence expected excess returns are moving with both the price and the quantity of risk.10 Second, the short rate $r(t)$, the instantaneous volatility $v(t,T)$, and the instantaneous expected excess return $e(t,T)$ are nonlinear functions of $X(t)$.

3. Estimation

In this section, we estimate the nonlinear model described in Section 2 and compare it to standard essentially affine A0(3) and A1(3) models. All three models have three factors and the number of parameters is 22 in the A0(3) model, 23 in the A1(3) model, and 26 in the nonlinear model. The A0(3) model is a special case of our nonlinear model where $M_0(t)=M(t)$. The A1(3) model is well known and thus we only present the setup with results in Section 3.2 and defer details to Feldhütter (2016).

3.1 Data

We treat each period as a month and estimate the models using a monthly panel of five zero-coupon Treasury bond yields and their realized variances. Although it is in theory sufficient to use bond yields to estimate the model, we add realized variances in the estimation to improve the identification of model parameters (see Cieslak and Povala [2016] for a similar approach). We use daily (continuously compounded) 1-, 2-, 3-, 4-, and 5-year zero-coupon yields extracted from US Treasury security prices by the method of Gurkaynak, Sack, and Wright (2007).
The data are available from the Federal Reserve Board’s webpage and cover the period 1961:07 to 2014:04. For each bond maturity, we average daily observations within a month to get a time series of monthly yields. We use realized yield variance to measure yield variance. Let $y_t^{\tau}$ and $rv_t^{\tau}$ denote the yield and realized yield variance of a $\tau$-year bond in month $t$ based on daily observations within that month. Specifically,

$$y_t^{\tau}=\frac{1}{N_t}\sum_{i=1}^{N_t}y_{d,t}^{\tau}(i), \tag{30}$$

$$rv_t^{\tau}=12\sum_{i=1}^{N_t}\left(y_{d,t}^{\tau}(i)-y_{d,t}^{\tau}(i-1)\right)^2, \tag{31}$$

where $y_{d,t}^{\tau}(i)$ denotes the yield at day $i$ within month $t$, $N_t$ denotes the number of trading days within month $t$, and $y_{d,t}^{\tau}(0)$ denotes the last observation in month $t-1$. The realized variance converges to the quadratic variation as $N_t$ approaches infinity; see Andersen, Bollerslev, and Diebold (2010) and the references therein for a detailed discussion.

To check the accuracy of realized variance based on daily data, we compare realized volatility with option-implied volatility (to be consistent with the options literature we look at implied volatility instead of implied variance). We obtain implied price volatility of 1-month at-the-money options on 5-year Treasury futures from Datastream and convert it to yield volatility.11 We then calculate monthly volatility by averaging over daily volatilities. Figure 1 shows that realized volatility tracks option-implied volatility closely (the correlation is 87%), and thus we conclude that realized variance is a useful measure for yield variance.

Figure 1. Realized and option-implied yield volatility. We use monthly estimates of realized yield variance based on daily squared yield changes. This graph shows that option-implied volatility tracks the realized volatility closely over the last 10 years (the correlation is 87%). Option-implied volatility is obtained from 1-month at-the-money options on 5-year Treasury futures as explained in the text. The data are available from Datastream since October 2003.
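The monthly yield and realized variance in Equations (30) and (31) can be sketched in a few lines. The function name and the sample numbers below are hypothetical, and the factor 12 is read here as the annualization of a monthly sum of squared daily yield changes, which is an interpretation of Equation (31) rather than something the text spells out.

```python
def monthly_yield_and_rv(daily_yields, last_prev_month):
    """Return (average yield, realized variance) for one month and one maturity.

    daily_yields    : daily yields observed within the month, in order
    last_prev_month : last daily yield of the previous month, i.e. y(0) in Equation (31)
    """
    n = len(daily_yields)
    avg_yield = sum(daily_yields) / n          # Equation (30)
    prev = last_prev_month
    sum_sq = 0.0
    for y in daily_yields:
        sum_sq += (y - prev) ** 2              # squared daily yield change
        prev = y
    return avg_yield, 12.0 * sum_sq            # Equation (31)

# Made-up daily yields (in decimals) for a three-day "month".
avg, rv = monthly_yield_and_rv([0.050, 0.051, 0.049], last_prev_month=0.050)
```

Taking the square root of `rv` gives a yield volatility on the same scale as the option-implied volatility used in the comparison above.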
3.2 The A1(3) Model

We briefly describe the A1(3) model in this section and refer the reader to Feldhütter (2016) for a detailed discussion. The dynamics of the three-dimensional state vector $X(t)=(X_1(t),X_2(t),X_3(t))'$ are

$$dX(t)=\kappa\left(\bar{X}-X(t)\right)dt+S(t)\,dW(t), \tag{32}$$

where $\bar{X}=(\bar{X}_1,0,0)'$ is the long run mean,

$$\kappa=\begin{pmatrix}\kappa_{(1,1)}&0&0\\ \kappa_{(2,1)}&\kappa_{(2,2)}&\kappa_{(2,3)}\\ \kappa_{(3,1)}&\kappa_{(3,2)}&\kappa_{(3,3)}\end{pmatrix} \tag{33}$$

is the positive-definite mean reversion matrix, $W(t)$ is a three-dimensional Brownian motion, and

$$S(t)=\begin{pmatrix}\sqrt{\delta_1X_1(t)}&0&0\\ 0&\sqrt{1+\delta_2X_1(t)}&0\\ 0&0&\sqrt{1+\delta_3X_1(t)}\end{pmatrix} \tag{34}$$

is the local volatility matrix with $\delta=(1,\delta_2,\delta_3)$. The dynamics of the SDF $M(t)$ are

$$\frac{dM(t)}{M(t)}=-r(t)\,dt-\Lambda(t)'\,dW(t), \tag{35}$$

where the short rate $r(t)$ and the three-dimensional vector $S(t)\Lambda(t)$ are affine functions of $X(t)$. Specifically,

$$r(t)=\rho_0+\rho_X'X(t), \tag{36}$$

where $\rho_0$ is a scalar and $\rho_X$ is a three-dimensional vector. The market price of risk $\Lambda(t)$ is the solution of the equation

$$S(t)\Lambda(t)=\begin{pmatrix}\lambda_{X,(1,1)}X_1(t)\\ \lambda_{0,2}+\lambda_{X,(2,1)}X_1(t)+\lambda_{X,(2,2)}X_2(t)+\lambda_{X,(2,3)}X_3(t)\\ \lambda_{0,3}+\lambda_{X,(3,1)}X_1(t)+\lambda_{X,(3,2)}X_2(t)+\lambda_{X,(3,3)}X_3(t)\end{pmatrix}, \tag{37}$$

where $\lambda_0$ denotes a three-dimensional vector and $\lambda_X$ a three-dimensional matrix. The bond price and the instantaneous yield volatility are

$$P(X(t),T)=e^{A(T-t)+B(T-t)'X(t)}, \tag{38}$$

$$v(X(t),T)=\sqrt{B(T-t)'S(X(t))S(X(t))B(T-t)}, \tag{39}$$

where $A(\tau)$ and $B(\tau)$ satisfy the ODEs

$$\frac{dA(\tau)}{d\tau}=\left(\kappa\bar{X}-\lambda_0\right)'B(\tau)+\frac{1}{2}\sum_{i=2}^{3}B_i(\tau)^2-\rho_0,\qquad A(0)=0, \tag{40}$$

$$\frac{dB(\tau)}{d\tau}=\left(\kappa+\lambda_X\right)'B(\tau)+\frac{1}{2}\sum_{i=1}^{3}B_i(\tau)\delta_i-\rho_X,\qquad B(0)=0_{3\times 1}. \tag{41}$$

3.3 Estimation Methodology

We use the unscented Kalman filter (UKF) to estimate the nonlinear model, the extended Kalman filter to estimate the A1(3) model, and the Kalman filter to estimate the A0(3) model. Christoffersen et al. (2014) show that the UKF works well in estimating term structure models when highly nonlinear instruments are observed. We briefly discuss the setup but refer to Christoffersen et al. (2014) and Carr and Wu (2009) for a detailed description of this nonlinear filter.

When we estimate the nonlinear and A1(3) model, we stack the five yields in month $t$ in the vector $Y_t$, the corresponding five realized yield variances in the vector $RV_t$, and set up the model in state-space form. The measurement equation is

$$\begin{pmatrix}Y_t\\ RV_t\end{pmatrix}=\begin{pmatrix}f(X_t)\\ g(X_t)\end{pmatrix}+\begin{pmatrix}\sigma_yI_5&0\\ 0&\sigma_{rv}I_5\end{pmatrix}\epsilon_t,\qquad \epsilon_t\sim N(0,I_{10}), \tag{42}$$

where $f(\cdot)$ is the function determining the relation between the latent variables and yields, $g(\cdot)$ is the function determining the relation between the latent variables and the variance of yields, and the positive parameters $\sigma_y$ and $\sigma_{rv}$ are the pricing errors for yields and their variances.12 Specifically, $f=(f_1,\ldots,f_5)'$ and $g=(g_1,\ldots,g_5)'$ where

$$f_{\tau}(X_t)=-\frac{1}{\tau}\ln\left(P(X_t,t+\tau)\right), \tag{43}$$

$$g_{\tau}(X_t)=\frac{1}{\tau^2}v^2(X_t,t+\tau), \tag{44}$$

with $P(X_t,t+\tau)$ and $v(X_t,t+\tau)$ given in Equations (7) and (29), respectively. In the A0(3) model, yield volatility is constant and we therefore only include yields (and not realized variances) in the estimation.

In the nonlinear model, the state space is Gaussian and thus the transition equation for the latent variables is

$$X_{t+1}=C+DX_t+\eta_{t+1},\qquad \eta_t\sim N(0,Q), \tag{45}$$

where $C$ is a vector and $D$ is a matrix that enters the 1-month ahead expectation of $X_t$, that is, $E_t(X_{t+1})=C+DX_t$. The covariance matrix of $X_{t+1}$ given $X_t$ is constant and equal to $Q$. In the A1(3) model, we use the Gaussian transition equation in (45) as an approximation because the dynamics of $X$ are non-Gaussian. This is a standard approach in the literature (Feldhütter and Lando, 2008).
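For the Gaussian state dynamics in Equation (1), the quantities C, D, and Q in the transition Equation (45) follow from the exact discretization of an Ornstein–Uhlenbeck process. The sketch below shows the scalar (one-factor) case only, which is a simplifying assumption: in the three-factor model κ is a lower triangular matrix and D involves a matrix exponential. The function name is hypothetical, and the value 0.3127 is simply the first diagonal element of κ reported in Table II, used for illustration.

```python
import math

def ou_transition(kappa, xbar, sigma, dt=1.0 / 12.0):
    """Exact one-step transition of dX = kappa*(xbar - X) dt + sigma dW over dt years.

    Returns (C, D, Q) such that X_{t+1} = C + D*X_t + eta, with eta ~ N(0, Q).
    Scalar case only; assumes kappa != 0.
    """
    d = math.exp(-kappa * dt)                  # persistence over one month
    c = (1.0 - d) * xbar                       # intercept pulling X toward xbar
    q = sigma**2 * (1.0 - math.exp(-2.0 * kappa * dt)) / (2.0 * kappa)
    return c, d, q

# Normalization used for the nonlinear model: zero mean and unit local volatility.
C, D, Q = ou_transition(kappa=0.3127, xbar=0.0, sigma=1.0)
```

A quick sanity check on the design: the implied stationary variance Q / (1 − D²) equals σ² / (2κ), the unconditional variance of the continuous-time process.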
The bond price P(Xt,t+τ) and volatility v(Xt,t+τ) in Equations (43) and (44) of the A1(3) model are given in Equation (38) and (39) in Section 3.2. We can use the approximate Kalman filter because both yields and variances are affine in X in the A1(3) model. We use the normalization proposed in Dai and Singleton (2000) to guarantee that the parameters are well identified if s(Xt) is close to zero or one, or if γ and all elements of β are close to zero. In the nonlinear model, we assume in Equation (1) that the mean reversion matrix, κ, is lower triangular, the mean of the state variables, X¯, is the zero vector, and that the local volatility, Σ, is the identity matrix. The normalizations in the A1(3) model are given in Section 3.2. 3.4 Estimation Results Estimated parameters with asymptotic standard errors (in parenthesis) are reported in Tables II and III. Columns 2–4 of Table II show parameter estimates based on the whole sample (1961:07–2014:04) that includes the period of the monetary experiments where the 1-year bond yield and its volatility exceeded 15% and 5%, respectively. We re-estimate the nonlinear model using only yield and volatility data for the period 1987:08–2014:04, which excludes the high yield and yield volatility regime during the early 80s.13 Columns 5–7 of Table II show that the estimated parameters for this period are similar to the estimated parameters for the whole sample period. In particular, the nonlinear parameters β and γ have the same sign and are of similar magnitude. The parameter estimates for the A1(3) and the A0(3) model are reported in Table III. Table II . Parameter estimates of the nonlinear three-factor model This table contains parameter estimates and asymptotic standard errors (in parenthesis) for the nonlinear three-factor model. 
The left column shows parameter estimates based on yield and realized variance data for the whole sample (1961:07–2014:04) and the right column shows parameter estimates based on yield and realized variance data for the Post-Volcker period (1987:08–2014:04). The bond maturities are ranging from 1 to 5 years and the data are obtained from Gurkaynak, Sack, and Wright (2007). The UKF is used to estimate the nonlinear model.

| Parameter | Nonlinear model (1961–2014) |  |  | Nonlinear model (1987–2014) |  |  |
|---|---|---|---|---|---|---|
| κ | 0.3127 (0.04224) | 0 | 0 | 0.3452 (0.08753) | 0 | 0 |
|  | 0.3063 (0.05601) | 0.002189 (2.246e−05) | 0 | 0.5507 (0.09825) | 0.003245 (0.002091) | 0 |
|  | 1.258 (0.1103) | 0.03804 (0.02125) | 0.4098 (0.0377) | 1.057 (0.2745) | 1.072e−05 (0.0002734) | 0.4449 (0.2494) |
| ρ0 |  | −0.001756 (0.01408) |  |  | −0.001002 (0.02238) |  |
| ρX | 0.0002071 (0.0001846) | 0.003061 (0.0002364) | 0.004345 (0.0001742) | 0.0002036 (0.0009384) | 0.005161 (0.0004481) | 0.004939 (0.0005533) |
| λ0 | 0.7569 (0.04302) | −0.01631 (0.5559) | −0.4413 (0.3375) | 0.3814 (0.09227) | −0.02483 (0.09312) | −0.3191 (0.2209) |
| λX | −0.2187 (0.04129) | 0.005572 (0.001321) | −0.02053 (0.005609) | −0.2244 (0.06907) | 0.003604 (0.00792) | −0.02491 (0.04552) |
|  | −1.735e−06 (4.238e−05) | 0.001197 (0.03785) | 0.6863 (0.03001) | −1.558e−06 (2.248e−05) | 0.001282 (0.03908) | 0.7165 (0.05695) |
|  | −0.2943 (0.1053) | −0.02387 (0.01562) | 0.04613 (0.05121) | −0.3973 (0.2578) | −0.0237 (0.02542) | 0.05947 (0.2159) |
| γ |  | 0.0003857 (0.0004591) |  |  | 0.0005653 (0.0007368) |  |
| β | −1.444 (0.008187) | −0.2376 (0.01831) | 0.2846 (0.02526) | −1.196 (0.0521) | −0.2737 (0.07188) | 0.3483 (0.08285) |
| σy |  | 0.0005463 (6.945e−05) |  |  | 0.0004679 (9.47e−05) |  |
| σrv |  | 7.281e−05 (8.491e−06) |  |  | 2.857e−05 (3.381e−06) |  |

Table III. Parameter estimates of the A1(3) and the A0(3) model

This table contains parameter estimates and asymptotic standard errors (in parentheses) for two three-factor affine models: the A1(3) model with one stochastic volatility factor and the A0(3) model with only Gaussian factors. The parameter estimates for the A1(3) model are based on yield and realized variance data for the whole sample (1961:07–2014:04) and the parameter estimates for the A0(3) model are based on yield data for the whole sample. The bond maturities are ranging from 1 to 5 years and the data are obtained from Gurkaynak, Sack, and Wright (2007). The extended Kalman filter is used to estimate the A1(3) model and the Kalman filter is used to estimate the A0(3) model.
| Parameter | A1(3) Model (1961–2014) |  |  | A0(3) Model (1961–2014) |  |  |
|---|---|---|---|---|---|---|
| κ | 1.421 (0.1863) | 0 | 0 | 0.7064 (0.1982) | 0 | 0 |
|  | −0.04787 (1.899) | 0.07225 (0.01938) | −0.003283 (4.101) | 0.3558 (0.2189) | 0.06629 (0.06185) | 0 |
|  | 0.283 (0.6523) | −0.009014 (0.07474) | 0.356 (0.01893) | 0.6473 (0.1987) | 0.3549 (0.2011) | 0.8202 (0.1865) |
| ρ0 |  | 0.08832 (0.3038) |  |  | 0.02046 (0.06848) |  |
| ρX | 0.0003736 (0.0002645) | 0.001131 (0.0009603) | 1.385e−05 (0.000302) | −0.001232 (0.002566) | 0.01626 (0.002255) | 0.01085 (0.003361) |
| λ0 | 0 | 0.6101 (106.4) | 0.006454 (7.178) | 0.1353 (0.1707) | −0.3741 (0.1998) | 0.1233 (0.4018) |
| λX | 6.75e−05 (0.07544) | 0 | 0 | −0.335 (0.1954) | −0.01799 (0.03515) | 0.006627 (0.09816) |
|  | 2.378 (3.64) | −0.0006549 (0.01964) | 3.381 (5.878) | −7.847e−05 (0.001684) | 0.1821 (0.1682) | 0.5751 (0.114) |
|  | 0.01683 (0.7003) | −0.0001671 (0.0733) | 1.302e−05 (0.01966) | 0.183 (0.2063) | −0.09196 (0.08949) | −0.03485 (0.1974) |
| δ | 0 | 491.5 (836.6) | 2.417 (0.3336) |  |  |  |
| (κX̄) | 1.509 (0.1109) | 0 | 0 |  |  |  |
| σy |  | 0.0006001 (8.676e−05) |  |  | 0.0001038 (1.698e−05) |  |
| σrv |  | 6.18e−05 (6.019e−06) |  |  |  |  |

The bond price in the nonlinear model is a weighted average of two Gaussian bond prices (see Theorem 1). Figure 2 shows the weight $s(X_t)$ on the Gaussian base model. If the stochastic weight approaches zero or one, then the bond price approaches the bond price in a Gaussian model where yields are affine functions of the state variables and yield variances are constant. The stochastic weight is distinctly different from one and varies substantially over the sample period, that is, the mean and volatility of $s(X_t)$ are 79.98% and 21.35%, respectively. Moreover, there are both high-frequency and low-frequency movements in $s(X_t)$. The high-frequency movements push $s(X_t)$ away from one during recessions; we see spikes during the 1970, 1973–75, 1980, 2001, and 2007–09 recessions. The low-frequency movement starts in the early 80s where the weight moves significantly below one and slowly returns over the next 30 years.

Figure 2. Stochastic weight on Gaussian base model. The bond price in the nonlinear model is $P(t,T)=s(t)P_0(t,T)+(1-s(t))P_1(t,T)$, where $P_0(t,T)$ and $P_1(t,T)$ are bond prices that belong to the class of essentially affine Gaussian term structure models and $s(t)$ is a stochastic weight between 0 and 1. This figure shows the stochastic weight and the shaded areas show NBER recessions.

To quantify the impact of nonlinearities in our model, we regress yields and their variances on the three state variables. By construction the $R^2$ of these regressions in the A1(3) model is 100%.
In the nonlinear model, the $R^2$s when regressing the 1- to 5-year yields on the three state variables are 89.40%, 89.64%, 90.12%, 90.66%, and 91.14%, respectively, showing a considerable amount of nonlinearity. Nonlinearity shows up even more strongly in the relation between yield variances and the three factors. Specifically, the $R^2$s when regressing the 1- to 5-year yield variances on the three state variables are 29.52%, 27.99%, 28.18%, 29.52%, and 31.67%, respectively. For comparison, regressing the stochastic weight $s(X_t)$ on all three state variables leads to an $R^2$ of 80.88%. Overall, these initial results suggest an important role for nonlinearity and we explore this in detail in the next section.

4. Empirical Results

In this section, we show that the nonlinear three-factor model captures time variation in expected excess bond returns and yield volatility. Moreover, the nonlinearity leads to URP and USV, empirical stylized facts that affine models cannot capture without knife-edge restrictions and additional state variables that describe variations in expected excess returns and yield variances but not yields. While nonlinearities help explain time-variation in excess returns and yield variances, we show in Section 4.3 that the amount of nonlinearity in the cross-section is small and thus our model retains the linear relation of US Treasury yields across maturities.

4.1 Expected Excess Returns

Expected excess returns of US Treasury bonds vary over time as documented in, among others, Fama and Bliss (1987) and Campbell and Shiller (1991) (CS). CS document this by regressing future yield changes on the scaled slope of the yield curve. Specifically, for all bond maturities $\tau=2,3,4,5$ we have

$$y_{t+1}^{\tau-1}-y_t^{\tau}=\text{const}+\varphi_{\tau}\left(\frac{y_t^{\tau}-y_t^{1}}{\tau-1}\right)+\text{residual}, \tag{46}$$

where $y_t^{\tau}$ is the (log) yield at time $t$ of a zero-coupon bond maturing at time $t+\tau$.
The slope regression coefficient is one if excess holding period returns are constant, but CS find negative regression coefficients, implying that a steep slope predicts high future excess bond returns. Table IV replicates their findings for the sample period 1961:07–2014:04, that is, slope coefficients are negative, decreasing with maturity, and significantly different from one.

Table IV. Campbell–Shiller regressions. This table shows the coefficients $\varphi_{\tau}$ from the regressions $y_{t+1}^{\tau-1} - y_t^{\tau} = \text{const} + \varphi_{\tau}\,(y_t^{\tau} - y_t^{1})/(\tau-1) + \text{residual}$, where $y_t^{\tau}$ is the zero-coupon yield at time t of a bond maturing at time t+τ (τ and t are measured in years). The actual coefficients are calculated using monthly data of 1- to 5-year zero-coupon bond yields from 1961:07 to 2014:04 obtained from Gurkaynak, Sack, and Wright (2007). Standard errors in parentheses are computed using the Hansen and Hodrick (1980) correction with twelve lags. The population coefficients for each model are based on one simulated sample path of 1,000,000 months.

Campbell–Shiller regression coefficients

| Bond maturity | 2-Year | 3-Year | 4-Year | 5-Year |
|---|---|---|---|---|
| Data | −0.63 (0.64) | −0.93 (0.69) | −1.21 (0.73) | −1.47 (0.77) |
| Nonlinear model | −0.61 | −0.61 | −0.63 | −0.65 |
| A1(3) model | −0.01 | 0.01 | 0.04 | 0.07 |
| A0(3) model | −0.18 | −0.37 | −0.54 | −0.71 |

To check whether each model can match this stylized fact, we simulate a sample path of 1,000,000 months for 2-, 3-, 4-, and 5-year excess bond returns and compare the model-implied CS regression coefficients with those observed in the data. Table IV shows that the nonlinear model and the A0(3) model capture the negative CS regression coefficients in population.
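The Campbell–Shiller slope in regression (46) can be estimated from a panel of monthly yields with ordinary least squares. The following Python sketch is only an illustration under stated assumptions: yields are held in a NumPy array with one column per maturity (1 to 5 years), a 1-year step corresponds to 12 monthly rows, and the function name `campbell_shiller_slope` is hypothetical. The demo uses random placeholder numbers, not Treasury yields or model output, and it does not compute Hansen–Hodrick standard errors.

```python
import numpy as np


def campbell_shiller_slope(yields, tau, months_per_year=12):
    """OLS slope of regression (46) for a bond with maturity tau (in years).

    yields: array of shape (T, 5); column k-1 holds the k-year log yield,
            rows are monthly observations (an assumed layout).
    """
    h = months_per_year
    y_tau = yields[:-h, tau - 1]        # tau-year yield at time t
    y_one = yields[:-h, 0]              # 1-year yield at time t
    y_next = yields[h:, tau - 2]        # (tau-1)-year yield one year later
    lhs = y_next - y_tau                # future yield change
    scaled_slope = (y_tau - y_one) / (tau - 1)
    X = np.column_stack([np.ones_like(scaled_slope), scaled_slope])
    coef, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    return coef[1]


if __name__ == "__main__":
    # Placeholder data only: i.i.d. noise around 5%, not real yields.
    rng = np.random.default_rng(0)
    fake_yields = rng.normal(0.05, 0.01, size=(600, 5))
    for tau in (2, 3, 4, 5):
        print(tau, campbell_shiller_slope(fake_yields, tau))
```

Applying the same function to a long simulated path from a model would give a population coefficient in the sense used in Table IV.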
Figure 3 shows that 1-year expected excess returns in the nonlinear model are negative in the early 80s and positive since the mid-80s, while they alternate between positive and negative in the A1(3) model. Expected excess returns in the A0(3) model are also positive since the mid-80s, but neither affine model can capture the very low and high realized excess returns during the monetary experiment. To formally test whether the nonlinear model captures expected excess returns better than the two affine models, we run regressions of realized excess returns on model-implied expected excess returns in sample. Specifically,

$$ rx_{t,t+n}^{\tau} = \alpha^{\tau,n} + \beta^{\tau,n}\,E_t\!\left[rx_{t,t+n}^{\tau}\right] + \text{residual}, \quad \forall\, \tau > n = 1,2,3,4,5, \tag{47} $$

where $rx_{t,t+n}^{\tau}$ is the n-year log return on a bond with maturity τ in excess of the n-year yield and $E_t[rx_{t,t+n}^{\tau}]$ is the corresponding model-implied expected excess return.14 The estimated expected excess returns for the nonlinear, A1(3), and A0(3) model are based on the sample period 1961:07 to 2014:04. The regression results are reported in Table V. If the model captures expected excess returns well, then the slope coefficient should be one and the constant zero. The slope coefficients are lower than but generally close to one in the nonlinear model. In the A1(3) model, the slope coefficients are close to one at the 1-year horizon but are too low at longer horizons, while in the A0(3) model, the slope coefficients are too high at the 1-year horizon and too low for the 3- and 4-year horizons. The average R2 across bond maturity and holding horizon is 27.4% in the nonlinear model while it is only 6.5% in the A1(3) and 7.8% in the A0(3) model.

Table V. Excess return regressions. This table shows regression coefficients from a regression of realized (log) excess returns on model-implied expected (log) excess returns in sample. Monthly data for bonds with maturities ranging from 1 to 5 years are from Gurkaynak, Sack, and Wright (2007) for the period 1961:07 to 2014:04.
For n=1,2,3,4, n-year excess returns are computed by subtracting the n-year (log) yield from the n-year (log) holding-period return of a τ-year bond (τ>n). The fraction of variance explained is defined as

$$ FVE = 1 - \frac{\frac{1}{T}\sum_{t=1}^{T}\left(rx_{t,t+n}^{\tau} - E_t\!\left[rx_{t,t+n}^{\tau}\right]\right)^2}{\frac{1}{T}\sum_{t=1}^{T}\left(rx_{t,t+n}^{\tau} - \frac{1}{T}\sum_{t=1}^{T} rx_{t,t+n}^{\tau}\right)^2}, $$

where $rx_{t,t+n}^{\tau}$ is the n-year excess return on a bond with maturity τ and $E_t[rx_{t,t+n}^{\tau}]$ is the corresponding model-implied expected excess return. The last two columns contain the R2 from the regression of realized excess returns on the five yields and on the five yields and five yield variances, respectively. Standard errors in parentheses are computed using the Hansen and Hodrick (1980) correction with the number of lags equal to the number of overlapping months.

Regressing realized excess returns on model-implied expected excess returns in sample (1961–2014)

| Maturity | Nonlinear α×10³ | Nonlinear β | Nonlinear R2 | Nonlinear FVE | A1(3) α×10³ | A1(3) β | A1(3) R2 | A1(3) FVE | A0(3) α×10³ | A0(3) β | A0(3) R2 | A0(3) FVE | 5Y R2 | 5Y+5VAR R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1-Year holding horizon | | | | | | | | | | | | | | |
| τ=2 | −2.58 (2.89) | 0.81 (0.20) | 0.22 | 0.15 | 2.53 (2.62) | 0.96 (0.35) | 0.12 | 0.10 | −3.77 (3.92) | 1.30 (0.45) | 0.12 | 0.11 | 0.13 | 0.15 |
| τ=3 | −4.85 (5.06) | 0.83 (0.20) | 0.23 | 0.16 | 4.05 (4.64) | 0.97 (0.33) | 0.13 | 0.12 | −7.35 (6.71) | 1.35 (0.44) | 0.14 | 0.12 | 0.14 | 0.16 |
| τ=4 | −6.60 (6.74) | 0.85 (0.19) | 0.24 | 0.18 | 5.07 (6.32) | 1.00 (0.32) | 0.15 | 0.13 | −10.29 (8.71) | 1.37 (0.41) | 0.15 | 0.13 | 0.16 | 0.18 |
| τ=5 | −7.51 (8.09) | 0.86 (0.19) | 0.25 | 0.20 | 5.85 (7.77) | 1.03 (0.32) | 0.16 | 0.15 | −12.61 (10.23) | 1.37 (0.38) | 0.17 | 0.15 | 0.17 | 0.20 |
| 2-Year holding horizon | | | | | | | | | | | | | | |
| τ=3 | −3.73 (4.60) | 0.87 (0.22) | 0.29 | 0.23 | 6.31 (4.73) | 0.62 (0.38) | 0.07 | 0.00 | −2.21 (7.99) | 0.91 (0.55) | 0.07 | 0.05 | 0.09 | 0.10 |
| τ=4 | −6.36 (8.08) | 0.89 (0.21) | 0.30 | 0.25 | 10.75 (8.51) | 0.67 (0.37) | 0.08 | 0.03 | −5.19 (13.52) | 0.95 (0.51) | 0.09 | 0.07 | 0.10 | 0.11 |
| τ=5 | −7.72 (10.70) | 0.91 (0.21) | 0.32 | 0.29 | 14.24 (11.71) | 0.73 (0.36) | 0.10 | 0.05 | −8.94 (17.40) | 1.01 (0.47) | 0.11 | 0.09 | 0.12 | 0.13 |
| 3-Year holding horizon | | | | | | | | | | | | | | |
| τ=4 | −2.30 (5.54) | 0.83 (0.23) | 0.29 | 0.24 | 10.30 (6.45) | 0.31 (0.41) | 0.02 | −0.15 | 2.20 (11.54) | 0.57 (0.60) | 0.03 | −0.02 | 0.07 | 0.13 |
| τ=5 | −3.96 (10.05) | 0.87 (0.22) | 0.33 | 0.29 | 17.60 (12.09) | 0.40 (0.40) | 0.03 | −0.11 | 0.43 (19.90) | 0.68 (0.55) | 0.05 | 0.01 | 0.08 | 0.15 |
| 4-Year holding horizon | | | | | | | | | | | | | | |
| τ=5 | 0.53 (6.57) | 0.75 (0.24) | 0.25 | 0.20 | 12.72 (8.12) | 0.25 (0.42) | 0.01 | −0.21 | 1.82 (14.53) | 0.59 (0.62) | 0.04 | −0.03 | 0.08 | 0.23 |

Figure 3. Expected excess returns. The graphs show the expected 1-year log excess returns of zero-coupon Treasury bonds with maturities of 2, 3, 4, and 5 years. The blue, black, and red lines show expected excess returns in the three-factor A0(3), A1(3), and nonlinear model, respectively. The shaded areas show NBER recessions.

To measure how well the nonlinear model predicts excess returns, we compare the mean squared error of the predictor to the unconditional variance of excess returns. Specifically, we define the statistic “fraction of variance explained” (FVE), which measures the explanatory power of the model-implied in-sample expected excess return, as follows:15

$$ FVE = 1 - \frac{\frac{1}{T}\sum_{t=1}^{T}\left(rx_{t,t+n}^{\tau} - E_t\!\left[rx_{t,t+n}^{\tau}\right]\right)^2}{\frac{1}{T}\sum_{t=1}^{T}\left(rx_{t,t+n}^{\tau} - \frac{1}{T}\sum_{t=1}^{T} rx_{t,t+n}^{\tau}\right)^2}. \tag{48} $$

If the predictor is unbiased, then the R2 from the regression of realized on expected excess returns is equal to the FVE and otherwise it is an upper bound. Table V shows the FVEs of the nonlinear, A1(3), and A0(3) model for the sample period 1961:07–2014:04. The in-sample FVEs for the nonlinear model are higher than for the A1(3) and A0(3) model. In contrast to the nonlinear and A0(3) model, the performance of the A1(3) model deteriorates as we increase the holding horizon.

To compare the nonlinear model to affine models more generally we regress future excess returns on the five yields.
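The FVE statistic in Equation (48) and its relation to the regression R2 can be sketched in a few lines of Python. This is an illustration only: the function names `fraction_of_variance_explained` and `regression_r2` are hypothetical, and the demo series are random placeholders rather than bond returns. In sample, the R2 is never below the FVE because OLS chooses the intercept and slope that minimize squared errors, and an intercept of zero with a slope of one is one admissible choice.

```python
import numpy as np


def fraction_of_variance_explained(realized, expected):
    """FVE as in Equation (48): 1 - MSE of the predictor / variance of realized."""
    realized = np.asarray(realized, dtype=float)
    expected = np.asarray(expected, dtype=float)
    mse = np.mean((realized - expected) ** 2)
    total = np.mean((realized - realized.mean()) ** 2)
    return 1.0 - mse / total


def regression_r2(realized, expected):
    """R2 from an OLS regression of realized on a constant and the predictor."""
    realized = np.asarray(realized, dtype=float)
    expected = np.asarray(expected, dtype=float)
    X = np.column_stack([np.ones_like(expected), expected])
    coef, *_ = np.linalg.lstsq(X, realized, rcond=None)
    residual = realized - X @ coef
    total = np.mean((realized - realized.mean()) ** 2)
    return 1.0 - np.mean(residual ** 2) / total


if __name__ == "__main__":
    # Placeholder data: a deliberately biased predictor of a noisy series.
    rng = np.random.default_rng(1)
    expected = rng.normal(0.0, 0.02, size=500)
    realized = 0.01 + 0.8 * expected + rng.normal(0.0, 0.03, size=500)
    print("FVE:", fraction_of_variance_explained(realized, expected))
    print("R2 :", regression_r2(realized, expected))
```

A negative FVE, as reported for the A1(3) model at longer horizons in Table V, simply means the predictor has a larger mean squared error than the sample mean of realized excess returns.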
The R2s from this regression, shown in the second to last column of Table V, are an upper bound for the FVE of any affine model for which expected excess returns are spanned by yields, for example, the Cochrane and Piazzesi (2005) factor.16 The FVEs of the nonlinear model are equal to or higher than the explanatory power of the Cochrane–Piazzesi factor. This implies that no affine model without hidden risk premium factors (see discussion below) can explain more of the variation in realized excess returns than the nonlinear model. The last column of Table V shows that the explanatory power of any estimator for expected excess returns that is spanned by yields and their variances is lower than the FVE of our nonlinear model.

4.1.a. Unspanned Risk Premia

A large body of empirical evidence shows that a part of excess bond returns is explained by macro factors not spanned by linear combinations of yields.17 For example, Bauer and Rudebusch (2017) find that the R2 when regressing realized excess returns on the first three PCs of yields along with expected inflation is 85% higher than when regressing on just the first three PCs.18 We refer to this empirical finding as Unspanned Risk Premia or URP. To quantitatively capture URP in a term structure model, Duffee (2011b); Joslin, Priebsch, and Singleton (2014); and Chernov and Mueller (2012) use five-factor Gaussian models. The reason for using five factors is that three factors are needed to explain the cross-section of bond yields and then one or two factors orthogonal to the yield curve explain expected excess returns. An alternative explanation for the spanning puzzle that has not been explored in the literature is that there is a nonlinear relation between yields and expected excess returns. We therefore ask the question: are nonlinearities empirically important for understanding the spanning puzzle?
To answer the question, we start by regressing model-implied 1-year expected excess returns on the first PC, the first and second PC, …, and all five PCs of model-implied yields for the sample period 1961:07–2014:04. Specifically, for all bond maturities τ=2,3,4,5 we run the in-sample URP regressions

$$ E_t\!\left[rx_{t,t+1}^{\tau}\right] = \alpha^{\tau,1:n} + \sum_{i=1}^{n} \beta_i^{\tau,1:n}\, PC_{i,t} + \varepsilon_t^{\tau,1:n}, \quad \forall\, n = 1,2,3,4,5, \tag{49} $$

where $PC_{i,t}$ denotes the i-th PC of all five yields (ordered by decreasing contribution to the total variation in yields). The in-sample R2s of these regressions are reported in Panels B, C, and D of Table VI. Panels C and D show that by construction the first three PCs explain all the variation in expected excess returns in the A1(3) and A0(3) model since expected excess returns are linear functions of yields in affine models. Panel B shows that the first three PCs explain on average 69.4% of the variation of expected excess returns in the nonlinear model. That is, almost one-third of the variation of expected excess returns is due to a nonlinear relation between expected excess returns and yields in sample.

Table VI. URP regressions. This table shows R2s (in percent) from regressions of excess returns on the five PCs of yields. Panel A shows R2s from regressions of 1-year actual realized excess returns on PCs of actual yields based on the sample 1961:07–2014:04. Panels B, C, and D show for each model in-sample R2s from regressions of model-implied 1-year excess returns on model-implied PCs of yields. Panels E, F, and G show for each model population R2s from regressions of realized 1-year excess returns on PCs of yields based on a simulated data sample of 1,000,000 months. The final column of Panels E–G shows the R2s when using the model-implied expected excess return instead of the model-implied PCs as independent variable.
| Maturity | PC1 | PC1−PC2 | PC1−PC3 | PC1−PC4 | PC1−PC5 | $E_t[rx_{t,t+1}^{\tau}]$ |
|---|---|---|---|---|---|---|
| Panel A: R2 in data (1961–2014) | | | | | | |
| τ=2 | 2.1 | 12.6 | 13.2 | 14.4 | 14.6 | |
| τ=3 | 0.8 | 13.9 | 14.3 | 15.9 | 16.2 | |
| τ=4 | 0.3 | 15.6 | 15.8 | 17.8 | 18.1 | |
| τ=5 | 0.1 | 17.2 | 17.3 | 19.7 | 19.9 | |
| Panel B: In sample R2 for nonlinear three-factor model | | | | | | |
| τ=2 | 5.7 | 64.9 | 67.5 | 85.3 | 91.2 | |
| τ=3 | 4.6 | 67.7 | 69.1 | 84.8 | 90.8 | |
| τ=4 | 4.2 | 69.7 | 70.6 | 84.8 | 90.7 | |
| τ=5 | 4.4 | 71.0 | 72.0 | 85.4 | 90.9 | |
| Panel C: In sample R2 for A1(3) model | | | | | | |
| τ=2 | 10.8 | 99.8 | 100.0 | | | |
| τ=3 | 10.5 | 99.7 | 100.0 | | | |
| τ=4 | 10.2 | 99.6 | 100.0 | | | |
| τ=5 | 9.9 | 99.5 | 100.0 | | | |
| Panel D: In sample R2 for A0(3) model | | | | | | |
| τ=2 | 5.3 | 99.6 | 100.0 | | | |
| τ=3 | 1.4 | 99.9 | 100.0 | | | |
| τ=4 | 0.2 | 100.0 | 100.0 | | | |
| τ=5 | 0.0 | 99.6 | 100.0 | | | |
| Panel E: Population R2 for nonlinear three-factor model | | | | | | |
| τ=2 | 0.0 | 10.7 | 14.5 | 15.7 | 15.8 | 28.0 |
| τ=3 | 0.1 | 10.5 | 14.4 | 15.2 | 15.3 | 26.2 |
| τ=4 | 0.1 | 10.6 | 14.5 | 15.2 | 15.3 | 25.3 |
| τ=5 | 0.1 | 11.1 | 14.7 | 15.6 | 15.6 | 25.2 |
| Panel F: Population R2 for A1(3) model | | | | | | |
| τ=2 | 3.9 | 4.5 | 4.5 | 4.5 | 4.5 | 4.5 |
| τ=3 | 3.9 | 4.5 | 4.5 | 4.5 | 4.5 | 4.5 |
| τ=4 | 3.9 | 4.5 | 4.5 | 4.5 | 4.5 | 4.5 |
| τ=5 | 3.9 | 4.5 | 4.5 | 4.5 | 4.5 | 4.5 |
| Panel G: Population R2 for A0(3) model | | | | | | |
| τ=2 | 0.4 | 9.5 | 9.6 | 9.6 | 9.6 | 9.6 |
| τ=3 | 0.1 | 9.8 | 9.8 | 9.8 | 9.8 | 9.8 |
| τ=4 | 0.0 | 10.6 | 10.6 | 10.6 | 10.6 | 10.6 |
| τ=5 | 0.0 | 11.7 | 11.7 | 11.7 | 11.7 | 11.7 |

Empirically, realized excess returns are invariably used in lieu of expected excess returns as dependent variable. Hence, for all bond maturities τ=2,3,4,5 we run the URP regressions

$$ rx_{t,t+1}^{\tau} = \alpha^{\tau,1:n} + \sum_{i=1}^{n} \beta_i^{\tau,1:n}\, PC_{i,t} + \text{residual}, \quad \forall\, n = 1,2,3,4,5. \tag{50} $$

Panel A in Table VI shows R2s from regressions of realized excess returns on PCs of yields in the data based on the sample period 1961:07–2014:04. To check whether each model can match the actual R2 from the URP regression, we simulate a sample path of 1,000,000 months for 2- to 5-year excess bond returns and 1- to 5-year bond yields and compare the model-implied URP regression R2s to those observed in the data. Panels E, F, and G show the population R2 for the nonlinear, A1(3), and A0(3) model, respectively. In contrast to both affine models, the population R2 in the nonlinear model is largely in line with the actual R2 observed in the data.

The final column in Panels E, F, and G shows the population R2 when we replace the model-implied PCs in URP regression (50) with the model-implied expected excess return, that is,

$$ rx_{t,t+1}^{\tau} = \alpha^{\tau} + \beta^{\tau}\, E_t\!\left[rx_{t,t+1}^{\tau}\right] + \text{residual}, \quad \forall\, \tau = 2,3,4,5. \tag{51} $$

In the nonlinear model, the average (over all bond maturities) population R2 in regression (51) is 81% higher than in regression (50) when n = 3, that is, 26.2% versus 14.5%. This implies that if there is a macro variable that perfectly tracks expected excess returns, average R2s when regressing realized excess returns on the first three PCs and this macro factor would be 81% higher than when regressing on just the first three PCs, similar to the incremental R2 documented in Bauer and Rudebusch (2017). Of course, this is not because this macro factor contains any information not in the yield curve.

Is it plausible that macro factors (partially) pick up nonlinearities? To address this question, we take the in-sample residuals from regressing expected excess returns on PCs in the nonlinear model (Panel B in Table VI) and regress them on expected inflation. Specifically, for all bond maturities τ=2,3,4,5 we run the regression

$$ \varepsilon_t^{\tau,1:n} = \alpha^{\tau,n} + \beta^{\tau,n}\, \pi_t + \text{residual}, \quad \forall\, n = 3,4,5, \tag{52} $$

where $\varepsilon_t^{\tau,1:n}$ is the residual from URP regression (49) and $\pi_t$ is an estimator for expected inflation that is based on the Michigan Survey of Consumers (MSC).19 Table VII shows the R2, slope coefficient, and 12-lag Newey–West corrected t-statistics of regression (52). Expected inflation explains about 11% of the variation in in-sample URP residuals based on the first three PCs and it is statistically significant at the 5% level. The R2s increase to slightly less than 20% when adding the fourth PC. Expected inflation remains statistically significant even when considering in-sample URP residuals based on all five PCs. Hence, although all information about expected excess returns is contained in the yield curve, expected inflation appears to contain information about them when running linear regressions.

Table VII. URP regression residuals and expected inflation. We first run a regression of model-implied 1-year expected excess returns on the PCs of model-implied yields, $E_t[rx_{t,t+1}^{\tau}] = \alpha^{\tau,1:n} + \sum_{i=1}^{n} \beta_i^{\tau,1:n} PC_{i,t} + \varepsilon_t^{\tau,1:n}$; n=3,4,5. Then we run a regression of the residual of this regression on expected inflation, $\pi_t$, measured by the cross-sectional average forecasts of the Michigan Surveys of Consumers (MSC), $\varepsilon_t^{\tau,1:n} = \alpha^{\tau,n} + \beta^{\tau,n} \pi_t + \text{residual}$, n=3,4,5. This table shows the R2 (in percent), slope coefficient, and the t-statistic from the second regression. Expected excess returns are measured as the expected 1-year bond return in excess of the 1-year yield. Standard errors are Newey and West (1987) corrected using twelve lags. The data sample is 1978:01–2014:04 as MSC is not available at monthly frequencies before 1978.

| Maturity | PC1−PC3 R2 | PC1−PC3 Slope | PC1−PC3 t-Statistic | PC1−PC4 R2 | PC1−PC4 Slope | PC1−PC4 t-Statistic | PC1−PC5 R2 | PC1−PC5 Slope | PC1−PC5 t-Statistic |
|---|---|---|---|---|---|---|---|---|---|
| τ=2 | 11.52 | −0.0011 | −1.97 | 20.56 | −0.0011 | −4.27 | 17.76 | −0.0008 | −3.88 |
| τ=3 | 11.66 | −0.0020 | −2.09 | 19.55 | −0.0020 | −4.00 | 16.44 | −0.0015 | −3.63 |
| τ=4 | 11.56 | −0.0027 | −2.17 | 18.65 | −0.0027 | −3.79 | 15.28 | −0.0020 | −3.39 |
| τ=5 | 11.09 | −0.0033 | −2.15 | 17.78 | −0.0032 | −3.59 | 14.17 | −0.0024 | −3.17 |

Overall, our nonlinear model highlights an alternative channel that helps explain the spanning puzzle: expected excess returns are nonlinearly related to yields and therefore a part of expected excess returns appears to be “hidden” from a linear combination of yields and this part can be picked up by macro factors.
This is achieved in a parsimonious three-factor model rather than a five-factor model as is common in the literature.

4.2 Stochastic Volatility

Table VIII shows that there is more than one factor in realized yield variances in our data: the first PC of yield variances explains 94.5% of the variation while the first two PCs explain 99.2%. The A1(3) model has by definition only one factor explaining volatilities and therefore the first PC explains all the variation in model-implied realized variances.20 In the nonlinear model, the first PC explains 97.5% of the variation in model-implied variances and the first two PCs explain 99.9%. Hence, yield variances in the nonlinear model exhibit a linear multi-factor structure as in the data.

Table VIII. PC analysis of realized yield variances. PCs are constructed from a panel of realized yield variances of constant-maturity zero-coupon bond yields with maturities ranging from 1 to 5 years. The contribution of the first PC, the first and second PC, and the first, second, and third PC to the total variation in the five realized yield variances are shown for the data, the nonlinear model, and the A1(3) model. Actual PC contributions are computed using monthly realized variance data (based on daily squared yield changes) from 1961:07 to 2014:04 obtained from Gurkaynak, Sack, and Wright (2007). Population PC contributions for the nonlinear and A1(3) model are computed using monthly realized variance data (based on daily squared yield changes) based on one simulated sample path of 1,000,000 months.

| | PC1 | PC1−PC2 | PC1−PC3 |
|---|---|---|---|
| Data | 0.9454 | 0.9922 | 0.9996 |
| Nonlinear model | 0.9750 | 0.9993 | 1.0000 |
| A1(3) model | 1.0000 | 1.0000 | 1.0000 |

The nonlinear and A1(3) model also have significantly different distributions of future yield volatility.
Figure 4 shows the 1-year ahead conditional distribution of the instantaneous yield volatility for the bond with 3 years to maturity (the distributions for bonds with other maturities are similar).21 The volatility is a linear function of only one factor in the A1(3) model and the distribution of future volatility is fairly symmetric and does not change much over time. In the nonlinear model, volatility is a nonlinear function of three factors and the volatility distribution takes on a variety of shapes that persist over time.

Figure 4. Distribution of 1-year ahead yield volatility. The graphs show quantiles in the 1-year ahead distribution of instantaneous volatility for the bond with a maturity of 3 years. The top graph shows the distribution in the three-factor nonlinear model, while the bottom graph shows the distribution in the three-factor A1(3) model. The data sample is 1961:07 to 2014:04 and the results for July in each year are plotted.

The 97.5% quantiles of the 1-year ahead volatility distribution in the nonlinear model show that the market did not anticipate the possibility of very volatile yields before the monetary experiment in the early 80s, apart from brief periods around the 1970s recessions. However, there is a significant probability of a high yield volatility scenario since the 80s, despite the fact that volatilities have come down to levels similar to those in the 60s and 70s. It is only in the calm 2005–06 period that a high-volatility scenario was unlikely.
This finding suggests that there is information about the risk of a high-volatility regime in Treasury bond data, which is similar to the appearance of the smile in equity options since the stock market crash of 1987. Figure 5 shows the 97.5% quantiles of the 1-year ahead distribution of yield volatility for sample periods with (1961:07–2014:04) and without (1987:08–2014:04) the early 80s. There is a fat right tail in the volatility distribution in both cases and hence the nonlinear model captures the risk of a strong increase in volatility, even when such an event is not in the sample used to estimate the model.

Figure 5. Distribution of 1-year ahead yield volatility for the nonlinear model estimated using the period 1961–2014 and estimated using the period 1987–2014. The graphs show the 97.5% quantiles in the 1-year ahead distribution of instantaneous volatility. The red line shows the 97.5% quantiles in the three-factor nonlinear model, where the model is estimated by using data for the whole sample period 1961–2014. The yellow line shows the 97.5% quantiles in the three-factor nonlinear model, where the model is estimated by using data for the period 1987–2014. The results for September in each year are plotted.
The regime-switching models of Dai, Singleton, and Yang (2007); Bansal and Zhou (2002); and Bansal, Tauchen, and Zhou (2004) capture time variation in the probabilities of high-volatility regimes by adding a state variable that picks up the regime. However, if a high-volatility regime is not in the sample used to estimate the model, then the regimes in the model will pick up minor variations in volatility (see the discussion in Dai, Singleton, and Yang, 2007). Everything works through nonlinearities in our model and therefore the probability of a high-volatility regime can be pinned down in a sample that does not include such an episode.

4.2.a. Unspanned Stochastic Volatility

There is a large literature suggesting that interest rate volatility risk cannot be hedged by a portfolio consisting solely of bonds, a phenomenon referred to by Collin-Dufresne and Goldstein (2002) as Unspanned Stochastic Volatility or USV. The empirical evidence supporting USV typically comes from a low R2 when regressing measures of volatility on interest rates. For instance, Collin-Dufresne and Goldstein (2002) regress straddle returns on changes in swap rates and document R2s as low as 10%. Similarly, Andersen and Benzoni (2010) regress yield variances (measured using high-frequency data) on the first six PCs of yields and find low R2s. Inconsistent with this evidence, standard affine models produce high R2s in USV regressions because there is a linear relation between yield variances and yields in the model.

The nonlinear model provides an alternative explanation for low R2s in USV regressions because the relation between yield variances and yields is nonlinear. However, it is an empirical question whether nonlinearities in the model are strong enough to produce R2s similar to those found in the data. To answer this question, we follow Andersen and Benzoni (2010) and regress realized yield variance on PCs of yields.
Specifically, for each bond maturity τ=1,2,3,4,5 and number n=1,2,3,4,5 of PCs we run the following USV regression in the data:

$$ rv_t^{\tau} = \alpha^{\tau} + \sum_{i=1}^{n} \beta_i^{\tau}\, PC_{i,t} + \varepsilon_t^{\tau}, \tag{53} $$

where, as in the previous section, $PC_{i,t}$ denotes the i-th PC of all five yields (ordered by decreasing contribution to the total variation in yields). The R2s of these USV regressions in the data are reported in Panel A of Table IX. The average R2 when regressing realized variance on the first three PCs is 32.4%, confirming that the PCs of yields only explain a fraction of the variation in yield variance in the data.22

Table IX. USV regressions. Panel A shows R2s (in percent) from regressing realized variance on the five PCs of yields. Panel B shows in-sample R2s for the nonlinear model from regressing model-implied instantaneous variance on the PCs of model-implied yields. Panel C shows population R2s for the nonlinear model from regressing monthly realized variance (based on daily model-implied yields) on the PCs of monthly yields (based on averages over daily model-implied yields) based on a sample of 1,000,000 simulated months. Panels D and E show corresponding results for the A1(3) model, where results for only one maturity are shown because R2s are the same for all maturities. Panel F shows the explanatory power of the PCs of residuals from the USV regressions in Panels A and B.
| Maturity | PC1 | PC1−PC2 | PC1−PC3 | PC1−PC4 | PC1−PC5 |
|---|---|---|---|---|---|
| Panel A: R2 in the data (1961–2014) | | | | | |
| τ=1 | 24.3 | 26.8 | 35.0 | 35.7 | 40.2 |
| τ=2 | 23.2 | 24.8 | 33.7 | 35.4 | 41.6 |
| τ=3 | 21.9 | 22.8 | 32.6 | 35.8 | 42.5 |
| τ=4 | 20.3 | 20.7 | 31.1 | 35.9 | 42.6 |
| τ=5 | 18.8 | 18.9 | 29.6 | 36.0 | 42.6 |
| Panel B: In sample R2 for nonlinear three-factor model | | | | | |
| τ=1 | 21.6 | 21.8 | 44.0 | 47.9 | 55.1 |
| τ=2 | 19.1 | 19.1 | 42.3 | 49.2 | 57.4 |
| τ=3 | 17.5 | 17.6 | 41.8 | 50.9 | 59.9 |
| τ=4 | 16.7 | 16.8 | 42.0 | 52.9 | 61.7 |
| τ=5 | 16.9 | 17.2 | 42.4 | 54.6 | 62.1 |
| Panel C: Population R2 for nonlinear three-factor model | | | | | |
| τ=1 | 31.8 | 32.7 | 40.8 | 46.0 | 56.9 |
| τ=2 | 32.8 | 33.8 | 40.7 | 48.6 | 60.8 |
| τ=3 | 32.9 | 34.2 | 40.1 | 50.3 | 63.4 |
| τ=4 | 32.6 | 34.4 | 39.2 | 51.0 | 65.0 |
| τ=5 | 31.9 | 34.7 | 38.3 | 51.0 | 66.1 |
| Panel D: In sample R2 for A1(3) model | | | | | |
| τ=1,…,5 | 21.5 | 22.3 | 100.0 | 100.0 | 100.0 |
| Panel E: Population R2 for A1(3) model | | | | | |
| τ=1,…,5 | 0.0 | 0.0 | 45.8 | 45.8 | 45.8 |
| Panel F: In sample PC analysis of USV regression residuals | | | | | |
| Data | 91.8 | 98.7 | 99.9 | 100.0 | 100.0 |
| Nonlinear model | 97.9 | 99.9 | 100.0 | 100.0 | 100.0 |

To assess the ability of the nonlinear model to capture USV we regress model-implied instantaneous yield variance on the PCs of model-implied yields:

$$ v(t,t+\tau)^2 = \alpha^{\tau} + \sum_{i=1}^{n} \beta_i(\tau)\, PC_{i,t} + \varepsilon_t^{\tau}, \tag{54} $$

where $v(t,t+\tau)$ is given in Equation (29). Panel B shows that the average in-sample R2 from USV regression (54) on the first three PCs (n = 3) is 42.5%, which is not substantially higher than in the data. In contrast, Panel D shows that in the A1(3) model the in-sample R2 is 100% once the first three PCs are included in the USV regression (54). Hence, the presence of nonlinearities gives rise to low R2s in USV regressions.

To understand why a significant part of variance is (linearly) unspanned by yields we recall that Equation (24) shows that the local volatility consists of two components, $\sigma_{lev}$ and $\sigma_{vol}$, and thus the instantaneous yield variance is

$$ \sigma(t,T)'\sigma(t,T) = \sigma_{vol}(t,T)'\sigma_{vol}(t,T) + \sigma_{lev}(t,T)'\sigma_{lev}(t,T) + 2\,\sigma_{vol}(t,T)'\sigma_{lev}(t,T). \tag{55} $$

While the average (across maturities) in-sample R2 from regressing the yield variance on the first five PCs of model-implied yields is only 59.2% (see Panel B of Table IX), the average in-sample R2 from regressing each component in Equation (55) on the five PCs of yields is 94.4%, 88.2%, and 94.9%, respectively. Hence, each component is close to being linearly spanned, but they partially offset each other.23 When $P_1(t,T)=P_2(t,T)$, the second and third terms in Equation (55) vanish and volatility is largely spanned. Hence, the fraction of volatility that is unspanned varies significantly over time, consistent with findings in Jacobs and Karoui (2009).

The actual R2s of USV regression (53) reported in Panel A of Table IX are not directly comparable to the in-sample R2s of USV regression (54) reported in Panel B for the nonlinear model and Panel D for the A1(3) model because realized variance based on daily data is a noisy proxy for yield variance.
To check whether the nonlinear model can quantitatively capture USV in the data, we simulate 1,000,000 months of daily data (with 21 days in each month), compute the monthly realized variance and monthly average yield, and run the same USV regression as in the data, that is, regression (53). Panel C shows that the population R2s for the nonlinear model are very similar to the in sample R2s of Panel B, where we use instantaneous variance instead of realized variance: the average R2 is 39.8% when including the first three PCs. Hence, our results are robust to taking into account that realized variance based on daily data is a noisy proxy for instantaneous variance. Panel E shows that the average population R2 is 45.8% in the A1(3) model when regressing realized variance on the first three PCs of yields, which brings the population R2s much closer to the R2s in the data. However, the population R2s when using only one or two PCs in the A1(3) model are zero, which is strongly at odds with the data.24

Bikbov and Chernov (2009) discuss how measurement error due to microstructure effects such as the bid-ask spread in option and bond prices affects the explanatory power of USV regressions. Collin-Dufresne and Goldstein (2002) argue that measurement error cannot be the reason for low R2s in USV regressions because there is a strong factor structure in the regression residuals across bond maturities. Panel F of Table IX confirms the factor structure in the data because the first PC of the residuals $\varepsilon_t^{1},\dots,\varepsilon_t^{5}$ of the USV regression (53) explains 91.8% of the total variation in the USV residuals. Similarly, the first PC explains 97.9% of the variation in the residuals of USV regression (54) implied by the nonlinear model. Hence, our nonlinear model can capture the low explanatory power and the strong residual factor structure of the USV regressions that is observed in the data.
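The mechanics of the USV regressions (53) and (54) can be sketched in a few lines. The sketch below is illustrative only and is not the authors' code: the function names are hypothetical, and realized variance is assumed to be the sum of squared daily yield changes within a month, which may differ from the definition used earlier in the paper.

```python
import numpy as np

def principal_components(yields):
    """PCs of a (T, m) yield panel, ordered by decreasing contribution to total variation."""
    centered = yields - yields.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(centered, rowvar=False))  # ascending eigenvalues
    order = np.argsort(eigval)[::-1]
    return centered @ eigvec[:, order]

def usv_r2(variance, yields, n_pcs):
    """R2 (in percent) from regressing a variance series on a constant and the first n_pcs PCs."""
    pcs = principal_components(yields)[:, :n_pcs]
    regressors = np.column_stack([np.ones(len(variance)), pcs])
    coef, *_ = np.linalg.lstsq(regressors, variance, rcond=None)
    resid = variance - regressors @ coef
    return 100.0 * (1.0 - resid.var() / variance.var())

def monthly_aggregates(daily_yields, days_per_month=21):
    """Monthly average yields and realized variances from a (T_days, m) daily panel.

    Assumption: realized variance = sum of squared daily yield changes within the month.
    """
    n_days, m = daily_yields.shape
    n_months = n_days // days_per_month
    blocks = daily_yields[: n_months * days_per_month].reshape(n_months, days_per_month, m)
    avg_yield = blocks.mean(axis=1)
    realized_var = (np.diff(blocks, axis=1) ** 2).sum(axis=1)
    return avg_yield, realized_var
```

Given simulated daily yields, one would call `monthly_aggregates` once and then `usv_r2(realized_var[:, j], avg_yield, n)` for each maturity column j and each n = 1, ..., 5 to fill a panel like Panel C. Because the regressions are nested, the R2 is non-decreasing in n by construction.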
Collin-Dufresne and Goldstein (2002) introduce knife edge parameter restrictions in affine models such that volatility state variable(s) do not affect bond yields, the so-called USV models. The most commonly used USV models—the A1(3) and A1(4)—have one factor driving volatility and this factor does not affect yields. These models generate zero R2s in the above USV regression in population, inconsistent with the empirical evidence. In contrast, the nonlinear model retains a parsimonious three-factor structure and yet can generate R2s in USV regressions which are broadly in line with those in the data. 4.3 Linearity in the Cross-Section of Yields The nonlinear bond pricing model allows us to capture the observed time variation in the mean and volatility of excess bond returns. However, Balduzzi and Chiang (2012) show that in the cross-section there is an almost linear relation between yields of different maturities. To check whether the nonlinear model captures the cross-sectional linearity we follow Duffee (2011a) and determine the PCs of zero-coupon bond yield changes with maturities ranging from 1 to 5 years and regress the yield changes of each bond on a constant and the first three PCs. The results for the data (based on 634 observations) and the three models (based on 1 million simulated observations) are shown in Table X. Table X . PC analysis of yields Principal Components (PCs) are constructed from a panel of constant-maturity zero-coupon bond yields with maturities ranging from 1 to 5 years. The contribution of the first PC, the first and second PC, and the first, second, and third PC to the total variation in the five bond yields are shown in Panel A. In Panel B yields for each bond are then regressed on the first three PCs and a constant (omitted). Actual PC contributions, slope coefficients, and R2s are computed using monthly data of 1- to 5-year zero-coupon bond yields from 1961:07 to 2014:04 obtained from Gurkaynak, Sack, and Wright (2007). 
For all three models population PC contributions, population slope coefficients, and population R2s are based on one simulated sample path of 1,000,000 months.

Panel A: PCs of yields
                  PC1       PC1−PC2    PC1−PC3
Data              99.1909   99.9779    99.9996
Nonlinear model   99.6866   99.9977    100.0000
A1(3) model       99.9738   100.0000   100.0000
A0(3) model       99.3788   99.9819    100.0000

Panel B: Linearity in the cross-section of yields
Maturity   PC1     PC2      PC3      R2

Data (1961–2014)
τ = 1      0.47    −0.72    0.48     1.00
τ = 2      0.46    −0.22    −0.52    1.00
τ = 3      0.45    0.12     −0.46    1.00
τ = 4      0.43    0.36     −0.02    1.00
τ = 5      0.42    0.54     0.54     1.00

Nonlinear three-factor model in population
τ = 1      0.45    −0.67    0.52     1.00
τ = 2      0.45    −0.29    −0.37    1.00
τ = 3      0.45    0.04     −0.52    1.00
τ = 4      0.45    0.33     −0.17    1.00
τ = 5      0.44    0.59     0.54     1.00

A1(3) model in population
τ = 1      0.51    −0.66    0.53     1.00
τ = 2      0.48    −0.21    −0.58    1.00
τ = 3      0.44    0.14     −0.41    1.00
τ = 4      0.41    0.39     0.04     1.00
τ = 5      0.38    0.58     0.46     1.00

A0(3) model in population
τ = 1      0.47    −0.72    0.47     1.00
τ = 2      0.46    −0.21    −0.52    1.00
τ = 3      0.45    0.13     −0.46    1.00
τ = 4      0.43    0.36     −0.01    1.00
τ = 5      0.42    0.54     0.54     1.00

Panel A of Table X shows that the first three PCs describe almost all the variation of bond yield changes in the nonlinear model, which is consistent with the data. Moreover, Panel B of Table X shows that the population loading for each yield on the level, slope, and curvature factor in the nonlinear model is similar to the data. We conclude that the cross-sectional variation of bond yields implied by the nonlinear model is well explained by the first three PCs and no yield breaks this linear relation.

5. One Factor Model: an Illustration

In this section, we estimate a one-factor nonlinear model to highlight the role of nonlinearity in a simple setting. Table XI shows the estimated parameters with asymptotic standard errors (in parentheses) based on the sample period 1961:07–2014:04. Panel A of Figure 6 shows the stochastic weight s(X), defined in Equation (8), over the sample period. The dynamics of s(X) in the one-factor model are similar to the dynamics in the three-factor model (shown in Figure 2), although s(X) moves closer to zero in the three-factor model.

Table XI. Parameter estimates of the one-factor nonlinear model
This table contains parameter estimates and asymptotic standard errors (in parentheses) for the nonlinear one-factor model. The parameter estimates are based on yield and realized variance data for the sample period 1961:07–2014:04. The bond maturities range from 1 to 5 years and the data are obtained from Gurkaynak, Sack, and Wright (2007). The UKF is used to estimate the nonlinear model.
Parameter   Estimate     Standard error
κ           0.04027      (0.03695)
ρ0          0.03061      (0.06164)
ρX          0.01093      (0.0001309)
λ0          −0.6473      (0.5656)
λX          0.05966      (0.03703)
γ           0.01456      (0.03452)
β           −0.4206      (0.003205)
σy          0.003122     (0.0004019)
σrv         0.0001671    (1.161e−05)

Table XII. URP and USV regressions in the one-factor nonlinear model
Panel A shows in sample R2s from regressions of model-implied 1-year excess returns on the PCs of model-implied yields. Panel B shows in sample R2s from regressing model-implied instantaneous variance on the PCs of model-implied yields. Model-implied PCs are constructed from a panel of constant-maturity zero-coupon bond yields with maturities ranging from 1 to 5 years. The in sample results are based on the sample period 1961:07–2014:04.

Maturity   PC1     PC1−PC2   PC1−PC3   PC1−PC4   PC1−PC5

Panel A: In sample R2 for nonlinear one-factor model
τ=2        13.0    69.3      98.1      98.4      100.0
τ=3        14.4    73.2      98.1      98.4      100.0
τ=4        16.7    76.7      98.1      98.4      100.0
τ=5        19.9    79.9      98.1      98.4      100.0

Panel B: In sample R2 for nonlinear one-factor model
τ=1        52.0    64.8      96.1      96.8      99.9
τ=2        55.1    68.2      96.6      97.2      99.9
τ=3        57.9    71.3      97.0      97.5      100.0
τ=4        60.4    73.9      97.4      97.8      100.0
τ=5        62.7    76.2      97.7      98.0      100.0

Figure 6. Stochastic weight, yields, volatilities, and excess returns in a one-factor nonlinear model. Panel A shows the estimated stochastic weight on the Gaussian base model for the sample period 1961–2014 and Panel C shows it as a function of the factor X. Panels B and F show yields as a function of the factor X and the 1-year yield, respectively. Panels D and E show yield variance and expected excess returns as a function of the 1-year yield. The parameters for the one-factor nonlinear model are estimated using yields and realized yield variance of zero-coupon Treasury bonds with maturities ranging from 1 to 5 years. The range of X on the x-axis equals the range of X in the sample period 1961–2014.

Panel B of Figure 6 shows bond yields as a function of the state variable X. The relation between yields and X is close to linear for low Xs, while for high Xs the rate of change picks up and yields increase more rapidly with X. The reason is that s(X) starts to move away from one as X increases, as seen in Panel C, and moreover, the speed with which s(X) moves away from one increases for high Xs. Hence, for a given change in X, yields respond more for a high X, which corresponds to a high yield environment, than for a low X, which corresponds to a low yield environment.
Taken together, yield variances must be substantially higher for high yield environments than for low yield environments, which Panel D indeed shows. Moreover, the nonlinear relation between yields and their variances shown in Panel D leads to USV. Specifically, Panel B in Table XII shows that the first PC of yields only explains between 52% and 63% of yield variance in sample. In contrast, in any affine one-factor stochastic volatility model the R2 is 100%.

Table XIII. Equilibrium models
The table shows various equilibrium models and how they map into the nonlinear term structure models.

Model                        N   d   X                   α      γ                γ/β notes: see columns
Thus, although there are significant nonlinear effects in the time series of excess returns and yield volatilities, there is an approximately linear relation between yields in the cross-section which is consistent with the data. 6. Conclusion We introduce a new reduced-form term structure model where the short rate and market prices of risk are nonlinear functions of Gaussian state variables and derive closed-form solutions for yields. The nonlinear model with three Gaussian factors matches both the time-variation in expected excess returns and yield volatilities of US Treasury bonds from 1961 to 2014. Because there are nonlinear relations between factors, yields, and variances, the model exhibits features consistent with empirical evidence on URP and USV. We are not aware of any term structure models—in particular a model with only three factors—that have empirical properties consistent with evidence on time-variation in expected excess returns and volatilities, URP, and USV. Although our empirical analysis has focused on a nonlinear generalization of an affine Gaussian model, it is possible to generalize a wide range of term structure models such as affine models with stochastic volatility and quadratic models. Our generalization introduces new dynamics for bond returns while keeping the new model as tractable as the standard model. Furthermore, the method extends to processes such as jump-diffusions and continuous time Markov chains. We explore this in Feldhütter, Heyerdahl-Larsen, and Illeditsch (2016). Appendix A: General Nonlinear Gaussian Model In this section, we provide closed-form solutions for a more general class of nonlinear term structure models, prove Theorem 1, and relate our results to the class of reduced-form asset pricing models with closed-form solutions discussed in Duffie, Pan, and Singleton (2000) and Chen and Joslin (2012). 
A.1 The Stochastic Discount Factor

Let γ denote a nonnegative constant and $M_0(t)$ a strictly positive stochastic process with dynamics given in Equation (3). The SDF is defined as

$$M(t) = M_0(t)\left(1 + \gamma e^{-\beta' X(t)}\right)^{\alpha}, \tag{56}$$

where $\beta \in \mathbb{R}^d$ and $\alpha \in \mathbb{N}$.

A.2 Closed-Form Bond Prices

We show in the next theorem that the price of a bond is a weighted average of bond prices in artificial economies that belong to the class of essentially affine Gaussian term structure models.

Theorem 2. The price of a zero-coupon bond that matures at time T is

$$P(t,T) = \sum_{n=0}^{\alpha} s_n(t) P_n(t,T), \tag{57}$$

where

$$P_n(t,T) = e^{A_n(T-t) + B_n(T-t)' X(t)}, \tag{58}$$

$$s_n(t) = \binom{\alpha}{n} \frac{\gamma^n e^{-n\beta' X(t)}}{\left(1 + \gamma e^{-\beta' X(t)}\right)^{\alpha}}. \tag{59}$$

The coefficient $A_n(T-t)$ and the d-dimensional vector $B_n(T-t)$ solve the ordinary differential equations given in Equations (10) and (11).

Proof: Using the binomial expansion theorem, the SDF in Equation (56) can be expanded as

$$M(t) = \sum_{n=0}^{\alpha} M_n(t), \tag{60}$$

where

$$M_n(t) = \binom{\alpha}{n} \gamma^n e^{-n\beta' X(t)} M_0(t). \tag{61}$$

Each summand can be interpreted as an SDF in an artificial economy.25 The dynamics of the strictly positive stochastic process $M_n(t)$ are

$$\frac{dM_n(t)}{M_n(t)} = -r_n(t)\,dt - \Lambda_n(t)'\,dW(t), \tag{62}$$

where

$$\Lambda_n(t) = \Lambda_0(t) + n\Sigma'\beta, \tag{63}$$

$$r_n(t) = r_0(t) + n\beta'\kappa\left(\bar{X} - X(t)\right) - \frac{n^2}{2}\beta'\Sigma\Sigma'\beta - n\beta'\Sigma\Lambda_0(t). \tag{64}$$

Plugging in for $r_0(t)$ and $\Lambda_0(t)$, it is straightforward to show that $\Lambda_n(t)$ and $r_n(t)$ are affine functions of X(t) with coefficients given in Equations (12)–(15). If $M_n(t)$ is interpreted as an SDF of an artificial economy indexed by n, then we know that bond prices in this economy belong to the class of essentially (exponential) affine Gaussian term structure models and hence

$$P_n(t,T) = e^{A_n(T-t) + B_n(T-t)' X(t)}, \tag{65}$$

where the coefficient $A_n(T-t)$ and the d-dimensional vector $B_n(T-t)$ solve the ordinary differential equations (10) and (11). Hence, the bond price is

$$P(t,T) = \sum_{n=0}^{\alpha} s_n(t) P_n(t,T), \tag{66}$$

where $s_n(t)$ is given in Equation (59).

Proof of Theorem 1. Set α = 1 in Theorem 2.
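The weighted-average structure of Theorem 2 can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the callables `A` and `B` stand in for the solutions of the ordinary differential equations (10) and (11), which are not reproduced in this appendix and must be supplied by the user.

```python
import math
import numpy as np

def weights(x, beta, gamma, alpha):
    """Weights s_n(t), n = 0, ..., alpha, from Equation (59) evaluated at state x."""
    z = gamma * math.exp(-float(np.dot(beta, x)))   # gamma * exp(-beta'x)
    denom = (1.0 + z) ** alpha
    return np.array([math.comb(alpha, n) * z ** n / denom for n in range(alpha + 1)])

def bond_price(x, tau, beta, gamma, alpha, A, B):
    """Equation (57): P = sum_n s_n * exp(A_n(tau) + B_n(tau)'x).

    A(n, tau) returns a scalar and B(n, tau) a d-vector; both are user-supplied
    placeholders for the ODE solutions referenced in Theorem 2.
    """
    s = weights(x, beta, gamma, alpha)
    p = np.array([math.exp(A(n, tau) + float(np.dot(B(n, tau), x)))
                  for n in range(alpha + 1)])
    return float(s @ p)
```

By the binomial theorem the weights are nonnegative and sum to one for any state x, so the bond price is a convex combination of the exponential affine prices in Equation (58).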
A.3 Expected Return and Bond Volatility

Applying Ito's lemma to Equation (56) leads to the dynamics of the SDF:

$$\frac{dM(t)}{M(t)} = -r(t)\,dt - \Lambda(t)'\,dW(t), \tag{67}$$

where

$$r(t) = r_0(t) + \alpha(1-s(t))\beta'\kappa\left(\bar{X} - X(t)\right) - \alpha(1-s(t))\beta'\Sigma\Lambda_0(t) - \frac{\alpha}{2}(1-s(t))\left(\alpha(1-s(t)) + s(t)\right)\beta'\Sigma\Sigma'\beta \tag{68}$$

and

$$\Lambda(t) = \Lambda_0(t) + \alpha(1-s(t))\Sigma'\beta. \tag{69}$$

Let $\omega_n(t,T)$ denote the contribution of each artificial exponential affine bond price to the total bond price. Specifically,

$$\omega_n(t,T) = \frac{P_n(t,T)\,s_n(t)}{P(t,T)}. \tag{70}$$

The dynamics of the bond price P(t,T) are

$$\frac{dP(t,T)}{P(t,T)} = \left(r(t) + \Lambda(t)'\sigma(t,T)\right)dt + \sigma(t,T)'\,dW(t), \tag{71}$$

where

$$\sigma(t,T) = \Sigma'\left(\sum_{n=0}^{\alpha}\omega_n(t,T)B_n(T-t) + \beta\left(\sum_{n=0}^{\alpha} n\,\omega_n(t,T) - \alpha(1-s(t))\right)\right). \tag{72}$$

A.4 Link to Reduced-Form Asset Pricing Models

How is this model related to the large literature on reduced-form asset pricing models with closed-form solutions? At first glance it does not seem to be related because the Gaussian state dynamics of X(t) under the data generating or physical measure are no longer Gaussian under the risk neutral measure Q. Specifically,

$$dX(t) = \left(\kappa\bar{X} - \kappa X(t) - \Sigma\Lambda(t)\right)dt + \Sigma\,dW^{Q}(t), \tag{73}$$

where Λ(t), given in Equation (69), is a nonlinear function of X(t) and

$$dQ = e^{-\frac{1}{2}\int_0^t \Lambda(a)'\Lambda(a)\,da - \int_0^t \Lambda(a)'\,dW^{P}(a)}\,dP. \tag{74}$$

However, we can compute the state dynamics under the risk neutral measure in the benchmark model defined as

$$dQ_0 = e^{-\frac{1}{2}\int_0^t \Lambda_0(a)'\Lambda_0(a)\,da - \int_0^t \Lambda_0(a)'\,dW^{P}(a)}\,dP, \tag{75}$$

where $\Lambda_0(t)$, which is given in Equation (5), is an affine function of X(t) and thus X(t) is Gaussian under $Q_0$. Specifically,

$$dX(t) = \left(\kappa\bar{X} - \Sigma\lambda_{0,0} - \left(\kappa + \Sigma\lambda_{0,X}\right)X(t)\right)dt + \Sigma\,dW^{Q_0}(t). \tag{76}$$

Define

$$f(X_T) = \frac{\left(1 + \gamma e^{-\beta' X(T)}\right)^{\alpha}}{\left(1 + \gamma e^{-\beta' X(t)}\right)^{\alpha}} \tag{77}$$

and rewrite the bond price as an expectation under the risk neutral measure in the benchmark model. Specifically,

$$P(t,T) = E_t\left[\frac{M(T)}{M(t)}\right] = E_t\left[\frac{M_0(T)}{M_0(t)} f(X_T)\right] = E_t^{Q_0}\left[e^{-\int_t^T r_0(a)\,da} f(X(T))\right], \tag{78}$$

where $r_0(t)$, given in Equation (4), is affine in X(t). Duffie, Pan, and Singleton (2000) and Chen and Joslin (2012) show that the expectation in Equation (78) can be solved in closed form if $f(x) = \sum_n (c_n + v_n x)e^{\beta_n x}$, the short rate is affine in X(t), and X(t) is Gaussian under Q.
As shown in the proof of Theorem 2, the function $f(X_T)$ can be expanded into the exponential polynomial

$$f(X_T) = \sum_{n=0}^{\alpha}\binom{\alpha}{n}\frac{\gamma^n}{\left(1 + \gamma e^{-\beta' X(t)}\right)^{\alpha}}\,e^{-n\beta' X(T)} = \sum_{n=0}^{\alpha} v_n e^{-n\beta' X(T)} \tag{79}$$

using the binomial expansion theorem and hence the bond price is given in closed form.

B. Equilibrium Models

In this section, we show that the functional form of the state price density in Equations (2) and (56) naturally comes out of several equilibrium models.26 We need to allow for state variables that follow arithmetic Brownian motions and hence we rewrite the dynamics of the state vector in Equation (1) in the slightly more general form

$$dX(t) = \left(\theta - \kappa X(t)\right)dt + \Sigma\,dW(t), \tag{80}$$

where θ is d-dimensional and κ and Σ are d × d-dimensional. In what follows, the standard consumption-based asset pricing model with a representative agent with power utility and log-normally distributed consumption will serve as our benchmark model. Specifically, the state price density takes the following form:

$$M_0(t) = e^{-\rho t} C(t)^{-R}, \tag{81}$$

where R is the coefficient of relative risk aversion (RRA) and C(t) is aggregate consumption with dynamics

$$\frac{dC(t)}{C(t)} = \mu_C\,dt + \sigma_C'\,dW(t). \tag{82}$$

The short rate and the market price of risk are both constant and given by

$$\Lambda_0 = R\sigma_C, \tag{83}$$

$$r_0 = \rho + R\mu_C - \frac{1}{2}R(R+1)\sigma_C'\sigma_C. \tag{84}$$

Table XIII summarizes the relation between the nonlinear term structure models and the equilibrium models discussed in this section.

B.1 Two Trees

Cochrane, Longstaff, and Santa-Clara (2008) study an economy in which aggregate consumption is the sum of two Lucas trees. In particular, they assume that the dividends of each tree follow a geometric Brownian motion

$$dD_i(t) = D_i(t)\left(\mu_i\,dt + \sigma_i'\,dW(t)\right). \tag{85}$$

Aggregate consumption is $C(t) = D_1(t) + D_2(t)$. There is a representative agent with power utility and risk aversion R. Hence, the SDF is

$$M(t) = e^{-\rho t}C(t)^{-R} = e^{-\rho t}\left(D_1(t) + D_2(t)\right)^{-R} = e^{-\rho t}D_1(t)^{-R}\left(1 + \frac{D_2(t)}{D_1(t)}\right)^{-R} = M_0(t)\left(1 + e^{\log(D_2(t)) - \log(D_1(t))}\right)^{-R}, \tag{86}$$

where $M_0(t) = e^{-\rho t}D_1(t)^{-R}$ and $X(t) = \log(D_1(t)/D_2(t))$. Equation (86) has the same form as the SDF in Equation (56) with $\alpha \notin \mathbb{N}$.
Specifically, γ = 1, β = 1, and α = −R. Note that in this case the state variable is the log-ratio of two geometric Brownian motions and thus κ = 0. The share s(X(t)) and hence yields are not stationary.

B.2 Multiple Consumption Goods

Models with multiple consumption goods and a CES consumption aggregator naturally fall within the functional form of the SDF in Equation (56). Consider a setting with two consumption goods. The aggregate outputs of the two goods are given by

$$dD_i(t) = D_i(t)\left(\mu_i\,dt + \sigma_i'\,dW(t)\right). \tag{87}$$

Assume that the representative agent has the following utility over aggregate consumption C:

$$u(C,t) = e^{-\rho t}\frac{1}{1-R}C^{1-R}, \tag{88}$$

where

$$C(C_1, C_2) = \left(\varphi^{1-b}C_1^{b} + (1-\varphi)^{1-b}C_2^{b}\right)^{\frac{1}{b}}. \tag{89}$$

We use the aggregate consumption bundle as numeraire, and consequently the state price density is

$$M(t) = e^{-\rho t}C(t)^{-R} = \varphi^{-\frac{R(1-b)}{b}}\,e^{-\rho t}D_1(t)^{-R}\left(1 + \left(\frac{1-\varphi}{\varphi}\right)^{1-b}\left(\frac{D_2(t)}{D_1(t)}\right)^{b}\right)^{-\frac{R}{b}}. \tag{90}$$

After normalizing, Equation (90) has the same form as the SDF in Equation (56) with $\alpha \notin \mathbb{N}$. Specifically, $X(t) = \log(D_1(t)/D_2(t))$, $\gamma = \left(\frac{1-\varphi}{\varphi}\right)^{1-b}$, β = b, and $\alpha = -\frac{R}{b}$. As in the case with Two Trees, the share s(X(t)) and hence yields are not stationary.

B.3 External Habit Formation

The utility function in Campbell and Cochrane (1999) is

$$U(C,H) = e^{-\rho t}\frac{1}{1-R}(C-H)^{1-R}, \tag{91}$$

where H is the habit level. Rather than working directly with the habit level, Campbell and Cochrane (1999) define the surplus consumption ratio $s = \frac{C-H}{C}$. The SDF is

$$M(t) = e^{-\rho t}C(t)^{-R}s(t)^{-R} \tag{92}$$

$$\phantom{M(t)} = M_0(t)\,s(t)^{-R}. \tag{93}$$

Define the state variable X(t) with dynamics

$$dX(t) = \kappa\left(\bar{X} - X(t)\right)dt + b\,dW(t), \tag{94}$$

where κ > 0, $\sigma_c > 0$, and b > 0. Now let $s(t) = \frac{1}{1 + e^{-\beta X(t)}}$. Note that s(t) is between 0 and 1. In particular, s(t) follows

$$ds(t) = s(t)\left(\mu_s(t)\,dt + \sigma_s(t)\,dW(t)\right), \tag{95}$$

where

$$\mu_s(t) = (1-s(t))\left(\beta\kappa\left(\bar{X} - X(t)\right) + \frac{1}{2}(1-2s(t))\beta^2 b^2\right), \tag{96}$$

$$\sigma_s(t) = (1-s(t))\beta b. \tag{97}$$

The functional form of the surplus consumption ratio differs from Campbell and Cochrane (1999).
However, note that the surplus consumption ratio is locally perfectly correlated with consumption shocks, mean-reverting, and bounded between 0 and 1 just as in Campbell and Cochrane (1999). The state price density can be written as

$$M(t) = M_0(t)\left(1 + e^{-\beta X(t)}\right)^{R}. \tag{98}$$

The above state price density has the same form as Equation (56) with parameters γ = 1, β = β, and α = R. Note that the state variable X in this case is mean-reverting and therefore the share s(X(t)) and hence yields are stationary.

B.4 Heterogeneous Beliefs

Consider an economy with two agents that have different beliefs. Let both agents have power utility with the same coefficient of relative risk aversion, R. Moreover, assume that aggregate consumption follows the dynamics in Equation (82). The agents do not observe the expected growth rate and agree to disagree.27 The equilibrium can be solved by forming the central planner problem with stochastic weight λ that captures the agents' initial relative wealth and their differences in beliefs (see, e.g., Basak (2000)),

$$U(C,\lambda) = \max_{\{C_1 + C_2 = C\}}\left(\frac{1}{1-R}C_1^{1-R} + \lambda\frac{1}{1-R}C_2^{1-R}\right). \tag{99}$$

Solving the above problem leads to the optimal consumption of the agents

$$C_1(t) = s(t)C(t), \tag{100}$$

$$C_2(t) = (1-s(t))C(t), \tag{101}$$

where $s(t) = \frac{1}{1 + \lambda(t)^{1/R}}$ is the consumption share of the first agent and C is aggregate consumption. The state price density as perceived by the first agent is

$$M(t) = e^{-\rho t}C_1(t)^{-R} = e^{-\rho t}C(t)^{-R}s(t)^{-R} = M_0(t)\left(1 + e^{\frac{1}{R}\log(\lambda(t))}\right)^{R}. \tag{102}$$

This has the same form as Equation (56) with $X(t) = \log(\lambda(t))$, γ = 1, $\beta = -\frac{1}{R}$, and α = R. The dynamics of the state variable are driven by the log-likelihood ratio of the two agents and consequently the share s(X(t)) and hence yields are not stationary.

B.5 HARA Utility

Consider a pure exchange economy with a representative agent with utility $u(t,C) = \frac{e^{-\rho t}}{1-R}(C+b)^{1-R}$, where R > 0 and b > 0. We can write the SDF as

$$M(t) = e^{-\rho t}\left(C(t) + b\right)^{-R} = e^{-\rho t}C(t)^{-R}\left(1 + \frac{b}{C(t)}\right)^{-R} = M_0(t)\left(1 + e^{\log(b) - \log(C(t))}\right)^{-R}. \tag{103}$$

After normalizing, Equation (103) has the same form as the SDF in Equation (56) with $\alpha \notin \mathbb{N}$. Specifically, $X(t) = \log(b/C(t))$, γ = 1, β = 1, and α = −R. Similarly to the models with Two Trees and multiple consumption goods, the share s(X(t)) and hence yields are nonstationary as the ratio b/C(t) will eventually converge to zero or infinity depending on the expected growth in the economy.

C. Gauss–Hermite Quadrature

While bond prices and bond yields are given in closed form, conditional moments of yields and bond returns are not. However, it is straightforward to calculate conditional expectations using Gauss–Hermite polynomials because the state vector X(t) is Gaussian.28 In this section, we illustrate how to calculate the expectation of a function of Gaussian state variables.

Let $\mu_X$ and $\Sigma_X$ denote the conditional mean and variance of X(u) at time t < u. Let f(X(u)) be a function of the state vector at time u. For instance, to calculate at time t the n-th uncentered moment of the bond yield with maturity τ at time u, set $f(X(u)) = \left(y^{(\tau)}(X(u))\right)^n$. Hence, the conditional expectation of f(X(u)) at time t is

$$E_t\left[f(X(u))\right] = \int_{\mathbb{R}^d} f(x)\,\frac{1}{\left((2\pi)^d|\Sigma_X|\right)^{0.5}}\,e^{-\frac{1}{2}(x-\mu_X)'\Sigma_X^{-1}(x-\mu_X)}\,dx. \tag{104}$$

Define $y = \left(\sqrt{2}\,\sigma_X\right)^{-1}(x - \mu_X)$, where $\sigma_X$ is determined by the Cholesky decomposition $\Sigma_X = \sigma_X\sigma_X'$. Hence, we can write Equation (104) as

$$\pi^{-\frac{d}{2}}\int_{\mathbb{R}^d} f\left(\sqrt{2}\,\sigma_X y + \mu_X\right)e^{-y'y}\,dy. \tag{105}$$

Let $g(y) = f\left(\sqrt{2}\,\sigma_X y + \mu_X\right)$. We set d = 3 in the empirical section of the paper and thus the integral in Equation (105) can be approximated by the n point Gauss–Hermite quadrature

$$\int_{\mathbb{R}^d} f\left(\sqrt{2}\,\sigma_X y + \mu_X\right)e^{-y'y}\,dy \approx \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} w_i w_j w_k\, g\left(y_1(i), y_2(j), y_3(k)\right), \tag{106}$$

where $w_i$ are the weights and $y_l(i)$ are the nodes for the n point Gauss–Hermite quadrature for i = 1, …, n and l = 1, …, 3. We use n = 4 in Equation (106).
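The quadrature in Equations (104)–(106) can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the authors' code: the function name is hypothetical, the nodes and weights come from NumPy's `numpy.polynomial.hermite.hermgauss` (which uses the weight function $e^{-y^2}$, matching Equation (105)), and the tensor-product loop works for any dimension d, not only d = 3.

```python
import itertools
import numpy as np

def gauss_hermite_expectation(f, mu, cov, n_points=4):
    """Approximate E[f(X)] for X ~ N(mu, cov) with a tensor-product Gauss-Hermite rule."""
    mu = np.asarray(mu, dtype=float)
    d = mu.size
    nodes, w = np.polynomial.hermite.hermgauss(n_points)   # weight function exp(-y^2)
    chol = np.linalg.cholesky(np.atleast_2d(cov))           # cov = chol @ chol.T
    total = 0.0
    for idx in itertools.product(range(n_points), repeat=d):
        idx = list(idx)
        x = np.sqrt(2.0) * chol @ nodes[idx] + mu           # change of variables, Eq. (105)
        total += np.prod(w[idx]) * f(x)
    return total / np.pi ** (d / 2)                          # normalization pi^(-d/2)
```

With n = 4 points per dimension the rule is exact for polynomials up to degree 7 in each coordinate of y, so first and second moments of X are recovered to machine precision, while smooth nonlinear functions such as yields are approximated. The cost grows as $n^d$, which is 64 function evaluations for d = 3 and n = 4.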
Footnotes

1 Although the literature is too large to cite in full, examples include Campbell and Shiller (1991) and Cochrane and Piazzesi (2005) on time-varying excess returns, Duffee (2011b) and Joslin, Priebsch, and Singleton (2014) on unspanned expected excess returns, Jacobs and Karoui (2009) and Collin-Dufresne, Goldstein, and Jones (2009) on time-varying volatility, and Collin-Dufresne and Goldstein (2002) and Andersen and Benzoni (2010) on Unspanned Stochastic Volatility.

2 Dai and Singleton (2002) and Tang and Xia (2007) find that the only affine three-factor model that can capture time-variation in expected excess returns is the Gaussian model that has no stochastic volatility. Duffee (2011b), Wright (2011), and Joslin, Priebsch, and Singleton (2014) capture unspanned expected excess returns in four- and five-factor affine models that have no stochastic volatility. Unspanned Stochastic Volatility is typically modeled by adding additional factors to the standard three factors (Collin-Dufresne, Goldstein, and Jones, 2009; Creal and Wu, 2015). See also Dai and Singleton (2003) and Duffee (2010) and the references therein.

3 See Ludvigson and Ng (2009), Cooper and Priestley (2009), Cieslak and Povala (2015), Duffee (2011b), Joslin, Priebsch, and Singleton (2014), Chernov and Mueller (2012), and Bauer and Rudebusch (2017).

4 Papers on this topic include Collin-Dufresne and Goldstein (2002), Heidari and Wu (2003), Fan, Gupta, and Ritchken (2003), Li and Zhao (2006), Carr, Gabaix, and Wu (2009), Andersen and Benzoni (2010), Bikbov and Chernov (2009), Joslin (2014), and Creal and Wu (2015).

5 Collin-Dufresne and Goldstein (2002) introduce knife edge parameter restrictions in affine models such that volatility state variable(s) do not affect bond pricing, the so-called USV models. The most commonly used USV models (the A1(3) and A1(4) USV models) have one factor driving volatility and this factor is independent of yields.
These models generate zero R2s in USV regressions, inconsistent with the empirical evidence.

6 It is also possible to combine the general exponential-type SDF in our paper with the affine-type SDF in Filipovic, Larsson, and Trolle (2015) to get an exponential polynomial-type SDF similar to the setting of Chen and Joslin (2012).

7 Constantinides (1992), Rogers (1997), Gabaix (2009), Carr, Gabaix, and Wu (2009), and Filipovic, Larsson, and Trolle (2015) also specify the functional form of the SDF directly and provide closed-form solutions for bond prices.

8 Chan et al. (1992), Ait-Sahalia (1996a, 1996b), Stanton (1997), Pritsker (1998), Chapman and Pearson (2000), Ang and Bekaert (2002), and Jones (2003) study the nonlinearity of the short rate. Jermann (2013) and Richard (2013) study nonlinear term structure models, but they do not obtain closed-form solutions for bond prices.

9 If $\lambda_{0,X}$ and κ are zero, then $\sigma_0(T-t) = \sigma_1(T-t)$.

10 The instantaneous volatility of the bond yield is $\frac{1}{\tau}v(t,t+\tau)$.

11 We calculate yield volatility by dividing price volatility by the bond duration. We calculate bond duration in two steps. We first find the coupon that makes the present value of a five-year bond's cash flow equal to the at-the-money price of the underlying bond the option is written on (available from Datastream). We then calculate the modified duration of this bond.

12 We choose to keep the estimation as parsimonious as possible by letting σrv be the same for all realized variances. An alternative is to use the theoretical result in Barndorff-Nielsen and Shephard (2002) that the variance of the measurement noise is approximately two times the square of the spot variance and allow for different measurement errors across bond maturity.

13 Alan Greenspan became chairman of the Fed on August 11, 1987.

14 Moments of yields and returns in the nonlinear model are easily calculated using Gauss–Hermite quadrature, see Appendix C for details.
In the rest of the paper we use Gauss–Hermite quadrature when we do not have closed-form solutions for expectations or variances. 15 Almeida, Graveline, and Joslin (2011) refer to this measure as a modified R2. 16 The average R2 from regressing excess returns onto yields for a 1-year holding horizon is 17% which is lower than the 37% reported in Cochrane and Piazzesi (2005). There are two reasons for this. First, the data sets are different. If we use the Fama–Bliss data, then the average R2 increases to 25%. Second, Cochrane and Piazzesi (2005) use the period 1964–2003 and R2s are lower outside this sample period as documented in Duffee (2012). 17 See Ludvigson and Ng (2009), Cooper and Priestley (2009), Cieslak and Povala (2015), Duffee (2011b), Joslin, Priebsch, and Singleton (2014), and Chernov and Mueller (2012). Bauer and Rudebusch (2017) argue that this evidence can be explained by measurement error. 18 The R2 is 0.36 in the former and 0.195 in the latter, see Bauer and Rudebusch (2017)’s Table 3. Joslin, Priebsch, and Singleton (2014) present similar evidence. 19 Expected inflation is measured as the cross-sectional average of one-year ahead price growth forecasts of consumers surveyed by the University of Michigan. MSC is a survey conducted on monthly frequencies covering a large cross-section of consumers and Ang, Bekaert, and Wei (2007) show that it is a good unbiased predictor of inflation. 20 Even though realized variances are noisy measures of integrated variances, average yields nevertheless span realized variances, see Andersen and Benzoni (2010). 21 The instantaneous yield volatility is 1τv(τ)(t) with v(τ)(t) given in Equation (29). 22 The R2 are higher than those found in Andersen and Benzoni (2010) because the sample period includes the monetary experiment, see Jacobs and Karoui (2009) for a discussion of the explanatory power in USV regressions for different time periods. 
23 In particular, as $s(t)$ moves toward the high volatility model, the yield difference between the two models tends to decrease. That is, as the first part in Equation (55) increases, the second part in the same equation tends to decrease.
24 Since measurement errors when using realized variance in the $A_1(3)$ model result in a drop in $R^2$s from 100% to 45.8%, an interesting question is whether the population $R^2$s in the nonlinear model in Panel C would be substantially higher if instantaneous variance were used instead of realized variance. The answer is no. If instantaneous model-implied variance is used, the average $R^2$ is 48.4% instead of 43.6% in Panel C.
25 Similar expansions of the SDF appear in Yan (2008); Dumas, Kurshev, and Uppal (2009); Bhamra and Uppal (2014); and Ehling et al. (2016).
26 Chen and Joslin (2012) provide an alternative way to solve many of these equilibrium models that is based on a nonlinear transform of processes with tractable characteristic functions.
27 The model can easily be generalized to a setting with disagreement about multiple stochastic processes and learning. For instance, Ehling et al. (2016) show that in a model with disagreement about inflation, the bond prices are weighted averages of quadratic Gaussian term structure models.
28 For more details see Judd (1998).

References
Ahn D. H., Dittmar R., Gallant A. (2002) Quadratic term structure models: theory and evidence, Review of Financial Studies 15, 243–288.
Ahn D. H., Dittmar R. F., Gallant A. R., Gao B. (2003) Purebred or hybrid? Reproducing the volatility in term structure dynamics, Journal of Econometrics 116, 147–180.
Ahn D. H., Gao B. (1999) A parametric non-linear model of term structure dynamics, Review of Financial Studies 12, 721–762.
Ait-Sahalia Y. (1996a) Nonparametric pricing of interest rate derivative securities, Econometrica 64, 527–560.
Ait-Sahalia Y. (1996b) Testing continuous-time models of the spot interest rate, Review of Financial Studies 9, 385–426.
Almeida C., Graveline J. J., Joslin S. (2011) Do interest rate options contain information about excess bond returns?, Journal of Econometrics 164, 35–44.
Andersen T. G., Benzoni L. (2010) Do bonds span volatility risk in the U.S. Treasury market? A specification test for affine term structure models, Journal of Finance 65, 603–653.
Andersen T. G., Bollerslev T., Diebold F. X. (2010) Parametric and nonparametric volatility measurement, in: Ait-Sahalia Y., Hansen L. P. (eds.), Handbook of Financial Econometrics, Vol 1: Tools and Techniques (Handbooks in Finance), Elsevier, pp. 67–137.
Ang A., Bekaert G. (2002) Short rate nonlinearities and regime switches, Journal of Economic Dynamics and Control 26, 1243–1274.
Ang A., Bekaert G., Wei M. (2007) Do macro variables, asset markets, or surveys forecast inflation better?, Journal of Monetary Economics 54, 1163–1212.
Balduzzi P., Chiang I. H. E. (2012) A simple test of the affine class of term structure models, Review of Asset Pricing Studies 2, 203–244.
Bansal R., Tauchen G., Zhou H. (2004) Regime shifts, risk premiums in the term structure, and the business cycle, Journal of Business and Economic Statistics 22, 396–409.
Bansal R., Zhou H. (2002) Term structure of interest rates with regime shifts, Journal of Finance 57, 1997–2043.
Barndorff-Nielsen O., Shephard N. (2002) Econometric analysis of realized volatility and its use in estimating stochastic volatility models, Journal of the Royal Statistical Society B 64, 253–280.
Basak S. (2000) A model of dynamic equilibrium asset pricing with heterogeneous beliefs and extraneous risk, Journal of Economic Dynamics and Control 24, 63–95.
Bauer M., Rudebusch G. (2017) Resolving the spanning puzzle in macro-finance term structure models, Review of Finance 21, 511–553.
Bhamra H., Uppal R. (2014) Asset prices with heterogeneity in preferences and beliefs, Review of Financial Studies 27, 519–580.
Bikbov R., Chernov M. (2009) Unspanned stochastic volatility in affine models: evidence from eurodollar futures and options, Management Science 55, 1292–1305.
Campbell J. Y., Cochrane J. H. (1999) By force of habit: a consumption-based explanation of aggregate stock market behavior, The Journal of Political Economy 107, 205–251.
Campbell J. Y., Shiller R. J. (1991) Yield spread and interest rate movements: a bird’s eye view, Review of Economic Studies 58, 495–514.
Carr P., Gabaix X., Wu L. (2009) Linearity-generating processes, unspanned stochastic volatility, and interest-rate option pricing. Working paper.
Carr P., Wu L. (2009) Stock options and credit default swaps: a joint framework for valuation and estimation, Journal of Financial Econometrics 2009, 1–41.
Chan K., Karolyi A., Longstaff F., Sanders A. B. (1992) An empirical comparison of alternative models of the short-term interest rate, Journal of Finance 47, 1209–1227.
Chapman D. A., Pearson N. D. (2000) Is the short rate drift actually nonlinear?, Journal of Finance LV, 355–388.
Chen H., Joslin S. (2012) Generalized transform analysis of affine processes and applications in finance, Review of Financial Studies 25, 2225–2256.
Chernov M., Mueller P. (2012) The term structure of inflation expectations, Journal of Financial Economics 106, 367–394.
Christoffersen P., Dorion C., Jacobs K., Karoui L. (2014) Nonlinear Kalman filtering in affine term structure models, Management Science 60, 2248–2268.
Cieslak A., Povala P. (2015) Expected returns in treasury bonds, conditionally accepted, Review of Financial Studies 28, 2859–2901.
Cieslak A., Povala P. (2016) Information in the term structure of yield curve volatility, Journal of Finance 71, 1393–1436.
Cochrane J., Piazzesi M. (2005) Bond risk premia, American Economic Review 95, 138–160.
Cochrane J. H., Longstaff F. A., Santa-Clara P. (2008) Two trees, Review of Financial Studies 21, 347–385.
Collin-Dufresne P., Goldstein R. S. (2002) Do bonds span the fixed income markets? Theory and evidence for unspanned stochastic volatility, Journal of Finance 57, 1685–1730.
Collin-Dufresne P., Goldstein R. S., Jones C. S. (2009) Can interest rate volatility be extracted from the cross-section of bond yields?, Journal of Financial Economics 94, 47–66.
Constantinides G. M. (1992) A theory of the nominal term structure of interest rates, Review of Financial Studies 5, 531–552.
Cooper I., Priestley R. (2009) Time-varying risk premiums and the output gap, Review of Financial Studies 22, 2601–2633.
Cox J. C., Ingersoll J. E., Ross S. A. (1985) A theory of the term structure of interest rates, Econometrica 53, 385–408.
Creal D. D., Wu J. C. (2015) Estimation of affine term structure models with spanned or unspanned stochastic volatility, Journal of Econometrics 185, 60–81.
Dai Q., Le A., Singleton K. (2010) Discrete-time affine term structure models with generalized market prices of risk, Review of Financial Studies 23, 2184–2227.
Dai Q., Singleton K. (2000) Specification analysis of affine term structure models, Journal of Finance 55, 1943–1978.
Dai Q., Singleton K. (2003) Term structure dynamics in theory and reality, Review of Financial Studies 16, 631–678.
Dai Q., Singleton K. J. (2002) Expectation puzzles, time-varying risk premia, and affine models of the term structure, Journal of Financial Economics 63, 415–441.
Dai Q., Singleton K. J., Yang W. (2007) Regime shifts in a dynamic term structure model of U.S. treasury bond yields, Review of Financial Studies 20, 1669–1706.
Duffee G. (2002) Term premia and interest rate forecast in affine models, Journal of Finance 57, 405–443.
Duffee G. (2010) Sharpe ratios in term structure models. Working paper, University of California, Berkeley.
Duffee G. (2011a) Forecasting with the term structure: the role of no-arbitrage restrictions. Working paper, Johns Hopkins University.
Duffee G. (2011b) Information in (and not in) the term structure, Review of Financial Studies 24, 2895–2934.
Duffee G. (2012) Forecasting interest rates. Working paper.
Duffie D., Kan R. (1996) A yield-factor model of interest rates, Mathematical Finance 6, 379–406.
Duffie D., Pan J., Singleton K. (2000) Transform analysis and asset pricing for affine jump-diffusions, Econometrica 68, 1343–1376.
Dumas B., Kurshev A., Uppal R. (2009) Equilibrium portfolio strategies in the presence of sentiment risk and excess volatility, The Journal of Finance 64, 579–629.
Ehling P., Gallmeyer M., Heyerdahl-Larsen C., Illeditsch P. (2016) Disagreement about inflation and the yield curve. Working paper.
Fama E. F., Bliss R. R. (1987) The information in long-maturity forward rates, American Economic Review 77, 680–692.
Fan R., Gupta A., Ritchken P. (2003) Hedging in the possible presence of unspanned stochastic volatility: evidence from Swaption markets, Journal of Finance 58, 2219–2248.
Feldhütter P. (2016) Can affine models match the moments in bond yields?, Quarterly Journal of Finance 6, 1–56.
Feldhütter P., Heyerdahl-Larsen C., Illeditsch P. (2016) Expanded term structure models. Work in progress.
Feldhütter P., Lando D. (2008) Decomposing swap spreads, Journal of Financial Economics 88, 375–405.
Filipovic D., Larsson M., Trolle A. B. (2015) Linear-rational term structure models, Journal of Finance, forthcoming.
Gabaix X. (2009) Linearity-generating processes: a modelling tool yielding closed forms for asset prices. Working paper, CEPR and NBER.
Gurkaynak R. S., Sack B., Wright J. H. (2007) The U.S. treasury yield curve: 1961 to the present, Journal of Monetary Economics 54, 2291–2304.
Hansen L. P., Hodrick R. J. (1980) Forward exchange rates as optimal predictors of future spot rates: an econometric analysis, Journal of Political Economy 88, 829–853.
Heidari M., Wu L. (2003) Are interest rate derivatives spanned by the term structure of interest rates?, Journal of Fixed Income 12, 75–86.
Jacobs K., Karoui L. (2009) Conditional volatility in affine term-structure models: evidence from Treasury and swap markets, Journal of Financial Economics 91, 288–318.
Jermann U. (2013) A production-based model for the term structure, Journal of Financial Economics 109, 293–306.
Jones C. S. (2003) Nonlinear mean reversion in the short-term interest rate, Review of Financial Studies 16, 793–843.
Joslin S. (2014) Can unspanned stochastic volatility models explain the cross section of bond volatilities?, Management Science, forthcoming.
Joslin S., Priebsch M., Singleton K. (2014) Risk premiums in dynamic term structure models with unspanned macro risks, Journal of Finance 69, 1197–1233.
Judd K. L. (1998) Numerical Methods in Economics, 1st edition, MIT Press.
Leippold M., Wu L. (2003) Design and estimation of quadratic term structure models, European Finance Review 7, 47–73.
Li H., Zhao F. (2006) Unspanned stochastic volatility: evidence from hedging interest rate derivatives, Journal of Finance 61, 341–378.
Ludvigson S., Ng S. (2009) Macro factors in bond risk premia, Review of Financial Studies 22, 5027–5067.
Newey W. K., West K. D. (1987) A simple, positive semi-definite, heteroscedasticity, and autocorrelation consistent covariance matrix, Econometrica 55, 703–708.
Pritsker M. (1998) Nonparametric density estimation and tests of continuous time interest rate models, Review of Financial Studies 11, 449–487.
Richard S. (2013) A non-linear macroeconomic term structure model. Working paper.
Rogers L. (1997) The potential approach to the term structure of interest rates and foreign exchange rates, Mathematical Finance 7, 157–164.
Stanton R. (1997) A nonparametric model of term structure dynamics and the market price of interest rate risk, Journal of Finance LII, 1973–2002.
Tang H., Xia Y. (2007) An international examination of affine term structure models and the expectation hypothesis, Journal of Financial and Quantitative Analysis 42, 41–80.
Vasicek O. (1977) An equilibrium characterization of the term structure, Journal of Financial Economics 5, 177–188.
Wright J. (2011) Term premia and inflation uncertainty: empirical evidence from an international panel dataset, American Economic Review 101, 1514–1534.
Yan H. (2008) Natural selection in financial markets: does it work?, Management Science 54, 1935–1950.

© The Authors 2016. Published by Oxford University Press on behalf of the European Finance Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Journal

Review of Finance, Oxford University Press

Published: Feb 1, 2018
