Bayesian multivariate Beveridge–Nelson decomposition of I(1) and I(2) series with cointegration

1 Introduction

Distinguishing between growth and cycles is fundamental in macroeconomics. One can define growth as the time-varying steady state, or the permanent component, and cycles as deviations from the steady state, or the transitory component. One may interpret the permanent and transitory components as the natural rate and gap, respectively, though some economists may disagree with this interpretation, in which case one can consider the permanent component of the natural rate. Phelps (1995) defines the natural rate as the current stationary rate (p. 16) or the equilibrium steady state path (pp. 29–30). Woodford (2003, pp. 8–9) defines the natural rate as the equilibrium rate under flexible prices, which may not be in the steady state. Kiley (2013) compares alternative definitions empirically using a new Keynesian DSGE model for the US, and obtains similar estimates of the output gap. Laubach and Williams (2016) and Holston et al. (2017) note that these long-run and short-run views are complementary. Del Negro et al. (2017) also distinguish the natural rate and its low-frequency component. See also Glick (2020, sec. 2.2) for alternative definitions of the natural rate of interest r*.

If the shocks affecting the two components differ, then the policy prescriptions for promoting growth and for stabilizing cycles differ as well. It is therefore useful to decompose economic fluctuations into the two components.

Among such decomposition methods, this paper focuses on the multivariate Beveridge–Nelson (B–N) decomposition, which decomposes a multivariate I(1) or CI(1,1) series into a random walk permanent component and an I(0) transitory component, assuming a linear time series model such as a VAR model or a vector error-correction model (VECM) for the differenced series. In practice, however, some series may be I(2), e.g., log output in some countries, in which case one must decompose I(1) and I(2) series jointly. Murasawa (2015) develops the multivariate B–N decomposition of I(1) and I(2) series.

As Murasawa (2015) shows, for non-US data, the B–N decomposition assuming I(1) log output often gives an unreasonable estimate of the output gap, perhaps because of possible structural breaks in the mean output growth rate; see Figure 1. Kamber, Morley, and Wong (2018, p. 563) explain:

'…if there is a large reduction in the long-run growth rate, a forecasting model that fails to account for it will keep anticipating faster growth than actually occurs after the break, leading to a persistently negative estimate of the output gap based on the BN decomposition.'

In this quote, Kamber et al. (2018) seem to consider the output growth rate gap. If one fails to account for a large reduction in the true mean growth rate μ*, then the output growth rate Δln Y_t tends to be below the assumed mean growth rate μ; i.e., the output growth rate gap Δln Y_t − μ indeed tends to be negative. If log output is I(1), so that the output growth rate is I(0), then with positive serial correlation the future Δln Y_t − μ also tends to be negative, implying that the current output gap is positive, as shown in Figure 1.

Figure 1: Output gap estimates in Japan given by the multivariate B–N decomposition assuming I(1) or I(2) log output. The plots replicate those in Murasawa (2015, Figures 2 and 3). The shaded areas indicate the recessions determined by the Cabinet Office.
Assuming I(2) log output, and hence an I(1) output growth rate, introduces a stochastic trend in the output growth rate, which captures possible structural breaks in the mean growth rate automatically in real time, without specifying break dates a priori. Thus the B–N decomposition assuming I(2) log output gives a more 'reasonable' estimate of the output gap that fluctuates around 0; see Figure 1.

This paper extends Murasawa (2015) in two ways. First, we allow for cointegration in the multivariate B–N decomposition of I(1) and I(2) series. Recall that the consumption Euler equation in a simple macroeconomic model implies a dynamic IS equation such that for all t,

(1) $$E_t(\Delta \ln Y_{t+1}) = \frac{1}{\sigma}(r_t - \rho)$$

where Y_t is output, r_t is the real interest rate, ρ is the discount rate, and σ is the curvature of the utility of consumption; see Galí (2015, pp. 21–23). This equation implies that if 0 < σ < ∞, then the output growth rate and the real interest rate are of the same order of integration; thus if the real interest rate is I(1), then so is the output growth rate, with possible cointegration, and log output is I(2). This observation motivates our development of the multivariate B–N decomposition of I(1) and I(2) series with cointegration.

Since the B–N decomposition requires only a reduced-form forecasting model, we do not assume Eq. (1), which is just a motivating example for considering some cointegration. Though we assume that {Δln Y_t} is I(1), our method is valid even if it is I(0), since we can treat an I(0) series as an I(1) series cointegrated with itself, i.e., 'pseudo' cointegration; see Fisher et al. (2016, section 2.2). One can think of this paper as a reduced-form VECM version of Laubach and Williams (2003), though we include more variables. To estimate the natural rate of interest, they start from the consumption Euler equation and assume I(2) log output. In one specification, they also assume cointegration between the output growth rate and the real interest rate.

Second, we apply Bayesian analysis to obtain error bands for the components, building on recent developments in Bayesian analysis of a VECM. Since cointegrating vectors require normalization, our parameter of interest is in fact the cointegrating space rather than the cointegrating vectors. Strachan and Inder (2004) use a matrix angular central Gaussian (MACG) distribution proposed by Chikuse (1990) as a prior on the cointegrating space. Koop, León-González, and Strachan (2010) propose two Gibbs samplers, a collapsed Gibbs sampler and a parameter-augmented Gibbs sampler, for posterior simulation of such a model. Since one often has prior information on the steady state of a system, Villani (2009) specifies a prior on the steady state form of a VECM. Since some hyperparameters, such as the tightness (shrinkage) hyperparameter on the VAR coefficients, are difficult to choose, Giannone, Lenza, and Primiceri (2015) use a hierarchical prior. Using these ideas, we develop a Gibbs sampler to simulate the joint posterior distribution of the components.

Since we assume a VECM as the true reduced-form model, the B–N decomposition defines the permanent and transitory components; hence it suffices for our error bands to reflect only parameter uncertainty.
If we instead assume an unobserved components (UC) model as the true model, then the B–N decomposition estimates the components, and we must consider estimation errors as well; see Morley (2011) on this distinction.

As an application, we simulate the joint posterior distribution of the natural rates (or their permanent components) and gaps of output, inflation, interest, and unemployment in the US during 1950Q1–2018Q4. To apply the Bayesian multivariate B–N decomposition of I(1) and I(2) series with cointegration, we assume a four-variate VAR model for the output growth rate, the CPI inflation rate, the short-term interest rate, and the unemployment rate, and estimate it in the VECM form. The Bayes factors give decisive evidence that the cointegrating rank is 2. The posterior medians of the gaps seem reasonable compared with previous works that focus on a particular natural rate or gap. The posterior probability of a positive gap is useful when the sign of the gap is uncertain. The Phillips curve and Okun's law hold between the gaps even though we do not impose such relations. The dynamic IS equation holds between the natural rates of output and interest, though the evidence is weak. Comparisons of alternative model specifications show not only that assuming I(2) log output gives a more 'reasonable' estimate of the output gap, but also that allowing for cointegration gives much bigger estimates of the gaps for all variables.

The paper proceeds as follows. Section 2 reviews the literature on the B–N decomposition. Section 3 derives the multivariate B–N decomposition of I(1) and I(2) series with cointegration. Section 4 specifies our model and prior, and explains our posterior simulation and model evaluation. Section 5 applies the method to US data. Section 6 discusses remaining issues. The Appendix gives the details of the derivation of our algorithm. We use the notation proposed by Abadir and Magnus (2002).

2 Literature

Beveridge and Nelson (1981) give operational definitions of the permanent and transitory components, show that one can express any I(1) series as the sum of a random walk permanent component and an I(0) transitory component, and propose the B–N decomposition of a univariate I(1) series, assuming an ARIMA model. (Beveridge and Nelson (1981) reverse the sign of the transitory component; Nelson (2008) explains why they had to do so.)

Multivariate extension of the B–N decomposition is straightforward. Evans (1989a, 1989b) and Evans and Reichlin (1994) apply the B–N decomposition to a multivariate series consisting of I(0) and I(1) series, assuming a VAR model for the stationarized series. Evans and Reichlin (1994) show that the transitory components are no smaller with the multivariate B–N decomposition than with the univariate one. This is because the transitory components are 'forecastable movements' (Rotemberg and Woodford 1996), and multivariate models forecast no worse than univariate models, since they use more information. King et al. (1991) and Cochrane (1994) apply the B–N decomposition to a CI(1,1) series, assuming a VECM.

Morley (2002) gives a general framework for the B–N decomposition, using a state space representation of the assumed linear time series model. Garratt, Robertson, and Wright (2006) note that if the state vector is observable, as in a VAR model or a VECM, then the transitory component is an explicit weighted sum of the observables given the model parameters; thus the multivariate B–N decomposition based on a VAR model or a VECM is transparent.
They also note that the result of the multivariate B–N decomposition depends strongly on the assumed cointegrating rank.

The B–N decomposition also applies to an I(2) series. Newbold and Vougas (1996), Oh and Zivot (2006), and Oh, Zivot, and Creal (2008) extend the B–N decomposition to a univariate I(2) series. Murasawa (2015) extends the method to a multivariate series consisting of I(1) and I(2) series.

One can apply Bayesian analysis to obtain error bands for the components. This approach is useful especially when the state vector is observable, as in a VAR model or a VECM, in which case the components are explicit functions of the model parameters and observables; thus the joint posterior distribution of the model parameters directly translates into that of the components. Murasawa (2014) uses a Bayesian VAR model to obtain error bands for the components. Kiley (2013) uses a Bayesian DSGE model, but gives no error band for the components. Del Negro et al. (2017) use a Bayesian DSGE model and give error bands for the components. They also use a multivariate unobserved components (UC) model, where the permanent components have a factor structure and the transitory components follow a VAR model. Bayesian analysis of a UC model requires state smoothing, since the state vector is unobservable given the model parameters; thus it is less straightforward than that of a VAR model. Morley and Wong (2020) use a large Bayesian VAR model and give error bands for the components, but they do not take parameter uncertainty into account. (Morley and Wong (2020) assume a UC model as the true model, and interpret the B–N decomposition as a way to estimate the components; hence their error bands reflect estimation errors of the components, but not parameter uncertainty.) There seems to be no previous work that uses a Bayesian VECM to obtain error bands for the components.

Cogley et al. (2010) use a Bayesian time-varying parameter VAR model, which is a nonlinear time series model. They still apply the B–N decomposition and give error bands for the components, pretending at each date that the VAR coefficients no longer vary. They justify their approach as an approximation based on an 'anticipated-utility' model. See also Cogley and Sargent (2002, 2005) and Cogley et al. (2005).

3 Model specification

3.1 VAR model

Let, for d = 1, 2, {x_{t,d}} be an N_d-variate I(d) sequence. Let N ≔ N_1 + N_2. Let, for all t, x_t ≔ (x_{t,1}′, x_{t,2}′)′, y_{t,1} ≔ x_{t,1}, y_{t,2} ≔ Δx_{t,2}, and y_t ≔ (y_{t,1}′, y_{t,2}′)′, where the prime (′) denotes the transpose of a vector or a matrix, so that {y_t} is I(1). Assume also that {y_t} is CI(1,1) with cointegrating rank r. (Our assumption includes the case where some series in {y_t} are I(0), since we can treat an I(0) series as an I(1) series cointegrated with itself, i.e., 'pseudo' cointegration; see Fisher, Huh, and Pagan (2016, section 2.2).)

Let, for d = 1, 2, μ_d ≔ E(Δy_{t,d}). Let μ ≔ (μ_1′, μ_2′)′. Let {y_t*} be such that for all t,

(2) $$y_t = \alpha + \mu t + y_t^*$$

where α is an intercept vector.
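To fix ideas, stacking the I(1) levels and the differenced I(2) levels into y_t is a one-line transformation of the data. The R sketch below is illustrative only; the matrix names X1, X2 and the helper name are assumptions, not from the paper.

```r
# Sketch: build y_t = (x_{t,1}', (Delta x_{t,2})')' from an I(1) block X1 and
# an I(2) block X2 (both T x k matrices on the same time index; hypothetical names).
build_y <- function(X1, X2) {
  stopifnot(nrow(X1) == nrow(X2))
  cbind(X1[-1, , drop = FALSE], diff(X2))   # drop the first observation lost to differencing
}
```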
Assume a VAR(p + 1) model for {y_t*} such that for all t,

(3) $$\Pi(L)\, y_t^* = u_t$$

(4) $$\{u_t\} \sim \mathrm{WN}(\Sigma)$$

where Π(L) is a (p + 1)th-order polynomial matrix in the lag operator L and WN(Σ) means '(multivariate) white noise sequence with variance–covariance matrix Σ'. (By starting from a VAR model for the I(1) series {y_t}, we exclude cointegration among the I(2) series {x_{t,2}} as well as multicointegration (polynomial cointegration), studied by Johansen (1995), Granger and Lee (1989, 1990), and Gregoir and Laroque (1993, 1994). We leave extensions of our Bayesian multivariate B–N decomposition to such cases for future work.)

3.2 VECM representation

Write

$$\Pi(L) = \Pi(1)L + \Phi(L)(1 - L)$$

where Φ(L) ≔ (Π(L) − Π(1)L)/(1 − L). Then we have a VECM of order p for {Δy_t*} such that for all t,

(5) $$\Phi(L)\Delta y_t^* = -\Pi(1) y_{t-1}^* + u_t = -\Lambda \Gamma' y_{t-1}^* + u_t$$

where Λ, Γ ∈ R^{N×r}. Since {y_t*} is CI(1,1), the roots of det(Π(z)) = 0 must lie on or outside the unit circle. This requirement gives an implicit restriction on the VECM parameters (Φ(.), Λ, Γ).

We can write for all t,

(6) $$\Phi(L)(\Delta y_t - \mu) = -\Lambda \Gamma'[y_{t-1} - \alpha - \mu(t-1)] + u_t = -\Lambda[\Gamma' y_{t-1} - \beta - \delta(t-1)] + u_t$$

where β ≔ Γ′α and δ ≔ Γ′μ. Though slightly different, the last expression is essentially the steady state VECM suggested by Villani (2009, p. 633). We have for all t,

(7) $$E(\Delta y_t) = \mu$$

(8) $$E(\Gamma' y_t) = \beta + \delta t$$

which help us to specify an informative prior on (μ, β, δ). Thus a steady state VECM is useful for Bayesian analysis.

3.3 State space representation

Assume that p ≥ 1.
We have for all t,

$$\Gamma' y_t^* = \Gamma'(y_{t-1}^* + \Phi_1 \Delta y_{t-1}^* + \cdots + \Phi_p \Delta y_{t-p}^* - \Lambda \Gamma' y_{t-1}^* + u_t) = \Gamma'\Phi_1 \Delta y_{t-1}^* + \cdots + \Gamma'\Phi_p \Delta y_{t-p}^* + (I_r - \Gamma'\Lambda)\Gamma' y_{t-1}^* + \Gamma' u_t$$

or

(9) $$\Gamma' y_t - \beta - \delta t = \Gamma'\Phi_1(\Delta y_{t-1} - \mu) + \cdots + \Gamma'\Phi_p(\Delta y_{t-p} - \mu) + (I_r - \Gamma'\Lambda)[\Gamma' y_{t-1} - \beta - \delta(t-1)] + \Gamma' u_t$$

Let s_t be a state vector such that for all t,

$$s_t := \begin{pmatrix} \Delta y_t - \mu \\ \vdots \\ \Delta y_{t-p+1} - \mu \\ \Gamma' y_t - \beta - \delta t \end{pmatrix}$$

which is I(0) and observable given the model parameters.
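Because s_t involves only observables and parameters, it can be computed directly once (Γ, μ, β, δ, p) are given. The following R sketch illustrates this; the function name, argument names, and the convention that the deterministic time index equals the row number of the data matrix are assumptions for illustration, not part of the paper.

```r
# Sketch: construct the observable state vector s_t for t = p+1, ..., T.
# Y: T x N matrix of y_t; Gamma: N x r; mu: length-N; beta, delta: length-r; p: lag order.
build_state <- function(Y, Gamma, mu, beta, delta, p) {
  Tn <- nrow(Y); N <- ncol(Y); r <- ncol(Gamma)
  dY <- diff(Y)                                   # row t-1 of dY equals Delta y_t
  S  <- matrix(NA_real_, Tn, p * N + r)           # rows 1..p left as NA
  for (t in (p + 1):Tn) {
    lags <- as.vector(t(sweep(dY[(t - 1):(t - p), , drop = FALSE], 2, mu)))
    ec   <- as.vector(crossprod(Gamma, Y[t, ]) - beta - delta * t)   # Gamma'y_t - beta - delta*t
    S[t, ] <- c(lags, ec)                         # (Delta y_t - mu, ..., Delta y_{t-p+1} - mu, EC terms)
  }
  S
}
```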
A state space representation of the steady state VECM is, for all t,

(10) $$s_t = A s_{t-1} + B z_t$$

(11) $$\Delta y_t = \mu + C s_t$$

(12) $$\{z_t\} \sim \mathrm{WN}(I_N)$$

where

$$A := \begin{bmatrix} \Phi_1 & \dots & \Phi_p & -\Lambda \\ & I_{(p-1)N} & O_{(p-1)N \times N} & O_{(p-1)N \times r} \\ \Gamma'\Phi_1 & \dots & \Gamma'\Phi_p & I_r - \Gamma'\Lambda \end{bmatrix}, \quad B := \begin{bmatrix} \Sigma^{1/2} \\ O_{(p-1)N \times N} \\ \Gamma'\Sigma^{1/2} \end{bmatrix}, \quad C := \begin{bmatrix} I_N & O_{N \times (p-1)N} & O_{N \times r} \end{bmatrix}$$

with O_{m×n} denoting an m × n null matrix. Note that {s_t} is I(0) if and only if the eigenvalues of A lie inside the unit circle.

We have for all t, for h ≥ 0,

$$E_t(\Delta y_{t+h}) = \mu + C A^h s_t$$

or

(13) $$E_t(\Delta x_{t+h,1}) = \mu_1 + C_1 A^h s_t$$

(14) $$E_t(\Delta^2 x_{t+h,2}) = \mu_2 + C_2 A^h s_t$$

where

$$C_1 := \begin{bmatrix} I_{N_1} & O_{N_1 \times N_2} & O_{N_1 \times (p-1)N} & O_{N_1 \times r} \end{bmatrix}, \quad C_2 := \begin{bmatrix} O_{N_2 \times N_1} & I_{N_2} & O_{N_2 \times (p-1)N} & O_{N_2 \times r} \end{bmatrix}$$
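The matrices A, B, and C are straightforward to assemble from the VECM parameters, and the stationarity condition on {s_t} can be checked numerically via the eigenvalues of A. The R sketch below is a minimal illustration under assumed inputs (Phi as a list of the N×N coefficient matrices Φ_1, …, Φ_p, Λ and Γ of dimension N×r, Σ of dimension N×N); it uses a Cholesky factor as one choice of Σ^{1/2}.

```r
# Sketch: companion-form matrices of the state space representation (hypothetical helper).
build_ABC <- function(Phi, Lambda, Gamma, Sigma) {
  N <- nrow(Lambda); r <- ncol(Lambda); p <- length(Phi)
  Phi_blk <- do.call(cbind, Phi)                     # N x pN block [Phi_1 ... Phi_p]
  A <- rbind(
    cbind(Phi_blk, -Lambda),
    cbind(diag((p - 1) * N), matrix(0, (p - 1) * N, N + r)),
    cbind(crossprod(Gamma, Phi_blk), diag(r) - crossprod(Gamma, Lambda))
  )
  Sig_half <- t(chol(Sigma))                         # one choice of Sigma^{1/2}
  B <- rbind(Sig_half, matrix(0, (p - 1) * N, N), crossprod(Gamma, Sig_half))
  C <- cbind(diag(N), matrix(0, N, (p - 1) * N + r))
  list(A = A, B = B, C = C,
       stable = all(Mod(eigen(A, only.values = TRUE)$values) < 1))   # {s_t} is I(0)?
}
```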
3.4 Multivariate B–N decomposition

Introducing cointegration changes the state space model, but the formulae for the multivariate B–N decomposition of I(1) and I(2) series given by Murasawa (2015, Theorem 1) remain almost unchanged. Let x_t* and c_t be the B–N permanent and transitory components in x_t, respectively.

Theorem 1. Suppose that the eigenvalues of A lie inside the unit circle. Then for all t,

(15) $$x_{t,1}^* = \lim_{T \to \infty}\,(E_t(x_{t+T,1}) - T\mu_1)$$

(16) $$x_{t,2}^* = \lim_{T \to \infty}\left\{E_t(x_{t+T,2}) - T^2\frac{\mu_2}{2} - T\left[\frac{\mu_2}{2} + \Delta x_{t,2} + C_2(I_{pN+r} - A)^{-1}A s_t\right]\right\}$$

(17) $$c_{t,1} = -C_1(I_{pN+r} - A)^{-1}A s_t$$

(18) $$c_{t,2} = C_2(I_{pN+r} - A)^{-2}A^2 s_t$$

Proof. See Murasawa (2015, pp. 158–159). □

Let

$$W := \begin{bmatrix} -C_1(I_{pN+r} - A)^{-1}A \\ C_2(I_{pN+r} - A)^{-2}A^2 \end{bmatrix}$$

Then for all t,

(19) $$c_t = W s_t$$

where W depends only on the VECM coefficients and {s_t} is observable given the model parameters. This observation is useful for Bayesian analysis of {c_t}.
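In code, Eq. (19) is a single matrix product once A and the state matrix are available. The R sketch below reuses the hypothetical helpers above (build_ABC() and build_state()); it is an illustration, not the paper's implementation, and exploits the fact that (I − A)⁻²A² = [(I − A)⁻¹A]².

```r
# Sketch of Eqs. (17)-(19): B-N transitory components c_t = W s_t.
# A: state transition matrix; S: T x (pN + r) state matrix; N1 + N2 = N.
bn_gaps <- function(A, S, N1, N2, p, r) {
  N <- N1 + N2
  M  <- solve(diag(p * N + r) - A, A)                # (I - A)^{-1} A
  C1 <- cbind(diag(N1), matrix(0, N1, N2 + (p - 1) * N + r))
  C2 <- cbind(matrix(0, N2, N1), diag(N2), matrix(0, N2, (p - 1) * N + r))
  W  <- rbind(-C1 %*% M, C2 %*% M %*% M)             # weights on the observable state
  t(W %*% t(S))                                      # T x N matrix of c_t (NA for t <= p)
}
```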
4 Bayesian analysis

4.1 Conditional likelihood function

Assume Gaussian innovations for Bayesian analysis, and write the VECM as, for all t,

(20) $$\Delta y_t - \mu = \Phi_1(\Delta y_{t-1} - \mu) + \cdots + \Phi_p(\Delta y_{t-p} - \mu) - \Lambda[\Gamma' y_{t-1} - \beta - \Gamma'\mu(t-1)] + u_t$$

(21) $$\{u_t\} \sim \mathrm{IN}_N(0_N, P^{-1})$$

where IN_N(0_N, P⁻¹) means 'independent N-variate normal distributions with mean vector 0_N (null vector) and variance–covariance matrix P⁻¹'. Let ψ ≔ (β′, μ′)′, Φ ≔ [Φ_1, …, Φ_p], and Y ≔ [y_0, …, y_T]. (With a slight abuse of notation, Φ denotes the VAR coefficient matrices in Φ(.).)

By the prediction error decomposition, the joint pdf of Y is

(22) $$p(Y|\psi,\Phi,P,\Lambda,\Gamma) = p(\Delta y_T,\dots,\Delta y_{p+1},s_p|\psi,\Phi,P,\Lambda,\Gamma) = \prod_{t=p+1}^{T} p(\Delta y_t|s_{t-1},\psi,\Phi,P,\Lambda,\Gamma)\, p(s_p|\psi,\Phi,P,\Lambda,\Gamma)$$

(We use the same p(.) to denote different distributions, which is common in the Bayesian literature; see, e.g., Gelman et al. (2014, p. 6).) Our Bayesian analysis relies on ∏_{t=p+1}^{T} p(Δy_t|s_{t−1}, ψ, Φ, P, Λ, Γ), which is the conditional likelihood function of (ψ, Φ, P, Λ, Γ) given s_p.

4.2 Identification

Identification of (Λ, Γ) from Π(1) requires some restrictions, though such identification is unnecessary for the multivariate B–N decomposition. Let

$$\Lambda_* := \Lambda(\Gamma'\Gamma)^{1/2}, \quad \Gamma_* := \Gamma(\Gamma'\Gamma)^{-1/2}$$

Then Λ*Γ*′ = ΛΓ′ and Γ*′Γ* = I_r. These restrictions are common, but they do not identify the signs of Λ* and Γ*.

Alternatively, we can apply linear normalization. Write Γ ≔ [Γ_1′, Γ_2′]′, where Γ_1 is r × r and Γ_2 is (N − r) × r. Let

$$\bar{\Lambda} := \Lambda\Gamma_1', \quad \bar{\Gamma} := \Gamma\Gamma_1^{-1}$$

Then we can identify (Λ̄, Γ̄). Let β̄ ≔ Γ̄′α = (Γ_1⁻¹)′β and ψ̄ ≔ (β̄′, μ′)′ correspondingly. Note that α is not identifiable from the VECM.
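Both normalizations are easy to compute from any (Λ, Γ) draw. The R sketch below (hypothetical function and argument names) uses an eigendecomposition for the symmetric square root of Γ′Γ and keeps the product ΛΓ′ unchanged under either normalization.

```r
# Sketch of the two normalizations in Section 4.2 (Lambda, Gamma are N x r inputs).
normalize_cointegration <- function(Lambda, Gamma) {
  r <- ncol(Gamma)
  # orthonormal normalization: Gamma_star' Gamma_star = I_r
  e <- eigen(crossprod(Gamma), symmetric = TRUE)
  half     <- e$vectors %*% diag(sqrt(e$values), r) %*% t(e$vectors)      # (Gamma'Gamma)^{1/2}
  half_inv <- e$vectors %*% diag(1 / sqrt(e$values), r) %*% t(e$vectors)  # (Gamma'Gamma)^{-1/2}
  Gamma_star  <- Gamma %*% half_inv
  Lambda_star <- Lambda %*% half
  # linear normalization: first r rows of Gamma_bar equal I_r
  Gamma1     <- Gamma[1:r, , drop = FALSE]
  Gamma_bar  <- Gamma %*% solve(Gamma1)
  Lambda_bar <- Lambda %*% t(Gamma1)        # so that Lambda_bar %*% t(Gamma_bar) = Lambda %*% t(Gamma)
  list(Lambda_star = Lambda_star, Gamma_star = Gamma_star,
       Lambda_bar = Lambda_bar, Gamma_bar = Gamma_bar)
}
```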
4.3 Prior

4.3.1 Steady state parameters

Assume that

(23) $$p(\alpha,\mu,\Phi,P,\Lambda,\Gamma) = p(\alpha,\mu)\, p(\Phi,P,\Lambda,\Gamma)$$

Assume a normal prior on (α, μ) such that

(24) $$\begin{pmatrix} \alpha \\ \mu \end{pmatrix} \sim \mathrm{N}_{2N}\left(\begin{pmatrix} \alpha_0 \\ \mu_0 \end{pmatrix}, \begin{bmatrix} Q_{0,\alpha}^{-1} & O_{N \times N} \\ O_{N \times N} & Q_{0,\mu}^{-1} \end{bmatrix}\right)$$

where Q_{0,α} and Q_{0,μ} are the prior precision matrices of α and μ, respectively. Since β ≔ Γ′α, this prior implies a prior on ψ such that

(25) $$\psi|\Gamma \sim \mathrm{N}_{r+N}(\psi_0, Q_0^{-1})$$

where

$$\psi_0 := \begin{pmatrix} \Gamma'\alpha_0 \\ \mu_0 \end{pmatrix}, \quad Q_0 := \begin{bmatrix} (\Gamma' Q_{0,\alpha}^{-1}\Gamma)^{-1} & O_{r \times N} \\ O_{N \times r} & Q_{0,\mu} \end{bmatrix}$$

which depends on Γ in general.

4.3.2 VAR parameters

Let S be the set of (Φ, Λ, Γ) such that the eigenvalues of A lie inside the unit circle, so that {y_t*} is CI(1,1). Write

(26) $$p(\Phi,P,\Lambda,\Gamma) \propto p^*(\Phi,P,\Lambda,\Gamma)\,[(\Phi,\Lambda,\Gamma) \in S]$$

where p*(.) is the pdf with no truncation of (Φ, Λ, Γ) and [.] is the indicator function that truncates the support of p*(.) to S.
Assume that

(27) $$p^*(\Phi,P,\Lambda,\Gamma) = p^*(\Phi,P)\, p^*(\Lambda,\Gamma)$$

Assume a hierarchical normal–Wishart prior on (Φ, P) such that

(28) $$\Phi|P,\nu \sim \mathrm{N}_{N \times pN}(M_0; P^{-1}, (\nu D_0)^{-1})$$

(29) $$P \sim \mathrm{W}_N(k_0; S_0^{-1})$$

(30) $$\nu \sim \mathrm{Gam}\left(\frac{A_0}{2}, \frac{B_0}{2}\right)$$

where ν is a hyperparameter that controls the tightness of the prior on the VAR coefficients, W_N(k_0; S_0⁻¹) means 'an (N × N)-variate Wishart distribution with k_0 degrees of freedom and scale matrix S_0⁻¹', and Gam(A_0/2, B_0/2) means 'a gamma distribution with shape A_0/2 and rate B_0/2'. Since it is often difficult to choose ν a priori, we assume a gamma prior on ν.

4.3.3 Cointegrating space

Assume that

(31) $$p^*(\Lambda,\Gamma) = p^*(\Lambda)\, p^*(\Gamma)$$

Assume a normal prior on Λ such that

(32) $$\Lambda \sim \mathrm{N}_{N \times r}(O_{N \times r}; (\eta_0 G_0)^{-1}, I_r)$$

where η_0 G_0 is the prior precision matrix of each column of Λ.

Let V_r(R^N) be the r-dimensional Stiefel manifold in R^N, i.e.,

$$V_r(\mathbb{R}^N) := \{H \in \mathbb{R}^{N \times r} : H'H = I_r\}$$

Let H_0 ∈ V_r(R^N). Let H(.) be such that for all τ ≥ 0,

$$H(\tau) := H_0 H_0' + \tau H_{0\perp} H_{0\perp}'$$

where H_{0⊥} ∈ V_{N−r}(R^N) lies in the orthogonal complement of the column space of H_0, so that H(0) = H_0H_0′ is of rank r and H(1) = I_N. Assume a normal prior on Γ such that

(33) $$\Gamma \sim \mathrm{N}_{N \times r}(O_{N \times r}; H(\tau_0)^{-1}, I_r)$$

where τ_0 ≠ 0. This prior on Γ implies a prior on Γ* such that

(34) $$\Gamma_* \sim \mathrm{MACG}_{N \times r}(H(\tau_0)^{-1})$$

where τ_0 ≔ 1 implies the flat prior on V_r(R^N); see Chikuse (2003, sec. 2.4.2). Note that p*(Λ*|Γ) is normal, but p*(Λ*|Γ*) is not; see Koop, León-González, and Strachan (2010, p. 232), who use these priors on (Λ, Γ) for their parameter-augmented Gibbs sampler.
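To see what the prior on the cointegrating space looks like, one can simulate it directly: draw the columns of Γ independently from N(0, H(τ_0)⁻¹) and orthonormalize. The R sketch below is illustrative only (hypothetical names); with τ_0 = 1, H(τ_0) = I_N and the construction reproduces the flat prior on V_r(R^N).

```r
# Sketch: one draw of Gamma from N_{N x r}(O; H(tau0)^{-1}, I_r) and the implied
# Gamma_star ~ MACG(H(tau0)^{-1}); H0 is an N x r orthonormal matrix (hypothetical input).
draw_gamma_prior <- function(H0, tau0 = 1) {
  N <- nrow(H0); r <- ncol(H0)
  Q <- qr.Q(qr(H0), complete = TRUE)
  H0perp <- Q[, (r + 1):N, drop = FALSE]                # orthonormal basis of the complement
  H_tau  <- H0 %*% t(H0) + tau0 * H0perp %*% t(H0perp)  # H(tau0)
  L <- t(chol(solve(H_tau)))                            # factor of H(tau0)^{-1}
  Gamma <- L %*% matrix(rnorm(N * r), N, r)             # columns iid N(0, H(tau0)^{-1})
  e <- eigen(crossprod(Gamma), symmetric = TRUE)
  Gamma_star <- Gamma %*% e$vectors %*% diag(1 / sqrt(e$values), r) %*% t(e$vectors)
  list(Gamma = Gamma, Gamma_star = Gamma_star)
}
```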
4.4 Posterior simulation

We simulate p(ψ, Φ, P, Λ, Γ, ν|Y) by a Gibbs sampler consisting of five blocks:

1. Draw ψ from p(ψ|Φ, P, Λ, Γ, ν, Y) = p(ψ|Φ, P, Λ, Γ, Y).
2. Draw (Φ, P) from p*(Φ, P|ψ, Λ, Γ, ν, Y).
3. Draw Λ from p*(Λ|ψ, Φ, P, Γ, ν, Y) = p*(Λ|ψ, Φ, P, Γ, Y).
4. Draw Γ from p*(Γ|ψ, Φ, P, Λ, ν, Y) = p*(Γ|ψ, Φ, P, Λ, Y). Accept the draw if (Φ, Λ, Γ) ∈ S; otherwise go back to step 2 and draw another (Φ, P, Λ, Γ).
5. Draw ν from p(ν|ψ, Φ, P, Λ, Γ, Y) = p(ν|Φ, P).

The first block builds on Villani (2009); the second block is standard; the third and fourth blocks follow the parameter-augmented Gibbs sampler proposed by Koop, León-González, and Strachan (2010); the fifth block is standard. See the Appendix for the details of each block.

4.5 Bayes factor

We use the Bayes factor for Bayesian model selection. When choosing between nested models with certain priors, the Savage–Dickey (S–D) density ratio gives the Bayes factor without estimating the marginal likelihoods; see Wagenmakers et al. (2010) for a tutorial on the S–D method.

We choose the cointegrating rank r. Consider comparing the following two models (hypotheses):

(35) $$H_0: \mathrm{rk}(\Pi(1)) = 0 \quad \text{vs} \quad H_r: \mathrm{rk}(\Pi(1)) = r$$

Koop, León-González, and Strachan (2008, pp. 451–452) note that the problem is the same as comparing the following two nested models:

(36) $$H_0: \Lambda = O_{N \times r} \quad \text{vs} \quad H_r: \Lambda \neq O_{N \times r}$$

The restriction (Φ, Λ, Γ) ∈ S is necessary only for the B–N decomposition, and unnecessary for forecasting in general; hence we ignore this restriction for the moment, so that the priors on Λ and Γ become independent. (If the priors on Λ and Γ are dependent, then we can use the generalized S–D density ratio proposed by Verdinelli and Wasserman (1995).) Then the S–D density ratio for H_0 versus H_r is

(37) $$B_{0,r} = \frac{p(\Lambda = O_{N \times r}|Y; H_r)}{p(\Lambda = O_{N \times r}|H_r)}$$

The prior gives the denominator directly. For the numerator, we have

(38) $$p(\Lambda|Y; H_r) = E(p(\Lambda|\psi,\Phi,P,\Gamma,Y; H_r)|Y; H_r)$$

Let {ψ_l, Φ_l, P_l, Γ_l}_{l=1}^{L} be posterior draws. Let

$$\hat{p}(\Lambda = O_{N \times r}|Y; H_r) := \frac{1}{L}\sum_{l=1}^{L} p(\Lambda = O_{N \times r}|\psi_l,\Phi_l,P_l,\Gamma_l,Y; H_r)$$

See Appendix A.4 for the conditional posterior p(Λ|ψ, Φ, P, Γ, Y). An estimator of the S–D density ratio for H_0 versus H_r is

(39) $$\hat{B}_{0,r} = \frac{\hat{p}(\Lambda = O_{N \times r}|Y; H_r)}{p(\Lambda = O_{N \times r}|H_r)}$$
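Given the estimated ratios B̂_{0,r}, posterior probabilities over the cointegrating rank follow from Bayes' rule; Section 5.4 uses exactly this device under a flat prior on r. The R sketch below is a minimal illustration (hypothetical names): since B_{0,r} = p(Y|H_0)/p(Y|H_r) and B_{0,0} = 1, a flat prior on r gives P(H_r|Y) ∝ 1/B_{0,r}.

```r
# Sketch: posterior probabilities of the cointegrating rank from S-D density ratios.
# B0r: named vector of estimated Bayes factors B_{0,r} for r = 1, ..., N-1 (hypothetical input).
rank_posterior <- function(B0r) {
  w <- c("0" = 1, 1 / B0r)   # marginal likelihoods relative to H_0, flat prior on r
  w / sum(w)
}

# Usage with made-up numbers (not the paper's results):
rank_posterior(c("1" = 1e-3, "2" = 1e-8, "3" = 2))
```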
5 Application

5.1 Data

We consider joint estimation of the natural rates (or their permanent components) and gaps of the following four macroeconomic variables in the US:

Output. Let Y_t be output. Assume that {ln Y_t} is I(2), so that {Δln Y_t} is I(1).

Inflation rate. Let P_t be the price level and π_t ≔ ln(P_t/P_{t−1}) be the inflation rate. Assume that {π_t} is I(1).

Interest rate. Let I_t be the 3-month nominal interest rate (annual rate in per cent), i_t ≔ ln(1 + I_t/400), r_t ≔ i_t − E_t(π_{t+1}) be the ex ante real interest rate, and r̂_t ≔ i_t − π_{t+1} be the ex post real interest rate. Assume that {r_t} is I(1). (We can estimate the interest rate gap even if we observe {r̂_t} instead of {r_t}; see Murasawa (2014, pp. 499–500).)

Unemployment rate. Let L_t be the labor force, E_t be employment, and U_t ≔ −ln(E_t/L_t) be the unemployment rate. Assume that {U_t} is I(1).

Table 1 describes the data, which are available from FRED (Federal Reserve Economic Data). When monthly series are available, i.e., except for real GDP, we take the 3-month arithmetic means of the monthly series each quarter to obtain quarterly series, from which we construct the quarterly inflation, interest, and unemployment rates as defined above.

Table 1: Data.
Variable | Description
Y_t | Real GDP (billions of chained 2012 dollars, SA, AR)
P_t | CPI for all urban consumers: all items (1982–84 = 100, SA)
I_t | 3-month treasury bill: secondary market rate (%, AR)
L_t | Civilian labor force (thousands of persons, SA)
E_t | Civilian employment level (thousands of persons, SA)
SA means 'seasonally adjusted'; AR means 'annual rate'.

Let for all t,

$$x_t := \begin{pmatrix} \pi_t \\ \hat{r}_t \\ U_t \\ \ln Y_t \end{pmatrix}, \quad y_t := \begin{pmatrix} \pi_t \\ \hat{r}_t \\ U_t \\ \Delta \ln Y_t \end{pmatrix}$$

The sample period of {y_t} is 1948Q1–2018Q4 (284 observations). Table 2 shows summary statistics of {y_t} and {Δy_t} multiplied by 100. Note that {π_t}, {r̂_t}, and {Δln Y_t} are quarterly rates (not annualized).

Table 2: Summary statistics.
Variable | Min. | 1st qu. | Median | Mean | 3rd qu. | Max.
100π_t | −2.32 | 0.39 | 0.75 | 0.84 | 1.10 | 3.98
100r̂_t | −3.65 | −0.24 | 0.23 | 0.19 | 0.55 | 2.69
100U_t | 2.64 | 4.75 | 5.72 | 5.95 | 7.07 | 11.28
100Δln Y_t | −2.63 | 0.31 | 0.76 | 0.78 | 1.25 | 3.85
100Δπ_t | −3.85 | −0.26 | 0.00 | −0.01 | 0.28 | 1.98
100Δr̂_t | −1.95 | −0.27 | 0.02 | 0.01 | 0.25 | 3.81
100ΔU_t | −1.03 | −0.20 | −0.07 | 0.00 | 0.12 | 1.79
100Δ²ln Y_t | −2.79 | −0.71 | −0.02 | 0.00 | 0.62 | 4.70

5.2 Preliminary analyses

As a preliminary analysis, we check whether {y_t} is indeed I(1), though our method is valid even if some series in {y_t} are I(0). We treat each I(0) series as an I(1) series cointegrated with itself, and increase the cointegrating rank by 1 as 'pseudo' cointegration; see Fisher, Huh, and Pagan (2016, sec. 2.2).

Table 3 shows the results of the ADF and ADF-GLS tests for a unit root, with or without constant and/or trend terms in the ADF regression. (We use gretl 2021a for our preliminary analyses.) The results depend on the number of lags included in the ADF regression, since the ADF test suffers from size distortion with short lags and low power with long lags. The ADF-GLS test mitigates the problem except when there is no constant nor trend term in the ADF regression, in which case the ADF test is asymptotically point optimal. The level 0.05 ADF-GLS test rejects H_0: {y_{t,i}} ∼ I(1) in favor of H_1: {y_{t,i}} ∼ I(0) for {Δln Y_t} (and possibly for {π_t}, whose results are mixed).
Hence these unit root tests do not support our assumption that {y_t} is I(1).

Table 3: Unit root tests.
Variable | Const. | Trend | ADF lags | ADF τ | ADF p-value | ADF-GLS lags | ADF-GLS τ
π_t | Yes | Yes | 4 | −3.73 | 0.02 | 4 | −3.40**
r̂_t | Yes | Yes | 7 | −3.89 | 0.01 | 5 | −2.70*
U_t | Yes | Yes | 13 | −2.87 | 0.17 | 12 | −2.19
Δln Y_t | Yes | Yes | 11 | −5.78 | 0.00 | 1 | −8.30***
π_t | Yes | No | 4 | −3.69 | 0.00 | 12 | −0.75
r̂_t | Yes | No | 7 | −3.90 | 0.00 | 5 | −1.58
U_t | Yes | No | 13 | −3.01 | 0.03 | 12 | −1.49
Δln Y_t | Yes | No | 2 | −8.30 | 0.00 | 1 | −6.36***
Δπ_t | Yes | No | 3 | −11.88 | 0.00 | 0 | −13.90***
Δr̂_t | Yes | No | 8 | −8.72 | 0.00 | 0 | −21.61***
ΔU_t | Yes | No | 12 | −5.21 | 0.00 | 0 | −7.47***
Δ²ln Y_t | Yes | No | 14 | −7.52 | 0.00 | 0 | −25.16***
Δπ_t | No | No | 3 | −11.90 | 0.00 | – | –
Δr̂_t | No | No | 8 | −8.75 | 0.00 | – | –
ΔU_t | No | No | 12 | −5.22 | 0.00 | – | –
Δ²ln Y_t | No | No | 14 | −7.53 | 0.00 | – | –
For the ADF-GLS test, *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively. For the number of lags included in the ADF regression, we use the default choice in gretl 2021a with maximum 15, where the lag order selection criteria are AIC for the ADF test, and a modified AIC using the Perron and Qu (2007) method for the ADF-GLS test. With no constant nor trend in the ADF regression, the ADF test is asymptotically point optimal; hence the ADF-GLS test is unnecessary.

Table 4 shows the results of the KPSS stationarity tests, with or without a trend term. The results depend on the lag truncation parameter for the Newey–West estimator of the long-run error variance. With a trend term, the level 0.05 KPSS test rejects H_0: {y_{t,i}} ∼ I(0) in favor of H_1: {y_{t,i}} ∼ I(1) except for {Δln Y_t}. With no trend term, the test rejects H_0: {y_{t,i}} ∼ I(0) in favor of H_1: {y_{t,i}} ∼ I(1) except for {r̂_t}. Though the results for {r̂_t} and {Δln Y_t} are mixed, these stationarity tests support our assumption that {y_t} is I(1).

Table 4: KPSS stationarity tests.
Variable | Trend | LM
π_t | Yes | 0.53***
r̂_t | Yes | 0.32***
U_t | Yes | 0.23***
Δln Y_t | Yes | 0.03
π_t | No | 0.58**
r̂_t | No | 0.34
U_t | No | 0.63**
Δln Y_t | No | 0.48**
Δπ_t | No | 0.03
Δr̂_t | No | 0.04
ΔU_t | No | 0.07
Δ²ln Y_t | No | 0.01
** and *** denote significance at the 5% and 1% levels, respectively. The lag truncation parameter for the Newey–West estimator of the long-run error variance is 5 (the default value for our sample length in gretl 2021a).

Overall, these unit root and stationarity tests confirm that {Δy_t} is I(0), but are inconclusive as to whether each component of {y_t} is I(1). We can still proceed with our assumption that {y_t} is I(1), allowing for 'pseudo' cointegration.

As a further preliminary analysis, we also check whether {y_t} is CI(1,1). Table 5 shows the results of the Engle–Granger cointegration tests, with or without a trend term in the cointegrating regression and for each choice of the left-hand side (LHS) variable. The results depend on the number of lags included in the ADF regression for the residual series, but the level 0.05 test rejects the null of no cointegration in favor of the alternative of cointegration only when we put Δln Y_t on the LHS of the cointegrating regression.

Table 5: Engle–Granger cointegration tests.
LHS | Trend | τ | p-value
π_t | Yes | −3.79 | 0.21
r̂_t | Yes | −3.58 | 0.30
U_t | Yes | −3.28 | 0.45
Δln Y_t | Yes | −8.33 | 0.00
π_t | No | −3.60 | 0.16
r̂_t | No | −3.57 | 0.17
U_t | No | −3.19 | 0.32
Δln Y_t | No | −7.85 | 0.00
The number of lags included in the ADF regression is 4 (the default value for our sample length in gretl 2021a).

Table 6 shows the results of Johansen's cointegration tests, with unrestricted constant and restricted trend terms in the VECM.
The results depend on the number of lags included in the VECM, but overall, the tests suggest that the cointegrating rank is 2, possibly including 'pseudo' cointegration.

Table 6: Johansen's cointegration tests (unrestricted constant and restricted trend in the VECM).
Rank | Trace | p-value | λ-max | p-value
0 | 142.77 | 0.00 | 83.00 | 0.00
1 | 59.78 | 0.00 | 41.81 | 0.00
2 | 17.97 | 0.35 | 12.00 | 0.43
3 | 5.97 | 0.47 | 5.97 | 0.48
The number of lags included in the VECM is 4 (the default value for our sample length in gretl 2021a).

Since the results of these cointegration tests are mixed, we estimate VECMs with different cointegrating ranks, and use the Bayes factor to choose the cointegrating rank.

5.3 Model specification

For our data, N ≔ 4. To select p, we fit VAR models to {y_t} up to VAR(8), i.e., p = 7, and check model selection criteria. The common estimation period is 1950Q1–2018Q4. Table 7 summarizes the results of lag order selection. (The standard lag order selection criteria are valid even when {y_t} is I(1); see Kilian and Lütkepohl (2017, pp. 99–100).) The level 0.05 LR test fails to reject H_0: {y_t} ∼ VAR(7) against H_1: {y_t} ∼ VAR(8) and AIC selects VAR(7), but BIC and HQC select much smaller models. Since a high-order VAR model covers low-order VAR models as special cases, to be conservative, we choose a VAR(8) model for {y_t}, i.e., p = 7, and impose a shrinkage prior on the VAR coefficients in the VECM.

Table 7: Lag order selection.
Lag | Log-lik | LR | p-value | AIC | BIC | HQC
1 | 4400.07 | – | – | −31.71 | −31.40 | −31.58
2 | 4752.06 | 703.98 | 0.00 | −34.15 | −33.62* | −33.93
3 | 4782.56 | 61.01 | 0.00 | −34.25 | −33.52 | −33.96*
4 | 4800.35 | 35.58 | 0.00 | −34.26 | −33.32 | −33.88
5 | 4814.33 | 27.96 | 0.03 | −34.25 | −33.09 | −33.79
6 | 4834.26 | 39.87 | 0.00 | −34.28 | −32.91 | −33.73
7 | 4855.60 | 42.67 | 0.00 | −34.32* | −32.74 | −33.68
8 | 4868.15 | 25.10 | 0.07 | −34.29 | −32.51 | −33.58
For AIC, BIC, and HQC, * denotes the selected model. The LR test statistic for testing H_0: {y_t} ∼ VAR(p − 1) versus H_1: {y_t} ∼ VAR(p) follows χ²(16) under H_0.

For the prior on α, we set α_0 ≔ 0_N and Q_{0,α} ≔ I_N. For the prior on μ, we set μ_0 ≔ μ̂, where μ̂ is the sample mean of {Δy_t}, and Q_{0,μ} ≔ I_N.

Following Kadiyala and Karlsson (1997) and Bańbura, Giannone, and Reichlin (2010), we set

$$M_0 := O_{N \times pN}, \quad D_0 := \mathrm{diag}(1,\dots,p)^2 \otimes \mathrm{diag}(s_1,\dots,s_N)^2, \quad k_0 := N + 2, \quad S_0 := (k_0 - N - 1)\,\mathrm{diag}(s_1,\dots,s_N)^2$$

where for i = 1, …, N, s_i² is an estimate of var(u_{t,i}) based on the univariate AR(p + 1) model with constant and trend terms for {y_{t,i}}.

For the prior on Λ, we set η_0 ≔ 1 and G_0 ≔ I_N. For the prior on Γ, we set τ_0 ≔ 1, which gives the flat prior on the cointegrating space. For the prior on ν, we set A_0 ≔ 1 and B_0 ≔ 1, i.e., ν ∼ χ²(1); hence the tightness hyperparameter on the VAR coefficients tends to be small, implying potentially mild shrinkage toward M_0 ≔ O_{N×pN}. Overall, our priors are weakly informative in the sense of Gelman et al. (2014, p. 55).
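As a concrete illustration of these settings, the hyperparameters above can be assembled in a few lines. The R sketch below (hypothetical names) takes the vector of AR-based residual variance estimates s² as given.

```r
# Sketch of the Section 5.3 hyperparameters (s2 = c(s_1^2, ..., s_N^2) is assumed given).
make_prior <- function(s2, p) {
  N <- length(s2)
  list(M0 = matrix(0, N, p * N),                      # prior mean of Phi
       D0 = kronecker(diag((1:p)^2), diag(s2, N)),    # diag(1,...,p)^2 (x) diag(s_1,...,s_N)^2
       k0 = N + 2,                                    # Wishart degrees of freedom
       S0 = diag(s2, N),                              # (k0 - N - 1) diag(s_1,...,s_N)^2 with k0 = N + 2
       alpha0 = rep(0, N), Q0_alpha = diag(N),
       mu0 = NULL,                                    # to be set to the sample mean of Delta y_t
       Q0_mu = diag(N),
       eta0 = 1, G0 = diag(N), tau0 = 1, A0 = 1, B0 = 1)
}
```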
5.4 Bayesian computation

We run our Gibbs sampler on R 4.0.5, developed by R Core Team (2021). We use the ML estimate of (Φ, P, Λ, Γ) for the initial values. (The urca package for R is useful for ML estimation of a VECM.) With poor initial values, the restriction that the eigenvalues of A lie inside the unit circle does not hold, and the iteration cannot start. Hence the choice of initial values seems important when one applies the B–N decomposition. Once the iteration starts, the restriction rarely binds for our sample.

To check convergence of the Markov chain generated by our Gibbs sampler to its stationary distribution, we perform the convergence diagnoses discussed in Robert and Casella (2009, ch. 8) and available in the coda package for R. Given the diagnoses, we discard the initial 1000 draws, and use the next 4000 draws for our posterior inference. At significance level 0.05, Geweke's Z-score rejects equality of the means of the first 10% and the last 50% of the posterior sample (the default choice in the coda package) for 6 out of 141 identified parameters (applying linear normalization to Γ). Since the rejection rate is close to the significance level, 1000 draws are enough as burn-in. The minimum effective sample size among the 141 identified parameters is 238. Gelman et al. (2014, p. 267) write, '…100 independent draws are enough for reasonable posterior summaries.' Hence 4000 draws are enough for posterior inference.

To select the cointegrating rank, we set p ≔ 7, assume the above priors, and compute the S–D density ratios for r = 0 versus r = 1, 2, 3, respectively, from which we derive the posterior probabilities of r = 0, 1, 2, 3, assuming the flat prior on r. We find that the posterior probability of r = 2 is often numerically 1, which is consistent with the results of Johansen's cointegration tests. (The S–D density ratio may be inaccurate if the marginal posterior pdf of Λ at O_{N×r} is extremely small. In such a case, one should check the result by running the Gibbs sampler multiple times. Wagenmakers et al. (2010, p. 182) write, 'the qualitative conclusion is evident and reliable … even if the quantitative result is not.') Thus we set r ≔ 2 in the following analysis.

5.5 Empirical results

Figure 2 plots the actual rates and our point estimates (posterior medians) of the natural rates (or their permanent components) of the four variables. For ease of comparison of the actual and natural rates, we omit error bands for the natural rates, which are identical to those for the gaps. Figure 3 plots our point estimates of the gaps and their 95% error bands. Since a VECM has both the level and difference series, to avoid confusion in estimation, we do not annualize quarterly rates. To compare our results with previous works that use annualized quarterly rates, one must multiply our inflation and interest rates by 4 or 400.

Figure 2: Actual rates (solid red) and the posterior medians of the natural rates (dashed green). The shaded areas are the NBER recessions.

Figure 3: Posterior medians of the gaps and their 95% error bands (posterior 0.025- and 0.975-quantiles). The shaded areas are the NBER recessions.

Our estimate of the natural rate of inflation is smoother than typical univariate estimates of trend inflation in the CPI; e.g., Faust and Wright (2013, p. 22). Our estimate looks close to a recent estimate of trend inflation in the US CPI in Morley, Piger, and Rasche (2015, p. 894) based on a bivariate UC model for the inflation and unemployment rates, which assumes independent shocks to the trend and gap components and allows for structural breaks in the variances of these shocks. Our estimate is more volatile, however, than a recent estimate of trend inflation in the PCE price index by Chan, Clark, and Koop (2018, p. 21) that uses information in survey inflation expectations.
Hwu and Kim (2019) estimate trend inflation in the PCE price index using a univariate UC model with correlated shocks to the trend and gap components, Markov-switching gap persistence, and Markov-switching association between inflation and inflation uncertainty. Without the last feature, their estimate looks close to ours; see Hwu and Kim (2019, p. 2314).

Our estimate of the natural rate of interest is more volatile than a recent estimate by Del Negro et al. (2017, p. 237) based on a VAR model with common trends (or a multivariate UC model with independent shocks to the trend and gap components and a factor structure for the trend components) for short- and long-term interest rates, inflation and its survey expectations, and some other variables. Their estimate is smooth partly because they impose tight priors on the variances of the shocks to the trend components. With the 'loosest possible prior', their estimate is as volatile as ours; see Del Negro et al. (2017, p. 272). (Their 'loosest possible prior' on the variance–covariance matrix of the shocks to the trend components is an inverse Wishart prior with the minimum degree of freedom that gives a finite mean; see Del Negro et al. (2017, p. 271).) Despite different definitions of the natural rate, their estimate based on a DSGE model looks close to ours; see Del Negro et al. (2017, p. 237). (Del Negro et al. (2017) clearly distinguish the natural rate and its low-frequency component, estimating the former by a DSGE model and the latter by a VAR model. To construct the real interest rate, they use the PCE inflation rate instead of the CPI inflation rate; hence their real interest rate is on average slightly higher than ours.)

Estimates by Laubach and Williams (2016, p. 60), Holston, Laubach, and Williams (2017, p. S61), and Lewis and Vazquez-Grande (2019) are close to trend output growth by construction; hence their estimates are quite different from those by Del Negro et al. (2017) and ours, especially before 1980. Holston, Laubach, and Williams (2017, p. S63) write:

'…we assume a one-for-one relationship between the trend growth rate of output and the natural rate of interest, which corresponds to assuming σ = 1 in Eq. (1).'

(Eq. (1) in their paper is the consumption Euler equation in a steady state.)

Our estimate of the natural rate of unemployment looks more volatile than a recent estimate by Morley, Piger, and Rasche (2015, p. 898), obtained as a by-product of the estimation of trend inflation. A possible reason for the difference is that they assume independent shocks to the trend and gap components, which may not hold in practice. The two estimates might coincide if one allowed for dependence between the shocks, as Morley, Nelson, and Zivot (2003) show for the univariate trend–cycle decomposition. Despite the difference in volatility, our estimate of the unemployment rate gap looks close to that by Morley, Piger, and Rasche (2015, p. 901) in terms of sign and magnitude. Recent estimates of the natural rate and gap by Crump et al. (2019, pp. 176, 180) are as smooth as those by Morley, Piger, and Rasche (2015), perhaps partly because they also assume independent shocks to the trend and gap components.
Crump et al. (2019) distinguish the trend unemployment rate ū and the natural rate of unemployment u*. They construct ū from estimates of the trend job separation and finding rates based on a multivariate UC model with a factor structure for the rates across demographic groups, assuming independent shocks to the trend and gap components. Treating ū as exogenous and relying on a New Keynesian (wage) Phillips curve, they estimate u* using the CPI, survey inflation expectations, and various wage data, again assuming independent shocks to the natural rate and gap. Allowing for dependence between these shocks seems an interesting and important topic for future work.

Our estimate of the output gap is at most about ±5% of the output level, and looks close to a recent estimate by Morley and Wong (2020, p. 9) based on a large Bayesian VAR(4) model with 23 variables. Our estimate also looks close to their estimate based on a Bayesian VAR(4) model with four variables, using output growth, the unemployment rate, the CPI inflation rate, and the growth rate of industrial production, assuming that they are all I(0); see Morley and Wong (2020, p. 8). Though output growth may or may not be I(0) in the US, it may be clearly I(1) in some countries or regions, in which case their method may give an unreasonable estimate of the output gap with a strong upward or downward trend, as Murasawa (2015) shows for the Japanese data.

Figure 4 plots the posterior probability of a positive gap for the four variables. This probability index is useful if the sign of the gap is of interest. Even if the 95% error band for the gap covers 0, the posterior probability of a positive gap may be close to 0.025 or 0.975. Indeed, the probability index is often below 0.25 or above 0.75; hence we are often quite sure about the sign of the gap. Moreover, Figure 4 shows the relation between the gaps more clearly than Figure 3.

Figure 4: Posterior probability of a positive gap. The shaded areas are the NBER recessions.

Figure 5 shows the relation between the gaps more directly. The left panels are the scatter plots of the posterior medians of the gaps in each quarter. The right panels are the posterior pdfs of the correlation coefficients between the gaps. We see that the Phillips curves and Okun's law hold between the gaps, though we do not impose such relations. Thus our estimates of the gaps seem mutually consistent from a macroeconomic point of view.

Figure 5: Phillips curves and Okun's law. The left panels are the scatter plots of the posterior medians of the gaps in each quarter. The right panels are the posterior pdfs of the correlation coefficients between the gaps.

Figure 6 shows the relation between the trend output growth rate Δln Y_{t+1}* and the natural rate of interest r_t*, i.e., the dynamic IS curve; see Eq. (1). The left panel is the scatter plot of the posterior medians of Δln Y_{t+1}* and r_t* in each quarter. The right panel is the posterior pdf of the slope coefficient, i.e., 1/σ in Eq. (1), obtained by regressing Δln Y_{t+1}* on r_t* for each posterior draw of {Δln Y_t*, r_t*}. On average, the slope is positive as expected, but the evidence is weak.
(2016) and Lunsford and West (2019), who study the determinants of $r_t^*$ and find positive but low and insignificant long-run correlations between $\Delta \ln Y_t$ and $r_t$ in the postwar US.

Figure 6: Dynamic IS curve. The left panel is the scatter plot of the posterior medians of $\Delta \ln Y_{t+1}^*$ and $r_t^*$ in each quarter. The right panel is the posterior pdf of the slope coefficient obtained by regressing $\Delta \ln Y_{t+1}^*$ on $r_t^*$ for each posterior draw of $\{\Delta \ln Y_t^*, r_t^*\}$.

5.6 Comparison of alternative model specifications

Figure 7 compares point estimates of the gaps under three alternative assumptions, i.e., I(1) log output, I(2) log output with no cointegration, and I(2) log output with cointegration. For the first two cases with no cointegration, we assume VAR(7) models for the stationarized series, and use the same prior and Gibbs sampler as those for our VECM (except for the loading and cointegrating matrices). For ease of comparison, we omit error bands here.

Figure 7: Posterior medians of the gaps assuming I(1) log output (solid red), I(2) log output with no cointegration (dashed green), and I(2) log output with cointegration (long-dash blue). The shaded areas are the NBER recessions.

The result assuming I(1) log output is similar to that in Murasawa (2014, p. 513), who assumes a VAR(12) model with no constant term for the centered stationarized series, and chooses the tightness hyperparameter by the empirical Bayes method. In contrast to the result for Japan in Figure 1, for the US, the multivariate B–N decomposition assuming I(1) log output gives a 'reasonable' estimate of the output gap that fluctuates around 0. However, the output gap is persistently positive since 2010, which may not be sensible.

Assuming I(2) log output changes the estimate of the output gap. However, it hardly changes the estimates of the other gaps, similar to the result for Japan in Murasawa (2015), which makes sense because the B–N transitory components are 'forecastable movements', and whether log output is I(1) or I(2), the two VAR models use the information in log output for forecasting the other variables in almost the same way. A VAR(p) model with $\{\Delta^2 \ln Y_t\}$ includes $\Delta^2 \ln Y_{t-p} := \Delta \ln Y_{t-p} - \Delta \ln Y_{t-p-1}$ as a predictor and hence uses the information in $\Delta \ln Y_{t-p-1}$, but a VAR(p) model with $\{\Delta \ln Y_t\}$ does not. If p is large enough, then the coefficient on $\Delta \ln Y_{t-p-1}$ tends to be small, especially with our shrinkage prior on the VAR coefficients. The output gap now keeps fluctuating around 0 since 2010, perhaps because I(2) log output captures a possible structural break in the mean output growth rate after the 2008 Great Recession.

Allowing for cointegration gives much bigger estimates of the gaps for all variables. The result follows partly because a VECM uses more predictors, i.e., the error correction terms, than a VAR model; see Evans and Reichlin (1994). However, the result also implies that for all variables, the coefficients on the error correction terms are large enough, if not significantly different from 0, to obtain such different estimates of the gaps.
These bigger estimates of the gaps, or the associated estimates of the natural rates, are often close to other recent estimates that focus on a particular natural rate or gap, as already noted.

6 Discussion

The dynamic IS equation implies that if the real interest rate is I(1), then so is the output growth rate with possible cointegration, and log output is I(2). We extend the multivariate B–N decomposition to such a case, and apply Bayesian analysis to obtain error bands for the components. To implement this idea, we consider a steady state VECM with a hierarchical prior on the VAR coefficients, and develop a Gibbs sampler for posterior simulation of the parameters and the components. Applying the method to US data with weakly informative priors, we obtain a reasonable joint estimate of the natural rates (or their permanent components) and gaps of output, inflation, interest, and unemployment.

The B–N decomposition assuming I(1) log output may give a trending estimate of the output gap if the mean output growth rate has structural breaks. Assuming I(2) log output and hence an I(1) output growth rate, we introduce a stochastic trend in the output growth rate, which captures possible structural breaks in the mean output growth rate. Thus the B–N decomposition assuming I(2) log output gives a reasonable estimate of the output gap that fluctuates around 0. Since the B–N transitory components are 'forecastable movements', and a VECM forecasts no worse than a VAR model, allowing for cointegration gives larger estimates of the gaps for all variables. Since we can treat an I(0) series as an I(1) series cointegrated with itself, our method is valid even if the output growth rate is I(0), or log output is I(1).

The B–N decomposition has some advantages over other approaches:
1. The B–N decomposition relies on a reduced-form forecasting model; hence it requires fewer assumptions than the decomposition based on a structural model.
2. The B–N decomposition requires no state smoothing if the state vector is observable as in our VECM; hence it can handle a large state vector more easily than the decomposition based on a UC model, which involves latent variables.

Thus the multivariate B–N decomposition is useful for estimating the natural rates and gaps of many variables jointly. Most, if not all, previous works focus on the natural rate or gap of one variable, using other variables only to improve estimation. We estimate the natural rates and gaps of four variables, which are equally of interest.

Since a reduced-form VECM is the most basic forecasting model for cointegrated series, our method gives a benchmark joint estimate of the natural rates and gaps of major macroeconomic variables. One can compare this benchmark estimate with alternative estimates based on other forecasting models or DSGE models such as those in Del Negro et al. (2017). Our method is useful especially for non-US data, where log output is often clearly I(2). Application to non-US data is an interesting and important topic for future work.

One can possibly refine our estimate in two ways. First, one can use a larger VECM. Second, one can introduce nonlinearity into our VECM, e.g., Markov switching, stochastic volatility, or more general time-varying parameters. Since the B–N decomposition may not apply to nonlinear models, and nonlinear models may become unnecessary with more variables, the first direction seems more promising.
The following four variables are of particular interest:
1. the PCE price index in addition to the CPI, which makes our results comparable to various previous works that use either price index;
2. the long-run (LR) interest rate in addition to the short-run (SR) interest rate, which gives an estimate of the trend term premium, or the natural yield curve; cf. Brzoza-Brzezina and Kotłowski (2014) and Imakubo, Kojima, and Nakajima (2018);
3. the nominal yields on Aaa and Baa corporate bonds, which give estimates of the trend liquidity premium and the trend safety premium, or the trend liquidity and safety convenience yields; cf. Del Negro et al. (2017).

Our VECM allows for cointegration among I(1) series, but allows for neither cointegration among I(2) series nor multicointegration (polynomial cointegration). Such extensions are useful if we have multiple I(2) series, e.g., log output and the price level (instead of the inflation rate). We leave such extensions for future work.

Lastly, to obtain a monthly instead of quarterly joint estimate of the natural rates and gaps of output and other variables, it seems straightforward to extend the B–N decomposition of mixed-frequency series proposed by Murasawa (2016) to I(1) and I(2) series with cointegration.

The shaded areas in Figure 1 indicate the recessions determined by the Cabinet Office.

Assuming I(2) log output and hence an I(1) output growth rate, one introduces a stochastic trend in the output growth rate, which captures possible structural breaks in the mean growth rate automatically in real time without specifying break dates a priori. Thus the B–N decomposition assuming I(2) log output gives a more 'reasonable' estimate of the output gap that fluctuates around 0; see Figure 1.

This paper extends Murasawa (2015) in two ways. First, we allow for cointegration in the multivariate B–N decomposition of I(1) and I(2) series. Recall that the consumption Euler equation in a simple macroeconomic model implies a dynamic IS equation such that for all t,

(1) $$E_t(\Delta \ln Y_{t+1}) = \frac{1}{\sigma}(r_t - \rho)$$

where $Y_t$ is output, $r_t$ is the real interest rate, $\rho$ is the discount rate, and $\sigma$ is the curvature of the utility of consumption; see Galí (2015, pp. 21–23). This equation implies that if 0 < σ < ∞, then the output growth rate and the real interest rate are of the same order of integration; thus if the real interest rate is I(1), then so is the output growth rate with possible cointegration, and log output is I(2). This observation motivates our development of the multivariate B–N decomposition of I(1) and I(2) series with cointegration. Since the B–N decomposition requires only a reduced-form forecasting model, we do not assume Eq. (1), which is just a motivating example for considering some cointegration. Though we assume that $\{\Delta \ln Y_t\}$ is I(1), our method is valid even if it is I(0), since we can treat an I(0) series as an I(1) series cointegrated with itself, i.e., 'pseudo' cointegration; see Fisher et al. (2016, section 2.2). One can think of this paper as a reduced-form VECM version of Laubach and Williams (2003), though we include more variables. To estimate the natural rate of interest, they start from the consumption Euler equation, and assume I(2) log output. In one specification, they also assume cointegration between the output growth rate and the real interest rate.

Second, we apply Bayesian analysis to obtain error bands for the components, building on recent developments in Bayesian analysis of a VECM. Since cointegrating vectors require normalization, our parameter of interest is in fact the cointegrating space rather than the cointegrating vectors. Strachan and Inder (2004) use a matrix angular central Gaussian (MACG) distribution proposed by Chikuse (1990) as a prior on the cointegrating space. Koop, León-González, and Strachan (2010) propose two Gibbs samplers, a collapsed Gibbs sampler and a parameter-augmented Gibbs sampler, for posterior simulation of such a model. Since one often has prior information on the steady state of a system, Villani (2009) specifies a prior on the steady state form of a VECM. Since some hyperparameters such as the tightness (shrinkage) hyperparameter on the VAR coefficients are difficult to choose, Giannone, Lenza, and Primiceri (2015) use a hierarchical prior. Using these ideas, we develop a Gibbs sampler to simulate the joint posterior distribution of the components.

Since we assume a VECM as the true reduced-form model, the B–N decomposition defines the permanent and transitory components; hence it suffices for our error bands to reflect only parameter uncertainty.
If we instead assume an unobserved components (UC) model as the true model, then the B–N decomposition estimates the components, and we must consider estimation errors as well; see Morley (2011) on this distinction.

As an application, we simulate the joint posterior distribution of the natural rates (or their permanent components) and gaps of output, inflation, interest, and unemployment in the US during 1950Q1–2018Q4. To apply the Bayesian multivariate B–N decomposition of I(1) and I(2) series with cointegration, we assume a four-variate VAR model for the output growth rate, the CPI inflation rate, the short-term interest rate, and the unemployment rate, and estimate it in the VECM form. The Bayes factors give decisive evidence that the cointegrating rank is 2. The posterior medians of the gaps seem reasonable compared to previous works that focus on a particular natural rate or gap. The posterior probability of a positive gap is useful when the sign of the gap is uncertain. The Phillips curve and Okun's law hold between the gaps even though we do not impose such relations. The dynamic IS equation holds between the natural rates of output and interest, though the evidence is weak. Comparisons of alternative model specifications show not only that assuming I(2) log output gives a more 'reasonable' estimate of the output gap, but also that allowing for cointegration gives much bigger estimates of the gaps for all variables.

The paper proceeds as follows. Section 2 reviews the literature on the B–N decomposition. Section 3 derives the multivariate B–N decomposition of I(1) and I(2) series with cointegration. Section 4 specifies our model and prior, and explains our posterior simulation and model evaluation. Section 5 applies the method to US data. Section 6 discusses remaining issues. The Appendix gives the details of the derivation of our algorithm. We use the notation proposed by Abadir and Magnus (2002).

2 Literature

Beveridge and Nelson (1981) give operational definitions of the permanent and transitory components, show that one can express any I(1) series as the sum of a random walk permanent component and an I(0) transitory component, and propose the B–N decomposition of a univariate I(1) series, assuming an ARIMA model. Beveridge and Nelson (1981) reverse the sign of the transitory component; Nelson (2008) explains why they had to do so.

Multivariate extension of the B–N decomposition is straightforward. Evans (1989a, 1989b) and Evans and Reichlin (1994) apply the B–N decomposition to a multivariate series consisting of I(0) and I(1) series, assuming a VAR model for the stationarized series. Evans and Reichlin (1994) show that the transitory components are no smaller with the multivariate B–N decomposition than with the univariate one. This is because the transitory components are 'forecastable movements' (Rotemberg and Woodford 1996), and multivariate models forecast no worse than univariate models, using more information. King et al. (1991) and Cochrane (1994) apply the B–N decomposition to a CI(1,1) series, assuming a VECM.

Morley (2002) gives a general framework for the B–N decomposition, using a state space representation of the assumed linear time series model. Garratt, Robertson, and Wright (2006) note that if the state vector is observable as in a VAR model or a VECM, then the transitory component is an explicit weighted sum of the observables given the model parameters; thus the multivariate B–N decomposition based on a VAR model or a VECM is transparent. They also note that the result of the multivariate B–N decomposition depends strongly on the assumed cointegrating rank.
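To make the weighted-sum observation concrete, the following minimal sketch (ours, not the authors' code) computes the B–N transitory component implied by a VAR(1) model for the differenced series with no cointegration, in which case the transitory component is the explicit weighted sum $c_t = -\Phi(I - \Phi)^{-1}(\Delta y_t - \mu)$ of the current observation; the coefficient values and data are placeholders.

```r
# Minimal illustration: for Delta y_t - mu = Phi (Delta y_{t-1} - mu) + u_t,
# the B-N transitory component is c_t = -Phi (I - Phi)^{-1} (Delta y_t - mu).
bn_cycle_var1 <- function(dy, mu, Phi) {
  N <- ncol(dy)
  W <- -Phi %*% solve(diag(N) - Phi)      # fixed weights on the observed deviations
  t(W %*% t(sweep(dy, 2, mu)))            # c_t for every observation
}

set.seed(1)
Phi <- matrix(c(0.5, 0.1, 0.2, 0.3), 2, 2) # hypothetical stable VAR(1) coefficients
mu  <- c(0.8, 0.2)
dy  <- matrix(rnorm(200), 100, 2)          # placeholder data, 100 quarters, N = 2
head(bn_cycle_var1(dy, mu, Phi))
```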
The B–N decomposition also applies to an I(2) series. Newbold and Vougas (1996), Oh and Zivot (2006), and Oh, Zivot, and Creal (2008) extend the B–N decomposition to a univariate I(2) series. Murasawa (2015) extends the method to a multivariate series consisting of I(1) and I(2) series.

One can apply Bayesian analysis to obtain error bands for the components. This approach is useful especially when the state vector is observable as in a VAR model or a VECM, in which case the components are explicit functions of the model parameters and observables; thus the joint posterior distribution of the model parameters directly translates into that of the components. Murasawa (2014) uses a Bayesian VAR model to obtain error bands for the components. Kiley (2013) uses a Bayesian DSGE model, but gives no error band for the components. Del Negro et al. (2017) use a Bayesian DSGE model and give error bands for the components. They also use a multivariate unobserved components (UC) model, where the permanent components have a factor structure and the transitory components follow a VAR model. Bayesian analysis of a UC model requires state smoothing, since the state vector is unobservable given the model parameters; thus it is less straightforward than that of a VAR model. Morley and Wong (2020) use a large Bayesian VAR model and give error bands for the components, but they do not take parameter uncertainty into account. Morley and Wong (2020) assume a UC model as the true model, and interpret the B–N decomposition as a way to estimate the components; hence their error bands reflect estimation errors of the components, but not parameter uncertainty. There seems to be no previous work that uses a Bayesian VECM to obtain error bands for the components.

Cogley et al. (2010) use a Bayesian time-varying parameter VAR model, which is a nonlinear time series model. They still apply the B–N decomposition and give error bands for the components, pretending at each date that the VAR coefficients no longer vary. They justify their approach as an approximation based on an 'anticipated-utility' model. See also Cogley and Sargent (2002, 2005) and Cogley et al. (2005).

3 Model specification

3.1 VAR model

Let for d = 1, 2, $\{x_{t,d}\}$ be an $N_d$-variate I(d) sequence. Let $N := N_1 + N_2$. Let for all t, $x_t := (x_{t,1}', x_{t,2}')'$, $y_{t,1} := x_{t,1}$, $y_{t,2} := \Delta x_{t,2}$, and $y_t := (y_{t,1}', y_{t,2}')'$, where the prime (′) denotes the transpose of a vector or a matrix, so that $\{y_t\}$ is I(1). Assume also that $\{y_t\}$ is CI(1,1) with cointegrating rank r. Our assumption includes the case where some series in $\{y_t\}$ are I(0), since we can treat an I(0) series as an I(1) series cointegrated with itself, i.e., 'pseudo' cointegration; see Fisher, Huh, and Pagan (2016, section 2.2).

Let for d = 1, 2, $\mu_d := E(\Delta y_{t,d})$. Let $\mu := (\mu_1', \mu_2')'$. Let $\{y_t^*\}$ be such that for all t,

(2) $$y_t = \alpha + \mu t + y_t^*$$

where $\alpha$ is an intercept vector.
Assume a VAR(p + 1) model for $\{y_t^*\}$ such that for all t,

(3) $$\Pi(\mathrm{L})\, y_t^* = u_t$$
(4) $$\{u_t\} \sim \mathrm{WN}(\Sigma)$$

where $\Pi(\mathrm{L})$ is a (p + 1)th-order polynomial matrix in the lag operator L and $\mathrm{WN}(\Sigma)$ means '(multivariate) white noise sequence with variance–covariance matrix $\Sigma$'. By starting from a VAR model for the I(1) series $\{y_t\}$, we exclude both cointegration among the I(2) series $\{x_{t,2}\}$ and multicointegration (polynomial cointegration) studied by Johansen (1995), Granger and Lee (1989, 1990), and Gregoir and Laroque (1993, 1994). We leave extensions of our Bayesian multivariate B–N decomposition to such cases for future work.

3.2 VECM representation

Write

$$\Pi(\mathrm{L}) = \Pi(1)\mathrm{L} + \Phi(\mathrm{L})(1 - \mathrm{L})$$

where $\Phi(\mathrm{L}) := (\Pi(\mathrm{L}) - \Pi(1)\mathrm{L})/(1 - \mathrm{L})$. Then we have a VECM of order p for $\{\Delta y_t^*\}$ such that for all t,

(5) $$\Phi(\mathrm{L})\Delta y_t^* = -\Pi(1)\, y_{t-1}^* + u_t = -\Lambda\Gamma'\, y_{t-1}^* + u_t$$

where $\Lambda, \Gamma \in \mathbb{R}^{N \times r}$. Since $\{y_t^*\}$ is CI(1,1), the roots of $\det(\Pi(z)) = 0$ must lie on or outside the unit circle. This requirement gives an implicit restriction on the VECM parameters $(\Phi(.), \Lambda, \Gamma)$.

We can write for all t,

(6) $$\Phi(\mathrm{L})(\Delta y_t - \mu) = -\Lambda\Gamma'[y_{t-1} - \alpha - \mu(t-1)] + u_t = -\Lambda[\Gamma' y_{t-1} - \beta - \delta(t-1)] + u_t$$

where $\beta := \Gamma'\alpha$ and $\delta := \Gamma'\mu$. Though slightly different, the last expression is essentially a steady state VECM suggested by Villani (2009, p. 633). We have for all t,

(7) $$E(\Delta y_t) = \mu$$
(8) $$E(\Gamma' y_t) = \beta + \delta t$$

which help us to specify an informative prior on (μ, β, δ). Thus a steady state VECM is useful for Bayesian analysis.
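As a small illustration of the mapping from the levels VAR to the VECM (ours, using our own notation for the levels VAR), if $\Pi(\mathrm{L}) = I_N - A_1\mathrm{L} - \cdots - A_{p+1}\mathrm{L}^{p+1}$, then $\Pi(1) = I_N - \sum_i A_i$ and $\Phi_i = -\sum_{j>i} A_j$, so that $\Phi(\mathrm{L})\Delta y_t^* = -\Pi(1) y_{t-1}^* + u_t$ as in Eq. (5).

```r
# Map levels-VAR coefficient matrices A_1, ..., A_{p+1} into the VECM form of Eq. (5).
var_to_vecm <- function(A) {             # A: list of N x N matrices A_1, ..., A_{p+1}
  N <- nrow(A[[1]]); p1 <- length(A)
  Pi1 <- diag(N) - Reduce(`+`, A)        # Pi(1); -Pi(1) = Lambda Gamma' has rank r
  Phi <- lapply(seq_len(p1 - 1), function(i) -Reduce(`+`, A[(i + 1):p1]))
  list(Pi1 = Pi1, Phi = Phi)
}

A <- list(matrix(c(0.6, 0.1, 0.0, 0.5), 2, 2),   # hypothetical A_1, A_2 (N = 2, p = 1)
          matrix(c(0.2, 0.0, 0.1, 0.3), 2, 2))
var_to_vecm(A)
```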
3.3 State space representation

Assume that p ≥ 1. We have for all t,

$$\Gamma' y_t^* = \Gamma'(y_{t-1}^* + \Phi_1 \Delta y_{t-1}^* + \cdots + \Phi_p \Delta y_{t-p}^* - \Lambda\Gamma' y_{t-1}^* + u_t) = \Gamma'\Phi_1 \Delta y_{t-1}^* + \cdots + \Gamma'\Phi_p \Delta y_{t-p}^* + (I_r - \Gamma'\Lambda)\Gamma' y_{t-1}^* + \Gamma' u_t$$

or

(9) $$\Gamma' y_t - \beta - \delta t = \Gamma'\Phi_1(\Delta y_{t-1} - \mu) + \cdots + \Gamma'\Phi_p(\Delta y_{t-p} - \mu) + (I_r - \Gamma'\Lambda)[\Gamma' y_{t-1} - \beta - \delta(t-1)] + \Gamma' u_t$$

Let $s_t$ be a state vector such that for all t,

$$s_t := \begin{pmatrix} \Delta y_t - \mu \\ \vdots \\ \Delta y_{t-p+1} - \mu \\ \Gamma' y_t - \beta - \delta t \end{pmatrix}$$

which is I(0) and observable given the model parameters.
A state space representation of the steady state VECM is for all t,

(10) $$s_t = A s_{t-1} + B z_t$$
(11) $$\Delta y_t = \mu + C s_t$$
(12) $$\{z_t\} \sim \mathrm{WN}(I_N)$$

where

$$A := \begin{bmatrix} \Phi_1 & \dots & \Phi_p & -\Lambda \\ & I_{(p-1)N} & \mathbf{O}_{(p-1)N \times N} & \mathbf{O}_{(p-1)N \times r} \\ \Gamma'\Phi_1 & \dots & \Gamma'\Phi_p & I_r - \Gamma'\Lambda \end{bmatrix}, \quad B := \begin{bmatrix} \Sigma^{1/2} \\ \mathbf{O}_{(p-1)N \times N} \\ \Gamma'\Sigma^{1/2} \end{bmatrix}, \quad C := \begin{bmatrix} I_N & \mathbf{O}_{N \times (p-1)N} & \mathbf{O}_{N \times r} \end{bmatrix}$$

with $\mathbf{O}_{m \times n}$ denoting an m × n null matrix. Note that $\{s_t\}$ is I(0) if and only if the eigenvalues of A lie inside the unit circle.

We have for all t, for h ≥ 0,

$$E_t(\Delta y_{t+h}) = \mu + C A^h s_t$$

or

(13) $$E_t(\Delta x_{t+h,1}) = \mu_1 + C_1 A^h s_t$$
(14) $$E_t(\Delta^2 x_{t+h,2}) = \mu_2 + C_2 A^h s_t$$

where

$$C_1 := \begin{bmatrix} I_{N_1} & \mathbf{O}_{N_1 \times N_2} & \mathbf{O}_{N_1 \times (p-1)N} & \mathbf{O}_{N_1 \times r} \end{bmatrix}, \quad C_2 := \begin{bmatrix} \mathbf{O}_{N_2 \times N_1} & I_{N_2} & \mathbf{O}_{N_2 \times (p-1)N} & \mathbf{O}_{N_2 \times r} \end{bmatrix}$$
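A minimal sketch (ours, not the authors' code) of assembling the transition matrix A in Eq. (10) from the VECM coefficients, and of checking the I(0) condition on the state vector, might look as follows.

```r
# Assemble A from Phi_1, ..., Phi_p, Lambda, Gamma as in Eq. (10).
build_A <- function(Phi, Lambda, Gamma) {      # Phi: list of N x N matrices Phi_1..Phi_p
  N <- nrow(Lambda); r <- ncol(Lambda); p <- length(Phi)
  top <- cbind(do.call(cbind, Phi), -Lambda)                        # first block row
  mid <- if (p > 1) cbind(diag((p - 1) * N), matrix(0, (p - 1) * N, N + r)) else NULL
  bot <- cbind(do.call(cbind, lapply(Phi, function(P) t(Gamma) %*% P)),
               diag(r) - t(Gamma) %*% Lambda)                       # last block row
  rbind(top, mid, bot)
}

# {s_t} is I(0) iff all eigenvalues of A lie strictly inside the unit circle
is_stationary <- function(A) max(Mod(eigen(A, only.values = TRUE)$values)) < 1
```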
3.4 Multivariate B–N decomposition

Introducing cointegration changes the state space model, but the formulae for the multivariate B–N decomposition of I(1) and I(2) series given by Murasawa (2015, Theorem 1) remain almost unchanged. Let $x_t^*$ and $c_t$ be the B–N permanent and transitory components in $x_t$, respectively.

Theorem 1. Suppose that the eigenvalues of A lie inside the unit circle. Then for all t,

(15) $$x_{t,1}^* = \lim_{T \to \infty} (E_t(x_{t+T,1}) - T\mu_1)$$
(16) $$x_{t,2}^* = \lim_{T \to \infty} \left\{ E_t(x_{t+T,2}) - T^2 \frac{\mu_2}{2} - T\left[\frac{\mu_2}{2} + \Delta x_{t,2} + C_2 (I_{pN+r} - A)^{-1} A s_t\right] \right\}$$
(17) $$c_{t,1} = -C_1 (I_{pN+r} - A)^{-1} A s_t$$
(18) $$c_{t,2} = C_2 (I_{pN+r} - A)^{-2} A^2 s_t$$

Proof. See Murasawa (2015, pp. 158–159). □

Let

$$W := \begin{bmatrix} -C_1 (I_{pN+r} - A)^{-1} A \\ C_2 (I_{pN+r} - A)^{-2} A^2 \end{bmatrix}$$

Then for all t,

(19) $$c_t = W s_t$$

where W depends only on the VECM coefficients and $\{s_t\}$ is observable given the model parameters. This observation is useful for Bayesian analysis of $\{c_t\}$.
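A minimal sketch (ours) of Eq. (19): given the transition matrix A of Eq. (10) and the selection matrices $C_1$ and $C_2$, the transitory components for every date are obtained in one matrix product, which is why each posterior draw of the parameters yields a draw of the whole path $\{c_t\}$.

```r
# B-N weights and transitory components, c_t = W s_t as in Eqs. (17)-(19).
bn_weights <- function(A, C1, C2) {
  k <- nrow(A)
  M <- solve(diag(k) - A) %*% A              # (I - A)^{-1} A
  rbind(-C1 %*% M,                           # weights for c_{t,1}, Eq. (17)
        C2 %*% M %*% M)                      # weights for c_{t,2}; M %*% M = (I - A)^{-2} A^2
}
# S: T x (pN + r) matrix of states; returns the T x N matrix of transitory components
bn_cycles <- function(A, C1, C2, S) t(bn_weights(A, C1, C2) %*% t(S))
```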
4 Bayesian analysis

4.1 Conditional likelihood function

Assume Gaussian innovations for Bayesian analysis, and write the VECM as for all t,

(20) $$\Delta y_t - \mu = \Phi_1(\Delta y_{t-1} - \mu) + \cdots + \Phi_p(\Delta y_{t-p} - \mu) - \Lambda[\Gamma' y_{t-1} - \beta - \Gamma'\mu(t-1)] + u_t$$
(21) $$\{u_t\} \sim \mathrm{IN}_N(\mathbf{0}_N, P^{-1})$$

where $\mathrm{IN}_N(\mathbf{0}_N, P^{-1})$ means 'independent N-variate normal distributions with mean vector $\mathbf{0}_N$ (null vector) and variance–covariance matrix $P^{-1}$'. Let $\psi := (\beta', \mu')'$, $\Phi := [\Phi_1, \dots, \Phi_p]$, and $Y := [y_0, \dots, y_T]$. (With a slight abuse of notation, $\Phi$ denotes the VAR coefficient matrices in $\Phi(.)$.) By the prediction error decomposition, the joint pdf of Y is

(22) $$p(Y | \psi, \Phi, P, \Lambda, \Gamma) = p(\Delta y_T, \dots, \Delta y_{p+1}, s_p | \psi, \Phi, P, \Lambda, \Gamma) = \prod_{t=p+1}^T p(\Delta y_t | s_{t-1}, \psi, \Phi, P, \Lambda, \Gamma)\, p(s_p | \psi, \Phi, P, \Lambda, \Gamma)$$

where we use the same p(.) to denote different distributions, which is common in the Bayesian literature; see, e.g., Gelman et al. (2014, p. 6). Our Bayesian analysis relies on $\prod_{t=p+1}^T p(\Delta y_t | s_{t-1}, \psi, \Phi, P, \Lambda, \Gamma)$, which is equivalent to the conditional likelihood function of $(\psi, \Phi, P, \Lambda, \Gamma)$ given $s_p$.

4.2 Identification

Identification of $(\Lambda, \Gamma)$ from $\Pi(1)$ requires some restrictions, though such identification is unnecessary for the multivariate B–N decomposition. Let

$$\Lambda_* := \Lambda(\Gamma'\Gamma)^{1/2}, \quad \Gamma_* := \Gamma(\Gamma'\Gamma)^{-1/2}$$

Then $\Lambda_*\Gamma_*' = \Lambda\Gamma'$ and $\Gamma_*'\Gamma_* = I_r$. These restrictions are common, but do not identify the signs of $\Lambda_*$ and $\Gamma_*$.

Alternatively, we can apply linear normalization. Write $\Gamma := [\Gamma_1', \Gamma_2']'$, where $\Gamma_1$ is r × r and $\Gamma_2$ is (N − r) × r. Let

$$\bar{\Lambda} := \Lambda\Gamma_1, \quad \bar{\Gamma} := \Gamma\Gamma_1^{-1}$$

Then we can identify $(\bar{\Lambda}, \bar{\Gamma})$. Let $\bar{\beta} := \bar{\Gamma}'\alpha = (\Gamma_1^{-1})'\beta$ and $\bar{\psi} := (\bar{\beta}', \mu')'$ correspondingly. Note that $\alpha$ is not identifiable from the VECM.
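A small sketch (ours) of the two normalizations of the cointegrating matrix described above; both act only on Γ, leaving the cointegrating space unchanged.

```r
# Orthonormal normalization: Gamma* = Gamma (Gamma'Gamma)^{-1/2}, so Gamma*' Gamma* = I_r.
normalize_ortho <- function(Gamma) {
  S <- t(Gamma) %*% Gamma
  e <- eigen(S)
  Shalf <- e$vectors %*% diag(sqrt(e$values), ncol(Gamma)) %*% t(e$vectors)  # (Gamma'Gamma)^{1/2}
  Gamma %*% solve(Shalf)
}
# Linear normalization: Gamma-bar = Gamma Gamma_1^{-1}, so its first r rows equal I_r.
normalize_linear <- function(Gamma) {
  r <- ncol(Gamma)
  Gamma %*% solve(Gamma[1:r, , drop = FALSE])
}
```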
4.3 Prior

4.3.1 Steady state parameters

Assume that

(23) $$p(\alpha, \mu, \Phi, P, \Lambda, \Gamma) = p(\alpha, \mu)\, p(\Phi, P, \Lambda, \Gamma)$$

Assume a normal prior on (α, μ) such that

(24) $$\begin{pmatrix} \alpha \\ \mu \end{pmatrix} \sim \mathrm{N}_{2N}\left( \begin{pmatrix} \alpha_0 \\ \mu_0 \end{pmatrix}, \begin{bmatrix} Q_{0,\alpha}^{-1} & \mathbf{O}_{N \times N} \\ \mathbf{O}_{N \times N} & Q_{0,\mu}^{-1} \end{bmatrix} \right)$$

where $Q_{0,\alpha}$ and $Q_{0,\mu}$ are the prior precision matrices of α and μ, respectively. Since $\beta := \Gamma'\alpha$, this prior implies a prior on ψ such that

(25) $$\psi | \Gamma \sim \mathrm{N}_{r+N}\left(\psi_0, Q_0^{-1}\right)$$

where

$$\psi_0 := \begin{pmatrix} \Gamma'\alpha_0 \\ \mu_0 \end{pmatrix}, \quad Q_0 := \begin{bmatrix} (\Gamma' Q_{0,\alpha}^{-1} \Gamma)^{-1} & \mathbf{O}_{r \times N} \\ \mathbf{O}_{N \times r} & Q_{0,\mu} \end{bmatrix}$$

which depends on Γ in general.

4.3.2 VAR parameters

Let S be the set of (Φ, Λ, Γ) such that the eigenvalues of A lie inside the unit circle, so that $\{y_t^*\}$ is CI(1,1). Write

(26) $$p(\Phi, P, \Lambda, \Gamma) \propto p^*(\Phi, P, \Lambda, \Gamma)\,[(\Phi, \Lambda, \Gamma) \in S]$$

where $p^*(.)$ is the pdf with no truncation of (Φ, Λ, Γ) and [.] is the indicator function that truncates the support of $p^*(.)$ to S.
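For each draw of Γ, the implied prior on ψ in Eq. (25) is straightforward to evaluate; a minimal sketch (ours):

```r
# Map the prior on (alpha, mu) into the prior mean and precision of psi = (beta', mu')'.
implied_prior_psi <- function(Gamma, alpha0, mu0, Q0_alpha, Q0_mu) {
  r <- ncol(Gamma); N <- nrow(Gamma)
  psi0 <- c(t(Gamma) %*% alpha0, mu0)
  Q0 <- matrix(0, r + N, r + N)
  Q0[1:r, 1:r] <- solve(t(Gamma) %*% solve(Q0_alpha) %*% Gamma)   # precision of beta | Gamma
  Q0[(r + 1):(r + N), (r + 1):(r + N)] <- Q0_mu
  list(psi0 = psi0, Q0 = Q0)
}
```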
Assume that

(27) $$p^*(\Phi, P, \Lambda, \Gamma) = p^*(\Phi, P)\, p^*(\Lambda, \Gamma)$$

Assume a hierarchical normal–Wishart prior on (Φ, P) such that

(28) $$\Phi | P, \nu \sim \mathrm{N}_{N \times pN}\left(M_0; P^{-1}, (\nu D_0)^{-1}\right)$$
(29) $$P \sim \mathrm{W}_N\left(k_0; S_0^{-1}\right)$$
(30) $$\nu \sim \mathrm{Gam}\left(\frac{A_0}{2}, \frac{B_0}{2}\right)$$

where ν is a hyperparameter that controls the tightness of the prior on the VAR coefficients, $\mathrm{W}_N(k_0; S_0^{-1})$ means 'an (N × N)-variate Wishart distribution with $k_0$ degrees of freedom and scale matrix $S_0^{-1}$', and $\mathrm{Gam}(A_0/2, B_0/2)$ means 'a gamma distribution with shape $A_0/2$ and rate $B_0/2$'. Since it is often difficult to choose ν a priori, we assume a gamma prior on ν.

4.3.3 Cointegrating space

Assume that

(31) $$p^*(\Lambda, \Gamma) = p^*(\Lambda)\, p^*(\Gamma)$$

Assume a normal prior on Λ such that

(32) $$\Lambda \sim \mathrm{N}_{N \times r}\left(\mathbf{O}_{N \times r}; (\eta_0 G_0)^{-1}, I_r\right)$$

where $\eta_0 G_0$ is the prior precision matrix of each column of Λ.

Let $V_r(\mathbb{R}^N)$ be the r-dimensional Stiefel manifold in $\mathbb{R}^N$, i.e.,

$$V_r(\mathbb{R}^N) := \{H \in \mathbb{R}^{N \times r} : H'H = I_r\}$$

Let $H_0 \in V_r(\mathbb{R}^N)$. Let H(.) be such that for all τ ≥ 0,

$$H(\tau) := H_0 H_0' + \tau H_{0\perp} H_{0\perp}'$$

where $H_{0\perp} \in V_{N-r}(\mathbb{R}^N)$ lies in the orthogonal complement of the column space of $H_0$, so that $H(0) = H_0 H_0'$ is of rank r and $H(1) = I_N$. Assume a normal prior on Γ such that

(33) $$\Gamma \sim \mathrm{N}_{N \times r}\left(\mathbf{O}_{N \times r}; H(\tau_0)^{-1}, I_r\right)$$

where $\tau_0 \neq 0$. This prior on Γ implies a prior on $\Gamma_*$ such that

(34) $$\Gamma_* \sim \mathrm{MACG}_{N \times r}\left(H(\tau_0)^{-1}\right)$$

where $\tau_0 := 1$ implies the flat prior on $V_r(\mathbb{R}^N)$; see Chikuse (2003, sec. 2.4.2). Note that $p^*(\Lambda_* | \Gamma)$ is normal, but $p^*(\Lambda_* | \Gamma_*)$ is not; see Koop, León-González, and Strachan (2010, p. 232), who use these priors on (Λ, Γ) for their parameter-augmented Gibbs sampler.
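A minimal sketch (ours, not the authors' code) of how one might draw from the prior in Eq. (33) and map the draw to the Stiefel manifold as in Eq. (34); here H0 is any N × r matrix with orthonormal columns, and with τ0 = 1 the construction collapses to i.i.d. standard normal entries, which corresponds to the flat MACG prior on the cointegrating space.

```r
# Draw Gamma ~ N(0; H(tau0)^{-1}, I_r) and return Gamma* = Gamma (Gamma'Gamma)^{-1/2}.
draw_Gamma_star <- function(N, r, H0, tau0 = 1) {
  H0perp <- qr.Q(qr(cbind(H0, matrix(rnorm(N * (N - r)), N))))[, (r + 1):N, drop = FALSE]
  H <- H0 %*% t(H0) + tau0 * H0perp %*% t(H0perp)                  # H(tau0)
  e <- eigen(solve(H))
  Uhalf <- e$vectors %*% diag(sqrt(e$values), N) %*% t(e$vectors)  # H(tau0)^{-1/2}
  Gamma <- Uhalf %*% matrix(rnorm(N * r), N, r)                    # matrix normal draw
  S <- t(Gamma) %*% Gamma; es <- eigen(S)
  Gamma %*% es$vectors %*% diag(1 / sqrt(es$values), r) %*% t(es$vectors)  # Gamma*
}
```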
4.4 Posterior simulation

We simulate p(ψ, Φ, P, Λ, Γ, ν | Y) by a Gibbs sampler consisting of five blocks:
1. Draw ψ from p(ψ | Φ, P, Λ, Γ, ν, Y) = p(ψ | Φ, P, Λ, Γ, Y).
2. Draw (Φ, P) from p*(Φ, P | ψ, Λ, Γ, ν, Y).
3. Draw Λ from p*(Λ | ψ, Φ, P, Γ, ν, Y) = p*(Λ | ψ, Φ, P, Γ, Y).
4. Draw Γ from p*(Γ | ψ, Φ, P, Λ, ν, Y) = p*(Γ | ψ, Φ, P, Λ, Y). Accept the draw if (Φ, Λ, Γ) ∈ S; otherwise go back to step 2 and draw another (Φ, P, Λ, Γ).
5. Draw ν from p(ν | ψ, Φ, P, Λ, Γ, Y) = p(ν | Φ, P).

The first block builds on Villani (2009); the second block is standard; the third and fourth blocks follow the parameter-augmented Gibbs sampler proposed by Koop, León-González, and Strachan (2010); the fifth block is standard. See the Appendix for the details of each block.

4.5 Bayes factor

We use the Bayes factor for Bayesian model selection. When choosing between nested models with certain priors, the Savage–Dickey (S–D) density ratio gives the Bayes factor without estimating the marginal likelihoods; see Wagenmakers et al. (2010) for a tutorial on the S–D method.

We choose the cointegrating rank r. Consider comparing the following two models (hypotheses):

(35) $$H_0 : \mathrm{rk}(\Pi(1)) = 0 \quad \text{vs} \quad H_r : \mathrm{rk}(\Pi(1)) = r$$

Koop, León-González, and Strachan (2008, pp. 451–452) note that the problem is the same as comparing the following two nested models:

(36) $$H_0 : \Lambda = \mathbf{O}_{N \times r} \quad \text{vs} \quad H_r : \Lambda \neq \mathbf{O}_{N \times r}$$

The restriction (Φ, Λ, Γ) ∈ S is necessary only for the B–N decomposition, and unnecessary for forecasting in general; hence we ignore this restriction for the moment, so that the priors on Λ and Γ become independent. (If the priors on Λ and Γ are dependent, then we can use the generalized S–D density ratio proposed by Verdinelli and Wasserman (1995).) Then the S–D density ratio for $H_0$ versus $H_r$ is

(37) $$B_{0,r} = \frac{p(\Lambda = \mathbf{O}_{N \times r} | Y; H_r)}{p(\Lambda = \mathbf{O}_{N \times r} | H_r)}$$

The prior gives the denominator directly. For the numerator, we have

(38) $$p(\Lambda | Y; H_r) = E(p(\Lambda | \psi, \Phi, P, \Gamma, Y; H_r) | Y; H_r)$$

Let $\{\psi_l, \Phi_l, P_l, \Gamma_l\}_{l=1}^L$ be posterior draws. Let

$$\hat{p}(\Lambda = \mathbf{O}_{N \times r} | Y; H_r) := \frac{1}{L} \sum_{l=1}^L p(\Lambda = \mathbf{O}_{N \times r} | \psi_l, \Phi_l, P_l, \Gamma_l, Y; H_r)$$

See Appendix A.4 for the conditional posterior p(Λ | ψ, Φ, P, Γ, Y). An estimator of the S–D density ratio for $H_0$ versus $H_r$ is

(39) $$\hat{B}_{0,r} = \frac{\hat{p}(\Lambda = \mathbf{O}_{N \times r} | Y; H_r)}{p(\Lambda = \mathbf{O}_{N \times r} | H_r)}$$
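Section 5.4 later converts these ratios into posterior rank probabilities under a flat prior on r; a minimal sketch (ours) of that conversion uses $p(H_r | Y) \propto p(Y | H_r) \propto 1/B_{0,r}$, with $B_{0,0} = 1$.

```r
# Posterior probabilities of the cointegrating rank under a flat prior on r,
# given estimated Savage-Dickey ratios B0r_hat for r = 1, ..., N - 1.
rank_posterior <- function(B0r_hat) {
  w <- c(1, 1 / B0r_hat)                         # proportional to p(Y | H_r), r = 0, 1, ...
  setNames(w / sum(w), paste0("r=", seq_along(w) - 1L))
}

rank_posterior(c(1e-3, 1e-12, 5))                # hypothetical ratios for r = 1, 2, 3
```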
5 Application

5.1 Data

We consider joint estimation of the natural rates (or their permanent components) and gaps of the following four macroeconomic variables in the US:

Output: Let $Y_t$ be output. Assume that $\{\ln Y_t\}$ is I(2), so that $\{\Delta \ln Y_t\}$ is I(1).

Inflation rate: Let $P_t$ be the price level and $\pi_t := \ln(P_t/P_{t-1})$ be the inflation rate. Assume that $\{\pi_t\}$ is I(1).

Interest rate: Let $I_t$ be the 3-month nominal interest rate (annual rate in per cent), $i_t := \ln(1 + I_t/400)$, $r_t := i_t - E_t(\pi_{t+1})$ be the ex ante real interest rate, and $\hat{r}_t := i_t - \pi_{t+1}$ be the ex post real interest rate. Assume that $\{r_t\}$ is I(1). (We can estimate the interest rate gap even if we observe $\{\hat{r}_t\}$ instead of $\{r_t\}$; see Murasawa (2014, pp. 499–500).)

Unemployment rate: Let $L_t$ be the labor force, $E_t$ be employment, and $U_t := -\ln(E_t/L_t)$ be the unemployment rate. Assume that $\{U_t\}$ is I(1).

Table 1 describes the data, which are available from FRED (Federal Reserve Economic Data). When monthly series are available, i.e., except for real GDP, we take the 3-month arithmetic means of monthly series each quarter to obtain quarterly series, from which we construct the quarterly inflation, interest, and unemployment rates as defined above.

Table 1: Data.

Variable  Description
$Y_t$     Real GDP (billions of chained 2012 dollars, SA, AR)
$P_t$     CPI for all urban consumers: all items (1982–84 = 100, SA)
$I_t$     3-month treasury bill: secondary market rate (%, AR)
$L_t$     Civilian labor force (thousands of persons, SA)
$E_t$     Civilian employment level (thousands of persons, SA)

SA means 'seasonally adjusted'; AR means 'annual rate'.

Let for all t,

$$x_t := \begin{pmatrix} \pi_t \\ \hat{r}_t \\ U_t \\ \ln Y_t \end{pmatrix}, \quad y_t := \begin{pmatrix} \pi_t \\ \hat{r}_t \\ U_t \\ \Delta \ln Y_t \end{pmatrix}$$

The sample period of $\{y_t\}$ is 1948Q1–2018Q4 (284 observations). Table 2 shows summary statistics of $\{y_t\}$ and $\{\Delta y_t\}$ multiplied by 100. Note that $\{\pi_t\}$, $\{\hat{r}_t\}$, and $\{\Delta \ln Y_t\}$ are quarterly rates (not annualized).

Table 2: Summary statistics.

Variable              Min.    1st qu.  Median  Mean   3rd qu.  Max.
$100\pi_t$            −2.32    0.39     0.75    0.84   1.10     3.98
$100\hat{r}_t$        −3.65   −0.24     0.23    0.19   0.55     2.69
$100U_t$               2.64    4.75     5.72    5.95   7.07    11.28
$100\Delta\ln Y_t$    −2.63    0.31     0.76    0.78   1.25     3.85
$100\Delta\pi_t$      −3.85   −0.26     0.00   −0.01   0.28     1.98
$100\Delta\hat{r}_t$  −1.95   −0.27     0.02    0.01   0.25     3.81
$100\Delta U_t$       −1.03   −0.20    −0.07    0.00   0.12     1.79
$100\Delta^2\ln Y_t$  −2.79   −0.71    −0.02    0.00   0.62     4.70
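A minimal sketch (ours, not the paper's code) of the variable construction above, assuming quarterly series already aggregated from FRED and stored in a data frame `d` with columns Y (real GDP), P (CPI), I (3-month rate, % per annum), L (labor force), and E (employment); the object and column names are ours.

```r
pi_  <- diff(log(d$P))                  # pi_t = ln(P_t / P_{t-1}), for t = 2, ..., T
i_   <- log(1 + d$I / 400)              # i_t = ln(1 + I_t / 400)
rhat <- head(i_, -1) - pi_              # ex post real rate, rhat_t = i_t - pi_{t+1}
U    <- -log(d$E / d$L)                 # U_t = -ln(E_t / L_t)
dlnY <- diff(log(d$Y))                  # Delta ln Y_t
# y_t = (pi_t, rhat_t, U_t, Delta ln Y_t)'; align the series on a common sample before stacking
```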
5.2 Preliminary analyses

As preliminary analyses, we check if $\{y_t\}$ is indeed I(1), though our method is valid even if some series in $\{y_t\}$ are I(0). For each I(0) series, we treat it as an I(1) series cointegrated with itself, and increase the cointegrating rank by 1 as 'pseudo' cointegration; see Fisher, Huh, and Pagan (2016, sec. 2.2).

Table 3 shows the results of the ADF and ADF-GLS tests for unit root, with or without constant and/or trend terms in the ADF regression. (We use gretl 2021a for our preliminary analyses.) The results depend on the number of lags included in the ADF regression, since the ADF test suffers from size distortion with short lags and low power with long lags. The ADF-GLS test mitigates the problem except when there is no constant nor trend term in the ADF regression, in which case the ADF test is asymptotically point optimal. The level 0.05 ADF-GLS test rejects $H_0 : \{y_{t,i}\} \sim \mathrm{I}(1)$ in favor of $H_1 : \{y_{t,i}\} \sim \mathrm{I}(0)$ for $\{\Delta \ln Y_t\}$ (and possibly for $\{\pi_t\}$, whose results are mixed). Hence these unit root tests do not support our assumption that $\{y_t\}$ is I(1).

Table 3: Unit root tests.

Variable            Const.  Trend  ADF: Lags  τ       p-value  ADF-GLS: Lags  τ
$\pi_t$             Yes     Yes    4          −3.73   0.02     4              −3.40∗∗
$\hat{r}_t$         Yes     Yes    7          −3.89   0.01     5              −2.70∗
$U_t$               Yes     Yes    13         −2.87   0.17     12             −2.19
$\Delta\ln Y_t$     Yes     Yes    11         −5.78   0.00     1              −8.30∗∗∗
$\pi_t$             Yes     No     4          −3.69   0.00     12             −0.75
$\hat{r}_t$         Yes     No     7          −3.90   0.00     5              −1.58
$U_t$               Yes     No     13         −3.01   0.03     12             −1.49
$\Delta\ln Y_t$     Yes     No     2          −8.30   0.00     1              −6.36∗∗∗
$\Delta\pi_t$       Yes     No     3          −11.88  0.00     0              −13.90∗∗∗
$\Delta\hat{r}_t$   Yes     No     8          −8.72   0.00     0              −21.61∗∗∗
$\Delta U_t$        Yes     No     12         −5.21   0.00     0              −7.47∗∗∗
$\Delta^2\ln Y_t$   Yes     No     14         −7.52   0.00     0              −25.16∗∗∗
$\Delta\pi_t$       No      No     3          −11.90  0.00
$\Delta\hat{r}_t$   No      No     8          −8.75   0.00
$\Delta U_t$        No      No     12         −5.22   0.00
$\Delta^2\ln Y_t$   No      No     14         −7.53   0.00

For the ADF-GLS test, ∗, ∗∗, and ∗∗∗ denote significance at the 10%, 5%, and 1% levels, respectively. For the number of lags included in the ADF regression, we use the default choice in gretl 2021a with maximum 15, where the lag order selection criteria are AIC for the ADF test, and a modified AIC using the Perron and Qu (2007) method for the ADF-GLS test. With no constant nor trend in the ADF regression, the ADF test is asymptotically point optimal; hence the ADF-GLS test is unnecessary.

Table 4 shows the results of the KPSS stationarity tests, with or without a trend term. The results depend on the lag truncation parameter for the Newey–West estimator of the long-run error variance. With a trend term, the level 0.05 KPSS test rejects $H_0 : \{y_{t,i}\} \sim \mathrm{I}(0)$ in favor of $H_1 : \{y_{t,i}\} \sim \mathrm{I}(1)$ except for $\{\Delta \ln Y_t\}$. With no trend term, the test rejects $H_0 : \{y_{t,i}\} \sim \mathrm{I}(0)$ in favor of $H_1 : \{y_{t,i}\} \sim \mathrm{I}(1)$ except for $\{\hat{r}_t\}$. Though the results for $\{\hat{r}_t\}$ and $\{\Delta \ln Y_t\}$ are mixed, these stationarity tests support our assumption that $\{y_t\}$ is I(1).

Table 4: KPSS stationarity tests.

Variable            Trend  LM
$\pi_t$             Yes    0.53∗∗∗
$\hat{r}_t$         Yes    0.32∗∗∗
$U_t$               Yes    0.23∗∗∗
$\Delta\ln Y_t$     Yes    0.03
$\pi_t$             No     0.58∗∗
$\hat{r}_t$         No     0.34
$U_t$               No     0.63∗∗
$\Delta\ln Y_t$     No     0.48∗∗
$\Delta\pi_t$       No     0.03
$\Delta\hat{r}_t$   No     0.04
$\Delta U_t$        No     0.07
$\Delta^2\ln Y_t$   No     0.01

∗∗ and ∗∗∗ denote significance at the 5% and 1% levels, respectively. The lag truncation parameter for the Newey–West estimator of the long-run error variance is 5 (the default value for our sample length in gretl 2021a).

Overall, these unit root and stationarity tests confirm that $\{\Delta y_t\}$ is I(0), but are inconclusive on whether each component of $\{y_t\}$ is I(1). We can still proceed with our assumption that $\{y_t\}$ is I(1), allowing for 'pseudo' cointegration.

As preliminary analyses, we also check if $\{y_t\}$ is CI(1,1). Table 5 shows the results of the Engle–Granger cointegration tests, with or without a trend term in the cointegrating regression and for each choice of the left-hand side (LHS) variable. The results depend on the number of lags included in the ADF regression for the residual series, but the level 0.05 test rejects the null of no cointegration in favor of the alternative of cointegration only when we put $\Delta \ln Y_t$ on the LHS of the cointegrating regression.

Table 5: Engle–Granger cointegration tests.

LHS               Trend  τ      p-Value
$\pi_t$           Yes    −3.79  0.21
$\hat{r}_t$       Yes    −3.58  0.30
$U_t$             Yes    −3.28  0.45
$\Delta\ln Y_t$   Yes    −8.33  0.00
$\pi_t$           No     −3.60  0.16
$\hat{r}_t$       No     −3.57  0.17
$U_t$             No     −3.19  0.32
$\Delta\ln Y_t$   No     −7.85  0.00

The number of lags included in the ADF regression is 4 (the default value for our sample length in gretl 2021a).
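The paper runs these preliminary tests in gretl 2021a; for readers working in R, a rough analogue (ours, not the paper's code) using the urca package might look as follows, where `y` is a T × 4 matrix with columns for the inflation, real interest, unemployment, and output growth rates (column names ours). The deterministic-term options shown approximate, but may not exactly match, the gretl specifications used in Tables 3, 4, and 6.

```r
library(urca)

summary(ur.df(y[, "dlnY"], type = "trend", lags = 15, selectlags = "AIC"))   # ADF
summary(ur.ers(y[, "dlnY"], type = "DF-GLS", model = "trend", lag.max = 15)) # ADF-GLS
summary(ur.kpss(y[, "dlnY"], type = "tau"))                                  # KPSS with trend

# Johansen's trace and maximum-eigenvalue tests with a trend restricted to the
# cointegrating relation, 4 lags in the VECM
summary(ca.jo(y, type = "trace", ecdet = "trend", K = 4, spec = "transitory"))
summary(ca.jo(y, type = "eigen", ecdet = "trend", K = 4, spec = "transitory"))
```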
Table 6 shows the results of Johansen's cointegration tests, with unrestricted constant and restricted trend terms in the VECM. The results depend on the number of lags included in the VECM, but overall, the tests suggest that the cointegrating rank is 2, possibly including 'pseudo' cointegration.

Table 6: Johansen's cointegration tests (unrestricted constant and restricted trend in the VECM).

Rank  Trace   p-Value  λ-max  p-Value
0     142.77  0.00     83.00  0.00
1     59.78   0.00     41.81  0.00
2     17.97   0.35     12.00  0.43
3     5.97    0.47     5.97   0.48

The number of lags included in the VECM is 4 (the default value for our sample length in gretl 2021a).

Since the results of these cointegration tests are mixed, we estimate VECMs with different cointegrating ranks, and use the Bayes factor to choose the cointegrating rank.

5.3 Model specification

For our data, N ≔ 4. To select p, we fit VAR models to $\{y_t\}$ up to VAR(8), i.e., p = 7, and check model selection criteria. The common estimation period is 1950Q1–2018Q4. Table 7 summarizes the results of lag order selection. (The standard lag order selection criteria are valid even when $\{y_t\}$ is I(1); see Kilian and Lütkepohl (2017, pp. 99–100).) The level 0.05 LR test fails to reject $H_0 : \{y_t\} \sim \mathrm{VAR}(7)$ against $H_1 : \{y_t\} \sim \mathrm{VAR}(8)$ and AIC selects VAR(7), but BIC and HQC select much smaller models. Since a high-order VAR model covers low-order VAR models as special cases, to be conservative, we choose a VAR(8) model for $\{y_t\}$, i.e., p = 7, and impose a shrinkage prior on the VAR coefficients in the VECM.

Table 7: Lag order selection.

Lag  Log-lik  LR      p-Value  AIC      BIC      HQC
1    4400.07                   −31.71   −31.40   −31.58
2    4752.06  703.98  0.00     −34.15   −33.62∗  −33.93
3    4782.56  61.01   0.00     −34.25   −33.52   −33.96∗
4    4800.35  35.58   0.00     −34.26   −33.32   −33.88
5    4814.33  27.96   0.03     −34.25   −33.09   −33.79
6    4834.26  39.87   0.00     −34.28   −32.91   −33.73
7    4855.60  42.67   0.00     −34.32∗  −32.74   −33.68
8    4868.15  25.10   0.07     −34.29   −32.51   −33.58

For AIC, BIC, and HQC, ∗ denotes the selected model. The LR test statistic for testing $H_0 : \{y_t\} \sim \mathrm{VAR}(p-1)$ versus $H_1 : \{y_t\} \sim \mathrm{VAR}(p)$ follows χ²(16) under $H_0$.

For the prior on α, we set $\alpha_0 := \mathbf{0}_N$ and $Q_{0,\alpha} := I_N$. For the prior on μ, we set $\mu_0 := \hat{\mu}$, where $\hat{\mu}$ is the sample mean of $\{\Delta y_t\}$, and $Q_{0,\mu} := I_N$. Following Kadiyala and Karlsson (1997) and Bańbura, Giannone, and Reichlin (2010), we set

$$M_0 := \mathbf{O}_{N \times pN}, \quad D_0 := \mathrm{diag}(1, \dots, p)^2 \otimes \mathrm{diag}(s_1, \dots, s_N)^2, \quad k_0 := N + 2, \quad S_0 := (k_0 - N - 1)\,\mathrm{diag}(s_1, \dots, s_N)^2$$

where for i = 1, …, N, $s_i^2$ is an estimate of $\mathrm{var}(u_{t,i})$ based on the univariate AR(p + 1) model with constant and trend terms for $\{y_{t,i}\}$.

For the prior on Λ, we set $\eta_0 := 1$ and $G_0 := I_N$. For the prior on Γ, we set $\tau_0 := 1$, which gives the flat prior on the cointegrating space. For the prior on ν, we set $A_0 := 1$ and $B_0 := 1$, i.e., ν ∼ χ²(1); hence the tightness hyperparameter on the VAR coefficients tends to be small, implying potentially mild shrinkage toward $M_0 := \mathbf{O}_{N \times pN}$. Overall, our priors are weakly informative in the sense of Gelman et al. (2014, p. 55).
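A minimal sketch (ours, not the paper's code) of these hyperparameter choices: $s_i^2$ comes from a univariate AR(p + 1) regression with constant and trend for each series, and $D_0$ and $S_0$ are built from the $s_i^2$ as above; `y` is a T × N matrix of the four series (names ours).

```r
p <- 7; N <- ncol(y)
s2 <- apply(y, 2, function(yi) {
  X <- embed(yi, p + 2)                        # columns: y_t, y_{t-1}, ..., y_{t-p-1}
  fit <- lm(X[, 1] ~ X[, -1] + seq_len(nrow(X)))   # AR(p + 1) with constant and trend
  summary(fit)$sigma^2
})
M0 <- matrix(0, N, p * N)
D0 <- diag((1:p)^2) %x% diag(s2)               # diag(1, ..., p)^2 (Kronecker) diag(s_1, ..., s_N)^2
k0 <- N + 2
S0 <- (k0 - N - 1) * diag(s2)
```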
5.4 Bayesian computation

We run our Gibbs sampler on R 4.0.5 developed by R Core Team (2021). We use the ML estimate of (Φ, P, Λ, Γ) for their initial values; the urca package for R is useful for ML estimation of a VECM. With poor initial values, the restriction that the eigenvalues of A lie inside the unit circle does not hold, and the iteration cannot start. Hence the choice of initial values seems important when one applies the B–N decomposition. Once the iteration starts, the restriction rarely binds for our sample.

To check convergence of the Markov chain generated by our Gibbs sampler to its stationary distribution, we perform convergence diagnoses discussed in Robert and Casella (2009, ch. 8) and available in the coda package for R. Given the diagnoses, we discard the initial 1000 draws, and use the next 4000 draws for our posterior inference. At significance level 0.05, Geweke's Z-score rejects equality of the means of the first 10% and the last 50% of the posterior sample (the default choice in the coda package) for 6 out of 141 identified parameters (applying linear normalization to Γ). Since the rejection rate is close to the significance level, 1000 draws are enough as burn-in. The minimum effective sample size among the 141 identified parameters is 238. Gelman et al. (2014, p. 267) write, '…100 independent draws are enough for reasonable posterior summaries.' Hence 4000 draws are enough for posterior inference.

To select the cointegrating rank, we set p ≔ 7, assume the above priors, and compute the S–D density ratios for r = 0 versus r = 1, 2, 3, respectively, from which we derive the posterior probabilities of r = 0, 1, 2, 3, assuming the flat prior on r. We find that the posterior probability of r = 2 is often numerically 1, which is consistent with the results of Johansen's cointegration tests. The S–D density ratio may be inaccurate if the marginal posterior pdf of Λ at $\mathbf{O}_{N \times r}$ is extremely small. In such a case, one should check the result by running the Gibbs sampler multiple times. Wagenmakers et al. (2010, p. 182) write, 'the qualitative conclusion is evident and reliable … even if the quantitative result is not.' Thus we set r ≔ 2 in the following analysis.

5.5 Empirical results

Figure 2 plots the actual rates and our point estimates (posterior medians) of the natural rates (or their permanent components) of the four variables. For ease of comparison of the actual and natural rates, we omit error bands for the natural rates, which are identical to those for the gaps. Figure 3 plots our point estimates of the gaps and their 95% error bands. Since a VECM has both the level and difference series, to avoid confusion in estimation, we do not annualize quarterly rates. To compare our results with previous works that use annualized quarterly rates, one must multiply our inflation and interest rates by 4 or 400.

Figure 2: Actual rates (solid red) and the posterior medians of the natural rates (dashed green). The shaded areas are the NBER recessions.

Figure 3: Posterior medians of the gaps and their 95% error bands (posterior 0.025- and 0.975-quantiles). The shaded areas are the NBER recessions.

Our estimate of the natural rate of inflation is smoother than typical univariate estimates of trend inflation in the CPI; e.g., Faust and Wright (2013, p. 22). Our estimate looks close to a recent estimate of trend inflation in the US CPI in Morley, Piger, and Rasche (2015, p. 894) based on a bivariate UC model for the inflation and unemployment rates, which assumes independent shocks to the trend and gap components and allows for structural breaks in the variances of these shocks. Our estimate is more volatile, however, than a recent estimate of trend inflation in the PCE price index by Chan, Clark, and Koop (2018, p. 21) that uses information in survey inflation expectations.
5.5 Empirical results

Figure 2 plots the actual rates and our point estimates (posterior medians) of the natural rates (or their permanent components) of the four variables. For ease of comparison of the actual and natural rates, we omit error bands for the natural rates, which are identical to those for the gaps. Figure 3 plots our point estimates of the gaps and their 95% error bands. Since a VECM has both the level and difference series, to avoid confusion in estimation, we do not annualize quarterly rates. To compare our results with previous works that use annualized quarterly rates, one must multiply our inflation and interest rates by 4 or 400.

Figure 2: Actual rates (solid red) and the posterior medians of the natural rates (dashed green). The shaded areas are the NBER recessions.

Figure 3: Posterior medians of the gaps and their 95% error bands (posterior 0.025- and 0.975-quantiles). The shaded areas are the NBER recessions.

Our estimate of the natural rate of inflation is smoother than typical univariate estimates of trend inflation in the CPI; e.g., Faust and Wright (2013, p. 22). Our estimate looks close to a recent estimate of trend inflation in the US CPI by Morley, Piger, and Rasche (2015, p. 894) based on a bivariate UC model for the inflation and unemployment rates, which assumes independent shocks to the trend and gap components and allows for structural breaks in the variances of these shocks. Our estimate is more volatile, however, than a recent estimate of trend inflation in the PCE price index by Chan, Clark, and Koop (2018, p. 21) that uses information in survey inflation expectations. Hwu and Kim (2019) estimate trend inflation in the PCE price index using a univariate UC model with correlated shocks to the trend and gap components, Markov-switching gap persistence, and Markov-switching association between inflation and inflation uncertainty. Without the last feature, their estimate looks close to ours; see Hwu and Kim (2019, p. 2314).

Our estimate of the natural rate of interest is more volatile than a recent estimate by Del Negro et al. (2017, p. 237) based on a VAR model with common trends (or a multivariate UC model with independent shocks to the trend and gap components and a factor structure for the trend components) for short- and long-term interest rates, inflation and its survey expectations, and some other variables. Their estimate is smooth partly because they impose tight priors on the variances of the shocks to the trend components. With the ‘loosest possible prior’, their estimate is as volatile as ours; see Del Negro et al. (2017, p. 272). Their ‘loosest possible prior’ on the variance–covariance matrix of the shocks to the trend components is an inverse Wishart prior with the minimum degrees of freedom that give a finite mean; see Del Negro et al. (2017, p. 271). Despite different definitions of the natural rate, their estimate based on a DSGE model looks close to ours; see Del Negro et al. (2017, p. 237). Del Negro et al. (2017) clearly distinguish the natural rate and its low-frequency component, estimating the former by a DSGE model and the latter by a VAR model. To construct the real interest rate, they use the PCE inflation rate instead of the CPI inflation rate; hence their real interest rate is on average slightly higher than ours.

Estimates by Laubach and Williams (2016, p. 60), Holston, Laubach, and Williams (2017, p. S61), and Lewis and Vazquez-Grande (2019) are close to trend output growth by construction; hence their estimates are quite different from those by Del Negro et al. (2017) and ours, especially before 1980. Holston, Laubach, and Williams (2017, p. S63) write,

…we assume a one-for-one relationship between the trend growth rate of output and the natural rate of interest, which corresponds to assuming σ = 1 in Eq. (1).

where Eq. (1) in their paper is the consumption Euler equation in a steady state.

Our estimate of the natural rate of unemployment looks more volatile than a recent estimate by Morley, Piger, and Rasche (2015, p. 898), obtained as a by-product of estimation of trend inflation. A possible reason for the difference is that they assume independent shocks to the trend and gap components, which may not hold in practice. The two estimates may coincide if one allows for dependence between the shocks, as Morley, Nelson, and Zivot (2003) show for the univariate trend-cycle decomposition. Despite the difference in volatility, our estimate of the unemployment rate gap looks close to that by Morley, Piger, and Rasche (2015, p. 901) in terms of the sign and magnitude. Recent estimates of the natural rate and gap by Crump et al. (2019, pp. 176, 180) are as smooth as those by Morley, Piger, and Rasche (2015), perhaps partly because they also assume independent shocks to the trend and gap components. Crump et al. (2019) distinguish the trend unemployment rate ū and the natural rate of unemployment u*.
They construct ū from estimates of the trend job separation and finding rates based on a multivariate UC model with a factor structure for the rates across demographic groups, assuming independent shocks to the trend and gap components. Treating ū as exogenous and relying on a New Keynesian (wage) Phillips curve, they estimate u* using the CPI, survey inflation expectations, and various wage data, again assuming independent shocks to the natural rate and gap. Allowing for dependence between these shocks seems an interesting and important topic for future work.

Our estimate of the output gap is at most about ±5% of the output level, and looks close to a recent estimate by Morley and Wong (2020, p. 9) based on a large Bayesian VAR(4) model with 23 variables. Our estimate also looks close to their estimate based on a Bayesian VAR(4) model with four variables, i.e., output growth, the unemployment rate, the CPI inflation rate, and the growth rate of industrial production, assuming that they are all I(0); see Morley and Wong (2020, p. 8). Though output growth may or may not be I(0) in the US, it may be clearly I(1) in some countries or regions, in which case their method may give an unreasonable estimate of the output gap with a strong upward or downward trend, as Murasawa (2015) shows for the Japanese data.

Figure 4 plots the posterior probability of a positive gap for the four variables. This probability index is useful if the sign of the gap is of interest: even if the 95% error band for the gap covers 0, the posterior probability of a positive gap may be close to 0.025 or 0.975. Indeed, the probability index is often below 0.25 or above 0.75; hence we are often quite sure about the sign of the gap. Moreover, Figure 4 shows the relation between the gaps more clearly than Figure 3.

Figure 4: Posterior probability of a positive gap. The shaded areas are the NBER recessions.

Figure 5 shows the relation between the gaps more directly. The left panels are the scatter plots of the posterior medians of the gaps in each quarter. The right panels are the posterior pdfs of the correlation coefficients between the gaps. We see that the Phillips curves and Okun’s law hold between the gaps, though we do not impose such relations. Thus our estimates of the gaps seem mutually consistent from a macroeconomic point of view.

Figure 5: Phillips curves and Okun’s law. The left panels are the scatter plots of the posterior medians of the gaps in each quarter. The right panels are the posterior pdfs of the correlation coefficients between the gaps.

Figure 6 shows the relation between the trend output growth rate Δln Yt+1* and the natural rate of interest rt*, i.e., the dynamic IS curve; see Eq. (1). The left panel is the scatter plot of the posterior medians of Δln Yt+1* and rt* in each quarter. The right panel is the posterior pdf of the slope coefficient, i.e., 1/σ in Eq. (1), obtained by regressing Δln Yt+1* on rt* for each posterior draw of {Δln Yt*, rt*}. On average, the slope is positive as expected, but the evidence is weak.
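The per-draw regression just described is straightforward to code. The sketch below is our illustration rather than the authors' code; it assumes that dlnY_star and r_star are (draws × T) matrices of posterior draws of Δln Yt* and rt* on a common quarterly grid.

# Minimal sketch (ours) of the slope computation described above.
# Assumptions: `dlnY_star` and `r_star` are (draws x T) matrices of posterior
# draws of the trend output growth rate and the natural rate of interest.
nT    <- ncol(dlnY_star)
slope <- sapply(seq_len(nrow(dlnY_star)), function(s) {
  y <- dlnY_star[s, 2:nT]     # Delta ln Y*_{t+1}
  x <- r_star[s, 1:(nT - 1)]  # r*_t
  coef(lm(y ~ x))[2]          # slope coefficient, i.e., 1/sigma in Eq. (1)
})
plot(density(slope))          # posterior pdf of the slope coefficient
mean(slope > 0)               # posterior probability of a positive slope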
The result is similar to Hamilton et al. (2016) and Lunsford and West (2019), who study the determinants of rt* and find positive but low and insignificant long-run correlations between Δln Yt and rt in the postwar US.

Figure 6: Dynamic IS curve. The left panel is the scatter plot of the posterior medians of Δln Yt+1* and rt* in each quarter. The right panel is the posterior pdf of the slope coefficient obtained by regressing Δln Yt+1* on rt* for each posterior draw of {Δln Yt*, rt*}.

5.6 Comparison of alternative model specifications

Figure 7 compares point estimates of the gaps under three alternative assumptions, i.e., I(1) log output, I(2) log output with no cointegration, and I(2) log output with cointegration. For the first two cases with no cointegration, we assume VAR(7) models for the stationarized series, and use the same prior and Gibbs sampler as those for our VECM (except for the loading and cointegrating matrices). For ease of comparison, we omit error bands here.

Figure 7: Posterior medians of the gaps assuming I(1) log output (solid red), I(2) log output with no cointegration (dashed green), and I(2) log output with cointegration (long-dash blue). The shaded areas are the NBER recessions.

The result assuming I(1) log output is similar to that in Murasawa (2014, p. 513), who assumes a VAR(12) model with no constant term for the centered stationarized series, and chooses the tightness hyperparameter by the empirical Bayes method. In contrast to the result for Japan in Figure 1, for the US, the multivariate B–N decomposition assuming I(1) log output gives a ‘reasonable’ estimate of the output gap that fluctuates around 0. However, the output gap is persistently positive since 2010, which may not be sensible.

Assuming I(2) log output changes the estimate of the output gap, but it hardly changes the estimates of the other gaps, similar to the result for Japan in Murasawa (2015). This makes sense because the B–N transitory components are ‘forecastable movements’, and whether log output is I(1) or I(2), the two VAR models use the information in log output for forecasting the other variables in almost the same way. A VAR(p) model with {Δ²ln Yt} includes Δ²ln Yt−p ≔ Δln Yt−p − Δln Yt−p−1 as a predictor and hence uses the information in Δln Yt−p−1, but a VAR(p) model with {Δln Yt} does not; if p is large enough, however, the coefficient on Δln Yt−p−1 tends to be small, especially with our shrinkage prior on the VAR coefficients. The output gap now keeps fluctuating around 0 since 2010, perhaps because I(2) log output captures a possible structural break in the mean output growth rate after the 2008 Great Recession.

Allowing for cointegration gives much bigger estimates of the gaps for all variables. This follows partly because a VECM uses more predictors, i.e., the error correction terms, than a VAR model; see Evans and Reichlin (1994). However, the result also implies that for all variables, the coefficients on the error correction terms are large enough, if not significantly different from 0, to yield such different estimates of the gaps. These bigger estimates of the gaps, or the associated estimates of the natural rates, are often close to other recent estimates that focus on a particular natural rate or gap, as already noted.
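To make the ‘forecastable movements’ interpretation concrete, recall the standard formula for the I(1) VAR case; this is our addition for illustration, and the state vector of our VECM is more involved. If the demeaned stationarized series has the stable companion form x_t = A x_{t-1} + e_t, then the B–N transitory component is

\[ c_t = y_t - y_t^{\ast} = -\sum_{j=1}^{\infty} \mathrm{E}_t[\Delta y_{t+j} - \mu] = -J A (I - A)^{-1} x_t, \]

where J selects the first N rows of x_t. Better forecasts of future growth, e.g., from the error correction terms, thus translate into larger forecastable movements and hence bigger transitory components, consistent with Evans and Reichlin (1994).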
6 Discussion

The dynamic IS equation implies that if the real interest rate is I(1), then so is the output growth rate, with possible cointegration, and log output is I(2). We extend the multivariate B–N decomposition to such a case, and apply Bayesian analysis to obtain error bands for the components. To implement this idea, we consider a steady state VECM with a hierarchical prior on the VAR coefficients, and develop a Gibbs sampler for posterior simulation of the parameters and the components. Applying the method to US data with weakly informative priors, we obtain a reasonable joint estimate of the natural rates (or their permanent components) and gaps of output, inflation, interest, and unemployment.

The B–N decomposition assuming I(1) log output may give a trending estimate of the output gap if the mean output growth rate has structural breaks. Assuming I(2) log output, and hence an I(1) output growth rate, we introduce a stochastic trend in the output growth rate, which captures possible structural breaks in the mean output growth rate. Thus the B–N decomposition assuming I(2) log output gives a reasonable estimate of the output gap that fluctuates around 0. Since the B–N transitory components are ‘forecastable movements’, and a VECM forecasts no worse than a VAR model, allowing for cointegration gives larger estimates of the gaps for all variables. Since we can treat an I(0) series as an I(1) series cointegrated with itself, our method is valid even if the output growth rate is I(0), or log output is I(1).

The B–N decomposition has some advantages over other approaches:

1. The B–N decomposition relies on a reduced-form forecasting model; hence it requires fewer assumptions than the decomposition based on a structural model.

2. The B–N decomposition requires no state smoothing if the state vector is observable, as in our VECM; hence it can handle a large state vector more easily than the decomposition based on a UC model, which involves latent variables.

Thus the multivariate B–N decomposition is useful for estimating the natural rates and gaps of many variables jointly. Most, if not all, previous works focus on the natural rate or gap of one variable, using other variables only to improve estimation. We estimate the natural rates and gaps of four variables, which are equally of interest.

Since a reduced-form VECM is the most basic forecasting model for cointegrated series, our method gives a benchmark joint estimate of the natural rates and gaps of major macroeconomic variables. One can compare this benchmark estimate with alternative estimates based on other forecasting models or DSGE models such as those in Del Negro et al. (2017). Our method is useful especially for non-US data, where log output is often clearly I(2). Application to non-US data is an interesting and important topic for future work.

One can possibly refine our estimate in two ways. First, one can use a larger VECM. Second, one can introduce nonlinearity into our VECM, e.g., Markov-switching, stochastic volatility, or more general time-varying parameters. Since the B–N decomposition may not apply to nonlinear models, and nonlinear models may become unnecessary with more variables, the first direction seems more promising.
The following four variables are of particular interest:

1. the PCE price index in addition to the CPI, which makes our results comparable to various previous works that use either price index;

2. the long-run (LR) interest rate in addition to the short-run (SR) interest rate, which gives an estimate of the trend term premium, or the natural yield curve; cf. Brzoza-Brzezina and Kotłowski (2014) and Imakubo, Kojima, and Nakajima (2018);

3. the nominal yields on Aaa and Baa corporate bonds, which give estimates of the trend liquidity premium and the trend safety premium, or the trend liquidity and safety convenience yields; cf. Del Negro et al. (2017).

Our VECM allows for cointegration among I(1) series, but does not allow for cointegration among I(2) series nor multicointegration (polynomial cointegration). Such extensions are useful if we have multiple I(2) series, e.g., log output and the price level (instead of the inflation rate). We leave such extensions for future work.

Lastly, to obtain a monthly instead of quarterly joint estimate of the natural rates and gaps of output and other variables, it seems straightforward to extend the B–N decomposition of mixed-frequency series proposed by Murasawa (2016) to I(1) and I(2) series with cointegration.

Journal: Studies in Nonlinear Dynamics & Econometrics (de Gruyter)

Published: Jun 1, 2022

Keywords: natural rate; output gap; trend-cycle decomposition; trend inflation; unit root; vector error correction model (VECM)
