Author: Tremayne, A. R.

Summary

Many macroeconomic time series exhibit non‐stationary behaviour. When modelling such series an important problem is to assess the nature of this non‐stationary behaviour. Initial interest centred on two types of linear non‐stationary models, namely those for which the removal of a trend induces stationarity and those for which taking the first difference produces a stationary series. The latter are referred to as unit root models. More recently, other models such as state space models have proved popular. The paper suggests a technique of exploratory data analysis that helps to shed light on the two types of linear non‐stationarity. It is a Bayesian estimative procedure, generally using the exact likelihood. We advocate a contour plot of the joint posterior density of interest, rather than a (possibly large) sample from this density such as could be obtained by a Markov chain Monte Carlo approach. We propose a useful graphical template that can be gainfully employed at the initial stages of data investigation. It also indicates clearly when traditional difference/trend stationary models should not be considered further for the data. Application of this graphical device to artificial series and real data provides insight into inadequacies of more usual conditional forms of analysis where different types of non‐stationarity are considered. Exemplars include cases where the bivariate plot indicates non‐stationary, and possibly non‐linear, data generating mechanisms that may not conventionally occur to the empirical modeller.

1. Introduction

It has long been appreciated that many macroeconomic time series are non‐stationary. Since the seminal work of Nelson and Plosser (1982) a huge literature has burgeoned on the topic. Often, researchers seek to characterize the dynamic properties of their series by determining whether a stochastic or a deterministic trend model is suitable.
But many other models are also available; see, for example, Koop and van Dijk (2000) for an interesting use of state space models. In this paper we develop and illustrate an exploratory data analysis tool that can be gainfully used before proceeding to more formal modelling and hypothesis testing. For many series, our methodology can provide early evidence on whether it is reasonable to scrutinize stochastic or deterministic trend models closely. It also often provides clear evidence when these models appear to be misspecified for the data at hand. The approach developed is Bayesian, uses proper priors and is estimative in nature. Maddala and Kim (1998, Ch. 8) provide a careful summary of the literature on Bayesian analysis of stochastic trends. We use a structural form discussed by Lubrano (1995) (among others) and develop a graphical approach based on the marginal joint posterior density of the unit root and deterministic trend parameters. We explicitly allow for the possibility of explosive non‐stationary behaviour. By considering the graph of this joint posterior, it is often possible to obtain quite firm indications about the issues raised in the first paragraph. The model employed, and its associated likelihood, are discussed in Section 2. Section 3 presents the Bayesian analysis and a discussion of the graphical approach, together with illustrations based on simulated data from known data generating mechanisms. In Section 4 we illustrate the approach in the context of the familiar data set first discussed by Nelson and Plosser (1982), henceforth NP.

2. The Model

2.1. Model specification

The basic model we consider derives from the work of Bhargava (1986) (see also Davidson and MacKinnon (1993)). As extended by Lubrano (1995) and Geweke (1994), it can be written as

$$y_t = \mu + \delta t + u_t, \qquad u_t = \rho u_{t-1} + \sum_{i=1}^{p-1} \phi_i \Delta u_{t-i} + \varepsilon_t. \qquad (2.1)$$

This is a structural form of the type investigated by Schotman and van Dijk (1991).
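Because the structural form is written in terms of the unobserved $u_t$, it is worth confirming the algebra that links it to an equation in the observed $y_t$. Substituting $u_t = y_t - \mu - \delta t$ (so that $\Delta u_t = \Delta y_t - \delta$) gives $\Delta y_t - \delta = (\rho - 1)\{y_{t-1} - \mu - \delta(t-1)\} + \sum_{i=1}^{p-1}\phi_i(\Delta y_{t-i} - \delta) + \varepsilon_t$. This equation is non-linear in the parameters through products such as $(\rho-1)\delta$ and $\phi_i\delta$. The sketch below simulates the structural form and checks that the residuals of this equation reproduce the simulated innovations. It sets the pre-sample values to zero, which is an assumption of this illustration only. The function names are hypothetical.

```python
import numpy as np


def simulate_structural(T, mu, delta, rho, phi, sigma, seed=0):
    """Simulate y_t = mu + delta*t + u_t,
    u_t = rho*u_{t-1} + sum_i phi_i*(u_{t-i} - u_{t-i-1}) + eps_t,
    with the p pre-sample values of u set to zero (illustrative assumption)."""
    phi = np.asarray(phi, dtype=float)
    p = phi.size + 1
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=T)
    u = np.zeros(T + p)                 # u[0..p-1] hold u_{1-p}, ..., u_0
    for k in range(p, T + p):           # u[k] is u_t with t = k - p + 1
        du_lags = np.array([u[k - i] - u[k - i - 1] for i in range(1, p)])
        u[k] = rho * u[k - 1] + phi @ du_lags + eps[k - p]
    t = np.arange(1, T + 1)
    return mu + delta * t + u[p:], eps


def reduced_form_residuals(y, mu, delta, rho, phi):
    """Residuals of the reduced form for t = p+1, ..., T (observed data only)."""
    phi = np.asarray(phi, dtype=float)
    p = phi.size + 1
    dy = np.diff(y)                     # dy[s - 2] is Delta y_s, since y[s - 1] is y_s
    out = []
    for s in range(p + 1, y.size + 1):
        lagged_level = y[s - 2] - mu - delta * (s - 1)
        aug = sum(phi[i - 1] * (dy[s - 2 - i] - delta) for i in range(1, p))
        out.append(dy[s - 2] - delta - (rho - 1.0) * lagged_level - aug)
    return np.array(out)


y, eps = simulate_structural(T=200, mu=1.0, delta=0.01, rho=0.9,
                             phi=[0.3, -0.2], sigma=0.06)
resid = reduced_form_residuals(y, mu=1.0, delta=0.01, rho=0.9, phi=[0.3, -0.2])
print(np.max(np.abs(resid - eps[3:])))  # p = 3, so compare from t = 4 onwards
```

The printed discrepancy should be at the level of floating-point rounding. The same check works with ρ = 1 and an empty `phi`, which is the pure random walk case.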
We note in passing that (2.1) is similar to the model considered by Ahn (1993) in developing a Lagrange multiplier test; see also Schmidt and Phillips (1992). Since $u_t = y_t - \mu - \delta t$, and hence $\Delta u_t = \Delta y_t - \delta$, the reduced form of (2.1) is

$$\Delta y_t = \delta + (\rho - 1)\{y_{t-1} - \mu - \delta(t-1)\} + \sum_{i=1}^{p-1} \phi_i (\Delta y_{t-i} - \delta) + \varepsilon_t, \qquad (2.2)$$

which clearly involves non‐linearity in both ρ and δ. This model has also been used in a Bayesian context by Hoek (1997, Section 4.2). Lubrano (1995) simplifies this expression by replacing δ with an estimator in the terms $(\Delta y_{t-i} - \delta)$.

2.2. Specification of the likelihood

In order to obtain the likelihood for sample data $y = (y_1, \dots, y_T)$ we use the well known decomposition

$$p(y \mid \theta) = \prod_{t=1}^{T} p\bigl(y_t \mid y^{(t-1)}, \theta\bigr), \qquad (2.3)$$

where θ denotes the, as yet unspecified, vector of parameters and $y^{(j)}$ denotes the first j elements of y. It is clear from (2.2) that the first p terms on the right‐hand side of (2.3) will depend on unobserved pre‐sample values, which must be specified in some way. Naylor and Marriott (1996) have shown that the exact likelihood can be obtained by treating these unobserved values as additional parameters and then marginalizing the resulting likelihood by integration. Under normality, each term on the right‐hand side of (2.3) is the Gaussian density of the corresponding innovation $\varepsilon_t$ in (2.2),

$$p\bigl(y_t \mid y^{(t-1)}, \theta\bigr) = (2\pi\sigma^2)^{-1/2} \exp\{-\varepsilon_t^2 / (2\sigma^2)\}. \qquad (2.4)$$

The formulation of the problem does not constrain the $\varepsilon_t$ to be normally distributed. An alternative might be to model $\varepsilon_t/\sigma$ as Student's t with ν degrees of freedom, and then either to specify a value for ν or to include it as a parameter to be estimated. Such an approach may have fruitful application in empirical finance, where fat‐tailed distributions are commonplace. Given a joint prior for the parameters and the pre‐sample values, the joint posterior for ρ and δ can be obtained under the unconditional likelihood by integrating out the unobserved values and the other elements of the parameter vector. A different approach to inference is to use Markov chain Monte Carlo (MCMC) techniques to generate large samples from the posterior density.
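The Naylor and Marriott device can be seen at work in the simplest case. Take a stationary AR(1), $y_t = \rho y_{t-1} + \varepsilon_t$. Treating the single pre-sample value $y_0$ as a parameter with a $N(0, \sigma^2/(1-\rho^2))$ prior, and integrating it out of the conditional likelihood, recovers the exact Gaussian likelihood. The sketch below does that integration by Gauss–Hermite quadrature (NumPy's `hermgauss`) and compares the result with the exact multivariate normal log-density. The Gaussian prior on $y_0$ is chosen only because it makes the answer checkable. It is not the Student t prior adopted in this paper for the pre-sample values, and the function names are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def loglik_integrating_presample(y, rho, sigma, n_nodes=40):
    """AR(1) log-likelihood with the pre-sample value y0 integrated out.

    y0 ~ N(0, sigma^2 / (1 - rho^2)) is treated as an extra parameter and
    marginalized by Gauss-Hermite quadrature (requires |rho| < 1).
    """
    v0 = sigma ** 2 / (1.0 - rho ** 2)
    x, w = hermgauss(n_nodes)
    y0_nodes = np.sqrt(2.0 * v0) * x           # change of variables for a N(0, v0) weight
    # Only the t = 1 term of the prediction-error decomposition involves y0.
    resid1 = y[0] - rho * y0_nodes
    dens1 = np.exp(-0.5 * resid1 ** 2 / sigma ** 2) / np.sqrt(2.0 * np.pi * sigma ** 2)
    first = np.log(np.sum(w * dens1) / np.sqrt(np.pi))
    # Remaining terms t = 2..T condition on observed data only.
    resid = y[1:] - rho * y[:-1]
    rest = (-0.5 * np.sum(resid ** 2) / sigma ** 2
            - 0.5 * resid.size * np.log(2.0 * np.pi * sigma ** 2))
    return first + rest


def loglik_exact_stationary(y, rho, sigma):
    """Exact log-density of y under the stationary Gaussian AR(1)."""
    T = y.size
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    cov = sigma ** 2 / (1.0 - rho ** 2) * rho ** lags
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (T * np.log(2.0 * np.pi) + logdet + quad)


rng = np.random.default_rng(42)
rho, sigma, T = 0.7, 0.06, 50
y = np.empty(T)
prev = rng.normal(0.0, sigma / np.sqrt(1.0 - rho ** 2))   # stationary start
for t in range(T):
    prev = rho * prev + rng.normal(0.0, sigma)
    y[t] = prev

print(loglik_integrating_presample(y, rho, sigma), loglik_exact_stationary(y, rho, sigma))
```

The two printed values should agree to many decimal places. With several pre-sample values and a non-Gaussian prior the integral is no longer one-dimensional or closed-form, which is where the numerical integration strategy described next becomes necessary.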
In the context of (2.1), Hoek (1997) has adopted an MCMC methodology, though his treatment of the initial values does not lead to the exact likelihood in our sense. Hoek takes the first pre‐sample value, $u_0$, to be normally distributed and sets all other pre‐sample values to zero (Hoek 1997, p. 33). This has the effect of treating the single pre‐sample value $u_0$ as a parameter, with the specified normal distribution as its prior. In this paper we follow a numerical, or deterministic (Koop and van Dijk 2000), integration strategy in order to plot contours of the exact joint posterior of interest. Clearly, the MCMC sampling‐based approach alluded to above could be undertaken, but without a very intensive investigation, contour plots of the posterior density similar to those we obtain would be unattainable. Rather, a ‘cloud’ of sampled points would be obtained and the availability of the exact posterior density would not be utilized. To obtain contours that even approach the precision available to us, extremely large MCMC samples may be needed. In this case the exact joint posterior density is available, and we have chosen to employ it rather than attempt to take, and use, a sample from it. Koop and van Dijk (2000, Section 1) present a useful discussion of alternative computational procedures.

3. A Bayesian Approach

3.1. The priors

A Bayesian analysis of model (2.1) now follows Naylor and Marriott (1996), with the parameter vector augmented by the pre‐sample values and a prior of the form (3.5). We use proper densities that are not over‐informative but do, we feel, represent sensible choices. Of course, in any practical situation, problem‐specific expertise may lead to different priors, or non‐informative priors may be preferred. However, it is straightforward to adapt the methodology described below to handle such alternative specifications.
In the examples that follow, we have attempted to elicit what we believe are reasonable priors and, in the case of the important parameter ρ, we report the effects of varying the choice. Notwithstanding this, we emphasize that our choices in this paper are purely illustrative and that other researchers may be better able to motivate different priors. For μ and σ we follow Monahan (1983) and use the normal and inverse gamma distributions respectively. Monahan suggests values for the hyperparameters corresponding to various levels of information. The weakest of the priors given is described as ‘diffuse’, and this is the joint prior we take here, since these parameters are not central to our proposal. Though any choice of E(μ | σ) is possible, in the absence of explicit information, which is rarely likely to be available, it seems reasonable to set it to zero; this approach is adopted in all the examples of this paper. In seeking to elicit a proper prior for δ we recognize that an econometrician who believed that the data generating process was (2.1) might not want to attach much prior weight to values of δ larger than, say, 0.1 in absolute value. However, our intention is to use the graphical device described in Section 3.2 as an exploratory tool and, as we show in Section 3.4, large values of δ can arise when the data are generated from models other than (2.1). The prior for δ is therefore taken as zero‐mean normal with unit variance, which provides a proper prior that is not too informative when model (2.1) is appropriate, yet allows the exploratory device the freedom to indicate some situations for which (2.1) is not appropriate. A N(0, 1) prior is specified for $\phi_j$, j = 1, 2, …, p − 1, offering a proper, but not very informative, prior centred on zero.
The prior mean therefore corresponds to $u_t$ in (2.1) being an AR(1) process (discussed in detail below), but the prior also puts substantial weight on alternatives with large correlations (negative or positive) at lags greater than one. Viewing the history purely as parameters, and regarding the process as having started at time t = 1 with this set of ‘initial values’, gives us freedom of choice as to which aspects of the eventual model we reflect in the prior. It is well known that the second moments of the marginal Gaussian distribution for these pre‐sample values are not defined on the boundary where ρ = 1. In view of the importance of this point in the parameter space in the present context, a choice of prior for the pre‐sample values that circumvents the problem is required. The second moments of Student's t distributions with 0 < ν ≤ 2 degrees of freedom are not defined, but such distributions are symmetrical and proper. This family of distributions can therefore provide candidates for our choice of prior for the pre‐sample values. The specific choice we make here is a multivariate Student t with ν = 2 degrees of freedom and with the location and scale given in (3.6). This choice models several aspects of the pre‐sample values in an appealing manner. The non‐finite second moments on the unit root boundary have been accommodated, in that the contribution of the pre‐sample values to the likelihood is well defined for all values of ρ; this would not otherwise be the case when ρ = 1. Actual prior beliefs about the process can be modelled by the choice of location, and we view the location used in (3.6) as a natural choice given the structural form (2.1). Some control over the plausible range of the pre‐sample values can be exercised via σ0. If an expert witness cannot offer a good constant value for σ0, a natural choice would seem to be σ0 = σ, to at least ensure sensible scaling in relation to the variation in the data. Finally, our choice of scale reflects the a priori mean AR(1) process referred to above, in that we choose k = E(ρ), the prior mean for ρ.
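Two facts are used here. First, the stationary AR(1) variance $\sigma^2/(1-\rho^2)$ has no finite limit as ρ → 1. Second, a Student t with ν = 2 is proper yet has no finite second moment. Both are easy to check numerically. The sketch below uses the standard t(2) density $f(x) = (2 + x^2)^{-3/2}$ and its closed-form distribution function $F(x) = 1/2 + x/\{2\sqrt{2+x^2}\}$. It also uses the truncated second moment $\int_{-L}^{L} x^2 f(x)\,dx = 2\{\operatorname{asinh}(L/\sqrt{2}) - L/\sqrt{2+L^2}\}$, which grows like $2\log L$. These expressions are our own elementary derivations, not formulas from the paper.

```python
import math


def ar1_stationary_variance(rho, sigma):
    """Marginal variance sigma^2 / (1 - rho^2) of a stationary AR(1).

    It grows without bound as |rho| -> 1 and does not exist for |rho| >= 1.
    """
    if abs(rho) >= 1.0:
        return math.inf
    return sigma ** 2 / (1.0 - rho ** 2)


def t2_cdf(x):
    """Distribution function of the standard Student t with 2 degrees of freedom."""
    return 0.5 + x / (2.0 * math.sqrt(2.0 + x * x))


def t2_truncated_second_moment(L):
    """Integral of x^2 f(x) over [-L, L] for the t(2) density f(x) = (2 + x^2)^(-3/2)."""
    return 2.0 * (math.asinh(L / math.sqrt(2.0)) - L / math.sqrt(2.0 + L * L))


for rho in (0.9, 0.99, 0.999, 1.0):
    print(f"rho = {rho}: stationary variance = {ar1_stationary_variance(rho, 0.06):.4g}")

print("t(2) total probability:", t2_cdf(1e12) - t2_cdf(-1e12))   # proper density
for L in (1e1, 1e3, 1e6):
    print(f"truncated second moment on [-{L:g}, {L:g}]:", t2_truncated_second_moment(L))
```

The truncated second moment keeps increasing by roughly 2 log 1000 ≈ 13.8 for each thousand-fold increase in L, so it never settles. Meanwhile the total probability stays at one. This is the combination that makes the t(2) prior usable on the unit root boundary.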
Zivot (1994) points out that inappropriate treatment of the initial value, $x_0$ in his notation, in models where it is unusually large can lead to incorrect posterior inference. He further observes that ‘one way to eliminate the dependence of the posteriors on $x_0$ is to treat $x_0$ as stochastic’, as we do here. (Indeed, we have observed for simulated data that our graphical inference is robust whether the initial observations are generated as Gaussian or even as $t_2$, with or without dependence structure.) Our general approach is to impose as little prior structure as practicable while retaining proper priors, in order to allow the data to ‘speak for itself’ by means of the graphical representation of the joint posterior of δ and ρ, prior specifications for which are discussed below. It should be reiterated that in real applications a researcher has freedom of choice with respect to these hyperparameters and the functional forms of the prior distributions. A considerable debate in the literature surrounds the prior specification adopted for the ‘unit root parameter’ ρ. In a paper concerned with classical confidence intervals, Stock (1991) argues that one advantage of such an approach is that it sidesteps the debate over priors; see also Andrews (1993). The classical approach is equivalent to allocating equal probability to negative and positive values of ρ. In our view, if a researcher is seriously considering models like (2.1) for data which may have stochastic trends, they are likely to regard negative values of ρ with extreme scepticism. In the Bayesian literature Phillips (1991a) adapted a Jeffreys prior as his ‘critics prior’ and argued that this was more appropriate than a flat prior for considering unit roots. The ignorance priors suggested by Phillips put considerable weight on values of ρ in excess of unity; this point is well illustrated in Phillips (1991a, Figure 1). We use an adjusted version of this prior in what follows.
Schotman and van Dijk (1991), Uhlig (1994) and Zivot (1994) all consider Jeffreys priors for different treatments of the initial value, $y_0$. Schotman (1994) compares these priors with the maximal data information priors of Zellner (1977). Other authors consider proper priors: Poirier (1991) uses a normal distribution centred on the stationary boundary and with a point mass at ρ = 1, Berger and Yang (1994) (whose proposal is implemented below) adapt a reference prior (Bernardo 1979) for the AR(1) model, and Lubrano (1995) uses a degenerate beta density. Various sensible proper priors may be available for ρ. The default used in this paper is a normal prior, intended as a subjective econometrician's prior, that attaches a small probability to an explosive model. Non‐stationary values are possible under this prior, but we do not wish to put point probability mass on any value of ρ, including ρ = 1. This prior is, therefore, chosen to place most of the probability in the range 0.70–1.05 with P(ρ > 1) ≈ 0.05; we use ρ ∼ N[0.863, (0.083)²]. One might envisage an econometrician using this prior when the time series plot of the data is suggestive of autoregressive behaviour exhibiting substantial persistence. However, in view of the considerable evidence of the effect of prior choice on posterior inferences in this problem, and mindful of the important issues raised by Stock (1991, Section 5) and others, we undertake sensitivity analyses. For instance, when the default prior for ρ seems inappropriate for the data at hand, a more diffuse specification, ρ ∼ N[0.5, (0.25)²], is used. Many other authors have debated the merits of different choices for the prior of the autoregressive parameter, and it seems sensible to consider them in the current context. We therefore also entertain the priors suggested in Phillips (1991a) and Berger and Yang (1994) for comparative purposes.
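As a quick check on the stated calibration of the default prior, the sketch below computes prior probabilities for ρ under the two normal priors. For comparison it does the same for the Berger and Yang (1994) symmetrized reference prior, restricted to the range [0.55, 1.05] used for the comparison priors in the next paragraph. The closed form used here is $\pi(\rho) = 1/(2\pi\sqrt{1-\rho^2})$ for |ρ| < 1 and $1/(2\pi|\rho|\sqrt{\rho^2-1})$ for |ρ| > 1. It is taken from Berger and Yang's paper rather than from this text, and the function names are illustrative.

```python
import math


def normal_prob(a, b, mean, sd):
    """P(a < rho < b) when rho ~ N(mean, sd^2); a or b may be +/- math.inf."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))
    return cdf(b) - cdf(a)


def berger_yang_cdf(rho):
    """CDF of the symmetrized reference prior, for rho >= 0.

    The prior gives probability 1/2 to |rho| < 1 and 1/4 to each of rho > 1
    and rho < -1, so F(0) = 1/2 and F(1) = 3/4.
    """
    if rho < 0.0:
        raise ValueError("this sketch only handles rho >= 0")
    if rho <= 1.0:
        return 0.5 + math.asin(rho) / (2.0 * math.pi)
    return 0.75 + math.acos(1.0 / rho) / (2.0 * math.pi)


# Default prior N[0.863, 0.083^2] and the more diffuse N[0.5, 0.25^2].
p_explosive_default = normal_prob(1.0, math.inf, 0.863, 0.083)
p_range_default = normal_prob(0.70, 1.05, 0.863, 0.083)
p_explosive_diffuse = normal_prob(1.0, math.inf, 0.5, 0.25)

# Berger-Yang prior renormalized to the restricted range [0.55, 1.05].
by_mass = berger_yang_cdf(1.05) - berger_yang_cdf(0.55)
p_explosive_by = (berger_yang_cdf(1.05) - berger_yang_cdf(1.0)) / by_mass

for name, value in [("default P(rho > 1)", p_explosive_default),
                    ("default P(0.70 < rho < 1.05)", p_range_default),
                    ("diffuse P(rho > 1)", p_explosive_diffuse),
                    ("Berger-Yang on [0.55, 1.05]: P(rho > 1)", p_explosive_by)]:
    print(f"{name}: {value:.3f}")
```

Under these assumptions the default prior puts about 5% of its mass above one and over 95% in 0.70–1.05, in line with the text. The restricted reference prior, by contrast, puts roughly a quarter of its mass on explosive values.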
The former is the Jeffreys‐type prior derived by Phillips (1991a), whose explicit form is given in that paper. As originally advanced, this prior puts very substantial weight on values of ρ greater than unity. To attenuate this possibly undesirable feature, Phillips (1991b) and Zivot and Phillips (1994) propose the use of an exponential damping factor, and this device is employed here. Berger and Yang's symmetrized reference prior is

$$\pi(\rho) = \begin{cases} \dfrac{1}{2\pi\sqrt{1-\rho^2}}, & |\rho| < 1, \\[1ex] \dfrac{1}{2\pi|\rho|\sqrt{\rho^2-1}}, & |\rho| > 1. \end{cases}$$

This is based on a reference prior for the stationary region (where it corresponds to the Jeffreys prior), symmetrized in the non‐stationary region, with equal probability assigned to each region. In both cases we restrict the range of values of ρ considered to [0.55, 1.05]. Inference about the (augmented) parameter vector θ is obtained via the posterior density

$$\pi(\theta \mid y) = c^{-1}\, p(y \mid \theta)\, \pi(\theta),$$

where c is a normalizing constant. The inference available from the posterior density may not be in readily assimilated form. For example, θ may be of high dimension, so that simple plots of the posterior density are not possible. In this case, plots of selected marginal posterior densities

$$\pi(\theta_1 \mid y) = \int \pi(\theta_1, \theta_2 \mid y)\, d\theta_2,$$

where $\theta_1$ lies in some one‐ or two‐dimensional subspace and $\theta_2$ is its complement in θ, may be helpful. Another convenient summary is provided by posterior moments, for example $E(\theta \mid y)$ and $E(\theta\theta' \mid y)$. The posterior means are often used as Bayes point estimates, while second‐order moments reveal something of the posterior correlation structure. Such posterior correlations may well arise as a consequence only of the data, having not been specified a priori. This will generally be the case for correlations between ρ and the φ's, which are assumed to be zero in our prior but will usually be non‐zero in the posterior. Of course, computation of both the posterior density and convenient summaries requires the evaluation of integrals, generally over several dimensions. These integrals cannot be found analytically, so some computational method is needed. For p ≤ 3 there are at most nine parameters, and iterative Gauss–Hermite quadrature (see, for example, Smith et al. (1985) and Shaw (1988)) can be used efficiently.
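The iterative Gauss–Hermite approach referred to here can be illustrated in one dimension. It is described further below as a rule re-scaled by first and second posterior moments. The sketch below is not the ‘Bayes Four’ implementation. It assumes only NumPy's `hermgauss` nodes and weights, and repeatedly recentres and rescales the rule using the current posterior mean and standard deviation until these stabilize. The test target has a known answer: the density of $x = \log g$ with $g \sim$ Gamma(5, 1), echoing the log(σ) transformation mentioned below. Its kernel $\exp(5x - e^x)$ has integral Γ(5) = 24, mean ψ(5) ≈ 1.5061 and standard deviation $\sqrt{\psi'(5)}$ ≈ 0.4705. The function name `adaptive_gauss_hermite` is hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def adaptive_gauss_hermite(log_kernel, mean=0.0, sd=1.0, n_nodes=30,
                           max_iter=100, tol=1e-10):
    """Log normalizing constant, mean and sd of exp(log_kernel(x)) on the real line.

    Gauss-Hermite nodes are re-centred at the current mean and re-scaled by
    the current standard deviation; moments are recomputed from the rule and
    the process is iterated until the moments stop changing.
    """
    x, w = hermgauss(n_nodes)          # rule for integrals of exp(-x^2) f(x)
    log_w = np.log(w) + x ** 2         # absorb exp(x^2) so any integrand can be used
    for _ in range(max_iter):
        theta = mean + np.sqrt(2.0) * sd * x
        log_terms = log_w + np.log(np.sqrt(2.0) * sd) + log_kernel(theta)
        shift = log_terms.max()        # work on the log scale to avoid overflow
        terms = np.exp(log_terms - shift)
        total = terms.sum()
        log_norm = shift + np.log(total)
        new_mean = np.sum(terms * theta) / total
        new_sd = np.sqrt(np.sum(terms * (theta - new_mean) ** 2) / total)
        done = abs(new_mean - mean) < tol and abs(new_sd - sd) < tol
        mean, sd = new_mean, new_sd
        if done:
            break
    return log_norm, mean, sd


# Known target: x = log(g) with g ~ Gamma(shape=5, rate=1).
log_norm, post_mean, post_sd = adaptive_gauss_hermite(lambda t: 5.0 * t - np.exp(t))
print(np.exp(log_norm), post_mean, post_sd)   # close to 24, 1.5061 and 0.4705
```

As in the text, the starting values matter. If the initial rule places almost all of its weight on a single node, the variance estimate can collapse, which is why a preliminary search for sensible starting values is useful.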
For 4 ≤ p ≤ 6 the conditional likelihood described by (2.3), with pre‐sample values set to zero rather than treated as parameters, could be employed with our current software. This approach is discussed in Marriott et al. (2003) as a way of dealing with significant moving average components in data, as are sometimes observed with macroeconomic time series; see, inter alia, Schwert (1987) and Leybourne (1994). For all the examples given here the ‘Bayes Four’ library (see Naylor (1991)) has been used.¹ This facilitates computation of the normalizing constant, first and second‐order posterior moments, univariate and bivariate marginal posterior densities, and predictive expectations. To improve the performance of this method, σ was transformed to log(σ). The method basically uses a Gauss–Hermite rule which is re‐scaled by factors calculated from first and second‐order posterior moments. Values for these are determined by iterative application of the method. The system requires starting values to be provided at the outset and, particularly in time series applications, it can be difficult to obtain values that are satisfactory. A linear search algorithm based on Gauss–Hermite rules has been used in the analyses reported here and seems to work well.

3.2. Graphical analysis of the joint posterior

The focus in the present paper is directed towards δ (the deterministic trend parameter) and ρ (the autoregressive or stochastic trend parameter), although, in principle, a researcher could concern themselves with any two elements of the parameter vector. Here the exploratory data inference is generally based on the joint marginal posterior density

$$\pi(\delta, \rho \mid y) = \int \pi(\delta, \rho, \psi \mid y)\, d\psi, \qquad (3.7)$$

where ψ denotes the entire parameter vector except for the trend parameters δ and ρ. Contour plots of this density provide a helpful graphical tool for distinguishing, at the outset of any analysis, between stationary series (possibly with a time trend) and difference stationary series.
In addition, they can indicate whether any form of the model (2.1) can serve as an adequate specification for the series under consideration, or whether adopting such a model would result in prima facie misspecification. Following the discussion in Ahn (1993), we make the following observations about the behaviour of the process (2.1) for different values of ρ and δ. When ρ = 1, $\Delta u_t$ follows an AR(p − 1) process and $y_t$ has a stochastic trend, with $\Delta y_t$ following $\phi(L)(\Delta y_t - \delta) = \varepsilon_t$, where $\phi(L) = 1 - \phi_1 L - \dots - \phi_{p-1}L^{p-1}$. When ρ < 1, $u_t$ is a stationary AR(p) process and $y_t$ does not have a stochastic trend. When δ ≠ 0, $y_t$ has a linear time trend whether or not ρ = 1. Figure 1 shows the ρ–δ parameter space on which we plot contours of the marginal bivariate density. The above discussion can be translated into observations based on where on the graph the joint posterior density falls.

Figure 1. The ρ–δ parameter space

(1) In the presence of a unit root, the outermost contours can be expected to lie along, and on either side of, the line DEF.
(2) For series with no linear time trend and stationary disturbances (e.g. stationary AR(p)), the contours will be concentrated along the line EH.
(3) If the contours fall within the rectangle DFGI but not on the lines DEF or EH, then $y_t$ has a linear time trend and stationary disturbances.
(4) Should the contours straddle the line DEF, away from the line BEH, and have a substantial probability content above DEF, this would be suggestive of a non‐stationary explosive model for the time series in question. For any except very short series this conclusion is implausible. We would rather adopt the alternative interpretation that the model being used to derive the joint density is inappropriate and, hence, that a wider class of models should be entertained.

These four points essentially encapsulate the spirit of the proposal.
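Once the joint posterior has been evaluated on a grid, the probability content relevant to points (1)–(4) can be read off numerically as well as visually. The sketch below assumes a rectangular, evenly spaced grid with ρ on one axis and δ on the other. It takes DEF to be the line ρ = 1 and the line δ = 0 to separate trending from non-trending cases, which is our reading of Figure 1. A hypothetical Gaussian density is used purely as test input; in practice the grid values would come from (3.7).

```python
import numpy as np


def region_probabilities(rho_grid, delta_grid, density):
    """Probability content of regions of the rho-delta plane.

    density[i, j] is the (unnormalized) joint posterior at
    (rho_grid[i], delta_grid[j]) on an evenly spaced grid.
    """
    cell_probs = density / density.sum()     # equal cell areas cancel
    return {
        "P(rho > 1)": cell_probs[rho_grid > 1.0, :].sum(),        # above DEF
        "P(delta > 0)": cell_probs[:, delta_grid > 0.0].sum(),    # positive trend
        "P(rho <= 1, delta > 0)": cell_probs[np.ix_(rho_grid <= 1.0,
                                                    delta_grid > 0.0)].sum(),
    }


# Hypothetical test density: independent normals centred on a unit root
# with drift, rho ~ N(1, 0.05^2) and delta ~ N(0.01, 0.004^2).
rho_grid = np.linspace(0.55, 1.45, 901)
delta_grid = np.linspace(-0.02, 0.04, 601)
R, D = np.meshgrid(rho_grid, delta_grid, indexing="ij")
density = np.exp(-0.5 * ((R - 1.0) / 0.05) ** 2 - 0.5 * ((D - 0.01) / 0.004) ** 2)

probs = region_probabilities(rho_grid, delta_grid, density)
for name, value in probs.items():
    print(f"{name}: {value:.3f}")
```

Summing `cell_probs` over the δ axis instead gives the univariate marginal of ρ on the same grid, which is the kind of plot discussed below.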
We envisage the device being used at an early stage of modelling as a tool of exploratory data analysis. The plot should indicate whether: it is likely to be gainful to proceed to use (2.1); there is clear evidence of trends of either, neither or both types; and there would be misspecification if (2.1) were subsequently to be fitted. In all cases the joint posterior density should be examined in conjunction with a time series plot of the original data, but space limitations preclude this here. Clearly, if there are strong indications of non‐linearity in the original series, the fitting of (2.1) would not be appropriate. The purpose of our graphical device is to determine the likely admissibility or otherwise of model (2.1) for the data at hand. Should (1), (2) or (3) arise, then model (2.1) in some form may be appropriate; otherwise the search for an adequate model for the data must be renewed. Such an extended search is warranted if: there are indications of non‐linearity in the data; there is evidence in the graph of the joint posterior that (4) above applies; or there is other irregular behaviour of the graph that is unanticipated if (2.1) is appropriate. We also note that this graphical tool could be of use to the ‘classical’ econometrician. If the parameters μ, σ and the $\phi_i$ are set equal to their maximum likelihood estimates and flat priors are then used for ρ and δ, the graphical tool will produce contours of the likelihood for ρ and δ conditional on the maximum likelihood values of the other (nuisance) parameters. The discussion following Figure 1 can then be taken to apply to these likelihood contours. As a preliminary, we also consider integrating out the parameter δ, so that the plot obtained is the (univariate) marginal posterior density of ρ. This is designed to give a feel for the influence or otherwise of the prior for that parameter in determining the posterior after combination with the likelihood.
Figure 2 portrays the results obtained for a realization of 100 observations from a stationary AR(1) model with parameter ρ = 0.7, using our default prior. The prior, shown as the broken line, clearly retains some influence but has largely been dominated by the likelihood, so that the data does inform the analysis substantially. We intend that our prior specification for ρ should be a realistic one without being too restrictive, and we believe that this evidence, together with that provided below, supports this. (Use of our more diffuse prior leads to a posterior inference centred at a slightly lower value than that shown, as might be expected.)

Figure 2. Univariate density for ρ; $y_t = 0.7y_{t-1} + \varepsilon_t$

3.3. Inference using simulated data from (2.1)

In order to illustrate the use of Figure 1, it is useful to ‘calibrate’ the performance of the joint posterior density of ρ and δ for different simulated stationary and non‐stationary processes. We generated samples of 100 observations from each of a range of models, some of which are generated from (2.1) and others that are not, and obtained the joint posterior density arising from fitting model (2.1) in each case. Three models from (2.1) were used:

(a) $y_t = 0.01t + u_t$, $u_t = u_{t-1} + \varepsilon_t$;
(b) $y_t = 0.01t + u_t$, $u_t = 0.7u_{t-1} + \varepsilon_t$;
(c) $y_t = u_t$, $u_t = u_{t-1} + \Delta u_{t-1} + \varepsilon_t$.

For all the above cases and those to be used in Section 3.4, $\varepsilon_t \sim N[0, (0.06)^2]$. The choice σ = 0.06 for the innovation standard deviation was made on the basis of fitting the same model to each of the Nelson and Plosser (1982) data sets and recording a point estimate of σ in each case; the median of these point estimates was 0.0595. In these calibration examples we used p = 2 and p = 3 for the Dickey–Fuller style augmentation in (2.1), together with the exact likelihood, and generally found that the inferences from our graphical device were qualitatively unaffected by the choice of p.
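The three calibration processes (a)–(c) can be simulated in a few lines. The sketch below uses T = 100 and σ = 0.06 as in the text. It assumes zero pre-sample values, and the helper name `simulate_calibration` and the random seed are illustrative choices, not taken from the paper.

```python
import numpy as np


def simulate_calibration(model, T=100, sigma=0.06, seed=0):
    """Simulate one of the calibration processes (a)-(c) of Section 3.3.

    Pre-sample values of u are set to zero (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=T)
    u = np.zeros(T + 2)                   # u[0], u[1] are pre-sample values
    t = np.arange(1, T + 1)
    for k in range(2, T + 2):
        if model == "a":                  # unit root: u_t = u_{t-1} + e_t
            u[k] = u[k - 1] + eps[k - 2]
        elif model == "b":                # stationary AR(1) with rho = 0.7
            u[k] = 0.7 * u[k - 1] + eps[k - 2]
        elif model == "c":                # I(2): u_t = u_{t-1} + du_{t-1} + e_t
            u[k] = u[k - 1] + (u[k - 1] - u[k - 2]) + eps[k - 2]
        else:
            raise ValueError(f"unknown model {model!r}")
    trend = 0.01 * t if model in ("a", "b") else 0.0
    return trend + u[2:], eps


series = {m: simulate_calibration(m)[0] for m in "abc"}
```

Each returned series has length T. Differencing the output of (c) twice recovers the innovations exactly, which is a convenient check that the I(2) recursion has been coded as intended.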
In view of research evidence (see, inter alia, Hoek (1997), Leybourne (1994) and Schwert (1987, 1989)), the use of p = 2 or 3 is likely to be inadequate whenever there are substantial moving average components in the data. This is a matter we take up further below. All plots were obtained using ‘Splus’; see Becker et al. (1988) for an introduction. The joint posterior densities of ρ and δ for (a) to (c) above are plotted, respectively, in Figures 3–5. The six contours in these and other figures are drawn at 80, 40, 20, 10, 5 and 1% of the modal height of the estimated joint posterior distribution. For a bivariate Gaussian joint posterior density, these contours bound the 20, 60, 80, 90, 95 and 99% highest posterior density (HPD) regions, respectively. The graphs displayed are indicative of the results we obtained from a substantial number of realizations for each case.

Figure 3. $y_t = 0.01t + u_t$; $u_t = u_{t-1} + \varepsilon_t$

Figure 5. $y_t = u_t$; $u_t = u_{t-1} + \Delta u_{t-1} + \varepsilon_t$

The joint posterior densities depicted in Figures 3–5 behave as foreshadowed in the discussion of Figure 1. Consider first Figure 3, for which the spreading of the outer contours along the line DEF indicates the presence of a unit root. Notice that the mode of the joint posterior is indicative of a downward bias in the estimate of ρ. We note that Phillips (1991a, 1991b) and Zivot and Phillips (1994) use a Jeffreys prior in an attempt to offset this. When this alternative prior is used, the plot for data generating mechanism (a), and almost all other plots, look qualitatively very similar to the ones given here.
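The correspondence between contour heights and HPD content comes from the bivariate Gaussian case. There the density equals $f_{\max}\exp(-r^2/2)$ in terms of the Mahalanobis radius r, so the contour at fraction h of the mode has $r^2 = -2\log h$. Since $r^2$ is $\chi^2_2$, the enclosed probability is $1 - e^{-r^2/2} = 1 - h$. The short check below confirms this by Monte Carlo, assuming NumPy and an arbitrary, purely illustrative correlated covariance matrix.

```python
import numpy as np

heights = np.array([0.80, 0.40, 0.20, 0.10, 0.05, 0.01])   # fractions of modal height

# Exact HPD content for any bivariate Gaussian: 1 - h.
exact = 1.0 - heights

# Monte Carlo check with an arbitrary (illustrative) correlated Gaussian.
rng = np.random.default_rng(1)
centre = np.array([0.01, 0.9])
cov = np.array([[0.004 ** 2, 0.6 * 0.004 * 0.05],
                [0.6 * 0.004 * 0.05, 0.05 ** 2]])
draws = rng.multivariate_normal(mean=centre, cov=cov, size=200_000)
centred = draws - centre
r2 = np.einsum("ij,jk,ik->i", centred, np.linalg.inv(cov), centred)  # Mahalanobis r^2
# A draw lies inside the contour at height h when its density exceeds h times
# the modal density, i.e. when exp(-r^2 / 2) > h.
simulated = np.array([(np.exp(-0.5 * r2) > h).mean() for h in heights])

for h, e, s in zip(heights, exact, simulated):
    print(f"contour at {h:.0%} of mode: exact HPD {e:.0%}, simulated {s:.3f}")
```

For non-Gaussian posteriors, such as those in Section 3.4, the same contour heights no longer carry these exact probability interpretations. This is one reason the shapes of the contours, rather than their nominal levels, carry the diagnostic information.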
In Figure 3, in addition to the spreading of the outer contours along DEF, the posterior mode is around δ = 0.01 and the mass of the joint posterior distribution is well to the right of the line δ = 0, thereby indicating the presence of both a deterministic trend and a unit root. Of course, in the context of the reduced form (2.2) this takes the form of a non‐zero drift parameter in a unit root model. The behaviour of the contours in Figure 3 is also robust both to a substantial sharpening of the marginal prior density for ρ by increasing the mean towards the unit root (retaining the small prior probability of explosive behaviour) and to the use of our more diffuse prior. We also used the damped Jeffreys prior (Zivot and Phillips 1994) and the symmetrized reference prior of Berger and Yang (1994). The case portrayed in Figure 3 was one of very few where the inference is qualitatively altered under the latter prior. The corresponding figure shows evidence of a unit root, but also some unpredictable behaviour of the type that will be encountered in Section 3.4. This presumably arises because of the very substantial prior weight this prior gives to an explosive process. The Jeffreys and symmetrized reference priors, while generally leading to qualitatively similar conclusions, often indicate a higher posterior mean for ρ, particularly when this is large. Inferences regarding δ are unaffected by varying the prior for ρ. This provides some evidence on the sensitivity of our procedure to the choice of prior, though our belief is that it is generally fairly robust in this respect. The joint posterior density shown in Figure 4 reflects cases where no unit root is present. The contours are more compact than those in the previous figure.
This can be seen, for example, by noting that the distance between the 10% contours on either side of the mode of the joint posterior in the δ direction in Figure 4 is about 0.004 (compared with about twice that for Figure 3). A parallel notion of ‘compactness’ will also be used for the ρ direction below. In Figure 4 the contours are centred around δ = 0.01 (compare Figure 3), but this time all plotted contours of the joint posterior density lie below the line DEF, leading to the graphical inference that there is no stochastic trend. However, the contours indicate a large probability mass between ρ = 0.65 and 0.85, pointing to the possibility of a stable autoregressive structure in the disturbances of (2.1).

Figure 4. $y_t = 0.01t + u_t$; $u_t = 0.7u_{t-1} + \varepsilon_t$

The plot in Figure 5 derives from model (c), which is evidently a process with not one but two unit roots, i.e. $y_t$ is integrated of order 2, I(2). In this case not only is ρ = 1, but there is also unit root behaviour in the first differences of (c). The figure indicates very strongly the presence of a unit root, and a plot of the original data will generally be indicative of highly non‐stationary behaviour anyway. If a modeller wishes to distinguish between I(1) and I(2) processes using our graphical inference, the qualitative difference between Figures 3 and 5 may be helpful. In any event, if a data realization is I(2) and produces a figure similar to the latter, then the graph pertaining to the first differences of the data can be expected to resemble the former more closely.

3.4. Inferences using data not simulated from (2.1)

In practical situations, one of a researcher's main aims will be to determine whether or not (2.1) is indeed appropriate. Of course, should (2.1) be inadequate, a vast range of other specifications could be entertained.
This subsection considers artificially generated data from a range of models that might be considered particularly pertinent.² The first data generating mechanism is

(d) $y_t = 0.001 + y_{t-1} + 0.002t + \varepsilon_t$,

which can be thought of as the reduced form arising from the structural model (2.1) with the linear trend augmented by a quadratic term. The contours of the empirical joint posterior portrayed in Figure 6 all lie above the line DEF, and we would clearly reject (2.1) as a plausible generating mechanism for the data.

Figure 6. $y_t = 0.001 + y_{t-1} + 0.002t + \varepsilon_t$

The next two specifications involve explicitly non‐linear data generating mechanisms. The first of these is the bilinear process

(e) $y_t = (\alpha + \beta\varepsilon_t)y_{t-1} + \varepsilon_t$.

The arguments of Quinn (1982) indicate that different parametric combinations of α and β confer different stationarity and ergodicity properties on these models. For instance, strictly stationary data generating mechanisms can arise even when |α| > 1.0. The situation is neatly summarized by Figure 1 of Quinn (1982, p. 251). We advocate the use of our estimative procedure only after preliminary consideration of the data, which would naturally include a time series plot. Generally speaking, whenever the variance contribution of the bilinear component to the dependence structure of $y_t$ is non‐negligible, this will be evident in the time series plot, and the erratic behaviour of this plot will exclude (2.1) from further consideration. However, it is still of some interest to explore the results of applying the procedure in such circumstances. Two representative examples are given here. The first derives from a realization where α = 0.8 and β = 22; the implied process is strictly stationary. The relevant graph is presented as Figure 7. While the estimated value of ρ approximates α well, the estimated value of δ is completely unreliable, with our compactness measure being in excess of 1.0!
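Why α = 0.8 gives a strictly stationary process can be checked numerically. Consider a random-coefficient recursion $y_t = A_t y_{t-1} + B_t$ with i.i.d. pairs $(A_t, B_t)$. A standard condition for a strictly stationary solution is $E\log|A_t| < 0$, given mild moment conditions on $B_t$. Reading (e) as such a recursion, with $A_t = \alpha + \beta\varepsilon_t$ and $B_t = \varepsilon_t$, the sketch below estimates $E\log|\alpha + \beta\varepsilon_t|$ by Monte Carlo for σ = 0.06 and β = 22. This is our own check under that reading of (e), not a calculation taken from the paper or from Quinn (1982), and the function name is hypothetical.

```python
import numpy as np


def log_abs_coefficient_mean(alpha, beta, sigma=0.06, n=2_000_000, seed=0):
    """Monte Carlo estimate of E log|alpha + beta * eps| with eps ~ N(0, sigma^2).

    A negative value means the random coefficient A_t = alpha + beta * eps_t
    contracts on average on the log scale, the usual condition for a strictly
    stationary solution of y_t = A_t y_{t-1} + eps_t.
    """
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n)
    return np.mean(np.log(np.abs(alpha + beta * eps)))


for alpha in (0.8, 1.21):
    gamma = log_abs_coefficient_mean(alpha, beta=22.0)
    print(f"alpha = {alpha}: E log|A_t| ~ {gamma:+.4f}")
```

With these settings the estimate is clearly negative for α = 0.8 and slightly positive for α = 1.21. This agrees with the description of the α = 1.21 example that follows as not strictly stationary, and shows that it lies close to the boundary.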
This figure provides strong evidence of the inappropriateness of (2.1). The plot in Figure 8 employs α = 1.21 and β = 22, and this process is not strictly stationary. In this case the estimated value of ρ does not even provide useful information about α, and we note in passing that the posterior mean of σ is around 11.15, compared with the true value of 0.06. A range of other plots is available on the web site http://dcm.ntu.ac.uk/jmm/plots.html (see footnote 3), but a succinct summary is that the plots seem to provide useful information about α via ρ only when α is less than 1.0 or the heteroskedastic effect introduced by the bilinear component is small.

Figure 7. $y_t = (0.8 + 22\varepsilon_t)y_{t-1} + \varepsilon_t$

Figure 8. $y_t = (1.21 + 22\varepsilon_t)y_{t-1} + \varepsilon_t$

Also of interest is the non‐stationary random coefficient process

(f) $y_t = \alpha_t y_{t-1} + \varepsilon_t$, with $\alpha_t \sim N[1, (0.10)^2]$.

Leybourne et al. (1996) present a comprehensive discussion of these models in the context of testing for alternatives to unit root models. While some realizations give rise to even more volatile joint posterior densities, the result portrayed in Figure 9 is typical of many. In this case there is substantially more probability mass above the line DEF than in either Figure 3 or Figure 4. Such irregular joint posterior plots strongly suggest that a wider class of models than any obtainable from (2.1) should be entertained.

Figure 9. $y_t = \alpha_t y_{t-1} + \varepsilon_t$

As a sensitivity analysis, we used Student's t noise with five degrees of freedom, rather than Gaussian noise, to generate data from each of (a) to (f) in this and the previous subsection. Heavy‐tailed distributions are often observed in practice with financial time series, where samples of at least 500 observations are frequently encountered.
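Returning to the bilinear examples of Figures 7 and 8, one way to see how α and β interact in $y_t = (\alpha + \beta\varepsilon_t)y_{t-1} + \varepsilon_t$ is to read it as a random-coefficient AR(1) with coefficient $A_t = \alpha + \beta\varepsilon_t$. A standard sufficient condition for a strictly stationary solution of such recursions with i.i.d. coefficients is $E\log|A_t| < 0$; we assume this condition is the relevant one here and do not claim it reproduces Quinn's (1982) exact statement. For Gaussian $\varepsilon_t \sim N(0, \sigma^2)$ the expectation can be computed exactly, because $(c + Z)^2$ is non-central chi-square with one degree of freedom, a Poisson mixture of central chi-squares, and $E\log\chi^2_k = \psi(k/2) + \ln 2$. The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_log_abs_shifted_normal(c, tol=1e-15, max_terms=1000):
    """E log|c + Z| for Z ~ N(0, 1).

    (c + Z)^2 is a Poisson(c^2 / 2) mixture of central chi-squares with
    1 + 2j degrees of freedom, and E log chi^2_k = psi(k / 2) + ln 2.
    """
    mean_j = 0.5 * c * c                  # Poisson mean
    weight = math.exp(-mean_j)            # P(J = 0)
    term = -EULER_GAMMA - math.log(2.0)   # psi(1/2) + ln 2
    total = 0.0
    for j in range(max_terms):
        total += weight * term
        term += 1.0 / (j + 0.5)           # psi(x + 1) = psi(x) + 1/x with x = j + 1/2
        weight *= mean_j / (j + 1)        # P(J = j + 1)
        if j > mean_j and weight < tol:   # past the Poisson mode and negligible
            break
    return 0.5 * total                    # log|X| = 0.5 * log X^2


def lyapunov_exponent(alpha, beta, sigma):
    """E log|alpha + beta * eps| for eps ~ N(0, sigma^2)."""
    s = abs(beta) * sigma
    if s == 0.0:
        return math.log(abs(alpha))       # deterministic coefficient
    return math.log(s) + expected_log_abs_shifted_normal(alpha / s)


for alpha in (0.8, 1.1, 1.21):
    print(alpha, round(lyapunov_exponent(alpha, 22.0, 0.06), 4))
```

With β = 22 and σ = 0.06 this gives a negative value for α = 0.8 and a barely positive one for α = 1.21, matching the classification of the two processes above, while a value such as α = 1.1 stays on the stationary side despite |α| > 1.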
The resultant graphical inference is broadly similar to that reported above. We carried out experiments for T between 100 and 500 and found that, in the few cases where inferences were unreliable for the smallest samples (T = 100), they generally behaved predictably for the largest samples (T = 500). The exception is the case of non‐random unit root processes (i.e. unit root models other than (f)), where large samples are required for robustness to non‐normal errors.

There are two further topics that are raised briefly here and investigated in greater detail in Marriott et al. (2003). The first relates to a topic addressed in some detail by Hoek (1997, Ch. 5), raised first by Schwert (1987, 1989) and subsequently by other authors. These authors discuss the presence of moving average components in macroeconomic data sets akin to those used with our graphical device in the next section. Hoek's ultimate conclusion is that `some of the NP series … contain strong MA components'. But `it appears that unit root inference for the extended NP data is not much affected by considering ARMA instead of AR models' (Hoek 1997, p. 95). Using these data, Leybourne (1994) reports quite large MA components for three of the NP series, one of which does not appear to have a unit root. Of course, any MA component, other than one with a unit root, can be expressed as an AR of sufficiently high order. Indeed, this is the spirit of the Dickey–Fuller augmentation methodology involving the $\phi_i$ parameters here. So, in principle, our proposal can handle such components by choosing p sufficiently large. To investigate this case further, we therefore adopted the expedient, mentioned towards the end of Section 3.1, of no longer treating as a parameter to be estimated. Instead we substitute the elements thereby freed up in with the higher order autoregressive components implied by larger values of p.
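The statement that an invertible MA component can be absorbed by a long enough autoregression can be made concrete for an MA(1), $x_t = e_t + \theta e_{t-1}$ with |θ| < 1, whose AR(∞) form is $x_t = -\sum_{j \ge 1} (-\theta)^j x_{t-j} + e_t$. The sketch below is our illustration (hypothetical function names, arbitrary tolerance): it lists the first p AR weights and finds the smallest p for which the first omitted weight $|\theta|^{p+1}$ drops below a tolerance.

```python
def ar_weights_from_ma1(theta, p):
    """First p weights pi_1..pi_p in x_t = sum_j pi_j x_{t-j} + e_t, the AR(infinity)
    form of the invertible MA(1) x_t = e_t + theta * e_{t-1}."""
    if not abs(theta) < 1.0:
        raise ValueError("invertibility requires |theta| < 1")
    return [-((-theta) ** j) for j in range(1, p + 1)]


def lags_needed(theta, tol):
    """Smallest p such that the first omitted weight |theta|**(p + 1) is below tol."""
    if not abs(theta) < 1.0:
        raise ValueError("invertibility requires |theta| < 1")
    if tol <= 0.0:
        raise ValueError("tol must be positive")
    p = 0
    while abs(theta) ** (p + 1) >= tol:
        p += 1
    return p


print(ar_weights_from_ma1(-0.5, 3))                      # [-0.5, -0.25, -0.125]
print(lags_needed(-0.5, 0.02), lags_needed(-0.8, 0.02))  # 5 17
```

Under this (arbitrary) tolerance, θ = −0.5 is handled with p = 5, but θ = −0.8 would need p = 17, which illustrates why a cap such as p = 6 may be inadequate when |θ| is close to one.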
A maximum value of p = 6 is now available within the current implementation of our software. Of course, even this expedient may be inadequate (see, for example, Schwert (1989, p. 148) for relevant discussion). As footnote 1 indicates, given a choice of p, the $\phi_i$ could in principle be integrated out of the joint posterior density and our approach then adapted for the remaining parameters. The paper by Marriott et al. (2003) also investigates the use of our device in the presence of structural breaks in the data, and the reader is referred to that paper for further details.

3.5. State space models

The final class of data generating processes we consider here is provided by the state space models discussed by Koop and van Dijk (2000), who use state space models and Bayesian methods to investigate the presence of stochastic trends in economic time series. The state space models they consider take the form where $u_t$ are i.i.d. and $\varepsilon_t$ are i.i.d. $N(0, \sigma_\varepsilon^2)$, independent of $u_t$. The model in (g) is (5) of Koop and van Dijk (2000) and, substituting the value of $\tau_t$ in the observation equation (and its lagged value) into the transition equation, it can also be written as Hence we would expect always to find a unit root in the data whenever $\sigma_u^2 > 0$. We generated samples of 100 observations from (g) with ρ = 0, which corresponds to process (2) of Koop and van Dijk (2000). For our simulations we used $\sigma_u = \sigma_\varepsilon = 0.06$, so that our example corresponds in spirit with the second example they describe in the penultimate paragraph of their Section 2.1, for which their parameter . Some preliminary investigations indicate interesting properties for these processes, as well as links with processes in which there is an MA component. Figure 10 presents a joint posterior density typical of those we obtained; in such cases our device clearly indicates the presence of a unit root in the data (compare the Bayes factor in favour of the unit root in Koop and van Dijk's (2000) example).
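The link with MA components noted above is easy to see for the ρ = 0 case of (g), the local level model $y_t = \tau_t + \varepsilon_t$, $\tau_t = \tau_{t-1} + u_t$ of Figure 10. Differencing gives $\Delta y_t = u_t + \varepsilon_t - \varepsilon_{t-1}$, an MA(1) with variance $\sigma_u^2 + 2\sigma_\varepsilon^2$ and lag-one autocovariance $-\sigma_\varepsilon^2$, so its lag-one autocorrelation is $-\sigma_\varepsilon^2/(\sigma_u^2 + 2\sigma_\varepsilon^2)$, which equals −1/3 when $\sigma_u = \sigma_\varepsilon = 0.06$. The simulation below checks this numerically; Gaussian $u_t$ and $\tau_0 = 0$ are our assumptions (the text only says the $u_t$ are i.i.d.), and the function names are hypothetical.

```python
import numpy as np


def diff_acf1_local_level(sigma_u, sigma_eps):
    """Theoretical lag-1 autocorrelation of dy_t = u_t + eps_t - eps_{t-1}."""
    return -(sigma_eps ** 2) / (sigma_u ** 2 + 2.0 * sigma_eps ** 2)


def simulate_local_level(T, sigma_u=0.06, sigma_eps=0.06, seed=None):
    """y_t = tau_t + eps_t, tau_t = tau_{t-1} + u_t, with tau_0 = 0 and Gaussian u_t."""
    rng = np.random.default_rng(seed)
    tau = np.cumsum(rng.normal(0.0, sigma_u, size=T))  # random walk trend
    return tau + rng.normal(0.0, sigma_eps, size=T)     # add observation noise


def sample_acf1(x):
    """Lag-1 sample autocorrelation about the sample mean."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[1:], x[:-1]) / np.dot(x, x))


y = simulate_local_level(100_000, seed=0)
print(diff_acf1_local_level(0.06, 0.06), sample_acf1(np.diff(y)))
```

A large T is used only so that the sample autocorrelation is close to its theoretical value; the samples in the text have 100 observations.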
We also note that our experience corresponds to what they report in the second row of their Table 1.

Figure 10. $y_t = \tau_t + \varepsilon_t$; $\tau_t = \tau_{t-1} + u_t$

4. The Nelson and Plosser Data

The data we use to illustrate this approach are taken from the seminal article of Nelson and Plosser (1982). Ordinarily, economic time series such as these, which are annual averages, would, with the exception of BND, be studied by first taking logarithms in order to model rates of growth or to estimate elasticities. We have adopted this convention in the analyses reported here. For each series the value of p, which determines the number of lagged differences included in the regression, is either 2 or 3 for the exact likelihood, or up to 6 if our implementation of the conditional likelihood is employed. In the spirit of the arguments of Agiakloglou and Newbold (1992), that the sampling properties of unit root tests are affected by increasing the order of the augmented Dickey–Fuller regression, we anticipate that our graphical inference may be less precise (as perhaps conveniently described by our `compactness' measure) for larger values of p. However, in view of the effects of positive moving average components in real data (Schwert 1987, 1989), we feel that a range of values of p must be considered in practical exercises. It is, however, of some comfort to note that Leybourne (1994, p. 727) and Hoek (1997, p. 92) found that none of these 14 series evidenced a positive value of γ (the moving average parameter in the notation of Marriott et al. (2003)), which seems likely to be more problematic than negative values. The value of p used in the graphs presented here is chosen so that inference from the joint posterior plot is stable, in the sense that incrementing p has no appreciable effect on the estimated posterior moments, particularly the mean.
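The rule for choosing p described above can be written as a small helper. This is a sketch of one reading of that rule, not the authors' code: it assumes posterior means of (ρ, δ) have already been computed for consecutive values of p (for example p = 1, ..., 6 with the conditional likelihood), and the component-wise tolerances are arbitrary choices that should reflect the different scales of ρ and δ. The function name and inputs are hypothetical.

```python
def choose_stable_p(posterior_means, tols):
    """Smallest p whose posterior means change by less than `tols` (component-wise)
    when p is incremented to p + 1; returns None if no such p is found.

    posterior_means: dict mapping p -> tuple of posterior means, e.g. (rho, delta).
    tols: tuple of tolerances, one per component.
    """
    ps = sorted(posterior_means)
    for p, p_next in zip(ps, ps[1:]):
        if p_next != p + 1:
            raise ValueError("posterior means are needed for consecutive p")
        current, nxt = posterior_means[p], posterior_means[p_next]
        if all(abs(b - a) < tol for a, b, tol in zip(current, nxt, tols)):
            return p
    return None


# Made-up numbers, purely to show the call; these are not results from the paper.
means = {1: (0.90, 0.010), 2: (0.95, 0.008), 3: (0.951, 0.0081), 4: (0.9505, 0.0080)}
print(choose_stable_p(means, tols=(0.005, 0.0005)))
```

Separate tolerances are used because δ is typically an order of magnitude or more smaller than ρ, so a single absolute tolerance would be too lax for one of them.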
The first figure presented, Figure 11, relates to the unemployment series, UN. It is often argued that this is the only one of the 14 series likely to be generated as a stationary autoregression, and the figure indicates persuasively that this is the case. Many of the other series show that (2.1) may be a plausible class of models to entertain, often suggesting a unit root; the exceptions are discussed in the next paragraph. Of course, many authors have analysed the NP data over the last 20 years. Perron (1989) uses the GNP series to motivate structural breaks in macroeconomic time series. Koop and van Dijk (2000, Table 3) also provide posterior model probabilities for these series and compare their results with the earlier work of Schotman and van Dijk (1991). While we are not trying in this paper to specify or choose specific models for data, the bivariate posterior plot for GNP is provided in Figure 12 as an example. The regularity and form of the plot suggest that (2.1) may be appropriate for the data. Koop and van Dijk (2000, p. 276) opine that their analysis finds more evidence for stationarity for this series than for certain other NP series, and our plot is not at odds with that. Other plots show stronger evidence of unit roots; the plot for the variable SP500 (p = 2) is given in Figure 13 to exemplify their generic appearance. The evidence of Koop and van Dijk (2000, Table 3) also strongly suggests that one of their hypotheses H2 or H3 (both of which imply an integrated series) is overwhelmingly likely for this series.

Figure 11. UN, p = 2

Figure 12. GNP, p = 3

Figure 13. SP500, p = 2

However, three of the series show strong departures from what would be expected under (2.1). Figure 14 (p = 2) clearly indicates that the VEL series is not likely to be well represented by (2.1); indeed, the plot bears a marked resemblance to Figure 9. The plot for CPI is very similar, but is not presented. Finally, Figure 15, for the BND series, shows an even less regular bivariate posterior probability distribution, being distinctly bimodal. Whilst we would not use our device to propose a parametric specification appropriate for such a series, any economic modeller would clearly be led to conclude that a richer class of models than (2.1) should be entertained for this series.

Figure 14. VEL, p = 2

Figure 15. BND, p = 2

5. Conclusion

The proposed Bayesian graphical analysis, by allowing a probabilistic examination of the joint behaviour of two parameters of specific interest, provides a flexible and powerful tool for the exploratory analysis of economic time series. In particular, data generated from (2.1) with a unit root are likely to produce a qualitatively different plot from data without a unit root. Data from non‐linear processes that are widely discussed in the literature also give rise to samples that this approach can identify as not belonging to the class of models described by (2.1). The experiments and plots of Section 3 provide templates for a range of models that are (Section 3.3) and are not (Section 3.4) based on (2.1) and that may be of importance in practice. Some links to the state space models recently considered by Koop and van Dijk (2000) are also provided.
Application of the methodology to the famous 14 NP US annual data sets indicates that (2.1) may be an inadequate maintained model in at least three cases. Given the availability of suitable software, we feel that the proposed procedure provides a powerful new exploratory inference technique for applied economic researchers.

Acknowledgements

The authors are very grateful for careful comments made by Gael Martin and Max King. The helpful criticisms of a co‐editor and two anonymous referees have served to streamline and focus the paper.

Footnotes

1 As suggested by a referee, the value of p can, in principle, be made as large as desired by ‘concentrating out’ the relevant parameters and proceeding as before. This requires some further coding that we have not, as yet, attempted.

2 A referee suggested that we might apply our procedure to fractional difference models. We tried this with stationary fractional difference AR(1) processes, but our procedure generally suggested a stable AR(1). This is perhaps unremarkable, as the low order autocorrelations of such a model look like those of a stable AR(1); our method does not purport to capture long‐range dependence. Example plots are provided on the website referred to in the text.

3 This website contains plots relevant to most of the artificial data generating mechanisms considered by us here and elsewhere, and also a full set of those pertaining to the NP data used in Section 4.

References

Agiakloglou, C. & Newbold, P. (1992). Empirical evidence on Dickey‐Fuller‐type tests. Journal of Time Series Analysis 13, 471–83.
Ahn, S. K. (1993). Some tests for unit roots in autoregressive‐integrated‐moving average models with deterministic trends. Biometrika 80, 855–68.
Andrews, D. W. K. (1993). Exactly median‐unbiased estimation of first order autoregressive/unit root models. Econometrica 61, 139–65.
Becker, R. A., Chambers, J. M. & Wilks, A. R. (1988). The New S Language. Wadsworth, Pacific Grove, California.
Berger, J. O. & Yang, R. (1994). Noninformative priors and Bayesian testing for the AR(1) model. Econometric Theory 10, 461–82.
Bernardo, J. M. (1979). Reference posterior distributions for Bayes inference. Journal of the Royal Statistical Society B 41, 113–47.
Bhargava, A. (1986). On the theory of testing for unit roots in observed time series. Review of Economic Studies LIII, 369–84.
Davidson, R. & MacKinnon, J. G. (1993). Estimation and Inference in Econometrics. Oxford University Press, New York.
Geweke, J. (1994). Priors for macroeconomic time series and their application. Econometric Theory 10, 609–32.
Hoek, H. (1997). Variable Trends: A Bayesian Perspective. Amsterdam.
Koop, G. & Van Dijk, H. (2000). Testing for integration using evolving trend and seasonal models: a Bayesian approach. Journal of Econometrics 97, 261–91.
Leybourne, S. J. (1994). Testing for unit roots: a simple alternative to Dickey‐Fuller. Applied Economics 26, 721–9.
Leybourne, S. J., McCabe, B. P. M. & Tremayne, A. R. (1996). Can economic time series be differenced to stationarity? Journal of Business and Economic Statistics 14, 435–46.
Lubrano, M. (1995). Testing for unit roots in a Bayesian framework. Journal of Econometrics 69, 81–109.
Maddala, G. S. & Kim, I.‐M. (1998). Unit Roots, Cointegration and Structural Change. Cambridge University Press, Cambridge.
Marriott, J. M., Naylor, J. C. & Tremayne, A. R. (2003). Bayesian graphical inference for economic time series that may have stochastic or deterministic trends. In R. Becker & S. Hurn (eds), Advances in Economics and Econometrics: Theory and Applications, Selected Papers from the Australasian Meeting of the Econometric Society. Edward Elgar, Cheltenham (forthcoming).
Monahan, J. F. (1983). Fully Bayesian analysis of ARMA time series models. Journal of Econometrics 21, 307–31.
Naylor, J. C. (1991). Bayes Four User Guide. Technical Report, Nottingham Trent University.
Naylor, J. C. & Marriott, J. M. (1996). A Bayesian analysis of non‐stationary autoregressive series. In J. M. Bernardo, J. O. Berger, A. P. Dawid & A. F. M. Smith (eds), Bayesian Statistics 5, pp. 705–12. Oxford University Press, Oxford.
Nelson, C. R. & Plosser, C. I. (1982). Trends and random walks in macroeconomic time series. Journal of Monetary Economics 10, 139–62.
Perron, P. (1989). The great crash, the oil price shock and the unit root hypothesis. Econometrica 57, 1361–401.
Phillips, P. C. B. (1991a). To criticize the critics: an objective Bayesian analysis of stochastic trends. Journal of Applied Econometrics 6, 333–64.
Phillips, P. C. B. (1991b). Bayesian roots and unit roots: de rebus prioribus semper est disputandum. Journal of Applied Econometrics 6, 435–73.
Poirier, D. J. (1991). A comment on `to criticise the critics: an objective Bayesian analysis of stochastic trends'. Journal of Applied Econometrics 6, 381–6.
Quinn, B. G. (1982). A note on the existence of strictly stationary solutions to bilinear equations. Journal of Time Series Analysis 3, 249–52.
Schmidt, P. & Phillips, P. C. B. (1992). LM tests for a unit root in the presence of deterministic trends. Oxford Bulletin of Economics and Statistics 54, 257–87.
Schotman, P. C. (1994). Priors for the AR(1) model: parameterization issues and time series considerations. Econometric Theory 10, 579–95.
Schotman, P. C. & Van Dijk, H. K. (1991). On Bayesian routes to unit roots. Journal of Applied Econometrics 6, 387–401.
Schwert, G. W. (1987). Effects of model specification on tests for unit roots in macroeconomic data. Journal of Monetary Economics 20, 73–103.
Schwert, G. W. (1989). Tests for unit roots: a Monte Carlo investigation. Journal of Business and Economic Statistics 7, 147–59.
Shaw, J. E. H. (1988). Aspects of numerical integration and summarisation (with discussion). In J. M. Bernardo, M. H. DeGroot, D. V. Lindley & A. F. M. Smith (eds), Bayesian Statistics 3, pp. 411–28. Oxford University Press, Oxford.
Smith, A. F. M., Skene, A. M., Shaw, J. E. H., Naylor, J. C. & Dransfield, M. (1985). The implementation of the Bayesian paradigm. Communications in Statistics 14, 1079–102.
Stock, J. H. (1991). Confidence intervals for the largest autoregressive root in U.S. macroeconomic time series. Journal of Monetary Economics 28, 435–59.
Uhlig, H. (1994). On Jeffreys prior when using the exact likelihood function. Econometric Theory 10, 633–44.
Zellner, A. (1977). Maximal data information prior distributions. In A. Aykac & C. Brumat (eds), New Developments in the Applications of Bayesian Methods, pp. 211–32. North‐Holland, Amsterdam.
Zivot, E. (1994). A Bayesian analysis of the unit root hypothesis within an unobserved components model. Econometric Theory 10, 552–78.
Zivot, E. & Phillips, P. C. B. (1994). A Bayesian analysis of trend determination in economic time series. Econometric Reviews 13, 291–336.

© Royal Economic Society 2003
TI - Exploring economic time series: a Bayesian graphical approach JF - The Econometrics Journal DO - 10.1111/1368-423X.00105 DA - 2003-06-01 UR - https://www.deepdyve.com/lp/oxford-university-press/exploring-economic-time-series-a-bayesian-graphical-approach-9W3LLX0s8T SP - 124 VL - 6 IS - 1 DP - DeepDyve ER -