Bayesian Dynamic Modeling of High-Frequency Integer Price Changes

Abstract

We investigate high-frequency volatility models for analyzing intradaily tick by tick stock price changes using Bayesian estimation procedures. Our key interest is the extraction of intradaily volatility patterns from high-frequency integer price changes. We account for the discrete nature of the data via two different approaches: ordered probit models and discrete distributions. We allow for stochastic volatility by modeling the variance as a stochastic function of time, with intraday periodic patterns. We consider distributions with heavy tails to address occurrences of jumps in tick by tick discrete price changes. In particular, we introduce a dynamic version of the negative binomial difference model with stochastic volatility. For each model, we develop a Markov chain Monte Carlo estimation method that takes advantage of auxiliary mixture representations to facilitate the numerical implementation. This new modeling framework is illustrated by means of tick by tick data for two stocks from the NYSE and for different periods. The different models are compared with each other based on predictive likelihoods. We find evidence in favor of our preferred dynamic negative binomial difference model.

High-frequency price changes observed at stock, futures, and commodity markets typically cannot be regarded as continuous variables. In most electronic markets, the smallest possible price difference is set by the regulator or the trading platform. Here we develop and investigate dynamic models for high-frequency integer price changes that take the discreteness of prices into account. We explore the dynamic properties of integer time series observations. In particular, we are interested in the stochastic volatility dynamics of price changes within intradaily time intervals. This information can be used for the timely identification of changes in volatility and to obtain more accurate estimates of integrated volatility.

In the current literature on high-frequency returns, price discreteness is typically neglected. However, the discreteness can have an impact on the distribution of price changes and on its volatility; see, for example, Securities and Exchange Commission Report (2012), Chakravarty, Wood, and Ness (2004), and Ronen and Weaver (2001). Assets for which the bid-ask spread is almost always equal to one tick are defined as large tick assets; see Eisler, Bouchaud, and Kockelkoren (2012). Large tick assets are especially affected by discreteness through the effect of different quoting strategies; see the discussions in Chordia and Subrahmanyam (1995) and Cordella and Foucault (1999). The effect of liquidity on large tick assets can also be substantial, as documented by O’Hara, Saar, and Zhong (2014) and Ye and Yao (2014). Many large tick assets exist on most U.S. exchange markets, as the tick size is set to one penny for stocks with a price greater than $1 by the Securities and Exchange Commission in Rule 612 of Regulation National Market System. Hence almost all low price stocks are large tick assets. Moreover, many futures contracts are not decimalized; for example, five-year U.S. Treasury Note futures and EUR/USD futures fall into this category; see Dayri and Rosenbaum (2013). The relevance of discreteness and its effect on the analysis of price changes have motivated the development of models that account for integer prices.
Similar to the case of continuous returns, we are primarily interested in the extraction of volatility from discrete price changes. We consider different dynamic model specifications for high-frequency integer price changes with a focus on the modeling and extraction of stochastic volatility. Our starting point is the work of Müller and Czado (2009) and Stefanos (2015), who propose ordered probit models with time-varying variance specifications. We adopt their modeling approaches as a reference and also use their treatments of Bayesian estimation. The main novelty of our study is the specification of a new model for tick by tick price changes based on the difference of negative binomial distributions, which we refer to as the ΔNB distribution for short. The properties of this distribution are explored in detail in our study; in particular, its heavy tail properties are emphasized. In our analysis, we adopt the ΔNB distribution conditional on a Gaussian latent state vector process, which represents the components of the stochastic volatility process. The volatility process accounts for the periodic pattern in high-frequency volatility due to intradaily seasonal effects such as the opening, lunch, and closing hours. Our Bayesian modeling approach provides a flexible and unified framework to fit the observed tick by tick price changes. The properties of the ΔNB distribution closely mimic the empirical stylized facts of trade by trade price changes. Hence, we will argue that the ΔNB model with stochastic volatility is an attractive alternative to models based on the Skellam distribution as suggested earlier; see Koopman, Lit, and Lucas (2017). We further decompose the unobserved log volatility into intradaily periodic and transient volatility components. We propose a Bayesian estimation procedure using standard Gibbs sampling methods. Our procedure is based on data augmentation and auxiliary mixtures; it extends the auxiliary mixture sampling procedure proposed by Frühwirth-Schnatter and Wagner (2006) and Frühwirth-Schnatter et al. (2009). The procedures are implemented in a computationally efficient manner.

In our empirical study, we consider two stocks from the NYSE, IBM and Coca-Cola, in a volatile week in October 2008 and a calmer week in April 2010. We compare the in-sample and out-of-sample fits of four different model specifications: ordered probit models based on the normal and Student’s t distributions, the dynamic Skellam model, and the ΔNB model. We compare the models in terms of the Bayesian information criterion and predictive likelihoods. We find that the ΔNB model is favored for series with a relatively low tick size and in periods of more volatility.

Our study is related to different strands of the econometric literature. Modeling discrete price changes with static Skellam and ΔNB distributions has been introduced by Alzaid and Omair (2010) and Barndorff-Nielsen, Pollard, and Shephard (2012). The dynamic specification of the Skellam distribution and its (non-Bayesian) statistical treatment have been explored by Koopman, Lit, and Lucas (2017). Furthermore, our study is related to Bayesian treatments of stochastic volatility models for continuous returns; see, for example, Chib, Nardari, and Shephard (2002), Kim, Shephard, and Chib (1998), Omori et al. (2007) and, more recently, Stroud and Johannes (2014). We extend this literature on trade by trade price changes by explicitly accounting for price discreteness and the heavy tails of the tick by tick return distribution.
These extensions are explored in other contexts in Engle (2000), Czado and Haug (2010), Dahlhaus and Neddermeyer (2014), and Rydberg and Shephard (2003).

The remainder is organized as follows. In Section 1, we review different dynamic model specifications for high-frequency integer price changes; most attention is given to the introduction of the dynamic ΔNB distribution. Section 2 develops a Bayesian estimation procedure based on Gibbs sampling, mainly for the ΔNB case, of which the Skellam is a special case. In Section 3, we present the details of our empirical study, including a description of our dataset, the data cleaning procedure, the presentation of our estimation results, and a discussion of our overall empirical findings. Section 4 concludes.

1 Dynamic Models for Discrete Price Changes

We start this section with a discussion of dynamic volatility modeling for high-frequency data. Next, we review models for integer valued variables based on such a dynamic volatility specification. The first group of these models consists of the ordered probit models based on normal and Student’s t distributions with stochastic volatility. The second group is captured by our novel dynamic negative binomial difference (ΔNB) model with stochastic volatility, which nests the dynamic Skellam model as a special case. We then present the main features of our newly introduced ΔNB model.

1.1 Dynamic Volatility Specification

To capture the salient empirical features of high-frequency trade by trade price changes, such as intradaily volatility clustering and persistent dynamics, one typically specifies the following dynamic model for the log volatility h_t:

$h_t = \mu_h + x_t, \qquad x_{t+1} = \phi x_t + \eta_t, \qquad \eta_t \sim N(0, \sigma_\eta^2),$   (1)

for t = 1, …, T, where t is a transaction counter (and not a time index), μ_h is the unconditional mean of the log volatility of the continuous returns, x_t is a zero mean autoregressive process of order one, denoted by AR(1), with ϕ as the persistence parameter of the log volatility process and σ_η² as the variance of the Gaussian disturbance term η_t. The mean μ_h represents the daily log volatility and the autoregressive process x_t captures the changes in log volatility due to firm specific or market information arriving during the day. The latent variable x_t is specified as an AR(1) process with zero mean; this restriction is enforced to allow for the identification of μ_h.

However, there is another stylized fact of intradaily price changes: the seasonal pattern in the volatility process. In particular, the volatility is high in the opening minutes of the trading day, is lowest during the lunch hour, and increases somewhat toward the closing minutes. We can account for this intradaily volatility pattern by further decomposing the log volatility h_t into a deterministic daily seasonal pattern s_t and a stochastically time varying signal x_t as

$h_t = \mu_h + s_t + x_t, \qquad E(s_t) = 0,$   (2)

where s_t is a normalized spline function with unconditional expectation equal to zero. Such a specification allows us to smoothly interpolate different levels of volatility over the day. We enforce the zero mean constraint via a simple linear restriction. In our model we specify s_t as an intradaily cubic spline function, constructed from piecewise cubic polynomials.
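To fix ideas, the following minimal sketch simulates the log volatility decomposition in Equation (2). The seasonal component s_t is replaced by an illustrative sinusoidal shape with the required zero mean; the model itself uses the Poirier cubic spline described below, so both the shape of s_t and the parameter values in the sketch are placeholders only.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 5000                      # number of transactions in one trading day
mu_h = -1.7                   # daily level of log volatility (illustrative)
phi, sigma_eta = 0.97, 0.15   # AR(1) persistence and innovation std. dev.

# Transient component x_t: zero-mean stationary AR(1), Equation (1)
x = np.empty(T)
x[0] = rng.normal(0.0, sigma_eta / np.sqrt(1.0 - phi**2))
for t in range(T - 1):
    x[t + 1] = phi * x[t] + rng.normal(0.0, sigma_eta)

# Seasonal component s_t: zero-mean intradaily pattern, Equation (2).
# Here a U-shaped placeholder (high at open and close, low around lunch);
# the paper uses a Poirier cubic spline with knots at 09:30, 12:30, 16:00.
u = np.linspace(0.0, 1.0, T)              # fraction of the trading day elapsed
s = 0.6 * np.cos(2.0 * np.pi * u)         # illustrative shape only
s -= s.mean()                             # enforce the zero-mean restriction

h = mu_h + s + x                          # log volatility, Equation (2)
print("average volatility level exp(h/2):", np.exp(h / 2).mean())
```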
More precisely, we adopt the representation of Poirier (1973), where the periodic cubic spline s_t is based on K knots and given by the regression equation

$s_t = w_t \beta,$   (3)

where w_t is a 1 × K weight vector and β = (β_1, …, β_K)′ is a K × 1 vector containing the values of the spline function at the K knots. Further details about the spline and the Poirier representation are presented in Appendix B. In our empirical study, we adopt a spline function with K = 3 knots at {09:30, 12:30, 16:00}. The spline function also accounts for the overnight effect of high volatility at the opening of trading, due to the accumulation of new information during the closure of the market. The difference between β_K (market closure, 16:00) and β_1 (market opening, 09:30) measures the overnight effect in the log volatility h_t. This treatment of overnight effects follows Engle (2000) and Müller and Czado (2009); alternatively, we can introduce a daily random effect for the opening minutes of trading. For such alternative treatments of intradaily seasonality and overnight effects, we refer to Weinberg, Brown, and Jonathan (2007), Bos (2008), and Stroud and Johannes (2014).

1.2 Ordered Normal Stochastic Volatility Model

In econometrics, the ordered probit model is typically used for the modeling of ordinal variables, but we can also adopt it in a natural way for the modeling of discrete price changes. In this approach, we effectively round a realization from a continuous distribution to its nearest integer. The continuous distribution can be subject to stochastic volatility; this extension is relatively straightforward. Let r_t* be the continuous return which is rounded to r_t = k when r_t* ∈ [k − 0.5, k + 0.5). We observe r_t and regard r_t* as a latent variable. By neglecting the discreteness of r_t during the estimation procedure, we would clearly distort the measurement of the scale or variation of r_t*. Therefore we take the rounding of r_t into account by specifying an ordered probit model with rounding thresholds [k − 0.5, k + 0.5). We assume that the underlying distribution for r_t* is subject to stochastic volatility. We obtain the following specification

$r_t = k, \quad \text{with probability} \quad \Phi\!\left(\frac{k+0.5}{\exp(h_t/2)}\right) - \Phi\!\left(\frac{k-0.5}{\exp(h_t/2)}\right), \qquad k \in \mathbb{Z},$   (4)

for t = 1, …, T, where h_t is the logarithm of the time varying variance of the latent variable r_t* and Φ(·) denotes the cumulative distribution function of the standard normal distribution. The dynamic model specification for h_t is given by (1). Similar ordered probit model specifications with stochastic volatility are introduced by Müller and Czado (2009) and Stefanos (2015). In the specification of Müller and Czado (2009), the rounding barriers are not necessarily equally spaced but need to be estimated. This further flexibility may improve the model fit compared to our basic ordered SV model specification. On the other hand, the more flexible model can only be fitted accurately when sufficient observations for each possible discrete outcome are available. If only a few price jumps of more than, say, ±10 ticks are observed, handling such large outcomes may become problematic. The basic model specification (1) and (4) accounts for the discreteness of prices via the ordered probit specification and for intradaily volatility clustering via the possibly persistent dynamic process x_t. The model can be modified and extended in several ways.
First, we can account for the market microstructure noise observed in tick by tick returns (see, e.g., Aït-Sahalia, Mykland, and Zhang, 2011 and Griffin and Oomen, 2008) by including an autoregressive moving average (ARMA) process in the specification of the mean of r_t*. In a similar way, we can incorporate explanatory variables, such as market imbalance, which can also have predictive power. Second, to include predetermined announcement effects, we can include regression effects in the specification, as proposed in Stroud and Johannes (2014). Third, it is possible that the unconditional mean μ_h of the log volatility of price changes is time varying. For example, we may expect higher volatility for stocks with a higher price, so that the volatility is no longer properly scaled after the price level has changed. A time-varying mean of the log volatility can easily be incorporated in the model by specifying random walk dynamics for μ_h, which allows for smooth changes in the mean over time. For our current purposes, we rely on the specification given by Equation (2).

1.3 Ordered t Stochastic Volatility Model

It is well documented in the financial econometrics literature that asset prices are subject to jumps; see, for example, Aït-Sahalia, Jacod, and Li (2012). However, the ordered normal specification, as introduced above, does not deliver sufficiently heavy tails in its price change distribution to accommodate the jumps that are typically observed in high-frequency returns. To account for the jumps more appropriately, we can consider a heavy tailed distribution instead of the normal distribution. In this way, we can assign probability mass to the infrequent large jumps in asset returns. An obvious choice for a heavy tailed distribution is the Student’s t distribution, which implies the following specification

$r_t = k, \quad \text{with probability} \quad T\!\left(\frac{k+0.5}{\exp(h_t/2)}, \nu\right) - T\!\left(\frac{k-0.5}{\exp(h_t/2)}, \nu\right), \qquad k \in \mathbb{Z},$   (5)

which replaces Equation (4), where T(·, ν) is the cumulative distribution function of the Student’s t distribution with ν as the degrees of freedom parameter. The model specification for h_t is provided by Equation (1) or (2). The parameter vector of this model specification is denoted by ψ and includes the degrees of freedom ν, the unconditional mean of log volatility μ_h, the volatility persistence coefficient ϕ, the variance of the log volatility disturbance σ_η², and the unknown vector β in (3) with the values of the spline at its knot positions. In the case of the normal ordered probit specification, the parameter vector is the same but without ν. The estimation of these unknown parameters in the ordered probit models is carried out by standard Bayesian simulation methods, for which the details are provided in Appendix C.
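For a given value of h_t, both ordered specifications amount to evaluating differences of a cumulative distribution function at the rounding thresholds. The sketch below evaluates the probabilities in Equations (4) and (5) with SciPy and illustrates how the Student’s t version assigns more mass to large tick changes; the values of h and ν are illustrative only.

```python
import numpy as np
from scipy import stats

def ordered_pmf(h, k, nu=None):
    """P(r_t = k) for a latent return with scale exp(h/2) rounded to the
    nearest integer: normal case as in Equation (4) when nu is None,
    Student's t with nu degrees of freedom as in Equation (5) otherwise."""
    scale = np.exp(h / 2.0)
    k = np.asarray(k, dtype=float)
    if nu is None:
        cdf = lambda z: stats.norm.cdf(z / scale)
    else:
        cdf = lambda z: stats.t.cdf(z / scale, df=nu)
    return cdf(k + 0.5) - cdf(k - 0.5)

h = 1.0                               # illustrative log variance
ks = np.arange(-15, 16)
p_normal = ordered_pmf(h, ks)
p_student = ordered_pmf(h, ks, nu=5)
print("mass beyond +/-5 ticks: normal %.4f, t(5) %.4f"
      % (p_normal[np.abs(ks) >= 5].sum(), p_student[np.abs(ks) >= 5].sum()))
```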
1.4 Dynamic ΔNB Model

Positive integer variables can alternatively be modeled directly via discrete distributions such as the Poisson or the negative binomial; see Johnson, Kemp, and Kotz (2005). These well-known distributions only provide support on the nonnegative integers. When modeling price differences, we also need to allow for negative integers; in this case, the Skellam distribution can be considered, see Skellam (1946). The specification of these distributions can be extended with stochastic volatility in a straightforward way. However, the analysis and estimation based on such models are more intricate.

In this context, Alzaid and Omair (2010) advocate the use of the Skellam distribution, based on the difference of two Poisson random variables. Barndorff-Nielsen, Pollard, and Shephard (2012) introduce the negative binomial difference (ΔNB) distribution, which has fatter tails compared to the Skellam distribution. Next, we review the ΔNB distribution and its properties. We further introduce a dynamic version of the ΔNB model, of which the dynamic Skellam model is a special case.

The ΔNB distribution is constructed as the difference of two negative binomial random variables, which we denote by NB+ and NB−, with numbers of failures λ⁺ and λ⁻ and failure rates ν⁺ and ν⁻, respectively. The ΔNB random variable R is simply defined as R = NB+ − NB−. We then have R ∼ ΔNB(λ⁺, ν⁺, λ⁻, ν⁻), where ΔNB denotes the negative binomial difference distribution with probability mass function

$f_{\Delta NB}(r;\lambda^+,\nu^+,\lambda^-,\nu^-) = m \times \begin{cases} d^+ \, F(\nu^+ + r,\, \nu^-,\, r+1;\, \tilde\lambda^+ \tilde\lambda^-), & \text{if } r \ge 0, \\ d^- \, F(\nu^+,\, \nu^- - r,\, -r+1;\, \tilde\lambda^+ \tilde\lambda^-), & \text{if } r < 0, \end{cases}$

where

$m = (\tilde\nu^+)^{\nu^+} (\tilde\nu^-)^{\nu^-}, \qquad d^{[s]} = (\tilde\lambda^{[s]})^{r}\, (\nu^{[s]})_{r} \, / \, r!,$

$\tilde\nu^{[s]} = \frac{\nu^{[s]}}{\lambda^{[s]} + \nu^{[s]}}, \qquad \tilde\lambda^{[s]} = \frac{\lambda^{[s]}}{\lambda^{[s]} + \nu^{[s]}}, \qquad \text{for } [s] = +, -,$

and with the hypergeometric function

$F(a,b,c;z) = \sum_{n=0}^{\infty} \frac{(a)_n (b)_n}{(c)_n} \frac{z^n}{n!},$

where (x)_n is the Pochhammer symbol of the falling factorial, defined as

$(x)_n = x(x-1)(x-2)\cdots(x-n+1) = \frac{\Gamma(x+1)}{\Gamma(x-n+1)}.$

More details about the ΔNB distribution, its probability mass function, and its properties are provided by Barndorff-Nielsen, Pollard, and Shephard (2012). For example, the ΔNB distribution has first and second moments

$E(R) = \lambda^+ - \lambda^-, \qquad \operatorname{Var}(R) = \lambda^+ \left(1 + \frac{\lambda^+}{\nu^+}\right) + \lambda^- \left(1 + \frac{\lambda^-}{\nu^-}\right).$

The variables ν⁺, ν⁻, λ⁺, and λ⁻ are typically treated as unknown coefficients.

An important special case of the ΔNB distribution is its zero mean, symmetric version, which is obtained when λ = λ⁺ = λ⁻ and ν = ν⁺ = ν⁻. The probability mass function of the corresponding random variable R is given by

$f_0(r;\lambda,\nu) = \left(\frac{\nu}{\lambda+\nu}\right)^{2\nu} \left(\frac{\lambda}{\lambda+\nu}\right)^{|r|} \frac{\Gamma(\nu+|r|)}{\Gamma(\nu)\,\Gamma(|r|+1)} \, F\!\left(\nu+|r|,\, \nu,\, |r|+1;\, \left(\frac{\lambda}{\lambda+\nu}\right)^{2}\right).$

In this case, we have obtained a zero mean random variable R with variance

$\operatorname{Var}(R) = 2\lambda\left(1 + \frac{\lambda}{\nu}\right).$   (6)

We denote the distribution of the zero mean random variable R by ΔNB(λ, ν). This random variable R can alternatively be regarded as being generated from a compound Poisson process, that is,

$R = \sum_{i=1}^{N} M_i, \qquad N \sim \text{Poisson}\big(\lambda\,(z_1+z_2)\big), \qquad z_1, z_2 \sim \text{Ga}(\nu,\nu),$   (7)

with Ga(ν, ν) denoting the gamma distribution with shape and rate both equal to ν (so that its mean equals one), and where the indicator variable M_i is generated as

$M_i = \begin{cases} +1, & \text{with probability } P(M_i = 1) = z_1 / (z_1 + z_2), \\ -1, & \text{with probability } P(M_i = -1) = z_2 / (z_1 + z_2). \end{cases}$

We will use this representation of a zero mean ΔNB variable for the developments below and in our empirical study.
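As an informal check on this representation, the sketch below simulates zero mean ΔNB(λ, ν) draws through the compound Poisson construction in Equation (7) and compares the sample variance with Equation (6). The gamma mixing variables are drawn in shape–rate form so that they have mean one; the values of λ and ν are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_dnb(lam, nu, size, rng):
    """Zero-mean Delta-NB(lam, nu) draws via the compound Poisson
    representation in Equation (7)."""
    # mean-one gamma mixing variables (shape nu, rate nu)
    z1 = rng.gamma(shape=nu, scale=1.0 / nu, size=size)
    z2 = rng.gamma(shape=nu, scale=1.0 / nu, size=size)
    n = rng.poisson(lam * (z1 + z2))              # total number of jumps
    # each jump is +1 with probability z1/(z1+z2), otherwise -1;
    # the sum of n such marks equals 2*Binomial(n, p) - n
    p = z1 / (z1 + z2)
    return 2 * rng.binomial(n, p) - n

lam, nu = 3.0, 10.0
r = sample_dnb(lam, nu, size=200_000, rng=rng)
print("sample mean     :", r.mean())                     # close to 0
print("sample variance :", r.var())                      # ~ 2*lam*(1 + lam/nu)
print("theoretical var :", 2 * lam * (1 + lam / nu))     # Equation (6)
```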
In the empirical analyses of this study, we adopt the zero inflated version of the ΔNB distribution, because empirically we observe a clear overrepresentation of trade by trade price changes equal to zero. In the analysis of Rydberg and Shephard (2003), the zero changes are also treated explicitly, since they decompose a discrete price change into activity, direction, and size; all zero changes are treated as inactivity. Their decomposition model is particularly suited for the analysis of market microstructure. In our empirical modeling framework, we concentrate on the extraction of volatility from time series of discrete price changes. The number of zero price changes is especially high for the more liquid stocks. This is due to the relatively large volumes available at the best bid prices, so that the price impact of a single trade is much lower. The zero inflated version is accomplished by the specification of the random variable R_0 with

$R_0 = \begin{cases} r, & \text{with probability } (1-\gamma)\, f_{\Delta NB}(r;\lambda^+,\nu^+,\lambda^-,\nu^-), \\ 0, & \text{with probability } \gamma + (1-\gamma)\, f_{\Delta NB}(0;\lambda^+,\nu^+,\lambda^-,\nu^-), \end{cases}$

where f_ΔNB(r; λ⁺, ν⁺, λ⁻, ν⁻) is the probability mass function for r and 0 < γ < 1 is treated as a fixed and unknown coefficient. We denote the zero inflated ΔNB probability mass function by f_0.

The dynamic specification of the ΔNB distribution is obtained by letting ν^[s] and/or λ^[s] be time-varying random variables, for [s] = +, −. We opt for a time-varying λ^[s], since it is more natural for an intensity than for a degrees of freedom parameter to vary over time. The dynamic modeling of ν could also be interesting, but we leave this for future research. We restrict our analysis to the zero inflated zero mean ΔNB distribution f_0(r_t; λ_t, ν), and we further assume that the intensity parameters for positive and negative price changes are the same, that is, λ_t = λ_t⁺ = λ_t⁻. Taking the above considerations into account, the dynamic ΔNB model is specified as above but with λ_t = exp(h_t), where h_t is specified as in Equation (1) or (2). We recognize that exp(h_t/2) represents the standard deviation of the latent variable r_t* in our ordered probit model specification, whereas here we consider exp(h_t). However, the variance of a ΔNB random variable in Equation (6) depends on λ both linearly and quadratically, and hence we do not model the standard deviation of a ΔNB random variable directly. Moreover, λ_t needs to be a positive variable, which is enforced by the exponential function. The main reason for letting λ_t be time varying in this way is that it simplifies the derivation of the sampling scheme; in particular, the current specification is convenient for deriving the auxiliary mixture sampling method, see Section 2 for details.

1.5 Dynamic Skellam Model

The dynamic ΔNB model embeds the dynamic Skellam model as considered by Koopman, Lit, and Lucas (2017). It is obtained as the limiting case in which ν goes to infinity, that is, ν → ∞; for a derivation and further details, see Appendix A.

2 Bayesian Estimation Procedures

Bayesian estimation procedures for the ordered normal and ordered Student’s t stochastic volatility models are discussed by Müller and Czado (2009) and Stefanos (2015); their procedures are presented, with some details, in Appendix C. Here we develop a Bayesian estimation procedure for observations y_t, with t = 1, …, T, coming from the dynamic ΔNB model. We provide the details of the procedure and discuss its computational implementation. Our reference dynamic ΔNB model is given by

$y_t \sim f_0(y_t;\lambda_t,\nu), \qquad \lambda_t = \exp(h_t), \qquad h_t = \mu_h + s_t + x_t, \qquad s_t = w_t\beta, \qquad x_{t+1} = \phi x_t + \eta_t,$

where η_t ∼ N(0, σ_η²), for t = 1, …, T. The details of the model are discussed in Section 1. The parameters ν, μ_h, β, ϕ, and σ_η² are static, while x_t is a latent variable that is modeled as a stationary autoregressive process. The intradaily seasonal effect s_t is represented by a Poirier spline; see Appendix B. Our proposed Bayesian estimation procedure estimates all static parameters jointly with the time-varying signal h_1, …, h_T of the dynamic ΔNB model. It is based on Gibbs sampling, data augmentation, and auxiliary mixture sampling methods as developed by Frühwirth-Schnatter and Wagner (2006) and Frühwirth-Schnatter et al. (2009).
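The reference model can be summarized by the following simulation sketch, which generates a path of tick changes y_t from the dynamic zero inflated ΔNB specification; the seasonal spline s_t is omitted for brevity and the parameter values are taken close to those used in the simulation study of Section 2.5.

```python
import numpy as np

rng = np.random.default_rng(2)

# --- static parameters (close to the values used in the simulation study) ---
T, mu_h, phi, sigma_eta, gamma, nu = 20_000, -1.7, 0.97, np.sqrt(0.02), 0.1, 10.0

# --- log volatility: h_t = mu_h + s_t + x_t (seasonal part omitted here) ---
x = np.zeros(T)
x[0] = rng.normal(0.0, sigma_eta / np.sqrt(1 - phi**2))
for t in range(T - 1):
    x[t + 1] = phi * x[t] + rng.normal(0.0, sigma_eta)
lam = np.exp(mu_h + x)                      # lambda_t = exp(h_t)

# --- zero-mean Delta-NB(lambda_t, nu) observations via Equation (7) ---
z1 = rng.gamma(nu, 1.0 / nu, size=T)        # mean-one gamma mixing variables
z2 = rng.gamma(nu, 1.0 / nu, size=T)
n = rng.poisson(lam * (z1 + z2))            # number of +/-1 jumps per tick
y = 2 * rng.binomial(n, z1 / (z1 + z2)) - n

# --- zero inflation: with probability gamma the tick change is forced to 0 ---
y[rng.random(T) < gamma] = 0

print("share of zeros:", np.mean(y == 0), " sample std:", y.std())
```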
At each time point t, for t = 1, …, T, we introduce a set of latent auxiliary variables to facilitate the derivation of the conditional distributions. By introducing these auxiliary variables, we are able to specify the model as a linear state space model with non-Gaussian observation disturbances. Moreover, using an auxiliary mixture sampling procedure, we can conditionally obtain an approximating linear Gaussian state space model. In such a setting, we can exploit the highly efficient Kalman filtering and smoothing procedures for sampling full paths of the dynamic latent variables. These ingredients are key for a computationally feasible implementation of our estimation procedure.

2.1 Data Augmentation: Our Latent Auxiliary Variables

We use the following auxiliary variables for the data augmentation. We define N_t as the sum of NB+ and NB−, and we introduce the gamma mixing variables z_t1 and z_t2. Conditional on z_t1, z_t2, and the intensity λ_t, we can interpret N_t as the number of jumps of a Poisson process on [0,1] with intensity (z_t1 + z_t2)λ_t, based on the result in Equation (7). We introduce the latent arrival time τ_t2 of the N_t-th jump of the Poisson process and the interarrival time τ_t1 between the N_t-th and (N_t+1)-th jump, for every t = 1, …, T. The interarrival time τ_t1 follows an exponential distribution with intensity (z_t1 + z_t2)λ_t, while the N_t-th arrival time can be treated as a gamma distributed variable with distribution Ga(N_t, (z_t1 + z_t2)λ_t). We have

$\tau_{t1} = \frac{\xi_{t1}}{(z_{t1}+z_{t2})\lambda_t}, \quad \xi_{t1} \sim \text{Exp}(1), \qquad \tau_{t2} = \frac{\xi_{t2}}{(z_{t1}+z_{t2})\lambda_t}, \quad \xi_{t2} \sim \text{Ga}(N_t, 1),$

where we treat ξ_t1 and ξ_t2 as auxiliary variables. By taking logarithms and substituting the definition of log λ_t from Equation (2), we obtain

$-\log\tau_{t1} = \log(z_{t1}+z_{t2}) + \mu_h + s_t + x_t + \xi^*_{t1}, \qquad \xi^*_{t1} = -\log\xi_{t1},$
$-\log\tau_{t2} = \log(z_{t1}+z_{t2}) + \mu_h + s_t + x_t + \xi^*_{t2}, \qquad \xi^*_{t2} = -\log\xi_{t2}.$

These equations are linear in the state vector, which facilitates the use of the Kalman filter. However, the error terms ξ*_t1 and ξ*_t2 are non-normal. We adopt the solutions of Frühwirth-Schnatter and Wagner (2006) and Frühwirth-Schnatter et al. (2009), where the exponential and the negative log-gamma distributions are approximated by normal mixture distributions. In particular, we specify the approximations as

$f_{\xi^*}(x; N_t) \approx \sum_{i=1}^{C(N_t)} \omega_i(N_t)\, \phi\big(x; m_i(N_t), v_i(N_t)\big),$

where C(N_t) is the number of mixture components at time t, for t = 1, …, T, ω_i(N_t) is the corresponding weight, and ϕ(x; m, v) is the normal density for variable x with mean m and variance v. These approximations depend on N_t because the negative log-gamma distribution changes shape with the value of N_t.

2.2 Mixture Indicators for Obtaining a Conditional Linear Model

Conditionally on N, z_1, z_2, τ_1, τ_2, and the mixture indicators C = {c_tj : t = 1, …, T, j = 1, …, min(N_t + 1, 2)}, we can write the model in the state space form

$\tilde y_t = \begin{bmatrix} 1 & w_t & 1 \\ 1 & w_t & 1 \end{bmatrix} \begin{bmatrix} \mu_h \\ \beta \\ x_t \end{bmatrix} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, H_t),$

$\alpha_{t+1} = \begin{bmatrix} \mu_h \\ \beta \\ x_{t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & I_K & 0 \\ 0 & 0 & \phi \end{bmatrix} \begin{bmatrix} \mu_h \\ \beta \\ x_t \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ \eta_t \end{bmatrix}, \qquad \eta_t \sim N(0, \sigma_\eta^2),$

where the observation vector $\tilde y_t$ and the disturbance $\varepsilon_t$ have dimension min(N_t + 1, 2) × 1 (the second row of the observation equation is only present when N_t > 0), the state vector α_t has dimension (K + 2) × 1, and the initial state is distributed as

$\begin{bmatrix} \mu_h \\ \beta \\ x_1 \end{bmatrix} \sim N\!\left( \begin{bmatrix} \mu_0 \\ \beta_0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_\mu^2 & 0 & 0 \\ 0 & \sigma_\beta^2 I_K & 0 \\ 0 & 0 & \sigma_\eta^2/(1-\phi^2) \end{bmatrix} \right),$

with $H_t = \operatorname{diag}\big(v_{c_{t1}}(1),\, v_{c_{t2}}(N_t)\big)$ and

$\tilde y_t = \begin{pmatrix} -\log\tau_{t1} - m_{c_{t1}}(1) - \log(z_{t1}+z_{t2}) \\ -\log\tau_{t2} - m_{c_{t2}}(N_t) - \log(z_{t1}+z_{t2}) \end{pmatrix}.$

Using the normal mixture approximations of ξ*_t1 and ξ*_t2 allows us to build an efficient Gibbs sampling procedure in which we sample the latent state paths in one block using Kalman filtering and smoothing techniques.
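For a single time point, the construction of the auxiliary variables of Section 2.1 can be sketched as follows; the gamma draws are in shape–rate form and the numerical inputs are placeholders for the values available at the current sweep of the sampler.

```python
import numpy as np

rng = np.random.default_rng(3)

# Suppose, at a given time point t, the current MCMC draw contains:
lam_t = 0.8                     # intensity lambda_t = exp(h_t)
nu = 10.0
z1, z2 = rng.gamma(nu, 1 / nu), rng.gamma(nu, 1 / nu)   # gamma mixing variables
N_t = rng.poisson(lam_t * (z1 + z2))                    # number of jumps on [0,1]

# Latent arrival times of the conditional Poisson process with
# intensity (z1 + z2) * lambda_t:
#   tau_t1 : waiting time after the N_t-th jump   ~ Exp(rate)
#   tau_t2 : arrival time of the N_t-th jump      ~ Ga(N_t, rate)   (if N_t > 0)
rate = (z1 + z2) * lam_t
tau_t1 = rng.exponential(1.0 / rate)
tau_t2 = rng.gamma(N_t, 1.0 / rate) if N_t > 0 else None

# Linearized pseudo-observation: -log(tau_t1) - log(z1+z2) = h_t + xi*_t1,
# where xi*_t1 is a non-Gaussian error to be approximated by a normal mixture.
y_tilde_1 = -np.log(tau_t1) - np.log(z1 + z2)
print("pseudo-observation:", y_tilde_1, "  true h_t:", np.log(lam_t))
```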
2.3 Sampling of Event Times N_t

The remaining challenge is the sampling of N_t, as all other full conditionals are standard. We note that, conditionally on z_t1, z_t2, and the intensity λ_t, the N_t’s are independent over time. Using the shorthand notation v = (v_1, …, v_T) for the vector collecting a variable over all time periods, we can write

$p(N \mid \gamma,\nu,\mu_h,\phi,\sigma_\eta^2,s,x,z_1,z_2,y) = \prod_{t=1}^{T} p(N_t \mid \gamma,\lambda_t,z_{t1},z_{t2},y_t).$

For a given time index t, we can draw N_t from a discrete distribution with

$p(N_t \mid \gamma,\lambda_t,z_{t1},z_{t2},y_t) = \frac{p(N_t, y_t \mid \gamma,\lambda_t,z_{t1},z_{t2})}{p(y_t \mid \gamma,\lambda_t,z_{t1},z_{t2})} = \frac{p(y_t \mid N_t,\gamma,\lambda_t,z_{t1},z_{t2})\, p(N_t \mid \gamma,\lambda_t,z_{t1},z_{t2})}{p(y_t \mid \gamma,\lambda_t,z_{t1},z_{t2})} = \frac{\big[\gamma\,1\{y_t=0\} + (1-\gamma)\,p(y_t \mid N_t,\lambda_t,z_{t1},z_{t2})\big]\; p(N_t \mid \gamma,\lambda_t,z_{t1},z_{t2})}{p(y_t \mid \gamma,\lambda_t,z_{t1},z_{t2})}.$   (8)

The denominator in Equation (8) is a Skellam probability mass function with intensities λ_t z_t1 and λ_t z_t2. To calculate the probability p(y_t | N_t, λ_t, z_t1, z_t2) in the second term in the brackets in (8), we use Equation (7): conditionally on λ_t, z_t1, and z_t2, the observation y_t is generated by a marked Poisson process with marks

$M_i = \begin{cases} +1, & \text{with } P(M_i = 1) = \dfrac{z_{t1}}{z_{t1}+z_{t2}}, \\[6pt] -1, & \text{with } P(M_i = -1) = \dfrac{z_{t2}}{z_{t1}+z_{t2}}. \end{cases}$

This implies that we can represent y_t as $\sum_{i=1}^{N_t} M_i$, so that

$p(y_t \mid N_t,\lambda_t,z_{t1},z_{t2}) = \begin{cases} 0, & \text{if } |y_t| > N_t \text{ or } |y_t| \bmod 2 \neq N_t \bmod 2, \\[6pt] \dbinom{N_t}{\frac{N_t+y_t}{2}} \left(\dfrac{z_{t1}}{z_{t1}+z_{t2}}\right)^{\frac{N_t+y_t}{2}} \left(\dfrac{z_{t2}}{z_{t1}+z_{t2}}\right)^{\frac{N_t-y_t}{2}}, & \text{otherwise.} \end{cases}$

Furthermore, N_t conditionally on z_t1, z_t2, and λ_t is the number of events of a Poisson process on [0,1] with intensity (z_t1 + z_t2)λ_t; hence p(N_t | γ, λ_t, z_t1, z_t2) is the probability mass function of a Poisson random variable with intensity λ_t(z_t1 + z_t2). We can draw all N_t’s in parallel by drawing a vector u of uniform random variables with u_t ∼ U[0,1] and setting

$N_t = \min\Big\{n : u_t \le \sum_{i=0}^{n} p(i \mid \gamma,\lambda_t,z_{t1},z_{t2},y_t)\Big\}.$

2.4 Markov Chain Monte Carlo Algorithm

To complete our Bayesian specification, we need to specify the prior distributions of the model parameters, which we set as follows:

$\mu_h \sim N(0,10), \qquad \beta_i \sim N(0,1), \qquad \tfrac{\phi+1}{2} \sim B(20, 1.5),$   (9)

$\sigma_\eta^2 \sim IG(2.5, 0.025), \qquad \gamma \sim B(1.7, 10), \qquad \nu \sim G_{[2:0.2:128]}(15, 1.5),$   (10)

for i = 1, …, K, where N is the normal, B the beta, IG the inverse gamma, and G_{[2:0.2:128]} the gamma distribution restricted to a grid from 2 to 128 with a resolution of 0.2. The steps of the Markov chain Monte Carlo (MCMC) algorithm are outlined below, with more details provided in Appendix D.

1. Initialize μ_h, ϕ, σ_η², γ, ν, C, τ, N, z_1, z_2, s, and x.
2. Generate ϕ, σ_η², μ_h, s, and x from p(ϕ, σ_η², μ_h, s, x | γ, ν, C, τ, N, z_1, z_2, y):
   (a) draw ϕ, σ_η² from p(ϕ, σ_η² | γ, ν, C, τ, N, z_1, z_2, s, y);
   (b) draw μ_h, s, and x from p(μ_h, s, x | ϕ, σ_η², γ, ν, C, τ, N, z_1, z_2, y).
3. Generate γ from p(γ | ν, μ_h, ϕ, σ_η², x, C, τ, N, z_1, z_2, s, y).
4. Generate C, τ, N, z_1, z_2, ν from p(C, τ, N, z_1, z_2, ν | γ, μ_h, ϕ, σ_η², x, s, y):
   (a) draw ν from p(ν | γ, μ_h, ϕ, σ_η², x, s, y);
   (b) draw z_1, z_2 from p(z_1, z_2 | ν, γ, μ_h, ϕ, σ_η², x, s, y);
   (c) draw N from p(N | z_1, z_2, ν, γ, μ_h, ϕ, σ_η², x, s, y);
   (d) draw τ from p(τ | N, z_1, z_2, ν, γ, μ_h, ϕ, σ_η², x, s, y);
   (e) draw C from p(C | τ, N, z_1, z_2, ν, γ, μ_h, ϕ, σ_η², x, s, y).
5. Go to 2.

The estimation of s is based on the spline specification s_t = w_t β in Equation (3), where the 1 × K vector w_t can be treated as an exogenous weight vector and the K × 1 vector β contains the unknown spline values, which are treated as regression coefficients and need to be estimated.
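The only nonstandard full conditional in the algorithm above is that of N_t in Equation (8). The sketch below implements the inverse-CDF draw of Section 2.3 for a single time point, truncating the support at an assumed n_max; the inputs stand for the current values in the MCMC sweep.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def sample_N_t(y_t, lam_t, z1, z2, gamma, n_max=200, rng=rng):
    """Draw N_t from its discrete full conditional (Equation (8)) by
    enumerating n = 0, ..., n_max and inverting the CDF."""
    n = np.arange(n_max + 1)
    p_plus = z1 / (z1 + z2)                          # probability of a +1 mark
    # p(y_t | N_t = n, ...): zero unless n >= |y_t| and n, y_t share parity,
    # otherwise a binomial probability for the number of +1 marks needed.
    k = (n + y_t) / 2.0
    valid = (n >= abs(y_t)) & ((n - abs(y_t)) % 2 == 0)
    p_y_given_n = np.where(valid, stats.binom.pmf(np.round(k), n, p_plus), 0.0)
    # zero inflation: y_t = 0 can also come from the inflation component
    lik = gamma * (y_t == 0) + (1.0 - gamma) * p_y_given_n
    prior = stats.poisson.pmf(n, lam_t * (z1 + z2))  # p(N_t | lam_t, z1, z2)
    prob = lik * prior
    prob /= prob.sum()
    u = rng.random()
    return int(np.searchsorted(np.cumsum(prob), u))  # min{n : u <= cumulative}

print(sample_N_t(y_t=2, lam_t=1.5, z1=0.9, z2=1.1, gamma=0.1))
```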
2.5 Simulation Study

To validate our estimation procedure for the dynamic Skellam and ΔNB models, we independently perform the following experiment 50 times. We simulate 20,000 observations from the model to be estimated and carry out the MCMC sampling based on 20,000 draws after a burn-in of 20,000 draws. The true parameters are set to μ = −1.7, ϕ = 0.97, σ_η² = 0.02, γ = 0.1, and ν = 10, which are close to those estimated from the real data in our empirical study of Section 3. A single experiment takes approximately 5 hours on a 2.90 GHz CPU.

Table 1 presents the posterior means, standard deviations, and 95% credible intervals, averaged over the 50 Monte Carlo replications. It also shows the mean inefficiency factors and their standard deviations across the experiments. Figure 1 illustrates the estimation results on a subsample of the initial 15 Monte Carlo replications, while Figure 2 depicts the posterior densities of the parameters from a single simulation. These results indicate that, in our stylized setting, the algorithm estimates the parameters accurately, as the true values lie within the highest posterior density regions. The posterior distributions of the autoregressive coefficient ϕ and the state variance σ_η² appear to be the most challenging to estimate efficiently, as their inefficiency factors are high. Nevertheless, the accuracy of their estimates is satisfactory, with the true values on average lying within the 95% credible intervals.

Table 1. Posterior means, standard deviations (in parentheses), 95% central credible intervals (in brackets, 95% CI), and mean inefficiency factors (IF) with their standard deviations, averaged over 50 Monte Carlo replications, for M = 20,000 posterior draws after a burn-in of 20,000 draws and T = 20,000 observations generated from the ΔNB model

Parameter   True      Mean      Std        95% CI                 Mean IF     Std IF
μ           −1.7000   −1.7180   (0.0457)   [−1.8083, −1.6287]     177.2659    (60.5684)
ϕ           0.9700    0.9701    (0.0041)   [0.9614, 0.9774]       354.3270    (101.5417)
σ_η²        0.0200    0.0200    (0.0032)   [0.0145, 0.0270]       552.3604    (153.5302)
γ           0.1000    0.0901    (0.0182)   [0.0526, 0.1242]       312.7835    (116.7197)
β1          1.0652    1.0652    (0.0948)   [0.8790, 1.2511]       20.3728     (3.7064)
β2          −0.8538   −0.8538   (0.0448)   [−0.9420, −0.7662]     22.2124     (4.5653)
ν           10.0000   9.7185    (2.3329)   [5.8333, 14.8933]      135.1976    (51.5453)
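The inefficiency factors reported in Table 1, and in the empirical tables below, measure how strongly the posterior draws are autocorrelated. One simple estimator, sketched here for illustration, is one plus twice the sum of the sample autocorrelations of the chain; the cutoff rule below is a simplification of the windowed estimators that are commonly used.

```python
import numpy as np

def inefficiency_factor(draws, max_lag=1000):
    """Estimate the inefficiency factor 1 + 2 * sum of autocorrelations of an
    MCMC chain (larger values indicate slower mixing)."""
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    T = x.size
    acov = np.correlate(x, x, mode="full")[T - 1:] / T
    rho = acov[1:max_lag + 1] / acov[0]
    # stop summing once the autocorrelation first drops below zero
    neg = np.where(rho < 0)[0]
    if neg.size:
        rho = rho[:neg[0]]
    return 1.0 + 2.0 * rho.sum()

# Example: an AR(1) chain with persistence 0.95 has IF near (1+0.95)/(1-0.95) = 39
rng = np.random.default_rng(5)
chain = np.empty(20_000)
chain[0] = 0.0
for t in range(chain.size - 1):
    chain[t + 1] = 0.95 * chain[t] + rng.normal()
print("estimated IF:", inefficiency_factor(chain))
```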
Figure 1. Bar plots of the posterior draws in a subsample of the last 15 experiments from our Monte Carlo study.

Figure 2. Posterior distributions of the parameters from a dynamic ΔNB model based on 20,000 observations and 20,000 iterations after a burn-in of 20,000. Each panel shows the histogram of the posterior draws, the kernel density estimate of the posterior distribution, the HPD region, and the posterior mean. The true parameters are μ = −1.7, ϕ = 0.97, σ_η² = 0.02, γ = 0.1, and ν = 10.

3 Empirical Study

In this section, we present and discuss the empirical findings from our analyses of tick by tick price changes for two stocks traded at the NYSE, for two different periods. In particular, we analyze the bid prices that correspond to transactions, in order to account for bid-ask bounces. We consider two model classes with two models each. The first set consists of the ordered probit models with normal and Student’s t stochastic volatility. The second set includes the dynamic Skellam and dynamic ΔNB models. The analyses include in-sample and out-of-sample marginal likelihood comparisons of the models.

The aims of our empirical study are twofold. First, we investigate the usefulness of the ΔNB model on a challenging dataset; in particular, we validate our estimation procedure and reveal possible shortcomings in the estimation of the parameters of the ΔNB model. Second, we want to establish the differences between models based on heavy-tailed distributions (ordered t and ΔNB models) and models that are not (ordered normal and dynamic Skellam models). We also compare the two model classes: ordered models versus integer distribution models.

3.1 Data

We have access to the Thomson Reuters Sirca dataset, which contains all trades and quotes with millisecond time stamps for all stocks listed at the NYSE. We have collected the data for International Business Machines (IBM) and Coca-Cola (KO). These stocks differ in liquidity and in price magnitude. In our study we concentrate on two weeks of price changes: the first week of October 2008 and the last week of April 2010. These weeks exhibit different market sentiments and volatility characteristics. October 2008 lies in the middle of the 2008 financial crisis, with record high volatilities; some markets experienced in October 2008 their worst week since 1929. April 2010 is a much calmer month with low volatilities. To avoid some of the issues related to microstructure noise in high-frequency price changes, including bid-ask bounces, we analyze the bid prices of transactions.

The cleaning process of the data consists of a number of filtering steps that are similar to the procedures described in Boudt, Cornelissen, and Payseur (2012), Barndorff-Nielsen et al. (2008), and Brownlees and Gallo (2006). First, we remove all quote-only entries, which form a large portion of the data; by excluding the quotes we lose around 70–90% of the observations.
In the next step, we delete the trades with missing or zero prices or volumes. We also restrict our analysis to the trading period from 09:30 to 16:00. The fourth step is to aggregate the trades that have the same time stamp: when there are multiple trades at the same millisecond, we take the trade with the last sequence number and regard its bid price as the bid price observed at that millisecond. Finally, we treat outliers by following the rules suggested by Barndorff-Nielsen et al. (2008). Table 2 presents the descriptive statistics for the resulting bid price data from the 3rd to 10th October 2008 and from the 23rd to 30th April 2010, respectively. A more detailed account of the cleaning process can be found in Tables E.1 and E.2 in Appendix E.

We treat the periods from the 3rd to 9th October 2008 and from the 23rd to 29th April 2010 as the in-sample periods. The two out-of-sample periods are 10th October 2008 and 30th April 2010. Figure 3 presents the empirical distributions of the tick by tick log returns as well as the tick returns and the fitted Skellam probability mass function (pmf). For the two stocks considered, IBM and KO, there is a nontrivial number of tick returns larger than 10 in absolute terms. Moreover, we find that the Skellam distribution is too lightly tailed to capture the fat tails of the bid price data.

Table 2. Descriptive statistics of the bid prices for IBM and KO from 3rd to 10th October 2008 (top) and from 23rd to 30th April 2010 (bottom)

October 2008        IBM                     KO
                In         Out         In         Out
Num. obs        68 002     20 800      70 356     25 036
Avg. price      96.7955    87.5832     49.2031    41.8750
Mean            −0.0176    −0.0013     −0.0103    0.0046
Std             5.7768     6.3142      1.8334     2.6755
Min             −181       −89         −44        −47
Max             213        169         51         65
% 0             50.1735    48.2981     53.4937    48.2385
% ±1            8.7233     8.0673      22.0081    17.9022
% ±2–10         34.4931    34.6346     24.2481    33.0564

April 2010          IBM                     KO
                In         Out         In         Out
Num. obs        43 606     8 587       34 469     6 073
Avg. price      130.1758   129.5754    53.6275    53.7317
Mean            0.0014     −0.0181     −0.0029    −0.0061
Std             1.2883     1.3367      0.5971     0.6691
Min             −21        −18         −9         −4
Max             36         10          8          5
% 0             61.6956    60.2888     75.2734    69.3891
% ±1            23.5472    22.6505     22.2316    26.8730
% ±2–10         14.6883    17.0374     2.4950     3.7379

Notes: The column “In” displays the statistics for the in-sample period (3rd to 9th October 2008 in the top panel, 23rd to 29th April 2010 in the bottom panel), while the column “Out” displays the descriptives for the out-of-sample period (10th October 2008 and 30th April 2010, respectively). We show the number of observations (Num. obs), average price (Avg. price), mean price change (Mean), standard deviation of price changes (Std), minimum and maximum integer price changes (Min, Max), as well as the percentage of zero price changes (% 0), the percentage of −1 and +1 price changes (% ±1), and the percentage of price changes between 2 and 10 in absolute terms (% ±2–10) in the sample.
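For concreteness, the cleaning steps described above can be sketched with pandas on a toy set of records. The column names (timestamp, type, price, volume, bid, seq_no) are hypothetical and the actual Sirca field layout differs; the outlier rules of Barndorff-Nielsen et al. (2008) are not reproduced here.

```python
import pandas as pd

# A tiny stand-in for the raw Sirca records; treat the column names as
# hypothetical placeholders used for illustration only.
raw = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2008-10-03 09:29:59.900", "2008-10-03 09:30:00.010",
        "2008-10-03 09:30:00.010", "2008-10-03 09:30:00.350",
        "2008-10-03 09:30:01.200",
    ]),
    "type":   ["QUOTE", "TRADE", "TRADE", "TRADE", "TRADE"],
    "price":  [None, 96.80, 96.81, 96.79, 96.82],
    "volume": [None, 100, 200, 0, 300],
    "bid":    [96.78, 96.79, 96.80, 96.78, 96.81],
    "seq_no": [1, 2, 3, 4, 5],
})

trades = (
    raw[raw["type"] == "TRADE"]                       # 1. drop quote-only rows
      .dropna(subset=["price", "volume", "bid"])      # 2. drop missing fields ...
      .query("price > 0 and volume > 0")              #    ... and zero prices/volumes
      .set_index("timestamp")
      .between_time("09:30", "16:00")                 # 3. regular trading hours only
      .reset_index()
      .sort_values(["timestamp", "seq_no"])
      .groupby("timestamp", as_index=False).last()    # 4. last record per millisecond
)

# Tick-by-tick integer changes of the bid price, in units of one cent
trades["tick_change"] = (trades["bid"].diff() * 100).round()
print(trades[["timestamp", "bid", "tick_change"]])
```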
Figure 3. Empirical distributions of bid prices for IBM and KO stocks during the October 2008 period.

3.2 Estimation Results

We start our analyses with the dynamic Skellam and ΔNB models for all considered stocks in the periods from the 3rd to 9th October 2008 and from the 23rd to 29th April 2010. We adopt the same prior specifications as in the simulation study, given in (9) and (10). In the MCMC procedure, we draw 40,000 samples from the Markov chain and discard the first 20,000 draws as burn-in samples. The parameter estimation results for the 2008 period are reported in Table 3(a and b), and those for the 2010 period in Table 4(a and b).

Table 3. Posterior means, standard deviations (Std, in parentheses), the 95% central credible intervals (95%, in brackets), and the inefficiency factors (IF) for M = 20,000 posterior draws after a burn-in of 20,000 draws

(a) 2008 IBM data
Param.        OrdN                  Ordt                  Sk                    ΔNB
μ     Mean    3.4314                3.3785                0.5416                1.2071
      Std     (0.0338)              (0.0384)              (0.0096)              (0.0094)
      95%     [3.3651, 3.4976]      [3.3023, 3.4539]      [0.5229, 0.5604]      [1.1886, 1.2256]
      IF      29.2717               29.2612               385.2575              556.3769
ϕ     Mean    0.9532                0.9695                0.5101                0.6462
      Std     (0.0024)              (0.0032)              (0.0092)              (0.0083)
      95%     [0.9480, 0.9578]      [0.9614, 0.9748]      [0.4943, 0.5294]      [0.6337, 0.6637]
      IF      123.1848              1280.1052             1611.7549             2120.0977
σ_η²  Mean    0.1259                0.0653                0.5437                0.2280
      Std     (0.0073)              (0.0092)              (0.0082)              (0.0041)
      95%     [0.1118, 0.1403]      [0.0505, 0.0874]      [0.5278, 0.5622]      [0.2191, 0.2348]
      IF      216.0429              1520.1055             2003.2192             2653.8330
γ     Mean    0.4500                0.4523                0.3238                0.4190
      Std     (0.0022)              (0.0022)              (0.0025)              (0.0023)
      95%     [0.4457, 0.4544]      [0.4478, 0.4565]      [0.3194, 0.3285]      [0.4144, 0.4236]
      IF      17.3129               63.3734               499.1635              72.3874
β1    Mean    0.5085                0.5148                0.2055                0.2078
      Std     (0.0635)              (0.0691)              (0.0152)              (0.0158)
      95%     [0.3844, 0.6330]      [0.3786, 0.6505]      [0.1755, 0.2349]      [0.1766, 0.2389]
      IF      5.6089                5.3234                31.4529               150.0026
β2    Mean    −0.1528               −0.1628               −0.0629               −0.0785
      Std     (0.0546)              (0.0597)              (0.0136)              (0.0133)
      95%     [−0.2608, −0.0466]    [−0.2804, −0.0440]    [−0.0898, −0.0366]    [−0.1043, −0.0519]
      IF      5.9121                5.1946                48.2788               98.3510
ν     Mean    –                     14.0288               –                     2.2000
      Std     –                     (2.5125)              –                     (0.0000)
      95%     –                     [10.5000, 20.6000]    –                     [2.2000, 2.2000]
      IF      –                     1218.1696             –                     –

(b) 2008 KO data

Param.        OrdN                  Ordt                  Sk                    ΔNB
μ     Mean    1.1263                1.1052                −0.4523               0.3588
      Std     (0.0329)              (0.0364)              (0.0158)              (0.0332)
      95%     [1.0623, 1.1911]      [1.0343, 1.1758]      [−0.4836, −0.4216]    [0.2943, 0.4246]
      IF      15.4805               25.6141               48.1751               245.6923
ϕ     Mean    0.9709                0.9763                0.9477                0.9781
      Std     (0.0019)              (0.0025)              (0.0029)              (0.0015)
      95%     [0.9672, 0.9747]      [0.9712, 0.9810]      [0.9417, 0.9537]      [0.9752, 0.9807]
      IF      155.0004              762.9785              441.9543              384.1333
σ_η²  Mean    0.0474                0.0360                0.0329                0.0252
      Std     (0.0036)              (0.0047)              (0.0020)              (0.0017)
      95%     [0.0404, 0.0544]      [0.0268, 0.0455]      [0.0291, 0.0369]      [0.0221, 0.0284]
      IF      210.4694              904.4350              775.7113              765.3316
γ     Mean    0.3725                0.3730                0.1896                0.3496
      Std     (0.0030)              (0.0030)              (0.0032)              (0.0032)
      95%     [0.3665, 0.3785]      [0.3671, 0.3787]      [0.1835, 0.1962]      [0.3433, 0.3556]
      IF      54.4008               90.2672               239.3389              556.3973
β1    Mean    0.7383                0.7524                0.3754                0.6454
      Std     (0.0558)              (0.0598)              (0.0260)              (0.0544)
      95%     [0.6300, 0.8482]      [0.6359, 0.8719]      [0.3245, 0.4263]      [0.5411, 0.7537]
      IF      4.7332                5.2416                7.0623                30.0869
β2    Mean    −0.3053               −0.3092               −0.1351               −0.2406
      Std     (0.0557)              (0.0589)              (0.0264)              (0.0534)
      95%     [−0.4148, −0.1978]    [−0.4241, −0.1939]    [−0.1866, −0.0832]    [−0.3464, −0.1372]
      IF      4.2964                4.3154                8.2317                13.7685
ν     Mean    –                     49.9920               –                     17.5968
      Std     –                     (21.6531)             –                     (2.2173)
      95%     –                     [25.0000, 111.8500]   –                     [14.2000, 23.0000]
      IF      –                     561.1726              –                     839.0705
Table 4. Posterior means, standard deviations (Std, in parentheses), the 95% central credible intervals (95%, in brackets), and the inefficiency factors (IF) for M = 20,000 posterior draws after a burn-in of 20,000 draws

(a) 2010 IBM data
Param.        OrdN                  Ordt                  Sk                    ΔNB
μ     Mean    0.5588                0.3051                −0.7590               −0.3193
      Std     (0.0429)              (0.0801)              (0.0298)              (0.0521)
      95%     [0.4750, 0.6430]      [0.1500, 0.4654]      [−0.8169, −0.6999]    [−0.4184, −0.2130]
      IF      60.3543               14.1064               74.1764               337.6589
ϕ     Mean    0.9791                0.9959                0.9862                0.9923
      Std     (0.0030)              (0.0009)              (0.0021)              (0.0014)
      95%     [0.9734, 0.9842]      [0.9942, 0.9975]      [0.9814, 0.9902]      [0.9897, 0.9950]
      IF      556.1437              175.6690              683.1293              1071.2396
σ_η²  Mean    0.0234                0.0035                0.0050                0.0042
      Std     (0.0039)              (0.0006)              (0.0008)              (0.0008)
      95%     [0.0168, 0.0320]      [0.0024, 0.0048]      [0.0035, 0.0068]      [0.0027, 0.0058]
      IF      650.2627              278.8649              774.4566              1359.5545
γ     Mean    0.4473                0.4317                0.2776                0.3879
      Std     (0.0040)              (0.0042)              (0.0055)              (0.0068)
      95%     [0.4395, 0.4552]      [0.4236, 0.4397]      [0.2677, 0.2882]      [0.3754, 0.4021]
      IF      146.9604              52.4582               573.3739              915.2304
β1    Mean    0.4733                0.6502                0.2909                0.4351
      Std     (0.0711)              (0.1283)              (0.0494)              (0.0814)
      95%     [0.3353, 0.6137]      [0.4141, 0.9189]      [0.1964, 0.3897]      [0.2829, 0.6037]
      IF      5.6478                25.0358               12.4851               134.3036
β2    Mean    0.3954                0.3880                0.2743                0.3286
      Std     (0.0729)              (0.1358)              (0.0512)              (0.0831)
      95%     [0.2512, 0.5374]      [0.1179, 0.6506]      [0.1733, 0.3753]      [0.1618, 0.4893]
      IF      4.2390                3.7980                7.1545                9.8248
ν     Mean    –                     7.4336                –                     3.8353
      Std     –                     (0.4564)              –                     (0.4580)
      95%     –                     [6.6000, 8.4000]      –                     [3.2000, 4.8000]
      IF      –                     219.8048              –                     947.4361

(b) 2010 KO data

Param.        OrdN                  Ordt                  Sk                    ΔNB
μ     Mean    −1.1537               −1.2438               −2.1331               −2.0890
      Std     (0.0479)              (0.0616)              (0.0564)              (0.0585)
      95%     [−1.2451, −1.0579]    [−1.3648, −1.1230]    [−2.2425, −2.0217]    [−2.2025, −1.9734]
      IF      42.2106               80.5260               20.0063               88.9444
ϕ     Mean    0.9798                0.9887                0.9926                0.9903
      Std     (0.0031)              (0.0019)              (0.0014)              (0.0017)
      95%     [0.9731, 0.9852]      [0.9848, 0.9922]      [0.9897, 0.9952]      [0.9866, 0.9934]
      IF      185.6574              222.6474              284.1253              595.5608
σ_η²  Mean    0.0187                0.0082                0.0041                0.0065
      Std     (0.0034)              (0.0015)              (0.0007)              (0.0011)
      95%     [0.0129, 0.0270]      [0.0057, 0.0115]      [0.0028, 0.0057]      [0.0046, 0.0089]
      IF      236.9756              293.7111              392.8999              914.2363
γ     Mean    0.3608                0.3449                0.0025                0.0224
      Std     (0.0095)              (0.0116)              (0.0019)              (0.0185)
      95%     [0.3417, 0.3789]      [0.3220, 0.3679]      [0.0003, 0.0075]      [0.0003, 0.0678]
      IF      136.1222              157.5518              2058.0548             496.8247
β1    Mean    0.7304                0.7645                0.6191                0.6216
      Std     (0.0707)              (0.0832)              (0.0917)              (0.0833)
      95%     [0.5916, 0.8713]      [0.6064, 0.9319]      [0.4508, 0.8125]      [0.4661, 0.7938]
      IF      9.5713                12.5510               50.9314               50.0410
β2    Mean    0.3251                0.2992                0.4172                0.4261
      Std     (0.0780)              (0.0904)              (0.0973)              (0.0939)
      95%     [0.1716, 0.4766]      [0.1230, 0.4773]      [0.2228, 0.6084]      [0.2404, 0.6105]
      IF      4.5545                4.2750                8.2049                9.4661
ν     Mean    –                     17.4954               –                     8.4201
      Std     –                     (3.6575)              –                     (1.8791)
      95%     –                     [12.3000, 26.6000]    –                     [5.6000, 12.8000]
      IF      –                     389.5012              –                     246.6756
OrdN Ordt Sk ΔNB μ Mean 0.5588 0.3051 −0.7590 −0.3193 Std (0.0429) (0.0801) (0.0298) (0.0521) 95% [0.4750,0.6430] [0.1500,0.4654] [−0.8169,−0.6999] [−0.4184,−0.2130] IF 60.3543 14.1064 74.1764 337.6589 ϕ Mean 0.9791 0.9959 0.9862 0.9923 Std (0.0030) (0.0009) (0.0021) (0.0014) 95% [0.9734,0.9842] [0.9942,0.9975] [0.9814,0.9902] [0.9897,0.9950] IF 556.1437 175.6690 683.1293 1071.2396 ση2 Mean 0.0234 0.0035 0.0050 0.0042 Std (0.0039) (0.0006) (0.0008) (0.0008) 95% [0.0168,0.0320] [0.0024,0.0048] [0.0035,0.0068] [0.0027,0.0058] IF 650.2627 278.8649 774.4566 1359.5545 γ Mean 0.4473 0.4317 0.2776 0.3879 Std (0.0040) (0.0042) (0.0055) (0.0068) 95% [0.4395,0.4552] [0.4236,0.4397] [0.2677,0.2882] [0.3754,0.4021] IF 146.9604 52.4582 573.3739 915.2304 β1 Mean 0.4733 0.6502 0.2909 0.4351 Std (0.0711) (0.1283) (0.0494) (0.0814) 95% [0.3353,0.6137] [0.4141,0.9189] [0.1964,0.3897] [0.2829,0.6037] IF 5.6478 25.0358 12.4851 134.3036 β2 Mean 0.3954 0.3880 0.2743 0.3286 Std (0.0729) (0.1358) (0.0512) (0.0831) 95% [0.2512,0.5374] [0.1179,0.6506] [0.1733,0.3753] [0.1618,0.4893] IF 4.2390 3.7980 7.1545 9.8248 ν Mean 7.4336 3.8353 Std (0.4564) (0.4580) 95% [6.6000,8.4000] [3.2000,4.8000] IF 219.8048 947.4361 (b) 2010 KO data μ Mean −1.1537 −1.2438 −2.1331 −2.0890 Std (0.0479) (0.0616) (0.0564) (0.0585) 95% [−1.2451,−1.0579] [−1.3648,−1.1230] [−2.2425,−2.0217] [−2.2025,−1.9734] IF 42.2106 80.5260 20.0063 88.9444 ϕ Mean 0.9798 0.9887 0.9926 0.9903 Std (0.0031) (0.0019) (0.0014) (0.0017) 95% [0.9731,0.9852] [0.9848,0.9922] [0.9897,0.9952] [0.9866,0.9934] IF 185.6574 222.6474 284.1253 595.5608 ση2 Mean 0.0187 0.0082 0.0041 0.0065 Std (0.0034) (0.0015) (0.0007) (0.0011) 95% [0.0129,0.0270] [0.0057,0.0115] [0.0028,0.0057] [0.0046,0.0089] IF 236.9756 293.7111 392.8999 914.2363 γ Mean 0.3608 0.3449 0.0025 0.0224 Std (0.0095) (0.0116) (0.0019) (0.0185) 95% [0.3417,0.3789] [0.3220,0.3679] [0.0003,0.0075] [0.0003,0.0678] IF 136.1222 157.5518 2058.0548 496.8247 β1 Mean 0.7304 0.7645 0.6191 0.6216 Std (0.0707) (0.0832) (0.0917) (0.0833) 95% [0.5916,0.8713] [0.6064,0.9319] [0.4508,0.8125] [0.4661,0.7938] IF 9.5713 12.5510 50.9314 50.0410 β2 Mean 0.3251 0.2992 0.4172 0.4261 Std (0.0780) (0.0904) (0.0973) (0.0939) 95% [0.1716,0.4766] [0.1230,0.4773] [0.2228,0.6084] [0.2404,0.6105] IF 4.5545 4.2750 8.2049 9.4661 ν Mean 17.4954 8.4201 Std (3.6575) (1.8791) 95% [12.3000,26.6000] [5.6000,12.8000] IF 389.5012 246.6756 Table 4. Posterior means, standard deviations (Std, in parentheses), the 95% central credible intervals (95%, in brackets), and the inefficiency factors (IF) for M = 20,000 posterior draws after a burn-in of 20,000 draws (b) 2010 KO data Param. 
OrdN Ordt Sk ΔNB μ Mean 0.5588 0.3051 −0.7590 −0.3193 Std (0.0429) (0.0801) (0.0298) (0.0521) 95% [0.4750,0.6430] [0.1500,0.4654] [−0.8169,−0.6999] [−0.4184,−0.2130] IF 60.3543 14.1064 74.1764 337.6589 ϕ Mean 0.9791 0.9959 0.9862 0.9923 Std (0.0030) (0.0009) (0.0021) (0.0014) 95% [0.9734,0.9842] [0.9942,0.9975] [0.9814,0.9902] [0.9897,0.9950] IF 556.1437 175.6690 683.1293 1071.2396 ση2 Mean 0.0234 0.0035 0.0050 0.0042 Std (0.0039) (0.0006) (0.0008) (0.0008) 95% [0.0168,0.0320] [0.0024,0.0048] [0.0035,0.0068] [0.0027,0.0058] IF 650.2627 278.8649 774.4566 1359.5545 γ Mean 0.4473 0.4317 0.2776 0.3879 Std (0.0040) (0.0042) (0.0055) (0.0068) 95% [0.4395,0.4552] [0.4236,0.4397] [0.2677,0.2882] [0.3754,0.4021] IF 146.9604 52.4582 573.3739 915.2304 β1 Mean 0.4733 0.6502 0.2909 0.4351 Std (0.0711) (0.1283) (0.0494) (0.0814) 95% [0.3353,0.6137] [0.4141,0.9189] [0.1964,0.3897] [0.2829,0.6037] IF 5.6478 25.0358 12.4851 134.3036 β2 Mean 0.3954 0.3880 0.2743 0.3286 Std (0.0729) (0.1358) (0.0512) (0.0831) 95% [0.2512,0.5374] [0.1179,0.6506] [0.1733,0.3753] [0.1618,0.4893] IF 4.2390 3.7980 7.1545 9.8248 ν Mean 7.4336 3.8353 Std (0.4564) (0.4580) 95% [6.6000,8.4000] [3.2000,4.8000] IF 219.8048 947.4361 (b) 2010 KO data μ Mean −1.1537 −1.2438 −2.1331 −2.0890 Std (0.0479) (0.0616) (0.0564) (0.0585) 95% [−1.2451,−1.0579] [−1.3648,−1.1230] [−2.2425,−2.0217] [−2.2025,−1.9734] IF 42.2106 80.5260 20.0063 88.9444 ϕ Mean 0.9798 0.9887 0.9926 0.9903 Std (0.0031) (0.0019) (0.0014) (0.0017) 95% [0.9731,0.9852] [0.9848,0.9922] [0.9897,0.9952] [0.9866,0.9934] IF 185.6574 222.6474 284.1253 595.5608 ση2 Mean 0.0187 0.0082 0.0041 0.0065 Std (0.0034) (0.0015) (0.0007) (0.0011) 95% [0.0129,0.0270] [0.0057,0.0115] [0.0028,0.0057] [0.0046,0.0089] IF 236.9756 293.7111 392.8999 914.2363 γ Mean 0.3608 0.3449 0.0025 0.0224 Std (0.0095) (0.0116) (0.0019) (0.0185) 95% [0.3417,0.3789] [0.3220,0.3679] [0.0003,0.0075] [0.0003,0.0678] IF 136.1222 157.5518 2058.0548 496.8247 β1 Mean 0.7304 0.7645 0.6191 0.6216 Std (0.0707) (0.0832) (0.0917) (0.0833) 95% [0.5916,0.8713] [0.6064,0.9319] [0.4508,0.8125] [0.4661,0.7938] IF 9.5713 12.5510 50.9314 50.0410 β2 Mean 0.3251 0.2992 0.4172 0.4261 Std (0.0780) (0.0904) (0.0973) (0.0939) 95% [0.1716,0.4766] [0.1230,0.4773] [0.2228,0.6084] [0.2404,0.6105] IF 4.5545 4.2750 8.2049 9.4661 ν Mean 17.4954 8.4201 Std (3.6575) (1.8791) 95% [12.3000,26.6000] [5.6000,12.8000] IF 389.5012 246.6756 (b) 2010 KO data Param. 
OrdN Ordt Sk ΔNB μ Mean 0.5588 0.3051 −0.7590 −0.3193 Std (0.0429) (0.0801) (0.0298) (0.0521) 95% [0.4750,0.6430] [0.1500,0.4654] [−0.8169,−0.6999] [−0.4184,−0.2130] IF 60.3543 14.1064 74.1764 337.6589 ϕ Mean 0.9791 0.9959 0.9862 0.9923 Std (0.0030) (0.0009) (0.0021) (0.0014) 95% [0.9734,0.9842] [0.9942,0.9975] [0.9814,0.9902] [0.9897,0.9950] IF 556.1437 175.6690 683.1293 1071.2396 ση2 Mean 0.0234 0.0035 0.0050 0.0042 Std (0.0039) (0.0006) (0.0008) (0.0008) 95% [0.0168,0.0320] [0.0024,0.0048] [0.0035,0.0068] [0.0027,0.0058] IF 650.2627 278.8649 774.4566 1359.5545 γ Mean 0.4473 0.4317 0.2776 0.3879 Std (0.0040) (0.0042) (0.0055) (0.0068) 95% [0.4395,0.4552] [0.4236,0.4397] [0.2677,0.2882] [0.3754,0.4021] IF 146.9604 52.4582 573.3739 915.2304 β1 Mean 0.4733 0.6502 0.2909 0.4351 Std (0.0711) (0.1283) (0.0494) (0.0814) 95% [0.3353,0.6137] [0.4141,0.9189] [0.1964,0.3897] [0.2829,0.6037] IF 5.6478 25.0358 12.4851 134.3036 β2 Mean 0.3954 0.3880 0.2743 0.3286 Std (0.0729) (0.1358) (0.0512) (0.0831) 95% [0.2512,0.5374] [0.1179,0.6506] [0.1733,0.3753] [0.1618,0.4893] IF 4.2390 3.7980 7.1545 9.8248 ν Mean 7.4336 3.8353 Std (0.4564) (0.4580) 95% [6.6000,8.4000] [3.2000,4.8000] IF 219.8048 947.4361 (b) 2010 KO data μ Mean −1.1537 −1.2438 −2.1331 −2.0890 Std (0.0479) (0.0616) (0.0564) (0.0585) 95% [−1.2451,−1.0579] [−1.3648,−1.1230] [−2.2425,−2.0217] [−2.2025,−1.9734] IF 42.2106 80.5260 20.0063 88.9444 ϕ Mean 0.9798 0.9887 0.9926 0.9903 Std (0.0031) (0.0019) (0.0014) (0.0017) 95% [0.9731,0.9852] [0.9848,0.9922] [0.9897,0.9952] [0.9866,0.9934] IF 185.6574 222.6474 284.1253 595.5608 ση2 Mean 0.0187 0.0082 0.0041 0.0065 Std (0.0034) (0.0015) (0.0007) (0.0011) 95% [0.0129,0.0270] [0.0057,0.0115] [0.0028,0.0057] [0.0046,0.0089] IF 236.9756 293.7111 392.8999 914.2363 γ Mean 0.3608 0.3449 0.0025 0.0224 Std (0.0095) (0.0116) (0.0019) (0.0185) 95% [0.3417,0.3789] [0.3220,0.3679] [0.0003,0.0075] [0.0003,0.0678] IF 136.1222 157.5518 2058.0548 496.8247 β1 Mean 0.7304 0.7645 0.6191 0.6216 Std (0.0707) (0.0832) (0.0917) (0.0833) 95% [0.5916,0.8713] [0.6064,0.9319] [0.4508,0.8125] [0.4661,0.7938] IF 9.5713 12.5510 50.9314 50.0410 β2 Mean 0.3251 0.2992 0.4172 0.4261 Std (0.0780) (0.0904) (0.0973) (0.0939) 95% [0.1716,0.4766] [0.1230,0.4773] [0.2228,0.6084] [0.2404,0.6105] IF 4.5545 4.2750 8.2049 9.4661 ν Mean 17.4954 8.4201 Std (3.6575) (1.8791) 95% [12.3000,26.6000] [5.6000,12.8000] IF 389.5012 246.6756 Table E.1. Summary of the cleaning and aggregation procedure on the data from 3rd to 10th Oct 2008 for IBM and KO IBM KO No. % dropped No. % dropped Raw quotes and trades 688,805 541,616 Trades 128,589 81.33 126,509 76.64 Nonmissing price and volume 128,575 0.01 126,497 0.01 Trades between 9:30 and 16:00 128,561 0.01 126,484 0.01 Aggregated trades 89,517 30.37 96,482 23.72 Without outliers 88,808 0.79 95,398 1.12 Without opening trades 88,802 0.01 95,392 0.01 IBM KO No. % dropped No. % dropped Raw quotes and trades 688,805 541,616 Trades 128,589 81.33 126,509 76.64 Nonmissing price and volume 128,575 0.01 126,497 0.01 Trades between 9:30 and 16:00 128,561 0.01 126,484 0.01 Aggregated trades 89,517 30.37 96,482 23.72 Without outliers 88,808 0.79 95,398 1.12 Without opening trades 88,802 0.01 95,392 0.01 Table E.1. Summary of the cleaning and aggregation procedure on the data from 3rd to 10th Oct 2008 for IBM and KO IBM KO No. % dropped No. 
Table E.2. Summary of the cleaning and aggregation procedure on the data from 23rd to 30th April 2010 for IBM and KO

                                    IBM                       KO
                                    No.        % dropped      No.        % dropped
Raw quotes and trades               803,648                   692,657
Trades                              53,346     93.36          41,184     94.05
Nonmissing price and volume         53,332     0.03           41,173     0.03
Trades between 9:30 and 16:00       53,324     0.02           41,164     0.02
Aggregated trades                   52,406     1.72           40,573     1.44
Without outliers                    52,199     0.39           40,548     0.06
Without opening trades              52,193     0.01           40,542     0.01

The unconditional mean volatility differs across stocks and time periods. The unconditional mean of the latent state is higher for stocks with a higher price, and it is higher in the more volatile period in 2008. These results are consistent with intuition, but we should not draw strong conclusions from these findings. For example, we cannot compare the estimated means between models, as they have somewhat different interpretations in the different model specifications. The estimated AR(1) coefficients for the different series range from 0.94 to 0.99, except for the Skellam and ΔNB models applied to the IBM data in the 2008 period, in which case the posterior means were 0.51 and 0.65, respectively. This finding suggests generally persistent dynamic volatility behavior within a trading day, even after accounting for the intradaily seasonal pattern in volatility. However, by comparing the two periods, we find that the transient volatility is less persistent in the more volatile crisis period. We only included the zero-inflation specification for the ΔNB and dynamic Skellam distributions when additional flexibility appeared to be needed in the observation density. This flexibility has been required for stocks with higher prices and during the more volatile periods.
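To make the zero-inflated observation density concrete, the following minimal Python sketch (not the authors' code) approximates the zero-inflated ΔNB distribution by Monte Carlo, using the representation of a ΔNB variable as the difference of two independent NB(λ, ν) counts; the parameter values are purely illustrative.

```python
import numpy as np

# Sketch: zero-inflated Delta-NB observation density, approximated by Monte Carlo
# as the difference of two independent NB(lambda, nu) counts plus a point mass at zero.
rng = np.random.default_rng(0)

lam, nu, gamma = 0.8, 8.0, 0.35       # hypothetical parameter values for illustration

# numpy parametrizes the negative binomial by (n, p) with mean n*(1-p)/p;
# NB(lambda, nu) as defined in the paper corresponds to n = nu and p = nu/(nu + lambda)
p = nu / (nu + lam)
n_draws = 200_000
up = rng.negative_binomial(nu, p, size=n_draws)
down = rng.negative_binomial(nu, p, size=n_draws)
diff = up - down                      # Delta-NB(lambda, nu) draws
zero = rng.random(n_draws) < gamma    # zero-inflation indicator
y = np.where(zero, 0, diff)           # zero-inflated integer price changes

for k in range(-3, 4):
    print(f"P(y = {k:+d}) ~ {np.mean(y == k):.4f}")
```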
The estimates of the zero-inflation parameter γ range from 0.1 to 0.3. The degrees of freedom parameter ν of the ΔNB distribution is estimated at a higher value during the quieter 2010 period, which suggests that the distribution of the tick by tick price changes is closer to a thin-tailed distribution during such periods. In addition, we find that the estimated degrees of freedom parameter takes a lower value for stocks with a higher average price. From a more technical perspective, our study has revealed that the parameters of our ΔNB modeling framework mix relatively slowly. This may indicate that our procedure can be rather inefficient. However, it turns out that the troublesome parameters are in all cases the persistence parameter of the volatility process, ϕ, and the volatility of volatility, ση. It is well established and documented that these coefficients are not easy to estimate efficiently, as they do not have a direct impact on the observations; see Kim, Shephard, and Chib (1998) and Stroud and Johannes (2014). Furthermore, our empirical study poses some challenging numerical problems. In the 2008 period we analyze almost 70,000 observations jointly, while the time series in the 2010 period is shorter but still contains roughly 40,000 observations. Such long time series typically lead to slow mixing in Bayesian MCMC estimation procedures due to highly informative full conditional distributions. Bayesian asymptotic results guarantee that long series are more informative about the parameters, so these parameters can be estimated accurately; indeed, our Monte Carlo study in the previous section has shown that our algorithm is successful in capturing the true parameters. However, with long series it can be hard to construct efficient proposal distributions. In other words, it can be hard to propose “plausible” parameter values in the random walk Metropolis-Hastings algorithm. We therefore observe low acceptance rates and thus high inefficiency factors. We had also anticipated that parameter estimation for the dynamic Skellam and ΔNB models is numerically less efficient and overall more challenging when compared to the estimation of the ordered normal and ordered t models. The estimation of the discrete distribution models requires more auxiliary variables, and the analysis is based on additional conditioning steps.
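The inefficiency factors reported in the tables above summarize this slow mixing. As a minimal illustration (not the authors' code, and with a truncation rule that is our own assumption), an inefficiency factor can be estimated from a chain of posterior draws as one plus twice the sum of its positive autocorrelations:

```python
import numpy as np

def inefficiency_factor(draws, max_lag=500):
    """Sketch of a standard inefficiency-factor estimate: 1 + 2 * sum of sample
    autocorrelations, truncated at the first non-positive autocorrelation
    (the truncation rule is our own assumption, not necessarily the paper's)."""
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    n = x.size
    var = np.dot(x, x) / n
    iact = 1.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = np.dot(x[:-lag], x[lag:]) / (n * var)   # lag-`lag` autocorrelation
        if rho <= 0:
            break
        iact += 2.0 * rho
    return iact

# toy example: a persistent AR(1) chain mimicking slowly mixing MCMC output
rng = np.random.default_rng(1)
eps = rng.standard_normal(20_000)
chain = np.zeros_like(eps)
for t in range(1, eps.size):
    chain[t] = 0.98 * chain[t - 1] + eps[t]
print(f"estimated inefficiency factor: {inefficiency_factor(chain):.1f}")
```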
On the basis of the output of our MCMC estimation procedure, we obtain estimates of the latent volatility variable ht, but we can also decompose these estimates into the corresponding components of ht, namely μh, st, and xt; see Equation (2). Figure 4 presents the intraday, tick by tick, Coca Cola bid price changes and the estimated components xt and st of the log volatility ht in the ΔNB model, from 23rd to 29th April 2010. The intraday seasonality matches the typical features of tick by tick data and reflects the market mechanism; see also the discussion in Andersen (2000). The volatility is highest at the beginning of the trading day, which is the result of the overnight effect and of the different trading mechanism at the pre-open call auction during the first half hour of trading (from 9:00 to 9:30). The burst of information accumulated during the overnight period leads to much higher volatility at the opening of the market. This effect is captured by the estimated initial value of the spline function, β1. The overnight effect receives strong support from the data given that the posterior means are far from zero. We further find that regular trading takes place continuously throughout the day, while it becomes more intense shortly before the closing of the market. The smoothness of the intraday seasonal pattern estimates is enforced through the spline specification. Apart from the pronounced intraday seasonality, we observe many volatility changes during a trading day. Some of these volatility changes may have been sparked by news announcements, while others may have occurred as the result of the trading process. Finally, the parameter values underlying the signal extraction of ht=μh+st+xt are estimated jointly for five consecutive days. Hence it is implicitly assumed that the overnight effect is the same for each day in our analysis. In Appendix F, we compare our estimates of ht=μh+st+xt with those based on parameter estimates obtained for each day separately. Although some differences are clearly visible, overall the extracted signals of ht are very similar.

Figure 4. Decomposition of log volatility in the dynamic ΔNB model for KO from 23rd to 29th April 2010.

3.3 In-Sample Comparison

The computation of sequential Bayes factors (BF) is well known to be infeasible in this framework, as it requires sequential parameter estimation. The sequential estimation of the parameters in our model is computationally prohibitive given the very large time series dimension. To provide some comparative assessments of the four models that we have considered in our study, we follow Stroud and Johannes (2014) and calculate the Bayesian information criterion (BIC) for model $M_i$ as
$$\mathrm{BIC}_T(M_i) = -2 \sum_{t=1}^{T} \log p(y_t \mid \hat{\theta}, M_i) + d_i \log T,$$
where $p(y_t \mid \theta, M_i)$ can be calculated by means of a particle filter, $\hat{\theta}$ is the posterior mean of the parameters, and $d_i$ denotes the number of static parameters of model $M_i$. The implementation of the particle filter for all considered models is rather straightforward given the model details provided in Section 1. The BIC gives an asymptotic approximation to the BF via
$$\mathrm{BIC}_T(M_i) - \mathrm{BIC}_T(M_j) \approx -2 \log \mathrm{BF}_{i,j}.$$
We will use this approximation for our sequential model comparisons.
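As a minimal sketch of this computation (not the authors' code; the per-observation particle-filter log densities are taken as given and the placeholder numbers below are purely illustrative), the cumulative BIC path and the implied sequential BF approximation can be computed as follows:

```python
import numpy as np

def bic_path(loglik_t, n_params):
    """Cumulative BIC_t(M) = -2 * sum_{s<=t} log p(y_s | theta_hat, M) + d * log t,
    computed for every t; loglik_t holds the per-observation log densities
    evaluated at the posterior mean (here assumed to come from a particle filter)."""
    loglik_t = np.asarray(loglik_t, dtype=float)
    t = np.arange(1, loglik_t.size + 1)
    return -2.0 * np.cumsum(loglik_t) + n_params * np.log(t)

# hypothetical inputs: placeholder log densities for two competing models
rng = np.random.default_rng(2)
ll_ordered_normal = rng.normal(-1.32, 0.05, size=5_000)
ll_delta_nb = rng.normal(-1.30, 0.05, size=5_000)

bic_ord = bic_path(ll_ordered_normal, n_params=7)
bic_dnb = bic_path(ll_delta_nb, n_params=8)

# BIC_t(M_i) - BIC_t(M_j) ~ -2 log BF_{i,j}: negative values favour model i
approx_minus2_log_bf = bic_dnb - bic_ord
print(approx_minus2_log_bf[-1])
```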
Figure 5a and b present the BICs for the periods from the 3rd to 9th October 2008 and from the 23rd to 29th April 2010, respectively. For the 2008 period, the IBM stock does not appear to favor the integer-based models and its behavior is captured best by the ordered t model. However, the opposite is the case for KO: both the Skellam and the ΔNB model outperform the ordered models convincingly. In the 2010 period, the IBM stock slightly favors the ΔNB model over the ordered t model. In this case, the Skellam model does not seem to be able to correctly capture the features of the data. For KO in the same period, the ordered t model provides the best fit to the data, with both the Skellam model and the ΔNB model performing less well. Furthermore, we may conclude from the BIC results that the ordered t and ΔNB models tend to be favored when large jumps in volatility have occurred. Such large price changes may lead to a prolonged period of high volatility, which suggests the need for the ΔNB model. These findings are consistent with the intuition that, for time-varying volatility models, the identification of the parameters determining the tail behavior requires extreme or excessive observations in combination with low volatility.

Figure 5. In-sample analysis: sequential BF approximations based on BIC, relative to the ordered normal model, for IBM (left) and KO (right) on two periods. (a) From 3rd to 9th October 2008; (b) from 23rd to 29th April 2010.

3.4 Out-of-Sample Comparisons

The performances of the dynamic Skellam and ΔNB models can also be compared in terms of predictive likelihoods. The one-step-ahead predictive likelihood for model M is
$$p(y_{t+1} \mid y_{1:t}, M) = \iint p(y_{t+1} \mid y_{1:t}, x_{t+1}, \theta, M)\, p(x_{t+1}, \theta \mid y_{1:t}, M)\, \mathrm{d}x_{t+1}\, \mathrm{d}\theta = \iint p(y_{t+1} \mid y_{1:t}, x_{t+1}, \theta, M)\, p(x_{t+1} \mid \theta, y_{1:t}, M)\, p(\theta \mid y_{1:t}, M)\, \mathrm{d}x_{t+1}\, \mathrm{d}\theta.$$
More generally, the h-step-ahead predictive likelihood can be decomposed into a product of one-step-ahead predictive likelihoods,
$$p(y_{t+1:t+h} \mid y_{1:t}, M) = \prod_{i=1}^{h} p(y_{t+i} \mid y_{1:t+i-1}, M) = \prod_{i=1}^{h} \iint p(y_{t+i} \mid y_{1:t+i-1}, x_{t+i}, \theta, M)\, p(x_{t+i} \mid \theta, y_{1:t+i-1}, M)\, p(\theta \mid y_{1:t+i-1}, M)\, \mathrm{d}x_{t+i}\, \mathrm{d}\theta.$$
These expressions show that we require the computation of $p(\theta \mid y_{1:t+i-1}, M)$, for $i = 1, 2, \ldots$, that is, the posterior of the parameters based on sequentially increasing data samples. This requires the MCMC procedure to be repeated as many times as there are out-of-sample observations. In our application, for each stock and each model, it would imply several thousands of MCMC replications for a predictive analysis of a single out-of-sample day. This exercise is computationally impractical, if not infeasible. However, we may rely on the approximation
$$p(y_{t+1:t+h} \mid y_{1:t}, M) \approx \prod_{i=1}^{h} \iint p(y_{t+i} \mid y_{1:t+i-1}, x_{t+i}, \theta, M)\, p(x_{t+i} \mid \theta, y_{1:t+i-1}, M)\, p(\theta \mid y_{1:t}, M)\, \mathrm{d}x_{t+i}\, \mathrm{d}\theta.$$
This approximation is based on the notion that, after observing a considerable amount of data, that is, for t sufficiently large, the posterior distribution of the static parameters should not change much, and hence $p(\theta \mid y_{1:t+i-1}, M) \approx p(\theta \mid y_{1:t}, M)$. Based on this approximation, we carry out the following exercise. From our MCMC output, we obtain a sample from the posterior distribution based on the in-sample observations. For each parameter draw from the posterior distribution, we estimate the likelihood for the out-of-sample period using the particle filter.
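A minimal sketch of this approximation (not the authors' code) averages, over posterior draws obtained from the in-sample MCMC, the particle-filter estimate of the out-of-sample likelihood; the particle filter itself is abstracted behind a placeholder function, and the toy "filter" below is only there to show the mechanics.

```python
import numpy as np

def log_predictive_likelihood(theta_draws, y_out, particle_loglik):
    """Approximate log p(y_{t+1:t+h} | y_{1:t}) by averaging, over posterior draws
    theta^(m) ~ p(theta | y_{1:t}), the particle-filter estimate of
    p(y_{t+1:t+h} | y_{1:t}, theta^(m)). `particle_loglik` is a user-supplied
    placeholder returning that log likelihood for one draw."""
    logliks = np.array([particle_loglik(theta, y_out) for theta in theta_draws])
    m = logliks.max()
    return m + np.log(np.mean(np.exp(logliks - m)))   # log-sum-exp average

# toy usage with a stand-in "particle filter" (an i.i.d. Gaussian density);
# the real filter would run the Delta-NB (or Skellam) state space model
rng = np.random.default_rng(3)
posterior_draws = rng.normal(0.0, 0.1, size=(200, 1))   # fake posterior sample
y_future = rng.normal(0.0, 1.0, size=50)                # fake out-of-sample data

def fake_particle_loglik(theta, y):
    mu = theta[0]
    return float(np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (y - mu) ** 2))

print(log_predictive_likelihood(posterior_draws, y_future, fake_particle_loglik))
```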
Figure 6a and b present the out-of-sample sequential predictive BF approximations for the 10th October 2008 and the 30th April 2010, respectively. As in the in-sample 2008 period, on the 10th October 2008 the ordered t model performs best for the IBM stock, while both integer distribution models perform well for the KO stock. On the 30th April 2010, the ΔNB model performs best for IBM while the Skellam model performs worst. This suggests that the IBM stock requires a heavy-tailed distribution, as in the ΔNB and ordered t models. For KO in the same period, both the dynamic Skellam and ΔNB models beat the ordered models. Here the Skellam model slightly outperforms the ΔNB model during most of the trading day.

Figure 6. Out-of-sample analysis: sequential predictive BF approximations, relative to the ordered normal model, for IBM (left) and KO (right) for the two periods. (a) On 10th October 2008; (b) on 30th April 2010.

4 Conclusions

We have reviewed and introduced dynamic models for high-frequency integer price changes. In particular, we have introduced the dynamic negative binomial difference model, referred to as the ΔNB model. We have developed an MCMC procedure (based on Gibbs sampling) for the Bayesian estimation of the parameters in the dynamic Skellam and ΔNB models. Furthermore, we have demonstrated our estimation procedures for simulated data and for real data consisting of tick by tick transaction bid prices of NYSE stocks. We have compared the in-sample and out-of-sample performances of two classes of models: the ordered probit models and the models based on integer distributions. Our modeling framework opens several directions for future research. For instance, the ΔNB model has been defined by a time-varying specification for the λ parameter of the ΔNB distribution, while the second parameter ν is kept constant over time. It can be of interest to investigate the impact of reversing this specification by considering a dynamic model for ν. It would also be of interest to allow for a ΔNB distribution with a nonzero mean. This would allow us to base our analysis on the noncentered parametrization of our state space model and hence to adopt the ancillarity-sufficiency interweaving strategy (ASIS) of Kastner and Frühwirth-Schnatter (2014) to improve the mixing of the proposed sampler. This direction for improving the efficiency of our sampler is also left for future research.

Footnotes

* I.B. thanks the Dutch National Science Foundation (NWO) for financial support. S.J.K. acknowledges support from CREATES, Center for Research in Econometric Analysis of Time Series (DNRF78), funded by the Danish National Research Foundation. We are indebted to Lennart Hoogerheide, Rutger Lit, André Lucas, Mike Pitt, and Lukasz Romaszko for their help and support in this research project, and to Rudolf Frühwirth for providing the C code for auxiliary mixture sampling. We would further like to thank the Editor, Associate Editor, and two Referees for their constructive comments.

APPENDIX A: NEGATIVE BINOMIAL DISTRIBUTION

The probability mass function (pmf) of the NB distribution is given by
$$f(k; \nu, p) = \frac{\Gamma(\nu + k)}{\Gamma(\nu)\,\Gamma(k+1)}\, p^{k} (1-p)^{\nu}.$$
A different parametrization can be obtained by denoting its mean by $\lambda = \nu p/(1-p)$, which implies $p = \lambda/(\lambda + \nu)$. We refer to this parametrization as NB(λ, ν). The pmf then takes the form
$$f(k; \lambda, \nu) = \frac{\Gamma(\nu + k)}{\Gamma(\nu)\,\Gamma(k+1)} \left(\frac{\lambda}{\nu + \lambda}\right)^{k} \left(\frac{\nu}{\nu + \lambda}\right)^{\nu},$$
and the variance is equal to $\lambda(1 + \lambda/\nu)$. The dispersion index, or variance-to-mean ratio, is therefore equal to $(1 + \lambda/\nu) > 1$, which shows that the NB distribution is overdispersed. This means that there are more intervals with low counts and more intervals with high counts, compared to the Poisson distribution. The latter is nested in the NB distribution as the limiting case when $\nu \to \infty$. Alternatively, the NB distribution can be written as a Poisson-Gamma mixture. Let Y follow a Poisson distribution with mean $\lambda U$, where the heterogeneity parameter U has unit mean and is Gamma-distributed, $U \sim \mathrm{Ga}(\nu, \nu)$, with the density of $\mathrm{Ga}(\alpha, \beta)$ given by
$$f(x; \alpha, \beta) = \frac{\beta^{\alpha} x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)}.$$
Then
$$f(k; \lambda, \nu) = \int_{0}^{\infty} f_{\mathrm{Poisson}}(k; \lambda u)\, f_{\mathrm{Gamma}}(u; \nu, \nu)\, \mathrm{d}u = \int_{0}^{\infty} \frac{(\lambda u)^{k} e^{-\lambda u}}{k!}\, \frac{\nu^{\nu} u^{\nu - 1} e^{-\nu u}}{\Gamma(\nu)}\, \mathrm{d}u = \frac{\lambda^{k} \nu^{\nu}}{k!\,\Gamma(\nu)} \int_{0}^{\infty} e^{-(\lambda + \nu) u} u^{k + \nu - 1}\, \mathrm{d}u.$$
Substituting $(\lambda + \nu) u = s$, we get
$$f(k; \lambda, \nu) = \frac{\lambda^{k} \nu^{\nu}}{k!\,\Gamma(\nu)} \int_{0}^{\infty} e^{-s}\, \frac{s^{k + \nu - 1}}{(\lambda + \nu)^{k + \nu - 1}}\, \frac{1}{\lambda + \nu}\, \mathrm{d}s = \frac{\lambda^{k} \nu^{\nu}}{k!\,\Gamma(\nu)}\, \frac{\Gamma(k + \nu)}{(\lambda + \nu)^{k + \nu}} = \frac{\Gamma(\nu + k)}{\Gamma(\nu)\,\Gamma(k+1)} \left(\frac{\lambda}{\nu + \lambda}\right)^{k} \left(\frac{\nu}{\nu + \lambda}\right)^{\nu},$$
which shows that $Y \sim \mathrm{NB}(\lambda, \nu)$.

Figure A.1. (a) The Skellam distribution with different parameters; (b) the ΔNB distribution with different parameters.
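The Poisson-Gamma mixture representation above can be verified numerically. The following sketch (illustrative only, with hypothetical parameter values) compares the closed-form NB(λ, ν) pmf with the mixture integral evaluated by quadrature.

```python
import numpy as np
from scipy import special, integrate, stats

def nb_pmf(k, lam, nu):
    """Closed-form NB(lambda, nu) pmf as in Appendix A."""
    return (special.gamma(nu + k) / (special.gamma(nu) * special.gamma(k + 1))
            * (lam / (nu + lam)) ** k * (nu / (nu + lam)) ** nu)

def nb_pmf_mixture(k, lam, nu):
    """Same pmf via the Poisson-Gamma mixture: integrate a Poisson(lambda*u) pmf
    against a Gamma(nu, nu) heterogeneity density with unit mean."""
    integrand = lambda u: stats.poisson.pmf(k, lam * u) * stats.gamma.pdf(u, a=nu, scale=1.0 / nu)
    val, _ = integrate.quad(integrand, 0.0, np.inf)
    return val

lam, nu = 1.5, 6.0   # hypothetical values for illustration
for k in range(5):
    print(k, round(nb_pmf(k, lam, nu), 6), round(nb_pmf_mixture(k, lam, nu), 6))
```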
APPENDIX B: DAILY VOLATILITY PATTERNS

We want to approximate the function $f: \mathbb{R} \to \mathbb{R}$ with a continuous function that is built up from piecewise polynomials of degree at most three. Let $\Delta = \{k_0, \ldots, k_K\}$ denote the set of knots $k_j$, $j = 0, \ldots, K$; $\Delta$ is sometimes called a mesh on $[k_0, k_K]$. Let $y = \{y_0, \ldots, y_K\}$, where $y_j = f(k_j)$. We denote a cubic spline on $\Delta$ interpolating to y as $S_\Delta(x)$. $S_\Delta(x)$ has to satisfy the following conditions:

1. $S_\Delta(x) \in C^2[k_0, k_K]$;
2. $S_\Delta(x)$ coincides with a polynomial of degree at most three on each interval $[k_{j-1}, k_j]$, $j = 1, \ldots, K$;
3. $S_\Delta(k_j) = y_j$ for $j = 0, \ldots, K$.

Using condition 2 we know that $S''_\Delta(x)$ is a linear function on $[k_{j-1}, k_j]$, which means that we can write
$$S''_\Delta(x) = \left[\frac{k_j - x}{h_j}\right] M_{j-1} + \left[\frac{x - k_{j-1}}{h_j}\right] M_j, \qquad x \in [k_{j-1}, k_j],$$
where $M_j = S''_\Delta(k_j)$ and $h_j = k_j - k_{j-1}$. Integrating $S''_\Delta(x)$ and solving for the two integration constants (using $S_\Delta(k_j) = y_j$), Poirier (1973) shows that
$$S'_\Delta(x) = \left[\frac{h_j}{6} - \frac{(k_j - x)^2}{2 h_j}\right] M_{j-1} + \left[\frac{(x - k_{j-1})^2}{2 h_j} - \frac{h_j}{6}\right] M_j + \frac{y_j - y_{j-1}}{h_j},$$
for $x \in [k_{j-1}, k_j]$, and
$$S_\Delta(x) = \frac{k_j - x}{6 h_j}\left[(k_j - x)^2 - h_j^2\right] M_{j-1} + \frac{x - k_{j-1}}{6 h_j}\left[(x - k_{j-1})^2 - h_j^2\right] M_j + \left[\frac{k_j - x}{h_j}\right] y_{j-1} + \left[\frac{x - k_{j-1}}{h_j}\right] y_j, \qquad x \in [k_{j-1}, k_j]. \tag{A1}$$
In the above expressions only $M_j$, $j = 0, \ldots, K$, are unknown. We can use the continuity restrictions, which enforce continuity at the knots by requiring that the derivatives are equal at the knots $k_j$ for $j = 1, \ldots, K-1$,
$$S'_\Delta(k_j^-) = \frac{h_j}{6} M_{j-1} + \frac{h_j}{3} M_j + \frac{y_j - y_{j-1}}{h_j}, \qquad S'_\Delta(k_j^+) = -\frac{h_{j+1}}{3} M_j - \frac{h_{j+1}}{6} M_{j+1} + \frac{y_{j+1} - y_j}{h_{j+1}},$$
which yields the $K - 1$ conditions
$$(1 - \lambda_j) M_{j-1} + 2 M_j + \lambda_j M_{j+1} = \frac{6 y_{j-1}}{h_j (h_j + h_{j+1})} - \frac{6 y_j}{h_j h_{j+1}} + \frac{6 y_{j+1}}{h_{j+1} (h_j + h_{j+1})}, \qquad \lambda_j = \frac{h_{j+1}}{h_j + h_{j+1}}.$$
Together with two end conditions we have $K + 1$ unknowns and $K + 1$ equations, so we can solve the resulting linear system for the $M_j$. Using the end conditions $M_0 = \pi_0 M_1$ and $M_K = \pi_K M_{K-1}$, we can write
$$\Lambda = \begin{bmatrix} 2 & -2\pi_0 & 0 & \cdots & 0 & 0 & 0 \\ 1 - \lambda_1 & 2 & \lambda_1 & \cdots & 0 & 0 & 0 \\ 0 & 1 - \lambda_2 & 2 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 2 & \lambda_{K-2} & 0 \\ 0 & 0 & 0 & \cdots & 1 - \lambda_{K-1} & 2 & \lambda_{K-1} \\ 0 & 0 & 0 & \cdots & 0 & -2\pi_K & 2 \end{bmatrix}, \qquad \Theta = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ \frac{6}{h_1(h_1 + h_2)} & \frac{-6}{h_1 h_2} & \frac{6}{h_2(h_1 + h_2)} & \cdots & 0 & 0 & 0 \\ 0 & \frac{6}{h_2(h_2 + h_3)} & \frac{-6}{h_2 h_3} & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \frac{-6}{h_{K-2} h_{K-1}} & \frac{6}{h_{K-1}(h_{K-2} + h_{K-1})} & 0 \\ 0 & 0 & 0 & \cdots & \frac{6}{h_{K-1}(h_{K-1} + h_K)} & \frac{-6}{h_{K-1} h_K} & \frac{6}{h_K(h_{K-1} + h_K)} \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 \end{bmatrix},$$
with $m = (M_0, M_1, \ldots, M_{K-1}, M_K)'$ and $y = (y_0, y_1, \ldots, y_{K-1}, y_K)'$. The linear equation system is given by
$$\Lambda m = \Theta y \tag{A2}$$
and its solution is
$$m = \Lambda^{-1} \Theta y. \tag{A3}$$
Using this result and Equation (A1) we can calculate $S_\Delta(\xi) = [S_\Delta(\xi_1), S_\Delta(\xi_2), \ldots, S_\Delta(\xi_{N-1}), S_\Delta(\xi_N)]'$. Let P denote the $N \times (K+1)$ matrix whose ith row, $i = 1, \ldots, N$, given that $k_{j-1} \leq \xi_i \leq k_j$, can be written as
$$p_i = \Big[\underbrace{0, \ldots, 0}_{\text{first } j-2},\ \frac{k_j - \xi_i}{6 h_j}\big[(k_j - \xi_i)^2 - h_j^2\big],\ \frac{\xi_i - k_{j-1}}{6 h_j}\big[(\xi_i - k_{j-1})^2 - h_j^2\big],\ \underbrace{0, \ldots, 0}_{\text{last } K+1-j}\Big].$$
Moreover, let Q denote the $N \times (K+1)$ matrix whose ith row, $i = 1, \ldots, N$, given that $k_{j-1} \leq \xi_i \leq k_j$, can be written as
$$q_i = \Big[\underbrace{0, \ldots, 0}_{\text{first } j-2},\ \frac{k_j - \xi_i}{h_j},\ \frac{\xi_i - k_{j-1}}{h_j},\ \underbrace{0, \ldots, 0}_{\text{last } K+1-j}\Big].$$
Now, using (A1) and (A3), we get
$$S_\Delta(\xi) = P m + Q y = P \Lambda^{-1} \Theta y + Q y = (P \Lambda^{-1} \Theta + Q)\, y = W y, \qquad W = P \Lambda^{-1} \Theta + Q,$$
where W is an $N \times (K+1)$ matrix. In practical situations we might only know the knots, while the spline values are observed with error. In this case we have
$$s = S_\Delta(\xi) + \epsilon = W y + \epsilon,$$
where $s = (s_1, s_2, \ldots, s_{N-1}, s_N)'$ and $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_{N-1}, \epsilon_N)'$, with $\mathrm{E}(\epsilon) = 0$ and $\mathrm{E}(\epsilon \epsilon') = \sigma^2 I$. Notice that, after fixing the knots, we only have to estimate the value of the spline at the knots; this fully determines the shape of the spline. We can do so by a simple OLS regression, $\hat{y} = (W^\top W)^{-1} W^\top s$. For identification reasons we want
$$\sum_{j:\, \text{unique } \xi_j} S_\Delta(\xi_j) = \sum_{j:\, \text{unique } \xi_j} w_j y = w^* y = 0,$$
where $w_j$ is the jth row of W and $w^* = \sum_{j:\, \text{unique } \xi_j} w_j$. To this end, a restriction can be enforced on one of the elements of y. This ensures that $\mathrm{E}(s_t) = 0$, so that $s_t$ and $\mu_h$ can be identified. If we drop $y_K$, we can substitute $y_K = -\sum_{i=0}^{K-1} (w_i^*/w_K^*)\, y_i$, where $w_i^*$ is the ith element of $w^*$. Substituting this into the restriction gives
$$\sum_{j:\, \text{unique } \xi_j} S_\Delta(\xi_j) = \sum_{j:\, \text{unique } \xi_j} \sum_{i=0}^{K} w_{ji} y_i = \sum_{j:\, \text{unique } \xi_j}\left[ \sum_{i=0}^{K-1} w_{ji} y_i - w_{jK} \sum_{i=0}^{K-1} (w_i^*/w_K^*)\, y_i \right] = \sum_{i=0}^{K-1} \sum_{j:\, \text{unique } \xi_j} \big(w_{ji} - w_{jK} w_i^*/w_K^*\big)\, y_i = \sum_{i=0}^{K-1} \big(w_i^* - w_K^* w_i^*/w_K^*\big)\, y_i = \sum_{i=0}^{K-1} \big(w_i^* - w_i^*\big)\, y_i = 0.$$
Let us partition W as $W = [\, W_{-K} : W_K \,]$, where $W_{-K}$ contains the first K columns of W and $W_K$ is the last column of W, and partition $w^* = [\, w_{-K}^* : w_K^* \,]$ accordingly. Finally, we can define
$$\widetilde{W} = W_{-K} - \frac{1}{w_K^*}\, W_K\, w_{-K}^*,$$
so that we obtain $s = S_\Delta(\xi) + \epsilon = \widetilde{W} \tilde{y} + \epsilon$, with $\widetilde{W}$ of dimension $N \times K$ and $\tilde{y}$ of dimension $K \times 1$.
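The following minimal sketch (not the authors' code) constructs the Poirier weight matrix W = PΛ⁻¹Θ + Q described above, so that S(ξ) = Wy for given knot values; the knot locations and knot values are purely illustrative, the end conditions default to a natural spline, and the zero-mean normalization of Appendix B is omitted for brevity.

```python
import numpy as np

def poirier_spline_weights(knots, xi, pi0=0.0, piK=0.0):
    """Sketch of the Poirier (1973) cubic-spline weight matrix W with S(xi) = W @ y,
    where y holds the spline values at the knots; pi0, piK give the end conditions
    M_0 = pi0*M_1 and M_K = piK*M_{K-1} (zero corresponds to a natural spline)."""
    knots = np.asarray(knots, dtype=float)
    xi = np.asarray(xi, dtype=float)
    K = knots.size - 1
    h = np.diff(knots)                                  # h_j = k_j - k_{j-1}, j = 1..K

    Lam = np.zeros((K + 1, K + 1))
    The = np.zeros((K + 1, K + 1))
    Lam[0, 0], Lam[0, 1] = 2.0, -2.0 * pi0
    Lam[K, K - 1], Lam[K, K] = -2.0 * piK, 2.0
    for j in range(1, K):                               # continuity conditions
        lam_j = h[j] / (h[j - 1] + h[j])
        Lam[j, j - 1:j + 2] = [1.0 - lam_j, 2.0, lam_j]
        The[j, j - 1] = 6.0 / (h[j - 1] * (h[j - 1] + h[j]))
        The[j, j] = -6.0 / (h[j - 1] * h[j])
        The[j, j + 1] = 6.0 / (h[j] * (h[j - 1] + h[j]))

    P = np.zeros((xi.size, K + 1))
    Q = np.zeros((xi.size, K + 1))
    for i, x in enumerate(xi):
        j = int(np.clip(np.searchsorted(knots, x), 1, K))   # interval [k_{j-1}, k_j]
        hj, a, b = h[j - 1], knots[j] - x, x - knots[j - 1]
        P[i, j - 1] = a / (6 * hj) * (a ** 2 - hj ** 2)
        P[i, j] = b / (6 * hj) * (b ** 2 - hj ** 2)
        Q[i, j - 1], Q[i, j] = a / hj, b / hj
    return P @ np.linalg.solve(Lam, The) + Q

# illustration with three knots over a trading day measured in hours (9:30, 12:30, 16:00)
knots = np.array([9.5, 12.5, 16.0])
xi = np.linspace(9.5, 16.0, 7)
W = poirier_spline_weights(knots, xi)
beta = np.array([1.0, -0.5, 0.4])                       # hypothetical knot values
print(W @ beta)                                         # spline evaluated at xi
```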
APPENDIX C: MCMC ESTIMATION OF THE ORDERED T-SV MODEL

In this section, the T-element vectors $(v_1, \ldots, v_T)$ containing the time-dependent variables for all time periods are denoted by v, that is, the variable without a time subscript.

C.1. Generating x, μh, ϕ, ση2 (Step 2)

Notice that, conditional on C={ct, t=1,…,T} and rt*, we have
$$2\log r_t^{*} = \mu + s_t + x_t + \log\lambda_t + m_{c_t} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, v_{c_t}^2),$$
which implies the following state space form:
$$\tilde{y}_t = \begin{bmatrix} 1 & w_t & 1 \end{bmatrix} \begin{bmatrix} \mu \\ \beta \\ x_t \end{bmatrix} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, v_{c_t}^2), \tag{A4}$$
$$\alpha_{t+1} = \begin{bmatrix} \mu \\ \beta \\ x_{t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & I_K & 0 \\ 0 & 0 & \phi \end{bmatrix} \begin{bmatrix} \mu \\ \beta \\ x_t \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ \eta_t \end{bmatrix}, \qquad \eta_t \sim N(0, \sigma_\eta^2), \tag{A5}$$
where the state vector has dimension K + 2,
$$\begin{bmatrix} \mu \\ \beta \\ x_1 \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_0 \\ \beta_0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_\mu^2 & 0 & 0 \\ 0 & \sigma_\beta^2 I_K & 0 \\ 0 & 0 & \sigma_\eta^2/(1-\phi^2) \end{bmatrix} \right), \tag{A6}$$
and
$$\tilde{y}_t = 2\log r_t^{*} - \log\lambda_t - m_{c_t}. \tag{A7}$$
First we draw ϕ, ση2 from p(ϕ,ση2|γ,ν,C,λ,r*,s,y). Notice that p(ϕ,ση2|γ,ν,C,λ,r*,s,y) = p(ϕ,ση2|ỹ,C) ∝ p(ỹ|ϕ,ση2,C) p(ϕ) p(ση2), where ỹt is defined above in Equation (A7). The likelihood can be evaluated using the standard Kalman filter and the prediction error decomposition (see, e.g., Durbin and Koopman, 2012), taking advantage of the fact that, conditional on the auxiliary variables, we have the linear Gaussian state space form given by Equations (A4), (A5), (A6), and (A7). We draw from the posterior using an adaptive random walk Metropolis-Hastings step proposed by Roberts and Rosenthal (2009). Conditional on ϕ, ση2, we draw μh, s, and x from p(μh,s,x|ϕ,ση2,γ,ν,C,λ,r*,y), which is done by simulating from the smoothed state density of the linear Gaussian state space model given by (A4), (A5), (A6), and (A7). We use the simulation smoother proposed by Durbin and Koopman (2002).

C.2. Generating γ (Step 3)

The conditional distribution for γ simplifies as follows: p(γ|ν,μ,ϕ,ση2,x,s,C,y,r*) = p(γ|ν,h,y), because, given ν, h, and y, γ does not depend on C, ϕ, ση2, or r*. We further have that p(γ|ν,h,y) ∝ p(y|γ,ν,h) p(γ|ν,h) = p(y|γ,ν,h) p(γ), as γ is independent of ν and h. Finally, with a Beta(a,b) prior for γ,
$$p(\gamma \mid \nu, h, y) \propto \prod_{t=1}^{T}\left\{ \gamma\, 1\{y_t = 0\} + (1-\gamma)\left[ T\!\left(\frac{y_t + 0.5}{\exp(h_t/2)}, \nu\right) - T\!\left(\frac{y_t - 0.5}{\exp(h_t/2)}, \nu\right) \right] \right\}\, \gamma^{a-1}(1-\gamma)^{b-1},$$
where T(·, ν) is the cumulative distribution function of the Student's t distribution with mean zero, scale one, and degrees of freedom parameter ν. We sample from this posterior using the adaptive random walk Metropolis-Hastings sampler of Roberts and Rosenthal (2009).

C.3. Generating r*

First, notice that the conditional distribution for r* can be simplified as follows: p(r*|γ,ν,μ,ϕ,ση2,x,s,C,λ,y) = p(r*|γ,h,λ,y) = ∏t p(rt*|γ,ht,λt,yt). Then, by the law of total probability, we have
$$p(r_t^{*} \mid \gamma, \nu, h_t, y_t) = p(r_t^{*} \mid \gamma, \nu, h_t, \lambda_t, y_t, \text{zero})\, p(\text{zero} \mid \gamma, h_t, \lambda_t, y_t) + p(r_t^{*} \mid \gamma, h_t, \lambda_t, y_t, \text{non-zero})\, p(\text{non-zero} \mid \gamma, h_t, \lambda_t, y_t),$$
where p(rt*|γ,ht,λt,yt,zero) is a normal density with zero mean and variance λt exp(ht), truncated to the interval [yt−0.5, yt+0.5]. If yt = 0, then
$$p(\text{zero} \mid \gamma, h_t, y_t = 0) = \frac{p(y_t = 0 \mid \text{zero}, \gamma, h_t)\, p(\text{zero} \mid \gamma, h_t)}{p(y_t = 0 \mid \gamma, h_t)} = \frac{1 \times \gamma}{\gamma + (1-\gamma)\left[\Phi\!\left(\dfrac{0.5}{\sqrt{\lambda_t}\exp(h_t/2)}\right) - \Phi\!\left(\dfrac{-0.5}{\sqrt{\lambda_t}\exp(h_t/2)}\right)\right]}.$$
If yt = k ≠ 0, then
$$p(\text{zero} \mid \gamma, h_t, y_t = k) = \frac{p(y_t = k \mid \text{zero}, \gamma, h_t)\, p(\text{zero} \mid \gamma, h_t)}{p(y_t = k \mid \gamma, h_t)} = 0.$$
Moreover, p(non-zero|γ,ht,yt) = 1 − p(zero|γ,ht,yt).

C.4. Generating ν and λ

To sample ν and λ we use the method of Stroud and Johannes (2014). We can decompose the posterior density as p(ν,λ|γ,ϕ,ση2,h,C,y,r*) = p(ν,λ|h,r*) = p(λ|ν,h,r*) p(ν|h,r*). Note that we have the following scale mixture representation,
$$r_t^{*} = \exp(h_t/2)\,\sqrt{\lambda_t}\,\varepsilon_t, \qquad \varepsilon_t \sim N(0,1), \qquad \lambda_t \sim \mathrm{IG}(\nu/2, \nu/2),$$
which implies that rt*/exp(ht/2), conditional on ht and ν, follows a tν(0,1) distribution. Combined with the prior ν ∼ DU(2,128), this leads to the posterior
$$p(\nu \mid h, r^{*}) \propto \prod_{t=1}^{T} g_{\nu}(w_t), \qquad w_t = r_t^{*}/\exp(h_t/2),$$
where gν(·) denotes the tν(0,1) density. To avoid the computationally intensive evaluation of these probabilities for every support point of ν, we use a Metropolis-Hastings update. We draw the proposal ν* from the neighborhood of the current value ν(i) using a discrete uniform distribution, ν* ∼ DU(ν(i)−δ, ν(i)+δ), and accept with probability
$$\min\left\{ 1,\ \frac{\prod_{t=1}^{T} g_{\nu^{*}}(w_t)}{\prod_{t=1}^{T} g_{\nu^{(i)}}(w_t)} \right\},$$
where δ is chosen such that the acceptance rate is reasonable. Finally, we have
$$p(\lambda \mid \nu, h, r^{*}) = \prod_{t=1}^{T} p(\lambda_t \mid \nu, h_t, r_t^{*}) \propto \prod_{t=1}^{T} p(r_t^{*} \mid \lambda_t, \nu, h_t)\, p(\lambda_t \mid \nu),$$
where rt*/exp(ht/2) | λt, ν, ht ∼ N(0, λt) and λt | ν ∼ IG(ν/2, ν/2), so that
$$\lambda_t \mid \nu, h_t, r_t^{*} \sim \mathrm{IG}\!\left( \frac{\nu + 1}{2},\ \frac{\nu + \big(r_t^{*}/\exp(h_t/2)\big)^2}{2} \right).$$

APPENDIX D: MCMC ESTIMATION OF THE DYNAMIC ΔNB MODEL

In this section, the T-element vectors $(v_1, \ldots, v_T)$ containing the time-dependent variables for all time periods are denoted by v, the variable without a time subscript. We discuss the algorithmic details for the ΔNB model and note that these also apply to the dynamic Skellam model, except for Step 4a (generating ν).

D.1. Generating x, s, μh, ϕ, ση2 (Step 2)

Notice that, conditional on C={ctj, t=1,…,T, j=1,…,min(Nt+1,2)}, τ, N, γ, and s, we have
$$-\log\tau_{t1} = \log(z_{t1}+z_{t2}) + \mu_h + s_t + x_t + m_{c_{t1}}(1) + \varepsilon_{t1}, \qquad \varepsilon_{t1} \sim N\big(0,\, v_{c_{t1}}^2(1)\big),$$
and
$$-\log\tau_{t2} = \log(z_{t1}+z_{t2}) + \mu_h + s_t + x_t + m_{c_{t2}}(N_t) + \varepsilon_{t2}, \qquad \varepsilon_{t2} \sim N\big(0,\, v_{c_{t2}}^2(N_t)\big),$$
which implies the following state space form, with ỹt of dimension min(Nt+1,2) × 1:
$$\tilde{y}_t = \begin{bmatrix} 1 & w_t & 1 \\ 1 & w_t & 1 \end{bmatrix} \begin{bmatrix} \mu_h \\ \beta \\ x_t \end{bmatrix} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, H_t), \tag{A8}$$
$$\alpha_{t+1} = \begin{bmatrix} \mu_h \\ \beta \\ x_{t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & I_K & 0 \\ 0 & 0 & \phi \end{bmatrix} \begin{bmatrix} \mu_h \\ \beta \\ x_t \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ \eta_t \end{bmatrix}, \qquad \eta_t \sim N(0, \sigma_\eta^2), \tag{A9}$$
$$\begin{bmatrix} \mu_h \\ \beta \\ x_1 \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_0 \\ \beta_0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_\mu^2 & 0 & 0 \\ 0 & \sigma_\beta^2 I_K & 0 \\ 0 & 0 & \sigma_\eta^2/(1-\phi^2) \end{bmatrix} \right), \tag{A10}$$
with $H_t = \mathrm{diag}\big(v_{c_{t1}}^2(1),\, v_{c_{t2}}^2(N_t)\big)$ and
$$\tilde{y}_t = \begin{pmatrix} -\log\tau_{t1} - m_{c_{t1}}(1) - \log(z_{t1}+z_{t2}) \\ -\log\tau_{t2} - m_{c_{t2}}(N_t) - \log(z_{t1}+z_{t2}) \end{pmatrix}. \tag{A11}$$
First we draw ϕ, ση2 from p(ϕ,ση2|γ,ν,C,τ,N,z1,z2,s,y). Notice that
$$p(\phi, \sigma_\eta^2 \mid \gamma, \nu, C, \tau, N, z_1, z_2, s, y) = p(\phi, \sigma_\eta^2 \mid \tilde{y}, C, N) \propto p(\tilde{y} \mid \phi, \sigma_\eta^2, C, N)\, p(\phi)\, p(\sigma_\eta^2), \tag{A12}$$
where ỹt is defined above in Equation (A11). The likelihood can be evaluated using the standard Kalman filter and the prediction error decomposition (see, e.g., Durbin and Koopman, 2012), taking advantage of the fact that, conditional on the auxiliary variables, we have the linear Gaussian state space form given by Equations (A8)–(A11). We draw from the posterior using an adaptive random walk Metropolis-Hastings step proposed by Roberts and Rosenthal (2009). Conditional on ϕ, ση2, we draw μh, s, and x from p(μh,s,x|ϕ,ση2,γ,ν,C,τ,N,z1,z2,y), which is done by simulating from the smoothed state density of the linear Gaussian state space model given by (A8), (A9), (A10), and (A11). We use the simulation smoother proposed by Durbin and Koopman (2002).

D.2. Generating γ (Step 3)

Notice that we can simplify
$$p(\gamma \mid \nu, \mu_h, \phi, \sigma_\eta^2, x, C, s, \tau, N, z_1, z_2, y) = p(\gamma \mid \nu, \mu_h, s, x, y), \tag{A13}$$
because, given ν, λ, and y, the variables C, τ, N, z1, z2 are redundant. We can then decompose
$$p(\gamma \mid \nu, \mu_h, s, x, y) \propto p(y \mid \gamma, \nu, \mu_h, s, x)\, p(\gamma \mid \nu, \mu_h, s, x) = p(y \mid \gamma, \nu, \mu_h, s, x)\, p(\gamma), \tag{A14}$$
as γ is independent of ν and λt = exp(μh+st+xt).
Plugging in the formula for the likelihood and the Beta(a,b) prior for γ yields
$$p(y \mid \gamma, \nu, \mu_h, s, x)\, p(\gamma) \propto \prod_{t=1}^{T}\left[ \gamma\, 1\{y_t = 0\} + (1-\gamma)\left(\frac{\nu}{\lambda_t+\nu}\right)^{2\nu}\left(\frac{\lambda_t}{\lambda_t+\nu}\right)^{|y_t|}\frac{\Gamma(\nu+|y_t|)}{\Gamma(\nu)\,\Gamma(|y_t|+1)}\, F\!\left(\nu+|y_t|,\, \nu,\, |y_t|+1;\, \left(\frac{\lambda_t}{\lambda_t+\nu}\right)^{2}\right) \right]\gamma^{a-1}(1-\gamma)^{b-1}.$$
We sample from this posterior using an adaptive random walk Metropolis-Hastings sampler.

D.3. Generating C, τ, N, z1, z2, ν (Step 4)

To start with, we decompose the joint posterior of C, τ, N, z1, z2, and ν into
$$p(C, \tau, N, z_1, z_2, \nu \mid \gamma, \mu_h, \phi, \sigma_\eta^2, s, x, y) = p(C \mid \tau, N, z_1, z_2, \gamma, \nu, \mu_h, \phi, \sigma_\eta^2, s, x, y)\; p(\tau \mid N, z_1, z_2, \gamma, \nu, \mu_h, \phi, \sigma_\eta^2, s, x, y)\; p(N \mid z_1, z_2, \gamma, \nu, \mu_h, \phi, \sigma_\eta^2, s, x, y)\; p(z_1, z_2 \mid \gamma, \nu, \mu_h, \phi, \sigma_\eta^2, s, x, y)\; p(\nu \mid \gamma, \mu_h, \phi, \sigma_\eta^2, s, x, y).$$

Generating ν (Step 4a). Note that
$$p(\nu \mid \gamma, \mu_h, \phi, \sigma_\eta^2, s, x, y) = p(\nu \mid \gamma, \lambda, y) \propto p(y \mid \gamma, \lambda, \nu)\, p(\lambda \mid \gamma, \nu)\, p(\gamma \mid \nu)\, p(\nu) = p(y \mid \gamma, \lambda, \nu)\, p(\lambda)\, p(\gamma)\, p(\nu) \propto p(y \mid \gamma, \lambda, \nu)\, p(\nu),$$
where p(y|γ,λ,ν) is a product of zero-inflated ΔNB probability mass functions. We draw ν using a discrete uniform prior ν ∼ DU(2,128) and a random walk proposal, in the fashion suggested by Stroud and Johannes (2014) for the degrees of freedom parameter of a t density. We can write the posterior as a multinomial distribution, p(ν|μh,x,z1,z2) ∼ M(π2*,…,π128*), with probabilities
$$\pi_{\nu}^{*} \propto \prod_{t=1}^{T}\left[ \gamma\, 1\{y_t = 0\} + (1-\gamma)\, f_{\Delta\mathrm{NB}}(y_t; \lambda_t, \nu) \right] = \prod_{t=1}^{T} g_{\nu}(y_t).$$
To avoid the computationally intensive evaluation of these probabilities, we can use a Metropolis-Hastings update. We draw the proposal ν* from the neighborhood of the current value ν(i) using a discrete uniform distribution, ν* ∼ DU(ν(i)−δ, ν(i)+δ), and accept with probability
$$\min\left\{ 1,\ \frac{\prod_{t=1}^{T} g_{\nu^{*}}(y_t)}{\prod_{t=1}^{T} g_{\nu^{(i)}}(y_t)} \right\},$$
where δ is chosen such that the acceptance rate is reasonable.

Generating z1, z2 (Step 4b). Notice that, given γ, μh, s, x, and y, the elements of the vectors z1 and z2 are independent over time, so that their posterior distribution factorizes as
$$p(z_1, z_2 \mid \gamma, \nu, \mu_h, \phi, \sigma_\eta^2, s, x, y) = \prod_{t=1}^{T} p(z_{t1}, z_{t2} \mid \gamma, \nu, \mu_h, \phi, \sigma_\eta^2, s_t, x_t, y_t).$$
For a single component we have
$$p(z_{t1}, z_{t2} \mid \gamma, \nu, \mu_h, \phi, \sigma_\eta^2, s_t, x_t, y_t) \propto p(y_t \mid z_{t1}, z_{t2}, \gamma, \nu, \mu_h, \phi, \sigma_\eta^2, s_t, x_t)\; p(z_{t1}, z_{t2} \mid \gamma, \nu, \mu_h, \phi, \sigma_\eta^2, s_t, x_t),$$
which we express as
$$p(z_{t1}, z_{t2} \mid \gamma, \nu, \mu_h, \phi, \sigma_\eta^2, s_t, x_t, y_t) \propto g(z_{t1}, z_{t2})\, \frac{\nu^{\nu} z_{t1}^{\nu-1} e^{-\nu z_{t1}}}{\Gamma(\nu)}\, \frac{\nu^{\nu} z_{t2}^{\nu-1} e^{-\nu z_{t2}}}{\Gamma(\nu)},$$
where
$$g(z_{t1}, z_{t2}) = \gamma\, 1\{y_t = 0\} + (1-\gamma)\exp\!\big[-\lambda_t (z_{t1}+z_{t2})\big]\left(\frac{z_{t1}}{z_{t2}}\right)^{y_t/2} I_{|y_t|}\!\big(2\lambda_t\sqrt{z_{t1} z_{t2}}\big),$$
with λt = exp(μh+st+xt). We can carry out an independence Metropolis-Hastings step by sampling z*t1 and z*t2 from the Ga(ν,ν) prior, with acceptance probability equal to
$$\min\left\{ \frac{g(z_{t1}^{*}, z_{t2}^{*})}{g(z_{t1}, z_{t2})},\ 1 \right\}.$$

Generating N (Step 4c). As described in Section 2.3.

Generating τ (Step 4d). Notice that p(τ|N,z1,z2,γ,ν,μh,ϕ,ση2,x,y) = p(τ|N,μh,z1,z2,s,x). Moreover,
$$p(\tau \mid N, \mu_h, z_1, z_2, s, x) = \prod_{t=1}^{T} p(\tau_{t1}, \tau_{t2} \mid N_t, \mu_h, z_{t1}, z_{t2}, s_t, x_t) = \prod_{t=1}^{T} p(\tau_{t1} \mid \tau_{t2}, N_t, \mu_h, z_{t1}, z_{t2}, s_t, x_t)\; p(\tau_{t2} \mid N_t, \mu_h, z_{t1}, z_{t2}, s_t, x_t),$$
where we can sample from p(τt2|Nt,μh,zt1,zt2,st,xt) using the fact that, conditionally on Nt, the arrival time τt2 of the Ntth jump is the maximum of Nt uniform random variables and therefore has a Beta(Nt,1) distribution. The arrival time of the (Nt+1)th jump after 1 is exponentially distributed with intensity λt(zt1+zt2), hence
$$\tau_{t1} = 1 + \xi_t - \tau_{t2}, \qquad \xi_t \sim \mathrm{Exp}\big(\lambda_t (z_{t1}+z_{t2})\big).$$

Generating C (Step 4e). Notice that p(C|τ,N,z1,z2,γ,ν,μh,ϕ,ση2,s,x,y) = p(C|τ,N,z1,z2,ν,s,x). Moreover,
$$p(C \mid \tau, N, z_1, z_2, \nu, s, x) = \prod_{t=1}^{T} \prod_{j=1}^{\min(N_t+1,2)} p(c_{tj} \mid \tau_t, N_t, \mu_h, z_{t1}, z_{t2}, s_t, x_t).$$
We can then sample ct1 from the discrete distribution
$$p(c_{t1} = k \mid \tau_t, N_t, \mu_h, z_{t1}, z_{t2}, s_t, x_t) \propto w_k(1)\, \phi\!\big(-\log\tau_{t1} - \log\big[\lambda_t (z_{t1}+z_{t2})\big];\ m_k(1),\ v_k^2(1)\big), \qquad k = 1, \ldots, C(1).$$
If Nt > 0, we then draw ct2 from the discrete distribution
$$p(c_{t2} = k \mid \tau_t, N_t, \mu_h, z_{t1}, z_{t2}, s_t, x_t) \propto w_k(N_t)\, \phi\!\big(-\log\tau_{t2} - \log\big[\lambda_t (z_{t1}+z_{t2})\big];\ m_k(N_t),\ v_k^2(N_t)\big), \qquad k = 1, \ldots, C(N_t).$$
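The discrete random-walk Metropolis-Hastings update for ν described in Step 4a (and, analogously, in Appendix C.4) can be sketched as follows; this is not the authors' code, the likelihood term g_ν is abstracted behind a placeholder function, and the window size δ is an arbitrary illustrative choice.

```python
import numpy as np

def update_nu_discrete_rw(nu_current, log_g, delta=3, lo=2, hi=128, rng=None):
    """Sketch of the discrete random-walk MH step for nu: propose nu* uniformly on
    {nu-delta, ..., nu+delta}, reject proposals outside the DU(lo, hi) prior support,
    and accept with probability min{1, prod_t g_{nu*}(y_t) / prod_t g_{nu}(y_t)},
    evaluated in logs. `log_g(nu)` must return sum_t log g_nu(y_t) given the current
    draws of the other parameters; here it stands in for the zero-inflated Delta-NB
    likelihood term."""
    rng = np.random.default_rng() if rng is None else rng
    proposal = int(rng.integers(nu_current - delta, nu_current + delta + 1))
    if proposal < lo or proposal > hi or proposal == nu_current:
        return nu_current
    log_alpha = log_g(proposal) - log_g(nu_current)
    return proposal if np.log(rng.random()) < log_alpha else nu_current

# toy usage: a stand-in log-likelihood that peaks at nu = 20
fake_log_g = lambda nu: -0.05 * (nu - 20) ** 2
rng = np.random.default_rng(4)
nu = 5
for _ in range(2_000):
    nu = update_nu_discrete_rw(nu, fake_log_g, rng=rng)
print("final draw of nu:", nu)
```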
APPENDIX E: DATA CLEANING AND TRADE DURATIONS

Tables E.1 and E.2 present the details of the data cleaning and aggregation procedure. Figure E.1 presents the time series of the durations between subsequent trades and their histograms (on the log10 scale for the frequencies), both based on the cleaned data.

Figure E.1. Durations between trades and their log10 frequencies for IBM and KO, for the 2008 (top rows) and 2010 (bottom rows) samples, respectively. (a) IBM; (b) KO.

APPENDIX F: INTRADAY FEATURES, INCLUDING OVERNIGHT EFFECTS

In Figure F.1, we present the estimated decompositions of the log volatility ht=μh+st+xt, where we compare the component and signal estimates based on model parameters estimated using all five days jointly with those based on model parameters estimated for each day separately. The motivation of this comparison is to verify whether the overnight effect, together with other intraday features, can be considered the same for each trading day or whether such features change from day to day. For our analysis of KO tick by tick transaction bid prices, we conclude that some features (such as intraday persistence) can differ from day to day but that the overall effects, including the overnight effect, appear to be similar.

Figure F.1. Volatility decomposition ht=μh+st+xt for KO tick bid price returns for the 2008 data: ΔNB model parameters estimated based on the full sample (top two panels) and for each day separately (bottom two panels).

References

Aït-Sahalia Y., Jacod J., Li J. 2012. Testing for Jumps in Noisy High Frequency Data. Journal of Econometrics 168: 207–222.
Aït-Sahalia Y., Mykland P. A., Zhang L. 2011. Ultra High Frequency Volatility Estimation with Dependent Microstructure Noise. Journal of Econometrics 160: 160–175.
Alzaid A., Omair M. A. 2010. On the Poisson Difference Distribution Inference and Applications. Bulletin of the Malaysian Mathematical Science Society 33: 17–45.
Andersen T. G. 2000. Some Reflections on Analysis of High-Frequency Data. Journal of Business & Economic Statistics 18: 146–153.
Barndorff-Nielsen O. E., Hansen P. R., Lunde A., Shephard N. 2008. Realized Kernels in Practice: Trades and Quotes. Econometrics Journal 4: 1–32.
Barndorff-Nielsen O. E., Pollard D. G., Shephard N. 2012. Integer-valued Lévy Processes and Low Latency Financial Econometrics. Quantitative Finance 12: 587–605.
Bos C. 2008. "Model-based Estimation of High Frequency Jump Diffusions with Microstructure Noise and Stochastic Volatility." TI Discussion paper.
Boudt K., Cornelissen J., Payseur S. 2012. Highfrequency: Toolkit for the Analysis of Highfrequency Financial Data in R.
Brownlees C., Gallo G. 2006. Financial Econometrics Analysis at Ultra-High Frequency: Data Handling Concerns. Computational Statistics and Data Analysis 51: 2232–2245.
Chakravarty S., Wood R. A., Ness R. A. V. 2004. Decimals and Liquidity: A Study of the NYSE. Journal of Financial Research 27: 75–94.
Chib S., Nardari F., Shephard N. 2002. Markov Chain Monte Carlo for Stochastic Volatility Models. Journal of Econometrics 108: 281–316.
Chordia T., Subrahmanyam A. 1995. Market Making, the Tick Size and Payment-for-Order-Flow: Theory and Evidence. Journal of Business 68: 543–576.
Cordella T., Foucault T. 1999. Minimum Price Variations, Time Priority and Quote Dynamics. Journal of Financial Intermediation 8: 141–173.
Czado C., Haug S. 2010. An ACD-ECOGARCH(1,1) Model. Journal of Financial Econometrics 8: 335–344.
Dahlhaus R., Neddermeyer J. C. 2014. Online Spot Volatility-Estimation and Decomposition with Nonlinear Market Microstructure Noise Models. Journal of Financial Econometrics 12: 174–212.
Dayri K., Rosenbaum M. 2013. "Large Tick Assets: Implicit Spread and Optimal Tick Size." Working paper.
Durbin J., Koopman S. J. 2002. A Simple and Efficient Simulation Smoother for State Space Time Series Analysis. Biometrika 89: 603–616.
Durbin J., Koopman S. J. 2012. Time Series Analysis by State Space Methods (2nd edn). Oxford: Oxford University Press.
Eisler Z., Bouchaud J. P., Kockelkoren J. 2012. The Price Impact of Order Book Events: Market Orders, Limit Orders and Cancellations. Quantitative Finance 12: 1395–1419.
Engle R. F. 2000. The Econometrics of Ultra-High-Frequency Data. Econometrica 68: 1–22.
Frühwirth-Schnatter S., Frühwirth R., Held L., Rue H. 2009. Improved Auxiliary Mixture Sampling for Hierarchical Models of Non-Gaussian Data. Statistics and Computing 19: 479–492.
Frühwirth-Schnatter S., Wagner H. 2006. Auxiliary Mixture Sampling for Parameter-Driven Models of Time Series of Small Counts with Applications to State Space Modeling. Biometrika 93: 827–841.
Griffin J., Oomen R. 2008. Sampling Returns for Realized Variance Calculations: Tick Time or Transaction Time? Econometric Reviews 27: 230–253.
Johnson N. L., Kemp A. W., Kotz S. 2005. Univariate Discrete Distributions (3rd edn). New Jersey: John Wiley and Sons.
Kastner G., Frühwirth-Schnatter S. 2014. Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Estimation of Stochastic Volatility Models. Computational Statistics & Data Analysis 76: 408–423.
Kim S., Shephard N., Chib S. 1998. Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models. Review of Economic Studies 65: 361–393.
Koopman S. J., Lit R., Lucas A. 2017. Intraday Stochastic Volatility in Discrete Price Changes: The Dynamic Skellam Model. Journal of the American Statistical Association 112: 1490–1503.
Müller G., Czado C. 2009. Stochastic Volatility Models for Ordinal-Valued Time Series with Application to Finance. Statistical Modelling 9: 69–95.
O'Hara M., Saar G., Zhong Z. 2014. "Relative Tick Size and the Trading Environment." Working paper.
Omori Y., Chib S., Shephard N., Nakajima J. 2007. Stochastic Volatility with Leverage: Fast Likelihood Inference. Journal of Econometrics 140: 425–449.
Poirier D. J. 1973. Piecewise Regression Using Cubic Splines. Journal of the American Statistical Association 68: 515–524.
Roberts G. O., Rosenthal J. S. 2009. Examples of Adaptive MCMC. Journal of Computational and Graphical Statistics 18: 349–367.
Ronen T., Weaver D. G. 2001. "Teenies" Anyone? Journal of Financial Markets 4: 231–260.
Rydberg T. H., Shephard N. 2003. Dynamics of Trade-by-Trade Price Movements: Decomposition and Models. Journal of Financial Econometrics 1: 2–25.
SEC. 2012. Report to Congress on Decimalization. US Securities and Exchange Commission report.
Skellam J. G. 1946. The Frequency Distribution of the Difference Between Two Poisson Variates Belonging to Different Populations. Journal of the Royal Statistical Society 109: 296.
Stefanos D. 2015. "Bayesian Inference for Ordinal-Response State Space Mixed Models with Stochastic Volatility." Working paper.
Stroud J. R., Johannes M. S. 2014. Bayesian Modeling and Forecasting of 24-hour High-Frequency Volatility. Journal of the American Statistical Association 109: 1368–1384.
Weinberg J., Brown L. D., Jonathan R. S. 2007. Bayesian Forecasting of an Inhomogeneous Poisson Process with Application to Call Center Data. Journal of the American Statistical Association 102: 1185–1199.
Ye M., Yao C. 2014. "Tick Size Constraints, Market Structure, and Liquidity." Working paper. Available at: http://dx.doi.org/10.2139/ssrn.2359000 (accessed April 23, 2018).

Abstract We investigate high-frequency volatility models for analyzing intradaily tick by tick stock price changes using Bayesian estimation procedures. Our key interest is the extraction of intradaily volatility patterns from high-frequency integer price changes. We account for the discrete nature of the data via two different approaches: ordered probit models and discrete distributions. We allow for stochastic volatility by modeling the variance as a stochastic function of time, with intraday periodic patterns. We consider distributions with heavy tails to address occurrences of jumps in tick by tick discrete prices changes. In particular, we introduce a dynamic version of the negative binomial difference model with stochastic volatility. For each model, we develop a Markov chain Monte Carlo estimation method that takes advantage of auxiliary mixture representations to facilitate the numerical implementation. This new modeling framework is illustrated by means of tick by tick data for two stocks from the NYSE and for different periods. Different models are compared with each other based on the predictive likelihoods. We find evidence in favor of our preferred dynamic negative binomial difference model. High-frequency price changes observed at stock, futures, and commodity markets can typically not be regarded as continuous variables. In most electronic markets, the smallest possible price difference is set by the regulator or the trading platform. Here we develop and investigate dynamic models for high-frequency integer price changes that take the discreteness of prices into account. We explore the dynamic properties of integer time series observations. In particular, we are interested in the stochastic volatility dynamics of price changes within intradaily time intervals. This information can be used for the timely identification of changes in volatility and to obtain more accurate estimates of integrated volatility. In the current literature on high-frequency returns, price discreteness is typically neglected. However, the discreteness can have an impact on the distribution of price changes and on its volatility; see, for example, Security and Exchange Commission Report (2012), Chakravarty, Wood, and Ness (2004) and Ronen and Weaver (2001). Those assets that have prices with a spread of almost always equal to one tick are defined as large tick assets; see, Eisler, Bouchaud, and Kockelkoren (2012). These large tick assets are especially affected by the discreteness through the effect of different quoting strategies on these assets; see the discussions in Chordia and Subrahmanyam (1995) and Cordella and Foucault (1999). Also the effect of liquidity on large tick assets can be substantial as it is documented by O’Hara, Saar, and Zhong (2014) and Ye and Yao (2014). Many large tick assets exist on most U.S. exchange markets as the tick size is set to only one penny for stocks with a price greater than 1$ by the Security and Exchange Commission in Rule 612 of the Regulation National Market System. Hence almost all low price stocks are large tick assets. Moreover, many futures contracts are not decimalized for example, five-years U.S Treasury Note futures and EUR/USD futures fall into this category; see Dayri and Rosenbaum (2013). The relevance of discreteness and its effect on the analysis of price changes have been the motivation to develop models that account for integer prices. 
Similar to the case of continuous returns, we are primarily interested in the extraction of volatility from discrete price changes. We consider different dynamic model specifications for the high-frequency integer price changes with a focus on the modeling and extraction of stochastic volatility. We have encountered the studies of Müller and Czado (2009) and Stefanos (2015) who propose ordered probit models with time-varying variance specifications. We adopt their modeling approaches as a reference and also use their treatments of Bayesian estimation. The main novelty of our study is the specification of a new model for tick by tick price changes based on the discrete negative binomial distribution, which we shall refer to shortly as the ΔNB distribution. The properties of this distribution are explored in detail in our study. In particular, the heavy tail properties are emphasized. In our analysis, we adopt the ΔNB distribution conditional on a Gaussian latent state vector process, which represent the components of the stochastic volatility process. The volatility process accounts for the periodic pattern in high-frequency volatility due to intradaily seasonal effects such as the opening, lunch and closing hours. Our Bayesian modeling approach provides a flexible and unified framework to fit the observed tick by tick price changes. The ΔNB properties closely mimic the empirical stylized properties of trade by trade price changes. Hence, we will argue that the ΔNB model with stochastic volatility is an attractive alternative to models based on the Skellam distribution as suggested earlier; see Koopman, Lit, and Lucas (2017). We further decompose the unobserved log volatility into intradaily periodic and transient volatility components. We propose a Bayesian estimation procedure using standard Gibbs sampling methods. Our procedure is based on data augmentation and auxiliary mixtures; it extends the auxiliary mixture sampling procedure proposed by Frühwirth-Schnatter and Wagner (2006) and Frühwirth-Schnatter et al. (2009). The procedures are implemented in a computationally efficient manner. In our empirical study, we consider two stocks from the NYSE, IBM, and Coca Cola, in a volatile week in October 2008 and a calmer week in April 2010. We compare the in-sample and out-of-sample fits of four different model specifications: ordered probit model based on the normal and Student’s t distributions, the Skellam distribution, and the ΔNB model. We compare the models in terms of Bayesian information criterion and predictive likelihoods. We find that the ΔNB model is favored for series with a relatively low tick size and in periods of more volatility. Our study is related to different strands in the econometric literature. Modeling discrete price changes with static Skellam and ΔNB distributions has been introduced by Alzaid and Omair (2010) and Barndorff-Nielsen, Pollard, and Shephard (2012). The dynamic specification of the Skellam distribution and its (non-Bayesian) statistical treatment have been explored by Koopman, Lit, and Lucas (2017). Furthermore, our study is related to Bayesian treatments of stochastic volatility models for continuous returns; see, for example, Chib, Nardari, and Shephard (2002), Kim, Shephard, and Chib (1998), Omori et al. (2007) and, more recently, Stroud and Johannes (2014). We extend this literature on trade by trade price changes by explicitly accounting for prices discreteness and heavy tails of the tick by tick return distribution. 
These extensions are explored in other contexts in Engle (2000), Czado and Haug (2010), Dahlhaus and Neddermeyer (2014), and Rydberg and Shephard (2003). The remainder is organized as follows. In Section 1, we review different dynamic model specifications for high-frequency integer price changes. We give most attention to the introduction of the dynamic ΔNB distribution. Section 2 develops a Bayesian estimation procedure based on Gibbs sampling, mainly for the ΔNB case, of which the Skellam is a special case. In Section 3, we present the details of our empirical study, including a description of our dataset, the data cleaning procedure, the presentation of our estimation results, and a discussion of our overall empirical findings. Section 4 concludes.

1 Dynamic Models for Discrete Price Changes

We start this section with a discussion of dynamic volatility modeling for high-frequency data. Next, we review models for integer valued variables based on such a dynamic volatility specification. The first group of these models includes the ordered probit models based on normal and Student's t distributions with stochastic volatility. The second group is captured by our novel dynamic negative binomial difference (ΔNB) model with stochastic volatility, which nests the dynamic Skellam model as a special case. We then present the main features of our newly introduced ΔNB model.

1.1 Dynamic Volatility Specification

To capture the salient empirical features of high-frequency trade by trade price changes, such as intradaily volatility clustering and persistent dynamics, one typically specifies the following dynamic model for the log volatility h_t:

$$h_t = \mu_h + x_t, \qquad x_{t+1} = \phi x_t + \eta_t, \qquad \eta_t \sim N(0, \sigma^2_\eta), \tag{1}$$

for t = 1, ..., T, where t is a transaction counter (and not a time index), μ_h is the unconditional mean of the log volatility of the continuous returns, x_t is a zero mean autoregressive process of order one, denoted by AR(1), with φ the persistence parameter of the log volatility process and σ_η² the variance of the Gaussian disturbance term η_t. The mean μ_h represents the daily log volatility and the autoregressive process x_t captures the changes in log volatility due to firm specific or market information arriving during the day. The latent variable x_t is specified as an AR(1) process with zero mean; this restriction is enforced to allow for the identification of μ_h. However, there is another stylized fact of intradaily price changes: the seasonal pattern in the volatility process. In particular, volatility is high in the opening minutes of the trading day, lowest around the lunch hour, and increases somewhat again toward the closing minutes. We can account for this intradaily volatility pattern by further decomposing the log volatility h_t into a deterministic daily seasonal pattern s_t and a stochastically time varying signal x_t as

$$h_t = \mu_h + s_t + x_t, \qquad \mathrm{E}(s_t) = 0, \tag{2}$$

where s_t is a normalized spline function with unconditional expectation equal to zero. Such a specification allows us to smoothly interpolate different levels of volatility over the day. We enforce the zero mean constraint via a simple linear restriction. In our model we specify s_t as an intradaily cubic spline function, constructed from piecewise cubic polynomials.
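To fix ideas, the following minimal Python sketch simulates a log-volatility path from the decomposition in Equations (1)–(2); the parameter values are illustrative only, and a zero-mean cosine curve stands in for the spline component s_t, whose actual specification via the Poirier representation is given below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter values (not estimates reported in this paper)
T, mu_h, phi, sigma_eta = 20_000, -1.0, 0.97, np.sqrt(0.02)

# AR(1) transient component x_t, initialized from its stationary distribution
x = np.empty(T)
x[0] = rng.normal(0.0, sigma_eta / np.sqrt(1.0 - phi**2))
for t in range(T - 1):
    x[t + 1] = phi * x[t] + rng.normal(0.0, sigma_eta)

# Stand-in for the zero-mean intradaily pattern s_t (U-shaped over the trading day)
u = np.linspace(0.0, 1.0, T)          # position of transaction t within the day
s = 0.5 * np.cos(2.0 * np.pi * u)
s -= s.mean()                         # enforce the zero-mean restriction E(s_t) = 0

h = mu_h + s + x                      # log volatility of Equation (2)
```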
More precisely, we adopt the representation of Poirier (1973) in which the periodic cubic spline s_t is based on K knots and the regression equation

$$s_t = w_t \beta, \tag{3}$$

where w_t is a 1×K weight vector and β = (β_1, ..., β_K)′ is a K×1 vector containing the values of the spline function at the K knots. Further details about the spline and the Poirier representation are presented in Appendix B. In our empirical study, we adopt a spline function with K = 3 knots at {09:30, 12:30, 16:00}. The spline function also accounts for the overnight effect of high volatility at the opening of trading due to the accumulation of new information during the closure of the market. The difference between β_K (market closure, 16:00) and β_1 (market opening, 09:30) measures the overnight effect in log volatility h_t. This treatment of overnight effects follows Engle (2000) and Müller and Czado (2009); alternatively, we can introduce a daily random effect for the opening minutes of trading. For such alternative treatments of intradaily seasonality and overnight effects, we refer to Weinberg, Brown, and Jonathan (2007), Bos (2008), and Stroud and Johannes (2014).

1.2 Ordered Normal Stochastic Volatility Model

In econometrics, the ordered probit model is typically used for the modeling of ordinal variables. But we can also adopt the ordered probit model in a natural way for the modeling of discrete price changes. In this approach, we effectively round a realization from a continuous distribution to its nearest integer. The continuous distribution can be subject to stochastic volatility; this extension is relatively straightforward. Let r_t* be the continuous return, which is rounded to r_t = k when r_t* ∈ [k−0.5, k+0.5). We observe r_t and we regard r_t* as a latent variable. By neglecting the discreteness of r_t during the estimation procedure, we would clearly distort the measurement of the scaling or variation of r_t*. Therefore we need to take account of the rounding of r_t by specifying an ordered probit model with rounding thresholds [k−0.5, k+0.5). We assume that the underlying distribution for r_t* is subject to stochastic volatility. We obtain the following specification

$$r_t = k, \quad \text{with probability} \quad \Phi\!\left(\frac{k+0.5}{\exp(h_t/2)}\right) - \Phi\!\left(\frac{k-0.5}{\exp(h_t/2)}\right), \qquad k \in \mathbb{Z}, \tag{4}$$

for t = 1, ..., T, where h_t is the logarithm of the time varying stochastic variance of the latent variable r_t*, and Φ(·) is the cumulative distribution function of the standard normal distribution. The dynamic model specification for h_t is given by (1). Similar ordered probit specifications with stochastic volatility are introduced by Müller and Czado (2009) and Stefanos (2015). In the specification of Müller and Czado (2009), the rounding barriers are not necessarily equally spaced but need to be estimated. This further flexibility may improve the model fit compared to our basic ordered SV model specification. On the other hand, the more flexible model can only be fitted accurately when sufficient observations for each possible discrete outcome are available. If only a few price jumps of more than, say, ±10 ticks are observed, it may become a problem to handle such large outcomes. The basic model specification (1) and (4) accounts for the discreteness of the prices via the ordered probit specification and for intradaily volatility clustering via the possibly persistent dynamic process x_t. The model can be modified and extended in several ways.
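As an illustration of Equation (4), the following sketch evaluates the implied probabilities and simulates a discrete price change by rounding a latent Gaussian return; the values of h_t and the range of k are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def ordered_normal_pmf(k, h):
    """P(r_t = k) under the ordered normal SV model of Equation (4)."""
    scale = np.exp(h / 2.0)
    return norm.cdf((k + 0.5) / scale) - norm.cdf((k - 0.5) / scale)

def simulate_tick(h, rng):
    """Draw r_t by rounding the latent return r_t* ~ N(0, exp(h_t)) to the nearest integer."""
    r_star = rng.normal(0.0, np.exp(h / 2.0))
    return int(np.rint(r_star))       # thresholds at k - 0.5 and k + 0.5

rng = np.random.default_rng(1)
h_t = -0.5
probs = {k: ordered_normal_pmf(k, h_t) for k in range(-3, 4)}  # sums to ~1 over all k
```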
First, we can account for the market microstructure noise observed in tick by tick returns (see, e.g., Aït-Sahalia, Mykland, and Zhang, 2011 and Griffin and Oomen, 2008) by including an autoregressive moving average (ARMA) process in the specification of the mean of r_t*. In a similar way, we can incorporate explanatory variables such as market imbalance, which can also have predictive power. Second, to include predetermined announcement effects, we can include regression effects in the specification as proposed in Stroud and Johannes (2014). Third, it is possible that the unconditional mean μ_h of the volatility of price changes is time varying. For example, we may expect that volatility is higher for stocks with a higher price, so that the volatility is not properly scaled when the price level has changed. A time-varying mean of the volatility can easily be incorporated in the model by specifying random walk dynamics for μ_h, which would allow for smooth changes in the mean over time. For our current purposes, we rely on the specification given by Equation (2).

1.3 Ordered t Stochastic Volatility Model

It is well documented in the financial econometrics literature that asset prices are subject to jumps; see, for example, Aït-Sahalia, Jacod, and Li (2012). However, the ordered normal specification, as introduced above, does not deliver sufficiently heavy tails in its asset price distribution to accommodate the jumps that are typically observed in high-frequency returns. To account for the jumps more appropriately, we can consider a heavy tailed distribution instead of the normal distribution. In this way, we can assign probability mass to the infrequent large jumps in asset returns. An obvious choice for a heavy tailed distribution is the Student's t distribution, which implies the following specification,

$$r_t = k, \quad \text{with probability} \quad T\!\left(\frac{k+0.5}{\exp(h_t/2)},\,\nu\right) - T\!\left(\frac{k-0.5}{\exp(h_t/2)},\,\nu\right), \qquad k \in \mathbb{Z}, \tag{5}$$

which effectively replaces Equation (4), where T(·, ν) is the cumulative distribution function of the Student's t distribution with ν degrees of freedom. The model specification for h_t is provided by Equation (1) or (2). The parameter vector of this model specification is denoted by ψ and includes the degrees of freedom ν, the unconditional mean of log volatility μ_h, the volatility persistence coefficient φ, the variance of the log volatility disturbance σ_η², and the unknown vector β in (3) with the values of the spline at its knot positions. In case of the ordered normal specification, we rely on the same parameters but without ν. The estimation of these unknown parameters in the ordered probit models is carried out by standard Bayesian simulation methods, for which the details are provided in Appendix C.

1.4 Dynamic ΔNB Model

Nonnegative integer variables can alternatively be modeled directly via discrete distributions such as the Poisson or the negative binomial; see Johnson, Kemp, and Kotz (2005). These well-known distributions only have support on the nonnegative integers. When modeling price differences, we also need to allow for negative integers. In this case, for example, the Skellam distribution can be considered; see Skellam (1946). The specification of these distributions can be extended to a stochastic volatility model straightforwardly. However, the analysis and estimation based on such models are more intricate.
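A small numerical check, with hypothetical values of h_t and ν, illustrates how replacing Φ(·) by T(·, ν) in Equation (5) shifts probability mass toward large tick moves.

```python
import numpy as np
from scipy.stats import norm, t

def ordered_t_pmf(k, h, nu):
    """P(r_t = k) under the ordered Student's t SV model of Equation (5)."""
    scale = np.exp(h / 2.0)
    return t.cdf((k + 0.5) / scale, df=nu) - t.cdf((k - 0.5) / scale, df=nu)

h_t, nu = -0.5, 5.0
# probability of a downward move of at least 10 ticks under (4) and (5)
p_tail_normal = norm.cdf(-9.5 / np.exp(h_t / 2.0))
p_tail_t = t.cdf(-9.5 / np.exp(h_t / 2.0), df=nu)
print(p_tail_normal, p_tail_t)        # the t specification assigns far more tail mass
```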
In this context, Alzaid and Omair (2010) advocate the use of the Skellam distribution, which is based on the difference of two Poisson random variables. Barndorff-Nielsen, Pollard, and Shephard (2012) introduce the negative binomial difference (ΔNB) distribution, which has fatter tails than the Skellam distribution. Next we review the ΔNB distribution and its properties. We further introduce a dynamic version of the ΔNB model, of which the dynamic Skellam model is a special case.

The ΔNB distribution is constructed as the difference of two negative binomial random variables, which we denote by NB+ and NB−, with means λ+ and λ− and dispersion parameters ν+ and ν−, respectively. We denote the ΔNB variable as the random variable R, defined simply as R = NB+ − NB−. We then assume that R is distributed as R ∼ ΔNB(λ+, ν+, λ−, ν−), where ΔNB is the negative binomial difference distribution with probability mass function

$$f_{\Delta NB}(r;\lambda^+,\nu^+,\lambda^-,\nu^-) = m \times \begin{cases} d^+ \, F\big(\nu^+ + r,\ \nu^-,\ r+1;\ \tilde\lambda^+\tilde\lambda^-\big), & \text{if } r \ge 0, \\ d^- \, F\big(\nu^+,\ \nu^- - r,\ -r+1;\ \tilde\lambda^+\tilde\lambda^-\big), & \text{if } r < 0, \end{cases}$$

where

$$m = (\tilde\nu^+)^{\nu^+} (\tilde\nu^-)^{\nu^-}, \qquad d^{[s]} = \frac{(\tilde\lambda^{[s]})^{|r|} (\nu^{[s]})_{|r|}}{|r|!}, \qquad \tilde\nu^{[s]} = \frac{\nu^{[s]}}{\lambda^{[s]} + \nu^{[s]}}, \qquad \tilde\lambda^{[s]} = \frac{\lambda^{[s]}}{\lambda^{[s]} + \nu^{[s]}}, \qquad \text{for } [s] = +, -,$$

and with the hypergeometric function

$$F(a,b,c;z) = \sum_{n=0}^{\infty} \frac{(a)_n (b)_n}{(c)_n} \frac{z^n}{n!},$$

where (x)_n is the Pochhammer symbol for the falling factorial, defined as

$$(x)_n = x(x-1)(x-2)\cdots(x-n+1) = \frac{\Gamma(x+1)}{\Gamma(x-n+1)}.$$

More details about the ΔNB distribution, its probability mass function, and its properties are provided by Barndorff-Nielsen, Pollard, and Shephard (2012). For example, the ΔNB distribution has first and second moments

$$\mathrm{E}(R) = \lambda^+ - \lambda^-, \qquad \mathrm{Var}(R) = \lambda^+\Big(1 + \frac{\lambda^+}{\nu^+}\Big) + \lambda^-\Big(1 + \frac{\lambda^-}{\nu^-}\Big).$$

The variables ν+, ν−, λ+, and λ− are typically treated as unknown coefficients. An important special case of the ΔNB distribution is its zero mean, symmetric version, which is obtained when λ = λ+ = λ− and ν = ν+ = ν−. The probability mass function of the corresponding random variable R is given by

$$f_0(r;\lambda,\nu) = \left(\frac{\nu}{\lambda+\nu}\right)^{2\nu} \left(\frac{\lambda}{\lambda+\nu}\right)^{|r|} \frac{\Gamma(\nu+|r|)}{\Gamma(\nu)\,\Gamma(|r|+1)} \, F\!\left(\nu+|r|,\ \nu,\ |r|+1;\ \Big(\frac{\lambda}{\lambda+\nu}\Big)^2\right).$$

In this case, we obtain a zero mean random variable R with variance

$$\mathrm{Var}(R) = 2\lambda\Big(1 + \frac{\lambda}{\nu}\Big). \tag{6}$$

We denote the distribution of the zero mean random variable R by ΔNB(λ, ν). This random variable R can alternatively be considered as being generated from a compound Poisson process, that is, R = Σ_{i=1}^N M_i, where the random variable N is generated by the Poisson distribution with intensity

$$\lambda \times (z_1 + z_2), \qquad z_1,\, z_2 \sim \mathrm{Ga}(\nu, \nu), \tag{7}$$

with Ga(ν, ν) the gamma distribution with shape and rate both equal to ν (so that z_1 and z_2 have unit mean), and where the indicator variable M_i is generated as

$$M_i = \begin{cases} \phantom{-}1, & \text{with probability } P(M_i = 1) = z_1 / (z_1 + z_2), \\ -1, & \text{with probability } P(M_i = -1) = z_2 / (z_1 + z_2). \end{cases}$$

We will use this representation of a zero mean ΔNB variable for the developments below and in our empirical study. In the empirical analyses of this study, we adopt zero inflated versions of the ΔNB distributions, because empirically we observe a clear overrepresentation of trade by trade price changes equal to zero. In the analysis of Rydberg and Shephard (2003), zero changes are also treated explicitly, since they decompose a discrete price change into activity, direction, and size; all zero changes are treated as inactivity. This decomposition model is particularly suited for the analysis of market microstructure. In our empirical modeling framework, we concentrate on the extraction of volatility from time series of discrete price changes. The number of zero price changes is especially high for the more liquid stocks.
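The compound Poisson representation in Equation (7) can be used directly to generate zero mean ΔNB(λ, ν) variates; the following sketch (with hypothetical λ and ν) does so and compares the sample variance with Equation (6).

```python
import numpy as np

def sample_dnb(lam, nu, size, rng):
    """Zero mean DeltaNB(lam, nu) draws via the compound Poisson representation (7)."""
    z1 = rng.gamma(shape=nu, scale=1.0 / nu, size=size)   # gamma mixing variables, unit mean
    z2 = rng.gamma(shape=nu, scale=1.0 / nu, size=size)
    n = rng.poisson(lam * (z1 + z2))                      # number of unit price moves
    pos = rng.binomial(n, z1 / (z1 + z2))                 # number of +1 marks among them
    return pos - (n - pos)                                # sum of the marks M_i

rng = np.random.default_rng(2)
lam, nu = 1.5, 10.0
r = sample_dnb(lam, nu, 200_000, rng)
print(r.mean(), r.var(), 2.0 * lam * (1.0 + lam / nu))    # sample variance close to Equation (6)
```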
This is because the volume available at the best bid price is relatively much higher, so that the price impact of a single trade is much lower. The zero inflated version is obtained by specifying the random variable R_0 with probability mass function

$$P(R_0 = r) = \begin{cases} (1-\gamma)\, f_{\Delta NB}(r;\lambda^+,\nu^+,\lambda^-,\nu^-), & r \neq 0, \\ \gamma + (1-\gamma)\, f_{\Delta NB}(0;\lambda^+,\nu^+,\lambda^-,\nu^-), & r = 0, \end{cases}$$

where f_ΔNB(r; λ+, ν+, λ−, ν−) is the ΔNB probability mass function and 0 < γ < 1 is treated as a fixed and unknown coefficient. We denote the zero inflated ΔNB probability mass function by f_0. Dynamic specifications of the ΔNB distribution can be obtained by letting the variables ν^[s] and/or λ^[s] be time-varying random variables, for [s] = +, −. We opt for a time-varying λ^[s], since it is more natural for an intensity than for a degrees of freedom parameter to vary over time. The dynamic modeling of ν could also be interesting, but we leave this for future research. We restrict our analysis to the zero inflated zero mean ΔNB distribution f_0(r_t; λ_t, ν) and we further assume that the intensity parameters for positive and negative price changes are the same, that is, λ_t = λ_t^+ = λ_t^-. Taking the above considerations into account, the dynamic ΔNB model can be specified as above but with λ_t = exp(h_t), where h_t is specified as in Equation (1) or (2). We recognize that exp(h_t/2) represents the standard deviation of the latent variable r_t* in our ordered probit model specification, whereas here we consider exp(h_t). However, the variance of a ΔNB random variable in Equation (6) depends on λ both linearly and quadratically, so we do not model the standard deviation of a ΔNB random variable as such; due to this asymmetry we also require λ_t to be positive, which is enforced by the exponential function. The main reason for letting λ_t be time varying in this way is to simplify the derivation of the sampling scheme. In particular, the current specification is convenient for deriving the auxiliary mixture sampling method; see Section 2 for details.

1.5 Dynamic Skellam Model

The dynamic ΔNB model embeds the dynamic Skellam model as considered by Koopman, Lit, and Lucas (2017). It is obtained as the limiting case in which ν goes to infinity, that is, ν → ∞; for a derivation and further details, see Appendix A.

2 Bayesian Estimation Procedures

Bayesian estimation procedures for the ordered normal and ordered Student's t stochastic volatility models are discussed by Müller and Czado (2009) and Stefanos (2015); their procedures are presented, with some details, in Appendix C. Here we develop a Bayesian estimation procedure for observations y_t, with t = 1, ..., T, coming from the dynamic ΔNB model. We provide the details of the procedure and discuss its computational implementation. Our reference dynamic ΔNB model is given by

$$y_t \sim f_0(y_t;\lambda_t,\nu), \qquad \lambda_t = \exp(h_t), \qquad h_t = \mu_h + s_t + x_t, \qquad s_t = w_t\beta, \qquad x_{t+1} = \phi x_t + \eta_t,$$

where η_t ∼ N(0, σ_η²), for t = 1, ..., T. The details of the model are discussed in Section 1. The parameters ν, μ_h, β, φ, and σ_η² are static, while x_t is a latent variable modeled as a stationary autoregressive process. The intradaily seasonal effect s_t is represented by a Poirier spline; see Appendix B. Our proposed Bayesian estimation procedure estimates all static parameters jointly with the time-varying signal h_1, ..., h_T for the dynamic ΔNB model. It is based on Gibbs sampling, data augmentation, and the auxiliary mixture sampling methods developed by Frühwirth-Schnatter and Wagner (2006) and Frühwirth-Schnatter et al. (2009).
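Before turning to the sampler, the following sketch simulates observations from this reference model by combining the log-volatility path of Equations (1)–(2) with the representation of Equation (7) and the zero inflation. The parameter values are illustrative (they coincide with the true values used in the simulation study of Section 2.5), and the intradaily pattern is kept flat for brevity.

```python
import numpy as np

def simulate_dynamic_dnb(T, mu_h, phi, sigma_eta, gamma, nu, s, rng):
    """Simulate y_1, ..., y_T from the zero-inflated dynamic DeltaNB model with lambda_t = exp(h_t)."""
    x = np.empty(T)
    x[0] = rng.normal(0.0, sigma_eta / np.sqrt(1.0 - phi**2))
    for t in range(T - 1):
        x[t + 1] = phi * x[t] + rng.normal(0.0, sigma_eta)
    lam = np.exp(mu_h + s + x)

    z1 = rng.gamma(nu, 1.0 / nu, T)
    z2 = rng.gamma(nu, 1.0 / nu, T)
    n = rng.poisson(lam * (z1 + z2))
    pos = rng.binomial(n, z1 / (z1 + z2))
    y = pos - (n - pos)

    y[rng.random(T) < gamma] = 0          # zero inflation: an observation is set to 0 w.p. gamma
    return y

rng = np.random.default_rng(3)
T = 20_000
y = simulate_dynamic_dnb(T, mu_h=-1.7, phi=0.97, sigma_eta=np.sqrt(0.02),
                         gamma=0.1, nu=10.0, s=np.zeros(T), rng=rng)
```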
At each time point t, for t = 1, ..., T, we introduce a set of latent auxiliary variables to facilitate the derivation of the conditional distributions. By introducing these auxiliary variables, we are able to specify the model as a linear state space model with non-Gaussian observation disturbances. Moreover, using an auxiliary mixture sampling procedure, we obtain, conditionally, an approximating linear Gaussian state space model. In such a setting, we can exploit the highly efficient Kalman filtering and smoothing procedures for sampling full paths of the dynamic latent variables. These ingredients are key for a computationally feasible implementation of our estimation procedure.

2.1 Data Augmentation: Our Latent Auxiliary Variables

We use the following auxiliary variables for the data augmentation. We define N_t as the sum of NB+ and NB−, together with the gamma mixing variables z_t1 and z_t2 of Equation (7). Conditional on z_t1, z_t2, and the intensity λ_t, we can interpret N_t as the number of jumps of a Poisson process on [0,1] with intensity (z_t1 + z_t2)λ_t, based on the result in Equation (7). We introduce the latent arrival time of the N_t-th jump of this Poisson process, τ_t2, and the interarrival time between the N_t-th and (N_t+1)-th jump, τ_t1, for every t = 1, ..., T. The interarrival time τ_t1 can be assumed to come from an exponential distribution with intensity (z_t1 + z_t2)λ_t, while the N_t-th arrival time can be treated as a gamma distributed variable with density Ga(N_t, (z_t1 + z_t2)λ_t). We have

$$\tau_{t1} = \frac{\xi_{t1}}{(z_{t1}+z_{t2})\lambda_t}, \quad \xi_{t1} \sim \mathrm{Exp}(1), \qquad \tau_{t2} = \frac{\xi_{t2}}{(z_{t1}+z_{t2})\lambda_t}, \quad \xi_{t2} \sim \mathrm{Ga}(N_t, 1),$$

where we treat ξ_t1 and ξ_t2 as auxiliary variables. By taking logarithms and substituting the definition of log λ_t from Equation (2), we obtain

$$-\log \tau_{t1} = \log(z_{t1}+z_{t2}) + \mu_h + s_t + x_t + \xi^*_{t1}, \qquad \xi^*_{t1} = -\log \xi_{t1},$$
$$-\log \tau_{t2} = \log(z_{t1}+z_{t2}) + \mu_h + s_t + x_t + \xi^*_{t2}, \qquad \xi^*_{t2} = -\log \xi_{t2}.$$

These equations are linear in the state vector, which facilitates the use of Kalman filtering. However, the error terms ξ*_t1 and ξ*_t2 are nonnormal. We can adopt the solutions of Frühwirth-Schnatter and Wagner (2006) and Frühwirth-Schnatter et al. (2009), where the exponential and the negative log-gamma distributions are approximated by normal mixture distributions. In particular, we can specify the approximations as

$$f_{\xi^*}(x; N_t) \approx \sum_{i=1}^{C(N_t)} \omega_i(N_t)\, \phi\big(x; m_i(N_t), v_i(N_t)\big),$$

where C(N_t) is the number of mixture components at time t, for t = 1, ..., T, ω_i(N_t) is the corresponding weight, and φ(x; m, v) is the normal density for x with mean m and variance v. These approximations continue to depend on N_t because the log-gamma distribution is not canonical; it has a different shape for each value of N_t.

2.2 Mixture Indicators for Obtaining a Conditionally Linear Model

Conditionally on N, z_1, z_2, τ_1, τ_2, and C = {c_tj, t = 1, ..., T, j = 1, ..., min(N_t+1, 2)}, we can write the following state space form

$$\tilde y_t = \begin{bmatrix} 1 & w_t & 1 \\ 1 & w_t & 1 \end{bmatrix} \begin{bmatrix} \mu_h \\ \beta \\ x_t \end{bmatrix} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, H_t),$$

$$\alpha_{t+1} = \begin{bmatrix} \mu_h \\ \beta \\ x_{t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & I_K & 0 \\ 0 & 0 & \phi \end{bmatrix} \begin{bmatrix} \mu_h \\ \beta \\ x_t \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ \eta_t \end{bmatrix}, \qquad \eta_t \sim N(0, \sigma^2_\eta),$$

where ỹ_t and ε_t are min(N_t+1, 2) × 1 vectors, the system matrix in the observation equation is min(N_t+1, 2) × (K+2), the state vector is (K+2) × 1, and the initial state is distributed as

$$\begin{bmatrix} \mu_h \\ \beta \\ x_1 \end{bmatrix} \sim N\!\left( \begin{bmatrix} \mu_0 \\ \beta_0 \\ 0 \end{bmatrix},\ \begin{bmatrix} \sigma^2_\mu & 0 & 0 \\ 0 & \sigma^2_\beta I_K & 0 \\ 0 & 0 & \sigma^2_\eta/(1-\phi^2) \end{bmatrix} \right),$$

with H_t = diag(v_{c_t1}(1), v_{c_t2}(N_t)) and

$$\tilde y_t = \begin{pmatrix} -\log \tau_{t1} - m_{c_{t1}}(1) - \log(z_{t1}+z_{t2}) \\ -\log \tau_{t2} - m_{c_{t2}}(N_t) - \log(z_{t1}+z_{t2}) \end{pmatrix}.$$

Using the mixture-of-normals approximation of ξ*_t1 and ξ*_t2 allows us to build an efficient Gibbs sampling procedure in which we can sample the latent state paths in one block using Kalman filtering and smoothing techniques.
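The construction of τ_t1, τ_t2 and the transformed observations can be sketched as follows; the inputs are hypothetical values of N_t, z_t1, z_t2, and λ_t, and the mixture means m_i(·) would still have to be subtracted to obtain ỹ_t.

```python
import numpy as np

def draw_aux_and_transform(N_t, z1, z2, lam_t, rng):
    """Draw (tau_t1, tau_t2) and return -log(tau) - log(z1 + z2), i.e. h_t plus a nonnormal error.

    The second observation equation only exists when N_t > 0 (dimension min(N_t + 1, 2))."""
    rate = (z1 + z2) * lam_t
    tau1 = rng.exponential(1.0) / rate            # xi_t1 ~ Exp(1)
    obs = [-np.log(tau1) - np.log(z1 + z2)]       # = mu_h + s_t + x_t + xi*_t1
    if N_t > 0:
        tau2 = rng.gamma(N_t, 1.0) / rate         # xi_t2 ~ Ga(N_t, 1)
        obs.append(-np.log(tau2) - np.log(z1 + z2))
    return np.array(obs)

rng = np.random.default_rng(4)
y_raw = draw_aux_and_transform(N_t=3, z1=0.9, z2=1.1, lam_t=np.exp(-1.0), rng=rng)
```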
2.3 Sampling of the Event Counts N_t

The remaining challenge is the sampling of N_t, as all the other full conditionals are standard. We notice that, conditionally on z_t1, z_t2, and the intensity λ_t, the N_t's are independent over time. Using the shorthand notation v = (v_1, ..., v_T) for a vector collecting a variable over all time periods, we can write

$$p(N \mid \gamma,\nu,\mu_h,\phi,\sigma^2_\eta,s,x,z_1,z_2,y) = \prod_{t=1}^{T} p(N_t \mid \gamma,\lambda_t,z_{t1},z_{t2},y_t).$$

For a given time index t, we can draw N_t from a discrete distribution with

$$p(N_t \mid \gamma,\lambda_t,z_{t1},z_{t2},y_t) = \frac{p(N_t,y_t \mid \gamma,\lambda_t,z_{t1},z_{t2})}{p(y_t \mid \gamma,\lambda_t,z_{t1},z_{t2})} = \frac{p(y_t \mid N_t,\gamma,\lambda_t,z_{t1},z_{t2})\, p(N_t \mid \gamma,\lambda_t,z_{t1},z_{t2})}{p(y_t \mid \gamma,\lambda_t,z_{t1},z_{t2})} = \big[\gamma\, 1\{y_t=0\} + (1-\gamma)\, p(y_t \mid N_t,\lambda_t,z_{t1},z_{t2})\big] \times \frac{p(N_t \mid \gamma,\lambda_t,z_{t1},z_{t2})}{p(y_t \mid \gamma,\lambda_t,z_{t1},z_{t2})}. \tag{8}$$

The denominator in Equation (8) is a Skellam probability mass function with intensities λ_t z_t1 and λ_t z_t2. To calculate the probability p(y_t | N_t, λ_t, z_t1, z_t2) in the second term in the brackets in (8), we use Equation (7): y_t conditionally on λ_t, z_t1, and z_t2 is distributed as a marked Poisson process, with marks given by

$$M_i = \begin{cases} \phantom{-}1, & \text{with } P(M_i = 1) = \dfrac{z_{t1}}{z_{t1}+z_{t2}}, \\ -1, & \text{with } P(M_i = -1) = \dfrac{z_{t2}}{z_{t1}+z_{t2}}. \end{cases}$$

This implies that we can represent y_t as Σ_{i=1}^{N_t} M_i, so that

$$p(y_t \mid N_t,\lambda_t,z_{t1},z_{t2}) = \begin{cases} 0, & \text{if } |y_t| > N_t \text{ or } |y_t| \bmod 2 \neq N_t \bmod 2, \\ \dbinom{N_t}{\frac{N_t+y_t}{2}} \left(\dfrac{z_{t1}}{z_{t1}+z_{t2}}\right)^{\frac{N_t+y_t}{2}} \left(\dfrac{z_{t2}}{z_{t1}+z_{t2}}\right)^{\frac{N_t-y_t}{2}}, & \text{otherwise}. \end{cases}$$

Furthermore, N_t conditionally on z_t1, z_t2, and λ_t is the number of events of a Poisson process on [0,1] with intensity (z_t1 + z_t2)λ_t, hence p(N_t | γ, λ_t, z_t1, z_t2) is the probability mass function of a Poisson random variable with intensity λ_t(z_t1 + z_t2). We can draw all N_t's in parallel by drawing u, a vector of uniform random variables with u_t ∼ U[0,1], and setting

$$N_t = \min\Big\{ n : u_t \le \sum_{i=0}^{n} p(i \mid \gamma,\lambda_t,z_{t1},z_{t2},y_t) \Big\}.$$

2.4 Markov Chain Monte Carlo Algorithm

To complete our Bayesian specification, we need to specify the prior distributions of the model parameters, which we set as follows:

$$\mu_h \sim N(0,10), \qquad \beta_i \sim N(0,1), \qquad \frac{\phi+1}{2} \sim \mathcal{B}(20, 1.5), \tag{9}$$
$$\sigma^2_\eta \sim \mathrm{IG}(2.5, 0.025), \qquad \gamma \sim \mathcal{B}(1.7, 10), \qquad \nu \sim \mathrm{G}_{[2,\,2.2,\,\ldots,\,128]}(15, 1.5), \tag{10}$$

for i = 1, ..., K, where N is the normal, ℬ is the beta, IG is the inverse gamma, and G_[2, 2.2, …, 128] is the gamma distribution restricted to a grid from 2 to 128 with a resolution of 0.2. The steps of the Markov chain Monte Carlo (MCMC) algorithm are outlined below, with more details provided in Appendix D.

1. Initialize μ_h, φ, σ_η², γ, ν, C, τ, N, z_1, z_2, s, and x.
2. Generate φ, σ_η², μ_h, s, and x from p(φ, σ_η², μ_h, s, x | γ, ν, C, τ, N, z_1, z_2, y):
   (a) Draw φ, σ_η² from p(φ, σ_η² | γ, ν, C, τ, N, z_1, z_2, s, y).
   (b) Draw μ_h, s, and x from p(μ_h, s, x | φ, σ_η², γ, ν, C, τ, N, z_1, z_2, y).
3. Generate γ from p(γ | ν, μ_h, φ, σ_η², x, C, τ, N, z_1, z_2, s, y).
4. Generate C, τ, N, z_1, z_2, ν from p(C, τ, N, z_1, z_2, ν | γ, μ_h, φ, σ_η², x, s, y):
   (a) Draw ν from p(ν | γ, μ_h, φ, σ_η², x, s, y).
   (b) Draw z_1, z_2 from p(z_1, z_2 | ν, γ, μ_h, φ, σ_η², x, s, y).
   (c) Draw N from p(N | z_1, z_2, ν, γ, μ_h, φ, σ_η², x, s, y).
   (d) Draw τ from p(τ | N, z_1, z_2, ν, γ, μ_h, φ, σ_η², x, s, y).
   (e) Draw C from p(C | τ, N, z_1, z_2, ν, γ, μ_h, φ, σ_η², x, s, y).
5. Go to 2.

The estimation of s is based on the spline specification s_t = w_t β in Equation (3), where the 1×K vector w_t can be treated as an exogenous vector and the K×1 vector β contains the unknown spline values, which are treated as regression coefficients and need to be estimated.
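The draw of N in step 4(c) uses the discrete full conditional of Equation (8). A minimal sketch for a single t, with hypothetical inputs and a truncation of the support at n_max, could look as follows.

```python
import numpy as np
from scipy.stats import poisson, binom

def sample_N_t(y_t, lam_t, z1, z2, gamma, rng, n_max=500):
    """Draw N_t from its (truncated) full conditional, cf. Equation (8)."""
    n = np.arange(n_max + 1)
    prior = poisson.pmf(n, lam_t * (z1 + z2))                    # p(N_t | lambda_t, z_t1, z_t2)
    # p(y_t | N_t, z): zero unless |y_t| <= N_t and y_t, N_t have the same parity
    valid = (np.abs(y_t) <= n) & ((n - np.abs(y_t)) % 2 == 0)
    lik = np.where(valid, binom.pmf((n + y_t) // 2, n, z1 / (z1 + z2)), 0.0)
    post = (gamma * (y_t == 0) + (1.0 - gamma) * lik) * prior
    post /= post.sum()
    return int(np.searchsorted(np.cumsum(post), rng.random()))  # inverse-CDF draw

rng = np.random.default_rng(5)
N_t = sample_N_t(y_t=-2, lam_t=0.8, z1=1.0, z2=1.2, gamma=0.1, rng=rng)
```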
2.5 Simulation Study

To validate our estimation procedure for the dynamic Skellam and ΔNB models, we independently perform the following experiment 50 times. We simulate 20,000 observations from the model to be estimated and carry out the MCMC sampling based on 20,000 draws after a burn-in of 20,000 draws. The true parameters are set as μ = −1.7, φ = 0.97, σ_η² = 0.02, γ = 0.1, and ν = 10, which are close to those estimated from real data in our empirical study of Section 3. A single experiment takes approximately 5 hours on a 2.90 GHz CPU. Table 1 presents the posterior means, standard deviations, and the 95% credible intervals, averaged over the 50 Monte Carlo replications. It also shows the mean inefficiency factors and their standard deviations across the experiments. Figure 1 illustrates the estimation results for a subsample of the initial 15 Monte Carlo replications, while Figure 2 depicts the posterior densities of the parameters from a single simulation. These results indicate that, in our stylized setting, the algorithm can estimate the parameters accurately, as the true values lie within the highest posterior density regions. The posterior distributions of the autoregressive coefficient φ and the state variance σ_η² appear to be the most challenging to estimate efficiently, as their inefficiency factors are high. Nevertheless, the accuracy of their estimates is satisfactory, with the true values on average lying within the 95% credible intervals.

Table 1. Posterior means, standard deviations (in parentheses), 95% central credible intervals (in brackets, 95% CI), and mean inefficiency factors (IF) with their standard deviations, averaged over 50 MC replications, for M = 20,000 posterior draws after a burn-in of 20,000 draws for T = 20,000 observations generated from the ΔNB model

Parameter   True       Mean       Std        95% CI                 Mean IF    Std IF
μ           −1.7000    −1.7180    (0.0457)   [−1.8083, −1.6287]     177.2659   (60.5684)
φ            0.9700     0.9701    (0.0041)   [0.9614, 0.9774]       354.3270   (101.5417)
σ_η²         0.0200     0.0200    (0.0032)   [0.0145, 0.0270]       552.3604   (153.5302)
γ            0.1000     0.0901    (0.0182)   [0.0526, 0.1242]       312.7835   (116.7197)
β_1          1.0652     1.0652    (0.0948)   [0.8790, 1.2511]        20.3728   (3.7064)
β_2         −0.8538    −0.8538    (0.0448)   [−0.9420, −0.7662]      22.2124   (4.5653)
ν           10.0000     9.7185    (2.3329)   [5.8333, 14.8933]      135.1976   (51.5453)
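The inefficiency factors in Table 1 (and in Tables 3 and 4 below) summarize the autocorrelation in the posterior draws. The paper does not spell out the exact estimator used, but a common choice is the integrated autocorrelation time, sketched here as an illustration.

```python
import numpy as np

def inefficiency_factor(draws, max_lag=1000):
    """IF = 1 + 2 * sum of sample autocorrelations of the chain (one common estimator)."""
    x = np.asarray(draws, dtype=float) - np.mean(draws)
    var0 = np.dot(x, x) / x.size
    acov = np.array([np.dot(x[:-k], x[k:]) for k in range(1, max_lag + 1)]) / x.size
    return 1.0 + 2.0 * np.sum(acov / var0)
```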
Figure 1. Bar plots of the posterior draws in a subsample of the last 15 experiments from our Monte Carlo study.

Figure 2. Posterior distributions of the parameters from a dynamic ΔNB model based on 20,000 observations and 20,000 iterations after a burn-in of 20,000. Each panel shows the histogram of the posterior draws, the kernel density estimate of the posterior distribution, the HPD region, and the posterior mean. The true parameters are μ = −1.7, φ = 0.97, σ_η² = 0.02, γ = 0.1, and ν = 10.

3 Empirical Study

In this section, we present and discuss the empirical findings from our analyses of tick by tick price changes for two stocks traded at the NYSE, for two different periods. In particular, we analyze the bid prices that correspond to transactions in order to account for bid-ask bounces. We consider two model classes and two models within each class. The first set consists of the ordered probit models with normal and Student's t stochastic volatility. The second set includes the dynamic Skellam and dynamic ΔNB models. The analyses include in-sample and out-of-sample marginal likelihood comparisons of the models. The aim of our empirical study is twofold. First, we investigate the usefulness of the ΔNB model on a challenging dataset; in particular, we validate our estimation procedure and reveal possible shortcomings in the estimation of the parameters of the ΔNB model. Second, we intend to find out what the differences are between models based on heavy-tailed distributions (the ordered t and ΔNB models) and those that are not (the ordered normal and dynamic Skellam models). We also compare the two model classes: ordered models versus integer distribution models.

3.1 Data

We have access to the Thomson Reuters Sirca dataset, which contains all trades and quotes with millisecond time stamps for all stocks listed on the NYSE. We have collected the data for International Business Machines (IBM) and Coca-Cola (KO). These stocks differ in liquidity and in price magnitude. In our study we concentrate on two weeks of price changes: the first week of October 2008 and the last week of April 2010. These weeks exhibit different market sentiments and volatility characteristics. October 2008 lies in the middle of the 2008 financial crisis, with record high volatilities; some markets experienced their worst week since 1929 in October 2008. April 2010 is a much calmer month with low volatilities. To avoid some of the issues related to microstructure noise in high-frequency price changes, including bid-ask bounces, we analyze the bid prices of transactions. The cleaning process consists of a number of filtering steps that are similar to the procedures described in Boudt, Cornelissen, and Payseur (2012), Barndorff-Nielsen et al. (2008), and Brownlees and Gallo (2006). First, we remove all quotes-only entries, which form a large portion of the data; by excluding the quotes we lose around 70−90% of the records.
In the next step, we delete trades with missing or zero prices or volumes. We also restrict our analysis to the trading period from 09:30 to 16:00. The fourth step is to aggregate trades that have the same time stamp: when there are multiple trades at the same millisecond, we take the trade with the last sequence number and regard its bid price as the bid price observed at that millisecond. Finally, we treat outliers by following the rules suggested by Barndorff-Nielsen et al. (2008); a stylized sketch of these filtering steps is given after Table 2. Table 2 presents the descriptive statistics for the resulting bid price data from the 3rd to 10th October 2008 and from the 23rd to 30th April 2010, respectively. A more detailed account of the cleaning process can be found in Tables E.1 and E.2 in Appendix E. We treat the periods from the 3rd to 9th October 2008 and from the 23rd to 29th April 2010 as the in-sample periods; the two out-of-sample periods are 10th October 2008 and 30th April 2010. Figure 3 presents the empirical distributions of the tick by tick log returns as well as the tick returns and the fitted Skellam probability mass function (pmf). For the two stocks considered, IBM and KO, there is a nontrivial number of tick returns larger than 10 in absolute value. Moreover, we find that the Skellam distribution has too light tails to capture the fat tails of the bid price data.

Table 2. Descriptive statistics of the bid prices for IBM and KO from 3rd to 10th October 2008 (top) and from 23rd to 30th April 2010 (bottom)

October 2008        IBM                      KO
                    In         Out           In         Out
Num. obs            68 002     20 800        70 356     25 036
Avg. price          96.7955    87.5832       49.2031    41.8750
Mean                −0.0176    −0.0013       −0.0103    0.0046
Std                 5.7768     6.3142        1.8334     2.6755
Min                 −181       −89           −44        −47
Max                 213        169           51         65
% 0                 50.1735    48.2981       53.4937    48.2385
% ±1                8.7233     8.0673        22.0081    17.9022
% ±2–10             34.4931    34.6346       24.2481    33.0564

April 2010          IBM                      KO
                    In         Out           In         Out
Num. obs            43 606     8587          34 469     6073
Avg. price          130.1758   129.5754      53.6275    53.7317
Mean                0.0014     −0.0181       −0.0029    −0.0061
Std                 1.2883     1.3367        0.5971     0.6691
Min                 −21        −18           −9         −4
Max                 36         10            8          5
% 0                 61.6956    60.2888       75.2734    69.3891
% ±1                23.5472    22.6505       22.2316    26.8730
% ±2–10             14.6883    17.0374       2.4950     3.7379

Notes: The column "In" displays statistics for the in-sample period (3rd to 9th October 2008 and 23rd to 29th April 2010, respectively), while the column "Out" displays the descriptives for the out-of-sample period (10th October 2008 and 30th April 2010). We show the number of observations (Num. obs), average price (Avg. price), mean price change (Mean), standard deviation of price changes (Std), minimum and maximum integer price changes (Min, Max), as well as the percentage of zero price changes (% 0), the percentage of −1 and +1 price changes (% ±1), and the percentage of price changes between 2 and 10 in absolute value (% ±2–10) in the sample.
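A stylized version of the filtering steps, assuming a flat file with hypothetical column names ('type', 'timestamp', 'seq', 'price', 'volume', 'bid') rather than the actual Sirca layout, and omitting the outlier rule of Barndorff-Nielsen et al. (2008), could look as follows.

```python
import pandas as pd

raw = pd.read_csv("ibm_week.csv", parse_dates=["timestamp"])  # hypothetical input file

trades = raw[raw["type"] == "Trade"]                          # drop quote-only records
trades = trades.dropna(subset=["price", "volume", "bid"])
trades = trades[(trades["price"] > 0) & (trades["volume"] > 0)]

# restrict to the continuous trading session
trades = trades.set_index("timestamp").between_time("09:30", "16:00")

# one record per millisecond: keep the trade with the last sequence number
trades = trades.sort_values("seq").groupby(level=0).last()

# integer bid price changes in ticks (tick size of one cent)
y = (trades["bid"].diff().dropna() * 100).round().astype(int)
```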
Figure 3. Empirical distributions of bid prices for IBM and KO stocks during the October 2008 period.

3.2 Estimation Results

We start our analyses with the dynamic Skellam and ΔNB models for the two stocks in the periods from 3rd to 9th October 2008 and from 23rd to 29th April 2010. We adopt the same prior specifications as in the simulation study, given in (9)–(10). In the MCMC procedure, we draw 40,000 samples from the Markov chain and discard the first 20,000 draws as burn-in. The results of the parameter estimation for the 2008 period are reported in Table 3(a and b), and for the 2010 period in Table 4(a and b).

Table 3. Posterior means, standard deviations (Std, in parentheses), 95% central credible intervals (95%, in brackets), and inefficiency factors (IF) for M = 20,000 posterior draws after a burn-in of 20,000 draws
(a) 2008 IBM data

Param.   Stat.   OrdN                 Ordt                 Sk                   ΔNB
μ        Mean    3.4314               3.3785               0.5416               1.2071
         Std     (0.0338)             (0.0384)             (0.0096)             (0.0094)
         95%     [3.3651, 3.4976]     [3.3023, 3.4539]     [0.5229, 0.5604]     [1.1886, 1.2256]
         IF      29.2717              29.2612              385.2575             556.3769
φ        Mean    0.9532               0.9695               0.5101               0.6462
         Std     (0.0024)             (0.0032)             (0.0092)             (0.0083)
         95%     [0.9480, 0.9578]     [0.9614, 0.9748]     [0.4943, 0.5294]     [0.6337, 0.6637]
         IF      123.1848             1280.1052            1611.7549            2120.0977
σ_η²     Mean    0.1259               0.0653               0.5437               0.2280
         Std     (0.0073)             (0.0092)             (0.0082)             (0.0041)
         95%     [0.1118, 0.1403]     [0.0505, 0.0874]     [0.5278, 0.5622]     [0.2191, 0.2348]
         IF      216.0429             1520.1055            2003.2192            2653.8330
γ        Mean    0.4500               0.4523               0.3238               0.4190
         Std     (0.0022)             (0.0022)             (0.0025)             (0.0023)
         95%     [0.4457, 0.4544]     [0.4478, 0.4565]     [0.3194, 0.3285]     [0.4144, 0.4236]
         IF      17.3129              63.3734              499.1635             72.3874
β_1      Mean    0.5085               0.5148               0.2055               0.2078
         Std     (0.0635)             (0.0691)             (0.0152)             (0.0158)
         95%     [0.3844, 0.6330]     [0.3786, 0.6505]     [0.1755, 0.2349]     [0.1766, 0.2389]
         IF      5.6089               5.3234               31.4529              150.0026
β_2      Mean    −0.1528              −0.1628              −0.0629              −0.0785
         Std     (0.0546)             (0.0597)             (0.0136)             (0.0133)
         95%     [−0.2608, −0.0466]   [−0.2804, −0.0440]   [−0.0898, −0.0366]   [−0.1043, −0.0519]
         IF      5.9121               5.1946               48.2788              98.3510
ν        Mean    –                    14.0288              –                    2.2000
         Std     –                    (2.5125)             –                    (0.0000)
         95%     –                    [10.5000, 20.6000]   –                    [2.2000, 2.2000]
         IF      –                    1218.1696            –                    –

(b) 2008 KO data

Param.   Stat.   OrdN                 Ordt                 Sk                   ΔNB
μ        Mean    1.1263               1.1052               −0.4523              0.3588
         Std     (0.0329)             (0.0364)             (0.0158)             (0.0332)
         95%     [1.0623, 1.1911]     [1.0343, 1.1758]     [−0.4836, −0.4216]   [0.2943, 0.4246]
         IF      15.4805              25.6141              48.1751              245.6923
φ        Mean    0.9709               0.9763               0.9477               0.9781
         Std     (0.0019)             (0.0025)             (0.0029)             (0.0015)
         95%     [0.9672, 0.9747]     [0.9712, 0.9810]     [0.9417, 0.9537]     [0.9752, 0.9807]
         IF      155.0004             762.9785             441.9543             384.1333
σ_η²     Mean    0.0474               0.0360               0.0329               0.0252
         Std     (0.0036)             (0.0047)             (0.0020)             (0.0017)
         95%     [0.0404, 0.0544]     [0.0268, 0.0455]     [0.0291, 0.0369]     [0.0221, 0.0284]
         IF      210.4694             904.4350             775.7113             765.3316
γ        Mean    0.3725               0.3730               0.1896               0.3496
         Std     (0.0030)             (0.0030)             (0.0032)             (0.0032)
         95%     [0.3665, 0.3785]     [0.3671, 0.3787]     [0.1835, 0.1962]     [0.3433, 0.3556]
         IF      54.4008              90.2672              239.3389             556.3973
β_1      Mean    0.7383               0.7524               0.3754               0.6454
         Std     (0.0558)             (0.0598)             (0.0260)             (0.0544)
         95%     [0.6300, 0.8482]     [0.6359, 0.8719]     [0.3245, 0.4263]     [0.5411, 0.7537]
         IF      4.7332               5.2416               7.0623               30.0869
β_2      Mean    −0.3053              −0.3092              −0.1351              −0.2406
         Std     (0.0557)             (0.0589)             (0.0264)             (0.0534)
         95%     [−0.4148, −0.1978]   [−0.4241, −0.1939]   [−0.1866, −0.0832]   [−0.3464, −0.1372]
         IF      4.2964               4.3154               8.2317               13.7685
ν        Mean    –                    49.9920              –                    17.5968
         Std     –                    (21.6531)            –                    (2.2173)
         95%     –                    [25.0000, 111.8500]  –                    [14.2000, 23.0000]
         IF      –                    561.1726             –                    839.0705

Table 4. Posterior means, standard deviations (Std, in parentheses), 95% central credible intervals (95%, in brackets), and inefficiency factors (IF) for M = 20,000 posterior draws after a burn-in of 20,000 draws
(a) 2010 IBM data

Param.   Stat.   OrdN                 Ordt                 Sk                   ΔNB
μ        Mean    0.5588               0.3051               −0.7590              −0.3193
         Std     (0.0429)             (0.0801)             (0.0298)             (0.0521)
         95%     [0.4750, 0.6430]     [0.1500, 0.4654]     [−0.8169, −0.6999]   [−0.4184, −0.2130]
         IF      60.3543              14.1064              74.1764              337.6589
φ        Mean    0.9791               0.9959               0.9862               0.9923
         Std     (0.0030)             (0.0009)             (0.0021)             (0.0014)
         95%     [0.9734, 0.9842]     [0.9942, 0.9975]     [0.9814, 0.9902]     [0.9897, 0.9950]
         IF      556.1437             175.6690             683.1293             1071.2396
σ_η²     Mean    0.0234               0.0035               0.0050               0.0042
         Std     (0.0039)             (0.0006)             (0.0008)             (0.0008)
         95%     [0.0168, 0.0320]     [0.0024, 0.0048]     [0.0035, 0.0068]     [0.0027, 0.0058]
         IF      650.2627             278.8649             774.4566             1359.5545
γ        Mean    0.4473               0.4317               0.2776               0.3879
         Std     (0.0040)             (0.0042)             (0.0055)             (0.0068)
         95%     [0.4395, 0.4552]     [0.4236, 0.4397]     [0.2677, 0.2882]     [0.3754, 0.4021]
         IF      146.9604             52.4582              573.3739             915.2304
β_1      Mean    0.4733               0.6502               0.2909               0.4351
         Std     (0.0711)             (0.1283)             (0.0494)             (0.0814)
         95%     [0.3353, 0.6137]     [0.4141, 0.9189]     [0.1964, 0.3897]     [0.2829, 0.6037]
         IF      5.6478               25.0358              12.4851              134.3036
β_2      Mean    0.3954               0.3880               0.2743               0.3286
         Std     (0.0729)             (0.1358)             (0.0512)             (0.0831)
         95%     [0.2512, 0.5374]     [0.1179, 0.6506]     [0.1733, 0.3753]     [0.1618, 0.4893]
         IF      4.2390               3.7980               7.1545               9.8248
ν        Mean    –                    7.4336               –                    3.8353
         Std     –                    (0.4564)             –                    (0.4580)
         95%     –                    [6.6000, 8.4000]     –                    [3.2000, 4.8000]
         IF      –                    219.8048             –                    947.4361

(b) 2010 KO data

Param.   Stat.   OrdN                 Ordt                 Sk                   ΔNB
μ        Mean    −1.1537              −1.2438              −2.1331              −2.0890
         Std     (0.0479)             (0.0616)             (0.0564)             (0.0585)
         95%     [−1.2451, −1.0579]   [−1.3648, −1.1230]   [−2.2425, −2.0217]   [−2.2025, −1.9734]
         IF      42.2106              80.5260              20.0063              88.9444
φ        Mean    0.9798               0.9887               0.9926               0.9903
         Std     (0.0031)             (0.0019)             (0.0014)             (0.0017)
         95%     [0.9731, 0.9852]     [0.9848, 0.9922]     [0.9897, 0.9952]     [0.9866, 0.9934]
         IF      185.6574             222.6474             284.1253             595.5608
σ_η²     Mean    0.0187               0.0082               0.0041               0.0065
         Std     (0.0034)             (0.0015)             (0.0007)             (0.0011)
         95%     [0.0129, 0.0270]     [0.0057, 0.0115]     [0.0028, 0.0057]     [0.0046, 0.0089]
         IF      236.9756             293.7111             392.8999             914.2363
γ        Mean    0.3608               0.3449               0.0025               0.0224
         Std     (0.0095)             (0.0116)             (0.0019)             (0.0185)
         95%     [0.3417, 0.3789]     [0.3220, 0.3679]     [0.0003, 0.0075]     [0.0003, 0.0678]
         IF      136.1222             157.5518             2058.0548            496.8247
β_1      Mean    0.7304               0.7645               0.6191               0.6216
         Std     (0.0707)             (0.0832)             (0.0917)             (0.0833)
         95%     [0.5916, 0.8713]     [0.6064, 0.9319]     [0.4508, 0.8125]     [0.4661, 0.7938]
         IF      9.5713               12.5510              50.9314              50.0410
β_2      Mean    0.3251               0.2992               0.4172               0.4261
         Std     (0.0780)             (0.0904)             (0.0973)             (0.0939)
         95%     [0.1716, 0.4766]     [0.1230, 0.4773]     [0.2228, 0.6084]     [0.2404, 0.6105]
         IF      4.5545               4.2750               8.2049               9.4661
ν        Mean    –                    17.4954              –                    8.4201
         Std     –                    (3.6575)             –                    (1.8791)
         95%     –                    [12.3000, 26.6000]   –                    [5.6000, 12.8000]
         IF      –                    389.5012             –                    246.6756

Table E.1. Summary of the cleaning and aggregation procedure on the data from 3rd to 10th October 2008 for IBM and KO
Table E.2. Summary of the cleaning and aggregation procedure on the data from 23rd to 30th April 2010 for IBM and KO

                                     IBM                      KO
                                     No.        % dropped     No.        % dropped
Raw quotes and trades                803,648                  692,657
Trades                               53,346     93.36         41,184     94.05
Nonmissing price and volume          53,332     0.03          41,173     0.03
Trades between 9:30 and 16:00        53,324     0.02          41,164     0.02
Aggregated trades                    52,406     1.72          40,573     1.44
Without outliers                     52,199     0.39          40,548     0.06
Without opening trades               52,193     0.01          40,542     0.01

The unconditional mean volatility differs across stocks and time periods. The unconditional mean of the latent state is higher for stocks with a higher price, and it is higher in the more volatile 2008 period. These results are consistent with intuition, but we should not draw strong conclusions from them; for example, we cannot compare the estimated means across models because they have somewhat different interpretations in the different model specifications. The estimated AR(1) coefficients for the different series range from 0.94 to 0.99, except for the Skellam and ΔNB models applied to the IBM data in the 2008 period, for which the posterior means are 0.51 and 0.65, respectively. This finding suggests generally persistent dynamic volatility behavior within a trading day, even after accounting for the intradaily seasonal pattern in volatility. Comparing the two periods, we find that the transient volatility is less persistent in the more volatile crisis period. We only include the zero-inflation specification for the ΔNB and dynamic Skellam distributions when additional flexibility appears to be needed in the observation density; this flexibility is required for stocks with higher prices and during the more volatile periods.
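The IF rows in the tables above report inefficiency factors for the posterior draws of each parameter. As a point of reference, the following is a minimal sketch of how such an inefficiency factor can be computed from a chain of draws; the fixed-lag truncation of the autocorrelation sum is an illustrative assumption, not necessarily the rule used for the reported numbers.

```python
import numpy as np

def inefficiency_factor(draws, max_lag=100):
    """Estimate the MCMC inefficiency factor 1 + 2 * sum_k rho(k)
    from a one-dimensional array of posterior draws."""
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    n = len(x)
    acf = np.correlate(x, x, mode="full")[n - 1:] / (x.var() * n)
    # Truncate the autocorrelation sum at max_lag (a simple, assumed rule).
    return 1.0 + 2.0 * np.sum(acf[1:max_lag + 1])

# Example: a persistent AR(1) chain yields a large inefficiency factor.
rng = np.random.default_rng(0)
chain = np.zeros(20_000)
for t in range(1, len(chain)):
    chain[t] = 0.95 * chain[t - 1] + rng.standard_normal()
print(inefficiency_factor(chain))
```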
The estimates of the zero-inflation parameter γ range from 0.1 to 0.3. The degrees of freedom parameter ν of the ΔNB distribution is estimated at a higher value during the quieter 2010 period, which suggests that the distribution of tick by tick price changes is closer to a thin-tailed distribution in such periods. In addition, we find that the estimated degrees of freedom parameter is lower for stocks with a higher average price. From a more technical perspective, our study reveals that the parameters of our ΔNB modeling framework mix relatively slowly, which may indicate that our procedure can be rather inefficient. However, the troublesome parameters are in all cases the persistence parameter of the volatility process, ϕ, and the volatility of volatility, ση. It is well documented that these coefficients are not easy to estimate efficiently because they do not have a direct impact on the observations; see Kim, Shephard, and Chib (1998) and Stroud and Johannes (2014). Furthermore, our empirical study poses some challenging numerical problems. In the 2008 period we analyze almost 70,000 observations jointly, while the time series in the 2010 period is shorter but still contains roughly 40,000 observations. Such long time series typically lead to slow mixing in Bayesian MCMC estimation procedures because the full conditional distributions are highly informative. Bayesian asymptotic results guarantee that long series are more informative about the parameters, so these parameters can be estimated accurately; indeed, our Monte Carlo study in the previous section has shown that our algorithm is successful in recovering the true parameters. However, with long series it can be hard to construct efficient proposal distributions; in other words, it can be hard to propose "plausible" parameters in the random walk Metropolis-Hastings algorithm. We therefore observe low acceptance rates and thus high inefficiency factors. We also anticipated that parameter estimation for the dynamic Skellam and ΔNB models is numerically less efficient and overall more challenging than estimation for the ordered normal and ordered t models: estimation for the discrete distribution models requires more auxiliary variables and relies on additional conditioning steps. On the basis of the output of our MCMC estimation procedure, we obtain estimates of the latent volatility variable ht, which we can also decompose into its components μh, st, and xt; see Equation (2). Figure 4 presents the intraday, tick by tick, Coca Cola bid price changes and the estimated components xt and st of the log volatility ht in the ΔNB model, from 23rd to 29th April 2010. The intraday seasonality matches the typical features of tick by tick data and reflects the market mechanism; see also the discussion in Andersen (2000). Volatility is highest at the beginning of the trading day, which is the result of the overnight effect and of the different trading mechanism at the pre-open call auction during the first half hour of trading (from 9:00 to 9:30). The burst of information accumulated during the overnight period leads to much higher volatility at the opening of the market. This effect is captured by the estimated initial value of the spline function, β1. The overnight effect receives strong support from the data given that the posterior means of β1 are far from zero.
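To make the decomposition ht = μh + st + xt discussed above concrete, the following minimal sketch simulates a log-volatility path as the sum of a constant level, a deterministic intraday seasonal pattern, and an AR(1) component; the seasonal shape and all parameter values are illustrative assumptions, not estimates from the paper.

```python
import numpy as np

# Minimal simulation of the log-volatility decomposition h_t = mu_h + s_t + x_t:
# a constant level, a deterministic intraday seasonal, and an AR(1) component.
# The seasonal shape and the parameter values below are illustrative assumptions.
rng = np.random.default_rng(1)
T = 5_000                        # ticks within one trading day
mu_h = -1.0                      # unconditional level of log volatility
phi, sigma_eta = 0.99, 0.07      # AR(1) persistence and volatility of volatility

u = np.linspace(0.0, 1.0, T)             # intraday time on [0, 1]
s = 0.8 * np.exp(-6.0 * u) + 0.3 * u**4  # high at the open, rising into the close
s -= s.mean()                            # center the seasonal so mu_h is identified

x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + sigma_eta * rng.standard_normal()

h = mu_h + s + x                         # log variance of the price-change distribution
lam = np.exp(h)                          # e.g. the intensity lambda_t in the Delta-NB model
```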
We further find that regular trading takes place continuously throughout the day, becoming more intense shortly before the close of the market. The smoothness of the estimated intraday seasonal pattern is enforced through the spline specification. Apart from the pronounced intraday seasonality, we observe many volatility changes during a trading day. Some of these volatility changes may have been sparked by news announcements, while others may have occurred as a result of the trading process itself. Finally, the parameter values underlying the signal extraction of ht = μh + st + xt are estimated jointly for five consecutive days, which implies that the overnight effect is the same for each day in our analysis. In Appendix F, we compare our estimates of ht = μh + st + xt with those based on parameter estimates obtained for each day separately. Although some differences are clearly visible, overall the extracted signals of ht are very similar.

Figure 4. Decomposition of log volatility in the dynamic ΔNB model for KO from 23rd to 29th April 2010.

3.3 In-Sample Comparison

As is well known from the Bayesian literature, the exact computation of sequential Bayes factors (BF) is infeasible in this framework because it requires sequential parameter estimation, which is computationally prohibitive given the very large time dimension of our data. To provide a comparative assessment of the four models considered in our study, we follow Stroud and Johannes (2014) and calculate the Bayesian Information Criterion (BIC) for model M as
\[
\mathrm{BIC}_T(M) = -2\sum_{t=1}^{T}\log p(y_t\,|\,\hat{\theta},M) + d_M \log T,
\]
where $p(y_t\,|\,\theta,M)$ can be calculated by means of a particle filter, $\hat{\theta}$ is the posterior mean of the parameters, and $d_M$ is the number of parameters of model M. The implementation of the particle filter for all considered models is rather straightforward given the model details provided in Section 1. The BIC gives an asymptotic approximation to the BF via
\[
\mathrm{BIC}_T(M_i) - \mathrm{BIC}_T(M_j) \approx -2\log \mathrm{BF}_{i,j}.
\]
We use this approximation for our sequential model comparisons; a schematic sketch of the computation is given below. Figure 5a and b present the BICs for the periods from the 3rd to 9th October 2008 and from the 23rd to 29th April 2010, respectively. For the 2008 period, the IBM stock does not appear to favor the integer-based models and its behavior is captured best by the ordered t model. The opposite holds for KO: both the Skellam and the ΔNB model outperform the ordered models convincingly. In the 2010 period, the IBM stock slightly favors the ΔNB model over the ordered t model; in this case, the Skellam model does not seem able to capture the features of the data correctly. For KO in the same period, the ordered t model provides the best fit to the data, with both the Skellam and the ΔNB model performing less well. Furthermore, we may conclude from the BIC results that the ordered t and ΔNB models tend to be favored when large jumps in volatility have occurred. Such large price changes may lead to a prolonged period of high volatility, which suggests the need for the ΔNB model. These findings are consistent with the intuition that, for time-varying volatility models, the identification of the parameters determining the tail behavior requires extreme observations in combination with low volatility.
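The sequential BIC comparison only needs the per-observation log-likelihood contributions produced by the particle filter at the posterior mean. A minimal sketch of how the cumulative criterion can then be assembled is given below; particle_loglik_contributions in the usage comment is a hypothetical placeholder for a model-specific particle filter, and the parameter counts are illustrative.

```python
import numpy as np

def sequential_bic(loglik_contrib, n_params):
    """Cumulative BIC_t(M) = -2 * sum_{s<=t} log p(y_s | theta_hat, M) + n_params * log t,
    given per-observation log-likelihood contributions evaluated at the posterior mean."""
    loglik_contrib = np.asarray(loglik_contrib, dtype=float)
    t = np.arange(1, len(loglik_contrib) + 1)
    return -2.0 * np.cumsum(loglik_contrib) + n_params * np.log(t)

# Hypothetical usage: contributions log p(y_t | theta_hat, M) would come from a
# particle filter run at the posterior mean of each model's parameters.
# bic_dnb  = sequential_bic(particle_loglik_contributions(y, theta_hat_dnb),  n_params=7)
# bic_ordn = sequential_bic(particle_loglik_contributions(y, theta_hat_ordn), n_params=6)
# approx_minus_2_log_bf = bic_dnb - bic_ordn   # BIC_t(M_i) - BIC_t(M_j) ~ -2 log BF_{i,j}
```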
Figure 5. In-sample analysis: sequential BF approximations based on BIC, relative to the ordered normal model, for IBM (left) and KO (right) in two periods. (a) From 3rd to 9th October 2008; (b) from 23rd to 29th April 2010.

3.4 Out-of-Sample Comparisons

The performances of the dynamic Skellam and ΔNB models can also be compared in terms of predictive likelihoods. The one-step-ahead predictive likelihood for model M is
\[
p(y_{t+1}\,|\,y_{1:t},M)=\iint p(y_{t+1}\,|\,y_{1:t},x_{t+1},\theta,M)\,p(x_{t+1},\theta\,|\,y_{1:t},M)\,\mathrm{d}x_{t+1}\,\mathrm{d}\theta
=\iint p(y_{t+1}\,|\,y_{1:t},x_{t+1},\theta,M)\,p(x_{t+1}\,|\,\theta,y_{1:t},M)\,p(\theta\,|\,y_{1:t},M)\,\mathrm{d}x_{t+1}\,\mathrm{d}\theta.
\]
More generally, the h-step-ahead predictive likelihood can be decomposed into a product of one-step-ahead predictive likelihoods,
\[
p(y_{t+1:t+h}\,|\,y_{1:t},M)=\prod_{i=1}^{h}p(y_{t+i}\,|\,y_{1:t+i-1},M)
=\prod_{i=1}^{h}\iint p(y_{t+i}\,|\,y_{1:t+i-1},x_{t+i},\theta,M)\,p(x_{t+i}\,|\,\theta,y_{1:t+i-1},M)\,p(\theta\,|\,y_{1:t+i-1},M)\,\mathrm{d}x_{t+i}\,\mathrm{d}\theta.
\]
These expressions show that we require $p(\theta\,|\,y_{1:t+i-1},M)$ for $i=1,2,\dots$, that is, the posterior of the parameters based on sequentially increasing data samples. This requires the MCMC procedure to be repeated as many times as there are out-of-sample observations. In our application it implies, for each stock and each model, several thousand MCMC replications for a predictive analysis of a single out-of-sample day, which is computationally impractical, if not infeasible. However, we can rely on the approximation
\[
p(y_{t+1:t+h}\,|\,y_{1:t},M)\approx\prod_{i=1}^{h}\iint p(y_{t+i}\,|\,y_{1:t+i-1},x_{t+i},\theta,M)\,p(x_{t+i}\,|\,\theta,y_{1:t+i-1},M)\,p(\theta\,|\,y_{1:t},M)\,\mathrm{d}x_{t+i}\,\mathrm{d}\theta.
\]
This approximation is based on the notion that, after observing a considerable amount of data, that is, for t sufficiently large, the posterior distribution of the static parameters does not change much, so that $p(\theta\,|\,y_{1:t+i-1},M)\approx p(\theta\,|\,y_{1:t},M)$. Based on this approximation, we carry out the following exercise. From our MCMC output, we obtain a sample from the posterior distribution based on the in-sample observations. For each parameter draw from the posterior distribution, we estimate the likelihood for the out-of-sample period using the particle filter; a schematic sketch of this averaging step is given below.

Figure 6a and b present the out-of-sample sequential predictive BF approximations for the 10th October 2008 and the 30th April 2010, respectively. As in the in-sample 2008 period, the ordered t model performs best for the IBM stock on the 10th October 2008, while both integer distribution models perform well for the KO stock. On the 30th April 2010, the ΔNB model performs best for IBM and the Skellam model performs worst, which suggests that the IBM stock requires a heavy-tailed distribution, as in the ΔNB and ordered t models. For KO in the same period, both the dynamic Skellam and ΔNB models beat the ordered models, with the Skellam model slightly outperforming the ΔNB model during most of the trading day.

Figure 6. Out-of-sample analysis: sequential predictive BF approximations, relative to the ordered normal model, for IBM (left) and KO (right) for the two periods. (a) On 10th October 2008; (b) on 30th April 2010.
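The out-of-sample comparison therefore reduces to averaging particle-filter likelihood estimates over the in-sample posterior draws. A minimal sketch of that averaging step is given below; particle_filter_loglik is a hypothetical placeholder for a model-specific particle filter returning an estimate of log p(y_out | y_in, theta, M).

```python
import numpy as np

def approx_log_predictive_likelihood(y_out, posterior_draws, particle_filter_loglik):
    """Approximate log p(y_out | y_in, M) by averaging, over posterior draws of theta
    obtained from the in-sample MCMC run, the particle-filter estimates of the
    out-of-sample likelihood p(y_out | y_in, theta, M)."""
    logliks = np.array([particle_filter_loglik(y_out, theta) for theta in posterior_draws])
    # Average the likelihoods (not the log-likelihoods) in a numerically stable way:
    # log{ (1/S) * sum_s exp(loglik_s) } via the log-sum-exp trick.
    m = logliks.max()
    return m + np.log(np.mean(np.exp(logliks - m)))
```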
4 Conclusions

We have reviewed and introduced dynamic models for high-frequency integer price changes. In particular, we have introduced the dynamic negative binomial difference model, referred to as the ΔNB model. We have developed an MCMC procedure, based on Gibbs sampling, for the Bayesian estimation of the parameters in the dynamic Skellam and ΔNB models. Furthermore, we have demonstrated our estimation procedures for simulated data and for real data consisting of tick by tick transaction bid prices of NYSE stocks. We have compared the in-sample and out-of-sample performances of two classes of models: the ordered probit models and the models based on integer distributions.

Our modeling framework opens several directions for future research. For instance, the ΔNB model has been defined by a time-varying specification for the λ parameter of the ΔNB distribution, while the second parameter ν is kept constant over time. It can be of interest to investigate the impact of reversing this specification by considering a dynamic model for ν. It would also be of interest to allow for a ΔNB distribution with a nonzero mean. This would allow us to base our analysis on the noncentered parametrization of our state space model and hence to adopt the ancillarity-sufficiency interweaving strategy (ASIS) of Kastner and Frühwirth-Schnatter (2014) to improve the mixing of the proposed sampler. This route toward improving the efficiency of the sampler is also left for future research.

Footnotes

* I.B. thanks the Dutch National Science Foundation (NWO) for financial support. S.J.K. acknowledges support from CREATES, Center for Research in Econometric Analysis of Time Series (DNRF78), funded by the Danish National Research Foundation. We are indebted to Lennart Hoogerheide, Rutger Lit, André Lucas, Mike Pitt and Lukasz Romaszko for their help and support in this research project, and to Rudolf Frühwirth for providing the C code for auxiliary mixture sampling. We further thank the Editor, Associate Editor, and two Referees for their constructive comments.

APPENDIX A: NEGATIVE BINOMIAL DISTRIBUTION

The probability mass function (pmf) of the NB distribution is given by
\[
f(k;\nu,p)=\frac{\Gamma(\nu+k)}{\Gamma(\nu)\Gamma(k+1)}\,p^{k}(1-p)^{\nu}.
\]
A different parametrization is obtained by denoting its mean by $\lambda=\nu p/(1-p)$, which implies $p=\lambda/(\lambda+\nu)$. We refer to this parametrization as NB$(\lambda,\nu)$. The pmf then takes the form
\[
f(k;\lambda,\nu)=\frac{\Gamma(\nu+k)}{\Gamma(\nu)\Gamma(k+1)}\left(\frac{\lambda}{\nu+\lambda}\right)^{k}\left(\frac{\nu}{\nu+\lambda}\right)^{\nu},
\]
and the variance equals $\lambda(1+\lambda/\nu)$. The dispersion index, or variance-to-mean ratio, is therefore $(1+\lambda/\nu)>1$, which shows that the NB distribution is overdispersed: compared to the Poisson distribution, there are more intervals with low counts and more intervals with high counts. The Poisson distribution is nested in the NB distribution as the limiting case $\nu\to\infty$.

Figure A.1. (a) The Skellam distribution with different parameters; (b) the ΔNB distribution with different parameters.

Alternatively, the NB distribution can be written as a Poisson-Gamma mixture. Let Y follow a Poisson distribution with mean $\lambda U$, where the heterogeneity variable U has unit mean and is Gamma-distributed, $U\sim\mathrm{Ga}(\nu,\nu)$, with the density of $\mathrm{Ga}(\alpha,\beta)$ given by $f(x;\alpha,\beta)=\beta^{\alpha}x^{\alpha-1}e^{-\beta x}/\Gamma(\alpha)$. Then
\[
f(k;\lambda,\nu)=\int_{0}^{\infty}f_{\mathrm{Poisson}}(k;\lambda u)\,f_{\mathrm{Gamma}}(u;\nu,\nu)\,\mathrm{d}u
=\int_{0}^{\infty}\frac{(\lambda u)^{k}e^{-\lambda u}}{k!}\,\frac{\nu^{\nu}u^{\nu-1}e^{-\nu u}}{\Gamma(\nu)}\,\mathrm{d}u
=\frac{\lambda^{k}\nu^{\nu}}{k!\,\Gamma(\nu)}\int_{0}^{\infty}e^{-(\lambda+\nu)u}u^{k+\nu-1}\,\mathrm{d}u.
\]
Substituting $(\lambda+\nu)u=s$, we get
\[
=\frac{\lambda^{k}\nu^{\nu}}{k!\,\Gamma(\nu)}\int_{0}^{\infty}\frac{e^{-s}s^{k+\nu-1}}{(\lambda+\nu)^{k+\nu-1}}\,\frac{1}{\lambda+\nu}\,\mathrm{d}s
=\frac{\lambda^{k}\nu^{\nu}}{k!\,\Gamma(\nu)}\,\frac{\Gamma(k+\nu)}{(\lambda+\nu)^{k+\nu}}
=\frac{\Gamma(\nu+k)}{\Gamma(\nu)\Gamma(k+1)}\left(\frac{\lambda}{\nu+\lambda}\right)^{k}\left(\frac{\nu}{\nu+\lambda}\right)^{\nu},
\]
which shows that $Y\sim\mathrm{NB}(\lambda,\nu)$.
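As a quick numerical check of this Poisson-Gamma mixture representation, the sketch below simulates NB(λ, ν) draws through the mixture and compares the sample mean and variance with λ and λ(1 + λ/ν); the parameter values are illustrative.

```python
import numpy as np

# Simulate Y ~ NB(lam, nu) via the Poisson-Gamma mixture:
# U ~ Gamma(nu, nu) with unit mean, then Y | U ~ Poisson(lam * U).
rng = np.random.default_rng(2)
lam, nu, n = 3.0, 4.0, 200_000            # illustrative parameter values
u = rng.gamma(shape=nu, scale=1.0 / nu, size=n)
y = rng.poisson(lam * u)

print(y.mean(), lam)                       # both close to 3.0
print(y.var(), lam * (1.0 + lam / nu))     # both close to 3 * (1 + 3/4) = 5.25

# The difference of two independent NB(lam, nu) counts gives a Delta-NB-type
# integer "price change": a zero-mean, heavy-tailed discrete variable.
dy = y[:n // 2] - y[n // 2:]
```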
APPENDIX B: DAILY VOLATILITY PATTERNS

We want to approximate a function $f:\mathbb{R}\to\mathbb{R}$ with a continuous function that is built up from piecewise polynomials of degree at most three. Let $\Delta=\{k_0,\dots,k_K\}$ denote the set of knots $k_j$, $j=0,\dots,K$; $\Delta$ is sometimes called a mesh on $[k_0,k_K]$. Let $y=\{y_0,\dots,y_K\}$ with $y_j=f(k_j)$. We denote a cubic spline on $\Delta$ interpolating to $y$ by $S_\Delta(x)$. The spline $S_\Delta(x)$ has to satisfy the following conditions:

1. $S_\Delta(x)\in C^2[k_0,k_K]$;
2. $S_\Delta(x)$ coincides with a polynomial of degree at most three on each interval $[k_{j-1},k_j]$, $j=1,\dots,K$;
3. $S_\Delta(k_j)=y_j$ for $j=0,\dots,K$.

By condition 2, $S''_\Delta(x)$ is a linear function on $[k_{j-1},k_j]$, so that we can write
\[
S''_\Delta(x)=\frac{k_j-x}{h_j}M_{j-1}+\frac{x-k_{j-1}}{h_j}M_j,\qquad x\in[k_{j-1},k_j],
\]
where $M_j=S''_\Delta(k_j)$ and $h_j=k_j-k_{j-1}$. Integrating $S''_\Delta(x)$ twice and solving for the two integration constants (using $S_\Delta(k_j)=y_j$), Poirier (1973) shows that
\[
S'_\Delta(x)=\left[\frac{h_j}{6}-\frac{(k_j-x)^2}{2h_j}\right]M_{j-1}+\left[\frac{(x-k_{j-1})^2}{2h_j}-\frac{h_j}{6}\right]M_j+\frac{y_j-y_{j-1}}{h_j},\qquad x\in[k_{j-1},k_j],
\]
and
\[
S_\Delta(x)=\frac{k_j-x}{6h_j}\big[(k_j-x)^2-h_j^2\big]M_{j-1}+\frac{x-k_{j-1}}{6h_j}\big[(x-k_{j-1})^2-h_j^2\big]M_j+\frac{k_j-x}{h_j}y_{j-1}+\frac{x-k_{j-1}}{h_j}y_j,\qquad x\in[k_{j-1},k_j]. \tag{A1}
\]
In these expressions only the $M_j$, $j=0,\dots,K$, are unknown. Continuity of the first derivative at the interior knots $k_j$, $j=1,\dots,K-1$, requires
\[
S'_\Delta(k_j^-)=\frac{h_j M_{j-1}}{6}+\frac{h_j M_j}{3}+\frac{y_j-y_{j-1}}{h_j}
\quad\text{and}\quad
S'_\Delta(k_j^+)=-\frac{h_{j+1}M_j}{3}-\frac{h_{j+1}M_{j+1}}{6}+\frac{y_{j+1}-y_j}{h_{j+1}}
\]
to be equal, which yields the K − 1 conditions
\[
(1-\lambda_j)M_{j-1}+2M_j+\lambda_j M_{j+1}=\frac{6y_{j-1}}{h_j(h_j+h_{j+1})}-\frac{6y_j}{h_j h_{j+1}}+\frac{6y_{j+1}}{h_{j+1}(h_j+h_{j+1})},\qquad \lambda_j=\frac{h_{j+1}}{h_j+h_{j+1}}.
\]
Together with two end conditions we have K + 1 equations in the K + 1 unknowns $M_j$. Using the end conditions $M_0=\pi_0 M_1$ and $M_K=\pi_K M_{K-1}$, we can write
\[
\Lambda=\begin{bmatrix}
2 & -2\pi_0 & 0 & \cdots & 0 & 0 & 0\\
1-\lambda_1 & 2 & \lambda_1 & \cdots & 0 & 0 & 0\\
0 & 1-\lambda_2 & 2 & \cdots & 0 & 0 & 0\\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & 2 & \lambda_{K-2} & 0\\
0 & 0 & 0 & \cdots & 1-\lambda_{K-1} & 2 & \lambda_{K-1}\\
0 & 0 & 0 & \cdots & 0 & -2\pi_K & 2
\end{bmatrix},\qquad
\Theta=\begin{bmatrix}
0 & 0 & 0 & \cdots & 0 & 0 & 0\\
\frac{6}{h_1(h_1+h_2)} & \frac{-6}{h_1h_2} & \frac{6}{h_2(h_1+h_2)} & \cdots & 0 & 0 & 0\\
0 & \frac{6}{h_2(h_2+h_3)} & \frac{-6}{h_2h_3} & \cdots & 0 & 0 & 0\\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & \frac{-6}{h_{K-2}h_{K-1}} & \frac{6}{h_{K-1}(h_{K-2}+h_{K-1})} & 0\\
0 & 0 & 0 & \cdots & \frac{6}{h_{K-1}(h_{K-1}+h_K)} & \frac{-6}{h_{K-1}h_K} & \frac{6}{h_K(h_{K-1}+h_K)}\\
0 & 0 & 0 & \cdots & 0 & 0 & 0
\end{bmatrix},
\]
with $m=(M_0,M_1,\dots,M_K)'$ and $y=(y_0,y_1,\dots,y_K)'$. The linear system is
\[
\Lambda m=\Theta y \tag{A2}
\]
with solution
\[
m=\Lambda^{-1}\Theta y. \tag{A3}
\]
Using this result and Equation (A1) we can calculate $S_\Delta(\xi)=[S_\Delta(\xi_1),S_\Delta(\xi_2),\dots,S_\Delta(\xi_N)]'$. Let $P$ denote the $N\times(K+1)$ matrix whose $i$th row, $i=1,\dots,N$, for $k_{j-1}\le\xi_i\le k_j$, is
\[
p_i=\Big[\underbrace{0,\dots,0}_{\text{first }j-2},\ \frac{k_j-\xi_i}{6h_j}\big[(k_j-\xi_i)^2-h_j^2\big],\ \frac{\xi_i-k_{j-1}}{6h_j}\big[(\xi_i-k_{j-1})^2-h_j^2\big],\ \underbrace{0,\dots,0}_{\text{last }K+1-j}\Big],
\]
and let $Q$ denote the $N\times(K+1)$ matrix whose $i$th row, for $k_{j-1}\le\xi_i\le k_j$, is
\[
q_i=\Big[\underbrace{0,\dots,0}_{\text{first }j-2},\ \frac{k_j-\xi_i}{h_j},\ \frac{\xi_i-k_{j-1}}{h_j},\ \underbrace{0,\dots,0}_{\text{last }K+1-j}\Big].
\]
Now, using (A1) and (A3), we get
\[
S_\Delta(\xi)=Pm+Qy=P\Lambda^{-1}\Theta y+Qy=(P\Lambda^{-1}\Theta+Q)y=Wy,\qquad W=P\Lambda^{-1}\Theta+Q,
\]
where $W$ is $N\times(K+1)$. In practical situations, we might only know the knots and not the spline values, which we observe with error. In this case we have $s=S_\Delta(\xi)+\varepsilon=Wy+\varepsilon$, where $s=(s_1,\dots,s_N)'$ and $\varepsilon=(\varepsilon_1,\dots,\varepsilon_N)'$, with $E(\varepsilon)=0$ and $E(\varepsilon\varepsilon')=\sigma^2 I$. Notice that, after fixing the knots, we only have to estimate the value of the spline at the knots, and this fully determines the shape of the spline; this can be done by a simple OLS regression, $\hat{y}=(W'W)^{-1}W's$. For identification reasons we require
\[
\sum_{j:\text{unique }\xi_j}S_\Delta(\xi_j)=\sum_{j:\text{unique }\xi_j}w_j y=w^* y=0,
\]
where $w_j$ is the $j$th row of $W$ and $w^*=\sum_{j:\text{unique }\xi_j}w_j$. To this end, a restriction can be enforced on one of the elements of $y$. This ensures that $E(s_t)=0$, so that $s_t$ and $\mu_h$ can be identified. If we drop $y_K$, we can substitute $y_K=-\sum_{i=0}^{K-1}(w_i^*/w_K^*)\,y_i$, where $w_i^*$ is the $i$th element of $w^*$. Substituting this into the restriction gives
\[
\sum_{j:\text{unique }\xi_j}S_\Delta(\xi_j)=\sum_{j}\sum_{i=0}^{K}w_{ji}y_i=\sum_{j}\sum_{i=0}^{K-1}\Big(w_{ji}-w_{jK}\frac{w_i^*}{w_K^*}\Big)y_i=\sum_{i=0}^{K-1}\Big(w_i^*-w_K^*\frac{w_i^*}{w_K^*}\Big)y_i=\sum_{i=0}^{K-1}(w_i^*-w_i^*)y_i=0.
\]
Let us partition $W$ as $W=[W_{-K}:W_K]$, where $W_{-K}$ consists of the first K columns of $W$ and $W_K$ is its last column, and similarly $w^*=[w^*_{-K}:w^*_K]$. Finally, we define the $N\times K$ matrix
\[
\widetilde{W}=W_{-K}-\frac{1}{w_K^*}\,W_K\,w^*_{-K},
\]
so that we obtain $s=S_\Delta(\xi)+\varepsilon=\widetilde{W}\tilde{y}+\varepsilon$.
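A minimal numerical sketch of this construction is given below: it assembles Λ, Θ, P, Q, and W for a given set of knots, using the natural end conditions π0 = πK = 0 as an illustrative assumption, and recovers the knot values from noisy observations by OLS.

```python
import numpy as np

def spline_weight_matrix(knots, xi, pi0=0.0, piK=0.0):
    """Build W = P @ inv(Lambda) @ Theta + Q, mapping knot values y to spline values
    S(xi) as in the construction above. Natural end conditions (pi0 = piK = 0) are
    used by default; this choice is an illustrative assumption."""
    k = np.asarray(knots, float)
    xi = np.asarray(xi, float)
    K = len(k) - 1                       # knots k_0, ..., k_K
    h = np.diff(k)                       # h_j = k_j - k_{j-1}, j = 1..K
    lam = h[1:] / (h[:-1] + h[1:])       # lambda_j, j = 1..K-1

    Lam = np.zeros((K + 1, K + 1))
    The = np.zeros((K + 1, K + 1))
    Lam[0, 0], Lam[0, 1] = 2.0, -2.0 * pi0
    Lam[K, K], Lam[K, K - 1] = 2.0, -2.0 * piK
    for j in range(1, K):
        Lam[j, j - 1:j + 2] = [1.0 - lam[j - 1], 2.0, lam[j - 1]]
        The[j, j - 1] = 6.0 / (h[j - 1] * (h[j - 1] + h[j]))
        The[j, j] = -6.0 / (h[j - 1] * h[j])
        The[j, j + 1] = 6.0 / (h[j] * (h[j - 1] + h[j]))

    P = np.zeros((len(xi), K + 1))
    Q = np.zeros((len(xi), K + 1))
    for i, x in enumerate(xi):
        j = min(max(np.searchsorted(k, x), 1), K)    # interval [k_{j-1}, k_j]
        hj = h[j - 1]
        P[i, j - 1] = (k[j] - x) / (6 * hj) * ((k[j] - x) ** 2 - hj ** 2)
        P[i, j] = (x - k[j - 1]) / (6 * hj) * ((x - k[j - 1]) ** 2 - hj ** 2)
        Q[i, j - 1] = (k[j] - x) / hj
        Q[i, j] = (x - k[j - 1]) / hj
    return P @ np.linalg.solve(Lam, The) + Q

# Noisy observations of a smooth intraday pattern, recovered by OLS on W:
rng = np.random.default_rng(3)
xi = np.linspace(0.0, 1.0, 400)
knots = np.linspace(0.0, 1.0, 6)
W = spline_weight_matrix(knots, xi)
s_obs = np.exp(-4 * xi) + 0.3 * xi**2 + 0.05 * rng.standard_normal(len(xi))
y_hat = np.linalg.lstsq(W, s_obs, rcond=None)[0]     # spline values at the knots
s_fit = W @ y_hat
```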
APPENDIX C: MCMC ESTIMATION OF THE ORDERED T-SV MODEL

In this appendix, the T-element vectors $(v_1,\dots,v_T)$ that collect a time-dependent variable over all time periods are denoted by $v$, that is, by the variable without a subscript.

C.1. Generating the Parameters x, μh, ϕ, ση² (Step 2)

Notice that, conditional on $C=\{c_t,\ t=1,\dots,T\}$ and $r^*$, we have
\[
\log r_t^{*2}=\mu_h+s_t+x_t+\log\lambda_t+m_{c_t}+\varepsilon_t,\qquad \varepsilon_t\sim N(0,v^2_{c_t}),
\]
which implies the following state space form:
\[
\tilde{y}_t=\begin{bmatrix}1 & w_t & 1\end{bmatrix}\begin{bmatrix}\mu_h\\ \beta\\ x_t\end{bmatrix}+\varepsilon_t,\qquad \varepsilon_t\sim N(0,v^2_{c_t}), \tag{A4}
\]
\[
\alpha_{t+1}=\begin{bmatrix}\mu_h\\ \beta\\ x_{t+1}\end{bmatrix}=\begin{bmatrix}1 & 0 & 0\\ 0 & I_K & 0\\ 0 & 0 & \phi\end{bmatrix}\begin{bmatrix}\mu_h\\ \beta\\ x_t\end{bmatrix}+\begin{bmatrix}0\\ 0\\ \eta_t\end{bmatrix},\qquad \eta_t\sim N(0,\sigma^2_\eta), \tag{A5}
\]
where
\[
\begin{bmatrix}\mu_h\\ \beta\\ x_1\end{bmatrix}\sim N\!\left(\begin{bmatrix}\mu_0\\ \beta_0\\ 0\end{bmatrix},\begin{bmatrix}\sigma^2_\mu & 0 & 0\\ 0 & \sigma^2_\beta I_K & 0\\ 0 & 0 & \sigma^2_\eta/(1-\phi^2)\end{bmatrix}\right) \tag{A6}
\]
and
\[
\tilde{y}_t=\log r_t^{*2}-\log\lambda_t-m_{c_t}. \tag{A7}
\]
First we draw $\phi,\sigma^2_\eta$ from $p(\phi,\sigma^2_\eta\,|\,\gamma,\nu,C,r^*,\lambda,s,y)$. Notice that
\[
p(\phi,\sigma^2_\eta\,|\,\gamma,\nu,C,r^*,\lambda,s,y)=p(\phi,\sigma^2_\eta\,|\,\tilde{y},C)\propto p(\tilde{y}\,|\,\phi,\sigma^2_\eta,C)\,p(\phi)\,p(\sigma^2_\eta),
\]
where $\tilde{y}_t$ is defined in Equation (A7). The likelihood can be evaluated with the standard Kalman filter and the prediction error decomposition (see, e.g., Durbin and Koopman, 2012), taking advantage of the fact that, conditional on the auxiliary variables, we have the linear Gaussian state space form given by Equations (A4)–(A7). We draw from this posterior using an adaptive random walk Metropolis-Hastings step as proposed by Roberts and Rosenthal (2009). Conditional on $\phi,\sigma^2_\eta$, we then draw $\mu_h$, $s$, and $x$ from $p(\mu_h,s,x\,|\,\phi,\sigma^2_\eta,\gamma,\nu,C,r^*,\lambda,y)$ by simulating from the smoothed state density of the linear Gaussian state space model given by (A4)–(A7), using the simulation smoother of Durbin and Koopman (2002).

C.2. Generating γ (Step 3)

The conditional distribution of γ simplifies to
\[
p(\gamma\,|\,\nu,\mu_h,\phi,\sigma^2_\eta,x,s,C,y,r^*)=p(\gamma\,|\,\nu,h,y),
\]
because, given ν, h, and y, γ does not depend on $C,\phi,\sigma^2_\eta,r^*$. We further have
\[
p(\gamma\,|\,\nu,h,y)\propto p(y\,|\,\gamma,\nu,h)\,p(\gamma\,|\,\nu,h)=p(y\,|\,\gamma,\nu,h)\,p(\gamma),
\]
as γ is a priori independent of ν and h. With the Beta$(a,b)$ prior $p(\gamma)=\gamma^{a-1}(1-\gamma)^{b-1}/B(a,b)$, this gives
\[
p(\gamma\,|\,\nu,h,y)\propto \gamma^{a-1}(1-\gamma)^{b-1}\prod_{t=1}^{T}\Big\{\gamma\,1\{y_t=0\}+(1-\gamma)\Big[T\big((y_t+0.5)\,e^{-h_t/2},\nu\big)-T\big((y_t-0.5)\,e^{-h_t/2},\nu\big)\Big]\Big\},
\]
where $T(\cdot,\nu)$ is the distribution function of the Student's t distribution with mean zero, scale one, and ν degrees of freedom. We sample from this posterior using an adaptive random walk Metropolis-Hastings sampler (Roberts and Rosenthal, 2009).
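Steps C.1 and C.2 (and their counterparts in Appendix D) rely on adaptive random walk Metropolis-Hastings updates in the spirit of Roberts and Rosenthal (2009). The sketch below shows the basic mechanics for a scalar parameter; the adaptation rule is a simplified illustration and not the exact scheme used in the paper.

```python
import numpy as np

def adaptive_rw_mh(log_post, theta0, n_iter=10_000, target_acc=0.44, seed=0):
    """Minimal adaptive random-walk Metropolis-Hastings for a scalar parameter.
    The proposal standard deviation is adapted on the log scale toward a target
    acceptance rate (a simplified, diminishing-adaptation rule; illustrative only)."""
    rng = np.random.default_rng(seed)
    theta, lp = theta0, log_post(theta0)
    log_step = 0.0
    draws = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + np.exp(log_step) * rng.standard_normal()
        lp_prop = log_post(prop)
        accept = np.log(rng.uniform()) < lp_prop - lp
        if accept:
            theta, lp = prop, lp_prop
        # Adapt the step size with a decaying gain so adaptation vanishes over time.
        log_step += (float(accept) - target_acc) / np.sqrt(i + 1)
        draws[i] = theta
    return draws
```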
C.3. Generating r*

First, notice that the conditional distribution of $r^*$ simplifies as
\[
p(r^*\,|\,\gamma,\nu,\mu_h,\phi,\sigma^2_\eta,x,s,C,\lambda,y)=p(r^*\,|\,\gamma,h,\lambda,y)=\prod_{t=1}^{T}p(r_t^*\,|\,\gamma,h_t,\lambda_t,y_t).
\]
By the law of total probability,
\[
p(r_t^*\,|\,\gamma,h_t,\lambda_t,y_t)=p(r_t^*\,|\,\gamma,h_t,\lambda_t,y_t,\text{zero})\,p(\text{zero}\,|\,\gamma,h_t,\lambda_t,y_t)+p(r_t^*\,|\,\gamma,h_t,\lambda_t,y_t,\text{non-zero})\,p(\text{non-zero}\,|\,\gamma,h_t,\lambda_t,y_t),
\]
where $p(r_t^*\,|\,\gamma,h_t,\lambda_t,y_t,\text{non-zero})$ is a normal density with mean zero and variance $\lambda_t\exp(h_t)$ truncated to the interval $[y_t-0.5,\,y_t+0.5]$, and $p(r_t^*\,|\,\gamma,h_t,\lambda_t,y_t,\text{zero})$ is the corresponding untruncated normal density. If $y_t=0$, then
\[
p(\text{zero}\,|\,\gamma,h_t,\lambda_t,y_t=0)=\frac{p(y_t=0\,|\,\text{zero},\gamma,h_t,\lambda_t)\,p(\text{zero}\,|\,\gamma,h_t,\lambda_t)}{p(y_t=0\,|\,\gamma,h_t,\lambda_t)}
=\frac{1\times\gamma}{\gamma+(1-\gamma)\big[\Phi\big(0.5\,\lambda_t^{-1/2}e^{-h_t/2}\big)-\Phi\big(-0.5\,\lambda_t^{-1/2}e^{-h_t/2}\big)\big]}.
\]
If $y_t=k\neq0$, then
\[
p(\text{zero}\,|\,\gamma,h_t,\lambda_t,y_t=k)=\frac{p(y_t=k\,|\,\text{zero},\gamma,h_t,\lambda_t)\,p(\text{zero}\,|\,\gamma,h_t,\lambda_t)}{p(y_t=k\,|\,\gamma,h_t,\lambda_t)}=0.
\]
Moreover, $p(\text{non-zero}\,|\,\gamma,h_t,\lambda_t,y_t)=1-p(\text{zero}\,|\,\gamma,h_t,\lambda_t,y_t)$.

C.4. Generating ν and λ

To sample ν and λ we use the method of Stroud and Johannes (2014). We can decompose the posterior density as
\[
p(\nu,\lambda\,|\,\gamma,\phi,\sigma^2_\eta,h,C,y,r^*)=p(\nu,\lambda\,|\,h,r^*)=p(\lambda\,|\,\nu,h,r^*)\,p(\nu\,|\,h,r^*).
\]
Note that we have the following mixture representation,
\[
r_t^*=\exp(h_t/2)\sqrt{\lambda_t}\,\varepsilon_t,\qquad \varepsilon_t\sim N(0,1),\qquad \lambda_t\sim \mathrm{IG}(\nu/2,\nu/2),
\]
which implies
\[
p(\nu\,|\,h,r^*)\propto\prod_{t=1}^{T}p\big(r_t^*e^{-h_t/2}\,\big|\,h_t,\nu\big)\,p(\nu),
\]
where $r_t^*e^{-h_t/2}\,|\,h_t,\nu$ follows a standard Student's t distribution with ν degrees of freedom. Combined with the prior $\nu\sim \mathrm{DU}(2,128)$, this leads to the posterior
\[
p(\nu\,|\,h,r^*)\propto\prod_{t=1}^{T}g_\nu(w_t),\qquad w_t=r_t^*/\exp(h_t/2),
\]
where $g_\nu$ denotes the $t_\nu$ density. To avoid the computationally intensive evaluation of these probabilities for all support points, we use a Metropolis-Hastings update: we draw the proposal $\nu^*$ from the neighborhood of the current value $\nu^{(i)}$ using a discrete uniform distribution, $\nu^*\sim \mathrm{DU}(\nu^{(i)}-\delta,\nu^{(i)}+\delta)$, and accept with probability
\[
\min\left\{1,\ \frac{\prod_{t=1}^{T}g_{\nu^*}(w_t)}{\prod_{t=1}^{T}g_{\nu^{(i)}}(w_t)}\right\},
\]
where δ is chosen such that the acceptance rate is reasonable. Finally, we have
\[
p(\lambda\,|\,\nu,h,r^*)=\prod_{t=1}^{T}p(\lambda_t\,|\,\nu,h_t,r_t^*)\propto\prod_{t=1}^{T}p(r_t^*\,|\,\lambda_t,\nu,h_t)\,p(\lambda_t\,|\,\nu),
\]
where $r_t^*e^{-h_t/2}\,|\,\lambda_t,\nu,h_t\sim N(0,\lambda_t)$ and $\lambda_t\,|\,\nu\sim \mathrm{IG}(\nu/2,\nu/2)$, so that
\[
\lambda_t\,|\,\nu,h_t,r_t^*\sim \mathrm{IG}\!\left(\frac{\nu+1}{2},\ \frac{\nu+\big(r_t^*e^{-h_t/2}\big)^2}{2}\right).
\]
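A minimal sketch of the two updates in C.4, vectorized over t, is given below; it follows the mixture representation above, uses SciPy for the inverse-gamma and Student's t evaluations, and is an illustrative implementation rather than the paper's code.

```python
import numpy as np
from scipy import stats

def draw_lambda(r_star, h, nu, rng):
    """Draw the scale mixture variables lambda_t | nu, h_t, r_t* from the
    inverse-gamma full conditional IG((nu+1)/2, (nu + w_t^2)/2), w_t = r_t*/exp(h_t/2)."""
    w2 = (r_star * np.exp(-h / 2.0)) ** 2
    return stats.invgamma.rvs(a=(nu + 1) / 2.0, scale=(nu + w2) / 2.0, random_state=rng)

def draw_nu(r_star, h, nu, rng, delta=3, lo=2, hi=128):
    """Metropolis step for the degrees of freedom nu on the integers {lo, ..., hi},
    with a discrete uniform proposal on {nu - delta, ..., nu + delta}."""
    w = r_star * np.exp(-h / 2.0)
    prop = int(rng.integers(nu - delta, nu + delta + 1))
    if prop < lo or prop > hi:
        return nu                       # proposal outside the prior support: reject
    log_ratio = stats.t.logpdf(w, df=prop).sum() - stats.t.logpdf(w, df=nu).sum()
    return prop if np.log(rng.uniform()) < log_ratio else nu
```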
APPENDIX D: MCMC ESTIMATION OF THE DYNAMIC ΔNB MODEL

In this appendix, the T-element vectors $(v_1,\dots,v_T)$ that collect a time-dependent variable over all time periods are again denoted by $v$, the variable without a subscript. We discuss the algorithmic details for the ΔNB model; they also apply to the dynamic Skellam model, except for Step 4a (generating ν).

D.1. Generating x, s, μh, ϕ, ση² (Step 2)

Notice that, conditional on $C=\{c_{tj},\ t=1,\dots,T,\ j=1,\dots,\min(N_t+1,2)\}$, τ, N, $z^1$, $z^2$, and γ, we have
\[
-\log\tau_{t1}=\log(z_{t1}+z_{t2})+\mu_h+s_t+x_t+m_{c_{t1}}(1)+\varepsilon_{t1},\qquad \varepsilon_{t1}\sim N\big(0,v^2_{c_{t1}}(1)\big),
\]
and
\[
-\log\tau_{t2}=\log(z_{t1}+z_{t2})+\mu_h+s_t+x_t+m_{c_{t2}}(N_t)+\varepsilon_{t2},\qquad \varepsilon_{t2}\sim N\big(0,v^2_{c_{t2}}(N_t)\big),
\]
which implies the following state space form, with $\tilde{y}_t$ of dimension $\min(N_t+1,2)$:
\[
\tilde{y}_t=\begin{bmatrix}1 & w_t & 1\\ 1 & w_t & 1\end{bmatrix}\begin{bmatrix}\mu_h\\ \beta\\ x_t\end{bmatrix}+\varepsilon_t,\qquad \varepsilon_t\sim N(0,H_t), \tag{A8}
\]
\[
\alpha_{t+1}=\begin{bmatrix}\mu_h\\ \beta\\ x_{t+1}\end{bmatrix}=\begin{bmatrix}1 & 0 & 0\\ 0 & I_K & 0\\ 0 & 0 & \phi\end{bmatrix}\begin{bmatrix}\mu_h\\ \beta\\ x_t\end{bmatrix}+\begin{bmatrix}0\\ 0\\ \eta_t\end{bmatrix},\qquad \eta_t\sim N(0,\sigma^2_\eta), \tag{A9}
\]
where
\[
\begin{bmatrix}\mu_h\\ \beta\\ x_1\end{bmatrix}\sim N\!\left(\begin{bmatrix}\mu_0\\ \beta_0\\ 0\end{bmatrix},\begin{bmatrix}\sigma^2_\mu & 0 & 0\\ 0 & \sigma^2_\beta I_K & 0\\ 0 & 0 & \sigma^2_\eta/(1-\phi^2)\end{bmatrix}\right), \tag{A10}
\]
with $H_t=\mathrm{diag}\big(v^2_{c_{t1}}(1),\,v^2_{c_{t2}}(N_t)\big)$ and
\[
\tilde{y}_t=\begin{pmatrix}-\log\tau_{t1}-m_{c_{t1}}(1)-\log(z_{t1}+z_{t2})\\ -\log\tau_{t2}-m_{c_{t2}}(N_t)-\log(z_{t1}+z_{t2})\end{pmatrix}. \tag{A11}
\]
First we draw $\phi,\sigma^2_\eta$ from $p(\phi,\sigma^2_\eta\,|\,\gamma,\nu,C,\tau,N,z^1,z^2,s,y)$. Notice that
\[
p(\phi,\sigma^2_\eta\,|\,\gamma,\nu,C,\tau,N,z^1,z^2,s,y)=p(\phi,\sigma^2_\eta\,|\,\tilde{y},C,N)\propto p(\tilde{y}\,|\,\phi,\sigma^2_\eta,C,N)\,p(\phi)\,p(\sigma^2_\eta), \tag{A12}
\]
where $\tilde{y}_t$ is defined in Equation (A11). The likelihood can be evaluated with the standard Kalman filter and the prediction error decomposition (see, e.g., Durbin and Koopman, 2012), taking advantage of the fact that, conditional on the auxiliary variables, we have the linear Gaussian state space form given by Equations (A8)–(A11). We draw from this posterior using an adaptive random walk Metropolis-Hastings step as proposed by Roberts and Rosenthal (2009). Conditional on $\phi,\sigma^2_\eta$, we draw $\mu_h$, $s$, and $x$ from $p(\mu_h,s,x\,|\,\phi,\sigma^2_\eta,\gamma,\nu,C,\tau,N,z^1,z^2,y)$ by simulating from the smoothed state density of the linear Gaussian state space model given by (A8)–(A11), using the simulation smoother of Durbin and Koopman (2002).

D.2. Generating γ (Step 3)

Notice that we can simplify
\[
p(\gamma\,|\,\nu,\mu_h,\phi,\sigma^2_\eta,x,C,s,\tau,N,z^1,z^2,y)=p(\gamma\,|\,\nu,\mu_h,s,x,y), \tag{A13}
\]
because, given ν, λ, and y, the variables $C,\tau,N,z^1,z^2$ are redundant. We can then decompose
\[
p(\gamma\,|\,\nu,\mu_h,s,x,y)\propto p(y\,|\,\gamma,\nu,\mu_h,s,x)\,p(\gamma\,|\,\nu,\mu_h,s,x)=p(y\,|\,\gamma,\nu,\mu_h,s,x)\,p(\gamma), \tag{A14}
\]
as γ is a priori independent of ν and of $\lambda_t=\exp(\mu_h+s_t+x_t)$. Plugging in the likelihood and the Beta$(a,b)$ prior for γ yields
\[
p(\gamma\,|\,\nu,\mu_h,s,x,y)\propto \gamma^{a-1}(1-\gamma)^{b-1}\prod_{t=1}^{T}\Big[\gamma\,1\{y_t=0\}+(1-\gamma)\,f_{\Delta NB}(y_t;\lambda_t,\nu)\Big],
\]
with
\[
f_{\Delta NB}(y_t;\lambda_t,\nu)=\left(\frac{\nu}{\lambda_t+\nu}\right)^{2\nu}\left(\frac{\lambda_t}{\lambda_t+\nu}\right)^{|y_t|}\frac{\Gamma(\nu+|y_t|)}{\Gamma(\nu)\,\Gamma(|y_t|+1)}\,F\!\left(\nu+|y_t|,\ \nu;\ |y_t|+1;\left(\frac{\lambda_t}{\lambda_t+\nu}\right)^{2}\right),
\]
where F denotes the Gauss hypergeometric function. We sample from this posterior using an adaptive random walk Metropolis-Hastings sampler.

D.3. Generating C, τ, N, z1, z2, ν (Step 4)

To start with, we decompose the joint posterior of C, τ, N, $z^1$, $z^2$, and ν into
\[
p(C,\tau,N,z^1,z^2,\nu\,|\,\gamma,\mu_h,\phi,\sigma^2_\eta,s,x,y)=p(C\,|\,\tau,N,z^1,z^2,\gamma,\nu,\mu_h,\phi,\sigma^2_\eta,s,x,y)\times p(\tau\,|\,N,z^1,z^2,\gamma,\nu,\mu_h,\phi,\sigma^2_\eta,s,x,y)\times p(N\,|\,z^1,z^2,\gamma,\nu,\mu_h,\phi,\sigma^2_\eta,s,x,y)\times p(z^1,z^2\,|\,\gamma,\nu,\mu_h,\phi,\sigma^2_\eta,s,x,y)\times p(\nu\,|\,\gamma,\mu_h,\phi,\sigma^2_\eta,s,x,y).
\]

Generating ν (Step 4a). Note that
\[
p(\nu\,|\,\gamma,\mu_h,\phi,\sigma^2_\eta,s,x,y)=p(\nu\,|\,\gamma,\lambda,y)\propto p(y\,|\,\gamma,\lambda,\nu)\,p(\lambda\,|\,\gamma,\nu)\,p(\gamma\,|\,\nu)\,p(\nu)=p(y\,|\,\gamma,\lambda,\nu)\,p(\lambda)\,p(\gamma)\,p(\nu)\propto p(y\,|\,\gamma,\lambda,\nu)\,p(\nu),
\]
where $p(y\,|\,\gamma,\lambda,\nu)$ is a product of zero-inflated ΔNB probability mass functions. We draw ν using a discrete uniform prior, $\nu\sim \mathrm{DU}(2,128)$, and a random walk proposal, as suggested by Stroud and Johannes (2014) for the degrees of freedom parameter of a t density. The posterior of ν can be written as a multinomial distribution $\mathrm{M}(\pi^*_2,\dots,\pi^*_{128})$ with probabilities
\[
\pi^*_\nu\propto\prod_{t=1}^{T}\big[\gamma\,1\{y_t=0\}+(1-\gamma)\,f_{\Delta NB}(y_t;\lambda_t,\nu)\big]=\prod_{t=1}^{T}g_\nu(y_t).
\]
To avoid the computationally intensive evaluation of these probabilities for all support points, we use a Metropolis-Hastings update: we draw the proposal $\nu^*$ from the neighborhood of the current value $\nu^{(i)}$ using a discrete uniform distribution, $\nu^*\sim \mathrm{DU}(\nu^{(i)}-\delta,\nu^{(i)}+\delta)$, and accept with probability
\[
\min\left\{1,\ \frac{\prod_{t=1}^{T}g_{\nu^*}(y_t)}{\prod_{t=1}^{T}g_{\nu^{(i)}}(y_t)}\right\},
\]
where δ is chosen such that the acceptance rate is reasonable.

Generating z1, z2 (Step 4b). Given γ, ν, μh, s, x, and y, the elements of the vectors $z^1$ and $z^2$ are independent over time, so that their posterior distribution factorizes as
\[
p(z^1,z^2\,|\,\gamma,\nu,\mu_h,\phi,\sigma^2_\eta,s,x,y)=\prod_{t=1}^{T}p(z_{t1},z_{t2}\,|\,\gamma,\nu,\mu_h,\phi,\sigma^2_\eta,s_t,x_t,y_t).
\]
For a single component we have
\[
p(z_{t1},z_{t2}\,|\,\gamma,\nu,\mu_h,\phi,\sigma^2_\eta,s_t,x_t,y_t)\propto p(y_t\,|\,z_{t1},z_{t2},\gamma,\nu,\mu_h,\phi,\sigma^2_\eta,s_t,x_t)\,p(z_{t1},z_{t2}\,|\,\gamma,\nu,\mu_h,\phi,\sigma^2_\eta,s_t,x_t),
\]
which we express as
\[
p(z_{t1},z_{t2}\,|\,\gamma,\nu,\mu_h,\phi,\sigma^2_\eta,s_t,x_t,y_t)\propto g(z_{t1},z_{t2})\,\frac{\nu^{\nu}z_{t1}^{\nu-1}e^{-\nu z_{t1}}}{\Gamma(\nu)}\,\frac{\nu^{\nu}z_{t2}^{\nu-1}e^{-\nu z_{t2}}}{\Gamma(\nu)},
\]
where
\[
g(z_{t1},z_{t2})=\gamma\,1\{y_t=0\}+(1-\gamma)\exp\big[-\lambda_t(z_{t1}+z_{t2})\big]\left(\frac{z_{t1}}{z_{t2}}\right)^{y_t/2}I_{|y_t|}\big(2\lambda_t\sqrt{z_{t1}z_{t2}}\big),
\]
with $\lambda_t=\exp(\mu_h+s_t+x_t)$ and $I_{|y_t|}(\cdot)$ the modified Bessel function of the first kind. We carry out an independence Metropolis-Hastings step by sampling $z^*_{t1}$ and $z^*_{t2}$ from $\mathrm{Ga}(\lambda_t,\nu)$ and accepting with probability
\[
\min\left\{\frac{g(z^*_{t1},z^*_{t2})}{g(z_{t1},z_{t2})},\,1\right\}.
\]

Generating N (Step 4c). As described in Section 2.3.

Generating τ (Step 4d). Notice that
\[
p(\tau\,|\,N,z^1,z^2,\gamma,\nu,\mu_h,\phi,\sigma^2_\eta,x,y)=p(\tau\,|\,N,\mu_h,z^1,z^2,s,x)=\prod_{t=1}^{T}p(\tau_{1t}\,|\,\tau_{2t},N_t,\mu_h,z_{t1},z_{t2},s_t,x_t)\,p(\tau_{2t}\,|\,N_t,\mu_h,z_{t1},z_{t2},s_t,x_t).
\]
We can sample from $p(\tau_{2t}\,|\,N_t,\mu_h,z_{t1},z_{t2},s_t,x_t)$ using the fact that, conditionally on $N_t$, the arrival time $\tau_{2t}$ of the $N_t$th jump is the maximum of $N_t$ uniform random variables and therefore has a Beta$(N_t,1)$ distribution. The arrival time of the $(N_t+1)$th jump after 1 is exponentially distributed with intensity $\lambda_t(z_{t1}+z_{t2})$, hence
\[
\tau_{1t}=1+\xi_t-\tau_{2t},\qquad \xi_t\sim \mathrm{Exp}\big(\lambda_t(z_{t1}+z_{t2})\big).
\]

Generating C (Step 4e). Notice that
\[
p(C\,|\,\tau,N,z^1,z^2,\gamma,\nu,\mu_h,\phi,\sigma^2_\eta,s,x,y)=p(C\,|\,\tau,N,z^1,z^2,\nu,s,x)=\prod_{t=1}^{T}\prod_{j=1}^{\min(N_t+1,2)}p(c_{tj}\,|\,\tau_t,N_t,\mu_h,z_{t1},z_{t2},s_t,x_t).
\]
We can then sample $c_{t1}$ from the discrete distribution
\[
p(c_{t1}=k\,|\,\tau_t,N_t,\mu_h,z_{t1},z_{t2},s_t,x_t)\propto w_k(1)\,\phi\big(-\log\tau_{1t}-\log[\lambda_t(z_{t1}+z_{t2})];\,m_k(1),\,v^2_k(1)\big),\qquad k=1,\dots,C(1),
\]
and, if $N_t>0$, draw $c_{t2}$ from
\[
p(c_{t2}=k\,|\,\tau_t,N_t,\mu_h,z_{t1},z_{t2},s_t,x_t)\propto w_k(N_t)\,\phi\big(-\log\tau_{2t}-\log[\lambda_t(z_{t1}+z_{t2})];\,m_k(N_t),\,v^2_k(N_t)\big),\qquad k=1,\dots,C(N_t),
\]
where $\phi(\cdot;m,v^2)$ denotes the normal density with mean m and variance v².
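Step 4d translates directly into code. The sketch below draws the auxiliary arrival times for a single time interval with Nt ≥ 1; the Nt = 0 case, which involves only one auxiliary variable, is omitted here because that bookkeeping is not spelled out above.

```python
import numpy as np

def draw_tau_t(N_t, lam_t, z_t1, z_t2, rng):
    """Step 4d for a single time interval with N_t >= 1 jumps.
    tau_2t | N_t ~ Beta(N_t, 1): the maximum of N_t uniform arrival times.
    tau_1t = 1 + xi_t - tau_2t, xi_t ~ Exp(lam_t * (z_t1 + z_t2)): the waiting
    time of the (N_t + 1)-th jump after the end of the unit interval."""
    rate = lam_t * (z_t1 + z_t2)
    tau2 = rng.beta(N_t, 1.0)
    xi = rng.exponential(1.0 / rate)
    return 1.0 + xi - tau2, tau2
```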
APPENDIX E: DATA CLEANING AND TRADE DURATIONS

Tables E.1 and E.2 present the details of the data cleaning and aggregation procedure. Figure E.1 presents the time series of the durations between subsequent trades and their histograms (with frequencies on the log10 scale), based on the cleaned data.

Figure E.1. Durations between trades and their log10 frequencies for IBM and KO, for the 2008 (top rows) and 2010 (bottom rows) samples, respectively. (a) IBM; (b) KO.

APPENDIX F: INTRADAY FEATURES, INCLUDING OVERNIGHT EFFECTS

In Figure F.1 we present the estimated decompositions of the log volatility ht = μh + st + xt, comparing the component and signal estimates based on model parameters estimated using all five days jointly with those based on model parameters estimated for each day separately. The motivation for this comparison is to verify whether the overnight effect, together with other intraday features, can be considered the same for each trading day or whether such features change from day to day. For our analysis of KO tick by tick transaction bid prices, we conclude that some features (such as intraday persistence) can differ from day to day, but the overall effects, including the overnight effect, appear to be similar.

Figure F.1. Volatility decomposition ht = μh + st + xt for KO tick bid price returns, 2008 data: ΔNB model parameters estimated on the full sample (top two panels) and for each day separately (bottom two panels).

References

Aït-Sahalia Y., Jacod J., and Li J. 2012. Testing for Jumps in Noisy High Frequency Data. Journal of Econometrics 168: 207–222.
Aït-Sahalia Y., Mykland P. A., and Zhang L. 2011. Ultra High Frequency Volatility Estimation with Dependent Microstructure Noise. Journal of Econometrics 160: 160–175.
Alzaid A., and Omair M. A. 2010. On the Poisson Difference Distribution Inference and Applications. Bulletin of the Malaysian Mathematical Science Society 33: 17–45.
Andersen T. G. 2000. Some Reflections on Analysis of High-Frequency Data. Journal of Business & Economic Statistics 18: 146–153.
Barndorff-Nielsen O. E., Hansen P. R., Lunde A., and Shephard N. 2008. Realized Kernels in Practice: Trades and Quotes. Econometrics Journal 4: 1–32.
Barndorff-Nielsen O. E., Pollard D. G., and Shephard N. 2012. Integer-Valued Lévy Processes and Low Latency Financial Econometrics. Quantitative Finance 12: 587–605.
Bos C. 2008. "Model-Based Estimation of High Frequency Jump Diffusions with Microstructure Noise and Stochastic Volatility." TI Discussion Paper.
Boudt K., Cornelissen J., and Payseur S. 2012. Highfrequency: Toolkit for the Analysis of Highfrequency Financial Data in R.
Brownlees C., and Gallo G. 2006. Financial Econometrics Analysis at Ultra-High Frequency: Data Handling Concerns. Computational Statistics and Data Analysis 51: 2232–2245.
Chakravarty S., Wood R. A., and Ness R. A. V. 2004. Decimals and Liquidity: A Study of the NYSE. Journal of Financial Research 27: 75–94.
Chib S., Nardari F., and Shephard N. 2002. Markov Chain Monte Carlo for Stochastic Volatility Models. Journal of Econometrics 108: 281–316.
Chordia T., and Subrahmanyam A. 1995. Market Making, the Tick Size and Payment-for-Order-Flow: Theory and Evidence. Journal of Business 68: 543–576.
Cordella T., and Foucault T. 1999. Minimum Price Variations, Time Priority and Quote Dynamics. Journal of Financial Intermediation 8: 141–173.
Czado C., and Haug S. 2010. An ACD-ECOGARCH(1,1) Model. Journal of Financial Econometrics 8: 335–344.
Dahlhaus R., and Neddermeyer J. C. 2014. Online Spot Volatility-Estimation and Decomposition with Nonlinear Market Microstructure Noise Models. Journal of Financial Econometrics 12: 174–212.
Dayri K., and Rosenbaum M. 2013. "Large Tick Assets: Implicit Spread and Optimal Tick Size." Working paper.
Durbin J., and Koopman S. J. 2002. A Simple and Efficient Simulation Smoother for State Space Time Series Analysis. Biometrika 89: 603–616.
Durbin J., and Koopman S. J. 2012. Time Series Analysis by State Space Methods (2nd edn). Oxford: Oxford University Press.
Eisler Z., Bouchaud J. P., and Kockelkoren J. 2012. The Price Impact of Order Book Events: Market Orders, Limit Orders and Cancellations. Quantitative Finance 12: 1395–1419.
Engle R. F. 2000. The Econometrics of Ultra-High-Frequency Data. Econometrica 68: 1–22.
Frühwirth-Schnatter S., Frühwirth R., Held L., and Rue H. 2009. Improved Auxiliary Mixture Sampling for Hierarchical Models of Non-Gaussian Data. Statistics and Computing 19: 479–492.
Frühwirth-Schnatter S., and Wagner H. 2006. Auxiliary Mixture Sampling for Parameter-Driven Models of Time Series of Small Counts with Applications to State Space Modeling. Biometrika 93: 827–841.
Griffin J., and Oomen R. 2008. Sampling Returns for Realized Variance Calculations: Tick Time or Transaction Time? Econometric Reviews 27: 230–253.
Johnson N. L., Kemp A. W., and Kotz S. 2005. Univariate Discrete Distributions (3rd edn). New Jersey: John Wiley and Sons.
Kastner G., and Frühwirth-Schnatter S. 2014. Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Estimation of Stochastic Volatility Models. Computational Statistics & Data Analysis 76: 408–423.
Kim S., Shephard N., and Chib S. 1998. Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models. Review of Economic Studies 65: 361–393.
Koopman S. J., Lit R., and Lucas A. 2017. Intraday Stochastic Volatility in Discrete Price Changes: The Dynamic Skellam Model. Journal of the American Statistical Association 112: 1490–1503.
Müller G., and Czado C. 2009. Stochastic Volatility Models for Ordinal-Valued Time Series with Application to Finance. Statistical Modelling 9: 69–95.
O’Hara M., Saar G., and Zhong Z. 2014. "Relative Tick Size and the Trading Environment." Working paper.
Omori Y., Chib S., Shephard N., and Nakajima J. 2007. Stochastic Volatility with Leverage: Fast Likelihood Inference. Journal of Econometrics 140: 425–449.
Poirier D. J. 1973. Piecewise Regression Using Cubic Splines. Journal of the American Statistical Association 68: 515–524.
Roberts G. O., and Rosenthal J. S. 2009. Examples of Adaptive MCMC. Journal of Computational and Graphical Statistics 18: 349–367.
Ronen T., and Weaver D. G. 2001. "Teenies" Anyone? Journal of Financial Markets 4: 231–260.
Rydberg T. H., and Shephard N. 2003. Dynamics of Trade-by-Trade Price Movements: Decomposition and Models. Journal of Financial Econometrics 1: 2–25.
SEC. 2012. Report to Congress on Decimalization. US Securities and Exchange Commission report.
Skellam J. G. 1946. The Frequency Distribution of the Difference Between Two Poisson Variates Belonging to Different Populations. Journal of the Royal Statistical Society 109: 296.
Stefanos D. 2015. "Bayesian Inference for Ordinal-Response State Space Mixed Models with Stochastic Volatility." Working paper.
Stroud J. R., and Johannes M. S. 2014. Bayesian Modeling and Forecasting of 24-Hour High-Frequency Volatility. Journal of the American Statistical Association 109: 1368–1384.
Weinberg J., Brown L. D., and Stroud J. R. 2007. Bayesian Forecasting of an Inhomogeneous Poisson Process with Application to Call Center Data. Journal of the American Statistical Association 102: 1185–1199.
Ye M., and Yao C. 2014. "Tick Size Constraints, Market Structure, and Liquidity." Working paper. Available at: http://dx.doi.org/10.2139/ssrn.2359000 (accessed April 23, 2018).
