# A frequency domain analysis of the error distribution from noisy high-frequency data

## Summary

Data observed at a high sampling frequency are typically assumed to be an additive composite of a relatively slowly varying continuous-time component, a latent stochastic process or smooth random function, and measurement error. Supposing that the latent component is an Itô diffusion process, we propose to estimate the measurement error density function by applying a deconvolution technique with appropriate localization. Our estimator, which does not require equally spaced observation times, is consistent and minimax rate-optimal. We also investigate estimators of the moments of the error distribution and their properties, propose a frequency domain estimator for the integrated volatility of the underlying stochastic process, and show that it achieves the optimal convergence rate. Simulations and an application to real data validate our analysis.

## 1. Introduction

High-frequency data, observed sequentially at small time intervals, arise in various settings and applications. For example, in social and behavioural investigations, such data are often collected in so-called intensive longitudinal studies; see Bolger & Laurenceau (2013). In functional data analysis, the observations are often considered to be of one of two types: the dense setting, which corresponds to high-frequency data, and the sparse setting, where the data are of low frequency; see, among others, Müller et al. (2011) and Wang et al. (2016). In finance, the analysis of high-frequency intraday transaction data has received increasing attention; see Hautsch (2012), Aït-Sahalia & Jacod (2014) and the references therein.

High-frequency observations are often contaminated by measurement errors. For example, in the dense functional data setting, it is common to assume that the observed discrete data are a noisy version of an underlying unknown smooth curve.
In finance too, high-frequency data are often regarded as a noisy version of a latent continuous-time stochastic process, observed at consecutive discrete time-points. The latent process is assumed to have continuous paths, and the measurement error represents the market microstructure noise; see Aït-Sahalia & Yu (2009). Since increasing the sampling frequency reduces the discretization error of the underlying continuous-time process, we might expect high-frequency data to enable more accurate inference. However, as the sampling frequency increases, the difference between nearby observations becomes dominated by random noise, which makes standard methods of inference inconsistent; see, for example, Zhang et al. (2005). Therefore, a main concern in high-frequency financial data analysis has been to find ways of recovering the quantities of interest from noisy high-frequency observations; see Aït-Sahalia & Jacod (2014).

Revealing the distributional properties of the measurement errors is crucial for recovering the signal of the continuous-time process from noisy high-frequency data. First, many estimation procedures require distributional assumptions on the measurement errors. Second, statistical inference, including hypothesis testing and confidence set estimation, inevitably involves unknown nuisance parameters determined by the distribution of the measurement errors; see, for example, Zhang (2006), Aït-Sahalia et al. (2010), Xiu (2010) and Liu & Tang (2014). The measurement errors themselves sometimes contain useful information, both theoretical and practical, so that recovering the measurement error distribution can improve our understanding of the data structure. For example, Dufrenot et al. (2014) argued that the microstructure noise can help us understand financial crises, and Aït-Sahalia & Yu (2009) made a connection between the microstructure noise and the liquidity of stocks.
In longitudinal studies, the measurement errors can help to reveal interesting characteristics of a population, such as the way in which individuals misreport their diet. Jacod et al. (2017) recently highlighted the importance of the statistical properties of the measurement errors and studied the estimation of their moments. Despite these important developments, to our knowledge no method has yet been proposed for estimating the entire distribution of the errors.

In this paper we consider frequency domain analysis of high-frequency data, focusing on the measurement errors contaminating continuous-time processes. In high-frequency observations, the changes in the underlying continuous-time process between nearby time-points are small: locally, in a small neighbourhood of a given time-point, they become negligible compared with the measurement errors. As a result, estimating the error distribution shares some common features with the nonparametric measurement error problems with repeated measurements studied in Delaigle et al. (2008), and with the nonparametric estimation from aggregated data studied by Meister (2007) and Delaigle & Zhou (2015), where the estimation techniques require working in the Fourier domain. Motivated by this, we propose to estimate the characteristic function of the measurement errors by locally averaging the empirical characteristic functions of the changes in values of the high-frequency data. We obtain a nonparametric estimator of the probability density function of the measurement errors and show that it is consistent and minimax rate-optimal. We propose a simple method for consistently estimating the moments of the measurement errors. Using our estimator of the characteristic function of the errors, we develop a new rate-optimal multi-scale frequency domain estimator of the integrated volatility of the stochastic process, a key quantity of interest in high-frequency financial data analysis.

In what follows, for two sequences of real numbers $$\{a_n\}$$ and $$\{b_n\}$$, we write $$a_n\asymp b_n$$ if there exist positive constants $$c_1$$ and $$c_2$$ such that $$c_1\leqslant a_n/b_n\leqslant c_2$$ for all $$n\geqslant1$$. We denote by $$\,f*g$$ the convolution of two functions $$\,f$$ and $$g$$, defined by $$\,f*g(s)=\int f(s-\tau)g(\tau)\,{\rm d}\tau$$.

## 2. Methodology

### 2.1. Model and data

We are interested in a continuous-time process $$(X_t)_{t\in[0,T]}$$ observed at high frequency, with $$T>0$$. We assume that $$\,X_t$$ follows a diffusion process
$${\rm d} X_t=\mu_t\,{\rm d} t+\sigma_t\,{\rm d} B_t, \tag{1}$$
where the drift $$\mu_t$$ is a locally bounded and progressively measurable process, $$\sigma_t$$ is a positive and locally bounded Itô semimartingale, and $$B_t$$ is a standard Brownian motion. The process $$\sigma_t^2$$ represents the volatility of the process $$\,X_t$$ at time $$t$$, and is often investigated in its integrated form $$\int_0^T \sigma_t^2\,{\rm d} t$$, called the integrated volatility.

Remark 1. The model (1) is often used when $$\,X_t=\log S_t$$, where $$S_t$$ denotes the price process of an equity; see, for example, Zhang et al. (2005). It is also used in applications in biology, physics and many other fields; see Olhede et al. (2009). All theoretical properties of our estimators are derived under this model, but the methods developed in §§ 2.2 and 2.4 could be applied to other types of processes $$\,X_t$$, such as the smooth ones typically encountered in the functional data literature. The key property that makes our methods consistent is the continuity of the underlying process $$\,X_t$$, but the convergence rates of our estimators depend on more specific assumptions, such as those implied by the model (1).

Our data are observed on a generic discrete grid $$\mathcal {G}=\{t_0,\ldots,t_n\}$$ of time-points where, without loss of generality, we let $$t_0=0$$ and $$t_n=T$$.
The observed data are contaminated by additive measurement errors, so that what we observe is a sample $$\{Y_{t_j}\}_{j=0}^n$$, where
$$Y_{t_j}=X_{t_j}+U_{t_j}\text{.} \tag{2}$$
A conventional assumption when analysing noisy high-frequency data is that the random measurement error $$U_t$$ is independent of $$\,X_t$$; see Aït-Sahalia et al. (2010), Xiu (2010) and Liu & Tang (2014). This is also a standard assumption in the measurement error and the functional data literature; see Carroll & Hall (1988), Stefanski & Carroll (1990) and Wang et al. (2016). Likewise, we make the following assumption.

Assumption 1. The errors $$\{U_{t_j}\}_{j=0}^n$$ are independently and identically distributed with unknown density $$\,f_U$$, and are independent of the process $$(X_t)_{t\in[0,T]}$$.

We are interested in deriving statistical properties of the noise term $$U_{t_j}$$ when $$T$$ is fixed and the frequency of observations increases, that is, when $$\max_{1\leqslant j\leqslant n}\Delta t_j\rightarrow0$$ as $$n\rightarrow\infty$$, where $$\Delta t_j=t_j-t_{j-1}$$ for $$j=1,\ldots,n$$. Here, the time-points $$t_j$$ do not need to be equispaced. Formally, we make the following assumption.

Assumption 2. As $$n\rightarrow\infty$$, $$\min_{1\leqslant j\leqslant n}\Delta t_j/\max_{1\leqslant j\leqslant n}\Delta t_j$$ is uniformly bounded away from zero and infinity.

Throughout, we use $$\,f^{\mathrm{Ft}}$$ to denote the Fourier transform of a function $$\,f$$, and we make the following assumption on the characteristic function of the errors, which is almost always assumed in the related nonparametric measurement error literature.

Assumption 3. $$\,f_U^{\mathrm{Ft}}$$ is real-valued and does not vanish at any point on the real line.

### 2.2. Estimating the error density $$\,f_U$$

Motivated by our discussion in § 1, we wish to estimate the error density $$\,f_U$$.
At a given $$t_j$$, if we had access to repeated noisy measurements of $$\,X_{t_j}$$, say $$Y_{t_j,k}=X_{t_j}+U_{t_j,k}$$ for $$k=1,\ldots,r$$, where the $$U_{t_j,k}$$s are independent and each $$U_{t_j,k}\sim f_U$$, then for $$\ell\neq k$$ we would have $$(Y_{t_j,\ell}-Y_{t_j,k})\sim f_U*f_U$$, because $$\,f_U$$ is symmetric when $$\,f_U^{\mathrm{Ft}}$$ is real-valued. Moreover, under Assumption 3, $$\,f_U^{\mathrm{Ft}}$$ is continuous, equals 1 at the origin and never vanishes, so it is strictly positive and equals the square root of $$(\,f_U*f_U)^{\mathrm{Ft}}=(\,f_U^{\mathrm{Ft}})^2$$. Using the approach in Delaigle et al. (2008), we could therefore estimate $$\,f_U^{\mathrm{Ft}}$$ by the square root of the empirical characteristic function of the $$(Y_{t_j,\ell}-Y_{t_j,k})$$s; then, by Fourier inversion, we could deduce an estimator of $$\,f_U$$. However, for high-frequency data, at each given $$t_j$$ we have access to only one contaminated measurement $$Y_{t_j}$$, so the above technique cannot be applied directly. But since $$(X_t)_{t\in[0,T]}$$ is a continuous-time stochastic process with continuous paths, $$|X_{t+h}-X_t|\rightarrow0$$ almost surely as $$h\rightarrow0$$. Thus, the collection of observations $$\{Y_{t_\ell}\}$$, where $$t_\ell$$ lies in a small neighbourhood $$\mathcal N$$ of $$t_j$$, can be approximately viewed as repeated measurements of $$\,X_{t_j}$$ contaminated by independently and identically distributed errors $$\{U_{t_\ell}\}$$. As the sampling frequency increases, we have multiple observations in smaller and smaller neighbourhoods $$\mathcal N$$, which suggests that the density of $$Y_{t_\ell}-Y_{t_j}$$, for $$t_j\neq t_\ell \in \mathcal N$$, gets closer and closer to $$\,f_U*f_U$$. Therefore, we can expect that as the sampling frequency increases, the approach of Delaigle et al. (2008), applied to the ($$Y_{t_\ell}-Y_{t_j}$$)s for $$t_\ell$$ and $$t_j$$ sufficiently close, can provide an increasingly accurate estimator of $$\,f_U$$. We shall prove in § 2.3 that this heuristic is correct as long as the $$t_\ell$$s and $$t_j$$s are carefully chosen, which we characterize through a distance $$\xi$$.
For $$\mathcal {G}$$ defined in § 2.1 and $$\xi>0$$, we define
$$S_j=\{t_\ell\in \mathcal {G}:|t_\ell-t_j|\leqslant \xi\quad\text{and}\quad \ell\neq j\}\quad (\,j=0,\ldots,n) \tag{3}$$
and denote by $$N_j$$ the number of points in $$S_j$$. For a fixed $$T$$, Assumption 2 implies that $$\min_{1\leqslant j\leqslant n}\Delta t_j\asymp\max_{1\leqslant j\leqslant n}\Delta t_j\asymp n^{-1}$$, so that $$\max_{0\leqslant j\leqslant n}N_j\asymp\min_{0\leqslant j\leqslant n}N_j\asymp n\xi$$. Following the discussion above and recalling Assumption 3, for a given $$\xi$$ we define our estimator of $$\,f_U^{\mathrm{Ft}}(s)$$ as the square root of the absolute value of the real part of the empirical characteristic function of the differences between nearby $$Y_{t_{j}}$$s:
$$\hat{f}_{U,1}^{\mathrm{Ft}}(s;\xi)=\Bigg|\frac{1}{N(\xi)}\sum_{j=0}^n\sum_{t_\ell\in S_j}\cos\{s(Y_{t_{\ell}}-Y_{t_{j}})\}\Bigg|^{1/2}, \tag{4}$$
where $$N(\xi)=\sum_{j=0}^n N_j$$. Here $$\xi$$ can be viewed as a parameter controlling the trade-off between the bias and the variance of $$\,\hat{f}_{U,1}^{\mathrm{Ft}}$$: a smaller $$\xi$$ gives a smaller bias but also results in a smaller $$N(\xi)$$, so that the variance is higher. On the other hand, a larger $$\xi$$ induces a lower variance but comes at the price of a larger bias due to the contribution from the dynamics of $$\,X_t$$. The choice of $$\xi$$ in practice will be discussed in § 3.1.

It follows from the Fourier inversion theorem that $$\,f_U(x)=(2\pi)^{-1}\int \exp(-{{\rm i}}sx) f_{U}^{\mathrm{Ft}}(s)\,{\rm d} s$$, where $${\rm i}^2=-1$$. We can obtain an estimator of $$\,f_U$$ by replacing $$\,f_{U}^{\mathrm{Ft}}(s)$$ with $$\,\hat{f}_{U,1}^{\mathrm{Ft}}(s;\xi)$$ in this integral. However, since $$\,\hat{f}_{U,1}^{\mathrm{Ft}}(s;\xi)$$ is built from an empirical characteristic function, it is unreliable when $$|s|$$ is large.
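The construction of the neighbourhoods $$S_j$$ in (3) and the estimator (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`simulate_noisy_path`, `neighbour_differences`, `ecf_error`) are hypothetical, and the simulated data use a constant-volatility Brownian motion for $$\,X_t$$ and Gaussian errors purely for illustration.

```python
import math
import random
from bisect import bisect_left, bisect_right


def simulate_noisy_path(n, sigma_x=0.2, sigma_u=0.5, T=1.0, seed=0):
    """Hypothetical illustration data: Y_{t_j} = X_{t_j} + U_{t_j}, with X a Brownian motion
    of constant volatility sigma_x and U_{t_j} ~ N(0, sigma_u^2), on a jittered grid
    0 = t_0 < ... < t_n = T whose spacings stay comparable (cf. Assumption 2)."""
    rng = random.Random(seed)
    times = [0.0] + [T * (j + 0.6 * (rng.random() - 0.5)) / n for j in range(1, n)] + [T]
    x, y = 0.0, [rng.gauss(0.0, sigma_u)]
    for j in range(1, n + 1):
        x += sigma_x * math.sqrt(times[j] - times[j - 1]) * rng.gauss(0.0, 1.0)
        y.append(x + rng.gauss(0.0, sigma_u))
    return times, y


def neighbour_differences(times, y, xi):
    """All Y_{t_l} - Y_{t_j} with t_l in S_j of (3): |t_l - t_j| <= xi and l != j.
    Assumes `times` is sorted; the length of the returned list is N(xi)."""
    diffs = []
    for j, tj in enumerate(times):
        lo, hi = bisect_left(times, tj - xi), bisect_right(times, tj + xi)
        diffs.extend(y[l] - y[j] for l in range(lo, hi) if l != j)
    return diffs


def ecf_error(s, diffs):
    """Estimator (4): |N(xi)^{-1} * sum of cos{s (Y_{t_l} - Y_{t_j})}|^{1/2}."""
    return math.sqrt(abs(sum(math.cos(s * d) for d in diffs) / len(diffs)))
```

With Gaussian errors of standard deviation 0.5 and $$\xi$$ covering about five grid spacings on each side, `ecf_error(1.0, diffs)` should be close to $$\exp(-0{\cdot}125)\approx0{\cdot}88$$. For large $$n$$ a vectorized implementation would be preferable.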
For the integral to exist, $$\,\hat{f}_{U,1}^{\mathrm{Ft}}(s;\xi)$$ needs to be multiplied by a regularizing factor that puts less weight on large $$|s|$$. As the sample size increases, $$\,\hat{f}_{U,1}^{\mathrm{Ft}}(s;\xi)$$ becomes more reliable, and this should be accounted for by letting the weight depend on the sample size. Using standard kernel smoothing techniques, this can be implemented by taking
$$\hat{f}_{U,2}^{\mathrm{Ft}}(s;\xi)=\hat{f}_{U,1}^{\mathrm{Ft}}(s;\xi)\,\mathcal{K}^{\mathrm{Ft}}(sh),$$
where $$\mathcal{K}^\mathrm{Ft}$$ is the Fourier transform of a kernel function $$\mathcal{K}$$ and $$h>0$$ is a bandwidth parameter that satisfies $$h\rightarrow0$$ as $$n\rightarrow\infty$$. We then define our estimator of $$\,f_U(x)$$ by
$$\hat{f}_U(x;\xi)=\frac{1}{2\pi}\int \exp(-{{\rm i}}sx)\hat{f}_{U,1}^{\mathrm{Ft}}(s;\xi)\mathcal{K}^{\mathrm{Ft}}(sh)\,{\rm d} s=\frac{1}{2\pi}\int \exp(-{{\rm i}}sx)\hat{f}_{U,2}^{\mathrm{Ft}}(s;\xi)\,{\rm d} s\text{.} \tag{5}$$
For consistency of a kernel density estimator, the kernel function $$\mathcal{K}$$ needs to be symmetric and to integrate to unity. In practice, the estimator is often much less sensitive to the choice of kernel than to the choice of bandwidth. Popular choices are the Gaussian kernel, i.e., the standard normal density, and the sinc kernel, whose Fourier transform is $$\mathcal{K}^{\mathrm{Ft}}(s)=I(|s|\leqslant 1)$$, where $$I(\cdot)$$ denotes the indicator function. In practice, an advantage of the Gaussian kernel is that it produces smoother, more visually attractive estimators than the sinc kernel, whereas the sinc kernel is more convenient analytically; see § 2.3.

### 2.3. Properties of the density estimator

Assume that the continuous-time process $$(X_t)_{t\in[0,T]}$$ in (1) belongs to the class $$\mathcal{X}(C_1)$$ for some $$C_1>0$$, where
$$\mathcal{X}(C_1)=\Big\{(X_{t})_{t\in[0,T]}: \,X_t \,\textrm{satisfies (1) with}\sup_{0\leqslant t\leqslant T}E(\mu_t^4)\leqslant C_1\,\textrm{and}\sup_{0\leqslant t\leqslant T}E(\sigma_t^4)\leqslant C_1\Big\},$$
and that $$\,f_U$$ belongs to the class
$$\mathcal{F}_1(\alpha,C_2)=\big\{\,f:\, |\,f^{\mathrm{Ft}}(s)|\leqslant C_2(1+|s|)^{-\alpha}\,\textrm{for all real}\,s\big\}$$
for some constants $$\alpha>0$$ and $$C_2>1$$. This class is rich; for example, it contains the functions that have at least $$\alpha-1$$ square-integrable derivatives. Characterizing error distributions through their Fourier transforms is standard in nonparametric measurement error problems because it is key to deriving precise asymptotic properties of the estimators.

We consider the sinc kernel $$\mathcal{K}$$ introduced in the paragraph following (5). Using this kernel simplifies our theoretical derivations in two respects: its Fourier transform simplifies calculations, and it is a so-called infinite-order kernel, which implies that the bias of the resulting nonparametric curve estimators depends only on the smoothness of the target curve. Thus the sinc kernel has adaptive properties and automatically ensures optimal convergence rates. In contrast, the bias of estimators based on finite-order kernels, such as the Gaussian kernel, depends on both the order of the kernel and the smoothness of the target curve, which means that various smoothness subcases need to be considered when deriving properties. For any two square-integrable functions $$\,f$$ and $$g$$, let $$\|\,f-g\|_2=(\int|\,f-g|^2)^{1/2}$$.
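For the sinc kernel, $$\mathcal{K}^{\mathrm{Ft}}(sh)=I(|s|\leqslant 1/h)$$, and because $$\,\hat{f}_{U,1}^{\mathrm{Ft}}(\cdot\,;\xi)$$ in (4) is real and even in $$s$$, the inversion (5) reduces to $$\pi^{-1}\int_0^{1/h}\cos(sx)\,\hat{f}_{U,1}^{\mathrm{Ft}}(s;\xi)\,{\rm d}s$$. The sketch below evaluates this integral with the trapezoidal rule. The function names, the grid size `n_s`, and the use of "oracle" differences of Gaussian errors in place of the $$(Y_{t_\ell}-Y_{t_j})$$s are assumptions made for illustration only.

```python
import math
import random


def ecf_error(s, diffs):
    """Estimator (4) evaluated at frequency s from a list of pairwise differences."""
    return math.sqrt(abs(sum(math.cos(s * d) for d in diffs) / len(diffs)))


def sinc_deconvolution_density(x_grid, diffs, h, n_s=100):
    """Estimator (5) with the sinc kernel: (1/pi) * int_0^{1/h} cos(s x) hat f(s) ds,
    approximated by the trapezoidal rule on n_s + 1 equally spaced frequencies."""
    ds = (1.0 / h) / n_s
    s_vals = [k * ds for k in range(n_s + 1)]
    phi = [ecf_error(s, diffs) for s in s_vals]                  # hat f_{U,1}^Ft on the grid
    w = [0.5 if k in (0, n_s) else 1.0 for k in range(n_s + 1)]  # trapezoid weights
    return [ds * sum(wk * math.cos(s * x) * p for wk, s, p in zip(w, s_vals, phi)) / math.pi
            for x in x_grid]


def oracle_differences(m, sigma_u=0.5, seed=0):
    """Hypothetical test input: m differences U_1 - U_2 of independent N(0, sigma_u^2)
    errors, i.e. what the (Y_l - Y_j)s look like when the X increments are negligible."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma_u) - rng.gauss(0.0, sigma_u) for _ in range(m)]
```

For example, `sinc_deconvolution_density([0.0], oracle_differences(10000), h=0.2)` approximates the $$N(0,0{\cdot}5^2)$$ density at zero, about $$0{\cdot}80$$; truncating the integral at $$1/h$$ introduces a small downward bias there.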
Proposition 1 gives the convergence rate of $$\,\hat{f}_U$$, defined in (5), to the true density function $$\,f_U$$.

Proposition 1. Let $$(X_t)_{t\in[0,T]}\in\mathcal{X}(C_1)$$ and assume that the errors $$\{U_{t_j}\}_{j=0}^n$$ satisfy Assumptions $$\rm{1}$$ and $$\rm{3}$$ and that $$\,f_U\in \mathcal{F}_1(\alpha,C_2)$$. Let $$\mathcal {P}_1(\alpha,C_1,C_2)$$ denote the collection of models for $$(Y_t)_{t\in[0,T]}$$ such that $$Y_t=X_t+U_t$$. Under Assumption $$\rm{2}$$ and with the sinc kernel $$\mathcal{K}$$, if $$\alpha>3/2$$, then for some uniform constant $$C>0$$ depending only on $$\alpha$$, $$C_1$$ and $$C_2$$,
$$\sup_{\mathcal{P}_1(\alpha,C_1,C_2)}E\big(\|\hat{f}_U-f_U\|_2^2\big) \leqslant C\bigl(n^{-1}\xi^{-1/2}h^{-1}+n^{-1/2}+\xi+h^{2\alpha-1}\bigr)\text{.}$$
Proposition 1 shows that the $$L_2$$ convergence rate of $$\,\hat{f}_U$$ to $$\,f_U$$ is affected by $$\xi$$, which determines the width of each neighbourhood $$S_j$$, and by the bandwidth $$h$$. The next theorem shows that, for appropriate choices of $$\xi$$ and $$h$$, the mean integrated squared error attains the rate $$n^{-1/2}$$.

Theorem 1. If the conditions of Proposition $$\rm{1}$$ hold and we take $$\xi\asymp n^{-\delta_1}$$ and $$h\asymp n^{-\delta_2}$$, where $$\delta_1>0$$ and $$\delta_2>0$$ are such that $$\delta_1+2\delta_2\leqslant1$$, $$\delta_1\geqslant{1}/{2}$$ and $$\delta_2\geqslant (4\alpha-2)^{-1}$$, then for some uniform constant $$C>0$$ depending only on $$\alpha$$, $$C_1$$ and $$C_2$$,
$$\sup_{\mathcal{P}_1(\alpha,C_1,C_2)}E\big(\|\,\hat{f}_U-f_U\|_2^2\big)\leqslant Cn^{-1/2}\text{.}$$
From Theorem 1 we learn that, as long as $$\alpha>3/2$$, the convergence rate of $$\,\hat{f}_U$$ does not depend on $$\alpha$$. This is strikingly different from standard nonparametric density estimation problems, where convergence rates typically depend on the smoothness of the target density: the smoother the density, the faster the rate.
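The exponent conditions in Theorem 1 come from requiring each term of the bound in Proposition 1 to be $$O(n^{-1/2})$$. With $$\xi\asymp n^{-\delta_1}$$ and $$h\asymp n^{-\delta_2}$$, the four terms are of orders $$n^{-1+\delta_1/2+\delta_2}$$, $$n^{-1/2}$$, $$n^{-\delta_1}$$ and $$n^{-\delta_2(2\alpha-1)}$$. Because $$\delta_1\geqslant1/2$$ and $$\delta_1+2\delta_2\leqslant1$$ force $$\delta_2\leqslant1/4$$, the condition $$\delta_2\geqslant(4\alpha-2)^{-1}$$ can be met only when $$\alpha\geqslant3/2$$, which is consistent with the requirement $$\alpha>3/2$$. The short check below uses exact rational arithmetic to illustrate this bookkeeping. It is a sketch, and the function names are hypothetical.

```python
from fractions import Fraction


def bound_exponents(alpha, delta1, delta2):
    """Exponents e (term of order n^e) of the four terms in Proposition 1
    when xi ~ n^{-delta1} and h ~ n^{-delta2}."""
    return [
        -1 + delta1 / 2 + delta2,    # n^{-1} xi^{-1/2} h^{-1}
        Fraction(-1, 2),             # n^{-1/2}
        -delta1,                     # xi
        -delta2 * (2 * alpha - 1),   # h^{2 alpha - 1}
    ]


def theorem1_conditions(alpha, delta1, delta2):
    """The conditions on (delta1, delta2) stated in Theorem 1 (alpha > 1/2 assumed)."""
    return (delta1 + 2 * delta2 <= 1
            and delta1 >= Fraction(1, 2)
            and delta2 >= 1 / (4 * alpha - 2))


def conditions_feasible(alpha):
    """For alpha > 1/2, some (delta1, delta2) satisfies Theorem 1 iff (4 alpha - 2)^{-1} <= 1/4,
    because delta1 >= 1/2 and delta1 + 2 delta2 <= 1 force delta2 <= 1/4."""
    return alpha > Fraction(1, 2) and 1 / (4 * alpha - 2) <= Fraction(1, 4)
```

For instance, with $$\alpha=2$$, $$\delta_1=1/2$$ and $$\delta_2=1/4$$, the largest exponent is exactly $$-1/2$$.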
If we had access to the ($$U_{t_\ell}-U_{t_j}$$)s directly, instead of only to $$Y_{t_\ell}-Y_{t_j}=X_{t_\ell}-X_{t_j}+U_{t_\ell}-U_{t_j}$$, we could apply the technique suggested by Meister (2007), and the convergence rate would increase with $$\alpha$$. In our case, however, the nuisance contribution of the ($$X_{t_\ell}-X_{t_j}$$)s makes it impossible to reach rates faster than $$n^{-1/2}$$, even if $$\alpha$$ is very large. This is demonstrated in the next theorem, which shows that the $$n^{-1/2}$$ rate derived in Theorem 1 is minimax optimal.

Theorem 2. Denote by $$\breve{\mathcal{F}}$$ the class of all measurable functionals of the data. Under the conditions in Proposition $$\rm{1}$$, for some uniform constant $$C>0$$ depending only on $$\alpha$$, $$C_1$$ and $$C_2$$,
$$\inf_{\hat{f}\in\breve{\mathcal{F}}}\: \sup_{\mathcal{P}_1(\alpha,C_1,C_2)}E\big(\|\,\hat{f}-f_U\|_2^2\big)\geqslant C n^{-1/2}\text{.}$$

### 2.4. Estimating the moments of the microstructure noise

We could deduce estimators of the moments of the microstructure noise $$U_{t_j}$$ from the density estimator derived in § 2.2, but proceeding in that way is unnecessarily complex. Let $$U$$ and $$\tilde U$$ denote generic random variables with the same distributions as, respectively, $$U_{t_j}$$ and $$U_{t_\ell}-U_{t_j}$$ for $$t_\ell\neq t_j$$, so that $$\tilde U\sim f_{\tilde U}=f_U*f_U$$. Recall from § 2.2 that when $$t_\ell\neq t_j$$ are close, $$Y_{t_\ell}-Y_{t_j}$$ behaves approximately like $$U_{t_\ell}-U_{t_j}$$. This suggests that we could estimate the moments of $$\tilde U$$ by the empirical moments of the $$Y_{t_\ell}-Y_{t_j}$$s and, from these, deduce estimators of the moments of $$U$$. For each integer $$k\geqslant 1$$, let $$M_{U,k}$$ and $$M_{\tilde{U},k}$$ denote the $$k$$th moments of $$U$$ and $$\tilde U$$, respectively. Since $$\,f_U^{\mathrm{Ft}}$$ is real-valued by Assumption 3, $$\,f_U$$ is symmetric about zero, and so is $$\,f_{\tilde U}$$. Hence $$M_{U,2k-1}$$ and $$M_{\tilde{U},2k-1}$$ are equal to zero for all $$k\geqslant1$$, and we only need to estimate even-order moments.
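The two-step idea can be illustrated directly. Because $$\tilde U$$ is the difference of two independent copies of a symmetric $$U$$, expanding $$E(\tilde U^{2k})$$ binomially leaves only products of even moments of $$U$$. That identity can be inverted one order at a time, which is the recursion formalized as (6) below. The sketch first computes the empirical even moments of nearby differences and then applies the inversion. The function names are hypothetical, and this is an illustration rather than the authors' code.

```python
from bisect import bisect_left, bisect_right
from math import comb


def difference_moments(times, y, xi, kmax):
    """hat M_{tilde U,2k}(xi) for k = 1..kmax: average of (Y_l - Y_j)^{2k} over all
    ordered pairs with |t_l - t_j| <= xi and l != j (times must be sorted)."""
    sums = [0.0] * kmax
    count = 0
    for j, tj in enumerate(times):
        lo, hi = bisect_left(times, tj - xi), bisect_right(times, tj + xi)
        for l in range(lo, hi):
            if l == j:
                continue
            sq = (y[l] - y[j]) ** 2
            power = 1.0
            for k in range(kmax):
                power *= sq               # power = (Y_l - Y_j)^{2(k+1)}
                sums[k] += power
            count += 1
    return [total / count for total in sums]


def error_moments(mtilde):
    """Recursion (6): mtilde[k-1] estimates M_{tilde U,2k}; returns the estimates of
    M_{U,2}, M_{U,4}, ... in the same order."""
    m = []                                # m[j-1] holds the estimate of M_{U,2j}
    for k, mt in enumerate(mtilde, start=1):
        cross = sum(comb(2 * k, 2 * j) * m[j - 1] * m[k - j - 1] for j in range(1, k))
        m.append((mt - cross) / 2)
    return m
```

As a sanity check on the recursion: if $$U\sim N(0,1)$$, then $$\tilde U\sim N(0,2)$$ has even moments 2, 12 and 120, and `error_moments([2.0, 12.0, 120.0])` returns `[1.0, 3.0, 15.0]`, the even moments of $$N(0,1)$$.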
For each $$k\geqslant1$$, we start by estimating $$M_{\tilde{U},2k}$$ by
$$\hat{M}_{\tilde{U},2k}(\xi)= \frac{1}{N(\xi)}\sum_{j=0}^n\sum_{t_\ell\in S_j}(Y_{t_\ell}-Y_{t_j})^{2k}\text{.}$$
This is directly connected to our frequency domain analysis: it is easily proved that $$\hat{M}_{\tilde{U},2k}(\xi)=(-{\rm i})^{2k}\{\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(0;\xi)\}^{(2k)}$$, where the superscript $$(2k)$$ denotes the $$2k$$th derivative with respect to $$s$$ and $$\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}=(\,\hat{f}_{U,1}^{\mathrm{Ft}})^2$$ is an estimator of $$\,f_{\tilde U}^\mathrm{Ft}$$, with $$\,\hat{f}_{U,1}^{\mathrm{Ft}}$$ as in (4). Exploiting the fact that $$\tilde U$$ is distributed as the difference of two independent copies of $$U$$, and that the odd moments vanish, we can write $$M_{\tilde{U},2k}=\sum_{j=0}^kC_{2k}^{2j}M_{U,2j}M_{U,2k-2j}$$, where $$C_{2k}^{2j}=(2k)!/\{(2j)!(2k-2j)!\}$$ and $$M_{U,0}=1$$. Since the terms $$j=0$$ and $$j=k$$ both equal $$M_{U,2k}$$, it follows that
$$M_{U,2k}=\frac{1}{2}\Biggl(M_{\tilde{U},2k}-\sum_{j=1}^{k-1}C_{2k}^{2j}M_{U,2j}M_{U,2k-2j}\Biggr)\text{.}$$
Therefore, we can use an iterative procedure to estimate the $$M_{U,2k}$$s. First, for $$k=1$$, we take $$\hat{M}_{U,2}(\xi)=\hat{M}_{\tilde{U},2}(\xi)/2$$. Then, for $$k>1$$, given $$\hat{M}_{U,2}(\xi),\ldots, \hat{M}_{U,2(k-1)}(\xi)$$, we take
$$\hat{M}_{U,2k}(\xi)=\frac{1}{2}\Biggl\{\hat{M}_{\tilde{U},2k}(\xi)-\sum_{j=1}^{k-1}C_{2k}^{2j}\hat{M}_{U,2j}(\xi)\hat{M}_{U,2k-2j}(\xi)\Biggr\}\text{.} \tag{6}$$

Remark 2. When $$k=1$$, $$M_{U,2}=M_{\tilde{U},2}/2$$ is equal to the variance of $$U_t$$, and our estimator is very similar to the so-called difference-based variance estimator often employed in related nonparametric regression problems; see, for example, Buckley et al. (1988) and Hall et al. (1990).

The next theorem establishes the convergence rate of $$\hat{M}_{U,2k}(\xi)$$. Its proof follows from the convergence rates of the $$\hat{M}_{\tilde{U},2l}(\xi)$$s.

Theorem 3.
Under Assumptions $$\rm{1}$$–$$\rm{3}$$, for any integer $$k\geqslant1$$, if $$E(\int_0^T\mu_s^{2k}\,{\rm d} s)<\infty$$ and $$E(\int_0^T\sigma_s^{2k}\,{\rm d} s)<\infty$$ and if there exists $$p\in(2,3]$$ such that $$M_{U,2kp}<\infty$$, then $$\hat{M}_{U,2k}(\xi)=M_{U,2k}+O_{\rm p}(n^{-1/2})$$, provided that $$\xi=o(n^{-p/\{2(p-1)\}})$$.

Next, we derive the asymptotic joint distribution of the proposed moment estimators. Let $$W=(W_1,\ldots,W_k)^{ \mathrm{\scriptscriptstyle T} }$$ be a random vector with a $$N(0,\Sigma_k)$$ distribution, where the $$(l_1,l_2)$$th element ($$l_1, l_2=1,\ldots,k$$) of $$\Sigma_k$$ is equal to
$$e_{l_1l_2}=\lim_{n\rightarrow\infty}E\!\left(\!\left[\,\sum_{j=0}^n\sum_{t_\ell\in S_j}\frac{(U_{t_\ell}-U_{t_j})^{2l_1}-M_{\tilde{U},2l_1}}{\{N(\xi)V_{2l_1}(\xi)\}^{1/2}}\right]\!\!\left[\sum_{j=0}^n\sum_{t_\ell\in S_j}\frac{(U_{t_\ell}-U_{t_j})^{2l_2}-M_{\tilde{U},2l_2}}{\{N(\xi)V_{2l_2}(\xi)\}^{1/2}}\right]\right)\!,$$
with $$V_{2l}(\xi)=\mbox{var}[ \{N(\xi)\}^{-1/2}\sum_{j=0}^n\sum_{t_\ell\in S_j}\{(U_{t_{\ell}}-U_{t_j})^{2l}-M_{\tilde{U},2l}\}]$$. Recalling that $$N(\xi)\asymp n^2\xi$$, which follows from $$N_j\asymp n\xi$$, and noting that $$V_{2l}(\xi)\asymp n\xi$$, let
$$a_l=\lim_{n\to\infty} \{nV_{2l}(\xi)/N(\xi)\}^{1/2}/2\quad (l=1,\ldots,k)\text{.}$$
The next theorem establishes the asymptotic joint distribution of our moment estimators. It can be used to derive confidence regions for $$(M_{U,2},\ldots,M_{U,2k})^{ \mathrm{\scriptscriptstyle T} }$$.

Theorem 4.
Under Assumptions $$\rm{1}$$–$$\rm{3}$$, for any integer $$k\geqslant1$$, if $$E(\int_0^T\mu_s^{2k}\,{\rm d} s)<\infty$$ and $$E(\int_0^T\sigma_s^{2k}\,{\rm d} s)<\infty$$ and if there exists $$p\in(2,3]$$ such that $$M_{U,2kp}<\infty$$, then
$$n^{1/2}\big\{\hat{M}_{U,2}(\xi)-M_{U,2},\ldots,\hat{M}_{U,2k}(\xi)-M_{U,2k}\big\}^{ \mathrm{\scriptscriptstyle T} } \rightarrow Q=(Q_{1},\ldots, Q_{k})^{ \mathrm{\scriptscriptstyle T} }$$
in distribution as $$n\rightarrow\infty$$, provided that $$\xi=o(n^{-p/\{2(p-1)\}})$$, where
$$Q_1 = a_1W_1,\qquad Q_l = a_lW_l-\sum_{j=1}^{l-1}C_{2l}^{2j}M_{U,2l-2j}Q_j\quad (2\leqslant l\leqslant k)\text{.}$$

### 2.5. Efficient integrated volatility estimation

We have demonstrated that our frequency domain analysis can be used to estimate the error density, which is difficult to estimate in the time domain. Since frequency domain approaches are unusual in high-frequency financial data analysis, a natural question is whether they can also lead to an efficient estimator of the integrated volatility $$\int_0^T\sigma_t^2\,{\rm d} t$$, with $$\sigma_t$$ as in (1). The integrated volatility is a key quantity of interest in high-frequency financial data analysis; it measures the variability of a process over time. It is well known that when, as here, the data are observed with microstructure noise, the integrated volatility cannot be estimated using standard procedures, which are dominated by contributions from the noise. In such situations, one way of removing the bias caused by the noise is through multi-scale techniques; see Olhede et al. (2009) for a clear description. Zhang (2006) and Tao et al. (2013) have successfully applied those methods to correct the bias of estimators in the time domain, and Olhede et al. (2009) have proposed a consistent discrete frequency domain estimator for the case where the data are observed at equispaced times.
Below we show that these techniques can also be applied in our continuous frequency domain context, even if the observation times are not equispaced. The real part of the unnormalized empirical characteristic function $$\sum_{j=1}^n\exp\{{\rm i}s(Y_{t_j}-Y_{t_{j-1}})\}$$ satisfies
$$\sum_{j=1}^n\cos\{s(Y_{t_j}-Y_{t_{j-1}})\} = \sum_{j=1}^n\cos\{s(U_{t_j}-U_{t_{j-1}})\}-\frac{s^2}{2}f_{\tilde{U}}^\mathrm{Ft}(s)\int_0^T\!\!\sigma_t^2\,{\rm d} t+O_{\rm p}(n^{-1/2})\text{.} \tag{7}$$
The second term on the right-hand side of (7) contains the integrated volatility, but the first term dominates because its mean is $$nf_{\tilde U}^\mathrm{Ft}(s)$$. This suggests that the integrated volatility could be estimated from $$\sum_{j=1}^n\exp\{{\rm i} s(Y_{t_j}-Y_{t_{j-1}})\}$$ if we could eliminate that first term. This can be done by transferring to the frequency domain the multi-scale technique used by Zhang (2006) and Tao et al. (2013). We define a function $$G(s)$$ that combines the empirical characteristic functions calculated at different sampling frequencies in such a way as to eliminate the first term on the right-hand side of (7) while keeping the second. For $$N=\lfloor (n+1)^{1/2} \rfloor$$, we define
$$G(s)=\sum_{m=1}^N a_m \hat\phi^{K_m}(s)+\zeta \bigl\{\hat\phi^{K_1}(s)-\hat\phi^{K_2}(s)\bigr\},$$
where, as in Zhang (2006), $$K_m=m$$, $$a_m=12K_m(m-N/2-1/2)/\{N(N^2-1)\}$$, $$\zeta=K_1K_2/\{(n+1)(K_2-K_1)\}$$, and $$\hat\phi^{K_m}(s)=K_m^{-1}\sum_{\ell=K_m}^n \exp\{{\rm i} s(Y_{t_\ell}-Y_{t_{\ell-K_m}})\}$$. These weights satisfy $$\sum_{m=1}^N a_m=1$$ and $$\sum_{m=1}^N a_m/K_m=0$$; together with the correction term involving $$\zeta$$, this is what cancels the expected contribution of the noise term. We could also select $$K_m$$, $$a_m$$ and $$\zeta$$ as in Tao et al. (2013). The following proposition shows that the real part of $$G(s)$$, $$\mathrm{Re}\{G(s)\}$$, can be used to approximate the second term of (7).

Proposition 2.
Under Assumptions $$\rm{1}$$–$$\rm{3}$$, if $$E(U_t^2)<\infty$$, then there exist twice-differentiable functions $$\tau_1(s)$$ and $$\tau_2(s)$$ such that, for any $$s\in \mathbb{R}$$,
$$\left|\mathrm{Re}\{G(s)\}+\frac{s^2}{2}f_{\tilde{U}}^{\mathrm{Ft}}(s)\int_0^T\sigma_t^2\,{\rm d} t\right|=\tau_1(s)\, O_{\rm p}(n^{-1/4})+\tau_2(s)\, O_{\rm p}(n^{-1/2}),$$
where the terms $$O_{\rm p}(n^{-1/4})$$ and $$O_{\rm p}(n^{-1/2})$$ do not depend on $$s$$, and $$\lim_{s\to 0}|\tau_1''(s)|\leqslant C$$ and $$\lim_{s\to 0}|\tau_2''(s)|\leqslant C$$ for some positive constant $$C$$.

Since the function $$G(s)$$ depends only on the data, it is completely known. Moreover, we have seen in § 2.4 that we can estimate $$\,f_{\tilde{U}}^{\mathrm{Ft}}(s)$$ by $$\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(s;\xi)=\{\,\hat{f}_{U,1}^{\mathrm{Ft}}(s;\xi)\}^2$$. Finally, although the proposition holds for all $$s\in \mathbb{R}$$, the remainders are smaller when $$\,f_{\tilde{U}}^{\mathrm{Ft}}(s)$$ is close to 1, and $$\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(s;\xi)$$ is also more reliable in that case. Therefore, at values of $$s$$ for which $$\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(s;\xi)$$ is close to 1, Proposition 2 can be used to compute an estimator of the integrated volatility $$\int_0^T\sigma_t^2\,{\rm d} t$$. We propose the following regression-type approach. For some $$s_1,\ldots,s_m$$ such that the $$\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(s_j;\xi)$$ are close to 1, we consider the fixed design regression problem
$$\mathrm{Re}\{G(s_j)\}=-\frac{s_j^2}{2}\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(s_j;\xi)\, \beta+\epsilon_j\quad (\,j=1,\ldots,m),$$
where $$\epsilon_j$$ represents the regression error and $$\beta=\int_0^T\sigma_t^2\,{\rm d} t$$. Applying a linear regression of $$\mathrm{Re}\{G(s_j)\}$$ on $$-s_j^2\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(s_j;\xi)/2$$, we estimate $$\int_0^T\sigma_t^2\,{\rm d} t$$ by $$\hat \beta$$, the least squares estimator of $$\beta$$.
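The multi-scale combination $$G(s)$$ and the least squares step can be sketched as follows, with the weights exactly as stated above. The function names are hypothetical, as is the argument `f_tilde_ft`, a callable standing in for $$\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(\cdot\,;\xi)$$ (for example, the square of (4)). Because the displayed regression model has no intercept, the sketch fits the slope through the origin. The direct $$O(nN)$$ evaluation of each $$G(s)$$ is chosen for clarity rather than speed.

```python
import cmath
import math


def multiscale_weights(n):
    """K_m = m, a_m = 12 K_m (m - N/2 - 1/2) / {N (N^2 - 1)} and
    zeta = K_1 K_2 / {(n + 1)(K_2 - K_1)}, with N = floor((n + 1)^{1/2}) >= 2."""
    N = math.isqrt(n + 1)
    K = list(range(1, N + 1))
    a = [12 * k * (k - N / 2 - 0.5) / (N * (N * N - 1)) for k in K]
    zeta = K[0] * K[1] / ((n + 1) * (K[1] - K[0]))
    return K, a, zeta


def phi_hat(s, y, k):
    """hat phi^{K}(s) = K^{-1} sum_{l=K}^{n} exp{i s (Y_{t_l} - Y_{t_{l-K}})}, y = [Y_{t_0}, ..., Y_{t_n}]."""
    return sum(cmath.exp(1j * s * (y[l] - y[l - k])) for l in range(k, len(y))) / k


def G(s, y):
    """Multi-scale combination G(s) of the lag-K empirical characteristic functions."""
    K, a, zeta = multiscale_weights(len(y) - 1)
    total = sum(am * phi_hat(s, y, km) for am, km in zip(a, K))
    return total + zeta * (phi_hat(s, y, K[0]) - phi_hat(s, y, K[1]))


def integrated_volatility(y, s_values, f_tilde_ft):
    """Least squares slope (through the origin) of Re G(s_j) on -s_j^2 f_tilde_ft(s_j) / 2."""
    xs = [-0.5 * s * s * f_tilde_ft(s) for s in s_values]
    ys = [G(s, y).real for s in s_values]
    return sum(x * v for x, v in zip(xs, ys)) / sum(x * x for x in xs)
```

In line with the discussion above, `s_values` would be taken where $$\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}$$ is close to 1, that is, at small $$|s|$$.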
For any fixed $$s\in \mathbb{R}$$, it can be shown that $$|\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(s;\xi)-f_{\tilde{U}}^{\mathrm{Ft}}(s)|=O_{\rm p}(n^{-1/2}+\xi)$$. If we select $$\xi=O(n^{-1/4})$$ in (3), then Proposition 2 still holds if we replace $$\,f_{\tilde{U}}^{\mathrm{Ft}}(s)$$ by $$\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(s;\xi)$$. The next result establishes the convergence rate of $$\hat\beta$$.

Theorem 5. Under Assumptions $$\rm{1}$$–$$\rm{3}$$, if $$E(U_t^2)<\infty$$ and we select $$\xi=O(n^{-1/4})$$, then
$$\hat{\beta}-\int_0^T\sigma_t^2\,{\rm d} t=O_{\rm p}(n^{-1/4})\text{.}$$
The convergence rate stated in Theorem 5 is optimal in the sense of Gloter & Jacod (2001) and is the same as that of the time domain estimator of Tao et al. (2013). Hence, our frequency domain method is rate-efficient for estimating the integrated volatility.

## 3. Numerical study

### 3.1. Practical implementation of the density estimator

To compute the density estimator $$\,\hat f_U$$ in (5), we need to choose the bandwidth $$h$$ and the parameter $$\xi$$. In our problem, this is much more complex than for standard nonparametric density estimators since, unlike in that case, we do not have direct access to data from our target $$\,f_U$$; therefore we cannot use existing smoothing parameter selection methods, which all require noise-free data. Moreover, unlike in standard nonparametric problems, we do not have access to a formula for the distance between $$\,f_U$$ and its estimator. Similar difficulties arise in the classical errors-in-variables problem, where one is interested in a density $$\,f_V$$ but observes only data on $$W=V+\varepsilon$$, where $$V\sim f_V$$ is independent of $$\varepsilon\sim f_\varepsilon$$ with $$\,f_\varepsilon$$ known. Delaigle & Hall (2008) proposed choosing the bandwidth $$h$$ using a simulation extrapolation (SIMEX) method.
Instead of computing $$h$$ for $$\,f_V$$, they approximate $$h$$ by extrapolating the bandwidths for estimating two other densities, $$\,f_{1}$$ and $$\,f_{2}$$, that are related to $$\,f_V$$. The rationale of the extrapolation scheme is to exploit the analogous relationships between $$\,f_1$$, $$\,f_2$$ and $$\,f_V$$. Our problem is different because our estimator of $$\,f_U$$ is completely different from that in Delaigle & Hall (2008), and we do not know $$\,f_X$$. Therefore we cannot apply their method directly; nevertheless, here we propose a method in the same spirit. In particular, we consider two density functions $$\,f_{1}(\cdot)=2f_U(\surd2\, \cdot)*f_U(\surd2\, \cdot)$$ and $$\,f_{2}=2f_{1}(\surd2\, \cdot)*f_{1}(\surd2\, \cdot)$$, and observe that the way in which $$\,f_2$$ and $$\,f_1$$ are connected mimics the way in which $$\,f_1$$ and $$\,f_U$$ are connected. As shown in the Appendix, data from both $$\,f_1$$ and $$\,f_2$$ can be made accessible, so that one may perform bandwidth selection for estimating them. We then choose the bandwidth for estimating $$\,f_U$$ by an extrapolation with a ratio adjustment from the bandwidths for estimating $$\,f_1$$ and $$\,f_2$$. The algorithms for the bandwidth selection are given in the Appendix, and Algorithm A1 summarizes the main steps. Specifically, for $$k=1,2$$, our procedure requires the construction of variables $$\Delta Y_{j,\ell}^{_k}$$ and time-points $$t_j^{_k}$$, which are defined in, respectively, Algorithms A2 and A3. Step (c) of Algorithm A1 requires choosing values of $$(h,\xi)$$, say $$(h_{k},\xi_{k})$$ for $$k=1,2$$, for estimating $$\,f_{k}$$ by $$\,\hat f_{k}$$, where $$\,\hat{f}_{k}$$ denotes our density estimator in (5) applied to the $$\Delta Y_{j,\ell}^{_k}$$s. 
The idea is that if we knew $$\,f_{1}$$, we would choose $$(h_{1},\xi_{1})$$ by minimizing the integrated squared error of $$\,\hat{f}_{1}$$, i.e., $$(h_{1},\xi_{1})=\mathop{\arg\min}_{(h,\xi)} \int \{\,\hat f_{1}(x;\xi) - f_{1}(x)\}^2\,{\rm d} x$$. In practice we do not know $$\,f_{1}$$, but we can construct a relatively accurate estimator of it, namely the standard kernel density estimator $$\,\tilde f_{1}$$ of $$\,f_{1}$$ applied to the data $$\Delta_{Y,\,j}=(Y_{t_{j+1}}-Y_{t_{j}})/\surd 2\approx (U_{t_{j+1}}-U_{t_{j}})/\surd 2$$$$(\,j=0,\ldots,n-1)$$. This is computed by using the Gaussian kernel with bandwidth selected by the method of Sheather & Jones (1991). Using arguments similar to those in Delaigle (2008), under mild conditions, $$\,\tilde{f}_{1}(x)=f_{1}(x)+O_{\rm p}(n^{-2/5})$$, whereas with the best possible choice of $$(h,\xi)$$, $$\:\hat f_{1}(x)=f_{1}(x)+O_{\rm p}(n^{-1/4})$$, where the rate $$n^{-1/4}$$ cannot be improved. Thus, $$\,\tilde f_{1}$$ converges to $$\,f_{1}$$ faster than $$\,\hat f_{1}$$ does. This motivates us to approximate $$(h_{1},\xi_{1})$$ defined above by $$(h_{1},\xi_{1})=\mathop{\arg\min}_{(h,\xi)} \int \{\,\hat f_{1}(x;\xi) -\tilde f_{1}(x)\}^2\,{\rm d} x\text{.}$$ (8) Paralleling the arguments in Delaigle & Hall (2008), it is more important to extrapolate the bandwidth $$h$$ than $$\xi$$. Motivated by their results, we take $$\xi_{2}=\xi_{1}$$. To choose $$h_{2}$$, let $$\,\tilde f_{2}$$ be the standard kernel density estimator with the Gaussian kernel and bandwidth selected by the method of Sheather & Jones (1991), applied to the data $$\Delta_{Y,\,j,2}= \big\{\Delta_{Y,\,j}-\Delta_{Y,k(\,j)}\big\}/\surd 2$$, where $$k(\,j)$$ is chosen at random from $$0,\ldots,n-1$$. We choose $$\Delta_{Y,\,j,2}$$ in this way rather than $$\Delta_{Y,\,j,2}=(\Delta_{Y,\,j}-\Delta_{Y,\,j+2})/\surd 2$$ to prevent accumulated residual $$\,X_t$$ effects. 
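A minimal sketch of the pilot step and of criterion (8), in Python with NumPy. Because the deconvolution estimator (5) is not reproduced in this section, `select_by_ise` takes it as a user-supplied callable `estimator(x, h, xi)`; this interface and the other function names are hypothetical. The pilot estimator uses a Gaussian kernel, but Silverman's rule of thumb stands in for the Sheather & Jones (1991) bandwidth used in the text, and the integral is approximated by a Riemann sum on a uniform grid.

```python
import numpy as np

def pilot_increments(y, rng):
    """Delta_{Y,j} = (Y_{t_{j+1}} - Y_{t_j}) / sqrt(2) for j = 0..n-1, and
    Delta_{Y,j,2} = (Delta_{Y,j} - Delta_{Y,k(j)}) / sqrt(2) with k(j) drawn at random."""
    d1 = np.diff(np.asarray(y, dtype=float)) / np.sqrt(2.0)
    k = rng.integers(0, len(d1), size=len(d1))     # k(j) uniform on 0..n-1
    d2 = (d1 - d1[k]) / np.sqrt(2.0)
    return d1, d2

def gaussian_kde_pilot(data, x, bw=None):
    """Standard Gaussian-kernel density estimate on the grid x. Silverman's rule
    of thumb is a stand-in here for the Sheather & Jones bandwidth."""
    data = np.asarray(data, dtype=float)
    if bw is None:
        q75, q25 = np.percentile(data, [75, 25])
        spread = min(np.std(data, ddof=1), (q75 - q25) / 1.34)
        bw = 0.9 * spread * len(data) ** (-0.2)
    z = (x[:, None] - data[None, :]) / bw
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(data) * bw * np.sqrt(2.0 * np.pi))

def select_by_ise(estimator, pilot, x, candidates):
    """Criterion (8): return the (h, xi) in `candidates` minimising the integrated
    squared distance between estimator(x, h, xi) and the pilot density on x."""
    dx = x[1] - x[0]
    scores = [np.sum((estimator(x, h, xi) - pilot) ** 2) * dx for h, xi in candidates]
    return candidates[int(np.argmin(scores))]
```

One would call `select_by_ise` with the pilot `gaussian_kde_pilot(d1, x)` to approximate (8), and with the pilot built from `d2` and candidates of the form `(h, xi_1)` to approximate (9).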
Then we take $$h_{2}=\mathop{\arg\min}_{h} \int \{\,\hat f_{2}(x;\xi_{1}) - \tilde f_{2}(x)\}^2\,{\rm d} x\text{.}$$ (9) Since $$\,\tilde f_1$$ and $$\,\tilde f_2$$ converge faster than $$\,\hat{f}_1$$ and $$\,\hat{f}_2$$, they can be computed using less data than the latter. Therefore, when the time-points are far from equally spaced, we suggest computing the $$\,\tilde f_k$$s from only a fraction, say one-quarter, of the $$\Delta_{Y,\,j}$$, namely those corresponding to the smallest $$t_{j+1}-t_j$$ values; that is, we use fewer but more accurate data for computing the $$\,\tilde f_k$$s. Finally, as described in step (d) of Algorithm A1, we obtain our bandwidth for estimating $$\,f_U$$ by an extrapolation with a ratio adjustment, $$\hat h = h_{1}^2/ h_{2}$$, and we take $$\hat\xi=\xi_1$$. This method is not guaranteed to give the best possible bandwidth for estimating $$\,f_U$$, but it provides a sensible approximation for a problem that otherwise seems very hard, if not impossible, to solve. Theorem 1 implies that we have a lot of flexibility in choosing $$h$$, but it is impossible to know whether our bandwidth lies in the optimal range without knowing the exact orders of $$h_1$$ and $$h_2$$, and these orders cannot be determined without deriving complex second-order asymptotic results. 3.2. Simulations We applied our method to data simulated from stochastic volatility models. We generated the data $$Y_{t_0},\ldots,Y_{t_n}$$ as in (2). Following the convention that a financial year has 252 active days, we took $$t\in[0,T]$$ with $$T=1/252$$, which represents one day of financial activity. We took time-points every $$\Delta s$$ seconds, where $$\Delta s=30$$, 5 or 1. Using the convention of $$6{\cdot}5$$ business hours in a trading day, this means that we took the $$t_j$$s to be equally spaced by $$\Delta s/(252\times60\times60\times6{\cdot}5)$$ and that $$n$$ was equal to $$60\times60\times6{\cdot}5/\Delta s$$.
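The arithmetic of the simulation grid, the smallest-gap selection of increments for the pilot estimators, and the ratio extrapolation $$\hat h=h_1^2/h_2$$ can be sketched in Python with NumPy as follows. The function and constant names are illustrative only. With $$\Delta s=30$$, 5 and 1 the grid has $$n=780$$, 4680 and 23400 increments, and in each case $$t_n=1/252$$.

```python
import numpy as np

SECONDS_PER_DAY = 60 * 60 * 6.5   # 6.5 business hours in a trading day
DAYS_PER_YEAR = 252               # active days in a financial year

def trading_day_grid(delta_s):
    """Equally spaced times t_0, ..., t_n on [0, T], T = 1/252, one every delta_s seconds."""
    n = int(round(SECONDS_PER_DAY / delta_s))
    step = delta_s / (DAYS_PER_YEAR * SECONDS_PER_DAY)
    return np.arange(n + 1) * step

def smallest_gap_increments(y, t, fraction=0.25):
    """Keep the Delta_{Y,j} = (Y_{t_{j+1}} - Y_{t_j}) / sqrt(2) whose gaps
    t_{j+1} - t_j are among the smallest `fraction` of all gaps."""
    gaps = np.diff(t)
    d = np.diff(y) / np.sqrt(2.0)
    keep = np.argsort(gaps, kind="stable")[: max(1, int(fraction * len(gaps)))]
    return d[np.sort(keep)]       # preserve the original time order

def extrapolate_bandwidth(h1, h2):
    """Ratio adjustment h / h1 ~ h1 / h2, i.e. h_hat = h1**2 / h2."""
    return h1 ** 2 / h2
```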
We generated the microstructure noise $$U_t$$ according to a normal or scaled $$t$$ distribution, and for the $$\,X_t$$ we used the Heston model ${\rm d} X_t=\sigma_t\,{\rm d} B_t,\quad {\rm d}\sigma_t^2 = \kappa(\tau-\sigma_t^2)\,{\rm d} t+\gamma \sigma_t\,{\rm d} W_t\,,$ where $$E({\rm d} B_t\, {\rm d} W_t)=\rho\, {\rm d} t$$ and $$\kappa$$, $$\tau$$, $$\gamma$$ and $$\rho$$ are parameters. As in Aït-Sahalia & Yu (2009), we set the drift part of $$\,X_t$$ to zero. The effect of the drift function is asymptotically negligible; see, for example, Xiu (2010). We considered two models, with values similar to those used by Aït-Sahalia & Yu (2009), which reflect practical scenarios in finance (see also Xiu, 2010; Liu & Tang, 2014): (i) $$(\kappa, \tau, \gamma, \rho)=(6,0{\cdot}16,0{\cdot}5,-0{\cdot}6)$$; (ii) $$(\kappa, \tau, \gamma, \rho)=(4, 0{\cdot}09, 0{\cdot}3, -0{\cdot}75)$$. In each case we took $$\,X_0=\log(100)$$ and considered $$U_t\sim N(0,\sigma_U^2)$$ and $$U_t\sim \sigma_U\, t(8)$$, where $$\sigma_U=0{\cdot}001$$ and $$\sigma_U=0{\cdot}005$$. Typical $$\,X_t$$ and $$Y_t$$ paths for each model, plotted in the Supplementary Material, show that the $$\,X_t$$ paths have smaller variation in model (ii) than in model (i). The $$\,X_t$$ paths with smaller variation have less nuisance impact on estimators of $$U_t$$-related quantities. Thus, estimating the moments and density of $$U_t$$ should be easier in model (ii). In each setting, we generated 1000 samples of the form $$Y_{t_0},\ldots,Y_{t_n}$$ and applied our estimator of the density $$\,f_U$$ to each sample, thus obtaining 1000 density estimators $$\,\hat f_U$$ computed as in (5). We chose the smoothing parameters as in § 3.1, and used the sinc kernel defined below (5). However, while this kernel guarantees optimal theoretical properties, in practice it produces negative wiggles in the tails, which we truncate to zero since $$\,f_U$$ is a density. 
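A sketch of how one noisy path could be generated under the Heston model above, in Python with NumPy. The Euler scheme with full truncation of $$\sigma_t^2$$ at zero, the starting value $$\sigma_0^2=\tau$$ and the function name are our assumptions, since the text does not state how the paths were discretized. The noise is either $$N(0,\sigma_U^2)$$ or $$\sigma_U\,t(8)$$, drawn independently at each observation time.

```python
import numpy as np

def simulate_heston_noisy(t, kappa, tau, gamma, rho, sigma_u, noise="normal",
                          x0=np.log(100.0), v0=None, rng=None):
    """Euler sketch of dX = sigma dB, d(sigma^2) = kappa (tau - sigma^2) dt + gamma sigma dW,
    with corr(dB, dW) = rho and zero drift, observed as Y_t = X_t + U_t."""
    rng = np.random.default_rng() if rng is None else rng
    dt = np.diff(t)
    n = len(dt)
    z = rng.standard_normal((n, 2))
    x = np.empty(n + 1)
    x[0] = x0
    v = tau if v0 is None else v0            # initial variance: an assumption
    for j in range(n):
        vp = max(v, 0.0)                     # full truncation keeps sqrt(v) real
        sq = np.sqrt(dt[j])
        db = sq * z[j, 0]
        dw = sq * (rho * z[j, 0] + np.sqrt(1.0 - rho ** 2) * z[j, 1])
        x[j + 1] = x[j] + np.sqrt(vp) * db
        v = v + kappa * (tau - vp) * dt[j] + gamma * np.sqrt(vp) * dw
    if noise == "normal":
        u = sigma_u * rng.standard_normal(n + 1)
    elif noise == "t8":
        u = sigma_u * rng.standard_t(8, size=n + 1)   # U_t ~ sigma_U t(8)
    else:
        raise ValueError("noise must be 'normal' or 't8'")
    return x, x + u
```

For example, calling it with `(kappa, tau, gamma, rho) = (6, 0.16, 0.5, -0.6)` and `sigma_u=0.005` on a 30-second grid corresponds to model (i) with normal errors and $$\sigma_U=0{\cdot}005$$. On such a grid the variance of the increments of $$Y_t$$ is dominated by the noise term $$2\sigma_U^2$$.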
In the Supplementary Material, we show the results obtained when using the Gaussian kernel, which suggest that overall the sinc kernel works better, but the Gaussian kernel produces more attractive estimators in the tails. In cases where the sample has ties, for example when the sample size is large and the data are recorded to only a few significant digits, as in our real-data example, the wiggles of the sinc kernel cause it to perform poorly, and substantially better results can be obtained by using the Gaussian kernel; see § 3.3. For each estimator we computed the integrated squared error $$\int (\,\hat f_U-f_U)^2$$; the median and first and third quartiles of these values are reported in Table 1. In Fig. 1 we plot, for selected settings with normal errors, the estimated curves $$\,\hat f_U$$ computed from the samples corresponding to those three quartiles. Our results indicate that our density estimator works well. For a given setting, error densities with larger variances are easier to estimate. Figure 1 shows that our estimator improves as the sample size increases, that is, as $$\Delta s$$ decreases. The estimated densities are better in model (ii) than in model (i), as expected. While it is difficult to compare estimators of different target densities, the results suggest that the difficulty in estimating the error densities depends more on the smoothness of $$\,X_t$$ than on the error type. This reflects the fact that the smoothness of the error density has no first-order impact on the quality of estimators, as indicated by Theorems 1 and 2. Fig. 1.
The estimator $$\,\hat f_U(x)$$ in (5) in the case of normal errors, for three samples corresponding to the first (- - -), second ($$\cdots$$) and third (-$$\cdot$$-) quartiles of the integrated squared errors of estimators computed from data under model (i) with $$\sigma_U=0{\cdot}005$$ (left panels) and with $$\sigma_U=0{\cdot}001$$ (middle panels), and under model (ii) with $$\sigma_U=0{\cdot}001$$ (right panels), when $$\Delta s=30$$ (upper panels) and when $$\Delta s=5$$ (lower panels). In each panel the solid curve depicts the true $$\,f_U(x)$$. Table 1.
Median integrated squared error [first quartile, third quartile] of $$\,\hat f_U$$ in (5), calculated from $$1000$$ simulated samples from models $$\rm (i)$$ and $$\rm (ii)$$

| $$\Delta s$$ | $$\sigma_U$$ | Normal errors, model (i) | Normal errors, model (ii) | Student errors, model (i) | Student errors, model (ii) |
|---|---|---|---|---|---|
| 30 | 0·005 | 0·29 [0·13, 0·59] | 0·26 [0·10, 0·58] | 0·37 [0·19, 0·79] | 0·36 [0·16, 0·76] |
| 30 | 0·001 | 9·98 [5·39, 16·6] | 5·63 [2·90, 10·2] | 9·82 [5·8, 15·59] | 6·14 [3·28, 10·4] |
| 5 | 0·005 | 0·10 [0·04, 0·27] | 0·10 [0·04, 0·29] | 0·12 [0·06, 0·27] | 0·14 [0·07, 0·33] |
| 5 | 0·001 | 1·46 [0·64, 3·21] | 1·02 [0·41, 2·18] | 1·59 [0·78, 2·97] | 1·16 [0·56, 2·19] |
| 1 | 0·005 | 0·04 [0·01, 0·14] | 0·04 [0·01, 0·15] | 0·05 [0·02, 0·14] | 0·05 [0·02, 0·13] |
| 1 | 0·001 | 0·32 [0·12, 1·01] | 0·25 [0·10, 0·82] | 0·42 [0·20, 0·95] | 0·36 [0·15, 0·86] |

We also applied the moment estimators from § 2.4 to data simulated from a rescaled version of our two models; the rescaling avoids working with numbers that would all be numerically rounded to zero. Specifically, we replaced $$(\tau, \gamma, \sigma^2_U,X_0)$$ in our models by $$(c^2\tau, c\gamma, c^2 \sigma^2_U,cX_0)$$, where $$c=100$$. We present the results in Table 2. These results indicate that, as expected, performance improves as the sample size increases. Moreover, the performance is best for lower-order moments, for higher noise levels, and for model (ii). For a given value of $$\sigma_U$$, the moments are easier to recover when the errors have a rescaled Student distribution than when they have a normal distribution. Table 2.
Bias $$(\times\,10^2)$$, with standard deviation $$(\times\,10^2)$$ in parentheses, of $$(\hat{M}_{U,{2k}}-M_{U,{2k}})/M_{U,{2k}}$$ of our estimator $$\hat{M}_{U,2k}$$ in (6), calculated from $$1000$$ simulated samples from models $${\rm (i)}$$ and $${\rm (ii)}$$ and computed using $$\xi=t_2-t_1$$, for $$k=1$$ and $$2$$

| Errors | Model | $$\sigma_U$$ | $$\Delta s=30$$, $$k=1$$ | $$\Delta s=30$$, $$k=2$$ | $$\Delta s=5$$, $$k=1$$ | $$\Delta s=5$$, $$k=2$$ | $$\Delta s=1$$, $$k=1$$ | $$\Delta s=1$$, $$k=2$$ |
|---|---|---|---|---|---|---|---|---|
| Normal | (i) | 0·005 | 1·7 (6·4) | 3 (18) | 0·27 (2·5) | 0·53 (7) | 0·08 (1·1) | 0·15 (3·3) |
| Normal | (i) | 0·001 | 41 (15) | 99 (51) | 6·8 (3·6) | 14 (9·6) | 1·4 (1·2) | 2·8 (3·5) |
| Normal | (ii) | 0·005 | 1 (6·3) | 1·6 (18) | 0·15 (2·5) | 0·29 (7) | 0·06 (1·1) | 0·11 (3·3) |
| Normal | (ii) | 0·001 | 23 (10) | 51 (31) | 3·8 (2·9) | 7·8 (8·1) | 0·79 (1·2) | 1·6 (3·4) |
| Student | (i) | 0·005 | 1·4 (7·7) | 0·5 (37) | 0·28 (3) | 0·13 (17) | 0·03 (1·4) | −0·2 (9·9) |
| Student | (i) | 0·001 | 30 (14) | 45 (48) | 5·2 (3·6) | 6·9 (18) | 1 (1·4) | 1·1 (9·9) |
| Student | (ii) | 0·005 | 0·9 (7·7) | −0·2 (37) | 0·2 (3) | 0·01 (17) | 0·01 (1·4) | −0·23 (9·9) |
| Student | (ii) | 0·001 | 17 (10) | 23 (43) | 3 (3·2) | 3·8 (17) | 0·57 (1·4) | 0·49 (9·9) |

Finally, we applied our volatility estimator in each of our four settings and compared it with the estimator of Zhang (2006). For our method we took $$\xi=t_2-t_1$$, and to choose $$s_1,\ldots,s_m$$ we took $$m=50$$ equispaced points located between 0 and $$S$$, where $$S$$ is the largest number for which $$\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(S;\xi)\geqslant 0{\cdot}99$$. The results are presented in Table 3. In the Supplementary Material, we also show the first three quartiles of 100 times the relative absolute deviation, $$|\hat \beta-\beta|/\beta$$, of both estimators. Together, these results indicate that the two estimators perform similarly overall. When the error variance was large, our estimator tended to work a little better, and when the error variance was small, the estimator of Zhang (2006) worked a little better. Specifically, when the error variance was small, theirs was a little less biased, but ours was a little less variable. As expected, the performance of both estimators improved as the sample size increased. The estimators worked a little better when the error variance was small: they were a little more biased but noticeably less variable. The integrated volatility is an $$\,X_t$$-related quantity and is therefore easier to estimate when less error contaminates $$\,X_t$$. Depending on the situation, either the Student errors or the normal errors yielded better results. Table 3.
Bias $$(\times\,10^2)$$, with standard deviation $$(\times\,10^2)$$ in parentheses, of $$(\hat \beta-\beta)/\beta$$ using our estimator of $$\int_0^T \sigma_t^2\,{\rm d}t$$ from § 2.5, denoted by Ours, and the estimator of Zhang (2006), denoted by Zhang, calculated from $$1000$$ simulated samples from models $${\rm (i)}$$ and $${\rm (ii)}$$

| $$\sigma_U$$ | $$\Delta s$$ | $$\hat\beta$$ | Normal errors, model (i) | Normal errors, model (ii) | Student errors, model (i) | Student errors, model (ii) |
|---|---|---|---|---|---|---|
| 0·005 | 30 | Ours | −1·12 (25·9) | 0·42 (29·6) | −1·50 (28·0) | 0·71 (33·6) |
| 0·005 | 30 | Zhang | −2·87 (26·0) | −2·91 (29·7) | −3·89 (28·2) | −3·74 (33·7) |
| 0·005 | 5 | Ours | −1·13 (17·5) | −0·65 (20·1) | −0·78 (18·3) | −0·53 (21·8) |
| 0·005 | 5 | Zhang | −1·38 (17·5) | −1·18 (20·1) | −1·15 (18·3) | −1·25 (21·8) |
| 0·005 | 1 | Ours | −0·61 (11·2) | −0·78 (12·7) | −0·59 (12·1) | −0·59 (14·4) |
| 0·005 | 1 | Zhang | −0·64 (11·2) | −0·87 (12·7) | −0·65 (12·1) | −0·73 (14·5) |
| 0·001 | 30 | Ours | −5·29 (21·4) | −4·38 (21·8) | −6·25 (21·6) | −5·40 (22·1) |
| 0·001 | 30 | Zhang | −2·80 (22·3) | −2·82 (22·4) | −4·25 (22·5) | −4·21 (22·7) |
| 0·001 | 5 | Ours | −3·17 (14·4) | −2·54 (14·7) | −2·30 (14·5) | −1·80 (14·7) |
| 0·001 | 5 | Zhang | −1·73 (14·8) | −1·71 (14·9) | −1·17 (14·8) | −1·16 (14·9) |
| 0·001 | 1 | Ours | −1·02 (9·62) | −0·72 (9·71) | −1·15 (9·67) | −0·91 (9·76) |
| 0·001 | 1 | Zhang | −0·31 (9·75) | −0·32 (9·79) | −0·63 (9·76) | −0·62 (9·80) |

3.3.
Real-data analysis We applied our procedure to the high-frequency price data of Microsoft Corporation on the ten trading days from 19 March to 2 April 2013, available from the Trade and Quote database. We took the $$Y_t$$s to equal the log prices. Following Barndorff-Nielsen et al. (2011), we pre-processed the data by deleting entries that have zero or negative prices, deleting entries with a negative correction indicator, deleting entries with a letter code in COND, except for E or F, deleting entries outside the period 9:30 a.m. to 4:00 p.m., and using the median price if there were multiple entries at the same time. In this example the sample size is very large, which entails a large number of ties among the $$Y_t$$s. As a result, the sinc kernel produced overly wiggly estimates, treating the data as if they came from a multimodal density with modes located at the ties. The oscillation problems of this kernel are well known, but here they are exacerbated by the ties, so we used the standard Gaussian kernel, which is less affected by such issues. The only small adjustment we had to make was to break the ties by adding a small perturbation $$\epsilon_j\sim N(0,a_j^2)$$ to each $$\Delta_{Y,\,j}$$ when computing the bandwidth of the standard kernel estimator $$\,\tilde f_{1}(x)$$ used in our bandwidth selection procedure, where $$2 a_j$$ was equal to the larger of the distances between $$\Delta_{Y,\,j}$$ and its nearest smaller and nearest larger neighbours. Figure 2 shows the error densities estimated by our method for three trading days in 2013: 20 March, 28 March and 2 April. In this example, the magnitude of the errors is about $$10^{-3}$$ times that of the log prices themselves, but their aggregated impact on quantities such as integrated volatility is substantial.
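The tie-breaking perturbation can be sketched in Python with NumPy as follows. We take the neighbours of $$\Delta_{Y,\,j}$$ to be the nearest strictly smaller and strictly larger distinct values in the sample, and at the sample minimum or maximum we use the single neighbour available; both choices, and the function names, are our assumptions. As in the text, the jittered values would be used only to compute the bandwidth of the pilot estimator $$\,\tilde f_1$$, not in the deconvolution estimator itself.

```python
import numpy as np

def tie_scales(d):
    """a_j such that 2 a_j is the larger of the distances from d_j to the nearest
    distinct smaller and larger values in the sample (one-sided at the extremes)."""
    d = np.asarray(d, dtype=float)
    vals = np.unique(d)                          # sorted distinct values
    if len(vals) < 2:
        raise ValueError("need at least two distinct values")
    idx = np.searchsorted(vals, d)               # index of each d_j within vals
    lower = vals[np.maximum(idx - 1, 0)]
    upper = vals[np.minimum(idx + 1, len(vals) - 1)]
    left = np.where(idx > 0, d - lower, 0.0)
    right = np.where(idx < len(vals) - 1, upper - d, 0.0)
    return np.maximum(left, right) / 2.0

def jitter_ties(d, rng):
    """Break ties by adding eps_j ~ N(0, a_j^2) to each d_j."""
    d = np.asarray(d, dtype=float)
    return d + tie_scales(d) * rng.standard_normal(len(d))
```

Scaling the perturbation by the local gap keeps it small relative to the recording precision of the data, so the pilot bandwidth is no longer driven to zero by exact ties.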
For example, for those three trading days the realized volatility was respectively 3$$\cdot$$5, 4$$\cdot$$5 and 3$$\cdot$$5 times $$10^{-4}$$, values dominated by contributions from the errors. Indeed, for the same days, our error-corrected estimator was respectively 0$$\cdot$$5, 0$$\cdot$$5 and 0$$\cdot$$3 times $$10^{-4}$$, and the estimator of Zhang (2006) was respectively 0$$\cdot$$5, 0$$\cdot$$6 and 0$$\cdot$$4 times $$10^{-4}$$. Figure 2. Estimated densities of the errors contaminating the log prices of the Microsoft Corporation data for 20 March (solid), 28 March (dotted) and 2 April (dashed) 2013. Interestingly, even over this short period of ten trading days, the distributions of the errors were quite different, especially in their tails. Since heavier tails can be linked with higher levels of variation and may affect the properties of the moments, it would be interesting to investigate further the tails of the error distributions of high-frequency financial data. Different tail behaviour may also be associated with different trading or market conditions on different days. For example, the behaviour of the error distributions may differ on days when the whole market or certain industrial segments such as IT are roaring. Hence, further investigation of empirical features of this kind, connecting the error distributions with practical market conditions, could help us better understand the microstructure noise in high-frequency financial data. 4. Discussion As a first attempt at a deconvolution approach for a contaminated slow-varying continuous process, our error density estimator and the technical analysis in this paper assume that the distribution of errors in the high-frequency data is symmetric.
Our method and analysis have shown potential for revealing various features of high-frequency financial data concerning the error distributions. Relaxing the symmetry assumption is possible and of interest, but challenging. The corresponding frequency domain method is under investigation. Our analysis focuses on the univariate setting for an individual stock or trading instrument. An interesting open question would be to extend the frequency domain analysis to a multivariate context, which would help reveal the dependence structure between errors contaminating different stock prices. However, the problem is significantly more difficult because of issues such as high data dimensionality and asynchronous trading records in multivariate high-frequency data. We hope to address this problem in the future. Acknowledgement We are grateful to the editor, an associate editor and two referees for their helpful suggestions. Chang was supported in part by the Fundamental Research Funds for the Central Universities of China, the National Natural Science Foundation of China, and the Center of Statistical Research and the Joint Lab of Data Science and Business Intelligence at Southwestern University of Finance and Economics. Delaigle was supported by a Future Fellowship and a Discovery Project from the Australian Research Council. Hall was supported by a Laureate Fellowship and a Discovery Project from the Australian Research Council. Tang acknowledges support from the U.S. National Science Foundation. The main part of this work was completed while Jinyuan Chang was a postdoc under Peter Hall’s supervision. Peter was a generous mentor and friend with a warm heart. He will be greatly missed. Supplementary material Supplementary material available at Biometrika online includes a description of the simulation extrapolation bandwidth of Delaigle & Hall (2008), additional simulation results, a construction of confidence regions for the moment estimators of § 2.4, and all the proofs. 
Appendix Algorithm A1. Overview of our bandwidth selection procedure. (a) Find $$\,f_{1}$$ and $$\,f_{2}$$ so that the relationship between $$\,f_{1}$$ and $$\,f_{2}$$ mimics that between $$\,f_U$$ and $$\,f_{1}$$, and so that $$\,f_{1}$$ and $$\,f_{2}$$ can be accurately estimated from our data. (b) For our estimator in (5), $$\,f_U$$ is estimated from $$\Delta Y_{j,\ell}=Y_{t_{j+\ell}}-Y_{t_{j}} \approx U_{t_{j+\ell}}-U_{t_{j}}$$, where $$|t_{j+\ell}-t_j|$$ is small and $$U_{t_{j+\ell}}\sim f_U$$ and $$U_{t_{j}}\sim f_U$$ are independent. To imitate this, for $$k=1,2$$ we construct versions of $$\Delta Y_{j,\ell}$$, say $$\Delta Y_{j,\ell}^{(k)}$$, such that if $$|t_{j+\ell}^{(k)}-t_j^{(k)}|$$ is small, then $$\Delta Y_{j,\ell}^{(k)} \approx U_{t_{j+\ell}^{(k)}}^{(k)}-U_{t_{j}^{(k)}}^{(k)}$$, where $$U_{t_{j+\ell}^{(k)}}^{(k)}\sim f_{k}$$ and $$U_{t_{j}^{(k)}}^{(k)}\sim f_{k}$$ are independent. (c) For $$k=1, 2$$, we can compute a bandwidth $$h_{k}$$ and a $$\xi_{k}$$ for estimating $$\,f_{k}$$ by applying our procedure in (5) to the data $$\Delta Y_{j,\ell}^{(k)}$$; see (8) and (9). (d) Step (a) suggests that $$h/h_{1} \approx h_{1}/h_{2}$$, so we can take $$\hat h = h_{1}^2/ h_{2}$$. Algorithm A2. Constructing $$\Delta Y_{j,\ell}^{(1)}$$ and $$t_{j}^{(1)}$$. (a) For $$j=0,\ldots, n-1$$, let $$t_j^{(1)}=(t_{j}+t_{j+1})/2$$ and $$Y_{1,t_j^{(1)}}^{(1)}=(Y_{t_{j}}+Y_{t_{j+1}})/ \surd 2$$. For $$\ell> 1$$ and $$j=0,\ldots,n-\ell-1$$, take $$\Delta Y_{j,\ell}^{(1)}=Y_{1,t_{j+\ell}^{(1)}}^{(1)}-Y_{1,t_j^{(1)}}^{(1)}\approx U_{1,t_{j+\ell}^{(1)}}^{(1)}-U_{1,t_{j}^{(1)}}^{(1)}$$, where $$U^{(1)}_{1,t_j^{(1)}}= (U_{t_{j}}+U_{t_{j+1}})/ \surd 2\sim f_{1}$$. This does not work for $$\ell=1$$ because $$Y_{1,t_{j+1}^{(1)}}^{(1)}-Y_{1,t_j^{(1)}}^{(1)}\approx (U_{t_{j+2}}-U_{t_{j}})/ \surd 2\not\sim f_{1}*f_{1}$$; instead, we suggest taking $$\Delta Y_{j,1}^{(1)}$$ as in (b). (b) For $$j=0,\ldots,n-2$$, let $$t_j^{(1)}=(t_{j}+t_{j+2})/2$$ and $$Y_{2,t_j^{(1)}}^{(1)}=(Y_{t_{j}}+Y_{t_{j+2}})/\surd 2$$. 
Take $$\Delta Y_{j,1}^{(1)}=Y_{2,t_{j+1}^{(1)}}^{(1)}-Y_{2,t_j^{(1)}}^{(1)}\approx U^{(1)}_{2,t_{j+1}^{(1)}}-U^{(1)}_{2,t_j^{(1)}}$$ with $$U^{(1)}_{2,t_j^{(1)}}=(U_{t_{j}}+U_{t_{j+2}})/\surd 2\sim f_{1}$$. Algorithm A3. Constructing $$\Delta Y_{j,\ell}^{(2)}$$ and $$t_{j}^{(2)}$$. Step 1. For $$j=0,\ldots, n-3$$, let $$Y_{4,t_{j}^{(2)}}^{(2)}=\sum_{k=0}^{3}Y_{t_{j+k}}/2$$ and $$t_{j}^{(2)}=\sum_{k=0}^{3}t_{j+k}/4$$. For $$\ell\geqslant 4$$ and $$j=0,\ldots, n-\ell-3$$, take $$\Delta Y_{j,\ell}^{(2)}= Y_{4,t_{j+\ell}^{(2)}}^{(2)}-Y_{4,t_{j}^{(2)}}^{(2)}\approx U^{(2)}_{4,t_{j+\ell}^{(2)}}-U^{(2)}_{4,t_j^{(2)}}$$, where $$U^{(2)}_{4,t_j^{(2)}}= \sum_{k=0}^{3}U_{t_{j+k}}/ 2 \sim f_{2}$$. This does not work for $$\ell=1,2,3$$, for reasons similar to those given in Algorithm A2; for $$\ell=1,2,3$$, we suggest taking $$\Delta Y_{j,\ell}^{(2)}$$ as in Steps 2 and 3. Step 2. For $$j=0,\ldots,n-6$$, let $$Y_{3,t_{j}^{(2)}}^{(2)}=\sum_{k=0,1,2,6}Y_{t_{j+k}}/2$$ and $$t_{j}^{(2)}=\sum_{k=0,1,2,6}t_{j+k}/4$$. For $$j=0,\ldots, n-5$$, let $$Y_{2,t_{j}^{(2)}}^{(2)}=\sum_{k=0,1,4,5}Y_{t_{j+k}}/2$$ and $$t_{j}^{(2)}=\sum_{k=0,1,4,5}t_{j+k}/4$$. For $$j=0,\ldots, n-6$$, let $$Y_{1,t_{j}^{(2)}}^{(2)}=\sum_{k=0}^{3}Y_{t_{j+2k}}/2$$ and $$t_{j}^{(2)}=\sum_{k=0}^{3}t_{j+2k}/4$$. Step 3. For $$j=0,\ldots,n-9$$, take $$\Delta Y_{j,3}^{(2)}=Y_{3,t_{j+3}^{(2)}}^{(2)}-Y_{3,t_{j}^{(2)}}^{(2)}\approx U_{3,t_{j+3}^{(2)}}^{(2)}-U_{3,t_{j}^{(2)}}^{(2)}$$, where $$U^{(2)}_{3,t_j^{(2)}}= \sum_{k=0,1,2,6} U_{t_{j+k}}/ 2 \sim f_{2}$$. For $$j=0,\ldots,n-7$$ and $$\ell=1,2$$, take $$\Delta Y_{j,\ell}^{(2)}=Y_{\ell,t_{j+\ell}^{(2)}}^{(2)}-Y_{\ell,t_{j}^{(2)}}^{(2)}\approx U_{\ell,t_{j+\ell}^{(2)}}^{(2)}-U_{\ell,t_{j}^{(2)}}^{(2)}$$, where $$U^{(2)}_{1,t_j^{(2)}}= \sum_{k=0}^{3}U_{t_{j+2k}}/ 2 \sim f_{2}$$ and $$U^{(2)}_{2,t_j^{(2)}}=\sum_{k=0,1,4,5}U_{t_{j+k}}/ 2 \sim f_{2}$$. References Aït-Sahalia, Y., Fan, J. & Xiu, D. (2010). High-frequency covariance estimates with noisy and asynchronous financial data. J. Am. Statist. Assoc. 105, 1504–17. 
Aït-Sahalia, Y. & Jacod, J. (2014). High-Frequency Financial Econometrics. Princeton: Princeton University Press. Aït-Sahalia, Y. & Yu, J. (2009). High frequency market microstructure noise estimates and liquidity measures. Ann. Appl. Statist. 3, 422–57. Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A. & Shephard, N. (2011). Multivariate realized kernels: Consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. J. Economet. 162, 149–69. Bolger, N. & Laurenceau, J.-P. (2013). Intensive Longitudinal Methods: An Introduction to Diary and Experience Sampling Research. New York: Guilford Press. Buckley, M. J., Eagleson, G. K. & Silverman, B. W. (1988). The estimation of residual variance in nonparametric regression. Biometrika 75, 189–99. Carroll, R. J. & Hall, P. (1988). Optimal rates of convergence for deconvolving a density. J. Am. Statist. Assoc. 83, 1184–6. Delaigle, A. (2008). An alternative view of the deconvolution problem. Statist. Sinica 18, 1025–45. Delaigle, A. & Hall, P. (2008). Using SIMEX for smoothing-parameter choice in errors-in-variables problems. J. Am. Statist. Assoc. 103, 280–7. Delaigle, A., Hall, P. & Meister, A. (2008). On deconvolution with repeated measurements. Ann. Statist. 36, 665–85. Delaigle, A. & Zhou, W.-X. (2015). Nonparametric and parametric estimators of prevalence from group testing data with aggregated covariates. J. Am. Statist. Assoc. 110, 1785–96. Dufrenot, G., Jawadi, F. & Louhichi, W. (2014). Market Microstructure and Nonlinear Dynamics: Keeping Financial Crisis in Context. New York: Springer. Gloter, A. & Jacod, J. (2001). 
Diffusions with measurement errors: I. Local asymptotic normality. Eur. Ser. Appl. Indust. Math. 5, 225–42. Hall, P., Kay, J. W. & Titterington, D. M. (1990). Asymptotically optimal difference-based estimation of variance in nonparametric regression. Biometrika 77, 521–8. Hautsch, N. (2012). Econometrics of Financial High-Frequency Data. New York: Springer. Jacod, J., Li, Y. & Zheng, X. (2017). Statistical properties of microstructure noise. Econometrica 85, 1133–74. Liu, C. & Tang, C. Y. (2014). A quasi-maximum likelihood approach for integrated covariance matrix estimation with high-frequency data. J. Economet. 180, 217–32. Meister, A. (2007). Optimal convergence rates of density estimation from grouped data. Statist. Prob. Lett. 77, 1091–7. Müller, H.-G., Sen, R. & Stadtmüller, U. (2011). Functional data analysis for volatility. J. Economet. 165, 233–45. Olhede, S. C., Sykulski, A. M. & Pavliotis, G. A. (2009). Frequency domain estimation of integrated volatility for Itô processes in the presence of market-microstructure noise. Multiscale Mod. Simul. 8, 393–427. Sheather, S. J. & Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. J. R. Statist. Soc. B 53, 683–90. Stefanski, L. A. & Carroll, R. J. (1990). Deconvoluting kernel density estimators. Statistics 21, 169–84. Tao, M., Wang, Y. & Zhou, H. H. (2013). Optimal sparse volatility matrix estimation for high dimensional Itô processes with measurement errors. Ann. Statist. 41, 1816–64. Wang, J.-L., Chiou, J.-M. & Müller, H.-G. (2016). Review of functional data analysis. Ann. Rev. Statist. 
Appl. 3, 257–95. Xiu, D. (2010). Quasi-maximum likelihood estimation of volatility with high-frequency data. J. Economet. 159, 235–50. Zhang, L. (2006). Efficient estimation of stochastic volatility using noisy observations: A multi-scale approach. Bernoulli 12, 1019–43. Zhang, L., Mykland, P. A. & Aït-Sahalia, Y. (2005). A tale of two time scales: Determining integrated volatility with noisy high-frequency data. J. Am. Statist. Assoc. 100, 1394–411. © 2018 Biometrika Trust. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices). Biometrika Oxford University Press


Biometrika, Volume Advance Article (2) – Mar 6, 2018
17 pages

Publisher
Oxford University Press
ISSN
0006-3444
eISSN
1464-3510
DOI
10.1093/biomet/asy006

In finance too, high-frequency data are often regarded as a noisy version of a latent continuous-time stochastic process, observed at consecutive discrete time-points. The latent process is considered to have a continuous path, and the measurement error represents the market microstructure noise; see Aït-Sahalia & Yu (2009). Since increasing the sampling frequency reduces the sampling errors caused by discretizing the underlying continuous-time process, we might expect high-frequency data to enable more accurate inference. However, as the sampling frequency increases, the difference between nearby observations becomes dominated by random noise, which makes standard methods of inference inconsistent; see, for example, Zhang et al. (2005). Therefore, a main concern in high-frequency financial data analysis has been to find ways of recovering quantities of interest from noisy high-frequency observations; see Aït-Sahalia & Jacod (2014). Revealing the distributional properties of the measurement errors is crucial for recovering the signal of the continuous-time process from noisy high-frequency data. First, many estimation procedures require distributional assumptions on the measurement errors. Second, statistical inference, including hypothesis testing and confidence set estimation, inevitably involves unknown nuisance parameters determined by the distribution of the measurement errors; see, for example, Zhang (2006), Aït-Sahalia et al. (2010), Xiu (2010) and Liu & Tang (2014). The measurement errors themselves sometimes contain useful information, both theoretical and practical, so successfully recovering the measurement error distribution can improve our understanding of the data structure. For example, Dufrenot et al. (2014) argued that the microstructure noise can help us understand financial crises, and Aït-Sahalia & Yu (2009) made a connection between the microstructure noise and the liquidity of stocks. 
In longitudinal studies, the measurement errors can help to reveal interesting characteristics of a population, such as the way in which individuals misreport their diet. Jacod et al. (2017) recently highlighted the importance of statistical properties of the measurement errors and studied the estimation of moments. Despite these important developments, to our knowledge, as yet no method has been proposed for estimating the entire distribution of the errors. In this paper we consider frequency domain analysis of high-frequency data, focusing on the measurement errors contaminating continuous-time processes. In high-frequency observations, the relative magnitude of the changes in values of the underlying continuous-time process is small. Compared with the measurement errors, it becomes negligible, locally in a small neighbourhood of a given time-point. As a result, estimating the error distribution shares some common features with nonparametric measurement error problems with repeated measurements studied in Delaigle et al. (2008), and with nonparametric estimation from aggregated data studied by Meister (2007) and Delaigle & Zhou (2015), where the estimation techniques require working in the Fourier domain. Motivated by this, we propose to estimate the characteristic function of the measurement errors by locally averaging the empirical characteristic functions of the changes in values of the high-frequency data. We obtain a nonparametric estimator of the probability density function of the measurement errors and show that it is consistent and minimax rate-optimal. We propose a simple method for consistently estimating the moments of the measurement errors. Using our estimator of the characteristic function of the errors, we develop a new rate-optimal multi-scale frequency domain estimator of the integrated volatility of the stochastic process, a key quantity of interest in high-frequency financial data analysis. 
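The inconsistency noted above (Zhang et al., 2005) is easy to see in a small simulation. For zero-mean i.i.d. errors independent of a Brownian path with constant volatility $$\sigma$$, the realized volatility $$\sum_{j}(Y_{t_j}-Y_{t_{j-1}})^2$$ has expectation $$\sigma^2T+2nE(U^2)$$, so the noise contribution grows linearly in the number $$n$$ of increments. The Python sketch below checks this numerically; the volatility, noise level and sampling scheme (one observation per second over a 6.5-hour trading day) are illustrative assumptions of ours, not values from the paper.

```python
import numpy as np

def realized_volatility(path):
    """Sum of squared increments of an observed path."""
    return float(np.sum(np.diff(path) ** 2))

# Illustrative values chosen for this sketch, not estimates from any data:
# one observation per second over a 6.5-hour trading day (T = 1 day).
rng = np.random.default_rng(1)
T, sigma, noise_sd, n_fine = 1.0, 0.01, 1e-4, 23_400
increments = sigma * np.sqrt(T / n_fine) * rng.standard_normal(n_fine)
x = np.concatenate([[0.0], np.cumsum(increments)])    # Brownian path, constant volatility
y = x + noise_sd * rng.standard_normal(n_fine + 1)    # i.i.d. N(0, noise_sd^2) errors
iv = sigma**2 * T                                      # integrated volatility of X

for step in (390, 78, 1):                              # keep every step-th observation
    n = n_fine // step
    rv = realized_volatility(y[::step])
    print(f"n={n:6d}  RV={rv:.3e}  IV + 2n E(U^2)={iv + 2 * n * noise_sd**2:.3e}")
```

At the coarsest sampling the realized volatility is close to the integrated volatility, while at the finest sampling the noise term $$2nE(U^2)$$ dominates.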
In what follows, for two sequences of real numbers $$\{a_n\}$$ and $$\{b_n\}$$, we write $$a_n\asymp b_n$$ if there exist positive constants $$c_1$$ and $$c_2$$ such that $$c_1\leqslant a_n/b_n\leqslant c_2$$ for all $$n\geqslant1$$. We denote by $$\,f*g$$ the convolution of two functions $$\,f$$ and $$g$$, defined by $$\,f*g(s)=\int f(s-\tau)g(\tau)\,{\rm d}\tau$$. 2. Methodology 2.1. Model and data We are interested in a continuous-time process $$(X_t)_{t\in[0,T]}$$ observed at high frequency with $$T>0$$. We assume that $$\,X_t$$ follows a diffusion process $${\rm d} X_t=\mu_t\,{\rm d} t+\sigma_t\,{\rm d} B_t,\tag{1}$$ where the drift $$\mu_t$$ is a locally bounded and progressively measurable process, $$\sigma_t$$ is a positive and locally bounded Itô semimartingale, and $$B_t$$ is a standard Brownian motion. The process $$\sigma_t^2$$ represents the volatility of the process $$\,X_t$$ at time $$t$$, and is often investigated in its integrated form $$\int_0^T \sigma_t^2\,{\rm d} t$$, called the integrated volatility. Remark 1. The model (1) is often used when $$\,X_t=\log S_t$$, where $$S_t$$ denotes the price process of an equity; see, for example, Zhang et al. (2005). It is also used to model applications in biology, physics and many other fields; see Olhede et al. (2009). All theoretical properties of our estimators are derived under this model, but the methods developed in §§ 2.2 and 2.4 could be applied to other types of processes $$\,X_t$$, such as the smooth ones typically encountered in the functional data literature. The key property that makes our methods consistent is the continuity of the underlying process $$\,X_t$$, but the convergence rates of our estimators depend on more specific assumptions, such as those implied by the model (1). Our data are observed on a generic discrete grid $$\mathcal {G}=\{t_0,\ldots,t_n\}$$ of time-points where, without loss of generality, we let $$t_0=0$$ and $$t_n=T$$. 
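A path of model (1) on an irregular grid $$\mathcal G$$ can be simulated with the standard Euler–Maruyama scheme; this discretization is our choice for illustration, not a method from the paper. The constant drift, the deterministic volatility path $$\sigma_t=0{\cdot}2\{1+0{\cdot}5\sin(2\pi t)\}$$ on $$[0,1]$$ (a positive, smooth and hence Itô semimartingale choice) and the jittered grid are all hypothetical. Because $$\sigma_t$$ is deterministic here, $$\int_0^1\sigma_t^2\,{\rm d}t=0{\cdot}045$$ in closed form, and the noise-free realized volatility should be close to it.

```python
import numpy as np

def irregular_grid(n, T, rng, jitter=0.4):
    """Hypothetical grid t_0 = 0 < ... < t_n = T: equispaced points whose interior
    ones are shifted by at most `jitter` spacings, so every spacing lies in
    [(1 - 2*jitter) T/n, (1 + 2*jitter) T/n] when jitter < 1/2."""
    t = np.linspace(0.0, T, n + 1)
    t[1:-1] += rng.uniform(-jitter, jitter, size=n - 1) * (T / n)
    return t

def euler_maruyama(t, mu, sigma, rng, x0=0.0):
    """Euler-Maruyama approximation of dX = mu(t) dt + sigma(t) dB on the grid t."""
    dt = np.diff(t)
    dB = np.sqrt(dt) * rng.standard_normal(dt.size)
    steps = mu(t[:-1]) * dt + sigma(t[:-1]) * dB
    return np.concatenate([[x0], x0 + np.cumsum(steps)])

def drift(s):
    return np.full_like(s, 0.05)                          # hypothetical constant drift

def vol(s):
    return 0.2 * (1.0 + 0.5 * np.sin(2.0 * np.pi * s))    # hypothetical volatility on [0, 1]

rng = np.random.default_rng(2)
T, n = 1.0, 23_400
t = irregular_grid(n, T, rng)
x = euler_maruyama(t, drift, vol, rng)
iv = 0.04 * (1.0 + 0.125) * T        # closed-form integral of vol(s)**2 over [0, 1]
rv = float(np.sum(np.diff(x) ** 2))
print(f"spacing ratio min/max = {np.diff(t).min() / np.diff(t).max():.3f}, RV = {rv:.4f}, IV = {iv:.4f}")
```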
The observed data are contaminated by additive measurement errors, so that what we observe is a sample $$\{Y_{t_j}\}_{j=0}^n$$, where $$Y_{t_j}=X_{t_j}+U_{t_j}\text{.}\tag{2}$$ A conventional assumption when analysing noisy high-frequency data is that the random measurement error $$U_t$$ is independent of $$\,X_t$$; see Aït-Sahalia et al. (2010), Xiu (2010) and Liu & Tang (2014). This is also a standard assumption in the measurement error and the functional data literature; see Carroll & Hall (1988), Stefanski & Carroll (1990) and Wang et al. (2016). Likewise, we make the following assumption. Assumption 1. The errors $$\{U_{t_j}\}_{j=0}^n$$ are independently and identically distributed with unknown density $$\,f_U$$, and are independent of the process $$(X_t)_{t\in[0,T]}$$. We are interested in deriving statistical properties of the noise term $$U_{t_j}$$ when $$T$$ is fixed and the frequency of observations increases, that is, when $$\max_{1\leqslant j\leqslant n}\Delta t_j\rightarrow0$$ as $$n\rightarrow\infty$$, where $$\Delta t_j=t_j-t_{j-1}$$ for $$j=1,\ldots,n$$. Here, the time-points $$t_j$$ do not need to be equispaced. Formally, we make the following assumption. Assumption 2. As $$n\rightarrow\infty$$, $$\min_{1\leqslant j\leqslant n}\Delta t_j/\max_{1\leqslant j\leqslant n}\Delta t_j$$ is uniformly bounded away from zero and infinity. Throughout, we use $$\,f^{\mathrm{Ft}}$$ to denote the Fourier transform of a function $$\,f$$ and make the following assumption on the characteristic function of the errors, which is almost always assumed in the related nonparametric measurement error literature: Assumption 3. $$\,f_U^{\mathrm{Ft}}$$ is real-valued and does not vanish at any point on the real line. 2.2. Estimating the error density $$\,f_U$$ Motivated by our discussion in § 1, we wish to estimate the error density $$\,f_U$$. 
At a given $$t_j$$, if we had access to repeated noisy measurements of $$\,X_{t_j}$$, say $$Y_{t_j,k}=X_{t_j}+U_{t_j,k}$$ for $$k=1,\ldots,r$$, where the $$U_{t_j,k}$$s are independent and each $$U_{t_j,k}\sim f_U$$, then for $$\ell\neq k$$ we would have $$(Y_{t_j,\ell}-Y_{t_j,k})\sim f_U*f_U$$. Under Assumption 3, using the approach in Delaigle et al. (2008) we could estimate $$\,f_U^{\mathrm{Ft}}$$ by the square root of the empirical characteristic function of the $$(Y_{t_j,\ell}-Y_{t_j,k})$$s; then, by Fourier inversion, we could deduce an estimator of $$\,f_U$$. However, for high-frequency data, at each given $$t_j$$ we have access to only one contaminated measurement $$Y_{t_j}$$, so the above technique cannot be applied directly. But since $$(X_t)_{t\in[0,T]}$$ is a continuous-time and continuous-path stochastic process, $$|X_{t+h}-X_t|\rightarrow0$$ almost surely as $$h\rightarrow0$$. Thus, the collection of observations $$\{Y_{t_\ell}\}$$, where $$t_\ell$$ lies in a small neighbourhood $$\mathcal N$$ of $$t_j$$, can be approximately viewed as repeated measurements of $$\,X_{t_j}$$ contaminated by independently and identically distributed errors $$\{U_{t_\ell}\}$$. As the sampling frequency increases, we have multiple observations in smaller and smaller neighbourhoods $$\mathcal N$$, which suggests that the density of $$Y_{t_\ell}-Y_{t_j}$$, for $$t_j\neq t_\ell \in \mathcal N$$, gets closer and closer to $$\,f_U*f_U$$. Therefore, we can expect that as the sampling frequency increases, the approach suggested by Delaigle et al. (2008), applied to the ($$Y_{t_\ell}-Y_{t_j}$$)s for $$t_\ell$$ and $$t_j$$ sufficiently close, can provide an increasingly accurate estimator of $$\,f_U$$. We shall prove in § 2.3 that this heuristic is correct as long as the $$t_\ell$$s and $$t_j$$s are carefully chosen, which we characterize through a distance $$\xi$$. 
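The square-root step rests on a short identity, worked out here as a check. Write $$\,f^{\mathrm{Ft}}(s)=\int \exp({\rm i}sx)f(x)\,{\rm d}x$$, the convention consistent with the inversion formula $$\,f_U(x)=(2\pi)^{-1}\int \exp(-{\rm i}sx)f_U^{\mathrm{Ft}}(s)\,{\rm d}s$$ used in this section, and let $$U_1$$ and $$U_2$$ be independent with density $$\,f_U$$:

$$\begin{aligned}
E[\exp\{{\rm i}s(U_1-U_2)\}] &= E\{\exp({\rm i}sU_1)\}\,E\{\exp(-{\rm i}sU_2)\} = f_U^{\mathrm{Ft}}(s)\,\overline{f_U^{\mathrm{Ft}}(s)} = |f_U^{\mathrm{Ft}}(s)|^2\\
&= E[\cos\{s(U_1-U_2)\}].
\end{aligned}$$

The last equality holds because $$U_1-U_2$$ is symmetric, so $$E[\sin\{s(U_1-U_2)\}]=0$$. Under Assumption 3, $$\,f_U^{\mathrm{Ft}}$$ is real, continuous, equal to 1 at $$s=0$$ and never zero, so by the intermediate value theorem it is positive on the whole real line. Hence $$\,f_U^{\mathrm{Ft}}(s)=(E[\cos\{s(U_1-U_2)\}])^{1/2}$$: the square root recovers $$\,f_U^{\mathrm{Ft}}$$ itself, not only its modulus, and replacing the expectation by an average over differences of nearby observations gives a square-root estimator of the kind described above.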
For $$\mathcal {G}$$ defined in § 2.1 and $$\xi>0$$, we define $$S_j=\{t_\ell\in \mathcal {G}:|t_\ell-t_j|\leqslant \xi\!\quad\text{and}\!\quad \ell\neq j\}\quad (\,j=0,\ldots,n)\tag{3}$$ and denote by $$N_j$$ the number of points in $$S_j$$. For a fixed $$T$$, Assumption 2 implies that $$\min_{1\leqslant j\leqslant n}\Delta t_j\asymp\max_{1\leqslant j\leqslant n}\Delta t_j\asymp n^{-1}$$, so that $$\max_{1\leqslant j\leqslant n}N_j\asymp\min_{1\leqslant j\leqslant n}N_j\asymp n\xi$$. Following the discussion above and recalling Assumption 3, for a given $$\xi$$ we define our estimator of $$\,f_U^{\mathrm{Ft}}(s)$$ as the square root of the absolute value of the real part of the empirical characteristic function of the differences between nearby $$Y_{t_{j}}$$s: $$\hat{f}_{U,1}^{\mathrm{Ft}}(s;\xi)=\Bigg|\frac{1}{N(\xi)}\sum_{j=0}^n\sum_{t_\ell\in S_j}\cos\{s(Y_{t_{\ell}}-Y_{t_{j}})\}\Bigg|^{1/2},\tag{4}$$ where $$N(\xi)=\sum_{j=0}^n N_j$$. Here $$\xi$$ can be viewed as a parameter controlling the trade-off between the bias and the variance of $$\,\hat{f}_{U,1}^{\mathrm{Ft}}$$: a smaller $$\xi$$ gives a smaller bias but also results in a smaller $$N(\xi)$$, so the variance is higher. On the other hand, a larger $$\xi$$ induces a lower variance but comes at the price of a larger bias due to the contribution from the dynamics in $$\,X_t$$. The choice of $$\xi$$ in practice will be discussed in § 3.1. It follows from the Fourier inversion theorem that $$\,f_U(x)=(2\pi)^{-1}\int \exp(-{{\rm i}}sx) f_{U}^{\mathrm{Ft}}(s)\,{\rm d} s$$, where $${\rm i}^2=-1$$. We can obtain an estimator of $$\,f_U$$ by replacing $$\,f_{U}^{\mathrm{Ft}}(s)$$ with $$\,\hat{f}_{U,1}^{\mathrm{Ft}}(s;\xi)$$ in this integral. However, since $$\,\hat{f}_{U,1}^{\mathrm{Ft}}(s;\xi)$$ is built from an empirical characteristic function, it is unreliable when $$|s|$$ is large. 
For the integral to exist, $$\,\hat{f}_{U,1}^{\mathrm{Ft}}(s;\xi)$$ needs to be multiplied by a regularizing factor that puts less weight on large $$|s|$$. As the sample size increases, $$\,\hat{f}_{U,1}^{\mathrm{Ft}}(s;\xi)$$ becomes more reliable, and this should be accounted for by letting the weight depend on the sample size. Using standard kernel smoothing techniques, this can be implemented by taking \begin{equation*} \hat{f}_{U,2}^{\mathrm{Ft}}(s;\xi)=\hat{f}_{U,1}^{\mathrm{Ft}}(s;\xi)\,\mathcal{K}^{\mathrm{Ft}}(sh), \end{equation*} where $$\mathcal{K}^\mathrm{Ft}$$ is the Fourier transform of a kernel function $$\mathcal{K}$$ and $$h>0$$ is a bandwidth parameter that satisfies $$h\rightarrow0$$ as $$n\rightarrow\infty$$. Then, we define our estimator of $$\,f_U(x)$$ by $$\hat{f}_U(x;\xi)=\frac{1}{2\pi}\int \exp(-{{\rm i}}sx)\hat{f}_{U,1}^{\mathrm{Ft}}(s;\xi)\mathcal{K}^{\mathrm{Ft}}(sh)\,{\rm d} s=\frac{1}{2\pi}\int \exp(-{{\rm i}}sx)\hat{f}_{U,2}^{\mathrm{Ft}}(s;\xi)\,{\rm d} s\text{.}\tag{5}$$ For consistency of a kernel density estimator, the kernel function $$\mathcal{K}$$ needs to be symmetric and integrate to unity. In practice, the estimator is usually much less sensitive to the choice of kernel than to the choice of bandwidth. Popular choices are the Gaussian kernel, i.e., the standard normal density, and the sinc kernel, whose Fourier transform is $$\mathcal{K}^{\mathrm{Ft}}(s)=I(|s|\leqslant 1)$$, where $$I(\cdot)$$ denotes the indicator function. In practice, an advantage of the Gaussian kernel is that it can produce smoother, more visually attractive estimates than the sinc kernel. The sinc kernel is more advantageous analytically; see § 2.3. 2.3. 
Properties of the density estimator Assume that the continuous-time process $$(X_t)_{t\in[0,T]}$$ in (1) belongs to the class $$\mathcal{X}(C_1)$$ for some $$C_1>0$$, where \begin{equation*} \begin{split} \mathcal{X}(C_1)=\Big\{(X_{t})_{t\in[0,T]}: \,X_t \,\textrm{satisfies (1) with}\sup_{0\leqslant t\leqslant T}E(\mu_t^4)\leqslant C_1\,\textrm{and}\sup_{0\leqslant t\leqslant T}E(\sigma_t^4)\leqslant C_1\Big\}, \end{split} \end{equation*} and that $$\,f_U$$ belongs to the class \begin{equation*} \begin{split} \mathcal{F}_1(\alpha,C_2)=\big\{\,f:\, |\,f^{\mathrm{Ft}}(s)|\leqslant C_2(1+|s|)^{-\alpha}\,\textrm{for all real}\,s\big\} \end{split} \end{equation*} for some constants $$\alpha>0$$ and $$C_2>1$$. This class is rich; for example, it contains the functions that have at least $$\alpha-1$$ square-integrable derivatives. Characterizing error distributions through their Fourier transforms is standard in nonparametric measurement error problems because it is key to deriving precise asymptotic properties of the estimators. We consider the sinc kernel $$\mathcal{K}$$ introduced in the paragraph following (5). Using this kernel simplifies our presentation of the theoretical derivations in two respects: its Fourier transform simplifies calculations, and it is a so-called infinite-order kernel, which implies that the bias of the resulting nonparametric curve estimators depends only on the smoothness of the target curve. Thus the sinc kernel has adaptive properties and automatically ensures optimal convergence rates. In contrast, the bias of estimators based on finite-order kernels, such as the Gaussian kernel, depends on both the order of the kernel and the smoothness of the target curve, which means that various smoothness subcases need to be considered when deriving properties. For any two square-integrable functions $$\,f$$ and $$g$$, let $$\|\,f-g\|_2=(\int|\,f-g|^2)^{1/2}$$. 
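To make (4), (5) and the $$L_2$$ criterion concrete, here is a minimal Python sketch with NumPy. It evaluates (4) on a frequency grid, inverts it with the sinc kernel as in (5), and measures the error with $$\|\cdot\|_2$$. The assumptions are all ours: the observation times are sorted, the data come from a simulated Brownian path plus normal errors, $$\xi$$ and $$h$$ are fixed by hand rather than chosen by the paper's bandwidth selection procedure, and the integrals are approximated by simple quadrature. Function names are hypothetical.

```python
import numpy as np

def close_pair_differences(t, y, xi):
    """Differences Y_{t_l} - Y_{t_j} over pairs with 0 < t_l - t_j <= xi.

    Assumes t is strictly increasing. Each unordered pair is kept once; since
    cos is even, the average of cos over these equals the average over the
    ordered pairs used in (4)."""
    parts, lag = [], 1
    while lag < t.size:
        close = (t[lag:] - t[:-lag]) <= xi
        if not close.any():                   # for sorted t, gaps only grow with the lag
            break
        parts.append((y[lag:] - y[:-lag])[close])
        lag += 1
    if not parts:
        raise ValueError("xi is smaller than every spacing; no pairs found")
    return np.concatenate(parts)

def ecf_root(s, d):
    """Estimator (4): |average of cos(s * d)|^(1/2) at each frequency in s."""
    return np.array([np.sqrt(abs(np.cos(si * d).mean())) for si in s])

def sinc_deconvolution_density(x, d, h, n_freq=401):
    """Estimator (5) with the sinc kernel, K^Ft(sh) = I(|s| <= 1/h).

    The estimate of f_U^Ft is real and even in s, so the inversion integral
    reduces to (1/pi) * int_0^{1/h} cos(s x) f_hat(s) ds, evaluated here by
    the trapezoidal rule on a uniform frequency grid."""
    s = np.linspace(0.0, 1.0 / h, n_freq)
    w = np.full(n_freq, s[1] - s[0])
    w[[0, -1]] /= 2.0                         # trapezoidal end-point weights
    return np.cos(np.outer(x, s)) @ (w * ecf_root(s, d)) / np.pi

def l2_distance(x, f, g):
    """||f - g||_2 approximated by a Riemann sum on a uniform grid x."""
    return float(np.sqrt(np.sum((f - g) ** 2) * (x[1] - x[0])))

# Illustrative simulation (our choices): Brownian X, normal errors, regular grid.
rng = np.random.default_rng(3)
n, T, sigma_x, sigma_u = 10_000, 1.0, 0.2, 0.05
t = np.linspace(0.0, T, n + 1)
x_path = np.concatenate([[0.0], np.cumsum(sigma_x * np.sqrt(T / n) * rng.standard_normal(n))])
y = x_path + sigma_u * rng.standard_normal(n + 1)

d = close_pair_differences(t, y, xi=5.5 / n)          # neighbours up to five steps away
grid = np.linspace(-0.4, 0.4, 801)
f_hat = sinc_deconvolution_density(grid, d, h=1.0 / 40.0)
f_true = np.exp(-grid**2 / (2 * sigma_u**2)) / (sigma_u * np.sqrt(2 * np.pi))
rel_err = l2_distance(grid, f_hat, f_true) / l2_distance(grid, f_true, 0.0)
print(f"{d.size} pairs, relative L2 error {rel_err:.3f}")
```

Because the simulated errors are normal, the target density is known and the $$L_2$$ error can be computed; with real data only $$\hat f_U$$ would be available. Cutting off frequencies above $$1/h$$ is exactly what the sinc kernel does, which is why it keeps the inversion step this simple.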
Proposition 1 gives the convergence rate of $$\,\hat{f}_U$$, defined in (5), to the true density function $$\,f_U$$. Proposition 1. Let $$(X_t)_{t\in[0,T]}\in\mathcal{X}(C_1)$$ and assume that the errors $$\{U_{t_j}\}_{j=0}^n$$ satisfy Assumptions $$\rm{1}$$ and $$\rm{3}$$ and that $$\,f_U\in \mathcal{F}_1(\alpha,C_2)$$. Let $$\mathcal {P}_1(\alpha,C_1,C_2)$$ denote the collection of models for $$(Y_t)_{t\in[0,T]}$$ such that $$Y_t=X_t+U_t$$. Under Assumption $$\rm{2}$$ and with the sinc kernel $$\mathcal{K}$$, if $$\alpha>3/2$$, then for some uniform constant $$C>0$$ depending only on $$\alpha$$, $$C_1$$ and $$C_2$$, $$\sup_{\mathcal{P}_1(\alpha,C_1,C_2)}E\big(\|\hat{f}_U-f_U\|_2^2\big) \leqslant C\bigl(n^{-1}\xi^{-1/2}h^{-1}+n^{-1/2}+\xi+h^{2\alpha-1}\bigr)\text{.}$$ Proposition 1 shows that the $$L_2$$ convergence rate of $$\,\hat{f}_U$$ to $$\,f_U$$ is affected by $$\xi$$, which determines the size of each neighbourhood $$S_j$$, and by the bandwidth $$h$$. The next theorem shows that for appropriate choices of $$\xi$$ and $$h$$, the convergence rate attains $$n^{-1/2}$$. Theorem 1. If the conditions of Proposition $$\rm{1}$$ hold and we take $$\xi\asymp n^{-\delta_1}$$ and $$h\asymp n^{-\delta_2}$$, where $$\delta_1>0$$ and $$\delta_2>0$$ are such that $$\delta_1+2\delta_2\leqslant1$$, $$\delta_1\geqslant{1}/{2}$$ and $$\delta_2\geqslant (4\alpha-2)^{-1}$$, then for some uniform constant $$C>0$$ depending only on $$\alpha$$, $$C_1$$ and $$C_2$$, $$\sup_{\mathcal{P}_1(\alpha,C_1,C_2)}E\big(\|\,\hat{f}_U-f_U\|_2^2\big)\leqslant Cn^{-1/2}\text{.}$$ From Theorem 1 we learn that, as long as $$\alpha>3/2$$, the convergence rate of $$\,\hat{f}_U$$ does not depend on $$\alpha$$. This is strikingly different from standard nonparametric density estimation problems, where convergence rates typically depend on the smoothness of the target density: the smoother the density, the faster the rate. 
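The exponent conditions in Theorem 1 come from requiring each term of the bound in Proposition 1 to be at most of order $$n^{-1/2}$$. A worked check, with $$\xi\asymp n^{-\delta_1}$$ and $$h\asymp n^{-\delta_2}$$:

$$\begin{aligned}
n^{-1}\xi^{-1/2}h^{-1} &\asymp n^{-1+\delta_1/2+\delta_2}\lesssim n^{-1/2} &&\iff \delta_1+2\delta_2\leqslant 1,\\
\xi &\asymp n^{-\delta_1}\lesssim n^{-1/2} &&\iff \delta_1\geqslant 1/2,\\
h^{2\alpha-1} &\asymp n^{-\delta_2(2\alpha-1)}\lesssim n^{-1/2} &&\iff \delta_2\geqslant (4\alpha-2)^{-1}.
\end{aligned}$$

The first two conditions force $$\delta_2\leqslant(1-\delta_1)/2\leqslant 1/4$$, so the third can be met only if $$(4\alpha-2)^{-1}\leqslant 1/4$$, that is, $$\alpha\geqslant 3/2$$, which the assumption $$\alpha>3/2$$ of Proposition 1 guarantees. For instance, with $$\alpha=2$$ one may take $$\delta_1=1/2$$ and $$\delta_2=1/4$$, i.e., $$\xi\asymp n^{-1/2}$$ and $$h\asymp n^{-1/4}$$; the remaining term $$n^{-1/2}$$ in the bound is already of the target order.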
For example, if we had access to the ($$U_{t_\ell}-U_{t_j}$$)s directly instead of just $$Y_{t_\ell}-Y_{t_j}=X_{t_\ell}-X_{t_j}+U_{t_\ell}-U_{t_j}$$, then we could apply the technique suggested by Meister (2007) and the convergence rate would increase with $$\alpha$$. However, in our case, the nuisance contribution due to the ($$X_{t_\ell}-X_{t_j}$$)s makes it impossible to reach rates faster than $$n^{-1/2}$$, even if $$\alpha$$ is very large. This is demonstrated in the next theorem, which shows that the $$n^{-1/2}$$ rate derived in Theorem 1 is minimax optimal. Theorem 2. Denote by $$\breve{\mathcal{F}}$$ the class of all measurable functionals of the data. Under the conditions in Proposition $$\rm{1}$$, for some uniform constant $$C>0$$ depending only on $$\alpha$$, $$C_1$$ and $$C_2$$, $$\inf_{\hat{f}\in\breve{\mathcal{F}}}\: \sup_{\mathcal{P}_1(\alpha,C_1,C_2)}E\big(\|\,\hat{f}-f_U\|_2^2\big)\geqslant C n^{-1/2}\text{.}$$ 2.4. Estimating the moments of the microstructure noise We can deduce estimators of the moments of the microstructure noise $$U_{t_j}$$ from the density estimator derived in § 2.2, but proceeding in that way is unnecessarily complex. Recall from § 2.2 that when $$t_\ell\neq t_j$$ are close, $$Y_{t_\ell}-Y_{t_j}$$ behaves approximately like $$U_{t_\ell}-U_{t_j}\sim f_{\tilde U}=f_U*f_U$$, where $$U$$ and $$\tilde U$$ denote generic random variables with the same distributions as, respectively, $$U_{t_j}$$ and $$U_{t_\ell}-U_{t_j}$$. This suggests that we could estimate the moments of $$\tilde U$$ by the empirical moments of $$Y_{t_\ell}-Y_{t_j}$$ and, from these, deduce estimators of the moments of $$U$$. For each integer $$k\geqslant 1$$, let $$M_{U,k}$$ and $$M_{\tilde{U},k}$$ denote the $$k$$th moments of $$U$$ and $$\tilde U$$, respectively. Since $$\,f_U$$ is symmetric, $$M_{U,2k-1}$$ and $$M_{\tilde{U},2k-1}$$ are equal to zero for all $$k\geqslant1$$, and we only need to estimate even-order moments. 
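A minimal Python sketch of this idea: average even powers of the differences between nearby observations, then solve for the even moments of $$U$$. For symmetric $$U$$, expanding $$E(U_1-U_2)^{2k}$$ binomially and dropping the vanishing odd moments gives $$M_{\tilde U,2k}=\sum_{j=0}^{k}C_{2k}^{2j}M_{U,2j}M_{U,2k-2j}$$ with $$M_{U,0}=1$$; the terms $$j=0$$ and $$j=k$$ both equal $$M_{U,2k}$$, so the relation can be solved one order at a time. The function names, the Laplace errors, the lag window and all numerical values are illustrative assumptions.

```python
import numpy as np
from math import comb

def difference_moments(d, k_max):
    """Empirical moments mean(d**(2k)) of close-pair differences, k = 1..k_max."""
    d = np.asarray(d, dtype=float)
    return np.array([np.mean(d ** (2 * k)) for k in range(1, k_max + 1)])

def error_moments(m_diff):
    """Solve M_diff[2k] = sum_{j=0}^{k} C(2k, 2j) M_U[2j] M_U[2k-2j] (M_U[0] = 1)
    for M_U[2], ..., M_U[2K], one order at a time; valid for symmetric U."""
    m_u = {0: 1.0}
    for k in range(1, len(m_diff) + 1):
        cross = sum(comb(2 * k, 2 * j) * m_u[2 * j] * m_u[2 * k - 2 * j]
                    for j in range(1, k))
        m_u[2 * k] = 0.5 * (m_diff[k - 1] - cross)   # j = 0 and j = k each give M_U[2k]
    return np.array([m_u[2 * k] for k in range(1, len(m_diff) + 1)])

# Exact check: normal errors with variance 0.25, so differences have variance 0.5.
print(error_moments([0.5, 3 * 0.5**2, 15 * 0.5**3]))  # normal moments 0.25, 0.1875, 0.234375

# Illustrative simulation: Brownian X plus Laplace errors with variance 2 * 0.01**2.
rng = np.random.default_rng(4)
n = 20_000
x_path = np.cumsum(0.2 * np.sqrt(1.0 / n) * rng.standard_normal(n + 1))
y = x_path + rng.laplace(scale=0.01, size=n + 1)
d = np.concatenate([y[k:] - y[:-k] for k in range(1, 4)])   # pairs at most 3 steps apart
print(error_moments(difference_moments(d, 2)))
```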
For each $$k\geqslant1$$, we start by estimating $$M_{\tilde{U},2k}$$ by \begin{equation*} \hat{M}_{\tilde{U},2k}(\xi)= \frac{1}{N(\xi)}\sum_{j=0}^n\sum_{t_\ell\in S_j}(Y_{t_\ell}-Y_{t_j})^{2k}\text{.} \end{equation*} This is directly connected to our frequency domain analysis: it is easily proved that $$\hat{M}_{\tilde{U},2k}(\xi)=(-{\rm i})^{2k}\{\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(0;\xi)\}^{(2k)}$$, where $$\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}=(\,\hat{f}_{U,1}^{\mathrm{Ft}})^2$$ is an estimator of $$\,f_{\tilde U}^\mathrm{Ft}$$, with $$\,\hat{f}_{U,1}^{\mathrm{Ft}}$$ as in (4). Exploiting the fact that $$U_{t_\ell}-U_{t_j}\sim f_{\tilde U}$$, we can write $$M_{\tilde{U},2k}=\sum_{j=0}^kC_{2k}^{2j}M_{U,2j}M_{U,2k-2j}$$ where $$C_{2k}^{2j}=(2k)!/\{(2j)!(2k-2j)!\}$$, from which it can be deduced that \begin{equation*} M_{U,2k}=\frac{1}{2}\Biggl(M_{\tilde{U},2k}-\sum_{j=1}^{k-1}C_{2k}^{2j}M_{U,2j}M_{U,2k-2j}\Biggr)\text{.} \end{equation*} Therefore, we can use an iterative procedure to estimate the $$M_{U,2k}$$s. First, for $$k=1$$, we take $$\hat{M}_{U,2}(\xi)=\hat{M}_{\tilde{U},2}(\xi)/2$$. Then, for $$k>1$$, given $$\hat{M}_{U,2}(\xi),\ldots, \hat{M}_{U,2(k-1)}(\xi)$$ we take $$\hat{M}_{U,2k}(\xi)=\frac{1}{2}\Biggl\{\hat{M}_{\tilde{U},2k}(\xi)-\sum_{j=1}^{k-1}C_{2k}^{2j}\hat{M}_{U,2j}(\xi)\hat{M}_{U,2k-2j}(\xi)\Biggr\}\text{.}$$ (6) Remark 2. When $$k=1$$, $$M_{U,2}=M_{\tilde{U},2}/2$$ is equal to the variance of $$U_t$$, and our estimator is very similar to the so-called difference-based variance estimator often employed in related nonparametric regression problems; see, for example, Buckley et al. (1988) and Hall et al. (1990). The next theorem establishes the convergence rate of $$\hat{M}_{U,2k}(\xi)$$. Its proof follows from the convergence rates of the $$\hat{M}_{\tilde{U},2l}(\xi)$$s. Theorem 3. 
Under Assumptions $$\rm{1}$$–$$\rm{3}$$, for any integer $$k\geqslant1$$, if $$E(\int_0^T\mu_s^{2k}\,{\rm d} s)<\infty$$ and $$E(\int_0^T\sigma_s^{2k}\,{\rm d} s)<\infty$$ hold and if there exists $$p\in(2,3]$$ such that $$M_{U,2kp}<\infty$$, then $$\hat{M}_{U,2k}(\xi)=M_{U,2k}+O_{\rm p}(n^{-1/2})$$ provided that $$\xi=o[n^{-p/\{2(p-1)\}}]$$. Next, we derive the asymptotic joint distribution of the proposed moment estimators. Let $$W=(W_1,\ldots,W_k)^{ \mathrm{\scriptscriptstyle T} }$$ be a random vector with a $$N(0,\Sigma_k)$$ distribution, where the $$(l_1,l_2)$$th element ($$l_1, l_2=1,\ldots,k$$) of $$\Sigma_k$$ is equal to $e_{l_1l_2}=\lim_{n\rightarrow\infty}E\!\left(\!\left[\,\sum_{j=0}^n\sum_{t_\ell\in S_j}\frac{(U_{t_\ell}-U_{t_j})^{2l_1}-M_{\tilde{U},2l_1}}{\{N(\xi)V_{2l_1}(\xi)\}^{1/2}}\right]\!\!\left[\sum_{j=0}^n\sum_{t_\ell\in S_j}\frac{(U_{t_\ell}-U_{t_j})^{2l_2}-M_{\tilde{U},2l_2}}{\{N(\xi)V_{2l_2}(\xi)\}^{1/2}}\right]\right)\!,$ with $$V_{2l}(\xi)=\mbox{var}[ \{N(\xi)\}^{-1/2}\sum_{j=0}^n\sum_{t_\ell\in S_j}\{(U_{t_{\ell}}-U_{t_j})^{2l}-M_{\tilde{U},2l}\}]$$. Recalling that $$N(\xi)\asymp n^2\xi$$ and noting that $$V_{2l}(\xi)\asymp n\xi$$, let \begin{equation*} a_l=\lim_{n\to\infty} \{nV_{2l}(\xi)/N(\xi)\}^{1/2}/2\quad (l=1,\ldots,k)\text{.} \end{equation*} The next theorem establishes the asymptotic joint distribution of our moment estimators. It can be used to derive confidence regions for $$(M_{U,2},\ldots,M_{U,2k})^{ \mathrm{\scriptscriptstyle T} }$$. Theorem 4. 
Under Assumptions $$\rm{1}$$–$$\rm{3}$$, for any integer $$k\geqslant1$$, if $$E(\int_0^T\mu_s^{2k}\,{\rm d} s)<\infty$$ and $$E(\int_0^T\sigma_s^{2k}\,{\rm d} s)<\infty$$ and if there exists $$p\in(2,3]$$ such that $$M_{U,2kp}<\infty$$, then $n^{1/2}\big\{\hat{M}_{U,2}(\xi)-M_{U,2},\ldots,\hat{M}_{U,2k}(\xi)-M_{U,2k}\big\}^{ \mathrm{\scriptscriptstyle T} } \rightarrow Q=(Q_{1},\ldots, Q_{k})^{ \mathrm{\scriptscriptstyle T} }$ in distribution as $$n\rightarrow\infty$$, provided that $$\xi=o(n^{-p/\{2(p-1)\}})$$, where \begin{equation*} Q_1 = a_1W_1,\quad\; Q_l = a_lW_l-\sum_{j=1}^{l-1}C_{2l}^{2j}M_{U,2l-2j}Q_j\quad (2\leqslant l\leqslant k)\text{.} \end{equation*}

2.5. Efficient integrated volatility estimation

We have demonstrated that our frequency domain analysis can be used to estimate the error density, which is difficult to estimate in the time domain. Since frequency domain approaches are unusual in high-frequency financial data analysis, a natural question is whether they can lead to an efficient estimator of the integrated volatility $$\int_0^T\sigma_t^2\,{\rm d} t$$, with $$\sigma_t$$ as in (1). The integrated volatility is a key quantity of interest in high-frequency financial data analysis; it represents the variability of a process over time. It is well known that in cases like ours, where the data are observed with microstructure noise, the integrated volatility cannot be estimated using standard procedures such as the realized volatility $$\sum_{j=1}^n(Y_{t_j}-Y_{t_{j-1}})^2$$, which are dominated by contributions from the noise. In such situations, one way of removing the bias caused by the noise is through multi-scale techniques; see Olhede et al. (2009) for a clear description. Zhang (2006) and Tao et al. (2013) have successfully applied those methods to correct the bias of estimators in the time domain, and Olhede et al. (2009) have proposed a consistent discrete frequency domain estimator in the case where the data are observed at equispaced times. 
Below we show that these techniques can be applied in our continuous frequency domain context too, even if the observation times are not equispaced. Writing $$n$$ times the real part of the empirical characteristic function $$n^{-1}\sum_{j=1}^n\exp\{{\rm i}s(Y_{t_j}-Y_{t_{j-1}})\}$$ as a sum of cosines, we have $$\begin{split} \sum_{j=1}^n\cos\{s(Y_{t_j}-Y_{t_{j-1}})\} =& \sum_{j=1}^n\cos\{s(U_{t_j}-U_{t_{j-1}})\}-\frac{s^2}{2}f_{\tilde{U}}^\mathrm{Ft}(s)\int_0^T\!\!\sigma_t^2\,{\rm d} t+O_{\rm p}(n^{-1/2})\text{.} \end{split}$$ (7) The second term on the right-hand side of (7) contains the integrated volatility, but the first term dominates because its mean is $$nf_{\tilde U}^\mathrm{Ft}(s)$$. This suggests that the integrated volatility could be estimated from $$\sum_{j=1}^n\exp\{{\rm i} s(Y_{t_j}-Y_{t_{j-1}})\}$$ if we could eliminate that first term. This can be done by transferring to the frequency domain the multi-scale technique used by Zhang (2006) and Tao et al. (2013). We define a function $$G(s)$$ that combines the empirical characteristic functions calculated at different sampling frequencies in such a way as to eliminate the first term on the right-hand side of (7) while keeping the second. For $$N=\lfloor (n+1)^{1/2} \rfloor$$, we define $G(s)=\sum_{m=1}^N a_m \hat\phi^{K_m}(s)+\zeta \bigl\{\hat\phi^{K_1}(s)-\hat\phi^{K_2}(s)\bigr\}$ where, as in Zhang (2006), $$K_m=m$$, $$a_m=12K_m(m-N/2-1/2)/\{N(N^2-1)\}$$, $$\zeta=K_1K_2/\{(n+1)(K_2-K_1)\}$$, and $$\hat\phi^{K_m}(s)=K_m^{-1}\sum_{\ell=K_m}^n \exp\{{\rm i} s(Y_{t_\ell}-Y_{t_{\ell-K_m}})\}$$. We could also select $$K_m$$, $$a_m$$ and $$\zeta$$ as in Tao et al. (2013). The following proposition shows that the real part of $$G(s)$$, $$\mathrm{Re}\{G(s)\}$$, can be used to approximate the second term of (7). Proposition 2. 
Under Assumptions $$\rm{1}$$–$$\rm{3}$$, if $$E(U_t^2)<\infty$$, then there exist twice differentiable functions $$\tau_1(s)$$ and $$\tau_2(s)$$ such that for any $$s\in \mathbb{R}$$, $\left|\mathrm{Re}\{G(s)\}+\frac{s^2}{2}f_{\tilde{U}}^{\mathrm{Ft}}(s)\int_0^T\sigma_t^2\,{\rm d} t\right|=\tau_1(s)\, O_{\rm p}(n^{-1/4})+\tau_2(s)\, O_{\rm p}(n^{-1/2}),$ where the terms $$O_{\rm p}(n^{-1/4})$$ and $$O_{\rm p}(n^{-1/2})$$ are independent of $$s$$, and $$\lim_{s\to 0}|\tau_1''(s)|\leqslant C$$ and $$\lim_{s\to 0}|\tau_2''(s)|\leqslant C$$ for some positive constant $$C$$. Since $$G(s)$$ depends only on the data, it can be computed directly. Moreover, we have seen in § 2.4 that we could estimate $$\,f_{\tilde{U}}^{\mathrm{Ft}}(s)$$ by $$\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(s;\xi)=\{\,\hat{f}_{U,1}^{\mathrm{Ft}}(s;\xi)\}^2$$. Finally, although the proposition holds for all $$s\in \mathbb{R}$$, the remainders are smaller when $$\,f_{\tilde{U}}^{\mathrm{Ft}}(s)$$ is close to 1, and $$\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(s;\xi)$$ is also more reliable in that case. Therefore, for $$\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(s;\xi)$$ close to 1, Proposition 2 can be used to compute an estimator of the integrated volatility, $$\int_0^T\sigma_t^2\,{\rm d} t$$. We propose a regression-type approach as follows. For some $$s_1,\ldots,s_m$$ such that each $$\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(s_j;\xi)$$ is close to 1, we consider the fixed design regression problem \begin{equation*} \mathrm{Re}\{G(s_j)\}=-\frac{s_j^2}{2}\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(s_j;\xi)\cdot \beta+\epsilon_j\quad (\,j=1,\ldots,m), \end{equation*} where $$\epsilon_j$$ represents the regression error and $$\beta=\int_0^T\sigma_t^2\,{\rm d} t$$. Applying a linear regression without intercept of $$\mathrm{Re}\{G(s_j)\}$$ on $$-s_j^2\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(s_j;\xi)/2$$, we estimate $$\int_0^T\sigma_t^2\,{\rm d} t$$ by $$\hat \beta$$, the least squares estimator of $$\beta$$. 
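The construction of $$G(s)$$ and the regression estimator $$\hat\beta$$ can be sketched as below, under several assumptions. Only $$\mathrm{Re}\{G(s)\}$$ is computed, because only the real part enters the regression.

- The weights satisfy $$\sum_m a_m=1$$ and $$\sum_m a_m/K_m=0$$. These are the properties that cancel the $$(n+1)f_{\tilde U}^{\mathrm{Ft}}(s)$$ part of the first term of (7); the $$\zeta$$ term removes what is left.
- As an assumption standing in for (4) with $$\xi=t_2-t_1$$ on an equispaced grid, $$\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(s;\xi)$$ is replaced by the average of $$\cos\{s(Y_{t_{j+1}}-Y_{t_j})\}$$.
- The frequencies follow the rule used in § 3.2. They are $$m$$ equispaced points in $$[0,S]$$, where $$S$$ is the largest point of a search grid at which the estimate is at least $$0{\cdot}99$$.

All helper names are hypothetical.

```python
import numpy as np


def msrv_weights(n):
    """K_m = m, weights a_m and edge correction zeta as defined in the text."""
    N = int(np.floor(np.sqrt(n + 1)))
    if N < 2:
        raise ValueError("need at least 4 observations")
    K = np.arange(1, N + 1)
    a = 12.0 * K * (K - N / 2 - 0.5) / (N * (N ** 2 - 1))
    zeta = K[0] * K[1] / ((n + 1) * (K[1] - K[0]))
    return K, a, zeta


def re_G(s, y):
    """Real part of G(s) for an array of frequencies s; y = (Y_{t_0}, ..., Y_{t_n})."""
    s = np.asarray(s, dtype=float)
    y = np.asarray(y, dtype=float)
    K, a, zeta = msrv_weights(y.size - 1)

    def re_phi(k):
        # Re hat phi^{K}(s) = K^{-1} sum_{l=K}^{n} cos{s (Y_{t_l} - Y_{t_{l-K}})}
        d = y[k:] - y[:-k]
        return np.cos(np.outer(s, d)).sum(axis=1) / k

    total = sum(a_m * re_phi(k) for a_m, k in zip(a, K))
    return total + zeta * (re_phi(K[0]) - re_phi(K[1]))


def ecf_increments(s, y):
    """Stand-in for hat f_{tilde U}^{Ft}(s; xi): mean of cos{s (Y_{t_{j+1}} - Y_{t_j})}."""
    d = np.diff(np.asarray(y, dtype=float))
    return np.array([np.cos(si * d).mean() for si in np.atleast_1d(s)])


def beta_hat(y, m=50, level=0.99, n_grid=400):
    """No-intercept least squares fit of Re G(s_j) on -s_j^2 f_hat(s_j) / 2."""
    y = np.asarray(y, dtype=float)
    grid = np.linspace(0.0, 1.0 / np.std(np.diff(y)), n_grid)
    S = grid[ecf_increments(grid, y) >= level].max()   # largest grid point above level
    if S == 0.0:
        raise ValueError("frequency grid too coarse")
    s = np.linspace(0.0, S, m)
    x = -0.5 * s ** 2 * ecf_increments(s, y)
    return float(x @ re_G(s, y) / (x @ x))
```

Consider a constant-volatility path with $$\int_0^T\sigma_t^2\,{\rm d}t=1$$, observed $$n=23\,400$$ times with noise standard deviation $$0{\cdot}01$$. In that case `beta_hat(y)` should return a value near 1. By contrast, the realized volatility is inflated by roughly $$2nE(U^2)\approx4{\cdot}7$$.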
For any fixed $$s\in \mathbb{R}$$, it can be shown that $$|\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(s;\xi)-f_{\tilde{U}}^{\mathrm{Ft}}(s)|=O_{\rm p}(n^{-1/2}+\xi)$$. If we select $$\xi=O(n^{-1/4})$$ in (3), then Proposition 2 still holds if we replace $$\,f_{\tilde{U}}^{\mathrm{Ft}}(s)$$ by $$\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(s;\xi)$$. The next result establishes the convergence rate of $$\hat\beta$$. Theorem 5. Under Assumptions $$\rm{1}$$–$$\rm{3}$$, if $$E(U_t^2)<\infty$$ and we select $$\xi=O(n^{-1/4})$$, then $\hat{\beta}-\int_0^T\sigma_t^2\,{\rm d} t=O_{\rm p}(n^{-1/4})\text{.}$ The convergence rate stated in Theorem 5 is optimal in the sense of Gloter & Jacod (2001) and is the same as for the time domain estimator of Tao et al. (2013). Hence, our frequency domain method is rate-efficient for estimating the integrated volatility.

3. Numerical study

3.1. Practical implementation of the density estimator

To compute the density estimator $$\,\hat f_U$$ in (5), we need to choose the bandwidth $$h$$ and the parameter $$\xi$$. In our problem, doing this is much more complex than for standard nonparametric density estimators. Unlike in that case, we do not have direct access to data from our target $$\,f_U$$, and therefore we cannot use existing smoothing parameter selection methods, all of which require noise-free data. Moreover, unlike in standard nonparametric problems, we do not have access to a formula measuring the distance between $$\,f_U$$ and its estimator. Similar difficulties arise in the classical errors-in-variables problem, where one is interested in a density $$\,f_V$$ but observes only data on $$W=V+\varepsilon$$, where $$V\sim f_V$$ is independent of $$\varepsilon\sim f_\varepsilon$$ with $$\,f_\varepsilon$$ known. Delaigle & Hall (2008) proposed choosing the bandwidth $$h$$ using a method called simulation extrapolation. 
Instead of computing $$h$$ for $$\,f_V$$, they approximate $$h$$ by extrapolating the bandwidths for estimating two other densities, $$\,f_{1}$$ and $$\,f_{2}$$, that are related to $$\,f_V$$. The rationale of the extrapolation scheme is to exploit the analogous relationships between $$\,f_1$$, $$\,f_2$$ and $$\,f_V$$. Our problem is different because our estimator of $$\,f_U$$ is completely different from that in Delaigle & Hall (2008), and we do not know $$\,f_X$$. Therefore we cannot apply their method directly; nevertheless, here we propose a method in the same spirit. In particular, we consider two density functions $$\,f_{1}(\cdot)=2f_U(\surd2\, \cdot)*f_U(\surd2\, \cdot)$$ and $$\,f_{2}=2f_{1}(\surd2\, \cdot)*f_{1}(\surd2\, \cdot)$$. Since $$\,f_U$$ is symmetric, $$\,f_1$$ is the density of $$(U-U')/\surd2$$ for independent $$U,U'\sim f_U$$, and $$\,f_2$$ is the density of $$(V-V')/\surd2$$ for independent $$V,V'\sim f_1$$. Thus the way in which $$\,f_2$$ and $$\,f_1$$ are connected mimics the way in which $$\,f_1$$ and $$\,f_U$$ are connected. As shown in the Appendix, data from both $$\,f_1$$ and $$\,f_2$$ can be made accessible, so that one may perform bandwidth selection for estimating them. We then choose the bandwidth for estimating $$\,f_U$$ by an extrapolation with a ratio adjustment from the bandwidths for estimating $$\,f_1$$ and $$\,f_2$$. The algorithms for the bandwidth selection are given in the Appendix, and Algorithm A1 summarizes the main steps. Specifically, for $$k=1,2$$, our procedure requires the construction of variables $$\Delta Y_{j,\ell}^{_k}$$ and time-points $$t_j^{_k}$$, which are defined in, respectively, Algorithms A2 and A3. Step (c) of Algorithm A1 requires choosing values of $$(h,\xi)$$, say $$(h_{k},\xi_{k})$$ for $$k=1,2$$, for estimating $$\,f_{k}$$ by $$\,\hat f_{k}$$, where $$\,\hat{f}_{k}$$ denotes our density estimator in (5) applied to the $$\Delta Y_{j,\ell}^{_k}$$s. 
The idea is that if we knew $$\,f_{1}$$, we would choose $$(h_{1},\xi_{1})$$ by minimizing the integrated squared error of $$\,\hat{f}_{1}$$, i.e., $$(h_{1},\xi_{1})=\mathop{\arg\min}_{(h,\xi)} \int \{\,\hat f_{1}(x;\xi) - f_{1}(x)\}^2\,{\rm d} x$$. In practice we do not know $$\,f_{1}$$, but we can construct a relatively accurate estimator of it, namely the standard kernel density estimator $$\,\tilde f_{1}$$ of $$\,f_{1}$$ applied to the data $$\Delta_{Y,\,j}=(Y_{t_{j+1}}-Y_{t_{j}})/\surd 2\approx (U_{t_{j+1}}-U_{t_{j}})/\surd 2$$ $$(\,j=0,\ldots,n-1)$$. This is computed by using the Gaussian kernel with bandwidth selected by the method of Sheather & Jones (1991). Using arguments similar to those in Delaigle (2008), under mild conditions, $$\,\tilde{f}_{1}(x)=f_{1}(x)+O_{\rm p}(n^{-2/5})$$, whereas with the best possible choice of $$(h,\xi)$$, $$\:\hat f_{1}(x)=f_{1}(x)+O_{\rm p}(n^{-1/4})$$, where the rate $$n^{-1/4}$$ cannot be improved. Thus, $$\,\tilde f_{1}$$ converges to $$\,f_{1}$$ faster than $$\,\hat f_{1}$$ does. This motivates us to approximate $$(h_{1},\xi_{1})$$ defined above by $$(h_{1},\xi_{1})=\mathop{\arg\min}_{(h,\xi)} \int \{\,\hat f_{1}(x;\xi) -\tilde f_{1}(x)\}^2\,{\rm d} x\text{.}$$ (8) Arguments parallel to those in Delaigle & Hall (2008) suggest that it is more important to extrapolate the bandwidth $$h$$ than $$\xi$$. Motivated by their results, we take $$\xi_{2}=\xi_{1}$$. To choose $$h_{2}$$, let $$\,\tilde f_{2}$$ be the standard kernel density estimator with the Gaussian kernel and bandwidth selected by the method of Sheather & Jones (1991), applied to the data $$\Delta_{Y,\,j,2}= \big\{\Delta_{Y,\,j}-\Delta_{Y,k(\,j)}\big\}/\surd 2$$, where $$k(\,j)$$ is chosen at random from $$0,\ldots,n-1$$. We construct $$\Delta_{Y,\,j,2}$$ in this way, rather than as $$(\Delta_{Y,\,j}-\Delta_{Y,\,j+2})/\surd 2$$, to avoid accumulating residual $$\,X_t$$ effects. 
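The pseudo-samples behind $$\,\tilde f_1$$ and $$\,\tilde f_2$$ can be generated as sketched below. The sketch rests on three assumptions.

- The reference estimator uses Silverman's rule of thumb as a simple stand-in for the Sheather & Jones (1991) bandwidth.
- The optional subsetting argument implements one reading of the one-quarter suggestion the text makes for unequally spaced times.
- `extrapolated_bandwidth` is the ratio adjustment $$\hat h=h_1^2/h_2$$ used in the final step.

The minimizations (8) and (9) need the deconvolution estimator (5) and are not reproduced here. All function names are hypothetical.

```python
import numpy as np


def pseudo_samples(y, t=None, keep_fraction=1.0, rng=None):
    """Build Delta_{Y,j} (mimicking f_1) and Delta_{Y,j,2} (mimicking f_2).

    If t is given and keep_fraction < 1, only the Delta_{Y,j} with the smallest
    gaps t_{j+1} - t_j are kept (hypothetical reading of the one-quarter rule).
    """
    rng = np.random.default_rng() if rng is None else rng
    d1 = np.diff(np.asarray(y, dtype=float)) / np.sqrt(2.0)
    if t is not None and keep_fraction < 1.0:
        gaps = np.diff(np.asarray(t, dtype=float))
        n_keep = max(1, int(keep_fraction * d1.size))
        d1 = d1[np.argsort(gaps)[:n_keep]]
    # k(j) drawn at random rather than j + 2, to avoid accumulating X_t effects
    k = rng.integers(0, d1.size, size=d1.size)
    d2 = (d1 - d1[k]) / np.sqrt(2.0)
    return d1, d2


def kde_gaussian(data, x):
    """Gaussian-kernel density estimate at x (Silverman bandwidth as a stand-in)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    iqr = np.subtract(*np.percentile(data, [75, 25]))
    h = 0.9 * min(data.std(ddof=1), iqr / 1.34) * n ** (-0.2)
    dens = [np.exp(-0.5 * ((xi - data) / h) ** 2).sum() for xi in np.atleast_1d(x)]
    return np.array(dens) / (n * h * np.sqrt(2.0 * np.pi))


def extrapolated_bandwidth(h1, h2):
    """Ratio adjustment hat h = h_1^2 / h_2."""
    return h1 ** 2 / h2
```

For pure noise $$U\sim N(0,1)$$, both pseudo-samples have unit variance, as $$f_1$$ and $$f_2$$ should. This is because the $$\surd2$$ scaling keeps the variance fixed at each step.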
Then we take $$h_{2}=\mathop{\arg\min}_{h} \int \{\,\hat f_{2}(x;\xi_{1}) - \tilde f_{2}(x)\}^2\,{\rm d} x\text{.}$$ (9) Since $$\,\tilde f_1$$ and $$\,\tilde f_2$$ converge faster than $$\,\hat{f}_1$$ and $$\,\hat{f}_2$$, they can be computed from fewer data than the latter. Therefore, when the time-points are widely unequally spaced, to compute the $$\,\tilde f_k$$s we suggest using only a fraction, say one-quarter, of the $$\Delta_{Y,\,j}$$, namely those corresponding to the smallest $$t_{j+1}-t_j$$ values; that is, we use fewer but more accurate data for computing the $$\,\tilde f_k$$s. Finally, as described in step (d) of Algorithm A1, we obtain our bandwidth for estimating $$\,f_U$$ by an extrapolation with a ratio adjustment, $$\hat h = h_{1}^2/ h_{2}$$, and we take $$\hat\xi=\xi_1$$. This method is not guaranteed to give the best possible bandwidth for estimating $$\,f_U$$, but it provides a sensible approximation for a problem that seems otherwise very hard, if not impossible, to solve. Theorem 1 implies that we have considerable flexibility in choosing $$h$$. However, we cannot tell whether our bandwidth lies in the optimal range without knowing the exact orders of $$h_1$$ and $$h_2$$, and determining these orders would require deriving complex second-order asymptotic results.

3.2. Simulations

We applied our method to data simulated from stochastic volatility models. We generated the data $$Y_{t_0},\ldots,Y_{t_n}$$ as in (2). Following the convention that a financial year has 252 active days, we took $$t\in[0,T]$$ with $$T=1/252$$, which represents one day of financial activity. We took time-points every $$\Delta s$$ seconds, where $$\Delta s=30$$, 5 or 1. Using the convention of $$6{\cdot}5$$ business hours in a trading day, this means that we took the $$t_j$$s to be equally spaced by $$\Delta s/(252\times60\times60\times6{\cdot}5)$$ and that $$n$$ was equal to $$60\times60\times6{\cdot}5/\Delta s$$. 
We generated the microstructure noise $$U_t$$ according to a normal or scaled $$t$$ distribution, and for the $$\,X_t$$ we used the Heston model ${\rm d} X_t=\sigma_t\,{\rm d} B_t,\quad {\rm d}\sigma_t^2 = \kappa(\tau-\sigma_t^2)\,{\rm d} t+\gamma \sigma_t\,{\rm d} W_t\,,$ where $$E({\rm d} B_t\, {\rm d} W_t)=\rho\, {\rm d} t$$ and $$\kappa$$, $$\tau$$, $$\gamma$$ and $$\rho$$ are parameters. As in Aït-Sahalia & Yu (2009), we set the drift part of $$\,X_t$$ to zero, since the effect of the drift function is asymptotically negligible; see, for example, Xiu (2010). We considered two models, with parameter values similar to those used by Aït-Sahalia & Yu (2009), which reflect practical scenarios in finance (see also Xiu, 2010; Liu & Tang, 2014): (i) $$(\kappa, \tau, \gamma, \rho)=(6,0{\cdot}16,0{\cdot}5,-0{\cdot}6)$$; (ii) $$(\kappa, \tau, \gamma, \rho)=(4, 0{\cdot}09, 0{\cdot}3, -0{\cdot}75)$$. In each case we took $$\,X_0=\log(100)$$ and considered $$U_t\sim N(0,\sigma_U^2)$$ and $$U_t\sim \sigma_U\, t(8)$$, with $$\sigma_U=0{\cdot}001$$ or $$\sigma_U=0{\cdot}005$$. Typical $$\,X_t$$ and $$Y_t$$ paths for each model, plotted in the Supplementary Material, show that the $$\,X_t$$ paths have smaller variation in model (ii) than in model (i). The $$\,X_t$$ paths with smaller variation have less nuisance impact on estimators of $$U_t$$-related quantities. Thus, estimating the moments and density of $$U_t$$ should be easier in model (ii). In each setting, we generated 1000 samples of the form $$Y_{t_0},\ldots,Y_{t_n}$$ and applied our estimator of the density $$\,f_U$$ to each sample, thus obtaining 1000 density estimators $$\,\hat f_U$$ computed as in (5). We chose the smoothing parameters as in § 3.1, and used the sinc kernel defined below (5). While this kernel guarantees optimal theoretical properties, in practice it produces negative wiggles in the tails, which we truncate to zero since $$\,f_U$$ is a density. 
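A minimal data-generating sketch for this design follows. It uses an Euler scheme with full truncation of the variance. That discretization is our choice and not necessarily the one behind the reported results. The sketch also assumes that the initial variance equals $$\tau$$, which the text does not specify. The time grid reproduces the arithmetic above: $$n=60\times60\times6{\cdot}5/\Delta s$$, which gives 780, 4680 and 23 400 for $$\Delta s=30$$, 5 and 1. The spacing is $$\Delta s/(252\times60\times60\times6{\cdot}5)$$, so that $$T=1/252$$. The function name is hypothetical.

```python
import numpy as np

SECONDS_PER_DAY = 60 * 60 * 6.5   # 6.5 business hours
DAYS_PER_YEAR = 252


def simulate_heston_noisy(delta_s, kappa, tau, gamma, rho, sigma_u,
                          noise="normal", x0=np.log(100.0), v0=None, rng=None):
    """Euler (full-truncation) sketch of the Heston model plus additive noise.

    Returns times t (in years), latent X, observed Y = X + U, and a Riemann-sum
    approximation of int sigma_t^2 dt. v0 defaults to tau (an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(SECONDS_PER_DAY / delta_s))
    dt = delta_s / (DAYS_PER_YEAR * SECONDS_PER_DAY)
    t = dt * np.arange(n + 1)
    x = np.empty(n + 1)
    v = np.empty(n + 1)
    x[0] = x0
    v[0] = tau if v0 is None else v0
    z1 = rng.standard_normal(n)                                # drives B_t
    z2 = rho * z1 + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)  # drives W_t
    for j in range(n):
        vp = max(v[j], 0.0)                                    # full truncation
        x[j + 1] = x[j] + np.sqrt(vp * dt) * z1[j]
        v[j + 1] = v[j] + kappa * (tau - vp) * dt + gamma * np.sqrt(vp * dt) * z2[j]
    if noise == "normal":
        u = sigma_u * rng.standard_normal(n + 1)
    else:
        # scaled Student t(8); note var{sigma_u t(8)} = 4 sigma_u^2 / 3
        u = sigma_u * rng.standard_t(8, size=n + 1)
    iv = np.sum(np.maximum(v[:-1], 0.0)) * dt
    return t, x, x + u, iv
```

For instance, `simulate_heston_noisy(30, 6, 0.16, 0.5, -0.6, 0.005)` generates one path under model (i) with $$\sigma_U=0{\cdot}005$$ and $$n=780$$.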
In the Supplementary Material, we show the results obtained when using the Gaussian kernel, which suggest that overall the sinc kernel works better, but the Gaussian kernel produces more attractive estimators in the tails. In cases where the sample has ties, for example when the sample size is large and the data are observed with only a few significant digits, as in our real-data example, the wiggles of the sinc kernel cause it to perform poorly, and significantly better results can be obtained by using the Gaussian kernel; see § 3.3. For each estimator we computed the integrated squared error $$\int (\,\hat f_U-f_U)^2$$, and Table 1 reports their median and first and third quartiles. In Fig. 1 we plot, for selected settings with normal errors, the estimated curves $$\,\hat f_U$$ computed from the samples corresponding to those three quartiles. Our results indicate that our density estimator works well. For a given setting, error densities with larger variances are easier to estimate. Figure 1 shows that our estimator improves as the sample size increases, that is, as $$\Delta s$$ decreases. The estimated densities are better in model (ii) than in model (i), as expected. While it is difficult to compare estimators of different target densities, the results suggest that the difficulty in estimating the error densities depends more on the smoothness of $$\,X_t$$ than on the error type. This reflects the fact that the smoothness of the error density has no first-order impact on the quality of estimators, as indicated by Theorems 1 and 2.

Fig. 1.
The estimator $$\,\hat f_U(x)$$ in (5) in the case of normal errors, for three samples corresponding to the first (- - -), second ($$\cdots$$) and third (-$$\cdot$$-) quartiles of the integrated squared errors of estimators computed from data under model (i) with $$\sigma_U=0{\cdot}005$$ (left panels) and with $$\sigma_U=0{\cdot}001$$ (middle panels), and under model (ii) with $$\sigma_U=0{\cdot}001$$ (right panels), when $$\Delta s=30$$ (upper panels) and when $$\Delta s=5$$ (lower panels). In each panel the solid curve depicts the true $$\,f_U(x)$$.

Table 1.
Median integrated squared error [first quartile, third quartile] of $$\,\hat f_U$$ in (5), calculated from $$1000$$ simulated samples from models $$\rm (i)$$ and $$\rm (ii)$$

| $$\Delta s$$ | $$\sigma_U$$ | Normal errors, model (i) | Normal errors, model (ii) | Student errors, model (i) | Student errors, model (ii) |
|---|---|---|---|---|---|
| 30 | 0·005 | 0·29 [0·13, 0·59] | 0·26 [0·10, 0·58] | 0·37 [0·19, 0·79] | 0·36 [0·16, 0·76] |
| 30 | 0·001 | 9·98 [5·39, 16·6] | 5·63 [2·90, 10·2] | 9·82 [5·8, 15·59] | 6·14 [3·28, 10·4] |
| 5 | 0·005 | 0·10 [0·04, 0·27] | 0·10 [0·04, 0·29] | 0·12 [0·06, 0·27] | 0·14 [0·07, 0·33] |
| 5 | 0·001 | 1·46 [0·64, 3·21] | 1·02 [0·41, 2·18] | 1·59 [0·78, 2·97] | 1·16 [0·56, 2·19] |
| 1 | 0·005 | 0·04 [0·01, 0·14] | 0·04 [0·01, 0·15] | 0·05 [0·02, 0·14] | 0·05 [0·02, 0·13] |
| 1 | 0·001 | 0·32 [0·12, 1·01] | 0·25 [0·10, 0·82] | 0·42 [0·20, 0·95] | 0·36 [0·15, 0·86] |

We also applied the moment estimators from § 2.4 to data simulated from a rescaled version of our two models; the rescaling avoids working with numbers that would otherwise all be numerically rounded to zero. Specifically, we replaced $$(\tau, \gamma, \sigma^2_U,X_0)$$ in our models by $$(c^2\tau, c\gamma, c^2 \sigma^2_U,cX_0)$$, where $$c=100$$. We present the results in Table 2. These results indicate that, as expected, performance improves as the sample size increases. Moreover, the performance is best for lower-order moments, for higher noise levels, and for model (ii). For a given error variance, the moments are easier to recover when the errors have a rescaled Student distribution than when they have a normal distribution.

Table 2.
Bias $$(\times\,10^2)$$, with standard deviation $$(\times\,10^2)$$ in parentheses, of $$(\hat{M}_{U,{2k}}-M_{U,{2k}})/M_{U,{2k}}$$ of our estimator $$\hat{M}_{U,2k}$$ in (6), calculated from $$1000$$ simulated samples from models $${\rm (i)}$$ and $${\rm (ii)}$$ and computed using $$\xi=t_2-t_1$$, for $$k=1$$ and $$2$$  $$\Delta s=30$$ $$\Delta s=5$$ $$\Delta s=1$$ Errors Model $$\sigma_U$$ $$k=1$$ $$k=2$$ $$k=1$$ $$k=2$$ $$k=1$$ $$k=2$$ Normal (i) $$0$$$$\cdot$$$$005$$ 1$$\cdot$$7 (6$$\cdot$$4) 3 (18) 0$$\cdot$$27 (2$$\cdot$$5) 0$$\cdot$$53 (7) 0$$\cdot$$08 (1$$\cdot$$1) 0$$\cdot$$15 (3$$\cdot$$3) $$0$$$$\cdot$$$$001$$ 41 (15) 99 (51) 6$$\cdot$$8 (3$$\cdot$$6) 14 (9$$\cdot$$6) 1$$\cdot$$4 (1$$\cdot$$2) 2$$\cdot$$8 (3$$\cdot$$5) (ii) $$0$$$$\cdot$$$$005$$ 1 (6$$\cdot$$3) 1$$\cdot$$6 (18) 0$$\cdot$$15 (2$$\cdot$$5) 0$$\cdot$$29 (7) 0$$\cdot$$06 (1$$\cdot$$1) 0$$\cdot$$11 (3$$\cdot$$3) $$0$$$$\cdot$$$$001$$ 23 (10) 51 (31) 3$$\cdot$$8 (2$$\cdot$$9) 7$$\cdot$$8 (8$$\cdot$$1) 0$$\cdot$$79 (1$$\cdot$$2) 1$$\cdot$$6 (3$$\cdot$$4) Student (i) $$0$$$$\cdot$$$$005$$ 1$$\cdot$$4 (7$$\cdot$$7) 0$$\cdot$$5 (37) 0$$\cdot$$28 (3) 0$$\cdot$$13 (17) 0$$\cdot$$03 (1$$\cdot$$4) $$-$$0$$\cdot$$2 (9$$\cdot$$9) $$0$$$$\cdot$$$$001$$ 30 (14) 45 (48) 5$$\cdot$$2 (3$$\cdot$$6) 6$$\cdot$$9 (18) 1 (1$$\cdot$$4) 1$$\cdot$$1 (9$$\cdot$$9) (ii) $$0$$$$\cdot$$$$005$$ 0$$\cdot$$9 (7$$\cdot$$7) $$-$$0$$\cdot$$2 (37) 0$$\cdot$$2 (3) 0$$\cdot$$01 (17) 0$$\cdot$$01 (1$$\cdot$$4) $$-$$0$$\cdot$$23 (9$$\cdot$$9) $$0$$$$\cdot$$$$001$$ 17 (10) 23 (43) 3 (3$$\cdot$$2) 3$$\cdot$$8 (17) 0$$\cdot$$57 (1$$\cdot$$4) 0$$\cdot$$49 (9$$\cdot$$9) $$\Delta s=30$$ $$\Delta s=5$$ $$\Delta s=1$$ Errors Model $$\sigma_U$$ $$k=1$$ $$k=2$$ $$k=1$$ $$k=2$$ $$k=1$$ $$k=2$$ Normal (i) $$0$$$$\cdot$$$$005$$ 1$$\cdot$$7 (6$$\cdot$$4) 3 (18) 0$$\cdot$$27 (2$$\cdot$$5) 0$$\cdot$$53 (7) 0$$\cdot$$08 (1$$\cdot$$1) 0$$\cdot$$15 (3$$\cdot$$3) $$0$$$$\cdot$$$$001$$ 41 (15) 99 (51) 6$$\cdot$$8 (3$$\cdot$$6) 14 (9$$\cdot$$6) 
1$$\cdot$$4 (1$$\cdot$$2) 2$$\cdot$$8 (3$$\cdot$$5) (ii) $$0$$$$\cdot$$$$005$$ 1 (6$$\cdot$$3) 1$$\cdot$$6 (18) 0$$\cdot$$15 (2$$\cdot$$5) 0$$\cdot$$29 (7) 0$$\cdot$$06 (1$$\cdot$$1) 0$$\cdot$$11 (3$$\cdot$$3) $$0$$$$\cdot$$$$001$$ 23 (10) 51 (31) 3$$\cdot$$8 (2$$\cdot$$9) 7$$\cdot$$8 (8$$\cdot$$1) 0$$\cdot$$79 (1$$\cdot$$2) 1$$\cdot$$6 (3$$\cdot$$4) Student (i) $$0$$$$\cdot$$$$005$$ 1$$\cdot$$4 (7$$\cdot$$7) 0$$\cdot$$5 (37) 0$$\cdot$$28 (3) 0$$\cdot$$13 (17) 0$$\cdot$$03 (1$$\cdot$$4) $$-$$0$$\cdot$$2 (9$$\cdot$$9) $$0$$$$\cdot$$$$001$$ 30 (14) 45 (48) 5$$\cdot$$2 (3$$\cdot$$6) 6$$\cdot$$9 (18) 1 (1$$\cdot$$4) 1$$\cdot$$1 (9$$\cdot$$9) (ii) $$0$$$$\cdot$$$$005$$ 0$$\cdot$$9 (7$$\cdot$$7) $$-$$0$$\cdot$$2 (37) 0$$\cdot$$2 (3) 0$$\cdot$$01 (17) 0$$\cdot$$01 (1$$\cdot$$4) $$-$$0$$\cdot$$23 (9$$\cdot$$9) $$0$$$$\cdot$$$$001$$ 17 (10) 23 (43) 3 (3$$\cdot$$2) 3$$\cdot$$8 (17) 0$$\cdot$$57 (1$$\cdot$$4) 0$$\cdot$$49 (9$$\cdot$$9) Table 2. Bias $$(\times\,10^2)$$, with standard deviation $$(\times\,10^2)$$ in parentheses, of $$(\hat{M}_{U,{2k}}-M_{U,{2k}})/M_{U,{2k}}$$ of our estimator $$\hat{M}_{U,2k}$$ in (6), calculated from $$1000$$ simulated samples from models $${\rm (i)}$$ and $${\rm (ii)}$$ and computed using $$\xi=t_2-t_1$$, for $$k=1$$ and $$2$$  $$\Delta s=30$$ $$\Delta s=5$$ $$\Delta s=1$$ Errors Model $$\sigma_U$$ $$k=1$$ $$k=2$$ $$k=1$$ $$k=2$$ $$k=1$$ $$k=2$$ Normal (i) $$0$$$$\cdot$$$$005$$ 1$$\cdot$$7 (6$$\cdot$$4) 3 (18) 0$$\cdot$$27 (2$$\cdot$$5) 0$$\cdot$$53 (7) 0$$\cdot$$08 (1$$\cdot$$1) 0$$\cdot$$15 (3$$\cdot$$3) $$0$$$$\cdot$$$$001$$ 41 (15) 99 (51) 6$$\cdot$$8 (3$$\cdot$$6) 14 (9$$\cdot$$6) 1$$\cdot$$4 (1$$\cdot$$2) 2$$\cdot$$8 (3$$\cdot$$5) (ii) $$0$$$$\cdot$$$$005$$ 1 (6$$\cdot$$3) 1$$\cdot$$6 (18) 0$$\cdot$$15 (2$$\cdot$$5) 0$$\cdot$$29 (7) 0$$\cdot$$06 (1$$\cdot$$1) 0$$\cdot$$11 (3$$\cdot$$3) $$0$$$$\cdot$$$$001$$ 23 (10) 51 (31) 3$$\cdot$$8 (2$$\cdot$$9) 7$$\cdot$$8 (8$$\cdot$$1) 0$$\cdot$$79 (1$$\cdot$$2) 1$$\cdot$$6 (3$$\cdot$$4) Student 
(i) $$0$$$$\cdot$$$$005$$ 1$$\cdot$$4 (7$$\cdot$$7) 0$$\cdot$$5 (37) 0$$\cdot$$28 (3) 0$$\cdot$$13 (17) 0$$\cdot$$03 (1$$\cdot$$4) $$-$$0$$\cdot$$2 (9$$\cdot$$9) $$0$$$$\cdot$$$$001$$ 30 (14) 45 (48) 5$$\cdot$$2 (3$$\cdot$$6) 6$$\cdot$$9 (18) 1 (1$$\cdot$$4) 1$$\cdot$$1 (9$$\cdot$$9) (ii) $$0$$$$\cdot$$$$005$$ 0$$\cdot$$9 (7$$\cdot$$7) $$-$$0$$\cdot$$2 (37) 0$$\cdot$$2 (3) 0$$\cdot$$01 (17) 0$$\cdot$$01 (1$$\cdot$$4) $$-$$0$$\cdot$$23 (9$$\cdot$$9) $$0$$$$\cdot$$$$001$$ 17 (10) 23 (43) 3 (3$$\cdot$$2) 3$$\cdot$$8 (17) 0$$\cdot$$57 (1$$\cdot$$4) 0$$\cdot$$49 (9$$\cdot$$9) $$\Delta s=30$$ $$\Delta s=5$$ $$\Delta s=1$$ Errors Model $$\sigma_U$$ $$k=1$$ $$k=2$$ $$k=1$$ $$k=2$$ $$k=1$$ $$k=2$$ Normal (i) $$0$$$$\cdot$$$$005$$ 1$$\cdot$$7 (6$$\cdot$$4) 3 (18) 0$$\cdot$$27 (2$$\cdot$$5) 0$$\cdot$$53 (7) 0$$\cdot$$08 (1$$\cdot$$1) 0$$\cdot$$15 (3$$\cdot$$3) $$0$$$$\cdot$$$$001$$ 41 (15) 99 (51) 6$$\cdot$$8 (3$$\cdot$$6) 14 (9$$\cdot$$6) 1$$\cdot$$4 (1$$\cdot$$2) 2$$\cdot$$8 (3$$\cdot$$5) (ii) $$0$$$$\cdot$$$$005$$ 1 (6$$\cdot$$3) 1$$\cdot$$6 (18) 0$$\cdot$$15 (2$$\cdot$$5) 0$$\cdot$$29 (7) 0$$\cdot$$06 (1$$\cdot$$1) 0$$\cdot$$11 (3$$\cdot$$3) $$0$$$$\cdot$$$$001$$ 23 (10) 51 (31) 3$$\cdot$$8 (2$$\cdot$$9) 7$$\cdot$$8 (8$$\cdot$$1) 0$$\cdot$$79 (1$$\cdot$$2) 1$$\cdot$$6 (3$$\cdot$$4) Student (i) $$0$$$$\cdot$$$$005$$ 1$$\cdot$$4 (7$$\cdot$$7) 0$$\cdot$$5 (37) 0$$\cdot$$28 (3) 0$$\cdot$$13 (17) 0$$\cdot$$03 (1$$\cdot$$4) $$-$$0$$\cdot$$2 (9$$\cdot$$9) $$0$$$$\cdot$$$$001$$ 30 (14) 45 (48) 5$$\cdot$$2 (3$$\cdot$$6) 6$$\cdot$$9 (18) 1 (1$$\cdot$$4) 1$$\cdot$$1 (9$$\cdot$$9) (ii) $$0$$$$\cdot$$$$005$$ 0$$\cdot$$9 (7$$\cdot$$7) $$-$$0$$\cdot$$2 (37) 0$$\cdot$$2 (3) 0$$\cdot$$01 (17) 0$$\cdot$$01 (1$$\cdot$$4) $$-$$0$$\cdot$$23 (9$$\cdot$$9) $$0$$$$\cdot$$$$001$$ 17 (10) 23 (43) 3 (3$$\cdot$$2) 3$$\cdot$$8 (17) 0$$\cdot$$57 (1$$\cdot$$4) 0$$\cdot$$49 (9$$\cdot$$9) Finally, we applied our volatility estimator in each of our four settings and compared it with the estimator of Zhang 
(2006). For our method we took $$\xi=t_2-t_1$$, and to choose $$s_1,\ldots,s_m$$ we took $$m=50$$ equispaced points between 0 and $$S$$, where $$S$$ is the largest number for which $$\,\hat{f}_{\tilde{U}}^{\mathrm{Ft}}(S;\xi)\geqslant 0{\cdot}99$$. The results are presented in Table 3. In the Supplementary Material we also report, for both estimators, the first three quartiles of 100 times the relative absolute deviation, $$|\hat \beta-\beta|/\beta$$. Together, these results indicate that the two estimators perform similarly overall. When the error variance was large, our estimator tended to work a little better; when it was small, the estimator of Zhang (2006) worked a little better. Specifically, when the error variance was small, their estimator was a little less biased, but ours was a little less variable. As expected, the performance of both estimators improved as the sample size increased. Both estimators worked a little better when the error variance was small: they were a little more biased but considerably less variable. This is because the integrated volatility is a quantity related to $$X_t$$, and is therefore easier to estimate when the errors contaminating $$X_t$$ are smaller. Depending on the situation, either the Student errors or the normal errors yielded better results.

Table 3.
Bias $$(\times\,10^2)$$, with standard deviation $$(\times\,10^2)$$ in parentheses, of $$(\hat \beta-\beta)/\beta$$ using our estimator of $$\int_0^T \sigma_t^2\,{\rm d}t$$ from §2.5, denoted by Ours, and the estimator of Zhang (2006), denoted by Zhang, calculated from $$1000$$ simulated samples from models $${\rm (i)}$$ and $${\rm (ii)}$$.

| $$\sigma_U$$ | $$\Delta s$$ | $$\hat\beta$$ | Normal errors, model (i) | Normal errors, model (ii) | Student errors, model (i) | Student errors, model (ii) |
|---|---|---|---|---|---|---|
| 0·005 | 30 | Ours | −1·12 (25·9) | 0·42 (29·6) | −1·50 (28·0) | 0·71 (33·6) |
| | | Zhang | −2·87 (26·0) | −2·91 (29·7) | −3·89 (28·2) | −3·74 (33·7) |
| | 5 | Ours | −1·13 (17·5) | −0·65 (20·1) | −0·78 (18·3) | −0·53 (21·8) |
| | | Zhang | −1·38 (17·5) | −1·18 (20·1) | −1·15 (18·3) | −1·25 (21·8) |
| | 1 | Ours | −0·61 (11·2) | −0·78 (12·7) | −0·59 (12·1) | −0·59 (14·4) |
| | | Zhang | −0·64 (11·2) | −0·87 (12·7) | −0·65 (12·1) | −0·73 (14·5) |
| 0·001 | 30 | Ours | −5·29 (21·4) | −4·38 (21·8) | −6·25 (21·6) | −5·40 (22·1) |
| | | Zhang | −2·80 (22·3) | −2·82 (22·4) | −4·25 (22·5) | −4·21 (22·7) |
| | 5 | Ours | −3·17 (14·4) | −2·54 (14·7) | −2·30 (14·5) | −1·80 (14·7) |
| | | Zhang | −1·73 (14·8) | −1·71 (14·9) | −1·17 (14·8) | −1·16 (14·9) |
| | 1 | Ours | −1·02 (9·62) | −0·72 (9·71) | −1·15 (9·67) | −0·91 (9·76) |
| | | Zhang | −0·31 (9·75) | −0·32 (9·79) | −0·63 (9·76) | −0·62 (9·80) |

3.3.
Real-data analysis

We applied our procedure to the high-frequency price data of Microsoft Corporation for the ten trading days from 19 March to 2 April 2013, available from the Trade and Quote database. We took the $$Y_t$$s to be the log prices. Following Barndorff-Nielsen et al. (2011), we pre-processed the data by deleting entries with zero or negative prices, deleting entries with a negative correlation indicator, deleting entries with a letter code in COND except for E or F, deleting entries outside the period 9:30 a.m. to 4:00 p.m., and using the median price when there were multiple entries at the same time. In this example the sample size is very large, which entails a large number of ties among the $$Y_t$$s. As a result, the sinc kernel produced overly wiggly estimates, treating the data as if they came from a multimodal density with modes located at the ties. The oscillation problems of this kernel are well known, but here they were exacerbated by the ties, so we used the standard Gaussian kernel, which is less affected by such issues. The only small adjustment needed was to break the ties by adding a small perturbation $$\epsilon_j\sim N(0,a_j^2)$$ to the $$\Delta_{Y,\,j}$$s when computing the bandwidth of the standard kernel estimator $$\,\tilde f_{1}(x)$$ used in our bandwidth selection procedure, where $$2a_j$$ was the larger of the distances between $$\Delta_{Y,\,j}$$ and its nearest smaller and nearest larger neighbours. Figure 2 shows the error densities estimated by our method for three trading days in 2013: 20 March, 28 March and 2 April. In this example, the magnitude of the errors is about $$10^{-3}$$ times that of the log prices themselves, but their aggregated impact on quantities such as the integrated volatility is substantial.
For example, for those three trading days the realized volatility was respectively $$3{\cdot}5$$, $$4{\cdot}5$$ and $$3{\cdot}5$$ times $$10^{-4}$$, values dominated by the contributions from the errors. Indeed, for the same days, our error-corrected estimator gave respectively $$0{\cdot}5$$, $$0{\cdot}5$$ and $$0{\cdot}3$$ times $$10^{-4}$$, and the estimator of Zhang (2006) gave respectively $$0{\cdot}5$$, $$0{\cdot}6$$ and $$0{\cdot}4$$ times $$10^{-4}$$.

Figure 2. Estimated densities of the errors contaminating the log prices of the Microsoft Corporation data for 20 March (solid), 28 March (dotted) and 2 April (dashed) 2013.

Interestingly, even over this short period of ten trading days, the error distributions were quite different, especially in their tails. Since heavier tails can be linked with higher levels of variation and may affect the properties of the moments, it would be interesting to investigate further the tails of the error distributions of high-frequency financial data. Different tail behaviour may also be associated with different trading or market conditions on different days. For example, the error distributions may behave differently on days when the whole market, or certain industry sectors such as IT, are booming. Further empirical investigation connecting the error distributions with actual market conditions could therefore help to improve our understanding of the microstructure noise in high-frequency financial data.

4. Discussion

As a first attempt at a deconvolution approach for a contaminated, slowly varying continuous process, our error density estimator and the technical analysis in this paper assume that the distribution of the errors in the high-frequency data is symmetric.
Our method and analysis have shown potential for revealing various features of the error distributions in high-frequency financial data. Relaxing the symmetry assumption is possible and of interest, but challenging; the corresponding frequency domain method is under investigation. Our analysis focuses on the univariate setting of an individual stock or trading instrument. An interesting open problem is to extend the frequency domain analysis to the multivariate context, which would help reveal the dependence structure between the errors contaminating different stock prices. However, this problem is significantly more difficult because of issues such as high data dimensionality and asynchronous trading records in multivariate high-frequency data. We hope to address it in the future.

Acknowledgement

We are grateful to the editor, an associate editor and two referees for their helpful suggestions. Chang was supported in part by the Fundamental Research Funds for the Central Universities of China, the National Natural Science Foundation of China, and the Center of Statistical Research and the Joint Lab of Data Science and Business Intelligence at Southwestern University of Finance and Economics. Delaigle was supported by a Future Fellowship and a Discovery Project from the Australian Research Council. Hall was supported by a Laureate Fellowship and a Discovery Project from the Australian Research Council. Tang acknowledges support from the U.S. National Science Foundation. The main part of this work was completed while Jinyuan Chang was a postdoc under Peter Hall’s supervision. Peter was a generous mentor and friend with a warm heart. He will be greatly missed.

Supplementary material

Supplementary material available at Biometrika online includes a description of the simulation extrapolation bandwidth of Delaigle & Hall (2008), additional simulation results, a construction of confidence regions for the moment estimators of §2.4, and all the proofs.
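
The pseudo-difference constructions of Algorithms A2 and A3 in the Appendix, and the bandwidth rule in step (d) of Algorithm A1, can be sketched in a few lines of Python with NumPy. This is a minimal sketch under our own assumptions, not the authors' implementation: the observations $$Y_{t_0},\ldots,Y_{t_n}$$ are assumed to be stored in a one-dimensional array, and the helper names (`pseudo_differences`, `window_sum`, `windows_disjoint`, `extrapolated_bandwidth`) are hypothetical. Only the $$\Delta Y_{j,\ell}^{_k}$$ are built; the window-averaged times $$t_j^{_k}$$ and the computation of $$h_k$$ and $$\xi_k$$ via (5), (8) and (9) are not reproduced. The `windows_disjoint` check encodes why lag 1 in Algorithm A2, and lags 1, 2 and 3 in Algorithm A3, need special windows: the two averaged error terms in a difference must come from distinct observations so that they are independent.

```python
import numpy as np

# Sketch of the pseudo-differences of Algorithms A2 and A3 (hypothetical helper
# names, not the authors' implementation). y holds Y_{t_0}, ..., Y_{t_n}.
# Offsets k (relative to j) of the observations combined into one pseudo-observation:
LEVEL1_OFFSETS = {1: (0, 2)}        # A2(b): lag 1 uses Y_{t_j} + Y_{t_{j+2}}
LEVEL1_DEFAULT = (0, 1)             # A2(a): lags > 1 use Y_{t_j} + Y_{t_{j+1}}
LEVEL2_OFFSETS = {1: (0, 2, 4, 6), 2: (0, 1, 4, 5), 3: (0, 1, 2, 6)}  # A3, Steps 2 and 3
LEVEL2_DEFAULT = (0, 1, 2, 3)       # A3, Step 1: lags >= 4


def windows_disjoint(offsets, lag):
    """True if the windows starting at j and at j + lag share no observation."""
    s = set(offsets)
    return s.isdisjoint({k + lag for k in s})


def window_sum(y, offsets, scale):
    """c[j] = sum_{k in offsets} y[j + k] / scale, for every j with all indices in range."""
    y = np.asarray(y, dtype=float)
    length = len(y) - max(offsets)
    return sum(y[k:k + length] for k in offsets) / scale


def pseudo_differences(y, lag, level):
    """Delta Y^{level}_{j,lag} for j = 0, 1, ..., returned as a NumPy array."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    if level == 1:    # (U + U') / sqrt(2) keeps the variance of U
        offsets, scale = LEVEL1_OFFSETS.get(lag, LEVEL1_DEFAULT), np.sqrt(2.0)
    elif level == 2:  # (U1 + U2 + U3 + U4) / 2 keeps the variance of U
        offsets, scale = LEVEL2_OFFSETS.get(lag, LEVEL2_DEFAULT), 2.0
    else:
        raise ValueError("level must be 1 or 2")
    # The two windows must use distinct observations, so that the two error
    # terms in the difference are independent draws from f_1 (or f_2).
    if not windows_disjoint(offsets, lag):
        raise ValueError("overlapping windows")
    c = window_sum(y, offsets, scale)
    return c[lag:] - c[:-lag]


def extrapolated_bandwidth(h1, h2):
    """Algorithm A1(d): h / h1 ~ h1 / h2 gives h_hat = h1**2 / h2."""
    return h1 ** 2 / h2


if __name__ == "__main__":
    # Pure noise with unit variance: every difference should have variance close to 2.
    rng = np.random.default_rng(0)
    u = rng.standard_normal(200_000)
    for level in (1, 2):
        print(level, [round(float(pseudo_differences(u, lag, level).var()), 2)
                      for lag in (1, 2, 3, 4)])
```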

Appendix

Algorithm A1. Overview of our bandwidth selection procedure.

(a) Find $$f_{1}$$ and $$f_{2}$$ so that the relationship between $$f_{1}$$ and $$f_{2}$$ mimics that between $$f_U$$ and $$f_{1}$$, and so that $$f_{1}$$ and $$f_{2}$$ can be accurately estimated from our data.

(b) For our estimator in (5), $$f_U$$ is estimated from $$\Delta Y_{j,\ell}=Y_{t_{j+\ell}}-Y_{t_{j}} \approx U_{t_{j+\ell}}-U_{t_{j}}$$, where $$|t_{j+\ell}-t_j|$$ is small and $$U_{t_{j+\ell}}\sim f_U$$ and $$U_{t_{j}}\sim f_U$$ are independent. To imitate this, for $$k=1,2$$ we construct versions of $$\Delta Y_{j,\ell}$$, say $$\Delta Y_{j,\ell}^{_k}$$, such that, if $$|t_{j+\ell}^{_k}-t_j^{_k}|$$ is small, $$\Delta Y_{j,\ell}^{_k} \approx U_{t_{j+\ell}^{_k}}^{_k}-U_{t_{j}^{_k}}^{_k}$$, where $$U_{t_{j+\ell}^{_k}}^{_k}\sim f_{k}$$ and $$U_{t_{j}^{_k}}^{_k}\sim f_{k}$$ are independent.

(c) For $$k=1, 2$$, we compute a bandwidth $$h_{k}$$ and a $$\xi_{k}$$ for estimating $$f_{k}$$ by applying our procedure in (5) to the data $$\Delta Y_{j,\ell}^{_k}$$; see (8) and (9).

(d) Step (a) suggests that $$h/h_{1} \approx h_{1}/h_{2}$$, so we can take $$\hat h = h_{1}^2/ h_{2}$$.

Algorithm A2. Constructing $$\Delta Y_{j,\ell}^{_1}$$ and $$t_{j}^{_1}$$.

(a) For $$j=0,\ldots, n-1$$, let $$t_j^{_1}=(t_{j}+t_{j+1})/2$$ and $$Y_{1,t_j^{_1}}^{_1}=(Y_{t_{j}}+Y_{t_{j+1}})/ \surd 2$$. For $$\ell> 1$$ and $$j=0,\ldots,n-\ell-1$$, take $$\Delta Y_{j,\ell}^{_1}=Y_{1,t_{j+\ell}^{_1}}^{_1}-Y_{1,t_j^{_1}}^{_1}\approx U_{1,t_{j+\ell}^{_1}}^{_1}-U_{1,t_{j}^{_1}}^{_1}$$, where $$U^{_1}_{1,t_j^{_1}}= (U_{t_{j}}+U_{t_{j+1}})/ \surd 2\sim f_{1}$$. This does not work for $$\ell=1$$, because $$Y_{1,t_{j+1}^{_1}}^{_1}-Y_{1,t_j^{_1}}^{_1}\approx (U_{t_{j+2}}-U_{t_{j}})/ \surd 2\not\sim f_{1}*f_{1}$$: the term $$U_{t_{j+1}}$$ cancels. For $$\ell=1$$ we suggest taking $$\Delta Y_{j,1}^{_1}$$ as in (b).

(b) For $$j=0,\ldots,n-2$$, let $$t_j^{_1}=(t_{j}+t_{j+2})/2$$ and $$Y_{2,t_j^{_1}}^{_1}=(Y_{t_{j}}+Y_{t_{j+2}})/\surd 2$$. Take $$\Delta Y_{j,1}^{_1}=Y_{2,t_{j+1}^{_1}}^{_1}-Y_{2,t_j^{_1}}^{_1}\approx U^{_1}_{2,t_{j+1}^{_1}}-U^{_1}_{2,t_j^{_1}}$$, with $$U^{_1}_{2,t_j^{_1}}=(U_{t_{j}}+U_{t_{j+2}})/\surd 2\sim f_{1}$$.

Algorithm A3. Constructing $$\Delta Y_{j,\ell}^{_2}$$ and $$t_{j}^{_2}$$.

Step 1. For $$j=0,\ldots, n-3$$, let $$Y_{4,t_{j}^{_2}}^{_2}=\sum_{k=0}^{3}Y_{t_{j+k}}/2$$ and $$t_{j}^{_2}=\sum_{k=0}^{3}t_{j+k}/4$$. For $$\ell\geqslant 4$$ and $$j=0,\ldots, n-\ell-3$$, take $$\Delta Y_{j,\ell}^{_2}= Y_{4,t_{j+\ell}^{_2}}^{_2}-Y_{4,t_{j}^{_2}}^{_2}\approx U^{_2}_{4,t_{j+\ell}^{_2}}-U^{_2}_{4,t_j^{_2}}$$, where $$U^{_2}_{4,t_j^{_2}}= \sum_{k=0}^{3}U_{t_{j+k}}/ 2 \sim f_{2}$$. This does not work for $$\ell=1,2,3$$, for reasons similar to those in Algorithm A2: the two windows of four observations would overlap. For $$\ell=1,2,3$$ we suggest taking $$\Delta Y_{j,\ell}^{_2}$$ as in Steps 2 and 3.

Step 2. For $$j=0,\ldots,n-6$$, let $$Y_{3,t_{j}^{_2}}^{_2}=\sum_{k=0,1,2,6}Y_{t_{j+k}}/2$$ and $$t_{j}^{_2}=\sum_{k=0,1,2,6}t_{j+k}/4$$. For $$j=0,\ldots, n-5$$, let $$Y_{2,t_{j}^{_2}}^{_2}=\sum_{k=0,1,4,5}Y_{t_{j+k}}/2$$ and $$t_{j}^{_2}=\sum_{k=0,1,4,5}t_{j+k}/4$$. For $$j=0,\ldots, n-6$$, let $$Y_{1,t_{j}^{_2}}^{_2}=\sum_{k=0}^{3}Y_{t_{j+2k}}/2$$ and $$t_{j}^{_2}=\sum_{k=0}^{3}t_{j+2k}/4$$.

Step 3. For $$j=0,\ldots,n-9$$, take $$\Delta Y_{j,3}^{_2}=Y_{3,t_{j+3}^{_2}}^{_2}-Y_{3,t_{j}^{_2}}^{_2}\approx U_{3,t_{j+3}^{_2}}^{_2}-U_{3,t_{j}^{_2}}^{_2}$$, where $$U^{_2}_{3,t_j^{_2}}= \sum_{k=0,1,2,6} U_{t_{j+k}}/ 2 \sim f_{2}$$. For $$j=0,\ldots,n-7$$ and $$\ell=1,2$$, take $$\Delta Y_{j,\ell}^{_2}=Y_{\ell,t_{j+\ell}^{_2}}^{_2}-Y_{\ell,t_{j}^{_2}}^{_2}\approx U_{\ell,t_{j+\ell}^{_2}}^{_2}-U_{\ell,t_{j}^{_2}}^{_2}$$, where $$U^{_2}_{1,t_j^{_2}}= \sum_{k=0}^{3}U_{t_{j+2k}}/ 2 \sim f_{2}$$ and $$U^{_2}_{2,t_j^{_2}}=\sum_{k=0,1,4,5}U_{t_{j+k}}/ 2 \sim f_{2}$$.

References

Aït-Sahalia, Y., Fan, J. & Xiu, D. (2010). High-frequency covariance estimates with noisy and asynchronous financial data. J. Am. Statist. Assoc. 105, 1504–17.

Aït-Sahalia, Y. & Jacod, J. (2014). High-Frequency Financial Econometrics. Princeton: Princeton University Press.

Aït-Sahalia, Y. & Yu, J. (2009). High frequency market microstructure noise estimates and liquidity measures. Ann. Appl. Statist. 3, 422–57.

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A. & Shephard, N. (2011). Multivariate realized kernels: Consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. J. Economet. 162, 149–69.

Bolger, N. & Laurenceau, J.-P. (2013). Intensive Longitudinal Methods: An Introduction to Diary and Experience Sampling Research. New York: Guilford Press.

Buckley, M. J., Eagleson, G. K. & Silverman, B. W. (1988). The estimation of residual variance in nonparametric regression. Biometrika 75, 189–99.

Carroll, R. J. & Hall, P. (1988). Optimal rates of convergence for deconvolving a density. J. Am. Statist. Assoc. 83, 1184–6.

Delaigle, A. (2008). An alternative view of the deconvolution problem. Statist. Sinica 18, 1025–45.

Delaigle, A. & Hall, P. (2008). Using SIMEX for smoothing-parameter choice in errors-in-variables problems. J. Am. Statist. Assoc. 103, 280–7.

Delaigle, A., Hall, P. & Meister, A. (2008). On deconvolution with repeated measurements. Ann. Statist. 36, 665–85.

Delaigle, A. & Zhou, W.-X. (2015). Nonparametric and parametric estimators of prevalence from group testing data with aggregated covariates. J. Am. Statist. Assoc. 110, 1785–96.

Dufrenot, G., Jawadi, F. & Louhichi, W. (2014). Market Microstructure and Nonlinear Dynamics: Keeping Financial Crisis in Context. New York: Springer.

Gloter, A. & Jacod, J. (2001). Diffusions with measurement errors: I. Local asymptotic normality. Eur. Ser. Appl. Indust. Math. 5, 225–42.

Hall, P., Kay, J. W. & Titterington, D. M. (1990). Asymptotically optimal difference-based estimation of variance in nonparametric regression. Biometrika 77, 521–8.

Hautsch, N. (2012). Econometrics of Financial High-Frequency Data. New York: Springer.

Jacod, J., Li, Y. & Zheng, X. (2017). Statistical properties of microstructure noise. Econometrica 85, 1133–74.

Liu, C. & Tang, C. Y. (2014). A quasi-maximum likelihood approach for integrated covariance matrix estimation with high-frequency data. J. Economet. 180, 217–32.

Meister, A. (2007). Optimal convergence rates of density estimation from grouped data. Statist. Prob. Lett. 77, 1091–7.

Müller, H.-G., Sen, R. & Stadtmüller, U. (2011). Functional data analysis for volatility. J. Economet. 165, 233–45.

Olhede, S. C., Sykulski, A. M. & Pavliotis, G. A. (2009). Frequency domain estimation of integrated volatility for Itô processes in the presence of market-microstructure noise. Multiscale Mod. Simul. 8, 393–427.

Sheather, S. J. & Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. J. R. Statist. Soc. 53, 683–90.

Stefanski, L. A. & Carroll, R. J. (1990). Deconvoluting kernel density estimators. Statistics 21, 169–84.

Tao, M., Wang, Y. & Zhou, H. H. (2013). Optimal sparse volatility matrix estimation for high dimensional Itô processes with measurement errors. Ann. Statist. 41, 1816–64.

Wang, J.-L., Chiou, J.-M. & Müller, H.-G. (2016). Review of functional data analysis. Ann. Rev. Statist. Appl. 3, 257–95.

Xiu, D. (2010). Quasi-maximum likelihood estimation of volatility with high-frequency data. J. Economet. 159, 235–50.

Zhang, L. (2006). Efficient estimation of stochastic volatility using noisy observations: A multi-scale approach. Bernoulli 12, 1019–43.

Zhang, L., Mykland, P. A. & Aït-Sahalia, Y. (2005). A tale of two time scales: Determining integrated volatility with noisy high-frequency data. J. Am. Statist. Assoc. 100, 1394–411.

© 2018 Biometrika Trust. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

### Journal

Biometrika, Oxford University Press

Published: Mar 6, 2018
