Convergence of regression-adjusted approximate Bayesian computation

SUMMARY

We present asymptotic results for the regression-adjusted version of approximate Bayesian computation introduced by Beaumont et al. (2002). We show that for an appropriate choice of the bandwidth, regression adjustment will lead to a posterior that, asymptotically, correctly quantifies uncertainty. Furthermore, for such a choice of bandwidth we can implement an importance sampling algorithm to sample from the posterior whose acceptance probability tends to unity as the data sample size increases. This compares favourably with results for standard approximate Bayesian computation, where the only way to obtain a posterior that correctly quantifies uncertainty is to choose a much smaller bandwidth, one for which the acceptance probability tends to zero and hence for which Monte Carlo error will dominate.

1. Introduction

Modern statistical applications increasingly require the fitting of complex statistical models, which are often intractable in the sense that it is impossible to evaluate the likelihood function. This excludes standard implementation of likelihood-based methods, such as maximum likelihood estimation or Bayesian analysis. To overcome this difficulty, there has been substantial interest in likelihood-free or simulation-based methods, which replace calculation of the likelihood by simulation of pseudo-datasets from the model. Inference can then be performed by comparing these pseudo-datasets, simulated for a range of different parameter values, to the actual data. Examples of such likelihood-free methods include the simulated method of moments (Duffie & Singleton, 1993), indirect inference (Gouriéroux & Ronchetti, 1993; Heggland & Frigessi, 2004), synthetic likelihood (Wood, 2010) and approximate Bayesian computation (Beaumont et al., 2002). Of these, approximate Bayesian computation is arguably the most commonly used for Bayesian inference, and has been popular in population genetics (e.g., Beaumont et al., 2002; Cornuet et al., 2008), ecology (e.g., Beaumont, 2010) and systems biology (e.g., Toni et al., 2009); more recently it has seen increased use in other application areas, such as econometrics (Calvet & Czellar, 2015) and epidemiology (Drovandi & Pettitt, 2011).

The idea of approximate Bayesian computation is to first summarize the data using low-dimensional summary statistics, such as sample means, autocovariances or suitable quantiles of the data. The posterior density given the summary statistics is then approximated as follows. Assume the data are $$Y_{{\rm obs}}=(y_{{\rm obs},1},\ldots,y_{{\rm obs},n})$$ and modelled as a draw from a parametric model with parameter $$\theta \in \mathbb{R}^p$$. Let $$K(x)$$ be a positive kernel, where $$\max_{x}K(x)=1$$, and let $$\varepsilon>0$$ be the bandwidth. For a given $$d$$-dimensional summary statistic $$s(Y)$$, our model will define a density $$f_n(s \mid \theta)$$. We then define a joint density, $$\pi_{\varepsilon}(\theta,s\mid s_{{\rm obs}})$$, for $$(\theta,s)$$ as
\begin{align}
& \frac{\pi(\theta)f_{n}(s\mid\theta)K\{\varepsilon^{-1}(s-s_{{\rm obs}})\}}{\int_{\mathbb{R}^p\times\mathbb{R}^{d}}\pi(\theta) f_{n}(s\mid\theta)K\{\varepsilon^{-1}(s-s_{{\rm obs}})\}\,{\rm d}\theta\, {\rm d} s},
\end{align}
(1) where $$s_{{\rm obs}}=s(Y_{{\rm obs}})$$.
Our approximation to the posterior density is the marginal density
\begin{equation}
\pi_{\varepsilon}(\theta\mid s_{{\rm obs}})=\int \pi_{\varepsilon}(\theta,s\mid s_{{\rm obs}})\,{\rm d} s,
\end{equation}
(2) which we call the approximate Bayesian computation posterior density. For brevity we will often shorten this to posterior density in the following. We will always call the actual posterior given the summary the true posterior.

The idea of approximate Bayesian computation is that we can sample from $$\pi_{\varepsilon}(\theta\mid s_{{\rm obs}})$$ without needing to evaluate the likelihood function or $$f_{n}(s\mid\theta)$$. The simplest approach is via rejection sampling (Beaumont et al., 2002), which proceeds by simulating a parameter value and an associated summary statistic from $$\pi(\theta)f_n(s\mid\theta)$$. This pair is then accepted with probability $$K\{\varepsilon^{-1}(s-s_{{\rm obs}})\}$$. The accepted pairs will be drawn from (1), and the accepted parameter values will be drawn from the posterior (2). Implementing this rejection sampler requires only the ability to simulate pseudo-datasets from the model and to calculate the summary statistics for those datasets. Alternative algorithms for simulating from the posterior include adaptive or sequential importance sampling (Beaumont et al., 2009; Bonassi & West, 2015; Lenormand et al., 2013; Filippi et al., 2013) and Markov chain Monte Carlo approaches (Marjoram et al., 2003; Wegmann et al., 2009). These aim to propose parameter values in areas of high posterior probability, and thus can be substantially more efficient than rejection sampling. However, the computational efficiency of all these methods is limited by the probability of acceptance for data simulated with a parameter value that has high posterior probability.

This paper is concerned with the asymptotic properties of approximate Bayesian computation. It builds upon Li & Fearnhead (2018) and Frazier et al. (2016), who present results on the asymptotic behaviour of the posterior distribution and the posterior mean of approximate Bayesian computation as the amount of data, $$n$$, increases. Their results highlight the tension in approximate Bayesian computation between choices of the summary statistics and bandwidth that will lead to more accurate inferences, and choices that will reduce the computational cost or Monte Carlo error of algorithms for sampling from the posterior. An informal summary of some of these results is as follows. Assume a fixed-dimensional summary statistic and that the true posterior variance given this summary decreases like $$1/n$$ as $$n$$ increases. The theoretical results compare the posterior, or posterior mean, of approximate Bayesian computation to the true posterior, or true posterior mean, given the summary of the data. The accuracy of using approximate Bayesian computation is governed by the choice of bandwidth, and this choice should depend on $$n$$. Li & Fearnhead (2018) show that the optimal choice of this bandwidth will be $$O(n^{-1/2})$$. With this choice, estimates based on the posterior mean of approximate Bayesian computation can, asymptotically, be as accurate as estimates based on the true posterior mean given the summary. Furthermore, the Monte Carlo error of an importance sampling algorithm with a good proposal distribution will only inflate the mean square error of the estimator by a constant factor of the form $$1+O(1/N)$$, where $$N$$ is the number of pseudo-datasets.
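As a concrete illustration of the rejection sampler described above, here is a minimal sketch for a toy model in which the data are independent $$N(\theta,1)$$ observations and the summary statistic is the sample mean; the toy model, the Gaussian choice of $$K$$ and all variable names are illustrative assumptions, not part of the theory that follows.

```python
import numpy as np

def rejection_abc(s_obs, n, eps, n_sim, rng):
    """Rejection ABC sketch: prior theta ~ N(0, 10^2), data y_1,...,y_n ~ N(theta, 1),
    summary s = sample mean, Gaussian kernel K with max_x K(x) = 1."""
    accepted = []
    for _ in range(n_sim):
        theta = rng.normal(0.0, 10.0)              # draw theta from the prior
        # for this toy model the sample mean of a pseudo-dataset is N(theta, 1/n),
        # so we can simulate the summary directly instead of a full dataset
        s = rng.normal(theta, 1.0 / np.sqrt(n))
        # accept with probability K{(s - s_obs)/eps}
        if rng.uniform() < np.exp(-0.5 * ((s - s_obs) / eps) ** 2):
            accepted.append(theta)
    return np.array(accepted)

rng = np.random.default_rng(1)
n = 1000
y_obs = rng.normal(3.0, 1.0, size=n)                # observed data with theta_0 = 3
draws = rejection_abc(y_obs.mean(), n, eps=0.1, n_sim=20000, rng=rng)
print(len(draws) / 20000, draws.mean(), draws.std())  # acceptance rate and posterior summaries
```

Shrinking the bandwidth eps improves the approximation to the true posterior given the summary but lowers the acceptance rate; this trade-off is the focus of the asymptotic analysis below.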
These asymptotic results are similar to those for indirect inference, where error for a Monte Carlo sample of size $$N$$ also inflates the overall mean square error of estimators by a factor of $$1+O(1/N)$$ (Gouriéroux & Ronchetti, 1993). By comparison, choosing a bandwidth which is $$o(n^{-1/2})$$ will lead to an acceptance probability that tends to zero as $$n\rightarrow \infty$$, and the Monte Carlo error of approximate Bayesian computation will blow up. Choosing a bandwidth that decays more slowly than $$O(n^{-1/2})$$ will also lead to a regime where the Monte Carlo error dominates, and can lead to a nonnegligible bias in the posterior mean that inflates the error.

While the above results for a bandwidth that is $$O(n^{-1/2})$$ are positive in terms of point estimates, they are negative in terms of the calibration of the posterior. With such a bandwidth the posterior density of approximate Bayesian computation always over-inflates the parameter uncertainty: see Proposition 1 below and Theorem 2 of Frazier et al. (2016). The aim of this paper is to show that a variant of approximate Bayesian computation can yield inference that is both accurate in terms of point estimation, with its posterior mean having the same frequentist asymptotic variance as the true posterior mean given the summaries, and calibrated, in the sense that its posterior variance equals this asymptotic variance, when the bandwidth converges to zero at a rate slower than $$O(n^{-1/2})$$. This means that the acceptance probability of a good approximate Bayesian computation algorithm will tend to unity as $$n\rightarrow \infty$$.

2. Notation and set-up

We denote the data by $$Y_{{\rm obs}}=(y_{{\rm obs},1},\ldots,y_{{\rm obs},n})$$, where $$n$$ is the sample size, and each observation, $$y_{{\rm obs},i}$$, can be of arbitrary dimension. Assume the data are modelled as a draw from a parametric density, $$f_{n}(y\mid\theta)$$, and consider asymptotics as $$n\rightarrow\infty$$. This density depends on an unknown parameter $$\theta\in\mathbb{R}^{p}$$. Let $$\mathscr{B}^p$$ be the Borel sigma-field on $$\mathbb{R}^p$$. We will let $$\theta_{0}$$ denote the true parameter value and $$\pi(\theta)$$ the prior distribution for the parameter. Denote the support of $$\pi(\theta)$$ by $$\mathcal{P}$$. Assume that a fixed-dimensional summary statistic $$s_n(Y)$$ is chosen and its density under our model is $$f_{n}(s\mid\theta)$$. The shorthand $$S_n$$ is used to denote the random variable with density $$f_{n}(s\mid\theta)$$. Often we will simplify notation and write $$s$$ and $$S$$ for $$s_n$$ and $$S_n$$ respectively. Let $$N(x;\mu,\Sigma)$$ be the normal density at $$x$$ with mean $$\mu$$ and variance $$\Sigma$$. Let $$A^{\rm c}$$ be the complement of a set $$A$$ with respect to the whole space. For a series $$x_{n}$$ we write $$x_{n}=\Theta(a_n)$$ if there exist constants $$m$$ and $$M$$ such that $$0<m<|x_{n}/a_{n}|<M<\infty$$ as $$n\rightarrow\infty$$. For a real function $$g(x)$$, denote its gradient function at $$x=x_{0}$$ by $$D_{x}g(x_{0})$$. To simplify notation, $$D_{\theta}$$ is written as $$D$$. Hereafter $$\varepsilon$$ is considered to depend on $$n$$, so the notation $$\varepsilon_n$$ is used. The conditions of the theoretical results are stated below.

Condition 1. There exists some $$\delta_{0}>0$$ such that $$\mathcal{P}_{0}=\{\theta:|\theta-\theta_{0}|<\delta_{0}\}\subset\mathcal{P}$$, $$\pi(\theta)\in C^{2}(\mathcal{P}_{0})$$ and $$\pi(\theta_{0})>0$$.

Condition 2.
The kernel satisfies (i) $$\int vK(v)\,{\rm d}v=0$$; (ii) $$\int\prod_{k=1}^{l}v_{i_{k}}K(v)\,{\rm d}v<\infty$$ for any coordinates $$(v_{i_{1}},\ldots,v_{i_{l}})$$ of $$v$$ and $$l\leqslant p+6$$; (iii) $$K(v)\propto\bar{K}(\|v\|_{\Lambda}^{2})$$ where $$\|v\|_{\Lambda}^{2}=v^{{ \mathrm{\scriptscriptstyle T} }}\Lambda v$$ and $$\Lambda$$ is a positive-definite matrix, and $$K(v)$$ is a decreasing function of $$\|v\|_{\Lambda}$$; (iv) $$K(v)=O\{\exp(-c_1\|v\|^{\alpha_1})\}$$ for some $$\alpha_{1}>0$$ and $$c_{1}>0$$ as $$\|v\|\rightarrow\infty$$.

Condition 3. There exists a sequence $$a_{n}$$ satisfying $$a_{n}\rightarrow\infty$$ as $$n\rightarrow\infty$$, a $$d$$-dimensional vector $$s(\theta)$$ and a $$d\times d$$ matrix $$A(\theta)$$ such that for all $$\theta\in\mathcal{P}_{0}$$,
\[
a_{n}\{S_{n}-s(\theta)\}\rightarrow N\{0,A(\theta)\},\quad n\rightarrow\infty
\]
in distribution. We also assume that $$s_{\rm obs}\rightarrow s(\theta_0)$$ in probability. Furthermore, (i) $$s(\theta)$$ and $$A(\theta)\in C^{1}(\mathcal{P}_{0})$$, and $$A(\theta)$$ is positive definite for any $$\theta$$; (ii) for any $$\delta>0$$ there exists $$\delta'>0$$ such that $$\|s(\theta)-s(\theta_0)\|>\delta'$$ for all $$\theta$$ satisfying $$\|\theta-\theta_0\|>\delta$$; and (iii) $$I(\theta)=Ds(\theta)^{{ \mathrm{\scriptscriptstyle T} }}A^{-1}(\theta)Ds(\theta)$$ has full rank at $$\theta=\theta_{0}$$.

Let $$\tilde{f}_{n}(s\mid\theta)=N\{s;s(\theta),A(\theta)/a_{n}^{2}\}$$ be the density of the normal approximation to $$S$$ and introduce the standardized random variable $$W_{n}(s)=a_{n}A(\theta)^{-1/2}\{S-s(\theta)\}$$. We further let $$f_{W_n}(w\mid \theta)$$ and $$\tilde{f}_{W_n}(w\mid \theta)$$ be the densities for $$W_n$$ under the true model for $$S$$ and under our normal approximation to the model for $$S$$.

Condition 4. There exists $$\alpha_{n}$$ satisfying $$\alpha_{n}/a_{n}^{2/5}\rightarrow\infty$$ and a density $$r_{\rm max}(w)$$ satisfying Condition 2(ii) and (iii), where $$K(v)$$ is replaced with $$r_{\rm max}(w)$$, such that $$\sup_{\theta\in\mathcal{P}_{0}}\alpha_{n}\big|f_{W_{n}}(w\mid\theta)-\tilde{f}_{W_{n}}(w\mid\theta)\big|\leqslant c_{3}r_{\rm max}(w)$$ for some positive constant $$c_{3}$$.

Condition 5. The following statements hold: (i) $$r_{\rm max}(w)$$ satisfies Condition 2(iv); and (ii) $$\sup_{\theta\in\mathcal{P}_{0}^{\rm c}}f_{W_{n}}(w\mid\theta)=O\{\exp(-c_2\|w\|^{\alpha_2})\}$$ as $$\|w\|\rightarrow\infty$$ for some positive constants $$c_{2}$$ and $$\alpha_{2}$$, and $$A(\theta)$$ is bounded in $$\mathcal{P}$$.

Conditions 1–5 are from Li & Fearnhead (2018). Condition 2 is a requirement for the kernel function and is satisfied by all commonly used kernels, such as any kernel with compact support or the Gaussian kernel. Condition 3 assumes a central limit theorem for the summary statistic with rate $$a_{n}$$ and, roughly speaking, requires the summary statistic to accumulate information when $$n$$ gets large. This is a natural assumption, since many common summary statistics are sample moments, proportions, quantiles and autocorrelations, for which a central limit theorem would apply. It is also possible to verify the asymptotic normality of auxiliary model-based or composite likelihood-based summary statistics (Drovandi et al., 2015; Ruli et al., 2016) by referring to the rich literature on asymptotic properties of quasi maximum likelihood estimators (Varin et al., 2011) or quasi-posterior estimators (Chernozhukov & Hong, 2003).
This assumption does not cover ancillary statistics, using the full data as a summary statistic, or statistics based on distances, such as an asymptotically chi-square-distributed test statistic. Condition 4 assumes that, in a neighbourhood of $$\theta_{0}$$, $$f_{n}(s\mid\theta)$$ deviates from the leading term of its Edgeworth expansion by a rate $$a_{n}^{-2/5}$$. This is weaker than the standard requirement, $$o(a_n^{-1})$$, for the remainder of an Edgeworth expansion. It also assumes that the deviation is uniform, which is not difficult to satisfy in a compact neighbourhood. Condition 5 further assumes that $$f_{n}(s\mid\theta)$$ has exponentially decreasing tails with rate uniform in the support of $$\pi(\theta)$$. This implies that posterior moments from approximate Bayesian computation are dominated by integrals in the neighbourhood of $$\theta_0$$ and have leading terms with concise expressions. If Condition 5 were weakened, the requirement on $$\varepsilon_n$$ for the convergence results to hold might depend on the specific tail behaviour of $$f_{n}(s\mid\theta)$$. Additionally, for the results regarding regression adjustment, the following moments of the summary statistic are required to exist.

Condition 6. The first two moments, $$\int_{\mathbb{R}^{d}}s f_{n}(s\mid\theta)\,{\rm d} s$$ and $$\int_{\mathbb{R}^{d}}ss^{{ \mathrm{\scriptscriptstyle T} }}f_{n}(s\mid\theta)\,{\rm d} s$$, exist.

3. Asymptotics of approximate Bayesian computation

3.1. Posterior

First we consider the convergence of the posterior distribution of approximate Bayesian computation, denoted by $$\Pi_{\varepsilon}(\theta\in A\mid s_{{\rm obs}})$$ for $$A\in\mathscr{B}^p$$, as $$n\rightarrow\infty$$. The distribution function is a random function with the randomness due to $$s_{{\rm obs}}$$. We present two convergence results. One is the convergence of the posterior distribution function of a properly scaled and centred version of $$\theta$$; see Proposition 1. The other is the convergence of the posterior mean, a result which comes from Li & Fearnhead (2018) but, for convenience, is repeated as Proposition 2. The following proposition gives three different limiting forms for $$\Pi_{\varepsilon}(\theta\in A\mid s_{{\rm obs}})$$, corresponding to different rates at which the bandwidth decreases relative to the rate of the central limit theorem in Condition 3. We summarize these competing rates by defining $$c_{\varepsilon}=\lim_{n\rightarrow\infty}a_{n}\varepsilon_{n}$$.

Proposition 1. Assume Conditions 1–5. Let $$\theta_{\varepsilon}$$ denote the posterior mean of approximate Bayesian computation. As $$n\rightarrow\infty$$, if $$\varepsilon_{n}=o(a_{n}^{-3/5})$$ then the following convergence holds, depending on the value of $$c_{\varepsilon}$$.

(i) If $$c_{\varepsilon}=0$$ then
\[
\sup_{A\in\mathscr{B}^p}\left|\Pi_{\varepsilon}\{a_n(\theta-\theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}-\int_{A}\psi(t)\,{\rm d}t\right|\rightarrow0
\]
in probability, where $$\psi(t)=N\{t;0,I(\theta_{0})^{-1}\}$$.
(ii) If $$c_{\varepsilon}\in (0,\infty)$$ then for any $$A\in\mathscr{B}^p$$, \[ \Pi_{\varepsilon}\{a_n(\theta-\theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}\rightarrow\int_{A}\psi(t)\,{\rm d}t \]in distribution, where \[ \psi(t)\propto \int_{\mathbb{R}^{d}}N[t;c_{\varepsilon}\beta_0\{v-E_{G}(v)\},I(\theta_{0})^{-1}]G(v)\,{\rm d}v,\quad \beta_{0}=I(\theta_{0})^{-1}Ds(\theta_{0})^{{ \mathrm{\scriptscriptstyle T} }}A(\theta_{0})^{-1}, \]and $$G(v)$$ is a random density of $$v$$, with mean $$E_{G}(v)$$, which depends on $$c_{\varepsilon}$$ and $$Z\sim N(0,I_{d})$$. (iii) If $$c_{\varepsilon}=\infty$$ then \[ \sup_{A\in\mathscr{B}^p}\left|\Pi_{\varepsilon}\{\varepsilon_n^{-1}(\theta-\theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}-\int_{A}\psi(t)\,{\rm d}t\right|\rightarrow 0 \]in probability, where $$\psi(t)\propto K\{Ds(\theta_{0})t\}$$. For a similar result, under different assumptions, see Theorem 2 of Frazier et al. (2016). See also Soubeyrand & Haon-Lasportes (2015) for related convergence results for the true posterior given the summaries for some specific choices of summary statistics. The explicit form of $$G(v)$$ is stated in the Supplementary Material. When we have the same number of summary statistics and parameters, $$d=p$$, the limiting distribution simplifies to \[ \psi(t)\propto \int_{\mathbb{R}^{d}}N\{Ds(\theta_0)t;c_{\varepsilon}v,A(\theta_{0})\}K(v)\,{\rm d}v\text{.} \] The more complicated form in Proposition 1(ii) above arises from the need to project the summary statistics onto the parameter space. The limiting distribution may depend on the value of the summary statistic, $$s_{\rm obs}$$, in the space orthogonal to $$Ds(\theta_0)^{{ \mathrm{\scriptscriptstyle T} }}A(\theta_0)^{-1/2}$$. Hence the limit depends on a random quantity, $$Z$$, which can be interpreted as the noise in $$s_{\rm obs}$$. The main difference between the three convergence results is the form of the limiting density $$\psi(t)$$ for the scaled random variable $$a_{n,\varepsilon}(\theta-\theta_{\varepsilon})$$, where $$a_{n,\varepsilon}=a_{n}{\rm 1}\kern-0.24em{\rm I}_{c_{\varepsilon}<\infty}+\varepsilon_{n}^{-1}{\rm 1}\kern-0.24em{\rm I}_{c_{\varepsilon}=\infty}$$. For case (i) the bandwidth is sufficiently small that the approximation in approximate Bayesian computation due to accepting summaries close to the observed summary is asymptotically negligible. The asymptotic posterior distribution is Gaussian, and equals the limit of the true posterior for $$\theta$$ given the summary. For case (iii) the bandwidth is sufficiently big that this approximation dominates and the asymptotic posterior distribution of approximate Bayesian computation is determined by the kernel. For case (ii) the approximation is of the same order as the uncertainty in $$\theta$$, which leads to an asymptotic posterior distribution that is a convolution of a Gaussian distribution and the kernel. Since the limit distributions of cases (i) and (iii) are nonrandom in the space $$L^1(\mathbb{R}^p)$$, the weak convergence is strengthened to convergence in probability in $$L^1(\mathbb{R}^p)$$. See the proof in the Appendix. Proposition 2. (Theorem 3.1 of Li & Fearnhead, 2018). Assume the conditions of Proposition 1. As $$n\rightarrow\infty$$, if $$\varepsilon_{n}=o(a_{n}^{-3/5})$$, then $$a_{n}(\theta_{\varepsilon}-\theta_{0})\rightarrow N\{0,I_{\rm ABC}^{-1}(\theta_{0})\}$$ in distribution. 
If $$\varepsilon_{n}=o(a_{n}^{-1})$$ or $$d=p$$ or the covariance matrix of the kernel is proportional to $$A(\theta_{0})$$, then $$I_{\rm ABC}(\theta_{0})=I(\theta_{0})$$. For other cases, $$I(\theta_{0})-I_{\rm ABC}(\theta_{0})$$ is positive semidefinite.

Proposition 2 helps us to compare the frequentist variability in the posterior mean of approximate Bayesian computation with the asymptotic posterior distribution given in Proposition 1. If $$\varepsilon_{n}=o(a_{n}^{-1})$$ then the posterior distribution is asymptotically normal with variance matrix $$a_{n}^{-2}I(\theta_{0})^{-1}$$, and the posterior mean is also asymptotically normal with the same variance matrix. These results are identical to those we would get for the true posterior and posterior mean given the summary. For an $$\varepsilon_{n}$$ which is of the same order as $$a_{n}^{-1}$$, the uncertainty in approximate Bayesian computation has rate $$a_{n}^{-1}$$. However, the limiting posterior distribution, which is a convolution of the true limiting posterior given the summary with the kernel, will overestimate the uncertainty by a constant factor. If $$\varepsilon_{n}$$ decreases more slowly than $$a_{n}^{-1}$$, the posterior contracts at a rate $$\varepsilon_{n}$$, and thus will overestimate the actual uncertainty by a factor that diverges as $$n\rightarrow \infty$$.

In summary, it is much easier to get approximate Bayesian computation to accurately estimate the posterior mean. This is possible with $$\varepsilon_{n}$$ as large as $$o(a_{n}^{-3/5})$$ if the dimension of the summary statistic equals that of the parameter. However, accurately estimating the posterior variance, or getting the posterior to accurately reflect the uncertainty in the parameter, is much harder. As commented in § 1, this is only possible for values of $$\varepsilon_n$$ for which the acceptance probability in a standard algorithm will go to zero as $$n$$ increases. In this case the Monte Carlo sample size, and hence the computational cost, of approximate Bayesian computation will have to increase substantially with $$n$$.

As one application of our theoretical results, consider observations that are independent and identically distributed from a parametric density $$f(\cdot\mid\theta)$$. One approach to constructing the summary statistics is to use the score vector of some tractable approximating auxiliary model evaluated at the maximum auxiliary likelihood estimator (Drovandi et al., 2015). Ruli et al. (2016) construct an auxiliary model from a composite likelihood, so the auxiliary likelihood for a single observation is $$\prod_{i\in\mathscr{I}}f(y\in A_i\mid\theta)$$ where $$\{A_i: i\in\mathscr{I}\}$$ is a set of marginal or conditional events for $$y$$. Denote the auxiliary score vector for a single observation by $$cl_{\theta}(\cdot\mid\theta)$$ and the maximum auxiliary likelihood estimator for our dataset by $$\hat \theta_{\rm cl}$$. Then the summary statistic, $$s$$, for any pseudo-dataset $$\{y_1,\ldots,y_n\}$$ is $$\sum_{j=1}^n cl_{\theta}(y_j\mid\hat \theta_{\rm cl})/n$$. For $$y\sim f(\cdot\mid\theta)$$, assume that the first two moments of $$cl_{\theta}(y\mid\theta_0)$$ exist and $$cl_{\theta}(y\mid\theta)$$ is differentiable at $$\theta$$. Let $$H(\theta)=E_{\theta}\{\partial cl_{\theta}(y\mid\theta_{0})/\partial\theta\}$$ and $$J(\theta)=\mbox{var}_{\theta}\{cl_{\theta}(y\mid\theta_{0})\}$$.
Then if $$\hat \theta_{\rm cl}$$ is consistent for $$\theta_0$$, Condition 3 is satisfied with
\begin{align*}
n^{1/2}[S-E_{\theta}\{cl_{\theta}(Y\mid\theta_0)\}]\rightarrow N\{0,J(\theta)\},\quad n\rightarrow\infty
\end{align*}
in distribution, and with $$I(\theta_0)=H(\theta_0)^{{ \mathrm{\scriptscriptstyle T} }}J(\theta_0)^{-1}H(\theta_0)$$. Our results show that the posterior mean of approximate Bayesian computation, using $$\varepsilon_n=O(n^{-1/2})$$, will have asymptotic variance $$I(\theta_0)^{-1}/n$$. This is identical to the asymptotic variance of the maximum composite likelihood estimator (Varin et al., 2011). Furthermore, the posterior variance will overestimate this by just a constant factor. As we show below, using the regression correction of Beaumont et al. (2002) will correct this overestimation and produce a posterior that correctly quantifies the uncertainty in our estimates.

An alternative approach to constructing an approximate posterior using composite likelihood is to use the product of the prior and the composite likelihood. In general, this leads to a poorly calibrated posterior density which substantially underestimates uncertainty (Ribatet et al., 2012). Adjustment of the composite likelihood is needed to obtain calibration, but this involves estimation of the curvature and the variance of the composite score (Pauli et al., 2011). Empirical evidence that approximate Bayesian computation more accurately quantifies uncertainty than alternative composite likelihood-based posteriors is given in Ruli et al. (2016).

3.2. Regression-adjusted approximate Bayesian computation

The regression adjustment of Beaumont et al. (2002) involves post-processing the output of approximate Bayesian computation to try to improve the resulting approximation to the true posterior. Below we will denote a sample from the posterior of approximate Bayesian computation by $$\{(\theta_{i},s_{i})\}_{i=1,\ldots,N}$$. Under the regression adjustment, we obtain a new posterior sample by using $$\{\theta_{i}-\hat \beta _{\varepsilon}(s_{i}-s_{{\rm obs}})\}_{i=1,\ldots,N}$$, where $$\hat \beta _{\varepsilon}$$ is the least squares estimate of the coefficient matrix in the linear model
\begin{align*}
\theta_{i} & =\alpha+\beta(s_{i}-s_{{\rm obs}})+e_{i} \quad (i=1,\ldots,N),
\end{align*}
where the $$e_{i}$$ are independent and identically distributed errors.

We can view the adjusted sample as follows. Define a constant $$\alpha_{\varepsilon}$$ and a vector $$\beta_{\varepsilon}$$ as
\[
(\alpha_{\varepsilon},\beta_{\varepsilon})=\underset{\alpha,\beta}{\arg\min}\ E_{\varepsilon}[\|\theta-\alpha-\beta(s-s_{{\rm obs}})\|^{2}\mid s_{{\rm obs}}],
\]
where the expectation is with respect to the joint posterior distribution of $$(\theta,s)$$ given by approximate Bayesian computation. Then the ideal adjusted posterior is the distribution of $$\theta^*=\theta-\beta_{\varepsilon}(s-s_{{\rm obs}})$$ where $$(\theta,s) \sim \pi_{\varepsilon}(\theta,s\mid s_{{\rm obs}})$$. The density of $$\theta^*$$ is
\[
\pi_{\varepsilon}^*(\theta^{*}\mid s_{{\rm obs}})=\int_{\mathbb{R}^{d}}\pi_{\varepsilon}\{\theta^{*}+\beta_{\varepsilon}(s-s_{{\rm obs}}),s \mid s_{{\rm obs}}\}\,{\rm d}s,
\]
and the sample we get from regression-adjusted approximate Bayesian computation is a draw from $$\pi_{\varepsilon}^*(\theta^{*}\mid s_{{\rm obs}})$$ but with $$\beta_{\varepsilon}$$ replaced by its estimator.
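To illustrate, here is a minimal sketch of the regression adjustment applied to an accepted sample, with $$\hat\beta_{\varepsilon}$$ obtained by least squares exactly as in the linear model above; the array names and shapes are assumptions made for the example.

```python
import numpy as np

def regression_adjust(thetas, summaries, s_obs):
    """Linear regression adjustment of Beaumont et al. (2002).

    thetas:    (N, p) array of accepted parameter values theta_i
    summaries: (N, d) array of the corresponding summaries s_i
    s_obs:     (d,) observed summary
    Returns the adjusted sample theta_i - beta_hat (s_i - s_obs).
    """
    X = summaries - s_obs                           # covariates s_i - s_obs
    design = np.column_stack([np.ones(len(X)), X])  # add the intercept alpha
    coef, *_ = np.linalg.lstsq(design, thetas, rcond=None)
    beta_hat = coef[1:]                             # (d, p) least squares coefficients
    return thetas - X @ beta_hat
```

Applied to the accepted sample $$\{(\theta_{i},s_{i})\}_{i=1,\ldots,N}$$ described above, the returned draws correspond to $$\{\theta_{i}-\hat \beta _{\varepsilon}(s_{i}-s_{{\rm obs}})\}_{i=1,\ldots,N}$$, that is, to draws from $$\pi_{\varepsilon}^*(\theta^{*}\mid s_{{\rm obs}})$$ with $$\beta_{\varepsilon}$$ replaced by its estimate.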
The variance of $$\pi^*_{\varepsilon}(\theta^{*}\mid s_{{\rm obs}})$$ is strictly smaller than that of $$\pi_{\varepsilon}(\theta\mid s_{{\rm obs}})$$ provided $$s$$ is correlated with $$\theta$$. The following results, which are analogous to Propositions 1 and 2, show that this reduction in variance is by the correct amount to make the resulting adjusted posterior correctly quantify the posterior uncertainty.

Theorem 1. Assume Conditions 1–6. Denote the mean of $$\pi^*_{\varepsilon}(\theta^{*}\mid s_{{\rm obs}})$$ by $$\theta_{\varepsilon}^{*}$$. As $$n\rightarrow\infty$$, if $$\varepsilon_{n}=o(a_{n}^{-3/5})$$,
\begin{align*}
& \sup_{A\in\mathscr{B}^p}\left|\Pi_{\varepsilon}\{a_{n}(\theta^{*}-\theta_{\varepsilon}^{*})\in A\mid s_{{\rm obs}}\}-\int_{A}N\{t;0,I(\theta_{0})^{-1}\}\,{\rm d}t\right|\rightarrow 0
\end{align*}
in probability, and $$a_{n}({{\theta_{\varepsilon}^{*}}}-\theta_{0})\rightarrow N\{0,I(\theta_{0})^{-1}\}$$ in distribution. Moreover, if $$\beta_{\varepsilon}$$ is replaced by $$\tilde \beta _{\varepsilon}$$ satisfying $$a_n\varepsilon_n(\tilde \beta _{\varepsilon}-\beta_{\varepsilon})=o_{\rm p}(1)$$, the above results still hold.

The limit of the regression-adjusted posterior distribution is the true posterior given the summary provided $$\varepsilon_{n}$$ is $$o(a_{n}^{-3/5})$$. This is a slower rate than that at which the posterior contracts, which, as we will show in the next section, has important consequences in terms of the computational efficiency of approximate Bayesian computation. The regression adjustment corrects for both the additional noise of the posterior mean when $$d>p$$ and the overestimated uncertainty of the posterior. This correction comes from the removal of the first-order bias caused by $$\varepsilon$$. Blum (2010) shows that the regression adjustment reduces the bias of approximate Bayesian computation when $$E(\theta\mid s)$$ is linear and the residuals $$\theta-E(\theta\mid s)$$ are homoscedastic. Our results do not require these assumptions, and suggest that the regression adjustment should be applied routinely with approximate Bayesian computation provided the coefficients $$\beta_{\varepsilon}$$ can be estimated accurately.

With the simulated sample, $$\beta_{\varepsilon}$$ is estimated by $$\hat \beta _{\varepsilon}$$. The accuracy of $$\hat \beta _{\varepsilon}$$ can be seen from the following decomposition,
\begin{align*}
\hat \beta _{\varepsilon} & =\mbox{cov}_N(s,\theta)\mbox{var}_N(s)^{-1}\\
& =\beta_{\varepsilon}+\frac{1}{a_{n}\varepsilon_{n}}\mbox{cov}_N\left\{\frac{s-s_{\varepsilon}}{\varepsilon_{n}},a_{n}(\theta^{*}-\theta_{\varepsilon}^{*})\right\} \mbox{var}_N\left(\frac{s-s_{\varepsilon}}{\varepsilon_{n}}\right)^{-1},
\end{align*}
where $$\mbox{cov}_N$$ and $$\mbox{var}_N$$ are the sample covariance and variance matrices, and $$s_{\varepsilon}$$ is the sample mean. Since $$\mbox{cov}(s,\theta^{*})=0$$ and the distributions of $$s-s_{\varepsilon}$$ and $$\theta^{*}-\theta_{\varepsilon}^{*}$$ contract at rates $$\varepsilon_n$$ and $$a_n^{-1}$$ respectively, the error $$\hat \beta _{\varepsilon}-\beta_{\varepsilon}$$ can be shown to have the rate $$O_{\rm p}\{(a_{n}\varepsilon_{n})^{-1}N^{-1/2}\}$$ as $$n\rightarrow\infty$$ and $$N\rightarrow\infty$$. We omit the proof, since it is tedious and similar to the proof of the asymptotic expansion of $$\beta_{\varepsilon}$$ in Lemma A4.
Thus, if $$N$$ increases to infinity with $$n$$, $$\hat \beta _{\varepsilon}-\beta_{\varepsilon}$$ will be $$o_{\rm p}\{(a_n\varepsilon_n)^{-1}\}$$ and the convergence stated in Theorem 1 will continue to hold. Alternatively, we can get an idea of the additional error for large $$N$$ from the following proposition.

Proposition 3. Assume Conditions 1–6. Consider $$\theta^{*}=\theta-\hat \beta _{\varepsilon}(s-s_{{\rm obs}})$$. As $$n\rightarrow\infty$$, if $$\varepsilon_{n}=o(a_{n}^{-3/5})$$ and $$N$$ is large enough, for any $$A\in\mathscr{B}^{p}$$,
\begin{align*}
\Pi_{\varepsilon}\{a_{n}(\theta^{*}-\theta_{\varepsilon}^{*})\in A\mid s_{{\rm obs}}\}\rightarrow\int_{A}\psi(t)\,{\rm d}t
\end{align*}
in distribution, where
\begin{align*}
\psi(t)\propto & \int_{\mathbb{R}^{d}}N\left[t;\frac{\eta}{N^{1/2}}\{v-E_{G}(v)\},I(\theta_{0})^{-1}\right]G(v)\,{\rm d}v
\end{align*}
when $$c_{\varepsilon}<\infty$$,
\begin{align*}
\psi(t)\propto & \int_{\mathbb{R}^{p}}N\left\{ t;\frac{\eta}{N^{1/2}}Ds(\theta_{0})t',I(\theta_{0})^{-1}\right\} K\{Ds(\theta_{0})t'\}\,{\rm d}t'
\end{align*}
when $$c_{\varepsilon}=\infty$$, and $$\eta=O_{\rm p}(1)$$.

The limiting distribution here can be viewed as the convolution of the limiting distribution obtained when the optimal coefficients are used and that of a random variable which relates to the error in our estimate of $$\beta_{\varepsilon}$$, and which is $$O_{\rm p}(N^{-1/2})$$.

3.3. Acceptance rates when $$\varepsilon$$ is negligible

Finally we present results for the acceptance probability of approximate Bayesian computation, the quantity that is central to the computational cost of importance sampling or Markov chain Monte Carlo-based algorithms. We consider a set-up where we propose the parameter value from a location-scale family. That is, we can write the proposal density as the density of a random variable, $$\sigma_{n}X+\mu_{n}$$, where $$X\sim q(\cdot)$$, $$E(X)=0$$, and $$\sigma_n$$ and $$\mu_n$$ are constants that can depend on $$n$$. The average acceptance probability $$p_{{\rm acc},q}$$ would then be
\begin{align*}
& \int_{\mathcal{P}\times\mathbb{R}^{d}}q_{n}(\theta)f_{n}(s\mid\theta)K\{\varepsilon_{n}^{-1}(s-s_{{\rm obs}})\}\,{\rm d} s\, {\rm d}\theta,
\end{align*}
where $$q_{n}(\theta)$$ is the density of $$\sigma_{n}X+\mu_{n}$$. This covers the proposal distribution in fundamental sampling algorithms, including the random-walk Metropolis algorithm and importance sampling with a unimodal proposal distribution, and serves as the building block for many advanced algorithms where the proposal distribution is a mixture of distributions from location-scale families, such as iterative importance sampling-type algorithms. We further assume that $$\sigma_{n}^{-1}(\mu_{n}-\theta_{0})=O_{\rm p}(1)$$, which means that $$\theta_{0}$$ is in the coverage of $$q_{n}(\theta)$$. This is a natural requirement for any good proposal distribution. The prior distribution and $$\theta_{0}$$ as a point mass are included in this proposal family. This condition would also apply to many Markov chain Monte Carlo implementations of approximate Bayesian computation after convergence. As above, define $$a_{n,\varepsilon}=a_{n}{\rm 1}\kern-0.24em{\rm I}_{c_{\varepsilon}<\infty}+\varepsilon_{n}^{-1}{\rm 1}\kern-0.24em{\rm I}_{c_{\varepsilon}=\infty}$$ to be the smaller of $$a_{n}$$ and $$\varepsilon_{n}^{-1}$$. Asymptotic results for $$p_{{\rm acc},q}$$ when $$\sigma_{n}$$ has the same rate as $$a_{n,\varepsilon}^{-1}$$ are given in Li & Fearnhead (2018). Here we extend those results to other regimes.
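For the toy Gaussian model used earlier, the average acceptance probability $$p_{{\rm acc},q}$$ for a given location-scale proposal can be estimated by simulation; the sketch below is an illustration under those assumed names, not part of the formal development.

```python
import numpy as np

def estimate_p_acc(s_obs, n, eps, mu_n, sigma_n, n_sim, rng):
    """Monte Carlo estimate of p_acc,q for the proposal sigma_n * X + mu_n with
    X ~ N(0, 1), in the toy model where the summary is the mean of N(theta, 1) data."""
    theta = mu_n + sigma_n * rng.standard_normal(n_sim)  # proposal draws
    s = rng.normal(theta, 1.0 / np.sqrt(n))              # summaries of pseudo-datasets
    return np.mean(np.exp(-0.5 * ((s - s_obs) / eps) ** 2))
```

With $$\varepsilon_n$$ decaying more slowly than $$a_n^{-1}=n^{-1/2}$$ and $$\sigma_n=o(\varepsilon_n)$$, this estimate approaches one as $$n$$ grows, in line with case (iii) of Theorem 2 below.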
Theorem 2. Assume the conditions of Proposition 1. As $$n\rightarrow\infty$$, if $$\varepsilon_{n}=o(a_{n}^{-1/2})$$: (i) if $$c_{\varepsilon}=0$$ or $$\sigma_{n}/a_{n,\varepsilon}^{-1}\rightarrow\infty$$, then $$p_{{\rm acc},q}\rightarrow0$$ in probability; (ii) if $$c_{\varepsilon}\in(0,\infty)$$ and $$\sigma_{n}/a_{n,\varepsilon}^{-1}\rightarrow r_{1}\in[0,\infty)$$, or $$c_{\varepsilon}=\infty$$ and $$\sigma_{n}/a_{n,\varepsilon}^{-1}\rightarrow r_{1}\in(0,\infty)$$, then $$p_{{\rm acc},q}=\Theta_{\rm p}(1)$$; (iii) if $$c_{\varepsilon}=\infty$$ and $$\sigma_{n}/a_{n,\varepsilon}^{-1}\rightarrow0$$, then $$p_{{\rm acc},q}\rightarrow1$$ in probability.

The proof of Theorem 2 can be found in the Supplementary Material. The underlying intuition is as follows. For the summary statistic, $$s$$, sampled with parameter value $$\theta$$, the acceptance probability depends on
\begin{align}
\frac{s-s_{{\rm obs}}}{\varepsilon_{n}} & =\frac{1}{\varepsilon_{n}}[\{s-s(\theta)\}+\{s(\theta)-s(\theta_{0})\}+\{s(\theta_{0})-s_{{\rm obs}}\}],
\end{align}
(3) where $$s(\theta)$$ is the limit of $$s$$ in Condition 3. The distance between $$s$$ and $$s_{{\rm obs}}$$ is at least $$O_{\rm p}(a_{n}^{-1})$$, since the first and third bracketed terms are $$O_{\rm p}(a_{n}^{-1})$$. If $$\varepsilon_{n}=o(a_{n}^{-1})$$ then, regardless of the value of $$\theta$$, (3) will blow up as $$n\rightarrow\infty$$ and hence $$p_{{\rm acc},q}$$ goes to $$0$$. If $$\varepsilon_{n}$$ decreases at a rate slower than $$a_{n}^{-1}$$, (3) will go to zero provided we have a proposal which ensures that the middle term is $$o_{\rm p}(\varepsilon_{n})$$, and hence $$p_{{\rm acc},q}$$ goes to unity.

Proposition 1 shows that, without the regression adjustment, approximate Bayesian computation requires $$\varepsilon_{n}$$ to be $$o(a_{n}^{-1})$$ if its posterior is to converge to the true posterior given the summary. In this case Theorem 2 shows that the acceptance rate will degenerate to zero as $$n\rightarrow\infty$$ regardless of the choice of $$q(\cdot)$$. On the other hand, with the regression adjustment, we can choose $$\varepsilon_{n}=o(a_{n}^{-3/5})$$ and still have convergence to the true posterior given the summary. For such a choice, if our proposal density satisfies $$\sigma_n=o(\varepsilon_{n})$$, the acceptance rate will go to unity as $$n\rightarrow\infty$$.

4. Numerical example

Here we illustrate the gain in computational efficiency from using the regression adjustment on the $$g$$-and-$$k$$ distribution, a popular model for testing approximate Bayesian computation methods (e.g., Fearnhead & Prangle, 2012; Marin et al., 2014). The data are independent and identically distributed from a distribution defined by its quantile function,
\begin{align*}
F^{-1}(x;\alpha,\beta,\gamma,\kappa)&= \alpha+\beta\left[1+0.8\frac{1-\exp\{-\gamma z(x)\}}{1+\exp\{-\gamma z(x)\}}\right]\{1+z(x)^{2}\}^{\kappa}z(x),\quad x\in[0,1],
\end{align*}
where $$\alpha$$ and $$\beta$$ are location and scale parameters, $$\gamma$$ and $$\kappa$$ are related to the skewness and kurtosis of the distribution, and $$z(x)$$ is the corresponding quantile of a standard normal distribution. No closed form is available for the density, but simulating from the model is straightforward by transforming realizations from the standard normal distribution.
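Simulating from the $$g$$-and-$$k$$ model only requires pushing standard normal draws through the quantile function above; a minimal sketch of the simulator and of an evenly spaced quantile summary (of dimension 19, as used in the study below) is given here, with function names chosen for illustration.

```python
import numpy as np

def r_gk(size, alpha, beta, gamma, kappa, rng):
    """Draws from the g-and-k distribution: plug z ~ N(0, 1) into the quantile
    function F^{-1}(x; alpha, beta, gamma, kappa), where z = z(x)."""
    z = rng.standard_normal(size)
    return (alpha
            + beta * (1 + 0.8 * (1 - np.exp(-gamma * z)) / (1 + np.exp(-gamma * z)))
            * (1 + z ** 2) ** kappa * z)

def quantile_summary(y, d=19):
    """Evenly spaced quantiles used as the d-dimensional summary statistic."""
    probs = np.arange(1, d + 1) / (d + 1)
    return np.quantile(y, probs)

rng = np.random.default_rng(0)
y = r_gk(10_000, alpha=3.0, beta=1.0, gamma=2.0, kappa=0.5, rng=rng)
print(quantile_summary(y))
```

A rejection or importance sampling implementation of approximate Bayesian computation would call such a simulator to generate each pseudo-dataset in the study that follows.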
In the following we assume that the parameter vector $$(\alpha,\beta,\gamma,\kappa)$$ has a uniform prior in $$[0,10]^{4}$$ and multiple datasets are generated from the model with $$(\alpha,\beta,\gamma,\kappa)=(3,1,2,0.5)$$. To illustrate the asymptotic behaviour of approximate Bayesian computation, $$50$$ datasets are generated for each of a set of values of $$n$$ ranging from $$500$$ to $$10\,000$$. Consider estimating the posterior means, denoted by $$\mu=(\mu_{1},\ldots,\mu_{4})$$, and standard deviations, denoted by $$\sigma=(\sigma_{1},\ldots,\sigma_{4})$$, of the parameters. The summary statistic is a set of evenly spaced quantiles of dimension $$19$$. The bandwidth is chosen by fixing the proportion of the Monte Carlo sample to be accepted, and the accepted proportions needed to achieve a certain approximation accuracy for estimates with and without the adjustment are compared. A higher proportion means more simulated parameter values can be kept for inference. The accuracy is measured by the average relative errors of estimating $$\mu$$ or $$\sigma$$,
\begin{align*}
\small{\text{RE}}_{\mu}=\frac{1}{4}\sum_{k=1}^{4}\frac{\left|\hat{\mu}_{k}-\mu_{k}\right|}{\mu_{k}},\quad \small{\text{RE}}_{\sigma}=\frac{1}{4}\sum_{k=1}^{4}\frac{\left|\hat{\sigma}_{k}-\sigma_{k}\right|}{\sigma_{k}},
\end{align*}
for estimators $$\hat \mu=(\hat \mu_1,\ldots,\hat \mu_4)$$ and $$\hat{\sigma}=(\hat{\sigma}_1,\ldots,\hat{\sigma}_4)$$. The proposal distribution is normal, with the covariance matrix selected to inflate the posterior covariance matrix by a constant factor $$c^{2}$$ and the mean vector selected to differ from the posterior mean by half of the posterior standard deviation, which avoids the posterior mean being estimated trivially well. We consider a series of increasing $$c$$ values in order to investigate the impact of the proposal distribution getting worse.

Figure 1 shows that the required acceptance rate for the regression-adjusted estimates is higher than that for the unadjusted estimates in almost all cases. For estimating the posterior mean, the improvement is small. For estimating the posterior standard deviations, the improvement is much larger. To achieve each level of accuracy, the acceptance rates of the unadjusted estimates are all close to zero. Those of the regression-adjusted estimates are higher by up to two orders of magnitude, so the Monte Carlo sample size needed to achieve the same accuracy can be reduced correspondingly.

Fig. 1. Acceptance rates required for different degrees of accuracy of approximate Bayesian computation and different variances of the proposal distribution, which are proportional to $$c$$. In each panel we show results for standard (grey) and regression-adjusted (black) approximate Bayesian computation, with each method using three values of $$n$$: $$n=500$$ (dotted), $$n=3000$$ (dashed) and $$n=10\,000$$ (solid). When dotted or dashed lines overlap with solid lines, only solid lines appear. The averages over $$50$$ datasets (thick) and their $$95\%$$ confidence intervals (thin) are reported. Results are for a relative error of $$0.08$$ and $$0.05$$ in the posterior mean, in (a) and (b) respectively, and for a relative error of $$0.2$$ and $$0.1$$ in the posterior standard deviation, in (c) and (d) respectively.
5. Discussion

One way to implement approximate Bayesian computation so that the acceptance probability tends to unity as $$n$$ increases is to use importance sampling with a suitable proposal from a location-scale family. The key difficulty with finding a suitable proposal is to ensure that the location parameter is close to the true parameter, where close means the distance is $$O(\varepsilon_n)$$. This can be achieved by performing a preliminary analysis of the data, and using the point estimate of the parameter from this preliminary analysis as the location parameter (Beaumont et al., 2009; Li & Fearnhead, 2018).

Acknowledgement

This work was funded by the U.K. Engineering and Physical Sciences Research Council, under the i-like programme grant.

Supplementary material

Supplementary material available at Biometrika online includes proofs of the lemmas and Theorem 2.

Appendix

Proof of the result from § 3.1

Throughout the data are considered to be random. For any integer $$l>0$$ and a set $$A\subset\mathbb{R}^{l}$$, we use the convention that $$cA+x$$ denotes the set $$\{ct+x:t\in A\}$$ for $$c\in\mathbb{R}$$ and $$x\in\mathbb{R}^{l}$$. For a nonnegative function $$h(x)$$, integrable in $$\mathbb{R}^{l}$$, denote the normalized function $$h(x)/\int_{\mathbb{R}^{l}}h(x)\,{\rm d}x$$ by $$h(x)^{({\rm norm})}$$. For a vector $$x$$, denote a general polynomial of elements of $$x$$ with degree up to $$l$$ by $$P_{l}(x)$$. For any fixed $$\delta<\delta_{0}$$, let $$B_{\delta}$$ be the neighbourhood $$\{\theta:\|\theta-\theta_{0}\|<\delta\}$$. Let $$\pi_{\delta}(\theta)$$ be $$\pi(\theta)$$ truncated in $$B_{\delta}$$. Recall that $$\tilde f_{n}(s\mid\theta)$$ denotes the normal density with mean $$s(\theta)$$ and covariance matrix $$A(\theta)/a_{n}^{2}$$. Let $$t(\theta)=a_{n,\varepsilon}(\theta-\theta_{0})$$ and $$v(s)=\varepsilon_{n}^{-1}(s-s_{{\rm obs}})$$ be rescaled versions of $$\theta$$ and $$s$$. For any $$A\in\mathscr{B}^{p}$$, let $$t(A)$$ be the set $$\{\phi:\phi=t(\theta)\text{ for some }\theta\in A\}$$. Define $$\tilde \Pi_{\varepsilon}(\theta\in A\mid s_{{\rm obs}})$$ to be the normal counterpart of $$\Pi_{\varepsilon}(\theta\in A\mid s_{{\rm obs}})$$ with truncated prior, obtained by replacing $$\pi(\theta)$$ and $$f_{n}(s\mid\theta)$$ in $$\Pi_{\varepsilon}$$ by $$\pi_{\delta}(\theta)$$ and $$\tilde{f}_{n}(s\mid\theta)$$.
So let \[ \tilde \pi_{\varepsilon}(\theta,s\mid s_{{\rm obs}})=\frac{\pi_{\delta}(\theta)\tilde f_{n}(s\mid \theta)K\{\varepsilon_{n}^{-1}(s-s_{{\rm obs}})\}}{\int_{B_{\delta}}\int_{\mathbb{R}^{d}}\pi_{\delta}(\theta)\tilde f_{n}(s\mid \theta)K\{\varepsilon_{n}^{-1}(s-s_{{\rm obs}})\}\,{\rm d}\theta\, {\rm d}s} \] and $$\tilde \pi_{\varepsilon}(\theta\mid s_{{\rm obs}})=\int_{\mathbb{R}^{d}}\tilde \pi_{\varepsilon}(\theta,s\mid s_{{\rm obs}})\,{\rm d}s$$, and let $$\tilde \Pi_{\varepsilon}(\theta\in A\mid s_{{\rm obs}})$$ be the distribution function with density $$\tilde \pi_{\varepsilon}(\theta\mid s_{{\rm obs}})$$. Denote the mean of $$\tilde \Pi_{\varepsilon}$$ by $$\tilde \theta_{\varepsilon}$$. Let $$W_{{\rm obs}}=a_{n}A(\theta_{0})^{-1/2}\{s_{{\rm obs}}-s(\theta_{0})\}$$ and $$\beta_{0}=I(\theta_{0})^{-1}Ds(\theta_{0})^{{ \mathrm{\scriptscriptstyle T} }}A(\theta_{0})^{-1}$$. By Condition 3, $$W_{{\rm obs}}\rightarrow Z$$ in distribution as $$n\rightarrow\infty$$, where $$Z\sim N(0,I_{d})$$. Since the approximate Bayesian computation likelihood within $$\tilde \Pi_{\varepsilon}$$ is an incorrect model for $$s_{{\rm obs}}$$, standard posterior convergence results do not apply. However, if we condition on the value of the summary, $$s$$, then the distribution of $$\theta$$ is just the true posterior given $$s$$. Thus we can express the posterior from approximate Bayesian computation as a continuous mixture of these true posteriors. Let $$\tilde \pi_{\varepsilon,tv}(t,v)=a_{n,\varepsilon}^{-d}\pi_{\delta}(\theta_{0}+a_{n,\varepsilon}^{-1}t)\tilde f_{n}(s_{{\rm obs}}+\varepsilon_{n}v \mid \theta_{0}+a_{n,\varepsilon}^{-1}t)K(v)$$. For any $$A\in\mathscr{B}^{p}$$, we rewrite $$\tilde \Pi_{\varepsilon}$$ as \begin{equation} \tilde \Pi_{\varepsilon}(\theta\in A\mid s_{{\rm obs}})=\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\tilde \Pi(\theta\in A\mid s_{{\rm obs}}+\varepsilon_{n}v)\tilde \pi_{\varepsilon,tv}(t,v)^{({\rm norm})}\,{\rm d}t\,{\rm d}v, \end{equation} (A1) where $$\tilde \Pi(\theta\in A\mid s)$$ is the posterior distribution with prior $$\pi_{\delta}(\theta)$$ and likelihood $$\tilde{f}_{n}(s\mid\theta)$$. Using results from Kleijn & van der Vaart (2012), the leading term of $$\tilde \Pi(\theta\in A\mid s_{{\rm obs}}+\varepsilon_{n}v)$$ can be obtained and is stated in the following lemma. Lemma A1. Assume Conditions 3 and 4. If $$\varepsilon_{n}=O(a_{n}^{-1})$$, for any fixed $$v\in\mathbb{R}^{d}$$ and small enough $$\delta$$, \[ \sup_{A\in\mathscr{B}^{p}}\left|\tilde \Pi\{a_{n}(\theta-\theta_{0})\in A\mid s_{{\rm obs}}+\varepsilon_{n}v\}-\int_{A}N[t;\beta_{0}\{A(\theta_{0})^{1/2}W_{{\rm obs}}+c_{\varepsilon}v\},I(\theta_{0})^{-1}]\,{\rm d}t\right|\rightarrow0 \]in probability as $$n\rightarrow\infty$$. This leading term is Gaussian, but with a mean that depends on $$v$$. Thus asymptotically, the posterior of approximate Bayesian computation is the distribution of the sum of a Gaussian random variable and $$\beta_0c_{\varepsilon} V$$, where $$V$$ has density proportional to $$\int \tilde \pi_{\varepsilon,tv}(t,v)\,{\rm d}t$$. To make this argument rigorous, and to find the distribution of this sum of random variables, we need to introduce several functions that relate to the limit of $$\tilde \pi_{\varepsilon,tv}(t',v)$$. 
For a rank-$$p$$$$d\times p$$ matrix $$A$$, a rank-$$d$$$$d\times d$$ matrix $$B$$ and a $$d$$-dimensional vector $$c$$, define $$g(v;A,B,c)=\exp[-(c+Bv)^{{ \mathrm{\scriptscriptstyle T} }}\{I-A(A^{{ \mathrm{\scriptscriptstyle T} }}A)^{-1}A^{{ \mathrm{\scriptscriptstyle T} }}\}(c+Bv)/2]/(2\pi)^{(d-p)/2}$$. Let \[ g_{n}(t,v)=\begin{cases} N\left\{ Ds(\theta_{0})t;a_{n}\varepsilon_{n}v+A(\theta_{0})^{1/2}W_{{\rm obs}},A(\theta_{0})\right\} K(v), & \ c_{\varepsilon}<\infty,\\[4pt] N\left\{ Ds(\theta_{0})t;v+\frac{1}{a_{n}\varepsilon_{n}}A(\theta_{0})^{1/2}W_{{\rm obs}},\frac{1}{a_{n}^{2}\varepsilon_{n}^{2}}A(\theta_{0})\right\} K(v), & \ c_{\varepsilon}=\infty, \end{cases} \] let $$G_{n}(v)$$ be $$g\{v;A(\theta_{0})^{-1/2}Ds(\theta_{0}),a_{n}\varepsilon_{n}A(\theta_{0})^{-1/2},W_{{\rm obs}}\}K(v)$$, and let $$E_{G_{n}}(\cdot)$$ be the expectation under the density $$G_{n}(v)^{({\rm norm})}$$. In both cases it is straightforward to show that $$\int_{\mathbb{R}^{p}}g_{n}(t,v)\,{\rm d}t=|A(\theta_{0})|^{-1/2}G_{n}(v)$$. Additionally, for the case $$c_{\varepsilon}=\infty$$, define $$v'(v,t)=A(\theta_{0})^{1/2}W_{{\rm obs}}+a_{n}\varepsilon_{n}v-a_{n}\varepsilon_{n}Ds(\theta_{0})t$$ and \begin{align*} g'_{n}(t,v')=N\{v';0,A(\theta_{0})\}K\left\{ Ds(\theta_{0})t+\frac{1}{a_{n}\varepsilon_{n}}v'-\frac{1}{a_{n}\varepsilon_{n}}A(\theta_{0})^{1/2}W_{{\rm obs}}\right\} \text{.} \end{align*} Then, with the transformation $$v'=v'(v,t)$$, $$g_{n}(t,v)\,{\rm d}v=g_{n}'(t,v')\,{\rm d}v'$$. Let \begin{align*} g(t,v) & =\begin{cases} N\{Ds(\theta_{0})t;c_{\varepsilon}v+A(\theta_{0})^{1/2}Z,A(\theta_{0})\}K(v), & \ c_{\varepsilon}<\infty,\\ K\{Ds(\theta_{0})t\}N\{v;0,A(\theta_{0})\}, & \ c_{\varepsilon}=\infty, \end{cases} \end{align*} let $$G(v)$$ be $$g\{v;A(\theta_{0})^{-1/2}Ds(\theta_{0}),c_{\varepsilon}A(\theta_{0})^{-1/2},Z\}K(v)$$ and let $$E_{G}(\cdot)$$ be the expectation under the density $$G(v)^{({\rm norm})}$$. When $$c_{\varepsilon}<\infty$$, $$\int_{\mathbb{R}^{p}}g(t,v)\,{\rm d}t=|A(\theta_{0})|^{-1/2}G(v)$$. Expansions of $$\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\tilde \pi_{\varepsilon,tv}(t,v)\,{\rm d}t\,{\rm d}v$$ are given in the following lemma. Lemma A2. Assume Conditions 1–3. If $$\varepsilon_{n}=o(a_{n}^{-1/2})$$, then $$\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\left|\tilde \pi_{\varepsilon,tv}(t,v)-\pi(\theta_{0})g_{n}(t,v)\right|\,{\rm d}t\,{\rm d}v\rightarrow0$$ in probability and $$\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}g_{n}(t,v)\,{\rm d}t\,{\rm d}v=\Theta_{\rm p}(1)$$ as $$n\rightarrow\infty$$. Furthermore, for $$l\leqslant6$$, $$\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}P_{l}(v)g_{n}(t,v)\,{\rm d}t\,{\rm d}v$$ converges to $$|A(\theta_{0})|^{-1/2}\int_{\mathbb{R}^{d}}P_{l}(v)G(v)\,{\rm d}v$$ in distribution when $$c_{\varepsilon}<\infty$$ and converges to $$\int_{\mathbb{R}^{p}}P_{l}\{Ds(\theta_{0})t\}K\{Ds(\theta_{0})t\}\,{\rm d}t$$ in probability when $$c_{\varepsilon}=\infty$$, as $$n\rightarrow\infty$$. The following lemma states that $$\Pi_{\varepsilon}$$ and $$\tilde \Pi_{\varepsilon}$$ are asymptotically the same and gives an expansion of $$\tilde \theta_{\varepsilon}$$. Lemma A3. Assume Conditions 1–4. 
If $$\varepsilon_{n}=o(a_{n}^{-1/2})$$, then: (a) for any $$\delta<\delta_{0}$$, $$\Pi_{\varepsilon}(\theta\in B_{\delta}^{\rm c}\mid s_{{\rm obs}})$$ and $$\tilde \Pi_{\varepsilon}(\theta\in B_{\delta}^{\rm c}\mid s_{{\rm obs}})$$ are $$o_{\rm p}(1)$$; (b) there exists a $$\delta<\delta_0$$ such that $$\sup_{A\in\mathscr{B}^{p}}|\Pi_{\varepsilon}(\theta\in A\cap B_{\delta}\mid s_{{\rm obs}})-\tilde \Pi_{\varepsilon}(\theta\in A\cap B_{\delta}\mid s_{{\rm obs}})|=o_{\rm p}(1)$$; (c) if, in addition, Condition 5 holds, then $$a_{n,\varepsilon}(\theta_{\varepsilon}-\tilde \theta_{\varepsilon})=o_{\rm p}(1)$$, and $$\tilde \theta_{\varepsilon}=\theta_{0}+a_{n}^{-1}\beta_{0}A(\theta_{0})^{1/2}W_{{\rm obs}}+\varepsilon_{n}\beta_{0}E_{G_{n}}(v)+r_{n,1}$$ where the remainder $$r_{n,1}=o_{\rm p}(a_{n}^{-1})$$. Proof of Proposition 1. Lemma 5 in the Supplementary Material shows that $$\Pi_{\varepsilon}\{a_{n,\varepsilon}(\theta-\theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}$$ and $$\tilde \Pi_{\varepsilon}\{a_{n,\varepsilon}(\theta-\tilde \theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}$$ have the same limit, in distribution when $$c_{\varepsilon}\in(0,\infty)$$ and in total variation form when $$c_{\varepsilon}=0$$ or $$\infty$$. Therefore it is sufficient to only consider the convergence of $$\tilde \Pi_{\varepsilon}$$ of the properly scaled and centred $$\theta$$. When $$a_{n}\varepsilon_{n}\rightarrow c_{\varepsilon}<\infty$$, according to (A1), $$\tilde \Pi_{\varepsilon}\{a_{n}(\theta-\tilde \theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}$$ equals \[ \int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\tilde \Pi\{a_{n}(\theta-\theta_{0})\in A+a_{n}(\tilde \theta_{\varepsilon}-\theta_{0})\mid s_{{\rm obs}}+\varepsilon_{n}v\}\tilde \pi_{\varepsilon,tv}(t',v)^{({\rm norm})}\:{\rm d}t'{\rm d}v\text{.} \] By Lemmas A1 and A3(c), we have \begin{align*} \sup_{A\in\mathscr{B}^{p}}\left|\tilde \Pi\{a_{n}(\theta-\theta_{0})\in A+a_{n}(\tilde \theta_{\varepsilon}-\theta_{0})\mid s_{{\rm obs}}+\varepsilon_{n}v\}-\int_{A}N\{t;\mu_{n}(v),I(\theta_{0})^{-1}\}\,{\rm d}t\right|=o_{\rm p}(1), \end{align*} where $$\mu_{n}(v)=\beta_{0}\{c_{\varepsilon}v-a_{n}\varepsilon_{n}E_{G_{n}}(v)\}-a_{n}r_{n,1}$$. Then with Lemma A2, the leading term of $$\tilde \Pi_{\varepsilon}\{a_{n}(\theta-\tilde \theta_{\varepsilon}) \in A\mid s_{{\rm obs}}\}$$ equals \begin{align} & \sup_{A\in\mathscr{B}^{p}}\left|\tilde \Pi_{\varepsilon}\{a_{n}(\theta-\tilde \theta_{\varepsilon}) \in A \mid s_{{\rm obs}}\}-\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\int_{A}N\{t;\mu_{n}(v),I(\theta_{0})^{-1}\}g_{n}(t',v)^{({\rm norm})}\,{\rm d}t\,{\rm d}t'{\rm d}v\right|=o_{\rm p}(1)\text{.} \end{align} (A2) The numerator of the leading term of (A2) is in the form \begin{align*} \int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\int_{A}N\{t;c_{\varepsilon}\beta_{0}v+x_{3},I(\theta_{0})^{-1}\}N\{Ds(\theta_{0})t';x_{1}v+x_{2},A(\theta_{0})\}K(v)\,{\rm d}t\,{\rm d}t'{\rm d}v, \end{align*} where $$x_{1}\in\mathbb{R}$$, $$x_{2}\in\mathbb{R}^{d}$$ and $$x_{3}\in\mathbb{R}^{p}$$. This is continuous by Lemma 4 in the Supplementary Material. Then since $$E_{G_{n}}(v)\rightarrow E_{G}(v)$$ in distribution as $$n\rightarrow\infty$$ by Lemma A2, we have \begin{eqnarray*} \int_{\mathbb{R}^{d}}\! \int_{t(B_{\delta})}\! \int_{A} \! N\{t;\mu_{n}(v),I(\theta_{0})^{-1}\}g_{n}(t',v)\,{\rm d}t\,{\rm d}t'{\rm d}v \rightarrow\int_{\mathbb{R}^{d}}\! \int_{\mathbb{R}^{p}}\! \int_{A}\! 
N[t;c_{\varepsilon}\beta_{0}\{v-E_{G}(v)\},I(\theta_{0})^{-1}]g(t',v)\,{\rm d}t\,{\rm d}t'{\rm d}v \end{eqnarray*} in distribution as $$n\rightarrow\infty$$. Putting the above results together, it follows that \[ \tilde \Pi_{\varepsilon}\{a_{n}(\theta-\tilde \theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}\rightarrow\int_{A}\int_{\mathbb{R}^{p}}N[t;c_{\varepsilon}\beta_{0}\{v-E_{G}(v)\},I(\theta_{0})^{-1}]G(v)^{({\rm norm})}\,{\rm d}v\,{\rm d}t \] in distribution, and statement (ii) of the proposition holds. When $$c_{\varepsilon}=0$$, since $$\mu_{n}(v)$$ does not depend on $$v$$, (A2) becomes \[ \sup_{A\in\mathscr{B}^{p}}\left|\tilde \Pi_{\varepsilon}\{a_{n}(\theta-\tilde \theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}-\int_{A}N\{t;-a_{n}\varepsilon_{n}\beta_{0}E_{G_{n}}(v)-a_{n}r_{n,1},I(\theta_{0})^{-1}\}\,{\rm d}t\right|=o_{\rm p}(1), \] and by the continuous mapping theorem (van der Vaart, 2000), \begin{align*} & \int_{\mathbb{R}^{p}}\left|N\{t;-a_{n}\varepsilon_{n}\beta_{0}E_{G_{n}}(v)-a_{n}r_{n,1},I(\theta_{0})^{-1}\}-N\{t;0,I(\theta_{0})^{-1}\}\right|\,{\rm d}t=o_{\rm p}(1)\text{.} \end{align*} Therefore $$\sup_{A\in\mathscr{B}^{p}}|\tilde \Pi_{\varepsilon}\{a_{n}(\theta-\tilde \theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}-\int_{A}N\{t;0,I(\theta_{0})^{-1}\}\,{\rm d}t|=o_{\rm p}(1)$$, and statement (i) of the proposition holds. When $$c_{\varepsilon}=\infty$$, Lemma A1 cannot be applied to the posterior distribution within (A1) directly. With the transformation $$v''=a_{n}\varepsilon_{n}v$$, $$\tilde \Pi_{\varepsilon}\{\theta\in A\mid s_{{\rm obs}}\}$$ equals \[ \int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\tilde \Pi(\theta\in A\mid s_{{\rm obs}}+a_{n}^{-1}v'')\tilde \pi_{\varepsilon,tv}(t',a_{n}^{-1}\varepsilon_{n}^{-1}v'')^{({\rm norm})}\,{\rm d}t'{\rm d}v'', \] which implies that $$\tilde \Pi_{\varepsilon}\{\varepsilon_{n}^{-1}(\theta-\tilde \theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}$$ equals \begin{align*} & \int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\tilde \Pi\{a_{n}(\theta-\theta_{0})\in a_{n}\varepsilon_{n}A+a_{n}(\tilde \theta_{\varepsilon}-\theta_{0})\mid s_{{\rm obs}}+a_{n}^{-1}v''\}\tilde \pi_{\varepsilon,tv}(t',a_{n}^{-1}\varepsilon_{n}^{-1}v'')^{({\rm norm})}\:{\rm d}t'{\rm d}v''\text{.} \end{align*} Then Lemma A1 can be applied. Using Lemma A2 and transforming $$v''$$ back to $$v$$, we have \begin{align*} & \sup_{A\in\mathscr{B}^{p}}\left|\vphantom{\int_{\mathbb{R}^{d}}}\tilde \Pi_{\varepsilon}\{\varepsilon_{n}^{-1}(\theta-\tilde \theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}\right.\nonumber \\ &\quad \left.-\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\int_{A}(a_{n}\varepsilon_{n})^{p}N\{a_{n}\varepsilon_{n}t;\mu_{n}'(v),I(\theta_{0})^{-1}\}g_{n}(t',v)^{({\rm norm})}\:{\rm d}t\,{\rm d}t'{\rm d}v\right|=o_{\rm p}(1), \end{align*} where $$\mu_{n}'(v)=\beta_{0}\{A(\theta_{0})^{1/2}W_{{\rm obs}}+a_{n}\varepsilon_{n}v\}-a_{n}(\tilde \theta_{\varepsilon}-\theta_{0})$$. Let $$t''(t,t')=a_{n}\varepsilon_{n}(t-t')+a_{n}(\tilde \theta_{\varepsilon}-\theta_{0})$$ and let $$t''(A,B_{\delta})$$ be the set $$\{t''(t,t'):t\in A,t'\in t(B_{\delta})\}$$. 
With the transformations $$v'=v'(v,t')$$ and $$t''=t''(t,t')$$, since $$\beta_{0}Ds(\theta_{0})=I_{p}$$, we have \begin{align*} & \int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\int_{A}(a_{n}\varepsilon_{n})^{p}N\{a_{n}\varepsilon_{n}t;\mu_{n}'(v),I(\theta_{0})^{-1}\}g_{n}(t',v)\,{\rm d}t\,{\rm d}t'{\rm d}v\\ &\quad =\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\int_{A}(a_{n}\varepsilon_{n})^{p}N\{a_{n}\varepsilon_{n}(t-t');\beta_{0}v'-a_{n}(\tilde \theta_{\varepsilon}-\theta_{0}),I(\theta_{0})^{-1}\}g_{n}'(t',v')\,{\rm d}t\,{\rm d}t'{\rm d}v'\\ &\quad =\int_{\mathbb{R}^{d}}\int_{t''(A,B_{\delta})}\int_{A}N\{t'';\beta_{0}v'-a_{n}(\tilde \theta_{\varepsilon}-\theta_{0}),I(\theta_{0})^{-1}\}g'_{n}\left[t-\frac{1}{a_{n}\varepsilon_{n}}\{t''-a_{n}(\tilde \theta_{\varepsilon}-\theta_{0})\},v'\right]\,{\rm d}t\,{\rm d}t''{\rm d}v'\text{.} \end{align*} The idea now is that as $$n\rightarrow \infty$$, $$a_n\varepsilon_n\rightarrow\infty$$, so the $$g'_n$$ term in the integral will tend to $$g'_n(t,v')$$. Then, by integrating first with respect to $$t''$$ and then with respect to $$v'$$, we get the required result. To make this argument rigorous, consider the following function: \begin{align*} & \int_{\mathbb{R}^{d}}\int_{t''(A,B_{\delta})}\int_{A}N\{t'';\beta_{0}v',I(\theta_{0})^{-1}\}N\{v';0,A(\theta_{0})\}\\ &\quad \times\left|K\{Ds(\theta_{0})t+x_{1}v'-x_{2}t''+x_{3}\}-K\{Ds(\theta_{0})t\}\right|\,{\rm d}t\,{\rm d}t''{\rm d}v', \end{align*} where $$x_{1}\in\mathbb{R}$$, $$x_{2}\in\mathbb{R}$$ and $$x_{3}\in\mathbb{R}^{d}$$. This is continuous by Lemma 4 in the Supplementary Material, so by the continuous mapping theorem, \begin{align*} \sup_{A\in\mathscr{B}^{p}}\left|\int_{\mathbb{R}^{d}}\int_{t''(A,B_{\delta})}\int_{A}N\{t'';\beta_{0}v',I(\theta_{0})^{-1}\}g'_{n}\left(t-\frac{1}{a_{n}\varepsilon_{n}}t'',v'\right)\,{\rm d}t\,{\rm d}t''{\rm d}v'-\int_{A}K\{Ds(\theta_{0})t\}\,{\rm d}t\right| & =o_{\rm p}(1)\text{.} \end{align*} Then, using Lemma A2, \[ \sup_{A\in\mathscr{B}^{p}}\left|\tilde \Pi_{\varepsilon}\{\varepsilon_{n}^{-1}(\theta-\tilde \theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}-\int_{A}K\{Ds(\theta_{0})t\}\, {\rm d}t/\int_{\mathbb{R}^{p}}K\{Ds(\theta_{0})t\}\,{\rm d}t\right|=o_{\rm p}(1)\text{.} \] Therefore statement (iii) of the proposition holds. □ Proof of the result from § 3.2 The intuition behind this result is that, as shown above, the joint posterior of $$(\theta,s)$$ under approximate Bayesian computation can be viewed as a marginal distribution for $$s$$ times a conditional for $$\theta$$ given $$s$$. The latter is just the true posterior for $$\theta$$ given $$s$$, and this posterior converges to a Gaussian limit that depends on $$s$$ only through its mean. Regression adjustment works because it corrects for the dependence of this mean on $$s$$, so if we work with the regression-adjusted parameter $$\theta^*$$ then the conditional distribution for $$\theta^*$$ given $$s$$ will tend to a Gaussian limit whose mean is the same for all $$s$$. The following lemma gives an expansion of the optimal linear coefficient matrix $$\beta_{\varepsilon}$$. The order of the remainder leads to successful removal of the first-order bias in the posterior distribution of approximate Bayesian computation. Lemma A4. Assume Conditions 1–6. Then if $$\varepsilon_{n}=o(a_{n}^{-3/5})$$, $$a_{n}\varepsilon_{n}(\beta_{\varepsilon}-\beta_{0})=o_{\rm p}(1)$$.
The following lemma, similar to Lemma A3, says that the approximate Bayesian computation posterior distribution of $$\theta^{*}$$ is asymptotically the same as $$\tilde \Pi_{\varepsilon}$$. Recall that $$\theta_{\varepsilon}^{*}$$ is the mean of $$\pi_{\varepsilon}^{*}(\theta^{*}\mid s_{{\rm obs}})$$. Let $$\tilde \pi_{\varepsilon}^{*}(\theta^{*}\mid s_{{\rm obs}})=\int_{\mathbb{R}^{d}}\tilde \pi_{\varepsilon}\{\theta^{*}+\beta_{\varepsilon}(s-s_{{\rm obs}}),s\mid s_{{\rm obs}}\}\,{\rm d}s$$ and let $$\tilde \theta_{\varepsilon}^{*}$$ be the mean of $$\tilde \pi_{\varepsilon}^{*}(\theta^{*}\mid s_{{\rm obs}})$$. Lemma A5. Assume Conditions 1–6. If $$\varepsilon_{n}=o(a_{n}^{-3/5})$$, then: (a) for any $$\delta<\delta_{0}$$, $$\Pi_{\varepsilon}(\theta^{*}\in B_{\delta}^{\rm c}\mid s_{{\rm obs}})$$ and $$\tilde \Pi_{\varepsilon}(\theta^{*}\in B_{\delta}^{\rm c}\mid s_{{\rm obs}})$$ are $$o_{\rm p}(1)$$; (b) there exists $$\delta<\delta_0$$ such that $$\sup_{A\in\mathscr{B}^{p}}|\Pi_{\varepsilon}(\theta^{*}\,{\in}\, A\cap B_{\delta}\mid s_{{\rm obs}})\,{-}\,\tilde \Pi_{\varepsilon}(\theta^{*}\,{\in}\, A\cap B_{\delta}\mid s_{{\rm obs}})|=o_{\rm p}(1)$$; (c) $$a_{n}(\theta_{\varepsilon}^{*}-\tilde \theta_{\varepsilon}^{*})=o_{\rm p}(1)$$, and $$\tilde \theta_{\varepsilon}^{*}=\theta_{0}+a_{n}^{-1}\beta_{0}A(\theta_{0})^{1/2}W_{{\rm obs}}+\varepsilon_{n}(\beta_{0}-\beta_{\varepsilon})E_{G_{n}}(v)+r_{n,2}$$ where the remainder $$r_{n,2}=o_{\rm p}(a_{n}^{-1})$$. Proof of Theorem 2. Similar to the proof of Proposition 1, it is sufficient to only consider the convergence of $$\tilde \Pi_{\varepsilon}$$ of the properly scaled and centred $$\theta^*$$. Similar to (A1), \begin{align*} \tilde \Pi_{\varepsilon}(\theta^{*}\in A\mid s_{{\rm obs}}) & =\begin{cases} \int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\tilde \Pi(\theta\in A+\varepsilon_{n}\beta_{\varepsilon}v\mid s_{{\rm obs}}+\varepsilon_{n}v)\tilde \pi_{\varepsilon,tv}(t',v)^{({\rm norm})}\,{\rm d}t'{\rm d}v, & \ c_{\varepsilon}<\infty,\\ \int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\tilde \Pi(\theta\in A+\varepsilon_{n}\beta_{\varepsilon}v\mid s_{{\rm obs}}+a_{n}^{-1}v)\tilde \pi_{\varepsilon,tv}(t',a_{n}^{-1}\varepsilon_{n}^{-1}v)^{({\rm norm})}\,{\rm d}t'{\rm d}v, & \ c_{\varepsilon}=\infty\text{.} \end{cases} \end{align*} Similar to (A2), by Lemmas A1 and A5(c), we have \begin{align} & \sup_{A\in\mathscr{B}^{p}}\left|\tilde \Pi_{\varepsilon}\{a_{n}(\theta^{*}-\tilde \theta_{\varepsilon}^{*})\in A\mid s_{{\rm obs}}\}-\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\int_{A}N\{t;\mu_{n}^{*}(v),I(\theta_{0})^{-1}\}g_{n}(t',v)^{({\rm norm})}\,{\rm d}t\,{\rm d}t'{\rm d}v\right|=o_{\rm p}(1), \end{align} (A3) where $$\mu_{n}^{*}(v)=\{a_{n}\varepsilon_{n}(\beta_{0}-\beta_{\varepsilon})+(c_{\varepsilon}-a_{n}\varepsilon_{n})\beta_{0}{\rm 1}\kern-0.24em{\rm I}_{c_{\varepsilon}<\infty}\}\{v-E_{G_{n}}(v)\}+a_{n}r_{n,2}$$. Since $$a_{n}\varepsilon_{n}(\beta_{\varepsilon}-\beta_{0})=o_{\rm p}(1)$$, by Lemma 4 in the Supplementary Material and the continuous mapping theorem, \[ \int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\int_{\mathbb{R}^{p}}\left|N\{t;\mu_{n}^{*}(v),I(\theta_{0})^{-1}\}-N\{t;0,I(\theta_{0})^{-1}\}\right|g_{n}(t',v)\,{\rm d}t\,{\rm d}t'{\rm d}v=o_{\rm p}(1)\text{.} \] Then we have \begin{align*} & \sup_{A\in\mathscr{B}^{p}}\left|\tilde \Pi_{\varepsilon}\{a_{n}(\theta^{*}-\tilde \theta_{\varepsilon}^{*})\in A\mid s_{{\rm obs}}\}-\int_{A}N\{t;0,I(\theta_{0})^{-1}\}\,{\rm d}t\right|=o_{\rm p}(1), \end{align*} and the first convergence in the theorem holds. 
The second convergence in the theorem holds by Lemma A5(c). Since the only requirement on the matrix used in place of $$\beta_{\varepsilon}$$ in the above arguments is $$a_{n}\varepsilon_{n}(\tilde \beta _{\varepsilon}-\beta_{\varepsilon})=o_{\rm p}(1)$$, those arguments still hold if $$\beta_{\varepsilon}$$ is replaced by any $$p\times d$$ matrix $$\tilde \beta _{\varepsilon}$$ satisfying this condition. □ Proof of Proposition 3. Consider $$\theta^{*}$$ where $$\beta_{\varepsilon}$$ is replaced by $$\hat \beta _{\varepsilon}$$. Let $$\eta_{n}=N^{1/2}a_{n}\varepsilon_{n}(\hat \beta _{\varepsilon}-\beta_{\varepsilon})$$. Since $$\hat \beta _{\varepsilon}-\beta_{\varepsilon}=O_{\rm p}\{(a_{n}\varepsilon_{n})^{-1}N^{-1/2}\}$$ as $$n\rightarrow\infty$$, we have $$\eta_{n}=O_{\rm p}(1)$$ as $$n\rightarrow\infty$$; let its limit be $$\eta$$. In this case, if we replace $$\mu_{n}^{*}(v)$$ in (A3) with $$\mu_{n}^{*}(v)+N^{-1/2}\eta_{n}\{v-E_{G_{n}}(v)\}$$, denoted by $$\hat \mu_{n}^{*}(v)$$, the equation still holds. Denote this equation by (A4). Limits of the leading term in (A4) can be obtained by arguments similar to those for (A2). When $$c_{\varepsilon}<\infty$$, since for fixed $$v$$, $$\hat \mu_{n}^{*}(v)$$ converges to $$N^{-1/2}\eta\{v-E_{G}(v)\}$$ in distribution, by following the same line of argument we have \begin{align*} \tilde \Pi_{\varepsilon}\{a_{n}(\theta^{*}-\tilde \theta_{\varepsilon}^{*})\in A\mid s_{{\rm obs}}\} & \rightarrow\int_{A}\int_{\mathbb{R}^{p}}N[t;N^{-1/2}\eta\{v-E_{G}(v)\},I(\theta_{0})^{-1}]G(v)^{({\rm norm})}\,{\rm d}v\,{\rm d}t \end{align*} in distribution as $$n\rightarrow\infty$$. When $$c_{\varepsilon}=\infty$$, by Lemma A2 we have $$E_{G_{n}}(v)\rightarrow0$$ in probability, and $$\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}g_{n}(t',v)\,{\rm d}t'{\rm d}v\rightarrow\int_{\mathbb{R}^{p}}K\{Ds(\theta_{0})t\}\,{\rm d}t$$ in probability. Then with the transformation $$v'=v'(v,t')$$, for fixed $$v$$, \begin{align*} \hat \mu_{n}^{*}(v) & =\{a_{n}\varepsilon_{n}(\beta_{0}-\beta_{\varepsilon})+N^{-1/2}\eta_{n}\}\left\{ Ds(\theta_{0})t'+\frac{1}{a_{n}\varepsilon_{n}}v'-\frac{1}{a_{n}\varepsilon_{n}}A(\theta_{0})^{1/2}W_{{\rm obs}}-E_{G_{n}}(v)\right\} \\ & \rightarrow N^{-1/2}\eta Ds(\theta_{0})t' \end{align*} in distribution as $$n\rightarrow\infty$$. Recall that $$g_{n}(t',v)\,{\rm d}v=g_{n}'(t',v')\,{\rm d}v'$$. Then by Lemma 4 in the Supplementary Material and the continuous mapping theorem, \begin{align*} & \int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\int_{A}N\{t;\hat \mu_{n}^{*}(v),I(\theta_{0})^{-1}\}g_{n}'(t',v')\,{\rm d}t\,{\rm d}t'{\rm d}v\\ \rightarrow & \int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\int_{A}N\{t;N^{-1/2}\eta Ds(\theta_{0})t',I(\theta_{0})^{-1}\}K\{Ds(\theta_{0})t'\}\,{\rm d}t\,{\rm d}t'{\rm d}v \end{align*} in distribution as $$n\rightarrow\infty$$. Therefore by (A4) and the above convergence results, \begin{align*} \tilde \Pi_{\varepsilon}\{a_{n}(\theta^{*}-\tilde \theta_{\varepsilon}^{*})\in A\mid s_{{\rm obs}}\} & \rightarrow\int_{A}\int_{\mathbb{R}^{p}}N\{t;N^{-1/2}\eta Ds(\theta_{0})t',I(\theta_{0})^{-1}\}K\{Ds(\theta_{0})t'\}^{({\rm norm})}\,{\rm d}t'{\rm d}t \end{align*} in distribution as $$n\rightarrow\infty$$. □
References
Beaumont, M. A. (2010). Approximate Bayesian computation in evolution and ecology. Ann. Rev. Ecol. Evol. Syst. 41, 379–406.
Beaumont, M. A., Cornuet, J.-M., Marin, J.-M. & Robert, C. P. (2009). Adaptive approximate Bayesian computation. Biometrika 96, 983–90.
Beaumont, M. A., Zhang, W. & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics 162, 2025–35.
Blum, M. G. (2010). Approximate Bayesian computation: A nonparametric perspective. J. Am. Statist. Assoc. 105, 1178–87.
Bonassi, F. V. & West, M. (2015). Sequential Monte Carlo with adaptive weights for approximate Bayesian computation. Bayesian Anal. 10, 171–87.
Calvet, L. E. & Czellar, V. (2015). Accurate methods for approximate Bayesian computation filtering. J. Finan. Economet. 13, 798–838.
Chernozhukov, V. & Hong, H. (2003). An MCMC approach to classical estimation. J. Economet. 115, 293–346.
Cornuet, J.-M., Santos, F., Beaumont, M. A., Robert, C. P., Marin, J.-M., Balding, D. J., Guillemaud, T. & Estoup, A. (2008). Inferring population history with DIY ABC: A user-friendly approach to approximate Bayesian computation. Bioinformatics 24, 2713–9.
Drovandi, C. C. & Pettitt, A. N. (2011). Estimation of parameters for macroparasite population evolution using approximate Bayesian computation. Biometrics 67, 225–33.
Drovandi, C. C., Pettitt, A. N. & Lee, A. (2015). Bayesian indirect inference using a parametric auxiliary model. Statist. Sci. 30, 72–95.
Duffie, D. & Singleton, K. J. (1993). Simulated moments estimation of Markov models of asset prices. Econometrica 61, 929–52.
Fearnhead, P. & Prangle, D. (2012). Constructing summary statistics for approximate Bayesian computation: Semi-automatic approximate Bayesian computation (with Discussion). J. R. Statist. Soc. B 74, 419–74.
Filippi, S., Barnes, C. P., Cornebise, J. & Stumpf, M. P. (2013). On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo. Statist. Applic. Genet. Molec. Biol. 12, 87–107.
Frazier, D. T., Martin, G. M., Robert, C. P. & Rousseau, J. (2016). Asymptotic properties of approximate Bayesian computation. arXiv:1607.06903.
Gouriéroux, C. & Ronchetti, E. (1993). Indirect inference. J. Appl. Economet. 8, S85–118.
Heggland, K. & Frigessi, A. (2004). Estimating functions in indirect inference. J. R. Statist. Soc. B 66, 447–62.
Kleijn, B. & van der Vaart, A. (2012). The Bernstein–von Mises theorem under misspecification. Electron. J. Statist. 6, 354–81.
Lenormand, M., Jabot, F. & Deffuant, G. (2013). Adaptive approximate Bayesian computation for complex models. Comp. Statist. 28, 2777–96.
Li, W. & Fearnhead, P. (2018). On the asymptotic efficiency of approximate Bayesian computation estimators. Biometrika 105, https://doi.org/10.1093/biomet/asx078.
Marin, J.-M., Pillai, N. S., Robert, C. P. & Rousseau, J. (2014). Relevant statistics for Bayesian model choice. J. R. Statist. Soc. B 76, 833–59.
Marjoram, P., Molitor, J., Plagnol, V. & Tavare, S. (2003). Markov chain Monte Carlo without likelihoods. Proc. Nat. Acad. Sci. 100, 15324–8.
Pauli, F., Racugno, W. & Ventura, L. (2011). Bayesian composite marginal likelihoods. Statist. Sinica 21, 149–64.
Ribatet, M., Cooley, D. & Davison, A. C. (2012). Bayesian inference from composite likelihoods, with an application to spatial extremes. Statist. Sinica 22, 813–45.
Ruli, E., Sartori, N. & Ventura, L. (2016). Approximate Bayesian computation with composite score functions. Statist. Comp. 26, 679–92.
Soubeyrand, S. & Haon-Lasportes, E. (2015). Weak convergence of posteriors conditional on maximum pseudo-likelihood estimates and implications in ABC. Statist. Prob. Lett. 107, 84–92.
Toni, T., Welch, D., Strelkowa, N., Ipsen, A. & Stumpf, M. P. (2009). Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J. R. Soc. Interface 6, 187–202.
van der Vaart, A. W. (2000). Asymptotic Statistics. Cambridge: Cambridge University Press.
Varin, C., Reid, N. & Firth, D. (2011). An overview of composite likelihood methods. Statist. Sinica 21, 5–42.
Wegmann, D., Leuenberger, C. & Excoffier, L. (2009). Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182, 1207–18.
Wood, S. N. (2010). Statistical inference for noisy nonlinear ecological dynamic systems. Nature 466, 1102–4.
© 2018 Biometrika Trust. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices).

These results are similar to the asymptotic results for indirect inference, where error for a Monte Carlo sample of size $$N$$ also inflates the overall mean square error of estimators by a factor of $$1+O(1/N)$$ (Gouriéroux & Ronchetti, 1993). By comparison, choosing a bandwidth which is $$o(n^{-1/2})$$ will lead to an acceptance probability that tends to zero as $$n\rightarrow \infty$$, and the Monte Carlo error of approximate Bayesian computation will blow up. Choosing a bandwidth that decays more slowly than $$O(n^{-1/2})$$ will also lead to a regime where the Monte Carlo error dominates, and can lead to a nonnegligible bias in the posterior mean that inflates the error. While the above results for a bandwidth that is $$O(n^{-1/2})$$ are positive in terms of point estimates, they are negative in terms of the calibration of the posterior. With such a bandwidth the posterior density of approximate Bayesian computation always over-inflates the parameter uncertainty: see Proposition 1 below and Theorem 2 of Frazier et al. (2016). The aim of this paper is to show that a variant of approximate Bayesian computation can yield inference that is both accurate in terms of point estimation, with its posterior mean having the same frequentist asymptotic variance as the true posterior mean given the summaries, and calibrated, in the sense that its posterior variance equals this asymptotic variance, when the bandwidth converges to zero at a rate slower than $$O(n^{-1/2})$$. This means that the acceptance probability of a good approximate Bayesian computation algorithm will tend to unity as $$n\rightarrow \infty$$. 2. Notation and set-up We denote the data by $$Y_{{\rm obs}}=(y_{{\rm obs},1},\ldots,y_{{\rm obs},n})$$, where $$n$$ is the sample size, and each observation, $$y_{{\rm obs},i}$$, can be of arbitrary dimension. Assume the data are modelled as a draw from a parametric density, $$f_{n}(y\mid\theta)$$, and consider asymptotics as $$n\rightarrow\infty$$. This density depends on an unknown parameter $$\theta\in\mathbb{R}^{p}$$. Let $$\mathscr{B}^p$$ be the Borel sigma-field on $$\mathbb{R}^p$$. We will let $$\theta_{0}$$ denote the true parameter value and $$\pi(\theta)$$ the prior distribution for the parameter. Denote the support of $$\pi(\theta)$$ by $$\mathcal{P}$$. Assume that a fixed-dimensional summary statistic $$s_n(Y)$$ is chosen and its density under our model is $$f_{n}(s\mid\theta)$$. The shorthand $$S_n$$ is used to denote the random variable with density $$f_{n}(s\mid\theta)$$. Often we will simplify notation and write $$s$$ and $$S$$ for $$s_n$$ and $$S_n$$ respectively. Let $$N(x;\mu,\Sigma)$$ be the normal density at $$x$$ with mean $$\mu$$ and variance $$\Sigma$$. Let $$A^{\rm c}$$ be the complement of a set $$A$$ with respect to the whole space. For a series $$x_{n}$$ we write $$x_{n}=\Theta(a_n)$$ if there exist constants $$m$$ and $$M$$ such that $$0<m<|x_{n}/a_{n}|<M<\infty$$ as $$n\rightarrow\infty$$. For a real function $$g(x)$$, denote its gradient function at $$x=x_{0}$$ by $$D_{x}g(x_{0})$$. To simplify notation, $$D_{\theta}$$ is written as $$D$$. Hereafter $$\varepsilon$$ is considered to depend on $$n$$, so the notation $$\varepsilon_n$$ is used. The conditions of the theoretical results are stated below. Condition 1. There exists some $$\delta_{0}>0$$ such that $$\mathcal{P}_{0}=\{\theta:|\theta-\theta_{0}|<\delta_{0}\}\subset\mathcal{P}$$, $$\pi(\theta)\in C^{2}(\mathcal{P}_{0})$$ and $$\pi(\theta_{0})>0$$. Condition 2. 
The kernel satisfies (i) $$\int vK(v)\,{\rm d}v=0$$; (ii) $$\int\prod_{k=1}^{l}v_{i_{k}}K(v)\,{\rm d}v<\infty$$ for any coordinates $$(v_{i_{1}},\ldots,v_{i_{l}})$$ of $$v$$ and $$l\leqslant p+6$$; (iii) $$K(v)\propto\bar{K}(\|v\|_{\Lambda}^{2})$$ where $$\|v\|_{\Lambda}^{2}=v^{{ \mathrm{\scriptscriptstyle T} }}\Lambda v$$ and $$\Lambda$$ is a positive-definite matrix, and $$K(v)$$ is a decreasing function of $$\|v\|_{\Lambda}$$; (iv) $$K(v)=O\{\exp(-c_1\|v\|^{\alpha_1})\}$$ for some $$\alpha_{1}>0$$ and $$c_{1}>0$$ as $$\|v\|\rightarrow\infty$$. Condition 3. There exists a sequence $$a_{n}$$ satisfying $$a_{n}\rightarrow\infty$$ as $$n\rightarrow\infty$$, a $$d$$-dimensional vector $$s(\theta)$$ and a $$d\times d$$ matrix $$A(\theta)$$ such that for all $$\theta\in\mathcal{P}_{0}$$, \[ a_{n}\{S_{n}-s(\theta)\}\rightarrow N\{0,A(\theta)\},\quad n\rightarrow\infty \] in distribution. We also assume that $$s_{\rm obs}\rightarrow s(\theta_0)$$ in probability. Furthermore, (i) $$s(\theta)$$ and $$A(\theta)\in C^{1}(\mathcal{P}_{0})$$, and $$A(\theta)$$ is positive definite for any $$\theta$$; (ii) for any $$\delta>0$$ there exists $$\delta'>0$$ such that $$\|s(\theta)-s(\theta_0)\|>\delta'$$ for all $$\theta$$ satisfying $$\|\theta-\theta_0\|>\delta$$; and (iii) $$I(\theta)=Ds(\theta)^{{ \mathrm{\scriptscriptstyle T} }}A^{-1}(\theta)Ds(\theta)$$ has full rank at $$\theta=\theta_{0}$$. Let $$\tilde{f}_{n}(s\mid\theta)=N\{s;s(\theta),A(\theta)/a_{n}^{2}\}$$ be the density of the normal approximation to $$S$$ and introduce the standardized random variable $$W_{n}(s)=a_{n}A(\theta)^{-1/2}\{S-s(\theta)\}$$. We further let $$f_{W_n}(w\mid \theta)$$ and $$\tilde{f}_{W_n}(w\mid \theta)$$ be the densities for $$W_n$$ under the true model for $$S$$ and under our normal approximation to the model for $$S$$. Condition 4. There exists $$\alpha_{n}$$ satisfying $$\alpha_{n}/a_{n}^{2/5}\rightarrow\infty$$ and a density $$r_{\rm max}(w)$$ satisfying Condition 2(ii) and (iii), where $$K(v)$$ is replaced with $$r_{\rm max}(w)$$, such that $$\sup_{\theta\in\mathcal{P}_{0}}\alpha_{n}\big|f_{W_{n}}(w\mid\theta)-\tilde{f}_{W_{n}}(w\mid\theta)\big|\leqslant c_{3}r_{\rm max}(w)$$ for some positive constant $$c_{3}$$. Condition 5. The following statements hold: (i) $$r_{\rm max}(w)$$ satisfies Condition 2(iv); and (ii) $$\sup_{\theta\in\mathcal{P}_{0}^{\rm c}}f_{W_{n}}(w\mid\theta)=O\{\exp(-c_2\|w\|^{\alpha_2})\}$$ as $$\|w\|\rightarrow\infty$$ for some positive constants $$c_{2}$$ and $$\alpha_{2}$$, and $$A(\theta)$$ is bounded in $$\mathcal{P}$$. Conditions 1–5 are from Li & Fearnhead (2018). Condition 2 is a requirement for the kernel function and is satisfied by all commonly used kernels, such as any kernel with compact support or the Gaussian kernel. Condition 3 assumes a central limit theorem for the summary statistic with rate $$a_{n}$$ and, roughly speaking, requires the summary statistic to accumulate information when $$n$$ gets large. This is a natural assumption, since many common summary statistics are sample moments, proportions, quantiles and autocorrelations, for which a central limit theorem would apply. It is also possible to verify the asymptotic normality of auxiliary model-based or composite likelihood-based summary statistics (Drovandi et al., 2015; Ruli et al., 2016) by referring to the rich literature on asymptotic properties of quasi maximum likelihood estimators (Varin et al., 2011) or quasi-posterior estimators (Chernozhukov & Hong, 2003). 
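As an informal illustration of Condition 3, and not part of the theoretical development, the following sketch checks by simulation that a sample-mean summary statistic obeys a central limit theorem with rate $$a_{n}=n^{1/2}$$; the exponential model, the parameter value and all names in the code are assumptions made only for this example.

import numpy as np

# Illustration only: for y_1,...,y_n iid exponential with rate theta, the summary
# S_n = sample mean has s(theta) = 1/theta and A(theta) = 1/theta^2 in Condition 3,
# so a_n{S_n - s(theta)} with a_n = n^{1/2} should be approximately N{0, A(theta)}.
rng = np.random.default_rng(0)
theta0, n, replicates = 2.0, 10_000, 100_000      # assumed values, illustration only
S_n = rng.gamma(shape=n, scale=1.0 / theta0, size=replicates) / n   # exact law of the mean
standardised = np.sqrt(n) * (S_n - 1.0 / theta0)
print(standardised.var(), 1.0 / theta0 ** 2)      # empirical versus limiting variance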
Condition 3 does not cover ancillary statistics, using the full data as a summary statistic, or statistics based on distances, such as an asymptotically chi-square-distributed test statistic. Condition 4 assumes that, in a neighbourhood of $$\theta_{0}$$, $$f_{n}(s\mid\theta)$$ deviates from the leading term of its Edgeworth expansion by a rate $$a_{n}^{-2/5}$$. This is weaker than the standard requirement, $$o(a_n^{-1})$$, for the remainder of an Edgeworth expansion. It also assumes that the deviation is uniform, which is not difficult to satisfy in a compact neighbourhood. Condition 5 further assumes that $$f_{n}(s\mid\theta)$$ has exponentially decreasing tails with rate uniform in the support of $$\pi(\theta)$$. This implies that posterior moments from approximate Bayesian computation are dominated by integrals in the neighbourhood of $$\theta_0$$ and have leading terms with concise expressions. If Condition 5 is weakened, the requirement on $$\varepsilon_n$$ for the convergence results to hold might depend on the specific tail behaviour of $$f_{n}(s\mid\theta)$$. Additionally, for the results regarding regression adjustment, the following moments of the summary statistic are required to exist. Condition 6. The first two moments, $$\int_{\mathbb{R}^{d}}s f_{n}(s\mid\theta)\,{\rm d} s$$ and $$\int_{\mathbb{R}^{d}}ss^{{ \mathrm{\scriptscriptstyle T} }}f_{n}(s\mid\theta)\,{\rm d} s$$, exist. 3. Asymptotics of approximate Bayesian computation 3.1. Posterior First we consider the convergence of the posterior distribution of approximate Bayesian computation, denoted by $$\Pi_{\varepsilon}(\theta\in A\mid s_{{\rm obs}})$$ for $$A\in\mathscr{B}^p$$, as $$n\rightarrow\infty$$. The distribution function is a random function with the randomness due to $$s_{{\rm obs}}$$. We present two convergence results. One is the convergence of the posterior distribution function of a properly scaled and centred version of $$\theta$$; see Proposition 1. The other is the convergence of the posterior mean, a result which comes from Li & Fearnhead (2018) but, for convenience, is repeated as Proposition 2. The following proposition gives three different limiting forms for $$\Pi_{\varepsilon}(\theta\in A\mid s_{{\rm obs}})$$, corresponding to different rates for how the bandwidth decreases relative to the rate of the central limit theorem in Condition 3. We summarize these competing rates by defining $$c_{\varepsilon}=\lim_{n\rightarrow\infty}a_{n}\varepsilon_{n}$$. Proposition 1. Assume Conditions 1–5. Let $$\theta_{\varepsilon}$$ denote the posterior mean of approximate Bayesian computation. As $$n\rightarrow\infty$$, if $$\varepsilon_{n}=o(a_{n}^{-3/5})$$ then the following convergence holds, depending on the value of $$c_{\varepsilon}$$. (i) If $$c_{\varepsilon}=0$$ then \[ \sup_{A\in\mathscr{B}^p}\left|\Pi_{\varepsilon}\{a_n(\theta-\theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}-\int_{A}\psi(t)\,{\rm d}t\right|\rightarrow0 \]in probability, where $$\psi(t)=N\{t;0,I(\theta_{0})^{-1}\}$$.
(ii) If $$c_{\varepsilon}\in (0,\infty)$$ then for any $$A\in\mathscr{B}^p$$, \[ \Pi_{\varepsilon}\{a_n(\theta-\theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}\rightarrow\int_{A}\psi(t)\,{\rm d}t \]in distribution, where \[ \psi(t)\propto \int_{\mathbb{R}^{d}}N[t;c_{\varepsilon}\beta_0\{v-E_{G}(v)\},I(\theta_{0})^{-1}]G(v)\,{\rm d}v,\quad \beta_{0}=I(\theta_{0})^{-1}Ds(\theta_{0})^{{ \mathrm{\scriptscriptstyle T} }}A(\theta_{0})^{-1}, \]and $$G(v)$$ is a random density of $$v$$, with mean $$E_{G}(v)$$, which depends on $$c_{\varepsilon}$$ and $$Z\sim N(0,I_{d})$$. (iii) If $$c_{\varepsilon}=\infty$$ then \[ \sup_{A\in\mathscr{B}^p}\left|\Pi_{\varepsilon}\{\varepsilon_n^{-1}(\theta-\theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}-\int_{A}\psi(t)\,{\rm d}t\right|\rightarrow 0 \]in probability, where $$\psi(t)\propto K\{Ds(\theta_{0})t\}$$. For a similar result, under different assumptions, see Theorem 2 of Frazier et al. (2016). See also Soubeyrand & Haon-Lasportes (2015) for related convergence results for the true posterior given the summaries for some specific choices of summary statistics. The explicit form of $$G(v)$$ is stated in the Supplementary Material. When we have the same number of summary statistics and parameters, $$d=p$$, the limiting distribution simplifies to \[ \psi(t)\propto \int_{\mathbb{R}^{d}}N\{Ds(\theta_0)t;c_{\varepsilon}v,A(\theta_{0})\}K(v)\,{\rm d}v\text{.} \] The more complicated form in Proposition 1(ii) above arises from the need to project the summary statistics onto the parameter space. The limiting distribution may depend on the value of the summary statistic, $$s_{\rm obs}$$, in the space orthogonal to $$Ds(\theta_0)^{{ \mathrm{\scriptscriptstyle T} }}A(\theta_0)^{-1/2}$$. Hence the limit depends on a random quantity, $$Z$$, which can be interpreted as the noise in $$s_{\rm obs}$$. The main difference between the three convergence results is the form of the limiting density $$\psi(t)$$ for the scaled random variable $$a_{n,\varepsilon}(\theta-\theta_{\varepsilon})$$, where $$a_{n,\varepsilon}=a_{n}{\rm 1}\kern-0.24em{\rm I}_{c_{\varepsilon}<\infty}+\varepsilon_{n}^{-1}{\rm 1}\kern-0.24em{\rm I}_{c_{\varepsilon}=\infty}$$. For case (i) the bandwidth is sufficiently small that the approximation in approximate Bayesian computation due to accepting summaries close to the observed summary is asymptotically negligible. The asymptotic posterior distribution is Gaussian, and equals the limit of the true posterior for $$\theta$$ given the summary. For case (iii) the bandwidth is sufficiently big that this approximation dominates and the asymptotic posterior distribution of approximate Bayesian computation is determined by the kernel. For case (ii) the approximation is of the same order as the uncertainty in $$\theta$$, which leads to an asymptotic posterior distribution that is a convolution of a Gaussian distribution and the kernel. Since the limit distributions of cases (i) and (iii) are nonrandom in the space $$L^1(\mathbb{R}^p)$$, the weak convergence is strengthened to convergence in probability in $$L^1(\mathbb{R}^p)$$. See the proof in the Appendix. Proposition 2. (Theorem 3.1 of Li & Fearnhead, 2018). Assume the conditions of Proposition 1. As $$n\rightarrow\infty$$, if $$\varepsilon_{n}=o(a_{n}^{-3/5})$$, then $$a_{n}(\theta_{\varepsilon}-\theta_{0})\rightarrow N\{0,I_{\rm ABC}^{-1}(\theta_{0})\}$$ in distribution. 
If $$\varepsilon_{n}=o(a_{n}^{-1})$$ or $$d=p$$ or the covariance matrix of the kernel is proportional to $$A(\theta_{0})$$, then $$I_{\rm ABC}(\theta_{0})=I(\theta_{0})$$. For other cases, $$I(\theta_{0})-I_{\rm ABC}(\theta_{0})$$ is semipositive definite. Proposition 2 helps us to compare the frequentist variability in the posterior mean of approximate Bayesian computation with the asymptotic posterior distribution given in Proposition 1. If $$\varepsilon_{n}=o(a_{n}^{-1})$$ then the posterior distribution is asymptotically normal with variance matrix $$a_{n}^{-2}I(\theta_{0})^{-1}$$, and the posterior mean is also asymptotically normal with the same variance matrix. These results are identical to those we would get for the true posterior and posterior mean given the summary. For an $$\varepsilon_{n}$$ which is of the same order as $$a_{n}^{-1}$$, the uncertainty in approximate Bayesian computation has rate $$a_{n}^{-1}$$. However, the limiting posterior distribution, which is a convolution of the true limiting posterior given the summary with the kernel, will overestimate the uncertainty by a constant factor. If $$\varepsilon_{n}$$ decreases slower than $$a_{n}^{-1}$$, the posterior contracts at a rate $$\varepsilon_{n}$$, and thus will overestimate the actual uncertainty by a factor that diverges as $$n\rightarrow \infty$$. In summary, it is much easier to get approximate Bayesian computation to accurately estimate the posterior mean. This is possible with $$\varepsilon_{n}$$ as large as $$o(a_{n}^{-3/5})$$ if the dimension of the summary statistic equals that of the parameter. However, accurately estimating the posterior variance, or getting the posterior to accurately reflect the uncertainty in the parameter, is much harder. As commented in § 1, this is only possible for values of $$\varepsilon_n$$ for which the acceptance probability in a standard algorithm will go to zero as $$n$$ increases. In this case the Monte Carlo sample size, and hence the computational cost, of approximate Bayesian computation will have to increase substantially with $$n$$. As one application of our theoretical results, consider observations that are independent and identically distributed from a parametric density $$f(\cdot\mid\theta)$$. One approach to constructing the summary statistics is to use the score vector of some tractable approximating auxiliary model evaluated at the maximum auxiliary likelihood estimator (Drovandi et al., 2015). Ruli et al. (2016) construct an auxiliary model from a composite likelihood, so the auxiliary likelihood for a single observation is $$\prod_{i\in\mathscr{I}}f(y\in A_i\mid\theta)$$ where $$\{A_i: i\in\mathscr{I}\}$$ is a set of marginal or conditional events for $$y$$. Denote the auxiliary score vector for a single observation by $$cl_{\theta}(\cdot\mid\theta)$$ and the maximum auxiliary likelihood estimator for our dataset by $$\hat \theta_{\rm cl}$$. Then the summary statistic, $$s$$, for any pseudo-dataset $$\{y_1,\ldots,y_n\}$$ is $$\sum_{j=1}^n cl_{\theta}(y_j\mid\hat \theta_{\rm cl})/n$$. For $$y\sim f(\cdot\mid\theta)$$, assume that the first two moments of $$cl_{\theta}(y\mid\theta_0)$$ exist and $$cl_{\theta}(y\mid\theta)$$ is differentiable at $$\theta$$. Let $$H(\theta)=E_{\theta}\{\partial cl_{\theta}(y\mid\theta_{0})/\partial\theta\}$$ and $$J(\theta)=\mbox{var}_{\theta}\{cl_{\theta}(y\mid\theta_{0})\}$$. 
Then if $$\hat \theta_{\rm cl}$$ is consistent for $$\theta_0$$, Condition 3 is satisfied with \begin{align*} n^{1/2}[S-E_{\theta}\{cl_{\theta}(Y\mid\theta_0)\}]\rightarrow N\{0,J(\theta)\},\quad n\rightarrow\infty \end{align*} in distribution, and with $$I(\theta_0)=H(\theta_0)^{{ \mathrm{\scriptscriptstyle T} }}J(\theta_0)^{-1}H(\theta_0)$$. Our results show that the posterior mean of approximate Bayesian computation, using $$\varepsilon_n=O(n^{-1/2})$$, will have asymptotic variance $$I(\theta_0)^{-1}/n$$. This is identical to the asymptotic variance of the maximum composite likelihood estimator (Varin et al., 2011). Furthermore, the posterior variance will overestimate this just by a constant factor. As we show below, using the regression correction of Beaumont et al. (2002) will correct this overestimation and produce a posterior that correctly quantifies the uncertainty in our estimates. An alternative approach to constructing an approximate posterior using composite likelihood is to use the product of the prior and the composite likelihood. In general, this leads to a poorly calibrated posterior density which substantially underestimates uncertainty (Ribatet et al., 2012). Adjustment of the composite likelihood is needed to obtain calibration, but this involves estimation of the curvature and the variance of the composite score (Pauli et al., 2011). Empirical evidence that approximate Bayesian computation more accurately quantifies uncertainty than alternative composite-based posteriors is given in Ruli et al. (2016). 3.2. Regression-adjusted approximate Bayesian computation The regression adjustment of Beaumont et al. (2002) involves post-processing the output of approximate Bayesian computation to try to improve the resulting approximation to the true posterior. Below we will denote a sample from the posterior of approximate Bayesian computation by $$\{(\theta_{i},s_{i})\}_{i=1,\ldots,N}$$. Under the regression adjustment, we obtain a new posterior sample by using $$\{\theta_{i}-\hat \beta _{\varepsilon}(s_{i}-s_{{\rm obs}})\}_{i=1,\ldots,N}$$ where $$\hat \beta _{\varepsilon}$$ is the least squares estimate of the coefficient matrix in the linear model \begin{align*} \theta_{i} & =\alpha+\beta(s_{i}-s_{{\rm obs}})+e_{i} \quad (i=1,\ldots,N), \end{align*} where $$e_{i}$$ are independent and identically distributed errors. We can view the adjusted sample as follows. Define a constant $$\alpha_{\varepsilon}$$ and a vector $$\beta_{\varepsilon}$$ as \[ (\alpha_{\varepsilon},\beta_{\varepsilon})=\underset{\alpha,\beta}{\arg\min}\ E_{\varepsilon}[\|\theta-\alpha-\beta(s-s_{{\rm obs}})\|^{2}\mid s_{{\rm obs}}], \] where expectation is with respect to the joint posterior distribution of $$(\theta,s)$$ given by approximate Bayesian computation. Then the ideal adjusted posterior is the distribution of $$\theta^*=\theta-\beta_{\varepsilon}(s-s_{{\rm obs}})$$ where $$(\theta,s) \sim \pi_{\varepsilon}(\theta,s)$$. The density of $$\theta^*$$ is \[ \pi_{\varepsilon}^*(\theta^{*}\mid s_{{\rm obs}})=\int_{\mathbb{R}^{d}}\pi_{\varepsilon}\{\theta^{*}+\beta_{\varepsilon}(s-s_{{\rm obs}}),s \mid s_{{\rm obs}}\}\,{\rm d}s \] and the sample we get from regression-adjusted approximate Bayesian computation is a draw from $$\pi_{\varepsilon}^*(\theta^{*}\mid s_{{\rm obs}})$$ but with $$\beta_{\varepsilon}$$ replaced by its estimator. 
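To make the adjustment concrete, the following is a minimal sketch of rejection sampling followed by the linear regression adjustment just described. The Gaussian toy model, the uniform prior, the device of accepting a fixed proportion of the closest summaries, and all names in the code are illustrative assumptions rather than the implementation used in the paper.

import numpy as np

rng = np.random.default_rng(1)
n, N, accept_prop = 1_000, 50_000, 0.01

# Toy model (assumption): y_1,...,y_n iid N(theta, 1) with the sample mean as summary,
# so the summary can be drawn directly from its exact N(theta, 1/n) distribution.
theta0 = 3.0
s_obs = rng.normal(theta0, 1.0 / np.sqrt(n))          # stand-in observed summary
theta_sim = rng.uniform(0.0, 10.0, size=N)            # draws from a uniform prior
s_sim = rng.normal(theta_sim, 1.0 / np.sqrt(n))

# Accept the fixed proportion of draws whose summaries are closest to s_obs,
# which corresponds to a uniform kernel with an implicitly chosen bandwidth.
dist = np.abs(s_sim - s_obs)
keep = dist <= np.quantile(dist, accept_prop)
theta_acc, s_acc = theta_sim[keep], s_sim[keep]

# Least squares fit of theta on (s - s_obs); the adjusted sample is
# theta_i - beta_hat * (s_i - s_obs), as described above.
X = np.column_stack([np.ones(s_acc.size), s_acc - s_obs])
beta_hat = np.linalg.lstsq(X, theta_acc, rcond=None)[0][1]
theta_adj = theta_acc - beta_hat * (s_acc - s_obs)

print(theta_acc.std(), theta_adj.std())               # unadjusted versus adjusted spread

For the same acceptance proportion, the adjusted sample should typically show a smaller spread than the unadjusted one, in line with the results below.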
The variance of $$\pi^*_{\varepsilon}(\theta^{*}\mid s_{{\rm obs}})$$ is strictly smaller than that of $$\pi_{\varepsilon}(\theta\mid s_{{\rm obs}})$$ provided $$s$$ is correlated with $$\theta$$. The following results, which are analogous to Propositions 1 and 2, show that this reduction in variation is by the correct amount to make the resulting adjusted posterior correctly quantify the posterior uncertainty. Theorem 1. Assume Conditions 1–6. Denote the mean of $$\pi^*_{\varepsilon}(\theta^{*}\mid s_{{\rm obs}})$$ by $$\theta_{\varepsilon}^{*}$$. As $$n\rightarrow\infty$$, if $$\varepsilon_{n}=o(a_{n}^{-3/5})$$, \begin{align*} & \sup_{A\in\mathscr{B}^p}\left|\Pi_{\varepsilon}\{a_{n}(\theta^{*}-\theta_{\varepsilon}^{*})\in A\mid s_{{\rm obs}}\}-\int_{A}N\{t;0,I(\theta_{0})^{-1}\}\,{\rm d}t\right|\rightarrow 0 \end{align*}in probability, and $$a_{n}({{\theta_{\varepsilon}^{*}}}-\theta_{0})\rightarrow N\{0,I(\theta_{0})^{-1}\}$$ in distribution. Moreover, if $$\beta_{\varepsilon}$$ is replaced by $$\tilde \beta _{\varepsilon}$$ satisfying $$a_n\varepsilon_n(\tilde \beta _{\varepsilon}-\beta_{\varepsilon})=o_{\rm p}(1)$$, the above results still hold. The limit of the regression-adjusted posterior distribution is the true posterior given the summary provided $$\varepsilon_{n}$$ is $$o(a_{n}^{-3/5})$$. This is a slower rate than that at which the posterior contracts, which, as we will show in the next section, has important consequences in terms of the computational efficiency of approximate Bayesian computation. The regression adjustment corrects for both the additional noise of the posterior mean when $$d>p$$ and the overestimated uncertainty of the posterior. This correction comes from the removal of the first-order bias caused by $$\varepsilon$$. Blum (2010) shows that the regression adjustment reduces the bias of approximate Bayesian computation when $$E(\theta\mid s)$$ is linear and the residuals $$\theta-E(\theta\mid s)$$ are homoscedastic. Our results do not require these assumptions, and suggest that the regression adjustment should be applied routinely with approximate Bayesian computation provided the coefficients $$\beta_{\varepsilon}$$ can be estimated accurately. With the simulated sample, $$\beta_{\varepsilon}$$ is estimated by $$\hat \beta _{\varepsilon}$$. The accuracy of $$\hat \beta _{\varepsilon}$$ can be seen by the following decomposition, \begin{align*} \hat \beta _{\varepsilon} & =\mbox{cov}_N(s,\theta)\mbox{var}_N(s)^{-1}\\ & =\beta_{\varepsilon}+\frac{1}{a_{n}\varepsilon_{n}}\mbox{cov}_N\left\{\frac{s-s_{\varepsilon}}{\varepsilon_{n}},a_{n}(\theta^{*}-\theta_{\varepsilon}^{*})\right\} \mbox{var}_N\left(\frac{s-s_{\varepsilon}}{\varepsilon_{n}}\right)^{-1}, \end{align*} where $$\mbox{cov}_N$$ and $$\mbox{var}_N$$ are the sample covariance and variance matrices, and $$s_\epsilon$$ is the sample mean. Since $$\mbox{cov}(s,\theta^{*})=0$$ and the distributions of $$s-s_{\varepsilon}$$ and $$\theta^{*}-\theta_{\varepsilon}^{*}$$ contract at rates $$\varepsilon_n$$ and $$a_n^{-1}$$ respectively, the error $$\hat \beta _{\varepsilon}-\beta_{\varepsilon}$$ can be shown to have the rate $$O_{\rm p}\{(a_{n}\varepsilon_{n})^{-1}N^{-1/2}\}$$ as $$n\rightarrow\infty$$ and $$N\rightarrow\infty$$. We omit the proof, since it is tedious and similar to the proof of the asymptotic expansion of $$\beta_{\varepsilon}$$ in Lemma A4. 
Thus, if $$N$$ increases to infinity with $$n$$, $$\hat \beta _{\varepsilon}-\beta_{\varepsilon}$$ will be $$o_{\rm p}\{(a_n\varepsilon_n)^{-1}\}$$ and the convergence stated in Theorem 1 will hold. Alternatively, we can get an idea of the additional error for large $$N$$ from the following proposition. Proposition 3. Assume Conditions 1–6. Consider $$\theta^{*}=\theta-\hat \beta _{\varepsilon}(s-s_{{\rm obs}})$$. As $$n\rightarrow\infty$$, if $$\varepsilon_{n}=o(a_{n}^{-3/5})$$ and $$N$$ is large enough, for any $$A\in\mathscr{B}^{p}$$, \begin{align*} \Pi_{\varepsilon}\{a_{n}(\theta^{*}-\theta_{\varepsilon}^{*})\in A\mid s_{{\rm obs}}\}\rightarrow\int_{A}\psi(t)\,{\rm d}t \end{align*}in distribution, where \begin{align*} \psi(t)\propto & \int_{\mathbb{R}^{d}}N\left[t;\frac{\eta}{N^{1/2}}\{v-E_{G}(v)\},I(\theta_{0})^{-1}\right]G(v)\,{\rm d}v \end{align*}when $$c_{\varepsilon}<\infty$$, \begin{align*} \psi(t)\propto & \int_{\mathbb{R}^{p}}N\left\{ t;\frac{\eta}{N^{1/2}}Ds(\theta_{0})t',I(\theta_{0})^{-1}\right\} K\{Ds(\theta_{0})t'\}\,{\rm d}t' \end{align*}when $$c_{\varepsilon}=\infty$$, and $$\eta=O_{\rm p}(1)$$. The limiting distribution here can be viewed as the convolution of the limiting distribution obtained when the optimal coefficients are used and that of a random variable which relates to the error in our estimate of $$\beta_{\varepsilon}$$, which is $$O_{\rm p}(N^{-1/2})$$. 3.3. Acceptance rates when $$\varepsilon$$ is negligible Finally we present results for the acceptance probability of approximate Bayesian computation, the quantity that is central to the computational cost of importance sampling or Markov chain Monte Carlo-based algorithms. We consider a set-up where we propose the parameter value from a location-scale family. That is, we can write the proposal density as the density of a random variable, $$\sigma_{n}X+\mu_{n}$$, where $$X\sim q(\cdot)$$, $$E(X)=0$$, and $$\sigma_n$$ and $$\mu_n$$ are constants that can depend on $$n$$. The average acceptance probability $$p_{{\rm acc},q}$$ would then be \begin{align*} & \int_{\mathcal{P}\times\mathbb{R}^{d}}q_{n}(\theta)f_{n}(s\mid\theta)K\{\varepsilon_{n}^{-1}(s-s_{{\rm obs}})\}\,{\rm d} s\, {\rm d}\theta, \end{align*} where $$q_{n}(\theta)$$ is the density of $$\sigma_{n}X+\mu_{n}$$. This covers the proposal distribution in fundamental sampling algorithms, including the random-walk Metropolis algorithm and importance sampling with a unimodal proposal distribution, and serves as the building block for many advanced algorithms where the proposal distribution is a mixture of distributions from location-scale families, such as iterative importance sampling-type algorithms. We further assume that $$\sigma_{n}^{-1}(\mu_{n}-\theta_{0})=O_{\rm p}(1)$$, which means that $$\theta_{0}$$ is in the coverage of $$q_{n}(\theta)$$. This is a natural requirement for any good proposal distribution. The prior distribution and $$\theta_{0}$$ as a point mass are included in this proposal family. This condition would also apply to many Markov chain Monte Carlo implementations of approximate Bayesian computation after convergence. As above, define $$a_{n,\varepsilon}=a_{n}{\rm 1}\kern-0.24em{\rm I}_{c_{\varepsilon}<\infty}+\varepsilon_{n}^{-1}{\rm 1}\kern-0.24em{\rm I}_{c_{\varepsilon}=\infty}$$ to be the smaller of $$a_{n}$$ and $$\varepsilon_{n}^{-1}$$. Asymptotic results for $$p_{{\rm acc},q}$$ when $$\sigma_{n}$$ has the same rate as $$a_{n,\varepsilon}^{-1}$$ are given in Li & Fearnhead (2018). Here we extend those results to other regimes.
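The average acceptance probability above can be approximated by plain Monte Carlo for any given proposal: draw $$\theta$$ from $$q_{n}$$, draw a summary from the model and average the kernel weights $$K\{\varepsilon_{n}^{-1}(s-s_{{\rm obs}})\}$$. The sketch below does this for a toy model in which the summary is exactly $$N(\theta,1/n)$$, with a Gaussian kernel and a normal proposal; the model, the kernel and the particular choices of $$\varepsilon_{n}$$, $$\mu_{n}$$ and $$\sigma_{n}$$ are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(2)

def acceptance_rate(mu_n, sigma_n, eps_n, n, s_obs, draws=100_000):
    # Monte Carlo estimate of p_acc,q for a N(mu_n, sigma_n^2) proposal, a toy model in
    # which the summary given theta is exactly N(theta, 1/n), and the Gaussian kernel
    # K(x) = exp(-x^2 / 2), whose maximum value is one.
    theta = rng.normal(mu_n, sigma_n, size=draws)
    s = rng.normal(theta, 1.0 / np.sqrt(n))
    return np.exp(-0.5 * ((s - s_obs) / eps_n) ** 2).mean()

n, theta0 = 1_000, 3.0
s_obs = rng.normal(theta0, 1.0 / np.sqrt(n))
eps_n = n ** -0.4                 # bandwidth decaying more slowly than n^{-1/2}
print(acceptance_rate(mu_n=theta0, sigma_n=0.1 * eps_n, eps_n=eps_n, n=n, s_obs=s_obs))

With $$\sigma_{n}$$ a small multiple of $$\varepsilon_{n}$$ and $$\varepsilon_{n}$$ decaying more slowly than $$n^{-1/2}$$, the estimate should be far from zero, consistent with cases (ii) and (iii) of the theorem stated below, whereas taking $$\varepsilon_{n}=o(n^{-1})$$ drives it towards zero.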
Theorem 2. Assume the conditions of Proposition 1. As $$n\rightarrow\infty$$, if $$\varepsilon_{n}=o(a_{n}^{-1/2})$$: (i) if $$c_{\varepsilon}=0$$ or $$\sigma_{n}/a_{n,\varepsilon}^{-1}\rightarrow\infty$$, then $$p_{{\rm acc},q}\rightarrow0$$ in probability; (ii) if $$c_{\varepsilon}\in(0,\infty)$$ and $$\sigma_{n}/a_{n,\varepsilon}^{-1}\rightarrow r_{1}\in[0,\infty)$$, or $$c_{\varepsilon}=\infty$$ and $$\sigma_{n}/a_{n,\varepsilon}^{-1}\rightarrow r_{1}\in(0,\infty)$$, then $$p_{{\rm acc},q}=\Theta_{\rm p}(1)$$; (iii) if $$c_{\varepsilon}=\infty$$ and $$\sigma_{n}/a_{n,\varepsilon}^{-1}\rightarrow0$$, then $$p_{{\rm acc},q}\rightarrow1$$ in probability. The proof of Theorem 2 can be found in the Supplementary Material. The underlying intuition is as follows. For the summary statistic, $$s$$, sampled with parameter value $$\theta$$, the acceptance probability depends on \begin{align} \frac{s-s_{{\rm obs}}}{\varepsilon_{n}} & =\frac{1}{\varepsilon_{n}}[\{s-s(\theta)\}+\{s(\theta)-s(\theta_{0})\}+\{s(\theta_{0})-s_{{\rm obs}}\}], \end{align} (3) where $$s(\theta)$$ is the limit of $$s$$ in Condition 3. The distance between $$s$$ and $$s_{{\rm obs}}$$ is at least $$O_{\rm p}(a_{n}^{-1})$$, since the first and third bracketed terms are $$O_{\rm p}(a_{n}^{-1})$$. If $$\varepsilon_{n}=o(a_{n}^{-1})$$ then, regardless of the value of $$\theta$$, (3) will blow up as $$n\rightarrow\infty$$ and hence $$p_{{\rm acc},q}$$ goes to $$0$$. If $$\varepsilon_{n}$$ decreases more slowly than $$a_{n}^{-1}$$, (3) will go to zero provided we have a proposal which ensures that the middle term is $$o_{\rm p}(\varepsilon_{n})$$, and hence $$p_{{\rm acc},q}$$ goes to unity. Proposition 1 shows that, without the regression adjustment, approximate Bayesian computation requires $$\varepsilon_{n}$$ to be $$o(a_{n}^{-1})$$ if its posterior is to converge to the true posterior given the summary. In this case Theorem 2 shows that the acceptance rate will degenerate to zero as $$n\rightarrow\infty$$ regardless of the choice of $$q(\cdot)$$. On the other hand, with the regression adjustment, we can choose $$\varepsilon_{n}=o(a_{n}^{-3/5})$$ and still have convergence to the true posterior given the summary. For such a choice, if our proposal density satisfies $$\sigma_n=o(\varepsilon_{n})$$, the acceptance rate will go to unity as $$n\rightarrow\infty$$. 4. Numerical example Here we illustrate the gain in computational efficiency from using the regression adjustment on the $$g$$-and-$$k$$ distribution, a popular model for testing approximate Bayesian computation methods (e.g., Fearnhead & Prangle, 2012; Marin et al., 2014). The data are independent and identically distributed from a distribution defined by its quantile function, \begin{align*} F^{-1}(x;\alpha,\beta,\gamma,\kappa)&= \alpha+\beta\left[1+0.8\frac{1-\exp\{-\gamma z(x)\}}{1+\exp\{-\gamma z(x)\}}\right]\{1+z(x)^{2}\}^{\kappa}z(x),\quad x\in[0,1], \end{align*} where $$\alpha$$ and $$\beta$$ are location and scale parameters, $$\gamma$$ and $$\kappa$$ are related to the skewness and kurtosis of the distribution, and $$z(x)$$ is the corresponding quantile of a standard normal distribution. No closed form is available for the density, but simulating from the model is straightforward by transforming realizations from the standard normal distribution.
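Because the $$g$$-and-$$k$$ distribution is specified through its quantile function, a draw can be generated by substituting a standard normal variate for $$z(x)$$ in the expression above. The short sketch below does this; the function and variable names are ours, the parameter values are those used in the experiments that follow, and the quantile levels printed at the end are only one possible choice of evenly spaced quantiles.

import numpy as np

def r_gk(size, alpha, beta, gamma, kappa, rng):
    # Simulate from the g-and-k distribution by transforming standard normal draws:
    # substitute z ~ N(0, 1) for z(x) in the quantile function given in the text.
    z = rng.standard_normal(size)
    skew = 1.0 + 0.8 * (1.0 - np.exp(-gamma * z)) / (1.0 + np.exp(-gamma * z))
    return alpha + beta * skew * (1.0 + z ** 2) ** kappa * z

rng = np.random.default_rng(3)
y = r_gk(10_000, alpha=3.0, beta=1.0, gamma=2.0, kappa=0.5, rng=rng)
print(np.quantile(y, np.linspace(0.05, 0.95, 19)))    # evenly spaced quantiles as summaries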
In the following we assume that the parameter vector $$(\alpha,\beta,\gamma,\kappa)$$ has a uniform prior in $$[0,10]^{4}$$ and multiple datasets are generated from the model with $$(\alpha,\beta,\gamma,\kappa)=(3,1,2,0.5)$$. To illustrate the asymptotic behaviour of approximate Bayesian computation, $$50$$ datasets are generated for each of a set of values of $$n$$ ranging from $$500$$ to $$10\,000$$. Consider estimating the posterior means, denoted by $$\mu=(\mu_{1},\ldots,\mu_{4})$$, and standard deviations, denoted by $$\sigma=(\sigma_{1},\ldots,\sigma_{4})$$, of the parameters. The summary statistic is a set of evenly spaced quantiles of dimension $$19$$. The bandwidth is chosen by fixing the proportion of the Monte Carlo sample to be accepted, and the accepted proportions needed to achieve a certain approximation accuracy for estimates with and without the adjustment are compared. A higher proportion means that more simulated parameter values can be kept for inference. The accuracy is measured by the average relative errors of estimating $$\mu$$ or $$\sigma$$, \begin{align*} \small{\text{RE}}_{\mu}=\frac{1}{4}\sum_{k=1}^{4}\frac{\left|\hat{\mu}_{k}-\mu_{k}\right|}{\mu_{k}},\quad \small{\text{RE}}_{\sigma}=\frac{1}{4}\sum_{k=1}^{4}\frac{\left|\hat{\sigma}_{k}-\sigma_{k}\right|}{\sigma_{k}}, \end{align*} for estimators $$\hat \mu=(\hat \mu_1,\ldots,\hat \mu_4)$$ and $$\hat{\sigma}=(\hat{\sigma}_1,\ldots,\hat{\sigma}_4)$$. The proposal distribution is normal, with the covariance matrix selected to inflate the posterior covariance matrix by a constant factor $$c^{2}$$ and the mean vector selected to differ from the posterior mean by half of the posterior standard deviation, which avoids the posterior mean being estimated trivially. We consider a series of increasing values of $$c$$ to investigate the impact of the proposal distribution becoming worse. Figure 1 shows that the required acceptance rate for the regression-adjusted estimates is higher than that for the unadjusted estimates in almost all cases. For estimating the posterior mean, the improvement is small. For estimating the posterior standard deviations, the improvement is much larger. To achieve each level of accuracy, the acceptance rates of the unadjusted estimates are all close to zero. Those of the regression-adjusted estimates are higher by up to two orders of magnitude, so the Monte Carlo sample size needed to achieve the same accuracy can be reduced correspondingly. Fig. 1. Acceptance rates required for different degrees of accuracy of approximate Bayesian computation and different variances of the proposal distribution, which are proportional to $$c$$. In each panel we show results for standard (grey) and regression-adjusted (black) approximate Bayesian computation with each method using three values of $$n$$: $$n=500$$ (dotted grey/dotted black), $$n=3000$$ (dashed grey/dashed black) and $$n=10000$$ (solid grey/solid black). When dotted or dashed lines overlap with solid lines, only solid lines appear. The averages over $$50$$ datasets (thick) and their $$95\%$$ confidence intervals (thin) are reported. Results are for a relative error of $$0.08$$ and $$0.05$$ in the posterior mean, in (a) and (b) respectively, and for a relative error of $$0.2$$ and $$0.1$$ in the posterior standard deviation, in (c) and (d) respectively.
5. Discussion

One way to implement approximate Bayesian computation so that the acceptance probability tends to unity as $$n$$ increases is to use importance sampling with a suitable proposal from a location-scale family. The key difficulty in finding a suitable proposal is to ensure that the location parameter is close to the true parameter, where close means that the distance is $$O(\varepsilon_n)$$. This can be achieved by performing a preliminary analysis of the data and using the resulting point estimate of the parameter as the location parameter (Beaumont et al., 2009; Li & Fearnhead, 2018).

Acknowledgement

This work was funded by the U.K. Engineering and Physical Sciences Research Council, under the i-like programme grant.

Supplementary material

Supplementary material available at Biometrika online includes proofs of the lemmas and Theorem 2.

Appendix

Proof of the result from § 3.1

Throughout, the data are considered to be random. For any integer $$l>0$$ and a set $$A\subset\mathbb{R}^{l}$$, we use the convention that $$cA+x$$ denotes the set $$\{ct+x:t\in A\}$$ for $$c\in\mathbb{R}$$ and $$x\in\mathbb{R}^{l}$$. For a nonnegative function $$h(x)$$, integrable in $$\mathbb{R}^{l}$$, denote the normalized function $$h(x)/\int_{\mathbb{R}^{l}}h(x)\,{\rm d}x$$ by $$h(x)^{({\rm norm})}$$. For a vector $$x$$, denote a general polynomial of elements of $$x$$ with degree up to $$l$$ by $$P_{l}(x)$$. For any fixed $$\delta<\delta_{0}$$, let $$B_{\delta}$$ be the neighbourhood $$\{\theta:\|\theta-\theta_{0}\|<\delta\}$$, and let $$\pi_{\delta}(\theta)$$ be $$\pi(\theta)$$ truncated to $$B_{\delta}$$. Recall that $$\tilde f_{n}(s\mid\theta)$$ denotes the normal density with mean $$s(\theta)$$ and covariance matrix $$A(\theta)/a_{n}^{2}$$. Let $$t(\theta)=a_{n,\varepsilon}(\theta-\theta_{0})$$ and $$v(s)=\varepsilon_{n}^{-1}(s-s_{{\rm obs}})$$ be rescaled versions of $$\theta$$ and $$s$$. For any $$A\in\mathscr{B}^{p}$$, let $$t(A)$$ be the set $$\{\phi:\phi=t(\theta)\text{ for some }\theta\in A\}$$. Define $$\tilde \Pi_{\varepsilon}(\theta\in A\mid s_{{\rm obs}})$$ to be the normal counterpart of $$\Pi_{\varepsilon}(\theta\in A\mid s_{{\rm obs}})$$ with truncated prior, obtained by replacing $$\pi(\theta)$$ and $$f_{n}(s\mid\theta)$$ in $$\Pi_{\varepsilon}$$ by $$\pi_{\delta}(\theta)$$ and $$\tilde{f}_{n}(s\mid\theta)$$.
So let
\[
\tilde \pi_{\varepsilon}(\theta,s\mid s_{{\rm obs}})=\frac{\pi_{\delta}(\theta)\tilde f_{n}(s\mid \theta)K\{\varepsilon_{n}^{-1}(s-s_{{\rm obs}})\}}{\int_{B_{\delta}}\int_{\mathbb{R}^{d}}\pi_{\delta}(\theta)\tilde f_{n}(s\mid \theta)K\{\varepsilon_{n}^{-1}(s-s_{{\rm obs}})\}\,{\rm d}\theta\, {\rm d}s}
\]
and $$\tilde \pi_{\varepsilon}(\theta\mid s_{{\rm obs}})=\int_{\mathbb{R}^{d}}\tilde \pi_{\varepsilon}(\theta,s\mid s_{{\rm obs}})\,{\rm d}s$$, and let $$\tilde \Pi_{\varepsilon}(\theta\in A\mid s_{{\rm obs}})$$ be the distribution function with density $$\tilde \pi_{\varepsilon}(\theta\mid s_{{\rm obs}})$$. Denote the mean of $$\tilde \Pi_{\varepsilon}$$ by $$\tilde \theta_{\varepsilon}$$. Let $$W_{{\rm obs}}=a_{n}A(\theta_{0})^{-1/2}\{s_{{\rm obs}}-s(\theta_{0})\}$$ and $$\beta_{0}=I(\theta_{0})^{-1}Ds(\theta_{0})^{{ \mathrm{\scriptscriptstyle T} }}A(\theta_{0})^{-1}$$. By Condition 3, $$W_{{\rm obs}}\rightarrow Z$$ in distribution as $$n\rightarrow\infty$$, where $$Z\sim N(0,I_{d})$$.

Since the approximate Bayesian computation likelihood within $$\tilde \Pi_{\varepsilon}$$ is an incorrect model for $$s_{{\rm obs}}$$, standard posterior convergence results do not apply. However, if we condition on the value of the summary, $$s$$, then the distribution of $$\theta$$ is just the true posterior given $$s$$. Thus we can express the posterior from approximate Bayesian computation as a continuous mixture of these true posteriors. Let $$\tilde \pi_{\varepsilon,tv}(t,v)=a_{n,\varepsilon}^{-d}\pi_{\delta}(\theta_{0}+a_{n,\varepsilon}^{-1}t)\tilde f_{n}(s_{{\rm obs}}+\varepsilon_{n}v \mid \theta_{0}+a_{n,\varepsilon}^{-1}t)K(v)$$. For any $$A\in\mathscr{B}^{p}$$, we rewrite $$\tilde \Pi_{\varepsilon}$$ as
\begin{equation}
\tilde \Pi_{\varepsilon}(\theta\in A\mid s_{{\rm obs}})=\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\tilde \Pi(\theta\in A\mid s_{{\rm obs}}+\varepsilon_{n}v)\tilde \pi_{\varepsilon,tv}(t,v)^{({\rm norm})}\,{\rm d}t\,{\rm d}v,
\end{equation}
(A1)
where $$\tilde \Pi(\theta\in A\mid s)$$ is the posterior distribution with prior $$\pi_{\delta}(\theta)$$ and likelihood $$\tilde{f}_{n}(s\mid\theta)$$. Using results from Kleijn & van der Vaart (2012), the leading term of $$\tilde \Pi(\theta\in A\mid s_{{\rm obs}}+\varepsilon_{n}v)$$ can be obtained and is stated in the following lemma.

Lemma A1. Assume Conditions 3 and 4. If $$\varepsilon_{n}=O(a_{n}^{-1})$$, for any fixed $$v\in\mathbb{R}^{d}$$ and small enough $$\delta$$,
\[
\sup_{A\in\mathscr{B}^{p}}\left|\tilde \Pi\{a_{n}(\theta-\theta_{0})\in A\mid s_{{\rm obs}}+\varepsilon_{n}v\}-\int_{A}N[t;\beta_{0}\{A(\theta_{0})^{1/2}W_{{\rm obs}}+c_{\varepsilon}v\},I(\theta_{0})^{-1}]\,{\rm d}t\right|\rightarrow0
\]
in probability as $$n\rightarrow\infty$$.

This leading term is Gaussian, but with a mean that depends on $$v$$. Thus, asymptotically, the posterior of approximate Bayesian computation is the distribution of the sum of a Gaussian random variable and $$\beta_0c_{\varepsilon} V$$, where $$V$$ has density proportional to $$\int \tilde \pi_{\varepsilon,tv}(t,v)\,{\rm d}t$$. To make this argument rigorous, and to find the distribution of this sum of random variables, we need to introduce several functions that relate to the limit of $$\tilde \pi_{\varepsilon,tv}(t',v)$$.
For a $$d\times p$$ matrix $$A$$ of rank $$p$$, a $$d\times d$$ matrix $$B$$ of rank $$d$$ and a $$d$$-dimensional vector $$c$$, define $$g(v;A,B,c)=\exp[-(c+Bv)^{{ \mathrm{\scriptscriptstyle T} }}\{I-A(A^{{ \mathrm{\scriptscriptstyle T} }}A)^{-1}A^{{ \mathrm{\scriptscriptstyle T} }}\}(c+Bv)/2]/(2\pi)^{(d-p)/2}$$. Let
\[
g_{n}(t,v)=\begin{cases} N\left\{ Ds(\theta_{0})t;a_{n}\varepsilon_{n}v+A(\theta_{0})^{1/2}W_{{\rm obs}},A(\theta_{0})\right\} K(v), & \ c_{\varepsilon}<\infty,\\[4pt] N\left\{ Ds(\theta_{0})t;v+\frac{1}{a_{n}\varepsilon_{n}}A(\theta_{0})^{1/2}W_{{\rm obs}},\frac{1}{a_{n}^{2}\varepsilon_{n}^{2}}A(\theta_{0})\right\} K(v), & \ c_{\varepsilon}=\infty, \end{cases}
\]
let $$G_{n}(v)$$ be $$g\{v;A(\theta_{0})^{-1/2}Ds(\theta_{0}),a_{n}\varepsilon_{n}A(\theta_{0})^{-1/2},W_{{\rm obs}}\}K(v)$$, and let $$E_{G_{n}}(\cdot)$$ be the expectation under the density $$G_{n}(v)^{({\rm norm})}$$. In both cases it is straightforward to show that $$\int_{\mathbb{R}^{p}}g_{n}(t,v)\,{\rm d}t=|A(\theta_{0})|^{-1/2}G_{n}(v)$$. Additionally, for the case $$c_{\varepsilon}=\infty$$, define $$v'(v,t)=A(\theta_{0})^{1/2}W_{{\rm obs}}+a_{n}\varepsilon_{n}v-a_{n}\varepsilon_{n}Ds(\theta_{0})t$$ and
\begin{align*}
g'_{n}(t,v')=N\{v';0,A(\theta_{0})\}K\left\{ Ds(\theta_{0})t+\frac{1}{a_{n}\varepsilon_{n}}v'-\frac{1}{a_{n}\varepsilon_{n}}A(\theta_{0})^{1/2}W_{{\rm obs}}\right\} \text{.}
\end{align*}
Then, with the transformation $$v'=v'(v,t)$$, $$g_{n}(t,v)\,{\rm d}v=g_{n}'(t,v')\,{\rm d}v'$$. Let
\begin{align*}
g(t,v) & =\begin{cases} N\{Ds(\theta_{0})t;c_{\varepsilon}v+A(\theta_{0})^{1/2}Z,A(\theta_{0})\}K(v), & \ c_{\varepsilon}<\infty,\\ K\{Ds(\theta_{0})t\}N\{v;0,A(\theta_{0})\}, & \ c_{\varepsilon}=\infty, \end{cases}
\end{align*}
let $$G(v)$$ be $$g\{v;A(\theta_{0})^{-1/2}Ds(\theta_{0}),c_{\varepsilon}A(\theta_{0})^{-1/2},Z\}K(v)$$ and let $$E_{G}(\cdot)$$ be the expectation under the density $$G(v)^{({\rm norm})}$$. When $$c_{\varepsilon}<\infty$$, $$\int_{\mathbb{R}^{p}}g(t,v)\,{\rm d}t=|A(\theta_{0})|^{-1/2}G(v)$$. Expansions of $$\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\tilde \pi_{\varepsilon,tv}(t,v)\,{\rm d}t\,{\rm d}v$$ are given in the following lemma.

Lemma A2. Assume Conditions 1–3. If $$\varepsilon_{n}=o(a_{n}^{-1/2})$$, then $$\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\left|\tilde \pi_{\varepsilon,tv}(t,v)-\pi(\theta_{0})g_{n}(t,v)\right|\,{\rm d}t\,{\rm d}v\rightarrow0$$ in probability and $$\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}g_{n}(t,v)\,{\rm d}t\,{\rm d}v=\Theta_{\rm p}(1)$$ as $$n\rightarrow\infty$$. Furthermore, for $$l\leqslant6$$, $$\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}P_{l}(v)g_{n}(t,v)\,{\rm d}t\,{\rm d}v$$ converges to $$|A(\theta_{0})|^{-1/2}\int_{\mathbb{R}^{d}}P_{l}(v)G(v)\,{\rm d}v$$ in distribution when $$c_{\varepsilon}<\infty$$ and converges to $$\int_{\mathbb{R}^{p}}P_{l}\{Ds(\theta_{0})t\}K\{Ds(\theta_{0})t\}\,{\rm d}t$$ in probability when $$c_{\varepsilon}=\infty$$, as $$n\rightarrow\infty$$.

The following lemma states that $$\Pi_{\varepsilon}$$ and $$\tilde \Pi_{\varepsilon}$$ are asymptotically the same and gives an expansion of $$\tilde \theta_{\varepsilon}$$.

Lemma A3. Assume Conditions 1–4.
If $$\varepsilon_{n}=o(a_{n}^{-1/2})$$, then: (a) for any $$\delta<\delta_{0}$$, $$\Pi_{\varepsilon}(\theta\in B_{\delta}^{\rm c}\mid s_{{\rm obs}})$$ and $$\tilde \Pi_{\varepsilon}(\theta\in B_{\delta}^{\rm c}\mid s_{{\rm obs}})$$ are $$o_{\rm p}(1)$$; (b) there exists a $$\delta<\delta_0$$ such that $$\sup_{A\in\mathscr{B}^{p}}|\Pi_{\varepsilon}(\theta\in A\cap B_{\delta}\mid s_{{\rm obs}})-\tilde \Pi_{\varepsilon}(\theta\in A\cap B_{\delta}\mid s_{{\rm obs}})|=o_{\rm p}(1)$$; (c) if, in addition, Condition 5 holds, then $$a_{n,\varepsilon}(\theta_{\varepsilon}-\tilde \theta_{\varepsilon})=o_{\rm p}(1)$$, and $$\tilde \theta_{\varepsilon}=\theta_{0}+a_{n}^{-1}\beta_{0}A(\theta_{0})^{1/2}W_{{\rm obs}}+\varepsilon_{n}\beta_{0}E_{G_{n}}(v)+r_{n,1}$$, where the remainder $$r_{n,1}=o_{\rm p}(a_{n}^{-1})$$.

Proof of Proposition 1. Lemma 5 in the Supplementary Material shows that $$\Pi_{\varepsilon}\{a_{n,\varepsilon}(\theta-\theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}$$ and $$\tilde \Pi_{\varepsilon}\{a_{n,\varepsilon}(\theta-\tilde \theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}$$ have the same limit, in distribution when $$c_{\varepsilon}\in(0,\infty)$$ and in total variation when $$c_{\varepsilon}=0$$ or $$\infty$$. Therefore it is sufficient to consider only the convergence of $$\tilde \Pi_{\varepsilon}$$ for the properly scaled and centred $$\theta$$.

When $$a_{n}\varepsilon_{n}\rightarrow c_{\varepsilon}<\infty$$, according to (A1), $$\tilde \Pi_{\varepsilon}\{a_{n}(\theta-\tilde \theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}$$ equals
\[
\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\tilde \Pi\{a_{n}(\theta-\theta_{0})\in A+a_{n}(\tilde \theta_{\varepsilon}-\theta_{0})\mid s_{{\rm obs}}+\varepsilon_{n}v\}\tilde \pi_{\varepsilon,tv}(t',v)^{({\rm norm})}\:{\rm d}t'{\rm d}v\text{.}
\]
By Lemmas A1 and A3(c), we have
\begin{align*}
\sup_{A\in\mathscr{B}^{p}}\left|\tilde \Pi\{a_{n}(\theta-\theta_{0})\in A+a_{n}(\tilde \theta_{\varepsilon}-\theta_{0})\mid s_{{\rm obs}}+\varepsilon_{n}v\}-\int_{A}N\{t;\mu_{n}(v),I(\theta_{0})^{-1}\}\,{\rm d}t\right|=o_{\rm p}(1),
\end{align*}
where $$\mu_{n}(v)=\beta_{0}\{c_{\varepsilon}v-a_{n}\varepsilon_{n}E_{G_{n}}(v)\}-a_{n}r_{n,1}$$. Then, applying Lemma A2,
\begin{align}
& \sup_{A\in\mathscr{B}^{p}}\left|\tilde \Pi_{\varepsilon}\{a_{n}(\theta-\tilde \theta_{\varepsilon}) \in A \mid s_{{\rm obs}}\}-\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\int_{A}N\{t;\mu_{n}(v),I(\theta_{0})^{-1}\}g_{n}(t',v)^{({\rm norm})}\,{\rm d}t\,{\rm d}t'{\rm d}v\right|=o_{\rm p}(1)\text{.}
\end{align}
(A2)
The numerator of the leading term in (A2) is of the form
\begin{align*}
\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\int_{A}N\{t;c_{\varepsilon}\beta_{0}v+x_{3},I(\theta_{0})^{-1}\}N\{Ds(\theta_{0})t';x_{1}v+x_{2},A(\theta_{0})\}K(v)\,{\rm d}t\,{\rm d}t'{\rm d}v,
\end{align*}
where $$x_{1}\in\mathbb{R}$$, $$x_{2}\in\mathbb{R}^{d}$$ and $$x_{3}\in\mathbb{R}^{p}$$. This is continuous by Lemma 4 in the Supplementary Material. Then since $$E_{G_{n}}(v)\rightarrow E_{G}(v)$$ in distribution as $$n\rightarrow\infty$$ by Lemma A2, we have \begin{eqnarray*} \int_{\mathbb{R}^{d}}\! \int_{t(B_{\delta})}\! \int_{A} \! N\{t;\mu_{n}(v),I(\theta_{0})^{-1}\}g_{n}(t',v)\,{\rm d}t\,{\rm d}t'{\rm d}v \rightarrow\int_{\mathbb{R}^{d}}\! \int_{\mathbb{R}^{p}}\! \int_{A}\!
N[t;c_{\varepsilon}\beta_{0}\{v-E_{G}(v)\},I(\theta_{0})^{-1}]g(t',v)\,{\rm d}t\,{\rm d}t'{\rm d}v \end{eqnarray*} in distribution as $$n\rightarrow\infty$$. Putting the above results together, it follows that \[ \tilde \Pi_{\varepsilon}\{a_{n}(\theta-\tilde \theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}\rightarrow\int_{A}\int_{\mathbb{R}^{p}}N[t;c_{\varepsilon}\beta_{0}\{v-E_{G}(v)\},I(\theta_{0})^{-1}]G(v)^{({\rm norm})}\,{\rm d}v\,{\rm d}t \] in distribution, and statement (ii) of the proposition holds. When $$c_{\varepsilon}=0$$, since $$\mu_{n}(v)$$ does not depend on $$v$$, (A2) becomes \[ \sup_{A\in\mathscr{B}^{p}}\left|\tilde \Pi_{\varepsilon}\{a_{n}(\theta-\tilde \theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}-\int_{A}N\{t;-a_{n}\varepsilon_{n}\beta_{0}E_{G_{n}}(v)-a_{n}r_{n,1},I(\theta_{0})^{-1}\}\,{\rm d}t\right|=o_{\rm p}(1), \] and by the continuous mapping theorem (van der Vaart, 2000), \begin{align*} & \int_{\mathbb{R}^{p}}\left|N\{t;-a_{n}\varepsilon_{n}\beta_{0}E_{G_{n}}(v)-a_{n}r_{n,1},I(\theta_{0})^{-1}\}-N\{t;0,I(\theta_{0})^{-1}\}\right|\,{\rm d}t=o_{\rm p}(1)\text{.} \end{align*} Therefore $$\sup_{A\in\mathscr{B}^{p}}|\tilde \Pi_{\varepsilon}\{a_{n}(\theta-\tilde \theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}-\int_{A}N\{t;0,I(\theta_{0})^{-1}\}\,{\rm d}t|=o_{\rm p}(1)$$, and statement (i) of the proposition holds. When $$c_{\varepsilon}=\infty$$, Lemma A1 cannot be applied to the posterior distribution within (A1) directly. With the transformation $$v''=a_{n}\varepsilon_{n}v$$, $$\tilde \Pi_{\varepsilon}\{\theta\in A\mid s_{{\rm obs}}\}$$ equals \[ \int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\tilde \Pi(\theta\in A\mid s_{{\rm obs}}+a_{n}^{-1}v'')\tilde \pi_{\varepsilon,tv}(t',a_{n}^{-1}\varepsilon_{n}^{-1}v'')^{({\rm norm})}\,{\rm d}t'{\rm d}v'', \] which implies that $$\tilde \Pi_{\varepsilon}\{\varepsilon_{n}^{-1}(\theta-\tilde \theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}$$ equals \begin{align*} & \int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\tilde \Pi\{a_{n}(\theta-\theta_{0})\in a_{n}\varepsilon_{n}A+a_{n}(\tilde \theta_{\varepsilon}-\theta_{0})\mid s_{{\rm obs}}+a_{n}^{-1}v''\}\tilde \pi_{\varepsilon,tv}(t',a_{n}^{-1}\varepsilon_{n}^{-1}v'')^{({\rm norm})}\:{\rm d}t'{\rm d}v''\text{.} \end{align*} Then Lemma A1 can be applied. Using Lemma A2 and transforming $$v''$$ back to $$v$$, we have \begin{align*} & \sup_{A\in\mathscr{B}^{p}}\left|\vphantom{\int_{\mathbb{R}^{d}}}\tilde \Pi_{\varepsilon}\{\varepsilon_{n}^{-1}(\theta-\tilde \theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}\right.\nonumber \\ &\quad \left.-\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\int_{A}(a_{n}\varepsilon_{n})^{p}N\{a_{n}\varepsilon_{n}t;\mu_{n}'(v),I(\theta_{0})^{-1}\}g_{n}(t',v)^{({\rm norm})}\:{\rm d}t\,{\rm d}t'{\rm d}v\right|=o_{\rm p}(1), \end{align*} where $$\mu_{n}'(v)=\beta_{0}\{A(\theta_{0})^{1/2}W_{{\rm obs}}+a_{n}\varepsilon_{n}v\}-a_{n}(\tilde \theta_{\varepsilon}-\theta_{0})$$. Let $$t''(t,t')=a_{n}\varepsilon_{n}(t-t')+a_{n}(\tilde \theta_{\varepsilon}-\theta_{0})$$ and let $$t''(A,B_{\delta})$$ be the set $$\{t''(t,t'):t\in A,t'\in t(B_{\delta})\}$$. 
With the transformations $$v'=v'(v,t')$$ and $$t''=t''(t,t')$$, since $$\beta_{0}Ds(\theta_{0})=I_{p}$$, we have
\begin{align*}
& \int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\int_{A}(a_{n}\varepsilon_{n})^{p}N\{a_{n}\varepsilon_{n}t;\mu_{n}'(v),I(\theta_{0})^{-1}\}g_{n}(t',v)\,{\rm d}t\,{\rm d}t'{\rm d}v\\
&\quad =\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\int_{A}(a_{n}\varepsilon_{n})^{p}N\{a_{n}\varepsilon_{n}(t-t');\beta_{0}v'-a_{n}(\tilde \theta_{\varepsilon}-\theta_{0}),I(\theta_{0})^{-1}\}g_{n}'(t',v')\,{\rm d}t\,{\rm d}t'{\rm d}v'\\
&\quad =\int_{\mathbb{R}^{d}}\int_{t''(A,B_{\delta})}\int_{A}N\{t'';\beta_{0}v'-a_{n}(\tilde \theta_{\varepsilon}-\theta_{0}),I(\theta_{0})^{-1}\}g'_{n}\left[t-\frac{1}{a_{n}\varepsilon_{n}}\{t''-a_{n}(\tilde \theta_{\varepsilon}-\theta_{0})\},v'\right]\,{\rm d}t\,{\rm d}t''{\rm d}v'\text{.}
\end{align*}
The idea now is that as $$n\rightarrow \infty$$, $$a_n\varepsilon_n\rightarrow\infty$$, so the $$g'_n$$ term in the integral will tend to $$g'_n(t,v')$$. Then, by integrating first with respect to $$t''$$ and then with respect to $$v'$$, we get the required result. To make this argument rigorous, consider the following function:
\begin{align*}
& \int_{\mathbb{R}^{d}}\int_{t''(A,B_{\delta})}\int_{A}N\{t'';\beta_{0}v',I(\theta_{0})^{-1}\}N\{v';0,A(\theta_{0})\}\\
&\qquad \times\left|K\{Ds(\theta_{0})t+x_{1}v'-x_{2}t''+x_{3}\}-K\{Ds(\theta_{0})t\}\right|\,{\rm d}t\,{\rm d}t''{\rm d}v',
\end{align*}
where $$x_{1}\in\mathbb{R}$$, $$x_{2}\in\mathbb{R}$$ and $$x_{3}\in\mathbb{R}^{d}$$. This is continuous by Lemma 4 in the Supplementary Material, so by the continuous mapping theorem,
\begin{align*}
\sup_{A\in\mathscr{B}^{p}}\left|\int_{\mathbb{R}^{d}}\int_{t''(A,B_{\delta})}\int_{A}N\{t'';\beta_{0}v',I(\theta_{0})^{-1}\}g'_{n}\left(t-\frac{1}{a_{n}\varepsilon_{n}}t'',v'\right)\,{\rm d}t\,{\rm d}t''{\rm d}v'-\int_{A}K\{Ds(\theta_{0})t\}\,{\rm d}t\right| & =o_{\rm p}(1)\text{.}
\end{align*}
Then, using Lemma A2,
\[
\sup_{A\in\mathscr{B}^{p}}\left|\tilde \Pi_{\varepsilon}\{\varepsilon_{n}^{-1}(\theta-\tilde \theta_{\varepsilon})\in A\mid s_{{\rm obs}}\}-\int_{A}K\{Ds(\theta_{0})t\}\, {\rm d}t/\int_{\mathbb{R}^{p}}K\{Ds(\theta_{0})t\}\,{\rm d}t\right|=o_{\rm p}(1)\text{.}
\]
Therefore statement (iii) of the proposition holds. □

Proof of the result from § 3.2

The intuition behind this result is that, as shown above, the joint posterior of $$(\theta,s)$$ under approximate Bayesian computation can be viewed as a marginal distribution for $$s$$ times a conditional for $$\theta$$ given $$s$$. The latter is just the true posterior for $$\theta$$ given $$s$$, and this posterior converges to a Gaussian limit that depends on $$s$$ only through its mean. Regression adjustment works because it corrects for the dependence of this mean on $$s$$: if we work with the regression-adjusted parameter $$\theta^*$$, then the conditional distribution of $$\theta^*$$ given $$s$$ will tend to a Gaussian limit whose mean is the same for all $$s$$.

The following lemma gives an expansion of the optimal linear coefficient matrix $$\beta_{\varepsilon}$$. The order of the remainder leads to successful removal of the first-order bias in the posterior distribution of approximate Bayesian computation.

Lemma A4. Assume Conditions 1–6. Then if $$\varepsilon_{n}=o(a_{n}^{-3/5})$$, $$a_{n}\varepsilon_{n}(\beta_{\varepsilon}-\beta_{0})=o_{\rm p}(1)$$.
The following lemma, similar to Lemma A3, states that the approximate Bayesian computation posterior distribution of $$\theta^{*}$$ is asymptotically the same as $$\tilde \Pi_{\varepsilon}$$. Recall that $$\theta_{\varepsilon}^{*}$$ is the mean of $$\pi_{\varepsilon}^{*}(\theta^{*}\mid s_{{\rm obs}})$$. Let $$\tilde \pi_{\varepsilon}^{*}(\theta^{*}\mid s_{{\rm obs}})=\int_{\mathbb{R}^{d}}\tilde \pi_{\varepsilon}\{\theta^{*}+\beta_{\varepsilon}(s-s_{{\rm obs}}),s\mid s_{{\rm obs}}\}\,{\rm d}s$$ and let $$\tilde \theta_{\varepsilon}^{*}$$ be the mean of $$\tilde \pi_{\varepsilon}^{*}(\theta^{*}\mid s_{{\rm obs}})$$.

Lemma A5. Assume Conditions 1–6. If $$\varepsilon_{n}=o(a_{n}^{-3/5})$$, then: (a) for any $$\delta<\delta_{0}$$, $$\Pi_{\varepsilon}(\theta^{*}\in B_{\delta}^{\rm c}\mid s_{{\rm obs}})$$ and $$\tilde \Pi_{\varepsilon}(\theta^{*}\in B_{\delta}^{\rm c}\mid s_{{\rm obs}})$$ are $$o_{\rm p}(1)$$; (b) there exists $$\delta<\delta_0$$ such that $$\sup_{A\in\mathscr{B}^{p}}|\Pi_{\varepsilon}(\theta^{*}\in A\cap B_{\delta}\mid s_{{\rm obs}})-\tilde \Pi_{\varepsilon}(\theta^{*}\in A\cap B_{\delta}\mid s_{{\rm obs}})|=o_{\rm p}(1)$$; (c) $$a_{n}(\theta_{\varepsilon}^{*}-\tilde \theta_{\varepsilon}^{*})=o_{\rm p}(1)$$, and $$\tilde \theta_{\varepsilon}^{*}=\theta_{0}+a_{n}^{-1}\beta_{0}A(\theta_{0})^{1/2}W_{{\rm obs}}+\varepsilon_{n}(\beta_{0}-\beta_{\varepsilon})E_{G_{n}}(v)+r_{n,2}$$, where the remainder $$r_{n,2}=o_{\rm p}(a_{n}^{-1})$$.

Proof of Theorem 2. Similar to the proof of Proposition 1, it is sufficient to consider only the convergence of $$\tilde \Pi_{\varepsilon}$$ for the properly scaled and centred $$\theta^*$$. Similar to (A1),
\begin{align*}
\tilde \Pi_{\varepsilon}(\theta^{*}\in A\mid s_{{\rm obs}}) & =\begin{cases} \int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\tilde \Pi(\theta\in A+\varepsilon_{n}\beta_{\varepsilon}v\mid s_{{\rm obs}}+\varepsilon_{n}v)\tilde \pi_{\varepsilon,tv}(t',v)^{({\rm norm})}\,{\rm d}t'{\rm d}v, & \ c_{\varepsilon}<\infty,\\ \int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\tilde \Pi(\theta\in A+\varepsilon_{n}\beta_{\varepsilon}v\mid s_{{\rm obs}}+a_{n}^{-1}v)\tilde \pi_{\varepsilon,tv}(t',a_{n}^{-1}\varepsilon_{n}^{-1}v)^{({\rm norm})}\,{\rm d}t'{\rm d}v, & \ c_{\varepsilon}=\infty\text{.} \end{cases}
\end{align*}
Similar to (A2), by Lemmas A1 and A5(c), we have
\begin{align}
& \sup_{A\in\mathscr{B}^{p}}\left|\tilde \Pi_{\varepsilon}\{a_{n}(\theta^{*}-\tilde \theta_{\varepsilon}^{*})\in A\mid s_{{\rm obs}}\}-\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\int_{A}N\{t;\mu_{n}^{*}(v),I(\theta_{0})^{-1}\}g_{n}(t',v)^{({\rm norm})}\,{\rm d}t\,{\rm d}t'{\rm d}v\right|=o_{\rm p}(1),
\end{align}
(A3)
where $$\mu_{n}^{*}(v)=\{a_{n}\varepsilon_{n}(\beta_{0}-\beta_{\varepsilon})+(c_{\varepsilon}-a_{n}\varepsilon_{n})\beta_{0}{\rm 1}\kern-0.24em{\rm I}_{c_{\varepsilon}<\infty}\}\{v-E_{G_{n}}(v)\}+a_{n}r_{n,2}$$. Since $$a_{n}\varepsilon_{n}(\beta_{\varepsilon}-\beta_{0})=o_{\rm p}(1)$$, by Lemma 4 in the Supplementary Material and the continuous mapping theorem,
\[
\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\int_{\mathbb{R}^{p}}\left|N\{t;\mu_{n}^{*}(v),I(\theta_{0})^{-1}\}-N\{t;0,I(\theta_{0})^{-1}\}\right|g_{n}(t',v)\,{\rm d}t\,{\rm d}t'{\rm d}v=o_{\rm p}(1)\text{.}
\]
Then we have
\begin{align*}
& \sup_{A\in\mathscr{B}^{p}}\left|\tilde \Pi_{\varepsilon}\{a_{n}(\theta^{*}-\tilde \theta_{\varepsilon}^{*})\in A\mid s_{{\rm obs}}\}-\int_{A}N\{t;0,I(\theta_{0})^{-1}\}\,{\rm d}t\right|=o_{\rm p}(1),
\end{align*}
and the first convergence in the theorem holds.
The second convergence in the theorem holds by Lemma A5(c). Since the above arguments require only that any substitute $$\beta_{\varepsilon}^{'}$$ for $$\beta_{\varepsilon}$$ satisfies $$a_{n}\varepsilon_{n}(\beta_{\varepsilon}^{'}-\beta_{\varepsilon})=o_{\rm p}(1)$$, they continue to hold if $$\beta_{\varepsilon}$$ is replaced by a $$p\times d$$ matrix $$\hat \beta _{\varepsilon}$$ satisfying $$a_{n}\varepsilon_{n}(\hat \beta _{\varepsilon}-\beta_{\varepsilon})=o_{\rm p}(1)$$. □

Proof of Proposition 3. Consider $$\theta^{*}$$ with $$\beta_{\varepsilon}$$ replaced by $$\hat \beta _{\varepsilon}$$. Let $$\eta_{n}=N^{1/2}a_{n}\varepsilon_{n}(\hat \beta _{\varepsilon}-\beta_{\varepsilon})$$. Since $$\hat \beta _{\varepsilon}-\beta_{\varepsilon}=O_{\rm p}\{(a_{n}\varepsilon_{n})^{-1}N^{-1/2}\}$$ as $$n\rightarrow\infty$$, $$\eta_{n}=O_{\rm p}(1)$$ as $$n\rightarrow\infty$$; let its limit be $$\eta$$. In this case, if we replace $$\mu_{n}^{*}(v)$$ in (A3) with $$\mu_{n}^{*}(v)+N^{-1/2}\eta_{n}\{v-E_{G_{n}}(v)\}$$, denoted by $$\hat \mu_{n}^{*}(v)$$, the equation still holds. Denote this equation by (A4). Limits of the leading term in (A4) can be obtained by arguments similar to those for (A2).

When $$c_{\varepsilon}<\infty$$, since for fixed $$v$$, $$\hat \mu_{n}^{*}(v)$$ converges to $$N^{-1/2}\eta\{v-E_{G}(v)\}$$ in distribution, by following the same line of argument we have
\begin{align*}
\tilde \Pi_{\varepsilon}\{a_{n}(\theta^{*}-\tilde \theta_{\varepsilon}^{*})\in A\mid s_{{\rm obs}}\} & \rightarrow\int_{A}\int_{\mathbb{R}^{p}}N[t;N^{-1/2}\eta\{v-E_{G}(v)\},I(\theta_{0})^{-1}]G(v)^{({\rm norm})}\,{\rm d}v\,{\rm d}t
\end{align*}
in distribution as $$n\rightarrow\infty$$.

When $$c_{\varepsilon}=\infty$$, by Lemma A2 we have $$E_{G_{n}}(v)\rightarrow0$$ in probability, and $$\int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}g_{n}(t',v)\,{\rm d}t'{\rm d}v\rightarrow\int_{\mathbb{R}^{p}}K\{Ds(\theta_{0})t\}\,{\rm d}t$$ in probability. Then with the transformation $$v'=v'(v,t')$$, for fixed $$v$$,
\begin{align*}
\hat \mu_{n}^{*}(v) & =\{a_{n}\varepsilon_{n}(\beta_{0}-\beta_{\varepsilon})+N^{-1/2}\eta_{n}\}\left\{ Ds(\theta_{0})t'+\frac{1}{a_{n}\varepsilon_{n}}v'-\frac{1}{a_{n}\varepsilon_{n}}A(\theta_{0})^{1/2}W_{{\rm obs}}-E_{G_{n}}(v)\right\} \\
& \rightarrow N^{-1/2}\eta Ds(\theta_{0})t'
\end{align*}
in distribution as $$n\rightarrow\infty$$. Recall that $$g_{n}(t',v)\,{\rm d}v=g_{n}'(t',v')\,{\rm d}v'$$. Then by Lemma 4 in the Supplementary Material and the continuous mapping theorem,
\begin{align*}
& \int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\int_{A}N\{t;\hat \mu_{n}^{*}(v),I(\theta_{0})^{-1}\}g_{n}'(t',v')\,{\rm d}t\,{\rm d}t'{\rm d}v\\
&\quad \rightarrow \int_{\mathbb{R}^{d}}\int_{t(B_{\delta})}\int_{A}N\{t;N^{-1/2}\eta Ds(\theta_{0})t',I(\theta_{0})^{-1}\}K\{Ds(\theta_{0})t'\}\,{\rm d}t\,{\rm d}t'{\rm d}v
\end{align*}
in distribution as $$n\rightarrow\infty$$. Therefore, by (A4) and the above convergence results,
\begin{align*}
\tilde \Pi_{\varepsilon}\{a_{n}(\theta^{*}-\tilde \theta_{\varepsilon}^{*})\in A\mid s_{{\rm obs}}\} & \rightarrow\int_{A}\int_{\mathbb{R}^{p}}N\{t;N^{-1/2}\eta Ds(\theta_{0})t',I(\theta_{0})^{-1}\}K\{Ds(\theta_{0})t'\}^{({\rm norm})}\,{\rm d}t'{\rm d}t
\end{align*}
in distribution as $$n\rightarrow\infty$$. □

References

Beaumont, M. A. (2010). Approximate Bayesian computation in evolution and ecology. Ann. Rev. Ecol. Evol. Syst. 41, 379–406.
Beaumont, M. A., Cornuet, J.-M., Marin, J.-M. & Robert, C. P. (2009). Adaptive approximate Bayesian computation. Biometrika 96, 983–90.
Beaumont, M. A., Zhang, W. & Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics 162, 2025–35.
Blum, M. G. (2010). Approximate Bayesian computation: A nonparametric perspective. J. Am. Statist. Assoc. 105, 1178–87.
Bonassi, F. V. & West, M. (2015). Sequential Monte Carlo with adaptive weights for approximate Bayesian computation. Bayesian Anal. 10, 171–87.
Calvet, L. E. & Czellar, V. (2015). Accurate methods for approximate Bayesian computation filtering. J. Finan. Economet. 13, 798–838.
Chernozhukov, V. & Hong, H. (2003). An MCMC approach to classical estimation. J. Economet. 115, 293–346.
Cornuet, J.-M., Santos, F., Beaumont, M. A., Robert, C. P., Marin, J.-M., Balding, D. J., Guillemaud, T. & Estoup, A. (2008). Inferring population history with DIY ABC: A user-friendly approach to approximate Bayesian computation. Bioinformatics 24, 2713–9.
Drovandi, C. C. & Pettitt, A. N. (2011). Estimation of parameters for macroparasite population evolution using approximate Bayesian computation. Biometrics 67, 225–33.
Drovandi, C. C., Pettitt, A. N. & Lee, A. (2015). Bayesian indirect inference using a parametric auxiliary model. Statist. Sci. 30, 72–95.
Duffie, D. & Singleton, K. J. (1993). Simulated moments estimation of Markov models of asset prices. Econometrica 61, 929–52.
Fearnhead, P. & Prangle, D. (2012). Constructing summary statistics for approximate Bayesian computation: Semi-automatic approximate Bayesian computation (with Discussion). J. R. Statist. Soc. B 74, 419–74.
Filippi, S., Barnes, C. P., Cornebise, J. & Stumpf, M. P. (2013). On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo. Statist. Applic. Genet. Molec. Biol. 12, 87–107.
Frazier, D. T., Martin, G. M., Robert, C. P. & Rousseau, J. (2016). Asymptotic properties of approximate Bayesian computation. arXiv:1607.06903.
Gouriéroux, C. & Ronchetti, E. (1993). Indirect inference. J. Appl. Economet. 8, S85–118.
Heggland, K. & Frigessi, A. (2004). Estimating functions in indirect inference. J. R. Statist. Soc. B 66, 447–62.
Kleijn, B. & van der Vaart, A. (2012). The Bernstein–von Mises theorem under misspecification. Electron. J. Statist. 6, 354–81.
Lenormand, M., Jabot, F. & Deffuant, G. (2013). Adaptive approximate Bayesian computation for complex models. Comp. Statist. 28, 2777–96.
Li, W. & Fearnhead, P. (2018). On the asymptotic efficiency of approximate Bayesian computation estimators. Biometrika 105, https://doi.org/10.1093/biomet/asx078.
Marin, J.-M., Pillai, N. S., Robert, C. P. & Rousseau, J. (2014). Relevant statistics for Bayesian model choice. J. R. Statist. Soc. B 76, 833–59.
Marjoram, P., Molitor, J., Plagnol, V. & Tavare, S. (2003). Markov chain Monte Carlo without likelihoods. Proc. Nat. Acad. Sci. 100, 15324–8.
Pauli, F., Racugno, W. & Ventura, L. (2011). Bayesian composite marginal likelihoods. Statist. Sinica 21, 149–64.
Ribatet, M., Cooley, D. & Davison, A. C. (2012). Bayesian inference from composite likelihoods, with an application to spatial extremes. Statist. Sinica 22, 813–45.
Ruli, E., Sartori, N. & Ventura, L. (2016). Approximate Bayesian computation with composite score functions. Statist. Comp. 26, 679–92.
Soubeyrand, S. & Haon-Lasportes, E. (2015). Weak convergence of posteriors conditional on maximum pseudo-likelihood estimates and implications in ABC. Statist. Prob. Lett. 107, 84–92.
Toni, T., Welch, D., Strelkowa, N., Ipsen, A. & Stumpf, M. P. (2009). Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J. R. Soc. Interface 6, 187–202.
van der Vaart, A. W. (2000). Asymptotic Statistics. Cambridge: Cambridge University Press.
Varin, C., Reid, N. & Firth, D. (2011). An overview of composite likelihood methods. Statist. Sinica 21, 5–42.
Wegmann, D., Leuenberger, C. & Excoffier, L. (2009). Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182, 1207–18.
Wood, S. N. (2010). Statistical inference for noisy nonlinear ecological dynamic systems. Nature 466, 1102–4.

© 2018 Biometrika Trust. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices).
