A dust spectral energy distribution model with hierarchical Bayesian inference – I. Formalism and benchmarking

A dust spectral energy distribution model with hierarchical Bayesian inference – I. Formalism... Abstract This article presents a new dust spectral energy distribution (SED) model, named HerBIE, aimed at eliminating the noise-induced correlations and large scatter obtained when performing least-squares fits. The originality of this code is to apply the hierarchical Bayesian approach to full dust models, including realistic optical properties, stochastic heating, and the mixing of physical conditions in the observed regions. We test the performances of our model by applying it to synthetic observations. We explore the impact on the recovered parameters of several effects: signal-to-noise ratio, SED shape, sample size, the presence of intrinsic correlations, the wavelength coverage, and the use of different SED model components. We show that this method is very efficient: the recovered parameters are consistently distributed around their true values. We do not find any clear bias, even for the most degenerate parameters, or with extreme signal-to-noise ratios. methods: data analysis, methods: numerical, methods: statistical, dust, extinction, infrared: galaxies, infrared: ISM 1 INTRODUCTION Dust grains are an important, but elusive, component of the interstellar medium (ISM). Their ubiquity makes them a potential asset to understanding galaxy evolution, but their inherent complexity impedes our ability to use them to their full potential. As a diagnostic tool, dust can be used to estimate gas masses, without suffering from the phase-bias that gas lines are subject to. For this reason, dust is a popular dark gas tracer (e.g. Grenier, Casandjian & Terrier 2005; Leroy et al. 2011; Planck Collaboration XIX 2011c). However, our poor knowledge of the grain properties puts caution on the accuracy of these measurements. It is true that the grain cross-section per Hydrogen atom can be empirically calibrated in the ultraviolet (UV; e.g. Fitzpatrick & Massa 2007), for extinction studies, or in the infrared (IR; e.g. Planck Collaboration XVII 2014a), for emission studies. The problem is that these empirical laws lack an underlying physical model describing how dust properties vary in different environments. As a modelling ingredient, dust provides the major source of heating of the gas in photodissociation regions (PDRs; e.g. Hollenbach & Tielens 1997), through the photoelectric effect1 (Bakes & Tielens 1994; Weingartner & Draine 2001; Kimura 2016). The main discrepancies between different PDR codes actually originate in the diversity of assumptions about the grain physics (Röllig et al. 2007). In addition, grains are the catalysts for numerous chemical reactions, including the formation of H2, the most abundant molecule in the Universe. Accounting for the stochastic temperature fluctuations of small grains has led to a significant revision of H2 formation rates (Le Bourlot et al. 2012; Bron, Le Bourlot & Le Petit 2014). Overall, detailed modelling of the PDR and molecular lines in star-forming regions shows a qualitative agreement between the derived dust and gas physical conditions (e.g. Wu et al. 2015; Chevance et al. 2016), although the quantitative comparison can be more difficult to assess (Wu et al., in preparation). The current census on interstellar dust relies mainly on observations of the Galaxy. Most dust models are designed to reproduce the extinction, emission, and depletion pattern of the diffuse ISM (Zubko, Dwek & Arendt 2004; Draine & Li 2007; Compiègne et al. 2011; Jones et al. 2017). Some also take into account polarization constraints (Siebenmorgen, Voshchinnikov & Bagnulo 2014; Guillet et al. 2017). The dust properties in other environments are more sketchy. For instance, there is clear evidence that the grain far-IR/submm emissivity increases by a factor of ≃2–4, from the diffuse ISM to denser environments (e.g. Stepnik et al. 2003; Planck Collaboration XXIV 2011a; Roy et al. 2013). This increase can be explained by mantle accretion and coagulation (Köhler et al. 2012; Köhler, Ysard & Jones 2015). However, since the accretion of mantles in dense regions and their recycling back in the diffuse ISM is a hysteresis process (Jones et al. 2017), parametrizing the dust properties as a sole function of the density of the medium, n, and the UV flux, G0, may be too simplistic. Finally, the grain properties of other galaxies can exhibit significant differences (e.g. in the Magellanic clouds: Gordon et al. 2003; Galliano et al. 2011; Galametz et al. 2016; Chastenet et al. 2017). In particular, the evolution of the dust-to-gas mass ratio with metallicity is a subject of debate (Lisenfeld & Ferrara 1998; Draine et al. 2007; Galliano, Dwek & Chanial 2008a; Galametz et al. 2011; Rémy-Ruyer et al. 2014; De Vis et al. 2017). It appears to be non-linear in nearby galaxies (Rémy-Ruyer et al. 2014), consistent with theoretical models (Asano et al. 2013; Zhukovska 2014). On the contrary, studies in absorption on more distant systems suggest a constant dust-to-metal mass ratio down to extremely low-metallicity systems (De Cia et al. 2013; Zafar & Watson 2013). One of the ways to tackle these open questions consists in studying how the observed grain properties vary with the physical conditions they are experiencing, in a large range of environments. A fundamental step in this process is the accurate retrieval of the parameters and their intercorrelations. The recent IR/submm space observatories Spitzer, Akari, Herschel, and Planck have provided us with invaluable data on the spectral energy distributions (SEDs) of a wide variety of objects. However, despite a wealth of good quality observations, with complete spectral coverage, and calibration uncertainties down to only a few per cent (Planck Collaboration VIII 2014b), we are still facing limitations using basic analysis techniques. Among these limitations, there is a series of noise-induced false correlations between derived parameters, when performing least-squares fits (hereafter χ2; Shetty et al. 2009). A very efficient way to treat these degeneracies was proposed by Kelly et al. (2012). These authors designed a hierarchical Bayesian (hereafter HB) model, accounting for both noise and correlated calibration uncertainties. We will explain in detail, in the present article, how an HB model works. For now, we just need to note that it deals with two classes of parameters: (i) the dust parameters of each source (mass, temperature, etc.); and (ii) a set of hyperparameters controlling the distribution of these dust parameters. It is called a multilevel approach, for that reason. A least-squares fit is a single-level model, as it only deals with the dust parameters. In a single-level approach, the parameter's probability distribution of each source is independent of the other sources in the sample. Introducing hyperparameters allows one to sample a single large probability distribution for the parameters of all the sources. This way, the information about the distribution of parameters, among the different sources, has an impact on the likelihoods of individual sources. Kelly et al. (2012) were able to show that the HB approach could correct the false negative correlation between the temperature, T, and the emissivity index, β, of modified black bodies (hereafter MBBs). We emphasize that it is the multilevel nature of the HB approach that could lead to such an improvement. Indeed, several non-HB codes had been developed to interpret dust SEDs (da Cunha, Charlot & Elbaz 2008; Paradis et al. 2010; Planck Collaboration XVI 2011b; Serra et al. 2011), but those could not treat the degeneracies, due to their single-level nature. Following Kelly et al. (2012), several articles presented HB codes for single MBBs (Juvela et al. 2013), or the linear combination of two MBBs, and MBBs with parametrized emissivity (Veneziani et al. 2013). Until now, we were lacking an HB code working with full dust models, including (i) several types of grains (silicates and carbonaceous), (ii) with realistic optical properties, (iii) a size distribution, and (iv) accounting for the stochastic heating of small grains, (v) with the possibility to combine several components to account for the mixing of physical conditions in the observed regions. Such a model would allow us to apply state-of-the-art dust models to IR observations, extracting the physical parameters in an optimal way, consistent with the various known sources of uncertainties. This tool could provide unique constraints on dust evolution. This article discusses some efforts developed towards that goal. We present here a new HB dust SED model, named HerBIE (HiERarchical Bayesian Inference for dust Emission). The paper is ordered as follows. In Section 2, we present the different physical components that can be combined to fit the data. Section 3 attempts at giving a comprehensive view on the statistical hypotheses of the HB model and its numerical implementation. In Sections 4–6, we extensively discuss the tests we have conducted in order to assess the performances of the code. Section 4 presents a systematic analysis of the code performances, varying signal-to-noise ratios, SED shape, and sample size. Section 5 explores additional effects: wavelength coverage; the presence of intrinsic correlations; the addition of external parameters to the model. Section 6 briefly demonstrates the code performances using different physical models. Finally, Section 7 summarizes our results. We devote Appendices A–D to technical explanations that would otherwise alter the flow of the discussion. This is the first paper in a series of two articles. Paper II (Galliano, in preparation) will address the robustness of our model. 2 THE PHYSICAL MODEL 2.1 Notation conventions HerBIE is designed to solve a generic problem: the fit of a linear combination of SED model components to a set of n sources, si, observed through m frequencies, νj. The sources, si (i = 1, …, n), can be pixels in an image or individual objects (whole galaxies, star-forming regions, etc.) in a sample. The frequencies, νj (j = 1, …, m), can be photometric bands or frequency elements in a spectrum. The total physical model is controlled by q parameters, xk (k = 1, …, q). We note $${\boldsymbol y}$$, vectors of elements yi, and $$\mathcal {Y}$$, matrices of elements Yi, j. We note $$|\mathcal {Y}|$$ the determinant of matrix $$\mathcal {Y}$$. Probability density functions (PDFs) are written p(…); the name of the variable between parentheses defines the actual density we are referring to. Due to the inherent non-linearity of dust models, the derived PDF of several parameters is significantly skewed. Although this is natural, it makes quoting mean values and uncertainties complicated, as they do not coincide with the median nor the mode of the distribution. For these parameters, we rather consider the natural logarithm (noted $$\ln$$), in order to obtain a more symmetric distribution. We use the adjective specific to denote any quantity per unit mass. 2.2 Individual SED model components Fitting an observed SED requires choosing the number and the nature of the physical components. Each case has to be adapted, depending on the assumed properties of the studied object (e.g. diffuse cloud, H ii region, whole galaxy, etc.) and on the wavelength coverage. In this section, we describe the different components that our code currently implements. This list is meant to grow with future developments. 2.2.1 Equilibrium grains (BBQ) This component accounts for the emission of a collection of identical large grains at thermal equilibrium with a uniform radiation field. It is particularly useful to model H ii regions where strong solid-state emission bands can be observed (e.g. Wu et al., in preparation). This approximation would not be valid for smaller grains as they would be stochastically heated (e.g. Draine 2003). We designate this component as BBQ. The parameters controlling this component are simply the mass (Mi) and temperature (Ti): $${\boldsymbol x}_i=(\ln M_i,\ln T_i)$$. The monochromatic luminosity of source si, at frequency νj is expressed as   \begin{equation} L_\nu ^{{\rm mod}}(\nu _j,{\boldsymbol x}_i) = M_i\times \frac{3\pi }{a\rho }Q_{\rm abs}(a,\nu _j) \times B_\nu (T_i,\nu _j), \end{equation} (1)where a is the grain radius, Bν is the Planck function, and Qabs is the absorption efficiency of the chosen material, with mass density ρ. We have a large data base of optical properties to choose the Qabs from, depending on the spectral features we want to model. We arbitrarily set a = 30 nm,2 as Qabs/a is almost perfectly independent of a in the IR-mm range. 2.2.2 Modified blackbody (MBB) This component is similar to BBQ, in terms of physical assumptions. The only difference is that instead of considering a realistic Qabs, measured in the laboratory, we make the common assumption that it is a power law of the frequency: Qabs ∝ νβ, the emissivity index β being a free parameter. This approximation is the most widely used dust model, as it is supposed to provide constraints both on the physical conditions experienced by the grains (through the temperature) and on their intrinsic properties (through β). We designate this component as MBB. The parameters controlling this component are $${\boldsymbol x}_i=(\ln M_i,\ln T_i, \beta _i)$$. The monochromatic luminosity of source si, at frequency νj is   \begin{equation} L_\nu ^{{\rm mod}}(\nu _j,{\boldsymbol x}_i) = M_i\times 4\pi \kappa (\nu _0) \left(\frac{\nu _j}{\nu _0}\right)^{\beta _i}\times B_\nu (T_i,\nu _j). \end{equation} (2)The opacity at reference wavelength λ0 = c/ν0 = 160 μm is fixed to κ(ν0) = 1.4 m2 kg− 1, as it corresponds to the Zubko et al. (2004, BARE-GR-S) Galactic dust model opacity. Although Hildebrand (1983) recommended normalizing the opacity in the submm regime, recent laboratory studies (e.g. Coupeaud et al. 2011) tend to show that dust analogues have less dispersion in the far-IR. This is the reason why we choose λ0 = 160 μm rather than 850 μm. 2.2.3 Broken emissivity modified blackbody (BEMBB) This component, introduced by Gordon et al. (2014), is a refinement of MBB, accounting for a possible change of the emissivity slope at long wavelengths, similar to what is observed in the laboratory (e.g. Coupeaud et al. 2011). There are now two emissivity indices: β1 for ν > νb and β2 for ν ≤ νb. In practice, this component could be used to fit an observed SED, finely sampled in the far-IR/submm range. We designate this component as BEMBB. The parameters controlling this component are $${\boldsymbol x}_i=(\ln M_i,\ln T_i, \beta _{1,i}, \beta _{2,i}, \nu _{b,i})$$. The monochromatic luminosity of source si, at frequency νj is   \begin{equation} L_\nu ^{{\rm mod}}(\nu _j,{\boldsymbol x}_i) = 4\pi \kappa (\nu _j,\beta _{1,i},\beta _{2,i},\nu _{b,i})\times B_\nu (\nu _j,T_i), \end{equation} (3)where the broken emissivity is parametrized as   \begin{eqnarray} \kappa (\nu ,\beta _{1,i},\beta _{2,i},\nu _{b,i}) {=}\left\{ \begin{array}{ll} \kappa (\nu _0)\left(\displaystyle\frac{\nu }{\nu _0} \right)^{\beta _{1,i}} & \nu > \nu _{b,i} \\ \kappa (\nu _0) \left(\displaystyle\frac{\nu }{\nu _0}\right)^{\beta _{2,i}} \left(\displaystyle\frac{\nu _{b,i}}{\nu _0}\right)^{\beta _{1,i}-\beta _{2,i}} & \nu \le \nu _{b,i}. \end{array}\right. \!\!\!\!\!\!\!\!\!\! \end{eqnarray} (4)Gordon et al. (2014) ‘calibrate’ the opacity at reference wavelength κ(ν0) on the diffuse ISM of the Milky Way. Here, we test this component with the same κ(ν0) value as the MBB, for simplicity. 2.2.4 Uniformly illuminated dust mixture (deltaU) This component represents a full ISM dust mixture, heated by a uniform interstellar radiation field with intensity U. This mixture is made of grains of different compositions (silicates, amorphous carbon, PAHs, etc.) each having a different size distribution. Several dust mixtures and their size distributions are implemented into our code: the BARE-GR-S model of Zubko et al. (2004), the model of Compiègne et al. (2011), the AC model of Galliano et al. (2011), and the THEMIS model (Jones et al. 2017). We assume that the radiation field has the spectral shape of the solar neighbourhood (with mean intensity noted $$J_\nu ^{\odot }(\nu )$$; Mathis, Mezger & Panagia 1983) scaled by the parameter U: $$J_\nu (\nu )=U\times J_\nu ^{\odot }(\nu )$$. The mean intensity is normalized so that U = 1 corresponds to   \begin{equation} \int _{c/8\,\mu {\rm m}}^{c/0.0912\,\mu {\rm m}} 4\pi J_\nu ^{\odot }(\nu ){\,\rm d}\nu = 2.2\times 10^{-5}\,\rm W\,m^{-2}. \end{equation} (5) The parameters controlling this component are the mass (Mi), the radiation field intensity (Ui), the PAH mass fraction ($$q^{{{\rm PAH}}}_i$$), and the fraction of charged PAHs ($$f^+_i$$): $${\boldsymbol x}_i=(\ln M_i,\ln U_i,q^{{\rm PAH}}_i,f^+_i)$$. The last two parameters allow us enough flexibility to fit the complex mid-IR range: qPAH controls the relative strength of the aromatic features, while f+ controls mainly the ratio between C–C and C–H bands (e.g. Allamandola, Hudgins & Sandford 1999; Li & Draine 2001; Galliano et al. 2008b). The monochromatic luminosity of source si, at frequency νj, is   \begin{eqnarray} L_\nu ^{{\rm mod}}(\nu _j,{\boldsymbol x}_i) & = & M_i\times ( q^{{{\rm PAH}}}_i\times f_i^+\times l_\nu ^{{{\rm PAH}}^+}(U_i,\nu _j) \nonumber \\ && + \, q^{{\rm PAH}}_i\times (1-f_i^+)\times l_\nu ^{{{\rm PAH}}^0}(U_i,\nu _j) \nonumber \\ && + \, (1-q^{{{\rm PAH}}}_i)\times l_\nu ^{{\rm non-PAH\ dust}}(U_i,\nu _j)), \end{eqnarray} (6)where $$l_\nu ^{{X}}$$ is the specific monochromatic luminosity of component X. The emission spectrum has been computed with the Guhathakurta & Draine (1989) stochastic heating method and has been integrated over the size distribution. We designate this component as deltaU. 2.2.5 Non-uniformly illuminated dust mixture (powerU) The application of the previous component is often problematic as it does not allow us to account for the likely mixing of excitation conditions along the line of sight and in the instrumental beam. To account for non-uniform illumination, we adopt the prescription of Dale et al. (2001), assuming that the dust mass is distributed in different radiation field intensities following a power law:   \begin{equation} \frac{{\rm d}M}{{\rm d}U}=\mathcal {N}\times U^{-\alpha } \quad \,{\rm with}\quad \, U_-\le U\le U_-+\Delta U, \end{equation} (7)where the normalization factor depends on the value of α:   \begin{equation} \mathcal {N} = \left\{ \begin{array}{lc} \displaystyle\frac{(1-\alpha )}{(U_-+\Delta U)^{1-\alpha } -(U_-)^{1-\alpha }}\qquad \qquad & {\rm if }\ \alpha >1 \\ \displaystyle\frac{1}{\ln (U_-+\Delta U)-\ln (U_-)}\qquad \qquad& {\rm if }\ \alpha =1. \end{array} \right. \end{equation} (8)The three parameters of this power-law U−, ΔU, and α are free to vary to fit the shape of the observed SED. Since these three parameters are sometimes degenerate, we often discuss the more stable average starlight intensity, defined as   \begin{equation} \bar{U}= \frac{1}{M} \int _{U_-}^{U_-+\Delta U} U\times \frac{{\rm d}M}{{\rm d}U}{\,\rm d}U, \end{equation} (9)which also depends on the value of α:   \begin{eqnarray} \bar{U}= \left\{ \begin{array}{l@{\qquad}l} \displaystyle\frac{1-\alpha }{2-\alpha } \frac{\left(U_-+\Delta U\right)^{2-\alpha }-U_-^{2-\alpha }}{\left(U_-+\Delta U\right)^{1-\alpha }-U_-^{1-\alpha }} & \mbox{if } \alpha \ne 1\ {\rm and}\ \alpha \ne 2 \\ \displaystyle \frac{\Delta U}{\ln \left(U_-+\Delta U\right)-\ln U_-} & \mbox{if } \alpha = 1\\ \displaystyle \frac{\ln \left(U_-+\Delta U\right)-\ln U_-}{U_-^{-1}-\left(U_-+\Delta U\right)^{-1}} & \mbox{if } \alpha = 2. \end{array}\right. \nonumber\\ \end{eqnarray} (10) This component is simply the deltaU component integrated over the distribution of equation (7). The parameters controlling this component are thus $${\boldsymbol x}_i=(\ln M_i,\ln U_{-,i},\ln \Delta U_i,\alpha _i,q^{{{\rm PAH}}}_i,f_i^+)$$. The monochromatic luminosity of source si, at frequency νj is   \begin{eqnarray} L_\nu ^{{\rm mod}}(\nu _j,{\boldsymbol x}_i) &=& M_i\times \mathcal {N}\nonumber\\ &&\int _{U_{-,i}}^{U_{-,i}+\Delta U_i} l_\nu \left(U_i,q^{\rm{PAH}}_i,f^+_i,\nu _j \right)\times U^{-\alpha _i} {\,\rm d}U, \end{eqnarray} (11)where $$M_i\times l_\nu (U_i,q^{{\rm PAH}}_i,f^+_i,\nu _j)$$ is the deltaU component (equation 6). We designate this component as powerU. 2.2.6 Near-IR emission by stellar populations (starBB) Direct or scattered starlight can significantly contaminate the emission in the near-IR range. Thus, to properly model the emission by PAHs and small grains, one needs to account for the stellar continuum. This component is simply modelled as a T⋆ = 50 000 K blackbody, the only free parameter being its bolometric luminosity $${\boldsymbol x}_i=(\ln L^\star _i)$$. The monochromatic luminosity of source si, at frequency νj is   \begin{equation} L_\nu ^{{\rm mod}}(\nu _j,{\boldsymbol x}_i) = L^\star _i\times \frac{\pi }{\sigma T_\star ^4} B_\nu (T_\star ,\nu _j), \end{equation} (12)where σ is the Stefan–Boltzmann constant and Bν is the Planck function. This parametrization is however too rough to provide any reliable constraint on the actual stellar populations. In addition, it is only realistic in the near-IR regime. We designate this component as starBB. 2.2.7 Free–free and synchrotron continua (radio) In addition to dust emission, submillimetre bands also contain radio continuum. We therefore add the possibility to model this component, assuming it is the linear combination of two power laws representing the free–free (Fν ∝ ν−0.1) and synchrotron continua ($$F_\nu \propto \nu ^{-\alpha _{\rm s}}$$; 0.7 ≲ αs ≲ 0.9). The free parameters are the νLν at λ1 = 1 cm, L1, i, the fraction of free–free at λ1, fFF, i, and the synchrotron index, αs, i: $${\boldsymbol x}_i=(\ln L_{1,i},f_{\rm{{FF}},i},\alpha _{s,i})$$. The monochromatic luminosity of source si, at frequency νj, is   \begin{eqnarray} L_\nu ^{{\rm mod}}(\nu _j,{\boldsymbol x}_i) & = & \frac{L_{1,i}}{\nu _1}\times \left( f_{{FF},i}\times \left(\frac{\nu }{\nu _1}\right)^{-0.1}\right. \nonumber \\ &&\, +\, \left.(1-f_{{{FF}},i})\times \left(\frac{\nu }{\nu _1}\right)^{-\alpha _{s,i}}\right), \end{eqnarray} (13)with ν1 = c/λ1. We designate this component as radio. 2.3 Linear combination of individual components We can linearly combine any of the previous components to fit an observed SED. Each parameter can be let free to vary (limits can be specified), or fixed to an arbitrary value. Alternatively, we can tie two parameters together (e.g. the β of two MBB components or the qPAH of a deltaU and powerU components). From a numerical point of view, we have pre-computed large grids of models for a large range of all the parameters, and performed synthetic photometry for a wide list of instrumental filters. Our code then simply interpolates these grids of templates to evaluate the SED model for a combination of parameters. The model grids and the interpolation method are described in Appendix A. 3 HIERARCHICAL BAYESIAN INFERENCE APPLIED TO SED FITTING Most of the formalism discussed in this section is the generalization to more complex dust models of the pioneering work of Kelly et al. (2012). 3.1 The non-hierarchical Bayesian point of view Let us first start by laying down the standard Bayesian formalism. If, for now, we omit systematic uncertainties, we can simply express the observed SED of a single source as the sum of our emission model and a random deviation due to the noise:   \begin{equation} L_\nu ^{{\rm obs}}(\nu _j) = L_\nu ^{{\rm mod}}(\nu _j,{\boldsymbol x}) + \epsilon (\nu _j)\times \sigma _\nu ^{{\rm noise}}(\nu _j), \end{equation} (14)where ε(ν) is a random variable with 〈ε〉 = 0 and σ(ε) = 1. The PDF of the noise p(ε(νj)) can be arbitrary (Gaussian, Student's t, etc.). We can reverse equation (14) to make explicit its dependence on the parameters $${\boldsymbol x}$$:   \begin{equation} \epsilon (\nu _j,{\boldsymbol x}) = \frac{L_\nu ^{\rm{obs}}(\nu _j)-L_\nu ^{\rm{mod}}(\nu _j,{\boldsymbol x})}{\sigma _\nu ^{{{\rm noise}}}(\nu _j)}. \end{equation} (15)In the absence of correlations, the likelihood can be expressed as a conditional PDF:   \begin{equation} p\left({\boldsymbol L}_\nu ^{{\rm obs}}|{\boldsymbol x}\right) = \prod _{j=1}^m p(\epsilon (\nu _j,{\boldsymbol x})), \end{equation} (16)noting $${\boldsymbol L}_\nu ^{{\rm obs}}=(L_\nu ^{{\rm obs}}(\nu _1),\ldots ,L_\nu ^{{\rm obs}}(\nu _m))$$, the SED vector containing the emission at each waveband. This is a well-known expression. For instance, if we assume Gaussian errors, finding the maximum of this likelihood is equivalent to minimizing the chi-squared. From a Bayesian point view, we are more interested in the PDF of the parameters, knowing the observations, $$p({\boldsymbol x}|{\boldsymbol L}_\nu ^{{\rm obs}})$$, rather than in the distribution of equation (16). Conveniently, Bayes’ theorem states that   \begin{equation} p({\boldsymbol x}|{\boldsymbol L}_\nu ^{{\rm obs}}) = \frac{p({\boldsymbol L}_\nu ^{{\rm obs}}|{\boldsymbol x}) \times p({\boldsymbol x})}{p({\boldsymbol L}_\nu ^{{\rm obs}})} \propto p({\boldsymbol L}_\nu ^{{\rm obs}}|{\boldsymbol x})\times p({\boldsymbol x}). \end{equation} (17)In the equation above, $$p({\boldsymbol x})$$ is called the prior distribution. It represents the intrinsic distribution of the parameters. We usually do not have an accurate knowledge of this distribution. The term $$p({\boldsymbol L}_\nu ^{{\rm obs}})$$ is a constant, since it is independent of the parameters. It enters as the normalization factor in the second part of equation (17). The left-hand term of equation (17) is called the posterior distribution. Standard Bayesian techniques consist of sampling it, i.e. mapping it in the space of parameters $${\boldsymbol x}$$. An assumption has to be made on the prior. It is common to assume it is constant or slowly varying over the interval range covered by the parameters. From the knowledge of the distribution of equation (17), one can estimate parameter averages, standard deviations, estimate confidence intervals, test hypotheses, etc. 3.2 The hierarchical model 3.2.1 The posterior distribution The HB method is built upon the previous formalism, with the difference that the prior distribution is now inferred from the data. This is achieved by parametrizing the shape and position of the new prior distribution with hyperparameters (which control the distribution of the parameters). It is common to assume a unimodal prior (e.g. multivariate Gaussian or Student's t), where the hyperparameters are (i) the average of the parameter vector, $${\boldsymbol \mu }$$, and (ii) their covariance matrix, Σ. This approach is relevant only when analysing a sample of n > 1 sources. Equation (17) now becomes, for one source si  \begin{equation} p({\boldsymbol x}_i|{\boldsymbol L}_{\nu ,i}^{{\rm obs}},{\boldsymbol \mu }, \Sigma) \propto p({\boldsymbol L}_{\nu ,i}^{{\rm obs}}|{\boldsymbol x}_i) \times p({\boldsymbol x}_i|{\boldsymbol \mu },\Sigma ), \end{equation} (18)where $$p({\boldsymbol x}_i|{\boldsymbol \mu }, \Sigma )$$ is the new prior parametrized by $${\boldsymbol \mu }$$ and Σ. The posterior distribution of the parameters and their hyperparameters is then   \begin{eqnarray} p&(&\!{\boldsymbol x}_1,\ldots ,{\boldsymbol x}_n,{\boldsymbol \mu },\Sigma |{\boldsymbol L}_{\nu ,1}^{{\rm obs}},\ldots ,{\boldsymbol L}_{\nu ,n}^{{\rm obs}}) \nonumber\\ &&\propto \prod _{i=1}^n p({\boldsymbol x}_i|{\boldsymbol L}_{\nu ,i}^{{\rm obs}},{\boldsymbol \mu },\Sigma ) \times p({\boldsymbol \mu })\times p(\Sigma), \end{eqnarray} (19)where $$p({\boldsymbol \mu })$$ and p(Σ) are the prior distributions of the hyperparameters. They will be described in Section 3.2.4. Note that, in equation (19), there is only one common set of hyperparameters for all of the sources. Fig. 1 illustrates how the HB method works. To make it simple, we consider a model with only one parameter x, such as, for example, the intensity of a single line fit. We simulate, in panel (a), what could be the actual distribution of parameters (in grey). We have drawn three values (colour dotted lines) from this distribution, representing what we would get if we were observing a line in three different locations in a cloud: $$x_{1,2,3}^{\rm{true}}$$. In panel (b), we introduce noise and show the likelihoods (colour curves). We have assigned an arbitrary uncertainty to the three parameters: σ1, 2, 3. We have drawn a random deviation according to this uncertainty: Δx1, 2, 3. If we were implementing a non-HB method or a least-squares fit, our estimate of the parameters would be the vertical dashed lines at the mean/mode of each likelihood: $$x_{1,2,3}^{{\rm{B}}}$$. In panel (c), we show the inferred mean prior distribution (in grey). The colour curves represent the HB posterior, i.e. the product of the prior and the likelihood. The mean values that we would derive from the HB analysis are the $$x_{1,2,3}^{{\rm{HB}}}$$, which are closer to the true values than the $$x_{1,2,3}^{{\rm{B}}}$$. The reason is that the multiplication by the prior reduces the dispersion of the values due to the noise. The posterior number 1 (in blue) has a high signal-to-noise ratio. Its likelihood is narrower than the prior. Thus, it is only weakly modified by the prior, and $$x_1^{\rm{{B}}}\simeq x_1^{{\rm{HB}}}$$. In contrast to this, the posterior number 3 (in red) deviates more, since it has a low signal-to-noise ratio. The prior thus has a major effect on this distribution and brings $$x_3^{{\rm{HB}}}$$ closer to $$x_3^{{\rm{true}}}$$. Figure 1. View largeDownload slide Illustration of the HB method. Panel (a) shows the actual distribution of parameter x (in grey) and three values of these parameters ($$x_{1,2,3}^{{\rm{true}}}$$; in colours; dotted lines) that correspond to the observations. Panel (b) shows the likelihood of the fit (colour curves) when introducing noise (σ1, 2, 3). The mean value of x derived from the likelihood $$x_{1,2,3}^{\rm{B}}$$ is displaced from the true value by $$\Delta x_{1,2,3}=|x_{1,2,3}^{{\rm{B}}}-x_{1,2,3}^{{\rm{true}}}|$$. Panel (c) shows the inferred prior (in grey), the posterior distributions (colour curves) and the derived mean values ($$x_{1,2,3}^{{\rm{HB}}}$$). Figure 1. View largeDownload slide Illustration of the HB method. Panel (a) shows the actual distribution of parameter x (in grey) and three values of these parameters ($$x_{1,2,3}^{{\rm{true}}}$$; in colours; dotted lines) that correspond to the observations. Panel (b) shows the likelihood of the fit (colour curves) when introducing noise (σ1, 2, 3). The mean value of x derived from the likelihood $$x_{1,2,3}^{\rm{B}}$$ is displaced from the true value by $$\Delta x_{1,2,3}=|x_{1,2,3}^{{\rm{B}}}-x_{1,2,3}^{{\rm{true}}}|$$. Panel (c) shows the inferred prior (in grey), the posterior distributions (colour curves) and the derived mean values ($$x_{1,2,3}^{{\rm{HB}}}$$). One of the subtleties of this process is the inference of the prior. In our simple example, its hyperparameters are the mean, $$\bar{x}$$, and standard deviation, σx. The full posterior distribution (equation 19) is thus a five-dimension PDF depending here on x1, x2, x3, $$\bar{x}$$, and σx. The most likely hyperparameters will be the values of $$\bar{x}$$ and σx for which the posterior is maximum, in the five-dimension space. We can also estimate averages of the hyperparameters $$\langle \bar{x}\rangle$$ and 〈σx〉, marginalizing the posterior over the rest of the variables. In that sense, the representation of panel (c) is a simplification. It actually shows unidimensional cuts of the prior and the posteriors, fixing $$\bar{x}$$ to $$\langle \bar{x}\rangle$$ and σx to 〈σx〉. 3.2.2 Introducing systematic uncertainties Instrumental calibration uncertainties, on top of being partially correlated between wavebands, are assumed to be fully correlated between sources. Kelly et al. (2012) treat these calibration errors as nuisance parameters. Following their formalism, we can rewrite equation (15)   \begin{equation} \epsilon (\nu _j,{\boldsymbol x},\delta _j) = \frac{L_\nu ^{{\rm{obs}}}(\nu _j) -L_\nu ^{{\rm{mod}}}(\nu _j,{\boldsymbol x})\times (1+\delta _j)}{\sigma _\nu ^{{\rm{noise}}}(\nu _j)}, \end{equation} (20)where we have introduced the calibration offset δj at frequency νj. The posterior distribution of equation (19) becomes   \begin{eqnarray} p&(&\!{\boldsymbol x}_1,\ldots ,{\boldsymbol x}_n,{\boldsymbol \mu },\Sigma,{\boldsymbol \delta } |{\boldsymbol L}_{\nu ,1}^{{\rm{obs}}},\ldots ,{\boldsymbol L}_{\nu ,n}^{{{\rm obs}}}) \nonumber\\ &&\!\propto\! \prod _{i=1}^n\prod _{j=1}^m p(L_{\nu ,i,j}^{{\rm{obs}}}|{\boldsymbol x}_i,\delta _j)\times p({\boldsymbol x}_i|{\boldsymbol \mu },\Sigma) \times p({\boldsymbol \mu })\!\times\! p(\Sigma)\times p({\boldsymbol \delta}),\nonumber \\ \end{eqnarray} (21)where the new likelihood, $$p(L_{\nu ,i,j}^{{\rm{obs}}}|{\boldsymbol x}_i,\delta _j)$$, is simply $$p(\epsilon (\nu _j,{\boldsymbol x}_i,\delta _j))$$. We have also introduced the prior distribution of $${\boldsymbol \delta }$$, $$p({\boldsymbol \delta })$$. Its mean is 〈δj〉 = 0 and its covariance matrix, Vcal, is made of the calibration uncertainties of the different wavebands and their correlations. 3.2.3 The noise distribution The noise is assumed to be uncorrelated between wavelengths and sources. Our model let us choose between different distributions for the variable, depending on the actual distribution of the noise measured on the data. We currently can choose between the three following types of noise. Normal noise: This is the default case. The monochromatic likelihood of source si, at frequency νj, is in this case   \begin{equation} p(L_{\nu ,i,j}^{{\rm{obs}}}|{\boldsymbol x}_i,\delta _j) \propto \exp \left[-\frac{1}{2} \left(\epsilon (\nu _j,{\boldsymbol x}_i,\delta _j)\right)^2\right]. \end{equation} (22) Robust noise: In case of outliers, we can assume that the statistical errors follow a Student's t distribution with f = 3 degrees of freedom. It has broader wings than a Gaussian PDF with the same σ. The monochromatic likelihood of source si, at frequency νj, is then   \begin{equation} p(L_{\nu ,i,j}^{{{\rm obs}}}|{\boldsymbol x}_i,\delta _j) \propto \left[1+\frac{1}{f}\left(\epsilon (\nu _j,{\boldsymbol x}_i,\delta _j)\right)^2\right]^{-(f+1)/2}. \end{equation} (23) Asymmetric noise: Background galaxies or Galactic cirruses can skew the noise distribution towards high fluxes. In this case, we use a split-normal PDF (Villani & Larsson 2006):   \begin{eqnarray} &&p(\!L_{\nu ,i,j}^{{\rm{obs}}}|{\boldsymbol x}_i,\delta _j)\propto \\ &&\left\{\begin{array}{lc} {\exp \left[-\frac{1}{2} \left(\displaystyle\frac{L_{\nu ,i,j}^{{\rm{obs}}}-L_{\nu ,i,j}^{{\rm{mod}}} \times (1+\delta _j)-\mu _{i,j}}{\lambda _{i,j}}\right)^2\right]}\\ \qquad \qquad{{\rm if }\ L_{\nu ,i,j}^{{\rm{obs}}}-L_{\nu ,i,j}^{{\rm{mod}}}\times (1+\delta _j) \le \mu _{i,j}}\\ {\exp \left[-\frac{1}{2} \left(\displaystyle\frac{L_{\nu ,i,j}^{{\rm{obs}}}-L_{\nu ,i,j}^{{\rm{mod}}} \times (1+\delta _j)-\mu _{i,j}}{\lambda _{i,j}\tau _{i,j}}\right)^2\right]}\\ {{\rm if }\ L_{\nu ,i,j}^{{\rm{obs}}}-L_{\nu ,i,j}^{{\rm{mod}}}\times (1+\delta _j) > \mu _{i,j},} \end{array}\right. \end{eqnarray} (24)where the position parameter, μi, j, the scale parameter, λi, j, and the shape parameter, τi, j, are derived from the mean (0), standard deviation ($$\sigma _{\nu ,i,j}^{{\rm{noise}}}$$) and skewness of the noise (Appendix B1). 3.2.4 The prior distributions We follow Kelly et al. (2012), and assume a g = 8 degrees of freedom multivariate Student's t distribution, for the distribution of our parameters:   \begin{eqnarray} p&(&\!{\boldsymbol x}_i|{\boldsymbol \mu },\Sigma) \propto \frac{1}{\sqrt{|\Sigma }|}\nonumber\\ &&\times \left(1+\frac{1}{g}({\boldsymbol x}_i-{\boldsymbol \mu })^{\rm T}\Sigma^{-1}({\boldsymbol x}_i-{\boldsymbol \mu })\right)^{-(g+q)/2}, \end{eqnarray} (25)where q is the number of parameters (Section 2.1). Notice that the factor $$1/\sqrt{|\Sigma |}$$, in front of the exponential, is not a normalization constant here, since we are sampling the distribution as a function of the elements of Σ. We assume a uniform prior on $${\boldsymbol \mu }$$. For the q × q covariance matrix, Σ, we use the Barnard, McCulloch & Meng (2000) separation strategy, decomposing it as: $$\Sigma =\mathcal {S}\mathcal {R}\mathcal {S}$$, where $$\mathcal {S}$$ is the diagonal matrix of standard deviations, and $$\mathcal {R}$$ is the correlation matrix. We place wide, independent normal priors on the diagonal elements of $$\ln \mathcal {S}$$, centred on the standard deviations of the least-squares best-fitting parameters (Appendix C), $$\ln S_{k,k}^{\chi ^2}$$, with σ(ln Sk, k) = 10:   \begin{eqnarray} p(\mathcal {S}) = \prod _{k=1}^q \frac{1}{S_{k,k}}\frac{1}{\sqrt{2\pi }\sigma (\ln S_{k,k})} \exp \left[-\frac{1}{2}\left(\frac{\ln S_{k,k}-\ln S_{k,k}^{\chi ^2}}{\sigma (\ln S_{k,k})}\right)^2\right].\!\!\!\!\!\!\nonumber\\ \end{eqnarray} (26)The prior on $$\mathcal {R}$$ is chosen to make sure that each correlation coefficient is uniformly distributed between −1 and 1 and $$\mathcal {R}$$ is positive definite. The formalism developed by Barnard et al. (2000) assumes that $$p(\Sigma |\mathcal {S})$$ is distributed according to an inverse Wishart distribution, with h = q + 1 degrees of freedom. The resulting prior on $$\mathcal {R}$$ is then   \begin{equation} p(\mathcal {R}) \propto |\mathcal {R}|^{(h-1)(q-1)/2-1}\left(\prod _{k=1}^q|\mathcal {R}_{(kk)}|\right)^{-h/2} \end{equation} (27)  \begin{equation} \phantom{p(\mathcal {R})}\propto |\mathcal {R}|^{q(q-1)/2-1}\left(\prod _{k=1}^q|\mathcal {R}_{(kk)}|\right)^{-(q+1)/2}, \end{equation} (28)where $$\mathcal {R}_{(kk)}$$ is the principal submatrix of order k, i.e. the matrix of elements $$R_{k_1,k_2}$$, with k1 = 1, …, k and k2 = 1, …, k. In the end, the prior distribution of the hyperparameters is   \begin{equation} p({\boldsymbol \mu })\times p(\Sigma) \propto p(\mathcal {S})\times p(\mathcal {R}). \end{equation} (29) Finally, the prior on the calibration offsets, $${\boldsymbol \delta }$$, is designed to reflect the absolute calibration uncertainties recommended by each instrument's team, with m × m covariance matrix, $$\mathcal {V}_{\rm cal}$$. Since the calibration factor $$(1+{\boldsymbol \delta })$$ cannot be zero, we draw the variable $${\boldsymbol \delta ^\prime }=\ln (1+{\boldsymbol \delta })$$. Noting that δj ≪ 1, we can assume that $$\delta _j^\prime \simeq \delta _j$$, and thus that $$\mathcal {V}_{\rm cal}$$ can be used as the covariance matrix of $${\boldsymbol \delta }^\prime$$. Similarly to the statistical error, we consider several types of distribution. The first case is a multivariate normal distribution:   \begin{equation} p({\boldsymbol \delta ^\prime }) \propto \exp \left(-\frac{1}{2}{\boldsymbol \delta ^\prime }^T\mathcal {V}_{\rm cal}^{-1} {\boldsymbol \delta ^\prime }\right). \end{equation} (30)The second case is a more robust multivariate Student's t distribution with f = 3 degrees of freedom:   \begin{equation} p({\boldsymbol \delta ^\prime }) \propto \left(1+\frac{1}{f}{\boldsymbol \delta ^\prime }^T\mathcal {V}_{\rm cal}^{-1} {\boldsymbol \delta ^\prime }\right)^{-(f+m)/2}. \end{equation} (31) 3.3 The numerical implementation Sampling the distribution of equation (21) is a numerical challenge, as its number of dimensions is   \begin{equation} N_{\rm dim} = \underbrace{n\times q}_{\rm parameters} + \underbrace{2\times q+q\times (q-1)/2}_{\rm hyperparameters} + \underbrace{m}_{\rm calibration}. \end{equation} (32)For a typical sample with n = 100 sources, observed through m = 11 photometric filters, modelled with q = 7 free parameters, the dimension is Ndim = 767. It is thus impractical to map the posterior on a regular Cartesian grid of parameters. 3.3.1 The Metropolis–Hastings move with ancillarity/sufficiency interweaving strategy The most popular way to sample the posterior distribution is to use a Markov Chain Monte Carlo (MCMC). This class of algorithm allows one to randomly draw variables from the posterior distribution. An MCMC is essentially a sequence of values of the set of parameters and hyperparameters. The number density of values in the chain scales with the probability density, i.e. more models are computed around the maximum likelihood, and few or none in regions of the parameter space where the solution is unlikely. It makes this method efficient, as the SED model (which can be numerically intensive) is computed only for relevant combinations of the parameters. MCMCs also make post-processing simple, as one can easily marginalize over any parameter, estimate moments, test hypotheses, etc. without having to perform multidimensional integrals. The MCMC sampler we have developed applies Gibbs sampling within the Metropolis–Hastings algorithm (MH; e.g. Geman & Geman 1984; Gelman et al. 2004; Press et al. 2007). This particular method consists of drawing each parameter, one by one, from their unidimensional conditional posterior distribution, fixing all the other parameters to their current value in the chain. However, as noted by Kelly (2011), calibration uncertainties introduce correlations within the MCMC, and thus require running very long chains to ensure convergence. The reason is that there is a degeneracy between the values of the calibration offsets, $${\boldsymbol \delta }$$, and the SED model parameters, $${\boldsymbol x}$$. To address this problem, Kelly (2011) demonstrated that the ancillarity–sufficiency interweaving strategy (ASIS; Yu & Meng 2011) could reduce the autocorrelation of the MCMC and thus obtain convergence towards the posterior distribution with a shorter chain. In simple terms, ASIS consists in inserting an extra step at each MCMC iteration. In this new step, we draw values of the parameters in a direction where $$(1+\delta _j)\times L_\nu ^{{\rm{mod}}}({\boldsymbol x}_i,\nu _j)$$ is constant, in order to decouple $${\boldsymbol \delta }$$ and $${\boldsymbol x}$$. Our code applies ASIS to all model parameters. At first, we set the initial values of the parameters, $${\boldsymbol x}_i$$, of the MCMC to the best-fitting values given by the least-squares fit (Section C). For the hyperparameters, we set $${\boldsymbol \mu }$$ and the diagonal elements of $$\mathcal {S}$$ to the mean and standard deviation of the least-squares parameters. The correlation coefficients (the non-diagonal elements of $$\mathcal {R}$$) are set to 0. Finally, the initial calibration offsets, $${\boldsymbol \delta }$$ are set to $${\boldsymbol 0}$$. We then iterate the following steps NMCMC times. We draw the calibration offsets, for each frequency νj, from   \begin{eqnarray} p&(&\!\delta _j|{\boldsymbol \delta }_{j^\prime \ne j}, {\boldsymbol x}_1,\ldots {\boldsymbol x}_n, L_{\nu ,1,j}^{{\rm{obs}}},\ldots ,L_{\nu ,n,j}^{{{\rm obs}}}) \nonumber\\ &&\propto p({\boldsymbol \delta }) \times \prod _{i=1}^n p(L_{\nu ,i,j}^{{\rm{obs}}}|{\boldsymbol x}_i,\delta _j). \end{eqnarray} (33)If $$\mathcal {V_{\rm cal}}$$ is not diagonal (correlated calibration uncertainties), then all of the values of $${\boldsymbol \delta }$$ contribute to every frequency, νj. Otherwise, only δj contributes. For each source, si, we draw each parameter, xi, k, from   \begin{eqnarray} p&(&\!x_{i,k}|{\boldsymbol \delta },{\boldsymbol x}_{i,k^\prime \ne k},{\boldsymbol \mu }, \Sigma,{\boldsymbol L}_{\nu ,i}^{{\rm{obs}}}) \nonumber\\ &&\propto p({\boldsymbol x}_i|{\boldsymbol \mu },\Sigma) \times \prod _{j=1}^m p(L_{\nu ,i,j}^{{\rm{obs}}}|{\boldsymbol x}_i,\delta _j). \end{eqnarray} (34) We implement the component-wise interweaving strategy (CIS; Yu & Meng 2011). To do so, we iterate the following steps by looping on each parameter of index k΄. For each source, si, and each frequency, νj, we compute the new following variable:   \begin{equation} \tilde{\delta }_{i,j} = (1+\delta _j)\times L_\nu ^{{\rm{mod}}}({\boldsymbol x}_i,\nu _j). \end{equation} (35)In practice, $$L_\nu ^{{{\rm mod}}}({\boldsymbol x}_i,\nu _j)$$ can simply be the model component which is controlled by the parameter we are looping over. If the parameter is tied to another one, then its component needs to be added. Then, for one given source, $$s_{i^\prime }$$, we draw a new value of the physical parameter we are looping over, keeping $${\tilde{{\boldsymbol \delta}}}_{i^\prime }$$ constant:   \begin{eqnarray} p&(&\!x_{i^\prime ,k^\prime }|{\tilde{{\boldsymbol \delta}}}_{i^\prime }, x_{i^\prime ,k\ne k^\prime },{\boldsymbol \mu }, \Sigma )\propto \nonumber\\ &&p ({\boldsymbol x}_{i^\prime}|\mu, \Sigma )\times p\left({\boldsymbol \delta ^\prime }=\ln \frac{{\tilde{{\boldsymbol \delta}}}_{i^\prime }}{{\boldsymbol L}_\nu ^{{\rm{mod}}}({\boldsymbol x}_{i^\prime })}\right), \end{eqnarray} (36)where the second term of the right-hand side is simply the distribution of equation (30), replacing each $$\delta ^\prime _j$$ by $$\ln (\tilde{\delta }_{i^\prime ,j} /L_\nu ^{{\rm{mod}}}({\boldsymbol x}_{i^\prime },\nu _j))$$. In practice, we select a different i΄ at each MCMC cycle, to improve statistical mixing. Notice that the likelihood does not appear, as we sample in a direction where it is constant. That is the key of the success of ASIS. We then compute the values of the parameters of the remaining sources, $$x_{i\ne i^\prime ,k^\prime }$$. This is achieved by inverting equation (35) for an arbitrarily chosen frequency $$\nu _{j^\prime }$$ (we change j΄ at each MCMC cycle) and eliminating δj:   \begin{equation} x_{i\ne i^\prime ,k^\prime } = L_{\nu ,i,j^\prime ,k^\prime }^{{\rm{inv}}} \left(\frac{\tilde{\delta }_{i,j^\prime }}{\tilde{\delta }_{i^\prime ,j^\prime }}\times L_\nu ^{{\rm{mod}}}({\boldsymbol x}_{i^\prime },\nu _{j^\prime })\right), \end{equation} (37)where $$L_{\nu ,i,j,k}^{{\rm{inv}}}$$ is the model inverse function, i.e. it is the value of parameter xk corresponding to the monochromatic luminosity in the argument, at frequency νj, fixing the other parameters, $$x_{k\ne k^\prime }$$. Its numerical implementation is discussed in Appendix A3. We update the calibration offsets, for each frequency, νj, by solving equation (35):   \begin{equation} \delta _j = \frac{\tilde{\delta }_{i^\prime ,j}}{L_\nu ^{{\rm{mod}}}({\boldsymbol x}_{i^\prime },\nu _j)}-1. \end{equation} (38) The prior on $${\boldsymbol \mu }$$ being uniform, we draw the μk elements, one by one, from   \begin{equation} p(\mu _k|\mu _{k^\prime \ne k},{\boldsymbol x}_1,\ldots ,{\boldsymbol x}_n, \Sigma) \propto \prod _{i=1}^np({\boldsymbol x}_i|{\boldsymbol \mu }, \Sigma). \end{equation} (39) The standard deviations, Sk, k, are drawn, one by one, from   \begin{equation} p(S_{k,k}|S_{k^\prime \ne k,k^\prime \ne k}, {\boldsymbol x}_1,\ldots ,{\boldsymbol x}_n,\mathcal {R}) \propto p(\mathcal {S})\times \prod _{i=1}^n p({\boldsymbol x}_i|{\boldsymbol \mu }, \Sigma). \end{equation} (40) Finally, the elements of the correlation matrix, $$\mathcal {R}$$, are drawn, one by one, from   \begin{equation} p(R_{k_1,k_2}|R_{k_1^\prime \ne k_1,k_2^\prime \ne k_2},{\boldsymbol x}_i,\mathcal {S}) \propto p(\mathcal {R})\times \prod _{i=1}^n p({\boldsymbol x}_i|{\boldsymbol \mu }, \Sigma). \end{equation} (41) 3.3.2 Assessing optimal sampling We derive several quantities in order to assess the convergence of the MCMC. The autocorrelation function (ACF) of a parameter, x, is, for a lag,3k:   \begin{equation} \rho _x(k) = \frac{{\rm cov}_t(x^{(t)},x^{(t+k)})}{\sqrt{{\rm var}_t(x^{(t)}) {\rm var}_t(x^{(t+k)})}}, \end{equation} (42)where x(t) is the value of parameter x at the step t of the MCMC. The notations vart and covt indicate that the variance and covariance are taken over the index t. We numerically evaluate it, using the FFT of the MCMC. From the ACF, we can estimate the integrated autocorrelation time (e.g. Foreman-Mackey et al. 2013):   \begin{equation} \tau _{\rm int}(x) = 1 + 2\sum _{k=1}^{N_{\rm MCMC}}\rho _x(k), \end{equation} (43)following the method of Sokal (1997). It quantifies the length after which the drawings are truly independent. The effective sample size, defined as   \begin{equation} N_{\rm eff}(x) = \frac{N_{\rm MCMC}}{\tau _{\rm int}(x)} \end{equation} (44)provides an estimate of the number of effective independent draws. We let our MCMC run until Neff > 30, at least, for each parameter. 3.3.3 Derived quantities Once the MCMC has been computed, we estimate a few quantities characterizing the parameter distribution. For each parameter and hyperparameter, y, we derive its mean and standard deviation, 〈y〉 and σy, marginalizing over the other parameters. This is technically achieved by taking the average and standard deviation of the MCMC of the parameter. Similarly, we can estimate any other quantity (correlation coefficients, degree of confidence, etc.), by computing this quantity at each MCMC step, and estimating the average and standard deviation of the resulting distribution of values. 4 EFFECTS OF THE NOISE, SAMPLE SIZE, AND SED SHAPE In this section and in Sections 5 and 6, we dissect several tests aimed at assessing the performance of our HB code. These tests are all performed on simulated data, so that one can compare the results of the model to the true values of the parameters. In the present article, we exclusively simulate data with the same assumptions as in the HB model. These assumptions are the followings. The noise: Both the distribution (normal, asymmetric, etc.; see Section 3.2.3) and the amplitude of the noise have an effect on the results. In this paper, we exclusively use normal, uncorrelated noise and assume that we perfectly know its amplitude. The physical model: We simulate SEDs with the same combination of components (see Section 2.2) as in the HB method. In Sections 4 and 5, we use only the combination of the powerU (Section 2.2.5) and starBB (Section 2.2.6) components. This combination is indeed one of the most relevant, when modelling the near-IR-to-submm emission of interstellar regions. We use the grain properties of the AC model of Galliano et al. (2011). We demonstrate the model performances with several other components in Section 6. The prior distribution: The assumed shape of the prior (equation 25) also has an impact on low signal-to-noise ratio sources. In what follows, we draw parameter values from the same distribution as in equation (25). 4.1 The simulation grid We start by studying the combined effects of signal-to-noise ratio, sample size, and SED shape on the performance of the HB method. To do so, we simulate a grid of SED samples, varying these three quantities. We simulate three different classes of SED shapes, labelled cold, warm, and hot. The physical parameters of each simulation are drawn from the distribution of equation (25) with the hyperparameters listed in Table 1. These SEDs are shown in Fig. 2. We assume that these SEDs are observed through a typical collection of photometric filters: the four Spitzer/IRAC bands (3.6, 4.5, 5.8, and 8 μm), the Spitzer/MIPS band at 24 μm, the three Herschel/PACS bands (70, 100, and 160 μm) and the three Herschel/SPIRE bands (250, 350, and 500 μm). For each SED shape, we generate three samples of n = 10, 100, and 1000 sources. Figure 2. View largeDownload slide Classes of simulated SEDs. The solid lines present the mean of the simulated SEDs of Table 1. The dashed lines show the ±1σ ranges. For each class, we quote the average starlight intensity, $$\bar{U}$$, of the mean, and the approximate equilibrium grain temperature, Td. The latter is derived from $$T_{\rm d}\simeq \bar{U}^{1/6}\times 18\,\rm K$$. We have displayed the transmission of each filter used (grey densities) and labelled its nominal wavelength, in microns. Figure 2. View largeDownload slide Classes of simulated SEDs. The solid lines present the mean of the simulated SEDs of Table 1. The dashed lines show the ±1σ ranges. For each class, we quote the average starlight intensity, $$\bar{U}$$, of the mean, and the approximate equilibrium grain temperature, Td. The latter is derived from $$T_{\rm d}\simeq \bar{U}^{1/6}\times 18\,\rm K$$. We have displayed the transmission of each filter used (grey densities) and labelled its nominal wavelength, in microns. Table 1. True hyperparameters for the three SED shapes of Section 4.1. These values are the elements of $${\boldsymbol \mu }$$, $$\mathcal {S}$$, and $$\mathcal {R}$$ of the distribution in equation (25). They correspond to the parameters described in Sections 2.2.5 and 2.2.6. M is in M⊙ and L⋆ in L⊙.   Cold  Warm  Hot  μ[ln M]  0  μ[ln U−]  ln 0.3  ln 10  ln 100  μ[ln ΔU]  ln 10  ln 300  ln 104  μ[α]  1.8  μ[qPAH]  0.06  0.04  0.02  μ[f+]  0.5  μ[ln L⋆]  ln 103  ln 104  ln 105  S[ln M]  0.5  S[ln U−]  0.4  S[ln ΔU]  0.5  S[α]  0.3  S[qPAH]  0.01  S[f+]  0.1  S[ln L⋆]  0.1  R[all]  0    Cold  Warm  Hot  μ[ln M]  0  μ[ln U−]  ln 0.3  ln 10  ln 100  μ[ln ΔU]  ln 10  ln 300  ln 104  μ[α]  1.8  μ[qPAH]  0.06  0.04  0.02  μ[f+]  0.5  μ[ln L⋆]  ln 103  ln 104  ln 105  S[ln M]  0.5  S[ln U−]  0.4  S[ln ΔU]  0.5  S[α]  0.3  S[qPAH]  0.01  S[f+]  0.1  S[ln L⋆]  0.1  R[all]  0  View Large We add random deviations to the simulated SED samples. These deviations are divided in the two following categories. Calibration offsets, δj, are drawn from the distribution of equation (30), keeping the same values for each source in the sample. However, we draw different offsets for each sample, in order to average out biases that could result from a particular realization. Noise deviations, εi, j, are drawn from a normal distribution (equation 22). These variables are independent. We assume that the absolute noise level is the same in each source of a given sample. This is to mimic observations of spatially resolved regions, where the RMS per waveband is roughly constant. We set the noise uncertainty at a frequency, νj, proportional to the simulated median of the monochromatic luminosities of all the sources, si:   \begin{equation} \sigma _{\nu ,j} = \frac{{\rm med}\left(L_\nu ^{{\rm{mod}}}({\boldsymbol x}_i,\nu _j)\right)}{f_{\rm S/N}}. \end{equation} (45)For each SED shape and sample size, we simulate three realizations of the noise with median signal-to-noise ratios, fS/N = 0.3, 3, and 30. Samples with fS/N = 0.3 are dominated by the noise, while samples with fS/N = 30 are dominated by the calibration errors. In total, we have 33 = 27 simulations. 4.2 Dissection of a model's results To start, we analyse in details the central run in the simulation grid (warm SED, with n = 100 and fS/N = 3), in order to demonstrate how the model works, on a concrete example. 4.2.1 The MCMC In Fig. 3, we show the first 2000 steps of the MCMC of the parameter ln M, for the brightest source in the sample. The distribution of values sampled by the chain, shown in the right-hand panel of Fig. 3, is the marginal posterior distribution of the parameter. Notice that the fluctuations of the chain are not independent. There are structures, like the one highlighted in green, having a length of the order of the integrated autocorrelation time (τint; equation 43). We have chosen an example with a particularly short autocorrelation time, for clarity. However, τint can reach up to ≃105 for some parameters (see Section 4.4). We have also highlighted the burn-in phase, in red. This phase corresponds to the time spent by the MCMC to walk from the initial condition to the region of relevant likelihood. It is advised to exclude this part of the chain from the analysis. Using the least-squares best-fitting parameters as initial conditions (Section 3.3.1), we have not witnessed any particularly long burn-in. It is usually of the order of τint. Figure 3. View largeDownload slide MCMCof a parameter. The parameter is ln M for the brightest source in the central simulation of Section 4.1. The right-hand panel shows the corresponding distribution of the parameter, and its derived average and standard deviation. We highlight the burn-in phase (in red) and a typical autocorrelation time (in green). For clarity, we show only the first 2000 steps. Figure 3. View largeDownload slide MCMCof a parameter. The parameter is ln M for the brightest source in the central simulation of Section 4.1. The right-hand panel shows the corresponding distribution of the parameter, and its derived average and standard deviation. We highlight the burn-in phase (in red) and a typical autocorrelation time (in green). For clarity, we show only the first 2000 steps. The autocorrelation of the chain can be quantified. Fig. 4 shows the ACF (equation 42) for two hyperparameters of the central simulation. These ACFs all have the same qualitative behaviour. They start around 1, at small lags. They then drop towards 0 in a time that is comparable to τint. Finally, for large lags, they oscillate around 0 with a small amplitude. In Fig. 4, we compare the ACFs and integrated autocorrelation times obtained with (i) our full method including ASIS (Section 3.3.1), in blue; and (ii) standard Gibbs sampling, switching off ASIS, in red. We can see that, using ASIS, τint is reduced by a factor of ≃6–9, in this particular case. It illustrates that the implementation of ASIS can help reach convergence with a significantly shorter chain. Figure 4. View largeDownload slide Autocorrelation functions (equation 42) for two hyperparameters. The two hyperparameters are the averages of ln M and ln U− (Section 2.2.5), for the central simulation of Section 4.1. The blue curves correspond to the ACFs obtained with our full method, including ASIS (Section 3.3.1). The red curves correspond to the same ACFs, but switching off ASIS. Figure 4. View largeDownload slide Autocorrelation functions (equation 42) for two hyperparameters. The two hyperparameters are the averages of ln M and ln U− (Section 2.2.5), for the central simulation of Section 4.1. The blue curves correspond to the ACFs obtained with our full method, including ASIS (Section 3.3.1). The red curves correspond to the same ACFs, but switching off ASIS. 4.2.2 The SEDs Fig. 5 shows examples of SED fits for the central simulation. Panel (a) shows the SED of the brightest pixel and panel (b), the faintest. The SED probability density is simply the distribution of SED models, computed with the values of the parameters, at each step in the MCMC. Obviously, the SED density is more dispersed for the low signal-to-noise ratio source. We see that, in both cases, the HB SED density is in better agreement with the true SED (in red) than the least-squares fit (in green). We also notice that the PAH fraction of the χ2, in panel (b), has been clearly underestimated, while the HB model is close to its true value. Panel (c) shows the calibration offsets ($${\boldsymbol \delta }$$). The simulated offsets are shown in red. We emphasize that they are common to all the sources in the sample. We can see that the inferred values of $${\boldsymbol \delta }$$ (in blue) are consistent with the true values. Figure 5. View largeDownload slide SED fits of two sources. The two sources are the brightest (panel a) and the faintest (panel b) of the central simulation of Section 4.1. The circle with error bars are the synthetic observations. Upper limits are quoted at 3σ. The blue-to-black density shows the HB probability distribution of the SED. The red line shows the true SED (without noise). The green lines are the least-squares fit, for comparison. For each SED, we quote the median (over frequencies) of the signal-to-noise ratio (medν[S/N]). Panel (c) shows the calibration offsets $${\boldsymbol \delta }$$ (common to both SEDs). The red dots are the true offsets, and the blue circle with error bars are the inferred values. Figure 5. View largeDownload slide SED fits of two sources. The two sources are the brightest (panel a) and the faintest (panel b) of the central simulation of Section 4.1. The circle with error bars are the synthetic observations. Upper limits are quoted at 3σ. The blue-to-black density shows the HB probability distribution of the SED. The red line shows the true SED (without noise). The green lines are the least-squares fit, for comparison. For each SED, we quote the median (over frequencies) of the signal-to-noise ratio (medν[S/N]). Panel (c) shows the calibration offsets $${\boldsymbol \delta }$$ (common to both SEDs). The red dots are the true offsets, and the blue circle with error bars are the inferred values. 4.2.3 The derived parameters Fig. 6 compares the performances of different methods, applied to the central simulation. It shows the derived correlation between two of the main parameters: the dust mass, M, and the average starlight intensity, $$\bar{U}$$ (equation 10). The true values are shown in red, in each panel. Notice that there is no intrinsic correlation between the two parameters. However, the least-squares fit, in panel (a), shows a clear negative correlation, with a significant scatter. This is the equivalent of the noise-induced β–T negative correlation, for MBBs, demonstrated by Shetty et al. (2009). In panel (b), we display the non-HB results. The stars and error ellipses show the means and covariances of the posterior distribution (equation 17), including the calibration offsets in the likelihood and its prior distribution. The inferred values show a significant scatter and a negative correlation between the two parameters. Notice that the uncertainties are very different between panels (a) and (b). The Bayesian error ellipses are more rigorous, as they are directly derived from the actual shape of the likelihood, and not from a parabolic approximation, in the χ2 case (Appendix C). Panel (b) demonstrates that, although the non-HB method provides an accurate description of the likelihoods of each source, it does not help to significantly reduce the scatter and eliminate false correlations. Finally, panel (c) shows the HB results (in blue). The scatter is considerably reduced, compared to the previous cases. In addition, there is no more false correlation between the parameters. The sizes of the error ellipses have also been greatly reduced, especially for the lowest signal-to-noise ratio sources. However, notice that these uncertainties are still consistent with the true values. Panel (c) of Fig. 6 demonstrates, on a concrete case, the effect that panel (c) of Fig. 1 was trying to illustrate: the reduction of the dispersion of low signal-to-noise ratio sources by the prior. Panel (a) of Fig. 7 shows the non-HB posterior distribution (green density), for the brightest and faintest sources of the simulation, in the same parameter space as Fig. 6. The PDF of the faintest source clearly extends out of the range of the figure, as its SED has a median signal-to-noise ratio of only 0.66 (panel b of Fig. 5). Panel (b) of Fig. 7 shows the HB posteriors, for the same sources. Comparing the two panels, we see that the PDF of the brightest source is almost not modified by the prior. On the contrary, the PDF of the faintest source is brought back towards its true value. Comparing this PDF to its mean value and error ellipse, we see it is noticeably skewed. Figure 6. View largeDownload slide Efficiency of different methods at recovering parameters. In each panel, we show the true (in red) and the inferred values (stars with error ellipses) of the mass, M, and average starlight intensity, $$\bar{U}$$, for the central simulation of Section 4.1. Panel (a) shows the least-squares results (Appendix C; in green). Panel (b) shows the non-HB inference (in orange). Panel (c) shows the HB results (in blue). Figure 6. View largeDownload slide Efficiency of different methods at recovering parameters. In each panel, we show the true (in red) and the inferred values (stars with error ellipses) of the mass, M, and average starlight intensity, $$\bar{U}$$, for the central simulation of Section 4.1. Panel (a) shows the least-squares results (Appendix C; in green). Panel (b) shows the non-HB inference (in orange). Panel (c) shows the HB results (in blue). Figure 7. View largeDownload slide Posterior distributions of two sources. These are the brightest and faintest sources of the central simulation of Section 4.1. We show the same correlation between M and $$\bar{U}$$ as in Fig. 6, and keep the same dynamic range. Panel (a) shows the non-HB values (green density), corresponding to panel (b) of Fig. 6. Panel (b) shows the HB values (blue density), corresponding to panel (c) of Fig. 6. In each panel, the grey stars are the averages of the parameters over the posterior, and the grey ellipses are their covariances. The true values are shown in red, in both panels. Figure 7. View largeDownload slide Posterior distributions of two sources. These are the brightest and faintest sources of the central simulation of Section 4.1. We show the same correlation between M and $$\bar{U}$$ as in Fig. 6, and keep the same dynamic range. Panel (a) shows the non-HB values (green density), corresponding to panel (b) of Fig. 6. Panel (b) shows the HB values (blue density), corresponding to panel (c) of Fig. 6. In each panel, the grey stars are the averages of the parameters over the posterior, and the grey ellipses are their covariances. The true values are shown in red, in both panels. 4.3 The role of the prior distribution To illustrate the role of the prior distribution, Fig. 8 shows the results of the HB code on three of the simulations of Section 4.1: a warm SED, with n = 1000 sources. The three panels show the relation between the average starlight intensity, $$\bar{U}$$, and the PAH mass fraction, qPAH. In panel (a), the median signal-to-noise ratio (equation 45) is high (fS/N = 30). As a consequence, the parameters of each source are well constrained. The uncertainty ellipses have a characteristic size much smaller than the width of the distribution of parameters. The typical uncertainty in qPAH is indeed $$\sigma _{q^{{{\rm PAH}}}}\simeq 2\times 10^{-3}$$, while the standard deviation of the distribution along qPAH is S[qPAH] ≃ 0.01. The prior distribution is thus rather flat compared to the likelihood of an individual source. Therefore, the multiplication of the likelihood by the prior (equation 19) does not have a significant effect. As a result, the posterior distribution is close to the non-hierarchical case (e.g. Fig. 7). Figure 8. View largeDownload slide Demonstration of the role of the prior. The three panels show the results for the simulations of Section 4.1, with a warm SED, n = 1000 sources, and with median signal-to-noise ratios fS/N = 30 (panel a), fS/N = 3 (panel b), and fS/N = 0.3 (panel c). The red dots show the true values of the average starlight intensity, $$\bar{U}$$, and of the PAH mass fraction, qPAH. The blue stars and their error ellipses are the HB inference. Figure 8. View largeDownload slide Demonstration of the role of the prior. The three panels show the results for the simulations of Section 4.1, with a warm SED, n = 1000 sources, and with median signal-to-noise ratios fS/N = 30 (panel a), fS/N = 3 (panel b), and fS/N = 0.3 (panel c). The red dots show the true values of the average starlight intensity, $$\bar{U}$$, and of the PAH mass fraction, qPAH. The blue stars and their error ellipses are the HB inference. In contrast, when the median signal-to-noise ratio decreases (panels b and c of Fig. 8), the width of the parameter distribution is unchanged (S[qPAH] ≃ 0.01 for each panel), but the uncertainty on the parameters of each source increases. In panel (b), with fS/N = 3, the two quantities are roughly equal. The multiplication of the likelihood by the prior thus has an effect on the posterior distributions. In particular, the sources at low $$\bar{U}$$ have a lower signal-to-noise ratio. Their mean values (blue stars) tend to delineate the shape of the prior distribution. The size of their uncertainty ellipses is also reduced by the prior. In panel (c), with fS/N = 0.3, the signal-to-noise ratio is so low that the individual likelihoods are much larger than the prior. As a result, the posterior distribution is very close to the prior distribution. Consequently, all the mean values of qPAH (blue stars) are almost perfectly equal to the prior's average ($$\langle q^{{\rm{PAH}}}_i\rangle \simeq \mu [q^{{\rm{PAH}}}]$$). The uncertainty ellipses of the posterior distribution have the width of the prior distribution ($$\sigma _{q^{{\rm{PAH}}}_i}\simeq S[q^{{\rm{PAH}}}]$$). However, notice that along the horizontal axis, the parameter $$\bar{U}$$, which is better constrained, still exhibits an intrinsic distribution of values. If we were to decrease even more the signal-to-noise ratio, the blue stars would all collapse on to one single point in the panel with coordinates $$(\mu [\bar{U}],\mu [q^{{\rm{PAH}}}])$$. The HB method is particularly useful in cases like panel (c) of Fig. 8. It is in such a case that it provides results significantly better than non-HB and least-squares methods. In panel (c), the low signal-to-noise ratio prevents performing any relevant analysis of individual sources. However, the fact that the HB method deals with the whole probability distribution of the sample, allows us to recover average properties. In other words, any constraint, even with an extremely low signal-to-noise ratio, is taken into account in the HB approach. 4.4 Systematic analysis of the model's performances After scrutinizing select model's results, let us now study the performances of the HB method over the whole simulation grid of Section 4.1. In particular, we need to understand how close the derived parameters are from their true values. We are also interested in knowing how the HB method improves the results, compared to the χ2 fit. For that purpose, we define the following metric, for each parameter and hyperparameter, y:   \begin{equation} D[y] =\left\{ \begin{array}{lc} \displaystyle\frac{\left\langle y^{\rm{HB}}\right\rangle -y^{{\rm true}}}{\sigma _y^{{\rm {HB}}}} &\quad{\rm{for\, the\, HB\, case,}} \\ \displaystyle\frac{y_{\chi ^2}-y^{\rm{true}}}{\sigma _y^{\rm{HB}}} & \,\,\,{\rm for\, the\, }\chi ^2{\rm case,} \end{array} \right. \end{equation} (46)where 〈yHB〉 and $$\sigma _y^{{{HB}}}$$ are the mean and standard deviation over the MCMC of y, $$y_{\chi ^2}$$, the least-squares value, and ytrue, the true value. With this definition, we can study the relative deviation of the HB values, from their true values: a value |D[y]| ≤ Nσ means that the HB value is consistent within Nσ. In addition, we can directly compare the HB and χ2 deviations, as they have the same denominator. 4.4.1 Performances for the hyperparameters Fig. 9 compares the distributions of D[y] for all of the hyperparameters of the simulation grid (Section 4.1). Panel (a) shows the distribution of the recovered means, 〈μk〉 (Section 3.2.4). Since there are 27 models, with 7 parameters per model (Table 1), this distribution contains a total of 7 × 27 = 189 values. Similarly, panel (b) shows the distribution of the standard deviations, 〈Sk, k〉 (diagonal elements of $$\mathcal {S}$$; Section 3.2.4; 189 values). Panel (c) shows the distribution of the non-diagonal elements of the correlation matrix, $$\langle R_{k,k^\prime }\rangle$$ (Section 3.2.4; 7(7 − 1)/2 × 27 = 567 values). Panel (d) shows the calibration offsets, 〈ln (1 + δj)〉. There are 11 photometric filters (Section 4.1), thus this distribution contains 11 × 27 = 297 values. Figure 9. View largeDownload slide Recovery performances for the hyperparameters and calibration offsets. The four panels show the distribution of the relative deviation D[y] (equation 46), for the hyperparameters of all the simulations of Section 4.1. We have separated the hyperparameters by type: panel (a) shows the distribution for all the means, μk; panel (b), all the standard deviations, Sk, k; panel (c), all the correlation coefficients, $$R_{k,k^\prime }$$; and panel (d), all the calibration offsets, ln (1 + δj). For the χ2 method, we quote the mean relative residuals, as the calibration offsets. The blue histograms correspond to the HB values, while the red histograms represent the χ2 results. We have highlighted the 1σ (in dark grey) and the 3σ (in light grey) ranges. Figure 9. View largeDownload slide Recovery performances for the hyperparameters and calibration offsets. The four panels show the distribution of the relative deviation D[y] (equation 46), for the hyperparameters of all the simulations of Section 4.1. We have separated the hyperparameters by type: panel (a) shows the distribution for all the means, μk; panel (b), all the standard deviations, Sk, k; panel (c), all the correlation coefficients, $$R_{k,k^\prime }$$; and panel (d), all the calibration offsets, ln (1 + δj). For the χ2 method, we quote the mean relative residuals, as the calibration offsets. The blue histograms correspond to the HB values, while the red histograms represent the χ2 results. We have highlighted the 1σ (in dark grey) and the 3σ (in light grey) ranges. We can first note that the recovered values are tightly distributed around the true values. Most of the values of the hyperparameters are within 3σ of their true values. The quantity having the widest distribution is the standard deviation Sk, k. Secondly, comparing the HB to the χ2 results, we can see that the HB method systematically improves the results. In particular, the standard deviations, Sk, k, and correlation coefficients, $$R_{k,k^\prime }$$, are the quantities for which the improvement is the most notable. The χ2 distribution of Sk, k is clearly skewed towards positive values. It is due to the fact that the χ2 method always leads to more dispersed parameter distributions (e.g. Fig. 6, panel a). Table 2 quantifies the main properties of these histograms. The fraction of outliers (>3σ; column 4) is consistent with a Gaussian distribution, except for the elements of $$\mathcal {S}$$. The Sk, k values are notably more spread out. They also have the most skewed distribution (largest absolute 〈DHB[y]〉; column 2). Inspecting these histograms, we notice that most of the outliers correspond to the PAH charge fraction, f+ (equation 6). It is indeed the most degenerate parameter, with the collection of photometric filters we have chosen. This parameter controls mainly the 3.3, 11.2, and 12.7 μm PAH features. In our simulation grid, the only constraint on these bands is provided by the IRAC3.6 μm band, as can be seen in Fig. 2. However, this photometric band also constrains the stellar continuum, L⋆, and the calibration offset, $$\delta _{\rm {IRAC}_{\rm 3.6\,\mu m}}$$. In addition, the 3.3 μm-to-continuum ratio is often very low (e.g. panel b of Fig. 5). The value of f+ is thus very poorly constrained. All things considered, it is remarkable that the values of this parameter are properly recovered, in most of the cases. Table 2. Statistics of the recovery performances. For each parameter and hyperparameter, y, we quote the following properties of the histograms presented in Figs 9 and 10. (2) 〈DHB[y]〉 is the average of the distribution. (3,4) f(|DHB[y]| ≤ N) is the fraction of absolute values below N. (5) Max(|DHB[y]|) is the maximum deviation, in number of σ. (6) Finally, $${\rm med}(|D_{\chi ^2}/D^{{\rm{HB}}}[y]|)$$ is the median of the ratio between the χ2 and HB absolute deviations. The latter quantifies by how much the parameter recovery has been improved. The last two lines show the corresponding confidence levels of two common probability distributions: a Gaussian and a Student's t distribution with f = 3 degrees of freedom. (1)  (2)  (3)  (4)  (5)  (6)  y  〈DHB[y]〉  f(|DHB[y]| ≤ 1)  f(|DHB[y]| ≤ 3)  Max(|DHB[y]|)  $${\rm med}(|D_{\chi ^2}/D^{{\rm{HB}}}[y]|)$$  μk  −0.13  79.9  per cent  99.5  per cent  3.8  1.7  Sk, k  −0.52  56.1  per cent  91.5  per cent  5.8  9.0  $$R_{k,k^\prime }$$  0.0069  95.4  per cent  100.0  per cent  2.2  8.4  ln (1 + δj)  0.027  69.4  per cent  99.3  per cent  4.2  2.4  ln Mi  0.16  76.6  per cent  99.8  per cent  4.8  2.3  $$\ln \bar{U}_i$$  −0.29  78.6  per cent  99.9  per cent  5.0  2.7  $$q^{{\rm{PAH}}}_i$$  0.46  60.4  per cent  97.8  per cent  5.5  2.2  $$f^+_i$$  −0.072  68.2  per cent  99.2  per cent  5.0  3.6  Common probability distributions  Gaussian  0  68.3  per cent  99.7  per cent  –  –  Student's t (f = 3)  0  60.9  per cent  94.2  per cent  –  –  (1)  (2)  (3)  (4)  (5)  (6)  y  〈DHB[y]〉  f(|DHB[y]| ≤ 1)  f(|DHB[y]| ≤ 3)  Max(|DHB[y]|)  $${\rm med}(|D_{\chi ^2}/D^{{\rm{HB}}}[y]|)$$  μk  −0.13  79.9  per cent  99.5  per cent  3.8  1.7  Sk, k  −0.52  56.1  per cent  91.5  per cent  5.8  9.0  $$R_{k,k^\prime }$$  0.0069  95.4  per cent  100.0  per cent  2.2  8.4  ln (1 + δj)  0.027  69.4  per cent  99.3  per cent  4.2  2.4  ln Mi  0.16  76.6  per cent  99.8  per cent  4.8  2.3  $$\ln \bar{U}_i$$  −0.29  78.6  per cent  99.9  per cent  5.0  2.7  $$q^{{\rm{PAH}}}_i$$  0.46  60.4  per cent  97.8  per cent  5.5  2.2  $$f^+_i$$  −0.072  68.2  per cent  99.2  per cent  5.0  3.6  Common probability distributions  Gaussian  0  68.3  per cent  99.7  per cent  –  –  Student's t (f = 3)  0  60.9  per cent  94.2  per cent  –  –  View Large Column (6) of Table 2 shows that the standard deviations and correlation coefficients are recovered a factor ≃8 times better for the HB method than with the χ2. Quoting the median instead of the mean of the ratio in column (6) is conservative, as there are more outliers with the χ2 method. 4.4.2 Performances for the parameters Fig. 10 shows the distributions of D[y] for the four most physically relevant parameters (M, $$\bar{U}$$, qPAH, and f+). Each of these panels contains the values of the parameter for each region in the 27 simulations (Section 4.1). Since, there are nine simulations with 10 regions, nine with 100, and nine with 1000, the total number of points in each of these distributions is 9990. The HB distributions are tightly centred around 0. The two best constrained parameters, M and $$\bar{U}$$, have a 3σ degree of confidence (column 4 of Table 2) close to a Gaussian. The two other parameters, qPAH and f+, which are more degenerate, are still well constrained. They are, however, slightly more spread out, with a fraction of outliers between a Gaussian and a Student's t distribution. On average, there is no clear bias. After inspection, we do not find any clear trend of these residuals with signal-to-noise ratio, SED shape or sample size. Figure 10. View largeDownload slide Recovery performances for the main parameters. The four panels show the distribution of the relative deviation D[y] (equation 46), for the main parameters of each source in all the simulations of Section 4.1. We focus here on the most important parameters: panel (a) shows the distribution of the dust mass per source, ln Mi; panel (b), the average starlight intensity, $$\bar{U}_i$$ (equation 10); panel (c), PAH mass fraction, $$q^{{\rm{PAH}}}_i$$; and panel (d), the charge fraction of PAHs, $$f^+_i$$. The blue histograms correspond to the HB values, while the red histograms represent the χ2 results. We have highlighted the 1σ (in dark grey) and the 3σ (in light grey) ranges. Figure 10. View largeDownload slide Recovery performances for the main parameters. The four panels show the distribution of the relative deviation D[y] (equation 46), for the main parameters of each source in all the simulations of Section 4.1. We focus here on the most important parameters: panel (a) shows the distribution of the dust mass per source, ln Mi; panel (b), the average starlight intensity, $$\bar{U}_i$$ (equation 10); panel (c), PAH mass fraction, $$q^{{\rm{PAH}}}_i$$; and panel (d), the charge fraction of PAHs, $$f^+_i$$. The blue histograms correspond to the HB values, while the red histograms represent the χ2 results. We have highlighted the 1σ (in dark grey) and the 3σ (in light grey) ranges. The comparison to the χ2 distribution demonstrates that the parameters are recovered ≃2–3 times better with the HB method (column 6 of Table 2). In addition, the outliers are more numerous with the χ2 method, as the χ2 uncertainties are less reliable (Section 4.2.3). 4.5 The integrated autocorrelation times The integrated autocorrelation times, τint (equation 43), for each parameter and hyperparameters of the simulation grid are represented in Fig. 11. We can see that the calibration offsets and the elements of the covariance matrix converge faster than the averages and the parameters. The inspection of these distributions does not show any obvious trend of τint with signal-to-noise ratio, sample size or SED shape. Overall, the maximum time in the whole grid is τint ≃ 3 × 104. It means that, to make sure the MCMC has converged towards the stationary posterior, a chain of length of NMCMC ≃ 106, after the burn-in phase, is adequate. Figure 11. View largeDownload slide Integrated autocorrelation times for the simulation grid. This figure shows the distribution of τint (equation 43) for the distribution of individual parameters (panel a, red); the calibration offsets, ln (1 + δj) (panel a, green); the averages, μk (panel b, blue); and the elements of the covariance matrix, Sk, k and $$R_{k,k^\prime }$$ (panel b, purple). To build the histogram of the parameters (panel a), we have normalized the number of points by the number of sources, n, in order to give the same weight to each simulation. Figure 11. View largeDownload slide Integrated autocorrelation times for the simulation grid. This figure shows the distribution of τint (equation 43) for the distribution of individual parameters (panel a, red); the calibration offsets, ln (1 + δj) (panel a, green); the averages, μk (panel b, blue); and the elements of the covariance matrix, Sk, k and $$R_{k,k^\prime }$$ (panel b, purple). To build the histogram of the parameters (panel a), we have normalized the number of points by the number of sources, n, in order to give the same weight to each simulation. Knowing where the burn-in phase ends is however more difficult. One can, for example, run several parallel chains, with different initial conditions. In our case, since we are applying our code to simulated data, we know the value towards which each parameter should converge. It is thus possible to estimate if the burn-in phase is passed. Inspecting our results, we did not find any parameter where the burn-in phase lasts longer than a few τint. This reassuring feature seems to be the consequence of properly chosen initial conditions: the least-squares values (Section 3.3.1). Indeed, starting the chain with random initial conditions can lead to burn-in phases longer than 106. In this paper, we have excluded the first 105 steps of each MCMC to account for burn-in. 5 VARIATIONS ON THE SIMULATION GRID In this section, we demonstrate the performances of our HB code on additional effects, that were not covered by the simulation grid of Section 4.1. 5.1 The presence of intrinsic correlations The model grid studied in Section 4 was simulated with no intrinsic correlation between parameters (R[all] = 0; Table 1). The purpose was to stay as general as possible. However, our HB code is designed to efficiently recover correlation coefficients between parameters. To demonstrate its efficiency, we have simulated two samples, with n = 100 sources, and a median signal-to-noise ratio, fS/N = 3. The distribution of hyperparameters is based on the warm SED shape (Table 1), with the following modifications: S[ln U−] = 0.6, S[ln ΔU] = 0.7, S[qPAH] = 0.2, R[ln M, ln U−] = ς 0.5, R[ln M, ln ΔU] = ς 0.5, R[ln M, qPAH] = −ς 0.5, R[ln M, f+] = ς 0.5, R[ln U−, ln ΔU] = 0.5, R[ln U−, qPAH] = −0.5, R[ln U−, f+] = 0.5, R[ln ΔU, qPAH] = −0.5, R[ln ΔU, f+] = 0.5, and R[qPAH, f+] = −0.5. The parameter ς is set to ς = −1, to induce a negative correlation between M and $$\bar{U}$$, and to ς = 1, to induce a positive correlation. Fig. 12 shows the results of the χ2 and HB codes on these two simulations. As noted on the previous examples, the χ2 method leads to a more dispersed distribution of values (panels a and c). More interestingly here, we see that when the true correlation is negative (panel a; ρ = −0.55), the χ2 method finds a tighter correlation (ρ = −0.85); when the true correlation is positive (panel c; ρ = 0.62), the χ2 method finds a negative correlation (ρ = −0.68). Figure 12. View largeDownload slide Efficiency of the method in presence of intrinsic correlations. Each panel shows the relation between the dust mass, M, and the average starlight intensity, $$\bar{U}$$. The true values are the red dots. Panels (a) and (b) show the results applied to simulations with an intrinsic negative correlation. Panels (c) and (d) show the results with a positive correlation. The green stars and uncertainty ellipses, in panels (a) and (c), are the least-squares results. The blue stars and uncertainty ellipses, in panels (b) and (d), are the HB results. We quote, in each panel, the true and inferred values of the correlation coefficient, ρ, between ln M and $$\ln \bar{U}$$. Figure 12. View largeDownload slide Efficiency of the method in presence of intrinsic correlations. Each panel shows the relation between the dust mass, M, and the average starlight intensity, $$\bar{U}$$. The true values are the red dots. Panels (a) and (b) show the results applied to simulations with an intrinsic negative correlation. Panels (c) and (d) show the results with a positive correlation. The green stars and uncertainty ellipses, in panels (a) and (c), are the least-squares results. The blue stars and uncertainty ellipses, in panels (b) and (d), are the HB results. We quote, in each panel, the true and inferred values of the correlation coefficient, ρ, between ln M and $$\ln \bar{U}$$. In contrast, the HB method is able to consistently recover the correlation coefficients, whether the true correlation is negative (panel b) or positive (panel d). The quoted correlation coefficients, in panels b and d, for the HB method, have been derived directly from the MCMC, as explained in Section 3.3.3. 5.2 Effect of the wavelength coverage It is obvious that the wavelength coverage has an impact on the recovery of the parameters. However, the nature of this impact on the HB results is not trivial. We have generated two simulations to illustrate this effect. We start from the central simulation of Section 4.1 (warm SED, with n = 100 and fS/N = 3) and make the following modifications. Far-IR coverage: we remove the three SPIRE bands from the set of constraints. The longest wavelength constraint is thus moved from λ = 500 μm to λ = 160 μm. Mid-IR coverage: We add the following WISE and Akari/IRC bands: WISE3.4 μm, WISE4.6 μm, WISE11.6 μm, WISE22.1 μm, IRC3.2 μm, IRC4.1 μm, IRC7.0 μm, IRC9.0 μm, IRC11 μm, IRC15 μm, and IRC18 μm. Fig. 13 shows the results of the HB code, with reduced far-IR coverage. The parameters which are the most affected by this modification are those controlling the shape of the far-IR part of the SED, mainly M and $$\bar{U}$$. Comparing panels (a) and (c), we see that, without long wavelength constraints, the χ2 results are significantly more dispersed. The noise-induced correlation, discussed in Sections 4.2.3 and 5.1, is enhanced in panel (c). On the contrary, comparing panels (b) and (d), we see that the HB inference does not result in an increase of the dispersion, nor any false correlation. Most true values are within 1σ of their HB results. We note that decreasing the wavelength coverage has the same qualitative effect as decreasing the signal-to-noise ratio (see Section 4.3): the prior becomes dominant in the posterior distribution. Figure 13. View largeDownload slide Effect of the far-IR wavelength coverage. The four panels show the results of different methods, applied on the central simulation of Section 4.1 (warm SED, with n = 100 and fS/N = 3). In each panel, the red dots show the true values of the dust mass, M, and average starlight intensity, $$\bar{U}$$ equation (10). Panels (a) and (b) correspond to the case where the whole wavelength coverage is used. Panels (c) and (d) show the results obtained, excluding the three SPIRE bands, from the set of constraints. The green stars and uncertainty ellipses, in panels (a) and (c), are the least-squares results. The blue stars and uncertainty ellipses, in panels (b) and (d), are the HB results. Figure 13. View largeDownload slide Effect of the far-IR wavelength coverage. The four panels show the results of different methods, applied on the central simulation of Section 4.1 (warm SED, with n = 100 and fS/N = 3). In each panel, the red dots show the true values of the dust mass, M, and average starlight intensity, $$\bar{U}$$ equation (10). Panels (a) and (b) correspond to the case where the whole wavelength coverage is used. Panels (c) and (d) show the results obtained, excluding the three SPIRE bands, from the set of constraints. The green stars and uncertainty ellipses, in panels (a) and (c), are the least-squares results. The blue stars and uncertainty ellipses, in panels (b) and (d), are the HB results. The mid-IR coverage has an effect, mainly on the PAH mass and charge fractions, qPAH and f+. We discussed in Section 4.4.1 the fact that f+ was particularly poorly constrained. This can be seen in panel (a) of Fig. 14. Increasing the mid-IR wavelength coverage (panel b), this problem is solved and the distribution of parameters is now more accurately recovered. Figure 14. View largeDownload slide Effect of the mid-IR wavelength coverage. The two panels show the HB results, applied on the central simulation of Section 4.1 (warm SED, with n = 100 and fS/N = 3). In each panel, the red dots show the true values of the PAH mass fraction, qPAH, and PAH charge fraction, f+. The blue stars and uncertainty ellipses are the HB results. Panel (a) corresponds to the original coverage with Spitzer and Herschel photometry, only. Panel (b) shows the results increasing the mid-IR coverage, by adding WISE and Akari bands. Figure 14. View largeDownload slide Effect of the mid-IR wavelength coverage. The two panels show the HB results, applied on the central simulation of Section 4.1 (warm SED, with n = 100 and fS/N = 3). In each panel, the red dots show the true values of the PAH mass fraction, qPAH, and PAH charge fraction, f+. The blue stars and uncertainty ellipses are the HB results. Panel (a) corresponds to the original coverage with Spitzer and Herschel photometry, only. Panel (b) shows the results increasing the mid-IR coverage, by adding WISE and Akari bands. 5.3 The introduction of an external parameter As we start to see here, the HB method is particularly efficient at recovering intrinsic correlations between parameters of the SED model. However, the correlation of these SED parameters with other quantities (e.g. gas mass, metallicity, etc.) is also relevant. In order to treat these external parameters in the HB framework, we need to extend the prior distribution to them (see Appendix B3). Although these external parameters are not free parameters, their mean values are going to be modified consistently with their uncertainties. To demonstrate this process, we have drawn an SED sample and an associated distribution of gas mass Mgas (assuming a typical Galactic Mgas/Mdust ≃ 100). This sample has n = 300 sources and a median signal-to-noise ratio of fS/N = 0.3. We adopt the warm SED distribution (Section 4.1), with the following modifications: S[ln U−] = 0.6, S[ln ΔU] = 0.7, μ[ln Mgas] = ln (100 M⊙), S[ln Mgas] = 0.5, and R[ln Mdust, ln Mgas] = 0.9, where we have noted the dust mass Mdust to avoid confusion. We attribute a 10  per cent uncertainty to the gas mass. Fig. 15 compares the results obtained with and without prior extension. Panel (a) shows that, when Mgas is not in the prior, the correlation found is significantly weaker (ρ = 0.49 ± 0.04) than the intrinsic one (ρ = 0.9). Notice however that the inferred dust masses are consistent with the true values. The largest discrepancies (2–3σ) happen at low column density, where signal-to-noise ratio is low. Panel (b) presents the results with extension of the prior. The correlation between the two parameters is better recovered (ρ = 0.78 ± 0.04). The efficiency of extending the prior to external parameters is increased if the number of sources is larger. Figure 15. View largeDownload slide Extension of the prior to an external parameter. The two panels show the relation between the dust mass, noted Mdust, for clarity here, and the gas mass, Mgas, a parameter not controlling the SED model. The red dots are the true values in both panels. The blue stars and uncertainty bars/ellipses are the HB inference. Panel (a) shows the standard result, with the prior of equation (19): Mdust is derived from the HB code and Mgas is plotted as is. Panel (b) shows the result, introducing the gas mass as an extra parameter in the prior (see Appendix B3). We show error bars rather than ellipses in panel (a), because the two plotted quantities do not come from the same joint PDF. Figure 15. View largeDownload slide Extension of the prior to an external parameter. The two panels show the relation between the dust mass, noted Mdust, for clarity here, and the gas mass, Mgas, a parameter not controlling the SED model. The red dots are the true values in both panels. The blue stars and uncertainty bars/ellipses are the HB inference. Panel (a) shows the standard result, with the prior of equation (19): Mdust is derived from the HB code and Mgas is plotted as is. Panel (b) shows the result, introducing the gas mass as an extra parameter in the prior (see Appendix B3). We show error bars rather than ellipses in panel (a), because the two plotted quantities do not come from the same joint PDF. 6 APPLICATION TO OTHER PHYSICAL MODELS 6.1 Modified black bodies As discussed in Section 2.2.2, MBB is the most widely used dust model, due to its simplicity. The HB dust codes that have been previously presented in the literature all implement it (Kelly et al. 2012; Juvela et al. 2013; Veneziani et al. 2013; Marsh, Whitworth & Lomax 2015). In this section, we apply our HB code to one MBB simulation, in order to confirm its ability to correct the noise-induced negative correlation between T and β (Shetty et al. 2009). This simulation is designed to mimic typical Planck observations of the diffuse Galactic ISM (e.g. Planck Collaboration XVII 2014a; Ysard et al. 2015). We assume: μ[ln M] = 0, μ[ln T] = ln (20.5 K), μ[β] = 1.65, S[ln M] = 0.05, S[ln T] = 0.05, S[β] = 0.01, and R[all] = 0. We draw n = 1000 sources, observed through the following COBE and Planck photometric bands: DIRBE100 μm, DIRBE140 μm, DIRBE240 μm, HFI350 μm, HFI550 μm, and HFI850 μm. We set the median signal-to-noise ratio to fS/N = 10. The results are presented in Fig. 16. Panel (a) displays the least-squares values. Notice the well-known false β–T correlation. Panel (b) demonstrates that the HB method can indeed accurately correct this false correlation, as was also shown by Kelly et al. (2012). Paper II will present more MBB simulations to discuss robustness. Figure 16. View largeDownload slide Application to MBB models. The two panels represent the temperature and emissivity index of a simulation with n = 1000 sources having an MBB SED (Section 2.2.2). The red dots, in both panels, are the true values. The green stars and uncertainty ellipses, in panel (a), are the χ2 values. The blue stars and uncertainty ellipses, in panel (b), are the HB results. Figure 16. View largeDownload slide Application to MBB models. The two panels represent the temperature and emissivity index of a simulation with n = 1000 sources having an MBB SED (Section 2.2.2). The red dots, in both panels, are the true values. The green stars and uncertainty ellipses, in panel (a), are the χ2 values. The blue stars and uncertainty ellipses, in panel (b), are the HB results. 6.2 Broken emissivity modified black bodies BEMBB models (Section 2.2.3) have been used by Gordon et al. (2014) and Roman-Duval et al. (2014) to model the Magellanic clouds. To roughly mimic these data, we simulate a sample of n = 300 sources, observed through: PACS100 μm, PACS160 μm, SPIRE250 μm, SPIRE350 μm, and SPIRE500 μm. The parameters are distributed as: μ[ln M] = 0, S[ln M] = 0.05, μ[ln T] = ln (20.5 K), S[ln T] = 0.05, μ[β1] = 1.65, S[β1] = 0.01, μ[β2] = 1.65, S[β2] = 0.01, μ[ln νb] = ln (c/350 μm), S[ln νb] = 0.01, and R[all] = 0. We assume that the median signal-to-noise ratio is fS/N = 10. One of the derived quantities discussed by Gordon et al. (2014) is the submm excess, e500, defined as the relative difference between the actual emission at 500 μm and the emission of an MBB with the same temperature and β = β1:   \begin{equation} e_{500} = \frac{1-\nu _{500}^{\beta _1}}{\nu _{500}^{\beta _2}\nu _b^{\beta _1-\beta _2}}, \end{equation} (47)where ν500 = c/500 μm. Since μ[β1] = μ[β2], our simulation exhibits on average a zero excess. Fig. 17 presents the results. Panel (a) shows that the χ2 method is unable to recover any meaningful excess. It is due to the fact that the parameters of this model are extremely degenerate. On the contrary the HB approach is able to provide a tight distribution, consistent with the true values (panel b). Figure 17. View largeDownload slide Application to BEMBB models. The two panels represent the temperature and 500 μm excess, e500, of a simulation with n = 300 sources having a BEMBB SED (Section 2.2.3). The red dots, in both panels, are the true values. The green stars and uncertainty ellipses, in panel (a), are the χ2 values. The blue stars and uncertainty ellipses, in panel (b), are the HB results. Figure 17. View largeDownload slide Application to BEMBB models. The two panels represent the temperature and 500 μm excess, e500, of a simulation with n = 300 sources having a BEMBB SED (Section 2.2.3). The red dots, in both panels, are the true values. The green stars and uncertainty ellipses, in panel (a), are the χ2 values. The blue stars and uncertainty ellipses, in panel (b), are the HB results. 6.3 Multicomponent dust SED models The number of combinations of dust models is almost infinite. We end this section with a widely used one, introduced by Draine & Li (2007). It consists in the linear combination of a powerU (Section 2.2.5), a deltaU (Section 2.2.4), and a starBB (Section 2.2.6) component. It is aimed at approximating the multiphase nature of the ISM of galaxies, the deltaU component being attributed to diffuse ISM, and the powerU component to hotter PDRs. Several parameters are tied: $$U_{ {\it deltaU}} = U_-^{{\it powerU}}$$, $$q^{{\rm{PAH}}}_{\rm {\it deltaU}}=q^{{\rm{PAH}}}_{\rm {\it powerU}}$$, and $$f^+_{\rm {\it deltaU}}=f^+_{\rm {\it powerU}}$$. In addition, some parameters are fixed: ΔUpowerU = 106 and αpowerU = 2. This way, the powerU component represents dust hotter than the deltaU component and can account for the contribution of star-forming regions to the mid-IR emission of galaxies. The two PAH parameters are tied, as they would otherwise be degenerate. The mass fraction of the powerU component is   \begin{equation} \gamma = \frac{M_{{\it powerU}}}{M_{{\it powerU}}+M_{{\it deltaU}}}. \end{equation} (48)We have performed one simulation to illustrate the performances of the HB method with this model. We have drawn parameters from the warm SED distribution (Section 4.1), with n = 100 sources, and a median signal-to-noise ratio, fS/N = 1, with the following modifications: μ[ln MpowerU] = ln 0.01 and $$\mu [\ln U_-^{{\rm {\it powerU}}}]=\ln 1$$. We have added the deltaU component with μ[ln MdeltaU] = ln 1. The results are presented in Fig. 18. It shows the relation between the mass of the two components, Mtot = MpowerU + MdeltaU, and the parameter γ. As can be seen in panel (a), with the χ2 method, γ is completely degenerate. It spreads the whole range of values between 0 and 100  per cent. On the contrary, the HB method is able to significantly reduce the uncertainties and provide a relevant distribution of parameters, consistent with their true values. Figure 18. View largeDownload slide Application to a multiphase dust model. The two panels represent the total dust mass, Mtot, and the mass fraction of non-uniformly illuminated dust, γ (equation 48), of a simulation with n sources. The red dots, in both panels, are the true values. The green stars and uncertainty ellipses, in panel (a), are the χ2 values. The blue stars and uncertainty ellipses, in panel (b), are the HB results. Figure 18. View largeDownload slide Application to a multiphase dust model. The two panels represent the total dust mass, Mtot, and the mass fraction of non-uniformly illuminated dust, γ (equation 48), of a simulation with n sources. The red dots, in both panels, are the true values. The green stars and uncertainty ellipses, in panel (a), are the χ2 values. The blue stars and uncertainty ellipses, in panel (b), are the HB results. 7 SUMMARY AND CONCLUSION This is the first article in a series of two papers presenting a new model, HerBIE, aimed at deriving dust parameters from IR observations. The main originality of this model is to apply HB inference to full dust models, taking into account realistic optical properties, size distribution, stochastic heating, and distribution of starlight intensities, as well as colour correction and correlated calibration uncertainties. Simply put, the HB method consists of sampling the probability distribution of the physical parameters of an ensemble of sources. This distribution of parameters is modelled with a prior distribution controlled by hyperparameters. The inferred prior distribution does not significantly modify the PDF of sources for which the likelihood is much narrower than the dispersion of the distribution of parameters (Sections 4.2.3 and 4.3). However, the prior has an important effect on sources with low signal-to-noise ratios. We have described the formalism of our model and its numerical implementation (Sections 2–3). We have subsequently applied our model to synthetic observations, in order to quantify its performances (Sections 4–6). The main conclusions are the following. We have compared the performances of least-squares, non-hierarchical, and hierarchical Bayesian methods (Section 4.2.3). We have shown that, although the non-HB approach is better than least squares at estimating the uncertainties on derived parameters, it is not able to correct noise-induced correlations and scatter. On the contrary, the HB approach is particularly efficient at recovering true correlations between parameters and their intrinsic scatter (Sections 4.4 and 5.1). The HB inferred values are also consistently closer to their true values than the least-squares values (Section 4.4). We have systematically studied the performances of our model, varying signal-to-noise ratio, SED shape, and sample size (Section 4.4). We have shown that the recovered parameters are distributed symmetrically around their true values. We did not find any clear bias. The scatter of the best constrained parameters from their true values are close to a normal distribution, with almost 100  per cent of the distribution within 3σ (Section 4.4.2). Poorly constrained, degenerate, parameters are also consistently recovered, but with a fraction of outliers closer to a Student's t distribution. The HB method can reasonably compensate for a partial lack of spectral coverage (Section 5.2). We have demonstrated that one can easily include external parameters in the prior distribution in order to improve the recovery of a potential correlation between these external parameters and the dust properties (Section 5.3). We have applied our HB code with other physical models (different types of MBBs and multicomponent dust SED models; Section 6). We have demonstrated that it works well in these cases, too, and that it can help obtain better constraints on degenerate parameters. We have discussed the issues of convergence towards the stationary posterior. Our choice of starting the MCMC at the least-squares values appears to lead to relatively short burn-in phases. The integrated autocorrelation times of the simulations in this paper are all shorter than a few 104 (Section 4.5). Having implemented ancillarity–sufficiency interweaving strategy to all parameters, we have demonstrated that it significantly reduces the autocorrelation of the Markov chain (Section 4.2.1). The HB technique is quite general (e.g. Shahmoradi 2017). It has been successfully applied to several fields in astrophysics, among others: luminosity distribution of γ-ray bursts (Loredo & Wasserman 1998); supernova studies (Mandel et al. 2009); exoplanet eccentricities (Hogg, Myers & Bovy 2010); galaxy clusters (Andreon & Hurn 2010); Milky Way satellites (Martinez 2015); exoplanet mass–radius relationship (Wolfgang & Lopez 2015).4 We have demonstrated in this paper that it remains accurate over a wide range of sample sizes, source properties, signal-to-noise ratios, and number of model parameters. Such a method can in principle be applied to any field. However, as discussed in Section 4.3, it is relevant mainly when the typical uncertainty on a model parameter is comparable or larger than the typical dispersion of this parameter through the sample. If, on the contrary, the uncertainties on the parameters are much smaller than the dispersion of the sample, this method does not provide significant improvements compared to simpler techniques. The second article of this series (Paper II) will address the robustness of our code. For that purpose, we will apply it to data simulated with different hypotheses from the model (noise distribution, physical components, and distribution of parameters; see Section 4). In parallel, several up-coming papers are presenting the application of this model to real astrophysical data: the Magellanic clouds (Galliano 2017; Galliano et al., in preparation), the Carina nebula (Wu et al., in preparation), the dwarf galaxy IC 10 (Lianou et al., in preparation), and the DustPedia galaxy sample (Davies et al. 2017). Overall, observational dust studies need to emancipate themselves from dated techniques. Although a MBB least-squares fit can be useful in some cases, the development of modern instrumentation renders it more and more obsolete. More precise observations mean that we can access the complexity of the ISM conditions, beyond the simple isothermal approximation. In addition, more accurate fluxes call for an optimal way to extract relevant information from the data. With the model presented here, we have pursued this goal, hoping it will contribute to refining our understanding of dust evolution. Adopting such an approach represents a significant investment in terms of computation time and storage (Appendix D). However, this will naturally be facilitated in the near future by the increase of computational performances. Indeed, nowadays, the industry favours the development of large multicore platforms and Gibbs sampling can be reasonably well parallelized. Finally, this type of study would benefit from a better synergy between observers and instrumentalists. Ideally, the development of new instruments and their reduction pipelines should account for the progress in data analysis techniques, and reciprocally. In particular, the various sources of uncertainties, their correlations, their biases, etc. should be considered as important as the measured fluxes. Their accurate knowledge should be planned in the design of the new detectors, and thoroughly documented. This will be particularly relevant for future IR missions like SPICA (e.g. van der Tak et al. 2018). ACKNOWLEDGEMENTS I thank Jean-Philippe Bernard, Karl Gordon, Mika Juvela, Martin Kilbinger, and Kirill Tchernyshyov for fruitful discussions about Bayesian techniques and SED modelling, Anthony Jones for sharing his expertise on dust evolution, and Sébastien Fromang and Pierre-François Lavallée for coding advice. Sophia Lianou provided me with several bug reports, running earlier versions of the code. Finally, I thank Sacha Hony, Suzanne Madden, and Ronin Wu, as well as an anonymous referee, for providing me with useful comments that helped clarify the manuscript. I acknowledge support from the EU FP7 project DustPedia (Grant No. 606847), from the Agence Nationale pour la Recherche (ANR) through the program SYMPATICO (Programme Blanc, Projet ANR-11-BS56-0023) and through the PRC program 1311 between CNRS (France) and JSPS (Japan). This work was supported by the Programme National ‘Physique et Chimie du Milieu Interstellaire’ (PCMI) of CNRS/INSU with INC/INP co-funded by CEA and CNES. Footnotes 1 Except in extremely low-metallicity systems, where X-rays could be the dominant heating source (Lebouteiller et al. 2017). 2 Grains with a ≲ 10 nm are usually stochastically heated, and cannot be modelled with a single temperature; grains with a ≳ 0.1 μm have a size dependent equilibrium temperature, due to their grey UV extinction. 3 The lag is the difference between two times (two steps) in the MCMC. 4 List found on astrostatistics.psu.edu/RLectures/hierarchical.pdf 5 https://support.hdfgroup.org/HDF5/ REFERENCES Allamandola L. J., Hudgins D. M., Sandford S. A., 1999, ApJ , 511, L115 https://doi.org/10.1086/311843 CrossRef Search ADS   Andreon S., Hurn M. A., 2010, MNRAS , 404, 1922 Asano R. S., Takeuchi T. T., Hirashita H., Inoue A. K., 2013, Earth Planets Space , 65, 213 https://doi.org/10.5047/eps.2012.04.014 CrossRef Search ADS   Bakes E. L. O., Tielens A. G. G. M., 1994, ApJ , 427, 822 https://doi.org/10.1086/174188 CrossRef Search ADS   Barnard J., McCulloch R., Meng X.-L., 2000, Stat. Sin. , 10, 1281 Bron E., Le Bourlot J., Le Petit F., 2014, A&A , 569, A100 CrossRef Search ADS   Chastenet J., Bot C., Gordon K. D., Bocchio M., Roman-Duval J., Jones A. P., Ysard N., 2017, A&A , 601, A55 CrossRef Search ADS   Chevance M. et al.  , 2016, A&A , 590, A36 CrossRef Search ADS   Compiègne M. et al.  , 2011, A&A , 525, A103 CrossRef Search ADS   Coupeaud A. et al.  , 2011, A&A , 535, A124 CrossRef Search ADS   da Cunha E., Charlot S., Elbaz D., 2008, MNRAS , 388, 1595 https://doi.org/10.1111/j.1365-2966.2008.13535.x CrossRef Search ADS   Dale D. A., Helou G., Contursi A., Silbermann N. A., Kolhatkar S., 2001, ApJ , 549, 215 https://doi.org/10.1086/319077 CrossRef Search ADS   Davies J. I. et al.  , 2017, PASP , 129, 044102 https://doi.org/10.1088/1538-3873/129/974/044102 CrossRef Search ADS   De Cia A., Ledoux C., Savaglio S., Schady P., Vreeswijk P. M., 2013, A&A , 560, A88 CrossRef Search ADS   De Vis P. et al.  , 2017, MNRAS , 471, 1743 https://doi.org/10.1093/mnras/stx981 CrossRef Search ADS   Draine B. T., 2003, ARA&A , 41, 241 CrossRef Search ADS   Draine B. T., Li A., 2007, ApJ , 657, 810 https://doi.org/10.1086/511055 CrossRef Search ADS   Draine B. T. et al.  , 2007, ApJ , 663, 866 https://doi.org/10.1086/518306 CrossRef Search ADS   Fitzpatrick E. L., Massa D., 2007, ApJ , 663, 320 https://doi.org/10.1086/518158 CrossRef Search ADS   Foreman-Mackey D. et al.  , 2013, Astrophysics Source Code Library , record ascl.soft03002 Galametz M., Madden S. C., Galliano F., Hony S., Bendo G. J., Sauvage M., 2011, A&A , 532, A56 CrossRef Search ADS   Galametz M. et al.  , 2016, MNRAS , 456, 1767 https://doi.org/10.1093/mnras/stv2773 CrossRef Search ADS   Galliano F., 2017, Planet. Space Sci. , 149, 38 https://doi.org/10.1016/j.pss.2017.09.006 CrossRef Search ADS   Galliano F., Dwek E., Chanial P., 2008a, ApJ , 672, 214 CrossRef Search ADS   Galliano F., Madden S. C., Tielens A. G. G. M., Peeters E., Jones A. P., 2008b, ApJ , 679, 310 CrossRef Search ADS   Galliano F. et al.  , 2011, A&A , 536, A88 CrossRef Search ADS   Gelman A., Carlin J., Stern H., Rubin D., 2004, Bayesian Data Analysis . Chapman & Hall, Boca Raton Geman S., Geman D., 1984, IEEE Trans. Pattern Anal. Mach. Intell. , 6, 721 https://doi.org/10.1109/TPAMI.1984.4767596 CrossRef Search ADS PubMed  Gordon K. D., Clayton G. C., Misselt K. A., Landolt A. U., Wolff M. J., 2003, ApJ , 594, 279 https://doi.org/10.1086/376774 CrossRef Search ADS   Gordon K. D. et al.  , 2014, ApJ , 797, 85 https://doi.org/10.1088/0004-637X/797/2/85 CrossRef Search ADS   Grenier I. A., Casandjian J.-M., Terrier R., 2005, Science , 307, 1292 https://doi.org/10.1126/science.1106924 CrossRef Search ADS PubMed  Guhathakurta P., Draine B. T., 1989, ApJ , 345, 230 https://doi.org/10.1086/167899 CrossRef Search ADS   Guillet V. et al.  , 2017, preprint (arXiv:1710.04598) Hildebrand R. H., 1983, QJRAS , 24, 267 Hogg D. W., Myers A. D., Bovy J., 2010, ApJ , 725, 2166 https://doi.org/10.1088/0004-637X/725/2/2166 CrossRef Search ADS   Hollenbach D. J., Tielens A. G. G. M., 1997, ARA&A , 35, 179 CrossRef Search ADS   Jones A. P., Köhler M., Ysard N., Bocchio M., Verstraete L., 2017, A&A , 602, A46 CrossRef Search ADS   Juvela M., Montillaud J., Ysard N., Lunttila T., 2013, A&A , 556, A63 CrossRef Search ADS   Kelly B. C., 2011, JCGS , 20, 584 Kelly B. C., Shetty R., Stutz A. M., Kauffmann J., Goodman A. A., Launhardt R., 2012, ApJ , 752, 55 https://doi.org/10.1088/0004-637X/752/1/55 CrossRef Search ADS   Kimura H., 2016, MNRAS , 459, 2751 https://doi.org/10.1093/mnras/stw820 CrossRef Search ADS   Köhler M., Stepnik B., Jones A. P., Guillet V., Abergel A., Ristorcelli I., Bernard J.-P., 2012, A&A , 548, A61 CrossRef Search ADS   Köhler M., Ysard N., Jones A. P., 2015, A&A , 579, A15 CrossRef Search ADS   Le Bourlot J., Le Petit F., Pinto C., Roueff E., Roy F., 2012, A&A , 541, A76 CrossRef Search ADS   Lebouteiller V. et al.  , 2017, A&A , 602, A45 CrossRef Search ADS   Leroy A. K. et al.  , 2011, ApJ , 737, 12 https://doi.org/10.1088/0004-637X/737/1/12 CrossRef Search ADS   Li A., Draine B. T., 2001, ApJ , 554, 778 https://doi.org/10.1086/323147 CrossRef Search ADS   Lisenfeld U., Ferrara A., 1998, ApJ , 496, 145 https://doi.org/10.1086/305354 CrossRef Search ADS   Loredo T. J., Wasserman I. M., 1998, ApJ , 502, 75 https://doi.org/10.1086/305870 CrossRef Search ADS   Mandel K. S., Wood-Vasey W. M., Friedman A. S., Kirshner R. P., 2009, ApJ , 704, 629 https://doi.org/10.1088/0004-637X/704/1/629 CrossRef Search ADS   Markwardt C. B., 2009, in Bohlender D. A., Durand D., Dowler P., eds, ASP Conf. Ser. Vol. 411, Astronomical Data Analysis Software and Systems XVIII . Astron. Soc. Pac., San Francisco, p. 251 Marsh K. A., Whitworth A. P., Lomax O., 2015, MNRAS , 454, 4282 https://doi.org/10.1093/mnras/stv2248 CrossRef Search ADS   Martinez G. D., 2015, MNRAS , 451, 2524 https://doi.org/10.1093/mnras/stv942 CrossRef Search ADS   Mathis J. S., Mezger P. G., Panagia N., 1983, A&A , 128, 212 Moré J. J., Sorensen D. C., Hillstrom K. E., Garbow B. S., 1984, in Cowell W. J., ed., The MINPACK Project, in Sources and Development of Mathematical Software , Prentice-Hall, p. 88 Paradis D. et al.  , 2010, A&A , 520, L8 CrossRef Search ADS   Planck Collaboration XXIV, 2011a, A&A , 536, A24 CrossRef Search ADS   Planck Collaboration XVI, 2011b, A&A , 536, A16 CrossRef Search ADS   Planck Collaboration XIX, 2011c, A&A , 536, A19 CrossRef Search ADS   Planck Collaboration XVII, 2014a, A&A , 566, A55 CrossRef Search ADS   Planck Collaboration VIII, 2014b, A&A , 571, A8 CrossRef Search ADS   Press W. H., Teukolsky S. A., Vetterling W. T., Flannery B. P., 2007, Numerical Recipes , 3rd edn: The Art of Scientific Computing. Cambridge Univ. Press, Cambridge Rémy-Ruyer A. et al.  , 2014, A&A , 563, A31 CrossRef Search ADS   Röllig M. et al.  , 2007, A&A , 467, 187 CrossRef Search ADS   Roman-Duval J. et al.  , 2014, ApJ , 797, 86 https://doi.org/10.1088/0004-637X/797/2/86 CrossRef Search ADS   Roy A. et al.  , 2013, ApJ , 763, 55 https://doi.org/10.1088/0004-637X/763/1/55 CrossRef Search ADS   Serra P., Amblard A., Temi P., Burgarella D., Giovannoli E., Buat V., Noll S., Im S., 2011, ApJ , 740, 22 https://doi.org/10.1088/0004-637X/740/1/22 CrossRef Search ADS   Shahmoradi A., 2017, preprint (arXiv:e-prints) Shetty R., Kauffmann J., Schnee S., Goodman A. A., 2009, ApJ , 696, 676 https://doi.org/10.1088/0004-637X/696/1/676 CrossRef Search ADS   Siebenmorgen R., Voshchinnikov N. V., Bagnulo S., 2014, A&A , 561, A82 CrossRef Search ADS   Sokal A., 1997, in DeWitt-Morette C., Cartier P., Folacci A., eds, Monte Carlo Methods in Statistical Mechanics: Foundations and New Algorithms. Functional Integration. NATO ASI Series (Series B: Physics), vol 361 . Springer. Boston, MA Stepnik B. et al.  , 2003, A&A , 398, 551 CrossRef Search ADS   van der Tak F. F. S. et al.  , 2018, PASA , 35, 2 CrossRef Search ADS   Veneziani M., Piacentini F., Noriega-Crespo A., Carey S., Paladini R., Paradis D., 2013, ApJ , 772, 56 https://doi.org/10.1088/0004-637X/772/1/56 CrossRef Search ADS   Villani M., Larsson R., 2006, Commun. Stat. - Theory Methods , 35, 1123 https://doi.org/10.1080/03610920600672252 CrossRef Search ADS   Weingartner J. C., Draine B. T., 2001, ApJS , 134, 263 https://doi.org/10.1086/320852 CrossRef Search ADS   Wolfgang A., Lopez E., 2015, ApJ , 806, 183 https://doi.org/10.1088/0004-637X/806/2/183 CrossRef Search ADS   Wraith D., Kilbinger M., Benabed K., Cappé O., Cardoso J.-F., Fort G., Prunet S., Robert C. P., 2009, Phys. Rev. D , 80, 023507 https://doi.org/10.1103/PhysRevD.80.023507 CrossRef Search ADS   Wu R. et al.  , 2015, A&A , 575, A88 CrossRef Search ADS   Ysard N., Köhler M., Jones A., Miville-Deschênes M.-A., Abergel A., Fanciullo L., 2015, A&A , 577, A110 CrossRef Search ADS   Yu Y., Meng X.-L., 2011, JCGS , 20, 531 Zafar T., Watson D., 2013, A&A , 560, A26 CrossRef Search ADS   Zhukovska S., 2014, A&A , 562, A76 CrossRef Search ADS   Zubko V., Dwek E., Arendt R. G., 2004, ApJS , 152, 211 https://doi.org/10.1086/382351 CrossRef Search ADS   APPENDIX A: MODEL COMPUTATION AND INVERSION Calculating the full SED model at each iteration of the MCMC would be prohibitive, in terms of computing time. As discussed in Section 2.3, we rather interpolate a finely sampled grid of pre-computed templates. A1 The pre-computed model grid For each model component presented in Section 2.2, we generate a grid of specific luminosities, $$l_\nu ^{{{\rm mod}}}({\boldsymbol x},\nu _j)$$, as a function of each parameter xk. We separate components that can be linearly combined. For instance, for the deltaU and powerU components, we compute separate grids for each grain sub-species: neutral and ionized PAHS, small and large carbon grains, small and large silicates. The frequencies νj belong to a list of the most widely used IR photometric filters. Integration of the SED model into the filter bandpass and colour corrections are thus included in $$l_\nu ^{{{\rm mod}}}$$. To build the templates, we start from a coarse parameter grid, and add a mid-point if the linear interpolation of $$\ln l_\nu ^{{{\rm mod}}}$$ is less accurate than 10−3. We repeat this process until no more mid-points need to be added. A2 SED model evaluation At each iteration of the MCMC, we evaluate the template by performing a multidimensional linear interpolation of $$\ln l_\nu ^{{{\rm mod}}}({\boldsymbol x},\nu _j)$$, as a function of $${\boldsymbol x}$$. By construction of the grid, we know this interpolation will be more accurate than 10−3. A3 SED model inversion The model inversion used by ASIS (Section 3.3.1) is also performed with a multidimensional linear interpolation. The only difference here is that we now interpolate the parameter we are looking for, $$x_{k^\prime }$$, as a function of $$\ln l_\nu ^{{{\rm mod}}}({\boldsymbol x},\nu _j)$$ and of the other parameters, $$x_{k\ne k^\prime }$$. APPENDIX B: ADDITIONAL ELEMENTS ON THE HB METHOD B1 The split-normal distribution The split-normal distribution (Villani & Larsson 2006, used in Section 3.2.3) is defined as   \begin{eqnarray} p(x) = \left\{ \begin{array}{lc} A \times \exp \left[-\frac{1}{2}\left(\displaystyle\frac{x-x_0}{\lambda }\right)^2\right] &\quad {\rm if }\ x \le x_0\\ A \times \exp \left[-\frac{1}{2}\left(\displaystyle\frac{x-x_0}{\lambda \tau }\right)^2\right] &\quad {\rm if }\ x > x_0, \end{array}\right. \end{eqnarray} (B1)where   \begin{equation} A = \sqrt{\frac{2}{\pi }}\frac{1}{\lambda (1+\tau )}, \end{equation} (B2)and x0 is a position parameter, λ, a scale parameter, and τ, a shape parameter. These parameters are linked to the mean, μ, standard deviation, σ, and skewness, γ1, through the following set of equations:   \begin{equation} b = \frac{\pi -2}{\pi } (\tau -1)^2+\tau , \end{equation} (B3)  \begin{equation} \mu = x_0 + \sqrt{\frac{2}{\pi }}\lambda (\tau -1), \end{equation} (B4)  \begin{equation} \sigma = \sqrt{b}\lambda ^2, \end{equation} (B5)  \begin{equation} \gamma _1 = b^{-3/2}\sqrt{\frac{2}{\pi }}(\tau -1)\times \left[\left(\frac{4}{\pi }-1\right)(\tau -1)^2+\tau \right]. \end{equation} (B6)There is no simple inversion. We therefore solve τ numerically from equation (B6), and then inverse equations (B4) and (B5). B2 The Gibbs sampling Gibbs sampling consists in drawing each parameter, xk, one by one, from its conditional distribution, fixing the other parameters to their current value in the MCMC, $$p(x_k|x_{k^\prime \ne k})$$. In practice, this distribution is a complex function, depending on the SED model. We numerically perform this drawing as follows. Let us assume that parameter xk is bound to the interval $$[x_k^{{{min}}},x_k^{{{max}}}]$$. We first compute the cumulative distribution function as   \begin{equation} F(x_k|x_{k^\prime \ne k}) = \frac{\int _{x_k^{{{min}}}}^{x_k} p(y|x_{k^\prime \ne k}) {\,\rm d}y}{F(x_k^{{{max}}}|x_{k^\prime \ne k})}. \end{equation} (B7)We then draw a random variable θ, uniformly distributed between 0 and 1. The updated value of xk is derived by solving $$F(x_k|x_{k^\prime \ne k})=\theta$$, by linear interpolation of the xk grid as a function of ln F. Gibbs sampling is more adapted to our problem than the rejection strategy of the Metropolis–Hastings method (e.g. Foreman-Mackey et al. 2013) or than importance sampling (e.g. Wraith et al. 2009). Indeed, the latter methods are not efficient when the number of dimensions, Ndim (equation 32), becomes large. Several runs presented in this paper lead to Ndim ≃ 7000. To provide an accurate integration of equation (B7) in a reasonable computing time, we implement the following adaptative grid. We start from a regular grid of N(0) = 10 values, $$x_k^{(t)}$$, with t =1, …, N(0), spanning the whole range of the interval $$[x_k^{{{min}}},x_k^{{{max}}}]$$. We compute the normalization of equation (B7), $$F^{(0)}(x_k^{{{max}}}|x_{k^\prime \ne k})$$, as the sum of elemental trapeziums, ΔF(0)(x(t)). We then add a mid-point in each interval and compute the new trapeziums $$\Delta F^{(1)}(x_k^{(t)})$$. We stop the refinement of element t if   \begin{eqnarray} &&|\Delta F^{(1)}(x_k^{(t)}) +\Delta F^{(1)}\left(\frac{x_k^{(t)}+x_k^{(t+1)}}{2}\right)\nonumber\\ &&\quad -\Delta F^{(0)}(x_k^{(t)})| \le 10^{-3}\times \frac{\sum _{t^\prime =1}^{N^{(1)}} \Delta F^{(1)}(x_k^{(t^\prime )})}{\sqrt{N^{(1)}}}. \end{eqnarray} (B8)We iterate until no more refinement is needed. Most of the drawings require between 50 and 200 points. B3 Consistent treatment of external parameters There are several non-dusty parameters that we may want to correlate with the grain properties, like the gas column density, the metallicity, etc. In principle, we could study these extra correlations afterwards, using the final results of the MCMC. However, including these parameters within the MCMC sampler, as part of our hierarchical model, will help improve the correlation, as they will share a common prior distribution with the model parameters. Poorly constrained grain parameters could benefit from using these additional observations. Let us decompose the parameters of a source si into q1SED model parameters and q2external parameters:   \begin{eqnarray} {\boldsymbol x}_i = \left(\left\lbrace x_{i,k}^{{\rm {model}}}\right\rbrace _{k=1,q_1}, \left\lbrace x_{i,k}^{{{\rm extra}}}\right\rbrace _{k=1,q_2}\right) \ \,{\rm with }\ q=q_1+q_2.\nonumber\\ \end{eqnarray} (B9)For instance, $$\lbrace x_{i,k}^{{\rm{model}}}\rbrace _{k=1,q_1}$$ could be {ln Mi, ln Ti, βi}, (q1 = 3) and $$\lbrace x_{i,k}^{{\rm{extra}}}\rbrace _{k=1,q_2}$$ could be $$\lbrace \ln M^{{\rm H\,\,{\small I}}}_i,Z_i\rbrace$$ (q2 = 2). The SED parameters are constrained by the observed SED. The extra parameters are fixed in principle, as they have been estimated from an independent method. However, including them in the MCMC sampler will lead to a modification of their value consistent with their uncertainties. Let us note their observed value Qi, k, with an uncertainty $$\sigma _{i,k}^{{\rm{extra}}}$$. The likelihood of the extra parameter k, of a source si, is simply   \begin{equation} p\left(x_{i,k}^{\rm{extra}}|Q_{i,k}\right) = \mathcal {P}\left(Q_{i,k},\left(\sigma _{i,k}^{{\rm{extra}}}\right)^2\right), \end{equation} (B10)where $$\mathcal {P}(\bar{y},\sigma _y^2)$$ is one of the noise distributions of Section 3.2.3, with mean $$\bar{y}$$ and variance $$\sigma _y^2$$. The posterior distribution of this extra parameter is then   \begin{eqnarray} &p&\left(\!x_{i,k}^{\rm{extra}}|Q_{i,k},{\boldsymbol x}_i^{\rm{model}}, x^{\rm{extra}}_{i,k^\prime \ne k},{\boldsymbol \mu },\Sigma \right) \propto p\left(x_{i,k}^{{\it{extra}}}|Q_{i,k}\right)\nonumber\\ &&\times p\left({\boldsymbol x}_i|{\boldsymbol \mu }, \Sigma \right) \times p\left({\boldsymbol \mu }, \Sigma\right). \end{eqnarray} (B11)We sample these extra parameters, at each iteration of the MCMC, from equation (B11). APPENDIX C: GENERALIZED LEAST SQUARES WITH CORRELATED ERRORS We have developed a chi-squared minimization SED fitter, having the same interface as our HB code. It can fit any linear combination of the individual SED components of Section 2.2 to an observed SED. It is based on the MINPACK (Moré et al. 1984) Levenberg–Marquardt method, that we have converted in Fortran 90. We have added the possibility to limit, fix and tie parameters, similarly to what Markwardt (2009) did in IDL. C1 Total covariance matrix and chi-squared In most of our cases, the uncertainties on the observed fluxes, $${\boldsymbol L}_\nu ^{\rm{obs}}$$, come from two terms: (i) the noise, $${\boldsymbol \sigma }_\nu ^{\rm{noise}}$$, which is uncorrelated between frequencies, with diagonal covariance matrix $$\mathcal {V}_{\rm noise}$$; and (ii) correlated systematic calibration uncertainties, $${\boldsymbol \sigma }_\nu ^{\rm{syst}}$$, with non-diagonal covariance matrix $$\mathcal {V}_{\rm syst}$$. The total covariance matrix is simply   \begin{equation} \mathcal {V}_{\rm obs} = \mathcal {V}_{\rm noise} + \mathcal {V}_{\rm syst}. \end{equation} (C1) Properly taking into account the different sources of uncertainties requires to account for correlated terms by minimizing the ‘generalized least squares’ problem:   \begin{equation} \chi ^2 = {\boldsymbol r}^T\,\mathcal {V}_{\rm obs}^{-1}\,{\boldsymbol r}, \end{equation} (C2)where $${\boldsymbol r} = {\boldsymbol L}_\nu ^{{\rm{obs}}}-{\boldsymbol L}_\nu ^{{\rm{mod}}}$$, is the residual between the observations and the model $${\boldsymbol L}_\nu ^{{\rm{mod}}}$$. C2 Adapting the Levenberg–Marquardt algorithm The Levenberg–Marquardt algorithm (e.g. Press et al. 2007; Markwardt 2009, hereafter LM) is aimed at finding the best parameter vector $${\boldsymbol x}$$ minimizing the χ2, starting from an initial estimate $${\boldsymbol x}^{(0)}$$. Computing, for each new value of $${\boldsymbol x}^{(n+1)}={\boldsymbol x}^{(n)}+{\boldsymbol \delta x}^{(n+1)}$$, the Jacobian matrix of the model, at $${\boldsymbol x}^{(n)}$$:   \begin{equation} J_{k,j} = \frac{\mathrm{\partial} L_{\nu ,j}^{\rm{mod}}(x_k)}{\mathrm{\partial} x_k}, = - \frac{\mathrm{\partial} w_j}{\mathrm{\partial} x_k}, \end{equation} (C3)the LM method solves:   \begin{equation} \left(\mathcal {J}^T\,\mathcal {J}\right)\,{\boldsymbol \delta x} = \mathcal {J}^T\,{\boldsymbol w}, \end{equation} (C4)where the normalized residual is $${\boldsymbol w}={\boldsymbol r}/{\boldsymbol \sigma }_\nu ^{{\rm{obs}}}$$, noting the total (uncorrelated) uncertainty:   \begin{equation} {\boldsymbol \sigma }_\nu ^{{\rm{obs}}} = \sqrt{\left({\boldsymbol \sigma }_\nu ^{{\rm{noise}}}\right)^2 + \left({\boldsymbol \sigma }_\nu ^{{\rm{syst}}}\right)^2}. \end{equation} (C5)This vector does not account for correlated terms. The classical χ2, for non-correlated uncertainties, is $$\chi ^2=|{\boldsymbol w}|^2$$. When introducing correlated errors, the new problem to solve becomes   \begin{equation} \left(\mathcal {J}^T\,\mathcal {V}_{\rm obs}^{-1}\,\mathcal {J}\right)\,{\boldsymbol \delta x} = \mathcal {J}^T\,\left(\mathcal {V}_{\rm obs}^{-1}\,{\boldsymbol r}\right). \end{equation} (C6)The Cholesky decomposition (e.g. Press et al. 2007) of the covariance matrix results in a lower triangle matrix, $$\mathcal {L}$$, such that $$\mathcal {V}_{\rm obs} = \mathcal {L}\,\mathcal {L}^{\rm T}$$. This decomposition is always possible as the covariance matrix is positive-definite. We also have $$\mathcal {V}_{\rm obs}^{-1} = \left(\mathcal {L}^{-1}\right)^T\,\mathcal {L}^{-1}$$. The new system to be solved is   \begin{equation} \mathcal {\mathcal {J}}^{\rm T}\,\mathcal {\mathcal {J}}\,{\boldsymbol x} = \mathcal {\mathcal {J}}^{\rm T}\,{\boldsymbol t}, \end{equation} (C7)where $$\mathcal {\mathcal {J}} = L^{-1}\,\mathcal {J}$$ and $${\boldsymbol t}=L^{-1}{\boldsymbol r}$$. In practice, to account for correlated uncertainties in the LM method, one just needs to perform the following steps. Compute once and for all, at the beginning (before the iterations), the Cholesky decomposition of the covariance matrix, to obtain $$\mathcal {L}$$. Compute once and for all the inverse $$\mathcal {L}^{-1}$$. Set the output of the user-defined function in MINPACK, returning the weighted residual, to $${\boldsymbol t}=\mathcal {L}^{-1}\,{\boldsymbol r}$$ instead of $${\boldsymbol w}$$. C3 Uncertainties on derived quantities One of the product of the Levenberg–Marquardt method is the covariance matrix $$\mathcal {V}_{\rm par}$$ of the parameters $${\boldsymbol x}$$. We may want to also compute the uncertainties of quantities derived from these parameters, $$f({\boldsymbol x})$$. In the case of SED fitting, such quantities can be the total luminosity, the average starlight intensity, etc. The error on any function of the parameters is   \begin{equation} \sigma ^2_{f({\boldsymbol x})} = \left({ \nabla }f({\boldsymbol x})\right)^T \mathcal {V}_{\rm par}\,{\boldsymbol \nabla }f({\boldsymbol x}). \end{equation} (C8)For example if we have only two parameters, $${\boldsymbol x}=(a\,b)$$, then   \begin{equation} {\boldsymbol \nabla }f=\left( \begin{array}{c}\displaystyle\frac{\mathrm{\partial} f}{\mathrm{\partial} a} \\ \displaystyle\frac{\mathrm{\partial} f}{\mathrm{\partial} b} \end{array} \right) \quad {\rm and} \quad \mathcal {V}_{\rm par}=\left( \begin{array}{c c}\sigma _a^2 \quad& \rho \sigma _a\sigma _b \\ \rho \sigma _a\sigma _b \quad& \sigma _b^2 \end{array} \right), \end{equation} (C9)and equation (C8) gives the usual expression:   \begin{eqnarray} \sigma ^2_{f({\boldsymbol x})} = \left(\frac{\mathrm{\partial} f}{\mathrm{\partial} a}\right)^2\sigma _a^2 + \left(\frac{\mathrm{\partial} f}{\mathrm{\partial} b}\right)^2\sigma _b^2 + 2\left(\frac{\mathrm{\partial} f}{\mathrm{\partial} a}\right) \left(\frac{\mathrm{\partial} f}{\mathrm{\partial} b}\right)\rho \sigma _a\sigma _b.\nonumber\\ \end{eqnarray} (C10)Similarly, the covariance of two functions of the parameters $$f({\boldsymbol x})$$ and $$g({\boldsymbol x})$$ is   \begin{equation} {\rm cov}(f({\boldsymbol x}),g({\boldsymbol x})) = \left({\boldsymbol \nabla }f({\boldsymbol x})\right)^T\mathcal {V}_{\rm par} {\boldsymbol \nabla }g({\boldsymbol x}). \end{equation} (C11) C4 Monte Carlo initial conditions To avoid the LM algorithm to converge towards a local minimum, we generate Nini = 30 random initial conditions, drawn from a uniform distribution over the whole range of parameters. We then run the LM code from these initial conditions and keep only the parameters corresponding to the lowest of these Nini minimum chi-squared. APPENDIX D: COMPUTING TIMES FOR THE LEAST-SQUARES AND BAYESIAN METHODS Let us assume that we have a large number of sources, n ≫ 1, and that the bottleneck in terms of CPU is the calculation of one SED model at all the photometric filters, $${\boldsymbol L}_\nu ^{{\rm{mod}}}({\boldsymbol x};{\boldsymbol \nu })$$. The LM method (Appendix C) typically performs Niter ≃ 50–150 iterations. At each iteration, the model is evaluated n × q times to compute the gradient, by finite differences on each one of the q parameters. Since we run Nini = 30 initial conditions (Appendix C4), the number of model calculations is   \begin{equation} N_{\chi 2} \simeq N_{\rm ini}\times N_{\rm iter}\times n \times q. \end{equation} (D1) The parameter space for the HB method (equation 32) becomes Ndim ≃ n × q, when n ≫ 1. This is the number of Gibbs samplings of the parameters, per iteration. Each Gibbs sampling requires Nsamp ≃50–200 points (Appendix B2). For a chain of length NMCMC, the number of model calculations is thus   \begin{equation} N_{\rm HB} \simeq N_{\rm MCMC}\times n\times q\times N_{\rm samp}. \end{equation} (D2)Noting that Niter ≃ Nsamp, the ratio of χ2 and HB CPU times is   \begin{equation} \frac{{\rm CPU}_{\rm HB}}{{\rm CPU}_{\chi ^2}} \simeq \frac{N_{\rm MCMC}}{N_{\rm ini}}\simeq {3\times 10^{4}}. \end{equation} (D3)This approximation gives the correct order of magnitude. For instance, the central simulation of Section 4.1 had a ratio:   \begin{equation} \frac{{\rm CPU}_{\rm HB}}{{\rm CPU}_{\chi ^2}}\simeq \frac{3\,{\rm weeks}}{90\,{s}} \simeq 2\times 10^{4}. \end{equation} (D4) The simulations presented in this paper add up to a total of ≃105  h of CPU. It points out that the use of the HB method represents an important investment in terms of computing time. In terms of storage, the MCMCs of all the simulations in this study add up to ≃1 Tb of HDF55 files. APPENDIX E: ROUND-OFF ERRORS Round-off errors could be a potential problem for such a code, as it requires a large number of operations (equation 32). Round-off errors are however extremely difficult to track. We note that all the tests we performed in this manuscript lead to a good agreement with the true values of the parameters. Thus, round-off errors, if any, do not significantly affect the results. The reason is likely due to the stochasticity of the HB code. Indeed, each step of the MCMC is only weakly dependent on the accuracy of the previous step. In order to quantify the typical order of magnitude of rounding errors we have re-run the central simulation of Section 4.1 (warm SED, n = 100, fS/N = 3) in single precision (SP), while the rest of the runs in this paper were all performed in double precision (DP). If round-off errors were important, we would notice a significant difference between the results with the two precisions. Fig. E1 shows the distribution of the difference of the model parameters of every source, xi, in both precisions, divided by the uncertainty on these parameters, σi, in SP. It shows that for any parameter, reducing the floating number precision, from DP to SP, results in an absolute difference of only a few per cent of its uncertainty. The actual round-off errors of the DP parameters are expected to be smaller than this value, as the precision is higher than SP. Thus these errors can reasonably be considered negligible. Figure E1. View largeDownload slide Test on rounding errors. This is the distribution of the difference between the inferred model parameters of each source i, xi, in SP and in DP, divided by the uncertainty on the parameters, σi, in SP. This test is performed on the central simulation of Section 4.1 (warm SED, n = 100, fS/N = 3). Figure E1. View largeDownload slide Test on rounding errors. This is the distribution of the difference between the inferred model parameters of each source i, xi, in SP and in DP, divided by the uncertainty on the parameters, σi, in SP. This test is performed on the central simulation of Section 4.1 (warm SED, n = 100, fS/N = 3). © 2018 The Author(s) Published by Oxford University Press on behalf of the Royal Astronomical Society http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Monthly Notices of the Royal Astronomical Society Oxford University Press

A dust spectral energy distribution model with hierarchical Bayesian inference – I. Formalism and benchmarking

A dust spectral energy distribution model with hierarchical Bayesian inference – I. Formalism and benchmarking

MNRAS 476, 1445–1469 (2018) doi:10.1093/mnras/sty189 Advance Access publication 2018 January 29 A dust spectral energy distribution model with hierarchical Bayesian inference – I. Formalism and benchmarking 1,2‹ Fred ´ eric ´ Galliano IRFU, CEA, Universite´ Paris-Saclay, F-91191 Gif-sur-Yvette, France Universite´ Paris-Diderot, AIM, Sorbonne Paris Cite, ´ CEA, CNRS, F-91191 Gif-sur-Yvette, France Accepted 2018 January 17. Received 2018 January 17; in original form 2017 October 6 ABSTRACT This article presents a new dust spectral energy distribution (SED) model, named HerBIE, aimed at eliminating the noise-induced correlations and large scatter obtained when performing least-squares fits. The originality of this code is to apply the hierarchical Bayesian approach to full dust models, including realistic optical properties, stochastic heating, and the mixing of physical conditions in the observed regions. We test the performances of our model by applying it to synthetic observations. We explore the impact on the recovered parameters of several effects: signal-to-noise ratio, SED shape, sample size, the presence of intrinsic correlations, the wavelength coverage, and the use of different SED model components. We show that this method is very efficient: the recovered parameters are consistently distributed around their true values. We do not find any clear bias, even for the most degenerate parameters, or with extreme signal-to-noise ratios. Key words: methods: data analysis – methods: numerical – methods: statistical – dust, extinction – infrared: galaxies – infrared: ISM. originate in the diversity of assumptions about the grain physics 1 INTRODUCTION (Rollig ¨ et al. 2007). In addition, grains are the catalysts for numer- Dust grains are an important, but elusive, component of the interstel- ous chemical reactions, including the formation of H , the most lar medium (ISM). Their...
Loading next page...
 
/lp/ou_press/a-dust-spectral-energy-distribution-model-with-hierarchical-bayesian-iC1wgdv0f6
Publisher
The Royal Astronomical Society
Copyright
© 2018 The Author(s) Published by Oxford University Press on behalf of the Royal Astronomical Society
ISSN
0035-8711
eISSN
1365-2966
D.O.I.
10.1093/mnras/sty189
Publisher site
See Article on Publisher Site

Abstract

Abstract This article presents a new dust spectral energy distribution (SED) model, named HerBIE, aimed at eliminating the noise-induced correlations and large scatter obtained when performing least-squares fits. The originality of this code is to apply the hierarchical Bayesian approach to full dust models, including realistic optical properties, stochastic heating, and the mixing of physical conditions in the observed regions. We test the performances of our model by applying it to synthetic observations. We explore the impact on the recovered parameters of several effects: signal-to-noise ratio, SED shape, sample size, the presence of intrinsic correlations, the wavelength coverage, and the use of different SED model components. We show that this method is very efficient: the recovered parameters are consistently distributed around their true values. We do not find any clear bias, even for the most degenerate parameters, or with extreme signal-to-noise ratios. methods: data analysis, methods: numerical, methods: statistical, dust, extinction, infrared: galaxies, infrared: ISM 1 INTRODUCTION Dust grains are an important, but elusive, component of the interstellar medium (ISM). Their ubiquity makes them a potential asset to understanding galaxy evolution, but their inherent complexity impedes our ability to use them to their full potential. As a diagnostic tool, dust can be used to estimate gas masses, without suffering from the phase-bias that gas lines are subject to. For this reason, dust is a popular dark gas tracer (e.g. Grenier, Casandjian & Terrier 2005; Leroy et al. 2011; Planck Collaboration XIX 2011c). However, our poor knowledge of the grain properties puts caution on the accuracy of these measurements. It is true that the grain cross-section per Hydrogen atom can be empirically calibrated in the ultraviolet (UV; e.g. Fitzpatrick & Massa 2007), for extinction studies, or in the infrared (IR; e.g. Planck Collaboration XVII 2014a), for emission studies. The problem is that these empirical laws lack an underlying physical model describing how dust properties vary in different environments. As a modelling ingredient, dust provides the major source of heating of the gas in photodissociation regions (PDRs; e.g. Hollenbach & Tielens 1997), through the photoelectric effect1 (Bakes & Tielens 1994; Weingartner & Draine 2001; Kimura 2016). The main discrepancies between different PDR codes actually originate in the diversity of assumptions about the grain physics (Röllig et al. 2007). In addition, grains are the catalysts for numerous chemical reactions, including the formation of H2, the most abundant molecule in the Universe. Accounting for the stochastic temperature fluctuations of small grains has led to a significant revision of H2 formation rates (Le Bourlot et al. 2012; Bron, Le Bourlot & Le Petit 2014). Overall, detailed modelling of the PDR and molecular lines in star-forming regions shows a qualitative agreement between the derived dust and gas physical conditions (e.g. Wu et al. 2015; Chevance et al. 2016), although the quantitative comparison can be more difficult to assess (Wu et al., in preparation). The current census on interstellar dust relies mainly on observations of the Galaxy. Most dust models are designed to reproduce the extinction, emission, and depletion pattern of the diffuse ISM (Zubko, Dwek & Arendt 2004; Draine & Li 2007; Compiègne et al. 2011; Jones et al. 2017). Some also take into account polarization constraints (Siebenmorgen, Voshchinnikov & Bagnulo 2014; Guillet et al. 2017). The dust properties in other environments are more sketchy. For instance, there is clear evidence that the grain far-IR/submm emissivity increases by a factor of ≃2–4, from the diffuse ISM to denser environments (e.g. Stepnik et al. 2003; Planck Collaboration XXIV 2011a; Roy et al. 2013). This increase can be explained by mantle accretion and coagulation (Köhler et al. 2012; Köhler, Ysard & Jones 2015). However, since the accretion of mantles in dense regions and their recycling back in the diffuse ISM is a hysteresis process (Jones et al. 2017), parametrizing the dust properties as a sole function of the density of the medium, n, and the UV flux, G0, may be too simplistic. Finally, the grain properties of other galaxies can exhibit significant differences (e.g. in the Magellanic clouds: Gordon et al. 2003; Galliano et al. 2011; Galametz et al. 2016; Chastenet et al. 2017). In particular, the evolution of the dust-to-gas mass ratio with metallicity is a subject of debate (Lisenfeld & Ferrara 1998; Draine et al. 2007; Galliano, Dwek & Chanial 2008a; Galametz et al. 2011; Rémy-Ruyer et al. 2014; De Vis et al. 2017). It appears to be non-linear in nearby galaxies (Rémy-Ruyer et al. 2014), consistent with theoretical models (Asano et al. 2013; Zhukovska 2014). On the contrary, studies in absorption on more distant systems suggest a constant dust-to-metal mass ratio down to extremely low-metallicity systems (De Cia et al. 2013; Zafar & Watson 2013). One of the ways to tackle these open questions consists in studying how the observed grain properties vary with the physical conditions they are experiencing, in a large range of environments. A fundamental step in this process is the accurate retrieval of the parameters and their intercorrelations. The recent IR/submm space observatories Spitzer, Akari, Herschel, and Planck have provided us with invaluable data on the spectral energy distributions (SEDs) of a wide variety of objects. However, despite a wealth of good quality observations, with complete spectral coverage, and calibration uncertainties down to only a few per cent (Planck Collaboration VIII 2014b), we are still facing limitations using basic analysis techniques. Among these limitations, there is a series of noise-induced false correlations between derived parameters, when performing least-squares fits (hereafter χ2; Shetty et al. 2009). A very efficient way to treat these degeneracies was proposed by Kelly et al. (2012). These authors designed a hierarchical Bayesian (hereafter HB) model, accounting for both noise and correlated calibration uncertainties. We will explain in detail, in the present article, how an HB model works. For now, we just need to note that it deals with two classes of parameters: (i) the dust parameters of each source (mass, temperature, etc.); and (ii) a set of hyperparameters controlling the distribution of these dust parameters. It is called a multilevel approach, for that reason. A least-squares fit is a single-level model, as it only deals with the dust parameters. In a single-level approach, the parameter's probability distribution of each source is independent of the other sources in the sample. Introducing hyperparameters allows one to sample a single large probability distribution for the parameters of all the sources. This way, the information about the distribution of parameters, among the different sources, has an impact on the likelihoods of individual sources. Kelly et al. (2012) were able to show that the HB approach could correct the false negative correlation between the temperature, T, and the emissivity index, β, of modified black bodies (hereafter MBBs). We emphasize that it is the multilevel nature of the HB approach that could lead to such an improvement. Indeed, several non-HB codes had been developed to interpret dust SEDs (da Cunha, Charlot & Elbaz 2008; Paradis et al. 2010; Planck Collaboration XVI 2011b; Serra et al. 2011), but those could not treat the degeneracies, due to their single-level nature. Following Kelly et al. (2012), several articles presented HB codes for single MBBs (Juvela et al. 2013), or the linear combination of two MBBs, and MBBs with parametrized emissivity (Veneziani et al. 2013). Until now, we were lacking an HB code working with full dust models, including (i) several types of grains (silicates and carbonaceous), (ii) with realistic optical properties, (iii) a size distribution, and (iv) accounting for the stochastic heating of small grains, (v) with the possibility to combine several components to account for the mixing of physical conditions in the observed regions. Such a model would allow us to apply state-of-the-art dust models to IR observations, extracting the physical parameters in an optimal way, consistent with the various known sources of uncertainties. This tool could provide unique constraints on dust evolution. This article discusses some efforts developed towards that goal. We present here a new HB dust SED model, named HerBIE (HiERarchical Bayesian Inference for dust Emission). The paper is ordered as follows. In Section 2, we present the different physical components that can be combined to fit the data. Section 3 attempts at giving a comprehensive view on the statistical hypotheses of the HB model and its numerical implementation. In Sections 4–6, we extensively discuss the tests we have conducted in order to assess the performances of the code. Section 4 presents a systematic analysis of the code performances, varying signal-to-noise ratios, SED shape, and sample size. Section 5 explores additional effects: wavelength coverage; the presence of intrinsic correlations; the addition of external parameters to the model. Section 6 briefly demonstrates the code performances using different physical models. Finally, Section 7 summarizes our results. We devote Appendices A–D to technical explanations that would otherwise alter the flow of the discussion. This is the first paper in a series of two articles. Paper II (Galliano, in preparation) will address the robustness of our model. 2 THE PHYSICAL MODEL 2.1 Notation conventions HerBIE is designed to solve a generic problem: the fit of a linear combination of SED model components to a set of n sources, si, observed through m frequencies, νj. The sources, si (i = 1, …, n), can be pixels in an image or individual objects (whole galaxies, star-forming regions, etc.) in a sample. The frequencies, νj (j = 1, …, m), can be photometric bands or frequency elements in a spectrum. The total physical model is controlled by q parameters, xk (k = 1, …, q). We note $${\boldsymbol y}$$, vectors of elements yi, and $$\mathcal {Y}$$, matrices of elements Yi, j. We note $$|\mathcal {Y}|$$ the determinant of matrix $$\mathcal {Y}$$. Probability density functions (PDFs) are written p(…); the name of the variable between parentheses defines the actual density we are referring to. Due to the inherent non-linearity of dust models, the derived PDF of several parameters is significantly skewed. Although this is natural, it makes quoting mean values and uncertainties complicated, as they do not coincide with the median nor the mode of the distribution. For these parameters, we rather consider the natural logarithm (noted $$\ln$$), in order to obtain a more symmetric distribution. We use the adjective specific to denote any quantity per unit mass. 2.2 Individual SED model components Fitting an observed SED requires choosing the number and the nature of the physical components. Each case has to be adapted, depending on the assumed properties of the studied object (e.g. diffuse cloud, H ii region, whole galaxy, etc.) and on the wavelength coverage. In this section, we describe the different components that our code currently implements. This list is meant to grow with future developments. 2.2.1 Equilibrium grains (BBQ) This component accounts for the emission of a collection of identical large grains at thermal equilibrium with a uniform radiation field. It is particularly useful to model H ii regions where strong solid-state emission bands can be observed (e.g. Wu et al., in preparation). This approximation would not be valid for smaller grains as they would be stochastically heated (e.g. Draine 2003). We designate this component as BBQ. The parameters controlling this component are simply the mass (Mi) and temperature (Ti): $${\boldsymbol x}_i=(\ln M_i,\ln T_i)$$. The monochromatic luminosity of source si, at frequency νj is expressed as   \begin{equation} L_\nu ^{{\rm mod}}(\nu _j,{\boldsymbol x}_i) = M_i\times \frac{3\pi }{a\rho }Q_{\rm abs}(a,\nu _j) \times B_\nu (T_i,\nu _j), \end{equation} (1)where a is the grain radius, Bν is the Planck function, and Qabs is the absorption efficiency of the chosen material, with mass density ρ. We have a large data base of optical properties to choose the Qabs from, depending on the spectral features we want to model. We arbitrarily set a = 30 nm,2 as Qabs/a is almost perfectly independent of a in the IR-mm range. 2.2.2 Modified blackbody (MBB) This component is similar to BBQ, in terms of physical assumptions. The only difference is that instead of considering a realistic Qabs, measured in the laboratory, we make the common assumption that it is a power law of the frequency: Qabs ∝ νβ, the emissivity index β being a free parameter. This approximation is the most widely used dust model, as it is supposed to provide constraints both on the physical conditions experienced by the grains (through the temperature) and on their intrinsic properties (through β). We designate this component as MBB. The parameters controlling this component are $${\boldsymbol x}_i=(\ln M_i,\ln T_i, \beta _i)$$. The monochromatic luminosity of source si, at frequency νj is   \begin{equation} L_\nu ^{{\rm mod}}(\nu _j,{\boldsymbol x}_i) = M_i\times 4\pi \kappa (\nu _0) \left(\frac{\nu _j}{\nu _0}\right)^{\beta _i}\times B_\nu (T_i,\nu _j). \end{equation} (2)The opacity at reference wavelength λ0 = c/ν0 = 160 μm is fixed to κ(ν0) = 1.4 m2 kg− 1, as it corresponds to the Zubko et al. (2004, BARE-GR-S) Galactic dust model opacity. Although Hildebrand (1983) recommended normalizing the opacity in the submm regime, recent laboratory studies (e.g. Coupeaud et al. 2011) tend to show that dust analogues have less dispersion in the far-IR. This is the reason why we choose λ0 = 160 μm rather than 850 μm. 2.2.3 Broken emissivity modified blackbody (BEMBB) This component, introduced by Gordon et al. (2014), is a refinement of MBB, accounting for a possible change of the emissivity slope at long wavelengths, similar to what is observed in the laboratory (e.g. Coupeaud et al. 2011). There are now two emissivity indices: β1 for ν > νb and β2 for ν ≤ νb. In practice, this component could be used to fit an observed SED, finely sampled in the far-IR/submm range. We designate this component as BEMBB. The parameters controlling this component are $${\boldsymbol x}_i=(\ln M_i,\ln T_i, \beta _{1,i}, \beta _{2,i}, \nu _{b,i})$$. The monochromatic luminosity of source si, at frequency νj is   \begin{equation} L_\nu ^{{\rm mod}}(\nu _j,{\boldsymbol x}_i) = 4\pi \kappa (\nu _j,\beta _{1,i},\beta _{2,i},\nu _{b,i})\times B_\nu (\nu _j,T_i), \end{equation} (3)where the broken emissivity is parametrized as   \begin{eqnarray} \kappa (\nu ,\beta _{1,i},\beta _{2,i},\nu _{b,i}) {=}\left\{ \begin{array}{ll} \kappa (\nu _0)\left(\displaystyle\frac{\nu }{\nu _0} \right)^{\beta _{1,i}} & \nu > \nu _{b,i} \\ \kappa (\nu _0) \left(\displaystyle\frac{\nu }{\nu _0}\right)^{\beta _{2,i}} \left(\displaystyle\frac{\nu _{b,i}}{\nu _0}\right)^{\beta _{1,i}-\beta _{2,i}} & \nu \le \nu _{b,i}. \end{array}\right. \!\!\!\!\!\!\!\!\!\! \end{eqnarray} (4)Gordon et al. (2014) ‘calibrate’ the opacity at reference wavelength κ(ν0) on the diffuse ISM of the Milky Way. Here, we test this component with the same κ(ν0) value as the MBB, for simplicity. 2.2.4 Uniformly illuminated dust mixture (deltaU) This component represents a full ISM dust mixture, heated by a uniform interstellar radiation field with intensity U. This mixture is made of grains of different compositions (silicates, amorphous carbon, PAHs, etc.) each having a different size distribution. Several dust mixtures and their size distributions are implemented into our code: the BARE-GR-S model of Zubko et al. (2004), the model of Compiègne et al. (2011), the AC model of Galliano et al. (2011), and the THEMIS model (Jones et al. 2017). We assume that the radiation field has the spectral shape of the solar neighbourhood (with mean intensity noted $$J_\nu ^{\odot }(\nu )$$; Mathis, Mezger & Panagia 1983) scaled by the parameter U: $$J_\nu (\nu )=U\times J_\nu ^{\odot }(\nu )$$. The mean intensity is normalized so that U = 1 corresponds to   \begin{equation} \int _{c/8\,\mu {\rm m}}^{c/0.0912\,\mu {\rm m}} 4\pi J_\nu ^{\odot }(\nu ){\,\rm d}\nu = 2.2\times 10^{-5}\,\rm W\,m^{-2}. \end{equation} (5) The parameters controlling this component are the mass (Mi), the radiation field intensity (Ui), the PAH mass fraction ($$q^{{{\rm PAH}}}_i$$), and the fraction of charged PAHs ($$f^+_i$$): $${\boldsymbol x}_i=(\ln M_i,\ln U_i,q^{{\rm PAH}}_i,f^+_i)$$. The last two parameters allow us enough flexibility to fit the complex mid-IR range: qPAH controls the relative strength of the aromatic features, while f+ controls mainly the ratio between C–C and C–H bands (e.g. Allamandola, Hudgins & Sandford 1999; Li & Draine 2001; Galliano et al. 2008b). The monochromatic luminosity of source si, at frequency νj, is   \begin{eqnarray} L_\nu ^{{\rm mod}}(\nu _j,{\boldsymbol x}_i) & = & M_i\times ( q^{{{\rm PAH}}}_i\times f_i^+\times l_\nu ^{{{\rm PAH}}^+}(U_i,\nu _j) \nonumber \\ && + \, q^{{\rm PAH}}_i\times (1-f_i^+)\times l_\nu ^{{{\rm PAH}}^0}(U_i,\nu _j) \nonumber \\ && + \, (1-q^{{{\rm PAH}}}_i)\times l_\nu ^{{\rm non-PAH\ dust}}(U_i,\nu _j)), \end{eqnarray} (6)where $$l_\nu ^{{X}}$$ is the specific monochromatic luminosity of component X. The emission spectrum has been computed with the Guhathakurta & Draine (1989) stochastic heating method and has been integrated over the size distribution. We designate this component as deltaU. 2.2.5 Non-uniformly illuminated dust mixture (powerU) The application of the previous component is often problematic as it does not allow us to account for the likely mixing of excitation conditions along the line of sight and in the instrumental beam. To account for non-uniform illumination, we adopt the prescription of Dale et al. (2001), assuming that the dust mass is distributed in different radiation field intensities following a power law:   \begin{equation} \frac{{\rm d}M}{{\rm d}U}=\mathcal {N}\times U^{-\alpha } \quad \,{\rm with}\quad \, U_-\le U\le U_-+\Delta U, \end{equation} (7)where the normalization factor depends on the value of α:   \begin{equation} \mathcal {N} = \left\{ \begin{array}{lc} \displaystyle\frac{(1-\alpha )}{(U_-+\Delta U)^{1-\alpha } -(U_-)^{1-\alpha }}\qquad \qquad & {\rm if }\ \alpha >1 \\ \displaystyle\frac{1}{\ln (U_-+\Delta U)-\ln (U_-)}\qquad \qquad& {\rm if }\ \alpha =1. \end{array} \right. \end{equation} (8)The three parameters of this power-law U−, ΔU, and α are free to vary to fit the shape of the observed SED. Since these three parameters are sometimes degenerate, we often discuss the more stable average starlight intensity, defined as   \begin{equation} \bar{U}= \frac{1}{M} \int _{U_-}^{U_-+\Delta U} U\times \frac{{\rm d}M}{{\rm d}U}{\,\rm d}U, \end{equation} (9)which also depends on the value of α:   \begin{eqnarray} \bar{U}= \left\{ \begin{array}{l@{\qquad}l} \displaystyle\frac{1-\alpha }{2-\alpha } \frac{\left(U_-+\Delta U\right)^{2-\alpha }-U_-^{2-\alpha }}{\left(U_-+\Delta U\right)^{1-\alpha }-U_-^{1-\alpha }} & \mbox{if } \alpha \ne 1\ {\rm and}\ \alpha \ne 2 \\ \displaystyle \frac{\Delta U}{\ln \left(U_-+\Delta U\right)-\ln U_-} & \mbox{if } \alpha = 1\\ \displaystyle \frac{\ln \left(U_-+\Delta U\right)-\ln U_-}{U_-^{-1}-\left(U_-+\Delta U\right)^{-1}} & \mbox{if } \alpha = 2. \end{array}\right. \nonumber\\ \end{eqnarray} (10) This component is simply the deltaU component integrated over the distribution of equation (7). The parameters controlling this component are thus $${\boldsymbol x}_i=(\ln M_i,\ln U_{-,i},\ln \Delta U_i,\alpha _i,q^{{{\rm PAH}}}_i,f_i^+)$$. The monochromatic luminosity of source si, at frequency νj is   \begin{eqnarray} L_\nu ^{{\rm mod}}(\nu _j,{\boldsymbol x}_i) &=& M_i\times \mathcal {N}\nonumber\\ &&\int _{U_{-,i}}^{U_{-,i}+\Delta U_i} l_\nu \left(U_i,q^{\rm{PAH}}_i,f^+_i,\nu _j \right)\times U^{-\alpha _i} {\,\rm d}U, \end{eqnarray} (11)where $$M_i\times l_\nu (U_i,q^{{\rm PAH}}_i,f^+_i,\nu _j)$$ is the deltaU component (equation 6). We designate this component as powerU. 2.2.6 Near-IR emission by stellar populations (starBB) Direct or scattered starlight can significantly contaminate the emission in the near-IR range. Thus, to properly model the emission by PAHs and small grains, one needs to account for the stellar continuum. This component is simply modelled as a T⋆ = 50 000 K blackbody, the only free parameter being its bolometric luminosity $${\boldsymbol x}_i=(\ln L^\star _i)$$. The monochromatic luminosity of source si, at frequency νj is   \begin{equation} L_\nu ^{{\rm mod}}(\nu _j,{\boldsymbol x}_i) = L^\star _i\times \frac{\pi }{\sigma T_\star ^4} B_\nu (T_\star ,\nu _j), \end{equation} (12)where σ is the Stefan–Boltzmann constant and Bν is the Planck function. This parametrization is however too rough to provide any reliable constraint on the actual stellar populations. In addition, it is only realistic in the near-IR regime. We designate this component as starBB. 2.2.7 Free–free and synchrotron continua (radio) In addition to dust emission, submillimetre bands also contain radio continuum. We therefore add the possibility to model this component, assuming it is the linear combination of two power laws representing the free–free (Fν ∝ ν−0.1) and synchrotron continua ($$F_\nu \propto \nu ^{-\alpha _{\rm s}}$$; 0.7 ≲ αs ≲ 0.9). The free parameters are the νLν at λ1 = 1 cm, L1, i, the fraction of free–free at λ1, fFF, i, and the synchrotron index, αs, i: $${\boldsymbol x}_i=(\ln L_{1,i},f_{\rm{{FF}},i},\alpha _{s,i})$$. The monochromatic luminosity of source si, at frequency νj, is   \begin{eqnarray} L_\nu ^{{\rm mod}}(\nu _j,{\boldsymbol x}_i) & = & \frac{L_{1,i}}{\nu _1}\times \left( f_{{FF},i}\times \left(\frac{\nu }{\nu _1}\right)^{-0.1}\right. \nonumber \\ &&\, +\, \left.(1-f_{{{FF}},i})\times \left(\frac{\nu }{\nu _1}\right)^{-\alpha _{s,i}}\right), \end{eqnarray} (13)with ν1 = c/λ1. We designate this component as radio. 2.3 Linear combination of individual components We can linearly combine any of the previous components to fit an observed SED. Each parameter can be let free to vary (limits can be specified), or fixed to an arbitrary value. Alternatively, we can tie two parameters together (e.g. the β of two MBB components or the qPAH of a deltaU and powerU components). From a numerical point of view, we have pre-computed large grids of models for a large range of all the parameters, and performed synthetic photometry for a wide list of instrumental filters. Our code then simply interpolates these grids of templates to evaluate the SED model for a combination of parameters. The model grids and the interpolation method are described in Appendix A. 3 HIERARCHICAL BAYESIAN INFERENCE APPLIED TO SED FITTING Most of the formalism discussed in this section is the generalization to more complex dust models of the pioneering work of Kelly et al. (2012). 3.1 The non-hierarchical Bayesian point of view Let us first start by laying down the standard Bayesian formalism. If, for now, we omit systematic uncertainties, we can simply express the observed SED of a single source as the sum of our emission model and a random deviation due to the noise:   \begin{equation} L_\nu ^{{\rm obs}}(\nu _j) = L_\nu ^{{\rm mod}}(\nu _j,{\boldsymbol x}) + \epsilon (\nu _j)\times \sigma _\nu ^{{\rm noise}}(\nu _j), \end{equation} (14)where ε(ν) is a random variable with 〈ε〉 = 0 and σ(ε) = 1. The PDF of the noise p(ε(νj)) can be arbitrary (Gaussian, Student's t, etc.). We can reverse equation (14) to make explicit its dependence on the parameters $${\boldsymbol x}$$:   \begin{equation} \epsilon (\nu _j,{\boldsymbol x}) = \frac{L_\nu ^{\rm{obs}}(\nu _j)-L_\nu ^{\rm{mod}}(\nu _j,{\boldsymbol x})}{\sigma _\nu ^{{{\rm noise}}}(\nu _j)}. \end{equation} (15)In the absence of correlations, the likelihood can be expressed as a conditional PDF:   \begin{equation} p\left({\boldsymbol L}_\nu ^{{\rm obs}}|{\boldsymbol x}\right) = \prod _{j=1}^m p(\epsilon (\nu _j,{\boldsymbol x})), \end{equation} (16)noting $${\boldsymbol L}_\nu ^{{\rm obs}}=(L_\nu ^{{\rm obs}}(\nu _1),\ldots ,L_\nu ^{{\rm obs}}(\nu _m))$$, the SED vector containing the emission at each waveband. This is a well-known expression. For instance, if we assume Gaussian errors, finding the maximum of this likelihood is equivalent to minimizing the chi-squared. From a Bayesian point view, we are more interested in the PDF of the parameters, knowing the observations, $$p({\boldsymbol x}|{\boldsymbol L}_\nu ^{{\rm obs}})$$, rather than in the distribution of equation (16). Conveniently, Bayes’ theorem states that   \begin{equation} p({\boldsymbol x}|{\boldsymbol L}_\nu ^{{\rm obs}}) = \frac{p({\boldsymbol L}_\nu ^{{\rm obs}}|{\boldsymbol x}) \times p({\boldsymbol x})}{p({\boldsymbol L}_\nu ^{{\rm obs}})} \propto p({\boldsymbol L}_\nu ^{{\rm obs}}|{\boldsymbol x})\times p({\boldsymbol x}). \end{equation} (17)In the equation above, $$p({\boldsymbol x})$$ is called the prior distribution. It represents the intrinsic distribution of the parameters. We usually do not have an accurate knowledge of this distribution. The term $$p({\boldsymbol L}_\nu ^{{\rm obs}})$$ is a constant, since it is independent of the parameters. It enters as the normalization factor in the second part of equation (17). The left-hand term of equation (17) is called the posterior distribution. Standard Bayesian techniques consist of sampling it, i.e. mapping it in the space of parameters $${\boldsymbol x}$$. An assumption has to be made on the prior. It is common to assume it is constant or slowly varying over the interval range covered by the parameters. From the knowledge of the distribution of equation (17), one can estimate parameter averages, standard deviations, estimate confidence intervals, test hypotheses, etc. 3.2 The hierarchical model 3.2.1 The posterior distribution The HB method is built upon the previous formalism, with the difference that the prior distribution is now inferred from the data. This is achieved by parametrizing the shape and position of the new prior distribution with hyperparameters (which control the distribution of the parameters). It is common to assume a unimodal prior (e.g. multivariate Gaussian or Student's t), where the hyperparameters are (i) the average of the parameter vector, $${\boldsymbol \mu }$$, and (ii) their covariance matrix, Σ. This approach is relevant only when analysing a sample of n > 1 sources. Equation (17) now becomes, for one source si  \begin{equation} p({\boldsymbol x}_i|{\boldsymbol L}_{\nu ,i}^{{\rm obs}},{\boldsymbol \mu }, \Sigma) \propto p({\boldsymbol L}_{\nu ,i}^{{\rm obs}}|{\boldsymbol x}_i) \times p({\boldsymbol x}_i|{\boldsymbol \mu },\Sigma ), \end{equation} (18)where $$p({\boldsymbol x}_i|{\boldsymbol \mu }, \Sigma )$$ is the new prior parametrized by $${\boldsymbol \mu }$$ and Σ. The posterior distribution of the parameters and their hyperparameters is then   \begin{eqnarray} p&(&\!{\boldsymbol x}_1,\ldots ,{\boldsymbol x}_n,{\boldsymbol \mu },\Sigma |{\boldsymbol L}_{\nu ,1}^{{\rm obs}},\ldots ,{\boldsymbol L}_{\nu ,n}^{{\rm obs}}) \nonumber\\ &&\propto \prod _{i=1}^n p({\boldsymbol x}_i|{\boldsymbol L}_{\nu ,i}^{{\rm obs}},{\boldsymbol \mu },\Sigma ) \times p({\boldsymbol \mu })\times p(\Sigma), \end{eqnarray} (19)where $$p({\boldsymbol \mu })$$ and p(Σ) are the prior distributions of the hyperparameters. They will be described in Section 3.2.4. Note that, in equation (19), there is only one common set of hyperparameters for all of the sources. Fig. 1 illustrates how the HB method works. To make it simple, we consider a model with only one parameter x, such as, for example, the intensity of a single line fit. We simulate, in panel (a), what could be the actual distribution of parameters (in grey). We have drawn three values (colour dotted lines) from this distribution, representing what we would get if we were observing a line in three different locations in a cloud: $$x_{1,2,3}^{\rm{true}}$$. In panel (b), we introduce noise and show the likelihoods (colour curves). We have assigned an arbitrary uncertainty to the three parameters: σ1, 2, 3. We have drawn a random deviation according to this uncertainty: Δx1, 2, 3. If we were implementing a non-HB method or a least-squares fit, our estimate of the parameters would be the vertical dashed lines at the mean/mode of each likelihood: $$x_{1,2,3}^{{\rm{B}}}$$. In panel (c), we show the inferred mean prior distribution (in grey). The colour curves represent the HB posterior, i.e. the product of the prior and the likelihood. The mean values that we would derive from the HB analysis are the $$x_{1,2,3}^{{\rm{HB}}}$$, which are closer to the true values than the $$x_{1,2,3}^{{\rm{B}}}$$. The reason is that the multiplication by the prior reduces the dispersion of the values due to the noise. The posterior number 1 (in blue) has a high signal-to-noise ratio. Its likelihood is narrower than the prior. Thus, it is only weakly modified by the prior, and $$x_1^{\rm{{B}}}\simeq x_1^{{\rm{HB}}}$$. In contrast to this, the posterior number 3 (in red) deviates more, since it has a low signal-to-noise ratio. The prior thus has a major effect on this distribution and brings $$x_3^{{\rm{HB}}}$$ closer to $$x_3^{{\rm{true}}}$$. Figure 1. View largeDownload slide Illustration of the HB method. Panel (a) shows the actual distribution of parameter x (in grey) and three values of these parameters ($$x_{1,2,3}^{{\rm{true}}}$$; in colours; dotted lines) that correspond to the observations. Panel (b) shows the likelihood of the fit (colour curves) when introducing noise (σ1, 2, 3). The mean value of x derived from the likelihood $$x_{1,2,3}^{\rm{B}}$$ is displaced from the true value by $$\Delta x_{1,2,3}=|x_{1,2,3}^{{\rm{B}}}-x_{1,2,3}^{{\rm{true}}}|$$. Panel (c) shows the inferred prior (in grey), the posterior distributions (colour curves) and the derived mean values ($$x_{1,2,3}^{{\rm{HB}}}$$). Figure 1. View largeDownload slide Illustration of the HB method. Panel (a) shows the actual distribution of parameter x (in grey) and three values of these parameters ($$x_{1,2,3}^{{\rm{true}}}$$; in colours; dotted lines) that correspond to the observations. Panel (b) shows the likelihood of the fit (colour curves) when introducing noise (σ1, 2, 3). The mean value of x derived from the likelihood $$x_{1,2,3}^{\rm{B}}$$ is displaced from the true value by $$\Delta x_{1,2,3}=|x_{1,2,3}^{{\rm{B}}}-x_{1,2,3}^{{\rm{true}}}|$$. Panel (c) shows the inferred prior (in grey), the posterior distributions (colour curves) and the derived mean values ($$x_{1,2,3}^{{\rm{HB}}}$$). One of the subtleties of this process is the inference of the prior. In our simple example, its hyperparameters are the mean, $$\bar{x}$$, and standard deviation, σx. The full posterior distribution (equation 19) is thus a five-dimension PDF depending here on x1, x2, x3, $$\bar{x}$$, and σx. The most likely hyperparameters will be the values of $$\bar{x}$$ and σx for which the posterior is maximum, in the five-dimension space. We can also estimate averages of the hyperparameters $$\langle \bar{x}\rangle$$ and 〈σx〉, marginalizing the posterior over the rest of the variables. In that sense, the representation of panel (c) is a simplification. It actually shows unidimensional cuts of the prior and the posteriors, fixing $$\bar{x}$$ to $$\langle \bar{x}\rangle$$ and σx to 〈σx〉. 3.2.2 Introducing systematic uncertainties Instrumental calibration uncertainties, on top of being partially correlated between wavebands, are assumed to be fully correlated between sources. Kelly et al. (2012) treat these calibration errors as nuisance parameters. Following their formalism, we can rewrite equation (15)   \begin{equation} \epsilon (\nu _j,{\boldsymbol x},\delta _j) = \frac{L_\nu ^{{\rm{obs}}}(\nu _j) -L_\nu ^{{\rm{mod}}}(\nu _j,{\boldsymbol x})\times (1+\delta _j)}{\sigma _\nu ^{{\rm{noise}}}(\nu _j)}, \end{equation} (20)where we have introduced the calibration offset δj at frequency νj. The posterior distribution of equation (19) becomes   \begin{eqnarray} p&(&\!{\boldsymbol x}_1,\ldots ,{\boldsymbol x}_n,{\boldsymbol \mu },\Sigma,{\boldsymbol \delta } |{\boldsymbol L}_{\nu ,1}^{{\rm{obs}}},\ldots ,{\boldsymbol L}_{\nu ,n}^{{{\rm obs}}}) \nonumber\\ &&\!\propto\! \prod _{i=1}^n\prod _{j=1}^m p(L_{\nu ,i,j}^{{\rm{obs}}}|{\boldsymbol x}_i,\delta _j)\times p({\boldsymbol x}_i|{\boldsymbol \mu },\Sigma) \times p({\boldsymbol \mu })\!\times\! p(\Sigma)\times p({\boldsymbol \delta}),\nonumber \\ \end{eqnarray} (21)where the new likelihood, $$p(L_{\nu ,i,j}^{{\rm{obs}}}|{\boldsymbol x}_i,\delta _j)$$, is simply $$p(\epsilon (\nu _j,{\boldsymbol x}_i,\delta _j))$$. We have also introduced the prior distribution of $${\boldsymbol \delta }$$, $$p({\boldsymbol \delta })$$. Its mean is 〈δj〉 = 0 and its covariance matrix, Vcal, is made of the calibration uncertainties of the different wavebands and their correlations. 3.2.3 The noise distribution The noise is assumed to be uncorrelated between wavelengths and sources. Our model let us choose between different distributions for the variable, depending on the actual distribution of the noise measured on the data. We currently can choose between the three following types of noise. Normal noise: This is the default case. The monochromatic likelihood of source si, at frequency νj, is in this case   \begin{equation} p(L_{\nu ,i,j}^{{\rm{obs}}}|{\boldsymbol x}_i,\delta _j) \propto \exp \left[-\frac{1}{2} \left(\epsilon (\nu _j,{\boldsymbol x}_i,\delta _j)\right)^2\right]. \end{equation} (22) Robust noise: In case of outliers, we can assume that the statistical errors follow a Student's t distribution with f = 3 degrees of freedom. It has broader wings than a Gaussian PDF with the same σ. The monochromatic likelihood of source si, at frequency νj, is then   \begin{equation} p(L_{\nu ,i,j}^{{{\rm obs}}}|{\boldsymbol x}_i,\delta _j) \propto \left[1+\frac{1}{f}\left(\epsilon (\nu _j,{\boldsymbol x}_i,\delta _j)\right)^2\right]^{-(f+1)/2}. \end{equation} (23) Asymmetric noise: Background galaxies or Galactic cirruses can skew the noise distribution towards high fluxes. In this case, we use a split-normal PDF (Villani & Larsson 2006):   \begin{eqnarray} &&p(\!L_{\nu ,i,j}^{{\rm{obs}}}|{\boldsymbol x}_i,\delta _j)\propto \\ &&\left\{\begin{array}{lc} {\exp \left[-\frac{1}{2} \left(\displaystyle\frac{L_{\nu ,i,j}^{{\rm{obs}}}-L_{\nu ,i,j}^{{\rm{mod}}} \times (1+\delta _j)-\mu _{i,j}}{\lambda _{i,j}}\right)^2\right]}\\ \qquad \qquad{{\rm if }\ L_{\nu ,i,j}^{{\rm{obs}}}-L_{\nu ,i,j}^{{\rm{mod}}}\times (1+\delta _j) \le \mu _{i,j}}\\ {\exp \left[-\frac{1}{2} \left(\displaystyle\frac{L_{\nu ,i,j}^{{\rm{obs}}}-L_{\nu ,i,j}^{{\rm{mod}}} \times (1+\delta _j)-\mu _{i,j}}{\lambda _{i,j}\tau _{i,j}}\right)^2\right]}\\ {{\rm if }\ L_{\nu ,i,j}^{{\rm{obs}}}-L_{\nu ,i,j}^{{\rm{mod}}}\times (1+\delta _j) > \mu _{i,j},} \end{array}\right. \end{eqnarray} (24)where the position parameter, μi, j, the scale parameter, λi, j, and the shape parameter, τi, j, are derived from the mean (0), standard deviation ($$\sigma _{\nu ,i,j}^{{\rm{noise}}}$$) and skewness of the noise (Appendix B1). 3.2.4 The prior distributions We follow Kelly et al. (2012), and assume a g = 8 degrees of freedom multivariate Student's t distribution, for the distribution of our parameters:   \begin{eqnarray} p&(&\!{\boldsymbol x}_i|{\boldsymbol \mu },\Sigma) \propto \frac{1}{\sqrt{|\Sigma }|}\nonumber\\ &&\times \left(1+\frac{1}{g}({\boldsymbol x}_i-{\boldsymbol \mu })^{\rm T}\Sigma^{-1}({\boldsymbol x}_i-{\boldsymbol \mu })\right)^{-(g+q)/2}, \end{eqnarray} (25)where q is the number of parameters (Section 2.1). Notice that the factor $$1/\sqrt{|\Sigma |}$$, in front of the exponential, is not a normalization constant here, since we are sampling the distribution as a function of the elements of Σ. We assume a uniform prior on $${\boldsymbol \mu }$$. For the q × q covariance matrix, Σ, we use the Barnard, McCulloch & Meng (2000) separation strategy, decomposing it as: $$\Sigma =\mathcal {S}\mathcal {R}\mathcal {S}$$, where $$\mathcal {S}$$ is the diagonal matrix of standard deviations, and $$\mathcal {R}$$ is the correlation matrix. We place wide, independent normal priors on the diagonal elements of $$\ln \mathcal {S}$$, centred on the standard deviations of the least-squares best-fitting parameters (Appendix C), $$\ln S_{k,k}^{\chi ^2}$$, with σ(ln Sk, k) = 10:   \begin{eqnarray} p(\mathcal {S}) = \prod _{k=1}^q \frac{1}{S_{k,k}}\frac{1}{\sqrt{2\pi }\sigma (\ln S_{k,k})} \exp \left[-\frac{1}{2}\left(\frac{\ln S_{k,k}-\ln S_{k,k}^{\chi ^2}}{\sigma (\ln S_{k,k})}\right)^2\right].\!\!\!\!\!\!\nonumber\\ \end{eqnarray} (26)The prior on $$\mathcal {R}$$ is chosen to make sure that each correlation coefficient is uniformly distributed between −1 and 1 and $$\mathcal {R}$$ is positive definite. The formalism developed by Barnard et al. (2000) assumes that $$p(\Sigma |\mathcal {S})$$ is distributed according to an inverse Wishart distribution, with h = q + 1 degrees of freedom. The resulting prior on $$\mathcal {R}$$ is then   \begin{equation} p(\mathcal {R}) \propto |\mathcal {R}|^{(h-1)(q-1)/2-1}\left(\prod _{k=1}^q|\mathcal {R}_{(kk)}|\right)^{-h/2} \end{equation} (27)  \begin{equation} \phantom{p(\mathcal {R})}\propto |\mathcal {R}|^{q(q-1)/2-1}\left(\prod _{k=1}^q|\mathcal {R}_{(kk)}|\right)^{-(q+1)/2}, \end{equation} (28)where $$\mathcal {R}_{(kk)}$$ is the principal submatrix of order k, i.e. the matrix of elements $$R_{k_1,k_2}$$, with k1 = 1, …, k and k2 = 1, …, k. In the end, the prior distribution of the hyperparameters is   \begin{equation} p({\boldsymbol \mu })\times p(\Sigma) \propto p(\mathcal {S})\times p(\mathcal {R}). \end{equation} (29) Finally, the prior on the calibration offsets, $${\boldsymbol \delta }$$, is designed to reflect the absolute calibration uncertainties recommended by each instrument's team, with m × m covariance matrix, $$\mathcal {V}_{\rm cal}$$. Since the calibration factor $$(1+{\boldsymbol \delta })$$ cannot be zero, we draw the variable $${\boldsymbol \delta ^\prime }=\ln (1+{\boldsymbol \delta })$$. Noting that δj ≪ 1, we can assume that $$\delta _j^\prime \simeq \delta _j$$, and thus that $$\mathcal {V}_{\rm cal}$$ can be used as the covariance matrix of $${\boldsymbol \delta }^\prime$$. Similarly to the statistical error, we consider several types of distribution. The first case is a multivariate normal distribution:   \begin{equation} p({\boldsymbol \delta ^\prime }) \propto \exp \left(-\frac{1}{2}{\boldsymbol \delta ^\prime }^T\mathcal {V}_{\rm cal}^{-1} {\boldsymbol \delta ^\prime }\right). \end{equation} (30)The second case is a more robust multivariate Student's t distribution with f = 3 degrees of freedom:   \begin{equation} p({\boldsymbol \delta ^\prime }) \propto \left(1+\frac{1}{f}{\boldsymbol \delta ^\prime }^T\mathcal {V}_{\rm cal}^{-1} {\boldsymbol \delta ^\prime }\right)^{-(f+m)/2}. \end{equation} (31) 3.3 The numerical implementation Sampling the distribution of equation (21) is a numerical challenge, as its number of dimensions is   \begin{equation} N_{\rm dim} = \underbrace{n\times q}_{\rm parameters} + \underbrace{2\times q+q\times (q-1)/2}_{\rm hyperparameters} + \underbrace{m}_{\rm calibration}. \end{equation} (32)For a typical sample with n = 100 sources, observed through m = 11 photometric filters, modelled with q = 7 free parameters, the dimension is Ndim = 767. It is thus impractical to map the posterior on a regular Cartesian grid of parameters. 3.3.1 The Metropolis–Hastings move with ancillarity/sufficiency interweaving strategy The most popular way to sample the posterior distribution is to use a Markov Chain Monte Carlo (MCMC). This class of algorithm allows one to randomly draw variables from the posterior distribution. An MCMC is essentially a sequence of values of the set of parameters and hyperparameters. The number density of values in the chain scales with the probability density, i.e. more models are computed around the maximum likelihood, and few or none in regions of the parameter space where the solution is unlikely. It makes this method efficient, as the SED model (which can be numerically intensive) is computed only for relevant combinations of the parameters. MCMCs also make post-processing simple, as one can easily marginalize over any parameter, estimate moments, test hypotheses, etc. without having to perform multidimensional integrals. The MCMC sampler we have developed applies Gibbs sampling within the Metropolis–Hastings algorithm (MH; e.g. Geman & Geman 1984; Gelman et al. 2004; Press et al. 2007). This particular method consists of drawing each parameter, one by one, from their unidimensional conditional posterior distribution, fixing all the other parameters to their current value in the chain. However, as noted by Kelly (2011), calibration uncertainties introduce correlations within the MCMC, and thus require running very long chains to ensure convergence. The reason is that there is a degeneracy between the values of the calibration offsets, $${\boldsymbol \delta }$$, and the SED model parameters, $${\boldsymbol x}$$. To address this problem, Kelly (2011) demonstrated that the ancillarity–sufficiency interweaving strategy (ASIS; Yu & Meng 2011) could reduce the autocorrelation of the MCMC and thus obtain convergence towards the posterior distribution with a shorter chain. In simple terms, ASIS consists in inserting an extra step at each MCMC iteration. In this new step, we draw values of the parameters in a direction where $$(1+\delta _j)\times L_\nu ^{{\rm{mod}}}({\boldsymbol x}_i,\nu _j)$$ is constant, in order to decouple $${\boldsymbol \delta }$$ and $${\boldsymbol x}$$. Our code applies ASIS to all model parameters. At first, we set the initial values of the parameters, $${\boldsymbol x}_i$$, of the MCMC to the best-fitting values given by the least-squares fit (Section C). For the hyperparameters, we set $${\boldsymbol \mu }$$ and the diagonal elements of $$\mathcal {S}$$ to the mean and standard deviation of the least-squares parameters. The correlation coefficients (the non-diagonal elements of $$\mathcal {R}$$) are set to 0. Finally, the initial calibration offsets, $${\boldsymbol \delta }$$ are set to $${\boldsymbol 0}$$. We then iterate the following steps NMCMC times. We draw the calibration offsets, for each frequency νj, from   \begin{eqnarray} p&(&\!\delta _j|{\boldsymbol \delta }_{j^\prime \ne j}, {\boldsymbol x}_1,\ldots {\boldsymbol x}_n, L_{\nu ,1,j}^{{\rm{obs}}},\ldots ,L_{\nu ,n,j}^{{{\rm obs}}}) \nonumber\\ &&\propto p({\boldsymbol \delta }) \times \prod _{i=1}^n p(L_{\nu ,i,j}^{{\rm{obs}}}|{\boldsymbol x}_i,\delta _j). \end{eqnarray} (33)If $$\mathcal {V_{\rm cal}}$$ is not diagonal (correlated calibration uncertainties), then all of the values of $${\boldsymbol \delta }$$ contribute to every frequency, νj. Otherwise, only δj contributes. For each source, si, we draw each parameter, xi, k, from   \begin{eqnarray} p&(&\!x_{i,k}|{\boldsymbol \delta },{\boldsymbol x}_{i,k^\prime \ne k},{\boldsymbol \mu }, \Sigma,{\boldsymbol L}_{\nu ,i}^{{\rm{obs}}}) \nonumber\\ &&\propto p({\boldsymbol x}_i|{\boldsymbol \mu },\Sigma) \times \prod _{j=1}^m p(L_{\nu ,i,j}^{{\rm{obs}}}|{\boldsymbol x}_i,\delta _j). \end{eqnarray} (34) We implement the component-wise interweaving strategy (CIS; Yu & Meng 2011). To do so, we iterate the following steps by looping on each parameter of index k΄. For each source, si, and each frequency, νj, we compute the new following variable:   \begin{equation} \tilde{\delta }_{i,j} = (1+\delta _j)\times L_\nu ^{{\rm{mod}}}({\boldsymbol x}_i,\nu _j). \end{equation} (35)In practice, $$L_\nu ^{{{\rm mod}}}({\boldsymbol x}_i,\nu _j)$$ can simply be the model component which is controlled by the parameter we are looping over. If the parameter is tied to another one, then its component needs to be added. Then, for one given source, $$s_{i^\prime }$$, we draw a new value of the physical parameter we are looping over, keeping $${\tilde{{\boldsymbol \delta}}}_{i^\prime }$$ constant:   \begin{eqnarray} p&(&\!x_{i^\prime ,k^\prime }|{\tilde{{\boldsymbol \delta}}}_{i^\prime }, x_{i^\prime ,k\ne k^\prime },{\boldsymbol \mu }, \Sigma )\propto \nonumber\\ &&p ({\boldsymbol x}_{i^\prime}|\mu, \Sigma )\times p\left({\boldsymbol \delta ^\prime }=\ln \frac{{\tilde{{\boldsymbol \delta}}}_{i^\prime }}{{\boldsymbol L}_\nu ^{{\rm{mod}}}({\boldsymbol x}_{i^\prime })}\right), \end{eqnarray} (36)where the second term of the right-hand side is simply the distribution of equation (30), replacing each $$\delta ^\prime _j$$ by $$\ln (\tilde{\delta }_{i^\prime ,j} /L_\nu ^{{\rm{mod}}}({\boldsymbol x}_{i^\prime },\nu _j))$$. In practice, we select a different i΄ at each MCMC cycle, to improve statistical mixing. Notice that the likelihood does not appear, as we sample in a direction where it is constant. That is the key of the success of ASIS. We then compute the values of the parameters of the remaining sources, $$x_{i\ne i^\prime ,k^\prime }$$. This is achieved by inverting equation (35) for an arbitrarily chosen frequency $$\nu _{j^\prime }$$ (we change j΄ at each MCMC cycle) and eliminating δj:   \begin{equation} x_{i\ne i^\prime ,k^\prime } = L_{\nu ,i,j^\prime ,k^\prime }^{{\rm{inv}}} \left(\frac{\tilde{\delta }_{i,j^\prime }}{\tilde{\delta }_{i^\prime ,j^\prime }}\times L_\nu ^{{\rm{mod}}}({\boldsymbol x}_{i^\prime },\nu _{j^\prime })\right), \end{equation} (37)where $$L_{\nu ,i,j,k}^{{\rm{inv}}}$$ is the model inverse function, i.e. it is the value of parameter xk corresponding to the monochromatic luminosity in the argument, at frequency νj, fixing the other parameters, $$x_{k\ne k^\prime }$$. Its numerical implementation is discussed in Appendix A3. We update the calibration offsets, for each frequency, νj, by solving equation (35):   \begin{equation} \delta _j = \frac{\tilde{\delta }_{i^\prime ,j}}{L_\nu ^{{\rm{mod}}}({\boldsymbol x}_{i^\prime },\nu _j)}-1. \end{equation} (38) The prior on $${\boldsymbol \mu }$$ being uniform, we draw the μk elements, one by one, from   \begin{equation} p(\mu _k|\mu _{k^\prime \ne k},{\boldsymbol x}_1,\ldots ,{\boldsymbol x}_n, \Sigma) \propto \prod _{i=1}^np({\boldsymbol x}_i|{\boldsymbol \mu }, \Sigma). \end{equation} (39) The standard deviations, Sk, k, are drawn, one by one, from   \begin{equation} p(S_{k,k}|S_{k^\prime \ne k,k^\prime \ne k}, {\boldsymbol x}_1,\ldots ,{\boldsymbol x}_n,\mathcal {R}) \propto p(\mathcal {S})\times \prod _{i=1}^n p({\boldsymbol x}_i|{\boldsymbol \mu }, \Sigma). \end{equation} (40) Finally, the elements of the correlation matrix, $$\mathcal {R}$$, are drawn, one by one, from   \begin{equation} p(R_{k_1,k_2}|R_{k_1^\prime \ne k_1,k_2^\prime \ne k_2},{\boldsymbol x}_i,\mathcal {S}) \propto p(\mathcal {R})\times \prod _{i=1}^n p({\boldsymbol x}_i|{\boldsymbol \mu }, \Sigma). \end{equation} (41) 3.3.2 Assessing optimal sampling We derive several quantities in order to assess the convergence of the MCMC. The autocorrelation function (ACF) of a parameter, x, is, for a lag,3k:   \begin{equation} \rho _x(k) = \frac{{\rm cov}_t(x^{(t)},x^{(t+k)})}{\sqrt{{\rm var}_t(x^{(t)}) {\rm var}_t(x^{(t+k)})}}, \end{equation} (42)where x(t) is the value of parameter x at the step t of the MCMC. The notations vart and covt indicate that the variance and covariance are taken over the index t. We numerically evaluate it, using the FFT of the MCMC. From the ACF, we can estimate the integrated autocorrelation time (e.g. Foreman-Mackey et al. 2013):   \begin{equation} \tau _{\rm int}(x) = 1 + 2\sum _{k=1}^{N_{\rm MCMC}}\rho _x(k), \end{equation} (43)following the method of Sokal (1997). It quantifies the length after which the drawings are truly independent. The effective sample size, defined as   \begin{equation} N_{\rm eff}(x) = \frac{N_{\rm MCMC}}{\tau _{\rm int}(x)} \end{equation} (44)provides an estimate of the number of effective independent draws. We let our MCMC run until Neff > 30, at least, for each parameter. 3.3.3 Derived quantities Once the MCMC has been computed, we estimate a few quantities characterizing the parameter distribution. For each parameter and hyperparameter, y, we derive its mean and standard deviation, 〈y〉 and σy, marginalizing over the other parameters. This is technically achieved by taking the average and standard deviation of the MCMC of the parameter. Similarly, we can estimate any other quantity (correlation coefficients, degree of confidence, etc.), by computing this quantity at each MCMC step, and estimating the average and standard deviation of the resulting distribution of values. 4 EFFECTS OF THE NOISE, SAMPLE SIZE, AND SED SHAPE In this section and in Sections 5 and 6, we dissect several tests aimed at assessing the performance of our HB code. These tests are all performed on simulated data, so that one can compare the results of the model to the true values of the parameters. In the present article, we exclusively simulate data with the same assumptions as in the HB model. These assumptions are the followings. The noise: Both the distribution (normal, asymmetric, etc.; see Section 3.2.3) and the amplitude of the noise have an effect on the results. In this paper, we exclusively use normal, uncorrelated noise and assume that we perfectly know its amplitude. The physical model: We simulate SEDs with the same combination of components (see Section 2.2) as in the HB method. In Sections 4 and 5, we use only the combination of the powerU (Section 2.2.5) and starBB (Section 2.2.6) components. This combination is indeed one of the most relevant, when modelling the near-IR-to-submm emission of interstellar regions. We use the grain properties of the AC model of Galliano et al. (2011). We demonstrate the model performances with several other components in Section 6. The prior distribution: The assumed shape of the prior (equation 25) also has an impact on low signal-to-noise ratio sources. In what follows, we draw parameter values from the same distribution as in equation (25). 4.1 The simulation grid We start by studying the combined effects of signal-to-noise ratio, sample size, and SED shape on the performance of the HB method. To do so, we simulate a grid of SED samples, varying these three quantities. We simulate three different classes of SED shapes, labelled cold, warm, and hot. The physical parameters of each simulation are drawn from the distribution of equation (25) with the hyperparameters listed in Table 1. These SEDs are shown in Fig. 2. We assume that these SEDs are observed through a typical collection of photometric filters: the four Spitzer/IRAC bands (3.6, 4.5, 5.8, and 8 μm), the Spitzer/MIPS band at 24 μm, the three Herschel/PACS bands (70, 100, and 160 μm) and the three Herschel/SPIRE bands (250, 350, and 500 μm). For each SED shape, we generate three samples of n = 10, 100, and 1000 sources. Figure 2. View largeDownload slide Classes of simulated SEDs. The solid lines present the mean of the simulated SEDs of Table 1. The dashed lines show the ±1σ ranges. For each class, we quote the average starlight intensity, $$\bar{U}$$, of the mean, and the approximate equilibrium grain temperature, Td. The latter is derived from $$T_{\rm d}\simeq \bar{U}^{1/6}\times 18\,\rm K$$. We have displayed the transmission of each filter used (grey densities) and labelled its nominal wavelength, in microns. Figure 2. View largeDownload slide Classes of simulated SEDs. The solid lines present the mean of the simulated SEDs of Table 1. The dashed lines show the ±1σ ranges. For each class, we quote the average starlight intensity, $$\bar{U}$$, of the mean, and the approximate equilibrium grain temperature, Td. The latter is derived from $$T_{\rm d}\simeq \bar{U}^{1/6}\times 18\,\rm K$$. We have displayed the transmission of each filter used (grey densities) and labelled its nominal wavelength, in microns. Table 1. True hyperparameters for the three SED shapes of Section 4.1. These values are the elements of $${\boldsymbol \mu }$$, $$\mathcal {S}$$, and $$\mathcal {R}$$ of the distribution in equation (25). They correspond to the parameters described in Sections 2.2.5 and 2.2.6. M is in M⊙ and L⋆ in L⊙.   Cold  Warm  Hot  μ[ln M]  0  μ[ln U−]  ln 0.3  ln 10  ln 100  μ[ln ΔU]  ln 10  ln 300  ln 104  μ[α]  1.8  μ[qPAH]  0.06  0.04  0.02  μ[f+]  0.5  μ[ln L⋆]  ln 103  ln 104  ln 105  S[ln M]  0.5  S[ln U−]  0.4  S[ln ΔU]  0.5  S[α]  0.3  S[qPAH]  0.01  S[f+]  0.1  S[ln L⋆]  0.1  R[all]  0    Cold  Warm  Hot  μ[ln M]  0  μ[ln U−]  ln 0.3  ln 10  ln 100  μ[ln ΔU]  ln 10  ln 300  ln 104  μ[α]  1.8  μ[qPAH]  0.06  0.04  0.02  μ[f+]  0.5  μ[ln L⋆]  ln 103  ln 104  ln 105  S[ln M]  0.5  S[ln U−]  0.4  S[ln ΔU]  0.5  S[α]  0.3  S[qPAH]  0.01  S[f+]  0.1  S[ln L⋆]  0.1  R[all]  0  View Large We add random deviations to the simulated SED samples. These deviations are divided in the two following categories. Calibration offsets, δj, are drawn from the distribution of equation (30), keeping the same values for each source in the sample. However, we draw different offsets for each sample, in order to average out biases that could result from a particular realization. Noise deviations, εi, j, are drawn from a normal distribution (equation 22). These variables are independent. We assume that the absolute noise level is the same in each source of a given sample. This is to mimic observations of spatially resolved regions, where the RMS per waveband is roughly constant. We set the noise uncertainty at a frequency, νj, proportional to the simulated median of the monochromatic luminosities of all the sources, si:   \begin{equation} \sigma _{\nu ,j} = \frac{{\rm med}\left(L_\nu ^{{\rm{mod}}}({\boldsymbol x}_i,\nu _j)\right)}{f_{\rm S/N}}. \end{equation} (45)For each SED shape and sample size, we simulate three realizations of the noise with median signal-to-noise ratios, fS/N = 0.3, 3, and 30. Samples with fS/N = 0.3 are dominated by the noise, while samples with fS/N = 30 are dominated by the calibration errors. In total, we have 33 = 27 simulations. 4.2 Dissection of a model's results To start, we analyse in details the central run in the simulation grid (warm SED, with n = 100 and fS/N = 3), in order to demonstrate how the model works, on a concrete example. 4.2.1 The MCMC In Fig. 3, we show the first 2000 steps of the MCMC of the parameter ln M, for the brightest source in the sample. The distribution of values sampled by the chain, shown in the right-hand panel of Fig. 3, is the marginal posterior distribution of the parameter. Notice that the fluctuations of the chain are not independent. There are structures, like the one highlighted in green, having a length of the order of the integrated autocorrelation time (τint; equation 43). We have chosen an example with a particularly short autocorrelation time, for clarity. However, τint can reach up to ≃105 for some parameters (see Section 4.4). We have also highlighted the burn-in phase, in red. This phase corresponds to the time spent by the MCMC to walk from the initial condition to the region of relevant likelihood. It is advised to exclude this part of the chain from the analysis. Using the least-squares best-fitting parameters as initial conditions (Section 3.3.1), we have not witnessed any particularly long burn-in. It is usually of the order of τint. Figure 3. View largeDownload slide MCMCof a parameter. The parameter is ln M for the brightest source in the central simulation of Section 4.1. The right-hand panel shows the corresponding distribution of the parameter, and its derived average and standard deviation. We highlight the burn-in phase (in red) and a typical autocorrelation time (in green). For clarity, we show only the first 2000 steps. Figure 3. View largeDownload slide MCMCof a parameter. The parameter is ln M for the brightest source in the central simulation of Section 4.1. The right-hand panel shows the corresponding distribution of the parameter, and its derived average and standard deviation. We highlight the burn-in phase (in red) and a typical autocorrelation time (in green). For clarity, we show only the first 2000 steps. The autocorrelation of the chain can be quantified. Fig. 4 shows the ACF (equation 42) for two hyperparameters of the central simulation. These ACFs all have the same qualitative behaviour. They start around 1, at small lags. They then drop towards 0 in a time that is comparable to τint. Finally, for large lags, they oscillate around 0 with a small amplitude. In Fig. 4, we compare the ACFs and integrated autocorrelation times obtained with (i) our full method including ASIS (Section 3.3.1), in blue; and (ii) standard Gibbs sampling, switching off ASIS, in red. We can see that, using ASIS, τint is reduced by a factor of ≃6–9, in this particular case. It illustrates that the implementation of ASIS can help reach convergence with a significantly shorter chain. Figure 4. View largeDownload slide Autocorrelation functions (equation 42) for two hyperparameters. The two hyperparameters are the averages of ln M and ln U− (Section 2.2.5), for the central simulation of Section 4.1. The blue curves correspond to the ACFs obtained with our full method, including ASIS (Section 3.3.1). The red curves correspond to the same ACFs, but switching off ASIS. Figure 4. View largeDownload slide Autocorrelation functions (equation 42) for two hyperparameters. The two hyperparameters are the averages of ln M and ln U− (Section 2.2.5), for the central simulation of Section 4.1. The blue curves correspond to the ACFs obtained with our full method, including ASIS (Section 3.3.1). The red curves correspond to the same ACFs, but switching off ASIS. 4.2.2 The SEDs Fig. 5 shows examples of SED fits for the central simulation. Panel (a) shows the SED of the brightest pixel and panel (b), the faintest. The SED probability density is simply the distribution of SED models, computed with the values of the parameters, at each step in the MCMC. Obviously, the SED density is more dispersed for the low signal-to-noise ratio source. We see that, in both cases, the HB SED density is in better agreement with the true SED (in red) than the least-squares fit (in green). We also notice that the PAH fraction of the χ2, in panel (b), has been clearly underestimated, while the HB model is close to its true value. Panel (c) shows the calibration offsets ($${\boldsymbol \delta }$$). The simulated offsets are shown in red. We emphasize that they are common to all the sources in the sample. We can see that the inferred values of $${\boldsymbol \delta }$$ (in blue) are consistent with the true values. Figure 5. View largeDownload slide SED fits of two sources. The two sources are the brightest (panel a) and the faintest (panel b) of the central simulation of Section 4.1. The circle with error bars are the synthetic observations. Upper limits are quoted at 3σ. The blue-to-black density shows the HB probability distribution of the SED. The red line shows the true SED (without noise). The green lines are the least-squares fit, for comparison. For each SED, we quote the median (over frequencies) of the signal-to-noise ratio (medν[S/N]). Panel (c) shows the calibration offsets $${\boldsymbol \delta }$$ (common to both SEDs). The red dots are the true offsets, and the blue circle with error bars are the inferred values. Figure 5. View largeDownload slide SED fits of two sources. The two sources are the brightest (panel a) and the faintest (panel b) of the central simulation of Section 4.1. The circle with error bars are the synthetic observations. Upper limits are quoted at 3σ. The blue-to-black density shows the HB probability distribution of the SED. The red line shows the true SED (without noise). The green lines are the least-squares fit, for comparison. For each SED, we quote the median (over frequencies) of the signal-to-noise ratio (medν[S/N]). Panel (c) shows the calibration offsets $${\boldsymbol \delta }$$ (common to both SEDs). The red dots are the true offsets, and the blue circle with error bars are the inferred values. 4.2.3 The derived parameters Fig. 6 compares the performances of different methods, applied to the central simulation. It shows the derived correlation between two of the main parameters: the dust mass, M, and the average starlight intensity, $$\bar{U}$$ (equation 10). The true values are shown in red, in each panel. Notice that there is no intrinsic correlation between the two parameters. However, the least-squares fit, in panel (a), shows a clear negative correlation, with a significant scatter. This is the equivalent of the noise-induced β–T negative correlation, for MBBs, demonstrated by Shetty et al. (2009). In panel (b), we display the non-HB results. The stars and error ellipses show the means and covariances of the posterior distribution (equation 17), including the calibration offsets in the likelihood and its prior distribution. The inferred values show a significant scatter and a negative correlation between the two parameters. Notice that the uncertainties are very different between panels (a) and (b). The Bayesian error ellipses are more rigorous, as they are directly derived from the actual shape of the likelihood, and not from a parabolic approximation, in the χ2 case (Appendix C). Panel (b) demonstrates that, although the non-HB method provides an accurate description of the likelihoods of each source, it does not help to significantly reduce the scatter and eliminate false correlations. Finally, panel (c) shows the HB results (in blue). The scatter is considerably reduced, compared to the previous cases. In addition, there is no more false correlation between the parameters. The sizes of the error ellipses have also been greatly reduced, especially for the lowest signal-to-noise ratio sources. However, notice that these uncertainties are still consistent with the true values. Panel (c) of Fig. 6 demonstrates, on a concrete case, the effect that panel (c) of Fig. 1 was trying to illustrate: the reduction of the dispersion of low signal-to-noise ratio sources by the prior. Panel (a) of Fig. 7 shows the non-HB posterior distribution (green density), for the brightest and faintest sources of the simulation, in the same parameter space as Fig. 6. The PDF of the faintest source clearly extends out of the range of the figure, as its SED has a median signal-to-noise ratio of only 0.66 (panel b of Fig. 5). Panel (b) of Fig. 7 shows the HB posteriors, for the same sources. Comparing the two panels, we see that the PDF of the brightest source is almost not modified by the prior. On the contrary, the PDF of the faintest source is brought back towards its true value. Comparing this PDF to its mean value and error ellipse, we see it is noticeably skewed. Figure 6. View largeDownload slide Efficiency of different methods at recovering parameters. In each panel, we show the true (in red) and the inferred values (stars with error ellipses) of the mass, M, and average starlight intensity, $$\bar{U}$$, for the central simulation of Section 4.1. Panel (a) shows the least-squares results (Appendix C; in green). Panel (b) shows the non-HB inference (in orange). Panel (c) shows the HB results (in blue). Figure 6. View largeDownload slide Efficiency of different methods at recovering parameters. In each panel, we show the true (in red) and the inferred values (stars with error ellipses) of the mass, M, and average starlight intensity, $$\bar{U}$$, for the central simulation of Section 4.1. Panel (a) shows the least-squares results (Appendix C; in green). Panel (b) shows the non-HB inference (in orange). Panel (c) shows the HB results (in blue). Figure 7. View largeDownload slide Posterior distributions of two sources. These are the brightest and faintest sources of the central simulation of Section 4.1. We show the same correlation between M and $$\bar{U}$$ as in Fig. 6, and keep the same dynamic range. Panel (a) shows the non-HB values (green density), corresponding to panel (b) of Fig. 6. Panel (b) shows the HB values (blue density), corresponding to panel (c) of Fig. 6. In each panel, the grey stars are the averages of the parameters over the posterior, and the grey ellipses are their covariances. The true values are shown in red, in both panels. Figure 7. View largeDownload slide Posterior distributions of two sources. These are the brightest and faintest sources of the central simulation of Section 4.1. We show the same correlation between M and $$\bar{U}$$ as in Fig. 6, and keep the same dynamic range. Panel (a) shows the non-HB values (green density), corresponding to panel (b) of Fig. 6. Panel (b) shows the HB values (blue density), corresponding to panel (c) of Fig. 6. In each panel, the grey stars are the averages of the parameters over the posterior, and the grey ellipses are their covariances. The true values are shown in red, in both panels. 4.3 The role of the prior distribution To illustrate the role of the prior distribution, Fig. 8 shows the results of the HB code on three of the simulations of Section 4.1: a warm SED, with n = 1000 sources. The three panels show the relation between the average starlight intensity, $$\bar{U}$$, and the PAH mass fraction, qPAH. In panel (a), the median signal-to-noise ratio (equation 45) is high (fS/N = 30). As a consequence, the parameters of each source are well constrained. The uncertainty ellipses have a characteristic size much smaller than the width of the distribution of parameters. The typical uncertainty in qPAH is indeed $$\sigma _{q^{{{\rm PAH}}}}\simeq 2\times 10^{-3}$$, while the standard deviation of the distribution along qPAH is S[qPAH] ≃ 0.01. The prior distribution is thus rather flat compared to the likelihood of an individual source. Therefore, the multiplication of the likelihood by the prior (equation 19) does not have a significant effect. As a result, the posterior distribution is close to the non-hierarchical case (e.g. Fig. 7). Figure 8. View largeDownload slide Demonstration of the role of the prior. The three panels show the results for the simulations of Section 4.1, with a warm SED, n = 1000 sources, and with median signal-to-noise ratios fS/N = 30 (panel a), fS/N = 3 (panel b), and fS/N = 0.3 (panel c). The red dots show the true values of the average starlight intensity, $$\bar{U}$$, and of the PAH mass fraction, qPAH. The blue stars and their error ellipses are the HB inference. Figure 8. View largeDownload slide Demonstration of the role of the prior. The three panels show the results for the simulations of Section 4.1, with a warm SED, n = 1000 sources, and with median signal-to-noise ratios fS/N = 30 (panel a), fS/N = 3 (panel b), and fS/N = 0.3 (panel c). The red dots show the true values of the average starlight intensity, $$\bar{U}$$, and of the PAH mass fraction, qPAH. The blue stars and their error ellipses are the HB inference. In contrast, when the median signal-to-noise ratio decreases (panels b and c of Fig. 8), the width of the parameter distribution is unchanged (S[qPAH] ≃ 0.01 for each panel), but the uncertainty on the parameters of each source increases. In panel (b), with fS/N = 3, the two quantities are roughly equal. The multiplication of the likelihood by the prior thus has an effect on the posterior distributions. In particular, the sources at low $$\bar{U}$$ have a lower signal-to-noise ratio. Their mean values (blue stars) tend to delineate the shape of the prior distribution. The size of their uncertainty ellipses is also reduced by the prior. In panel (c), with fS/N = 0.3, the signal-to-noise ratio is so low that the individual likelihoods are much larger than the prior. As a result, the posterior distribution is very close to the prior distribution. Consequently, all the mean values of qPAH (blue stars) are almost perfectly equal to the prior's average ($$\langle q^{{\rm{PAH}}}_i\rangle \simeq \mu [q^{{\rm{PAH}}}]$$). The uncertainty ellipses of the posterior distribution have the width of the prior distribution ($$\sigma _{q^{{\rm{PAH}}}_i}\simeq S[q^{{\rm{PAH}}}]$$). However, notice that along the horizontal axis, the parameter $$\bar{U}$$, which is better constrained, still exhibits an intrinsic distribution of values. If we were to decrease even more the signal-to-noise ratio, the blue stars would all collapse on to one single point in the panel with coordinates $$(\mu [\bar{U}],\mu [q^{{\rm{PAH}}}])$$. The HB method is particularly useful in cases like panel (c) of Fig. 8. It is in such a case that it provides results significantly better than non-HB and least-squares methods. In panel (c), the low signal-to-noise ratio prevents performing any relevant analysis of individual sources. However, the fact that the HB method deals with the whole probability distribution of the sample, allows us to recover average properties. In other words, any constraint, even with an extremely low signal-to-noise ratio, is taken into account in the HB approach. 4.4 Systematic analysis of the model's performances After scrutinizing select model's results, let us now study the performances of the HB method over the whole simulation grid of Section 4.1. In particular, we need to understand how close the derived parameters are from their true values. We are also interested in knowing how the HB method improves the results, compared to the χ2 fit. For that purpose, we define the following metric, for each parameter and hyperparameter, y:   \begin{equation} D[y] =\left\{ \begin{array}{lc} \displaystyle\frac{\left\langle y^{\rm{HB}}\right\rangle -y^{{\rm true}}}{\sigma _y^{{\rm {HB}}}} &\quad{\rm{for\, the\, HB\, case,}} \\ \displaystyle\frac{y_{\chi ^2}-y^{\rm{true}}}{\sigma _y^{\rm{HB}}} & \,\,\,{\rm for\, the\, }\chi ^2{\rm case,} \end{array} \right. \end{equation} (46)where 〈yHB〉 and $$\sigma _y^{{{HB}}}$$ are the mean and standard deviation over the MCMC of y, $$y_{\chi ^2}$$, the least-squares value, and ytrue, the true value. With this definition, we can study the relative deviation of the HB values, from their true values: a value |D[y]| ≤ Nσ means that the HB value is consistent within Nσ. In addition, we can directly compare the HB and χ2 deviations, as they have the same denominator. 4.4.1 Performances for the hyperparameters Fig. 9 compares the distributions of D[y] for all of the hyperparameters of the simulation grid (Section 4.1). Panel (a) shows the distribution of the recovered means, 〈μk〉 (Section 3.2.4). Since there are 27 models, with 7 parameters per model (Table 1), this distribution contains a total of 7 × 27 = 189 values. Similarly, panel (b) shows the distribution of the standard deviations, 〈Sk, k〉 (diagonal elements of $$\mathcal {S}$$; Section 3.2.4; 189 values). Panel (c) shows the distribution of the non-diagonal elements of the correlation matrix, $$\langle R_{k,k^\prime }\rangle$$ (Section 3.2.4; 7(7 − 1)/2 × 27 = 567 values). Panel (d) shows the calibration offsets, 〈ln (1 + δj)〉. There are 11 photometric filters (Section 4.1), thus this distribution contains 11 × 27 = 297 values. Figure 9. View largeDownload slide Recovery performances for the hyperparameters and calibration offsets. The four panels show the distribution of the relative deviation D[y] (equation 46), for the hyperparameters of all the simulations of Section 4.1. We have separated the hyperparameters by type: panel (a) shows the distribution for all the means, μk; panel (b), all the standard deviations, Sk, k; panel (c), all the correlation coefficients, $$R_{k,k^\prime }$$; and panel (d), all the calibration offsets, ln (1 + δj). For the χ2 method, we quote the mean relative residuals, as the calibration offsets. The blue histograms correspond to the HB values, while the red histograms represent the χ2 results. We have highlighted the 1σ (in dark grey) and the 3σ (in light grey) ranges. Figure 9. View largeDownload slide Recovery performances for the hyperparameters and calibration offsets. The four panels show the distribution of the relative deviation D[y] (equation 46), for the hyperparameters of all the simulations of Section 4.1. We have separated the hyperparameters by type: panel (a) shows the distribution for all the means, μk; panel (b), all the standard deviations, Sk, k; panel (c), all the correlation coefficients, $$R_{k,k^\prime }$$; and panel (d), all the calibration offsets, ln (1 + δj). For the χ2 method, we quote the mean relative residuals, as the calibration offsets. The blue histograms correspond to the HB values, while the red histograms represent the χ2 results. We have highlighted the 1σ (in dark grey) and the 3σ (in light grey) ranges. We can first note that the recovered values are tightly distributed around the true values. Most of the values of the hyperparameters are within 3σ of their true values. The quantity having the widest distribution is the standard deviation Sk, k. Secondly, comparing the HB to the χ2 results, we can see that the HB method systematically improves the results. In particular, the standard deviations, Sk, k, and correlation coefficients, $$R_{k,k^\prime }$$, are the quantities for which the improvement is the most notable. The χ2 distribution of Sk, k is clearly skewed towards positive values. It is due to the fact that the χ2 method always leads to more dispersed parameter distributions (e.g. Fig. 6, panel a). Table 2 quantifies the main properties of these histograms. The fraction of outliers (>3σ; column 4) is consistent with a Gaussian distribution, except for the elements of $$\mathcal {S}$$. The Sk, k values are notably more spread out. They also have the most skewed distribution (largest absolute 〈DHB[y]〉; column 2). Inspecting these histograms, we notice that most of the outliers correspond to the PAH charge fraction, f+ (equation 6). It is indeed the most degenerate parameter, with the collection of photometric filters we have chosen. This parameter controls mainly the 3.3, 11.2, and 12.7 μm PAH features. In our simulation grid, the only constraint on these bands is provided by the IRAC3.6 μm band, as can be seen in Fig. 2. However, this photometric band also constrains the stellar continuum, L⋆, and the calibration offset, $$\delta _{\rm {IRAC}_{\rm 3.6\,\mu m}}$$. In addition, the 3.3 μm-to-continuum ratio is often very low (e.g. panel b of Fig. 5). The value of f+ is thus very poorly constrained. All things considered, it is remarkable that the values of this parameter are properly recovered, in most of the cases. Table 2. Statistics of the recovery performances. For each parameter and hyperparameter, y, we quote the following properties of the histograms presented in Figs 9 and 10. (2) 〈DHB[y]〉 is the average of the distribution. (3,4) f(|DHB[y]| ≤ N) is the fraction of absolute values below N. (5) Max(|DHB[y]|) is the maximum deviation, in number of σ. (6) Finally, $${\rm med}(|D_{\chi ^2}/D^{{\rm{HB}}}[y]|)$$ is the median of the ratio between the χ2 and HB absolute deviations. The latter quantifies by how much the parameter recovery has been improved. The last two lines show the corresponding confidence levels of two common probability distributions: a Gaussian and a Student's t distribution with f = 3 degrees of freedom. (1)  (2)  (3)  (4)  (5)  (6)  y  〈DHB[y]〉  f(|DHB[y]| ≤ 1)  f(|DHB[y]| ≤ 3)  Max(|DHB[y]|)  $${\rm med}(|D_{\chi ^2}/D^{{\rm{HB}}}[y]|)$$  μk  −0.13  79.9  per cent  99.5  per cent  3.8  1.7  Sk, k  −0.52  56.1  per cent  91.5  per cent  5.8  9.0  $$R_{k,k^\prime }$$  0.0069  95.4  per cent  100.0  per cent  2.2  8.4  ln (1 + δj)  0.027  69.4  per cent  99.3  per cent  4.2  2.4  ln Mi  0.16  76.6  per cent  99.8  per cent  4.8  2.3  $$\ln \bar{U}_i$$  −0.29  78.6  per cent  99.9  per cent  5.0  2.7  $$q^{{\rm{PAH}}}_i$$  0.46  60.4  per cent  97.8  per cent  5.5  2.2  $$f^+_i$$  −0.072  68.2  per cent  99.2  per cent  5.0  3.6  Common probability distributions  Gaussian  0  68.3  per cent  99.7  per cent  –  –  Student's t (f = 3)  0  60.9  per cent  94.2  per cent  –  –  (1)  (2)  (3)  (4)  (5)  (6)  y  〈DHB[y]〉  f(|DHB[y]| ≤ 1)  f(|DHB[y]| ≤ 3)  Max(|DHB[y]|)  $${\rm med}(|D_{\chi ^2}/D^{{\rm{HB}}}[y]|)$$  μk  −0.13  79.9  per cent  99.5  per cent  3.8  1.7  Sk, k  −0.52  56.1  per cent  91.5  per cent  5.8  9.0  $$R_{k,k^\prime }$$  0.0069  95.4  per cent  100.0  per cent  2.2  8.4  ln (1 + δj)  0.027  69.4  per cent  99.3  per cent  4.2  2.4  ln Mi  0.16  76.6  per cent  99.8  per cent  4.8  2.3  $$\ln \bar{U}_i$$  −0.29  78.6  per cent  99.9  per cent  5.0  2.7  $$q^{{\rm{PAH}}}_i$$  0.46  60.4  per cent  97.8  per cent  5.5  2.2  $$f^+_i$$  −0.072  68.2  per cent  99.2  per cent  5.0  3.6  Common probability distributions  Gaussian  0  68.3  per cent  99.7  per cent  –  –  Student's t (f = 3)  0  60.9  per cent  94.2  per cent  –  –  View Large Column (6) of Table 2 shows that the standard deviations and correlation coefficients are recovered a factor ≃8 times better for the HB method than with the χ2. Quoting the median instead of the mean of the ratio in column (6) is conservative, as there are more outliers with the χ2 method. 4.4.2 Performances for the parameters Fig. 10 shows the distributions of D[y] for the four most physically relevant parameters (M, $$\bar{U}$$, qPAH, and f+). Each of these panels contains the values of the parameter for each region in the 27 simulations (Section 4.1). Since, there are nine simulations with 10 regions, nine with 100, and nine with 1000, the total number of points in each of these distributions is 9990. The HB distributions are tightly centred around 0. The two best constrained parameters, M and $$\bar{U}$$, have a 3σ degree of confidence (column 4 of Table 2) close to a Gaussian. The two other parameters, qPAH and f+, which are more degenerate, are still well constrained. They are, however, slightly more spread out, with a fraction of outliers between a Gaussian and a Student's t distribution. On average, there is no clear bias. After inspection, we do not find any clear trend of these residuals with signal-to-noise ratio, SED shape or sample size. Figure 10. View largeDownload slide Recovery performances for the main parameters. The four panels show the distribution of the relative deviation D[y] (equation 46), for the main parameters of each source in all the simulations of Section 4.1. We focus here on the most important parameters: panel (a) shows the distribution of the dust mass per source, ln Mi; panel (b), the average starlight intensity, $$\bar{U}_i$$ (equation 10); panel (c), PAH mass fraction, $$q^{{\rm{PAH}}}_i$$; and panel (d), the charge fraction of PAHs, $$f^+_i$$. The blue histograms correspond to the HB values, while the red histograms represent the χ2 results. We have highlighted the 1σ (in dark grey) and the 3σ (in light grey) ranges. Figure 10. View largeDownload slide Recovery performances for the main parameters. The four panels show the distribution of the relative deviation D[y] (equation 46), for the main parameters of each source in all the simulations of Section 4.1. We focus here on the most important parameters: panel (a) shows the distribution of the dust mass per source, ln Mi; panel (b), the average starlight intensity, $$\bar{U}_i$$ (equation 10); panel (c), PAH mass fraction, $$q^{{\rm{PAH}}}_i$$; and panel (d), the charge fraction of PAHs, $$f^+_i$$. The blue histograms correspond to the HB values, while the red histograms represent the χ2 results. We have highlighted the 1σ (in dark grey) and the 3σ (in light grey) ranges. The comparison to the χ2 distribution demonstrates that the parameters are recovered ≃2–3 times better with the HB method (column 6 of Table 2). In addition, the outliers are more numerous with the χ2 method, as the χ2 uncertainties are less reliable (Section 4.2.3). 4.5 The integrated autocorrelation times The integrated autocorrelation times, τint (equation 43), for each parameter and hyperparameters of the simulation grid are represented in Fig. 11. We can see that the calibration offsets and the elements of the covariance matrix converge faster than the averages and the parameters. The inspection of these distributions does not show any obvious trend of τint with signal-to-noise ratio, sample size or SED shape. Overall, the maximum time in the whole grid is τint ≃ 3 × 104. It means that, to make sure the MCMC has converged towards the stationary posterior, a chain of length of NMCMC ≃ 106, after the burn-in phase, is adequate. Figure 11. View largeDownload slide Integrated autocorrelation times for the simulation grid. This figure shows the distribution of τint (equation 43) for the distribution of individual parameters (panel a, red); the calibration offsets, ln (1 + δj) (panel a, green); the averages, μk (panel b, blue); and the elements of the covariance matrix, Sk, k and $$R_{k,k^\prime }$$ (panel b, purple). To build the histogram of the parameters (panel a), we have normalized the number of points by the number of sources, n, in order to give the same weight to each simulation. Figure 11. View largeDownload slide Integrated autocorrelation times for the simulation grid. This figure shows the distribution of τint (equation 43) for the distribution of individual parameters (panel a, red); the calibration offsets, ln (1 + δj) (panel a, green); the averages, μk (panel b, blue); and the elements of the covariance matrix, Sk, k and $$R_{k,k^\prime }$$ (panel b, purple). To build the histogram of the parameters (panel a), we have normalized the number of points by the number of sources, n, in order to give the same weight to each simulation. Knowing where the burn-in phase ends is however more difficult. One can, for example, run several parallel chains, with different initial conditions. In our case, since we are applying our code to simulated data, we know the value towards which each parameter should converge. It is thus possible to estimate if the burn-in phase is passed. Inspecting our results, we did not find any parameter where the burn-in phase lasts longer than a few τint. This reassuring feature seems to be the consequence of properly chosen initial conditions: the least-squares values (Section 3.3.1). Indeed, starting the chain with random initial conditions can lead to burn-in phases longer than 106. In this paper, we have excluded the first 105 steps of each MCMC to account for burn-in. 5 VARIATIONS ON THE SIMULATION GRID In this section, we demonstrate the performances of our HB code on additional effects, that were not covered by the simulation grid of Section 4.1. 5.1 The presence of intrinsic correlations The model grid studied in Section 4 was simulated with no intrinsic correlation between parameters (R[all] = 0; Table 1). The purpose was to stay as general as possible. However, our HB code is designed to efficiently recover correlation coefficients between parameters. To demonstrate its efficiency, we have simulated two samples, with n = 100 sources, and a median signal-to-noise ratio, fS/N = 3. The distribution of hyperparameters is based on the warm SED shape (Table 1), with the following modifications: S[ln U−] = 0.6, S[ln ΔU] = 0.7, S[qPAH] = 0.2, R[ln M, ln U−] = ς 0.5, R[ln M, ln ΔU] = ς 0.5, R[ln M, qPAH] = −ς 0.5, R[ln M, f+] = ς 0.5, R[ln U−, ln ΔU] = 0.5, R[ln U−, qPAH] = −0.5, R[ln U−, f+] = 0.5, R[ln ΔU, qPAH] = −0.5, R[ln ΔU, f+] = 0.5, and R[qPAH, f+] = −0.5. The parameter ς is set to ς = −1, to induce a negative correlation between M and $$\bar{U}$$, and to ς = 1, to induce a positive correlation. Fig. 12 shows the results of the χ2 and HB codes on these two simulations. As noted on the previous examples, the χ2 method leads to a more dispersed distribution of values (panels a and c). More interestingly here, we see that when the true correlation is negative (panel a; ρ = −0.55), the χ2 method finds a tighter correlation (ρ = −0.85); when the true correlation is positive (panel c; ρ = 0.62), the χ2 method finds a negative correlation (ρ = −0.68). Figure 12. View largeDownload slide Efficiency of the method in presence of intrinsic correlations. Each panel shows the relation between the dust mass, M, and the average starlight intensity, $$\bar{U}$$. The true values are the red dots. Panels (a) and (b) show the results applied to simulations with an intrinsic negative correlation. Panels (c) and (d) show the results with a positive correlation. The green stars and uncertainty ellipses, in panels (a) and (c), are the least-squares results. The blue stars and uncertainty ellipses, in panels (b) and (d), are the HB results. We quote, in each panel, the true and inferred values of the correlation coefficient, ρ, between ln M and $$\ln \bar{U}$$. Figure 12. View largeDownload slide Efficiency of the method in presence of intrinsic correlations. Each panel shows the relation between the dust mass, M, and the average starlight intensity, $$\bar{U}$$. The true values are the red dots. Panels (a) and (b) show the results applied to simulations with an intrinsic negative correlation. Panels (c) and (d) show the results with a positive correlation. The green stars and uncertainty ellipses, in panels (a) and (c), are the least-squares results. The blue stars and uncertainty ellipses, in panels (b) and (d), are the HB results. We quote, in each panel, the true and inferred values of the correlation coefficient, ρ, between ln M and $$\ln \bar{U}$$. In contrast, the HB method is able to consistently recover the correlation coefficients, whether the true correlation is negative (panel b) or positive (panel d). The quoted correlation coefficients, in panels b and d, for the HB method, have been derived directly from the MCMC, as explained in Section 3.3.3. 5.2 Effect of the wavelength coverage It is obvious that the wavelength coverage has an impact on the recovery of the parameters. However, the nature of this impact on the HB results is not trivial. We have generated two simulations to illustrate this effect. We start from the central simulation of Section 4.1 (warm SED, with n = 100 and fS/N = 3) and make the following modifications. Far-IR coverage: we remove the three SPIRE bands from the set of constraints. The longest wavelength constraint is thus moved from λ = 500 μm to λ = 160 μm. Mid-IR coverage: We add the following WISE and Akari/IRC bands: WISE3.4 μm, WISE4.6 μm, WISE11.6 μm, WISE22.1 μm, IRC3.2 μm, IRC4.1 μm, IRC7.0 μm, IRC9.0 μm, IRC11 μm, IRC15 μm, and IRC18 μm. Fig. 13 shows the results of the HB code, with reduced far-IR coverage. The parameters which are the most affected by this modification are those controlling the shape of the far-IR part of the SED, mainly M and $$\bar{U}$$. Comparing panels (a) and (c), we see that, without long wavelength constraints, the χ2 results are significantly more dispersed. The noise-induced correlation, discussed in Sections 4.2.3 and 5.1, is enhanced in panel (c). On the contrary, comparing panels (b) and (d), we see that the HB inference does not result in an increase of the dispersion, nor any false correlation. Most true values are within 1σ of their HB results. We note that decreasing the wavelength coverage has the same qualitative effect as decreasing the signal-to-noise ratio (see Section 4.3): the prior becomes dominant in the posterior distribution. Figure 13. View largeDownload slide Effect of the far-IR wavelength coverage. The four panels show the results of different methods, applied on the central simulation of Section 4.1 (warm SED, with n = 100 and fS/N = 3). In each panel, the red dots show the true values of the dust mass, M, and average starlight intensity, $$\bar{U}$$ equation (10). Panels (a) and (b) correspond to the case where the whole wavelength coverage is used. Panels (c) and (d) show the results obtained, excluding the three SPIRE bands, from the set of constraints. The green stars and uncertainty ellipses, in panels (a) and (c), are the least-squares results. The blue stars and uncertainty ellipses, in panels (b) and (d), are the HB results. Figure 13. View largeDownload slide Effect of the far-IR wavelength coverage. The four panels show the results of different methods, applied on the central simulation of Section 4.1 (warm SED, with n = 100 and fS/N = 3). In each panel, the red dots show the true values of the dust mass, M, and average starlight intensity, $$\bar{U}$$ equation (10). Panels (a) and (b) correspond to the case where the whole wavelength coverage is used. Panels (c) and (d) show the results obtained, excluding the three SPIRE bands, from the set of constraints. The green stars and uncertainty ellipses, in panels (a) and (c), are the least-squares results. The blue stars and uncertainty ellipses, in panels (b) and (d), are the HB results. The mid-IR coverage has an effect, mainly on the PAH mass and charge fractions, qPAH and f+. We discussed in Section 4.4.1 the fact that f+ was particularly poorly constrained. This can be seen in panel (a) of Fig. 14. Increasing the mid-IR wavelength coverage (panel b), this problem is solved and the distribution of parameters is now more accurately recovered. Figure 14. View largeDownload slide Effect of the mid-IR wavelength coverage. The two panels show the HB results, applied on the central simulation of Section 4.1 (warm SED, with n = 100 and fS/N = 3). In each panel, the red dots show the true values of the PAH mass fraction, qPAH, and PAH charge fraction, f+. The blue stars and uncertainty ellipses are the HB results. Panel (a) corresponds to the original coverage with Spitzer and Herschel photometry, only. Panel (b) shows the results increasing the mid-IR coverage, by adding WISE and Akari bands. Figure 14. View largeDownload slide Effect of the mid-IR wavelength coverage. The two panels show the HB results, applied on the central simulation of Section 4.1 (warm SED, with n = 100 and fS/N = 3). In each panel, the red dots show the true values of the PAH mass fraction, qPAH, and PAH charge fraction, f+. The blue stars and uncertainty ellipses are the HB results. Panel (a) corresponds to the original coverage with Spitzer and Herschel photometry, only. Panel (b) shows the results increasing the mid-IR coverage, by adding WISE and Akari bands. 5.3 The introduction of an external parameter As we start to see here, the HB method is particularly efficient at recovering intrinsic correlations between parameters of the SED model. However, the correlation of these SED parameters with other quantities (e.g. gas mass, metallicity, etc.) is also relevant. In order to treat these external parameters in the HB framework, we need to extend the prior distribution to them (see Appendix B3). Although these external parameters are not free parameters, their mean values are going to be modified consistently with their uncertainties. To demonstrate this process, we have drawn an SED sample and an associated distribution of gas mass Mgas (assuming a typical Galactic Mgas/Mdust ≃ 100). This sample has n = 300 sources and a median signal-to-noise ratio of fS/N = 0.3. We adopt the warm SED distribution (Section 4.1), with the following modifications: S[ln U−] = 0.6, S[ln ΔU] = 0.7, μ[ln Mgas] = ln (100 M⊙), S[ln Mgas] = 0.5, and R[ln Mdust, ln Mgas] = 0.9, where we have noted the dust mass Mdust to avoid confusion. We attribute a 10  per cent uncertainty to the gas mass. Fig. 15 compares the results obtained with and without prior extension. Panel (a) shows that, when Mgas is not in the prior, the correlation found is significantly weaker (ρ = 0.49 ± 0.04) than the intrinsic one (ρ = 0.9). Notice however that the inferred dust masses are consistent with the true values. The largest discrepancies (2–3σ) happen at low column density, where signal-to-noise ratio is low. Panel (b) presents the results with extension of the prior. The correlation between the two parameters is better recovered (ρ = 0.78 ± 0.04). The efficiency of extending the prior to external parameters is increased if the number of sources is larger. Figure 15. View largeDownload slide Extension of the prior to an external parameter. The two panels show the relation between the dust mass, noted Mdust, for clarity here, and the gas mass, Mgas, a parameter not controlling the SED model. The red dots are the true values in both panels. The blue stars and uncertainty bars/ellipses are the HB inference. Panel (a) shows the standard result, with the prior of equation (19): Mdust is derived from the HB code and Mgas is plotted as is. Panel (b) shows the result, introducing the gas mass as an extra parameter in the prior (see Appendix B3). We show error bars rather than ellipses in panel (a), because the two plotted quantities do not come from the same joint PDF. Figure 15. View largeDownload slide Extension of the prior to an external parameter. The two panels show the relation between the dust mass, noted Mdust, for clarity here, and the gas mass, Mgas, a parameter not controlling the SED model. The red dots are the true values in both panels. The blue stars and uncertainty bars/ellipses are the HB inference. Panel (a) shows the standard result, with the prior of equation (19): Mdust is derived from the HB code and Mgas is plotted as is. Panel (b) shows the result, introducing the gas mass as an extra parameter in the prior (see Appendix B3). We show error bars rather than ellipses in panel (a), because the two plotted quantities do not come from the same joint PDF. 6 APPLICATION TO OTHER PHYSICAL MODELS 6.1 Modified black bodies As discussed in Section 2.2.2, MBB is the most widely used dust model, due to its simplicity. The HB dust codes that have been previously presented in the literature all implement it (Kelly et al. 2012; Juvela et al. 2013; Veneziani et al. 2013; Marsh, Whitworth & Lomax 2015). In this section, we apply our HB code to one MBB simulation, in order to confirm its ability to correct the noise-induced negative correlation between T and β (Shetty et al. 2009). This simulation is designed to mimic typical Planck observations of the diffuse Galactic ISM (e.g. Planck Collaboration XVII 2014a; Ysard et al. 2015). We assume: μ[ln M] = 0, μ[ln T] = ln (20.5 K), μ[β] = 1.65, S[ln M] = 0.05, S[ln T] = 0.05, S[β] = 0.01, and R[all] = 0. We draw n = 1000 sources, observed through the following COBE and Planck photometric bands: DIRBE100 μm, DIRBE140 μm, DIRBE240 μm, HFI350 μm, HFI550 μm, and HFI850 μm. We set the median signal-to-noise ratio to fS/N = 10. The results are presented in Fig. 16. Panel (a) displays the least-squares values. Notice the well-known false β–T correlation. Panel (b) demonstrates that the HB method can indeed accurately correct this false correlation, as was also shown by Kelly et al. (2012). Paper II will present more MBB simulations to discuss robustness. Figure 16. View largeDownload slide Application to MBB models. The two panels represent the temperature and emissivity index of a simulation with n = 1000 sources having an MBB SED (Section 2.2.2). The red dots, in both panels, are the true values. The green stars and uncertainty ellipses, in panel (a), are the χ2 values. The blue stars and uncertainty ellipses, in panel (b), are the HB results. Figure 16. View largeDownload slide Application to MBB models. The two panels represent the temperature and emissivity index of a simulation with n = 1000 sources having an MBB SED (Section 2.2.2). The red dots, in both panels, are the true values. The green stars and uncertainty ellipses, in panel (a), are the χ2 values. The blue stars and uncertainty ellipses, in panel (b), are the HB results. 6.2 Broken emissivity modified black bodies BEMBB models (Section 2.2.3) have been used by Gordon et al. (2014) and Roman-Duval et al. (2014) to model the Magellanic clouds. To roughly mimic these data, we simulate a sample of n = 300 sources, observed through: PACS100 μm, PACS160 μm, SPIRE250 μm, SPIRE350 μm, and SPIRE500 μm. The parameters are distributed as: μ[ln M] = 0, S[ln M] = 0.05, μ[ln T] = ln (20.5 K), S[ln T] = 0.05, μ[β1] = 1.65, S[β1] = 0.01, μ[β2] = 1.65, S[β2] = 0.01, μ[ln νb] = ln (c/350 μm), S[ln νb] = 0.01, and R[all] = 0. We assume that the median signal-to-noise ratio is fS/N = 10. One of the derived quantities discussed by Gordon et al. (2014) is the submm excess, e500, defined as the relative difference between the actual emission at 500 μm and the emission of an MBB with the same temperature and β = β1:   \begin{equation} e_{500} = \frac{1-\nu _{500}^{\beta _1}}{\nu _{500}^{\beta _2}\nu _b^{\beta _1-\beta _2}}, \end{equation} (47)where ν500 = c/500 μm. Since μ[β1] = μ[β2], our simulation exhibits on average a zero excess. Fig. 17 presents the results. Panel (a) shows that the χ2 method is unable to recover any meaningful excess. It is due to the fact that the parameters of this model are extremely degenerate. On the contrary the HB approach is able to provide a tight distribution, consistent with the true values (panel b). Figure 17. View largeDownload slide Application to BEMBB models. The two panels represent the temperature and 500 μm excess, e500, of a simulation with n = 300 sources having a BEMBB SED (Section 2.2.3). The red dots, in both panels, are the true values. The green stars and uncertainty ellipses, in panel (a), are the χ2 values. The blue stars and uncertainty ellipses, in panel (b), are the HB results. Figure 17. View largeDownload slide Application to BEMBB models. The two panels represent the temperature and 500 μm excess, e500, of a simulation with n = 300 sources having a BEMBB SED (Section 2.2.3). The red dots, in both panels, are the true values. The green stars and uncertainty ellipses, in panel (a), are the χ2 values. The blue stars and uncertainty ellipses, in panel (b), are the HB results. 6.3 Multicomponent dust SED models The number of combinations of dust models is almost infinite. We end this section with a widely used one, introduced by Draine & Li (2007). It consists in the linear combination of a powerU (Section 2.2.5), a deltaU (Section 2.2.4), and a starBB (Section 2.2.6) component. It is aimed at approximating the multiphase nature of the ISM of galaxies, the deltaU component being attributed to diffuse ISM, and the powerU component to hotter PDRs. Several parameters are tied: $$U_{ {\it deltaU}} = U_-^{{\it powerU}}$$, $$q^{{\rm{PAH}}}_{\rm {\it deltaU}}=q^{{\rm{PAH}}}_{\rm {\it powerU}}$$, and $$f^+_{\rm {\it deltaU}}=f^+_{\rm {\it powerU}}$$. In addition, some parameters are fixed: ΔUpowerU = 106 and αpowerU = 2. This way, the powerU component represents dust hotter than the deltaU component and can account for the contribution of star-forming regions to the mid-IR emission of galaxies. The two PAH parameters are tied, as they would otherwise be degenerate. The mass fraction of the powerU component is   \begin{equation} \gamma = \frac{M_{{\it powerU}}}{M_{{\it powerU}}+M_{{\it deltaU}}}. \end{equation} (48)We have performed one simulation to illustrate the performances of the HB method with this model. We have drawn parameters from the warm SED distribution (Section 4.1), with n = 100 sources, and a median signal-to-noise ratio, fS/N = 1, with the following modifications: μ[ln MpowerU] = ln 0.01 and $$\mu [\ln U_-^{{\rm {\it powerU}}}]=\ln 1$$. We have added the deltaU component with μ[ln MdeltaU] = ln 1. The results are presented in Fig. 18. It shows the relation between the mass of the two components, Mtot = MpowerU + MdeltaU, and the parameter γ. As can be seen in panel (a), with the χ2 method, γ is completely degenerate. It spreads the whole range of values between 0 and 100  per cent. On the contrary, the HB method is able to significantly reduce the uncertainties and provide a relevant distribution of parameters, consistent with their true values. Figure 18. View largeDownload slide Application to a multiphase dust model. The two panels represent the total dust mass, Mtot, and the mass fraction of non-uniformly illuminated dust, γ (equation 48), of a simulation with n sources. The red dots, in both panels, are the true values. The green stars and uncertainty ellipses, in panel (a), are the χ2 values. The blue stars and uncertainty ellipses, in panel (b), are the HB results. Figure 18. View largeDownload slide Application to a multiphase dust model. The two panels represent the total dust mass, Mtot, and the mass fraction of non-uniformly illuminated dust, γ (equation 48), of a simulation with n sources. The red dots, in both panels, are the true values. The green stars and uncertainty ellipses, in panel (a), are the χ2 values. The blue stars and uncertainty ellipses, in panel (b), are the HB results. 7 SUMMARY AND CONCLUSION This is the first article in a series of two papers presenting a new model, HerBIE, aimed at deriving dust parameters from IR observations. The main originality of this model is to apply HB inference to full dust models, taking into account realistic optical properties, size distribution, stochastic heating, and distribution of starlight intensities, as well as colour correction and correlated calibration uncertainties. Simply put, the HB method consists of sampling the probability distribution of the physical parameters of an ensemble of sources. This distribution of parameters is modelled with a prior distribution controlled by hyperparameters. The inferred prior distribution does not significantly modify the PDF of sources for which the likelihood is much narrower than the dispersion of the distribution of parameters (Sections 4.2.3 and 4.3). However, the prior has an important effect on sources with low signal-to-noise ratios. We have described the formalism of our model and its numerical implementation (Sections 2–3). We have subsequently applied our model to synthetic observations, in order to quantify its performances (Sections 4–6). The main conclusions are the following. We have compared the performances of least-squares, non-hierarchical, and hierarchical Bayesian methods (Section 4.2.3). We have shown that, although the non-HB approach is better than least squares at estimating the uncertainties on derived parameters, it is not able to correct noise-induced correlations and scatter. On the contrary, the HB approach is particularly efficient at recovering true correlations between parameters and their intrinsic scatter (Sections 4.4 and 5.1). The HB inferred values are also consistently closer to their true values than the least-squares values (Section 4.4). We have systematically studied the performances of our model, varying signal-to-noise ratio, SED shape, and sample size (Section 4.4). We have shown that the recovered parameters are distributed symmetrically around their true values. We did not find any clear bias. The scatter of the best constrained parameters from their true values are close to a normal distribution, with almost 100  per cent of the distribution within 3σ (Section 4.4.2). Poorly constrained, degenerate, parameters are also consistently recovered, but with a fraction of outliers closer to a Student's t distribution. The HB method can reasonably compensate for a partial lack of spectral coverage (Section 5.2). We have demonstrated that one can easily include external parameters in the prior distribution in order to improve the recovery of a potential correlation between these external parameters and the dust properties (Section 5.3). We have applied our HB code with other physical models (different types of MBBs and multicomponent dust SED models; Section 6). We have demonstrated that it works well in these cases, too, and that it can help obtain better constraints on degenerate parameters. We have discussed the issues of convergence towards the stationary posterior. Our choice of starting the MCMC at the least-squares values appears to lead to relatively short burn-in phases. The integrated autocorrelation times of the simulations in this paper are all shorter than a few 104 (Section 4.5). Having implemented ancillarity–sufficiency interweaving strategy to all parameters, we have demonstrated that it significantly reduces the autocorrelation of the Markov chain (Section 4.2.1). The HB technique is quite general (e.g. Shahmoradi 2017). It has been successfully applied to several fields in astrophysics, among others: luminosity distribution of γ-ray bursts (Loredo & Wasserman 1998); supernova studies (Mandel et al. 2009); exoplanet eccentricities (Hogg, Myers & Bovy 2010); galaxy clusters (Andreon & Hurn 2010); Milky Way satellites (Martinez 2015); exoplanet mass–radius relationship (Wolfgang & Lopez 2015).4 We have demonstrated in this paper that it remains accurate over a wide range of sample sizes, source properties, signal-to-noise ratios, and number of model parameters. Such a method can in principle be applied to any field. However, as discussed in Section 4.3, it is relevant mainly when the typical uncertainty on a model parameter is comparable or larger than the typical dispersion of this parameter through the sample. If, on the contrary, the uncertainties on the parameters are much smaller than the dispersion of the sample, this method does not provide significant improvements compared to simpler techniques. The second article of this series (Paper II) will address the robustness of our code. For that purpose, we will apply it to data simulated with different hypotheses from the model (noise distribution, physical components, and distribution of parameters; see Section 4). In parallel, several up-coming papers are presenting the application of this model to real astrophysical data: the Magellanic clouds (Galliano 2017; Galliano et al., in preparation), the Carina nebula (Wu et al., in preparation), the dwarf galaxy IC 10 (Lianou et al., in preparation), and the DustPedia galaxy sample (Davies et al. 2017). Overall, observational dust studies need to emancipate themselves from dated techniques. Although a MBB least-squares fit can be useful in some cases, the development of modern instrumentation renders it more and more obsolete. More precise observations mean that we can access the complexity of the ISM conditions, beyond the simple isothermal approximation. In addition, more accurate fluxes call for an optimal way to extract relevant information from the data. With the model presented here, we have pursued this goal, hoping it will contribute to refining our understanding of dust evolution. Adopting such an approach represents a significant investment in terms of computation time and storage (Appendix D). However, this will naturally be facilitated in the near future by the increase of computational performances. Indeed, nowadays, the industry favours the development of large multicore platforms and Gibbs sampling can be reasonably well parallelized. Finally, this type of study would benefit from a better synergy between observers and instrumentalists. Ideally, the development of new instruments and their reduction pipelines should account for the progress in data analysis techniques, and reciprocally. In particular, the various sources of uncertainties, their correlations, their biases, etc. should be considered as important as the measured fluxes. Their accurate knowledge should be planned in the design of the new detectors, and thoroughly documented. This will be particularly relevant for future IR missions like SPICA (e.g. van der Tak et al. 2018). ACKNOWLEDGEMENTS I thank Jean-Philippe Bernard, Karl Gordon, Mika Juvela, Martin Kilbinger, and Kirill Tchernyshyov for fruitful discussions about Bayesian techniques and SED modelling, Anthony Jones for sharing his expertise on dust evolution, and Sébastien Fromang and Pierre-François Lavallée for coding advice. Sophia Lianou provided me with several bug reports, running earlier versions of the code. Finally, I thank Sacha Hony, Suzanne Madden, and Ronin Wu, as well as an anonymous referee, for providing me with useful comments that helped clarify the manuscript. I acknowledge support from the EU FP7 project DustPedia (Grant No. 606847), from the Agence Nationale pour la Recherche (ANR) through the program SYMPATICO (Programme Blanc, Projet ANR-11-BS56-0023) and through the PRC program 1311 between CNRS (France) and JSPS (Japan). This work was supported by the Programme National ‘Physique et Chimie du Milieu Interstellaire’ (PCMI) of CNRS/INSU with INC/INP co-funded by CEA and CNES. Footnotes 1 Except in extremely low-metallicity systems, where X-rays could be the dominant heating source (Lebouteiller et al. 2017). 2 Grains with a ≲ 10 nm are usually stochastically heated, and cannot be modelled with a single temperature; grains with a ≳ 0.1 μm have a size dependent equilibrium temperature, due to their grey UV extinction. 3 The lag is the difference between two times (two steps) in the MCMC. 4 List found on astrostatistics.psu.edu/RLectures/hierarchical.pdf 5 https://support.hdfgroup.org/HDF5/ REFERENCES Allamandola L. J., Hudgins D. M., Sandford S. A., 1999, ApJ , 511, L115 https://doi.org/10.1086/311843 CrossRef Search ADS   Andreon S., Hurn M. A., 2010, MNRAS , 404, 1922 Asano R. S., Takeuchi T. T., Hirashita H., Inoue A. K., 2013, Earth Planets Space , 65, 213 https://doi.org/10.5047/eps.2012.04.014 CrossRef Search ADS   Bakes E. L. O., Tielens A. G. G. M., 1994, ApJ , 427, 822 https://doi.org/10.1086/174188 CrossRef Search ADS   Barnard J., McCulloch R., Meng X.-L., 2000, Stat. Sin. , 10, 1281 Bron E., Le Bourlot J., Le Petit F., 2014, A&A , 569, A100 CrossRef Search ADS   Chastenet J., Bot C., Gordon K. D., Bocchio M., Roman-Duval J., Jones A. P., Ysard N., 2017, A&A , 601, A55 CrossRef Search ADS   Chevance M. et al.  , 2016, A&A , 590, A36 CrossRef Search ADS   Compiègne M. et al.  , 2011, A&A , 525, A103 CrossRef Search ADS   Coupeaud A. et al.  , 2011, A&A , 535, A124 CrossRef Search ADS   da Cunha E., Charlot S., Elbaz D., 2008, MNRAS , 388, 1595 https://doi.org/10.1111/j.1365-2966.2008.13535.x CrossRef Search ADS   Dale D. A., Helou G., Contursi A., Silbermann N. A., Kolhatkar S., 2001, ApJ , 549, 215 https://doi.org/10.1086/319077 CrossRef Search ADS   Davies J. I. et al.  , 2017, PASP , 129, 044102 https://doi.org/10.1088/1538-3873/129/974/044102 CrossRef Search ADS   De Cia A., Ledoux C., Savaglio S., Schady P., Vreeswijk P. M., 2013, A&A , 560, A88 CrossRef Search ADS   De Vis P. et al.  , 2017, MNRAS , 471, 1743 https://doi.org/10.1093/mnras/stx981 CrossRef Search ADS   Draine B. T., 2003, ARA&A , 41, 241 CrossRef Search ADS   Draine B. T., Li A., 2007, ApJ , 657, 810 https://doi.org/10.1086/511055 CrossRef Search ADS   Draine B. T. et al.  , 2007, ApJ , 663, 866 https://doi.org/10.1086/518306 CrossRef Search ADS   Fitzpatrick E. L., Massa D., 2007, ApJ , 663, 320 https://doi.org/10.1086/518158 CrossRef Search ADS   Foreman-Mackey D. et al.  , 2013, Astrophysics Source Code Library , record ascl.soft03002 Galametz M., Madden S. C., Galliano F., Hony S., Bendo G. J., Sauvage M., 2011, A&A , 532, A56 CrossRef Search ADS   Galametz M. et al.  , 2016, MNRAS , 456, 1767 https://doi.org/10.1093/mnras/stv2773 CrossRef Search ADS   Galliano F., 2017, Planet. Space Sci. , 149, 38 https://doi.org/10.1016/j.pss.2017.09.006 CrossRef Search ADS   Galliano F., Dwek E., Chanial P., 2008a, ApJ , 672, 214 CrossRef Search ADS   Galliano F., Madden S. C., Tielens A. G. G. M., Peeters E., Jones A. P., 2008b, ApJ , 679, 310 CrossRef Search ADS   Galliano F. et al.  , 2011, A&A , 536, A88 CrossRef Search ADS   Gelman A., Carlin J., Stern H., Rubin D., 2004, Bayesian Data Analysis . Chapman & Hall, Boca Raton Geman S., Geman D., 1984, IEEE Trans. Pattern Anal. Mach. Intell. , 6, 721 https://doi.org/10.1109/TPAMI.1984.4767596 CrossRef Search ADS PubMed  Gordon K. D., Clayton G. C., Misselt K. A., Landolt A. U., Wolff M. J., 2003, ApJ , 594, 279 https://doi.org/10.1086/376774 CrossRef Search ADS   Gordon K. D. et al.  , 2014, ApJ , 797, 85 https://doi.org/10.1088/0004-637X/797/2/85 CrossRef Search ADS   Grenier I. A., Casandjian J.-M., Terrier R., 2005, Science , 307, 1292 https://doi.org/10.1126/science.1106924 CrossRef Search ADS PubMed  Guhathakurta P., Draine B. T., 1989, ApJ , 345, 230 https://doi.org/10.1086/167899 CrossRef Search ADS   Guillet V. et al.  , 2017, preprint (arXiv:1710.04598) Hildebrand R. H., 1983, QJRAS , 24, 267 Hogg D. W., Myers A. D., Bovy J., 2010, ApJ , 725, 2166 https://doi.org/10.1088/0004-637X/725/2/2166 CrossRef Search ADS   Hollenbach D. J., Tielens A. G. G. M., 1997, ARA&A , 35, 179 CrossRef Search ADS   Jones A. P., Köhler M., Ysard N., Bocchio M., Verstraete L., 2017, A&A , 602, A46 CrossRef Search ADS   Juvela M., Montillaud J., Ysard N., Lunttila T., 2013, A&A , 556, A63 CrossRef Search ADS   Kelly B. C., 2011, JCGS , 20, 584 Kelly B. C., Shetty R., Stutz A. M., Kauffmann J., Goodman A. A., Launhardt R., 2012, ApJ , 752, 55 https://doi.org/10.1088/0004-637X/752/1/55 CrossRef Search ADS   Kimura H., 2016, MNRAS , 459, 2751 https://doi.org/10.1093/mnras/stw820 CrossRef Search ADS   Köhler M., Stepnik B., Jones A. P., Guillet V., Abergel A., Ristorcelli I., Bernard J.-P., 2012, A&A , 548, A61 CrossRef Search ADS   Köhler M., Ysard N., Jones A. P., 2015, A&A , 579, A15 CrossRef Search ADS   Le Bourlot J., Le Petit F., Pinto C., Roueff E., Roy F., 2012, A&A , 541, A76 CrossRef Search ADS   Lebouteiller V. et al.  , 2017, A&A , 602, A45 CrossRef Search ADS   Leroy A. K. et al.  , 2011, ApJ , 737, 12 https://doi.org/10.1088/0004-637X/737/1/12 CrossRef Search ADS   Li A., Draine B. T., 2001, ApJ , 554, 778 https://doi.org/10.1086/323147 CrossRef Search ADS   Lisenfeld U., Ferrara A., 1998, ApJ , 496, 145 https://doi.org/10.1086/305354 CrossRef Search ADS   Loredo T. J., Wasserman I. M., 1998, ApJ , 502, 75 https://doi.org/10.1086/305870 CrossRef Search ADS   Mandel K. S., Wood-Vasey W. M., Friedman A. S., Kirshner R. P., 2009, ApJ , 704, 629 https://doi.org/10.1088/0004-637X/704/1/629 CrossRef Search ADS   Markwardt C. B., 2009, in Bohlender D. A., Durand D., Dowler P., eds, ASP Conf. Ser. Vol. 411, Astronomical Data Analysis Software and Systems XVIII . Astron. Soc. Pac., San Francisco, p. 251 Marsh K. A., Whitworth A. P., Lomax O., 2015, MNRAS , 454, 4282 https://doi.org/10.1093/mnras/stv2248 CrossRef Search ADS   Martinez G. D., 2015, MNRAS , 451, 2524 https://doi.org/10.1093/mnras/stv942 CrossRef Search ADS   Mathis J. S., Mezger P. G., Panagia N., 1983, A&A , 128, 212 Moré J. J., Sorensen D. C., Hillstrom K. E., Garbow B. S., 1984, in Cowell W. J., ed., The MINPACK Project, in Sources and Development of Mathematical Software , Prentice-Hall, p. 88 Paradis D. et al.  , 2010, A&A , 520, L8 CrossRef Search ADS   Planck Collaboration XXIV, 2011a, A&A , 536, A24 CrossRef Search ADS   Planck Collaboration XVI, 2011b, A&A , 536, A16 CrossRef Search ADS   Planck Collaboration XIX, 2011c, A&A , 536, A19 CrossRef Search ADS   Planck Collaboration XVII, 2014a, A&A , 566, A55 CrossRef Search ADS   Planck Collaboration VIII, 2014b, A&A , 571, A8 CrossRef Search ADS   Press W. H., Teukolsky S. A., Vetterling W. T., Flannery B. P., 2007, Numerical Recipes , 3rd edn: The Art of Scientific Computing. Cambridge Univ. Press, Cambridge Rémy-Ruyer A. et al.  , 2014, A&A , 563, A31 CrossRef Search ADS   Röllig M. et al.  , 2007, A&A , 467, 187 CrossRef Search ADS   Roman-Duval J. et al.  , 2014, ApJ , 797, 86 https://doi.org/10.1088/0004-637X/797/2/86 CrossRef Search ADS   Roy A. et al.  , 2013, ApJ , 763, 55 https://doi.org/10.1088/0004-637X/763/1/55 CrossRef Search ADS   Serra P., Amblard A., Temi P., Burgarella D., Giovannoli E., Buat V., Noll S., Im S., 2011, ApJ , 740, 22 https://doi.org/10.1088/0004-637X/740/1/22 CrossRef Search ADS   Shahmoradi A., 2017, preprint (arXiv:e-prints) Shetty R., Kauffmann J., Schnee S., Goodman A. A., 2009, ApJ , 696, 676 https://doi.org/10.1088/0004-637X/696/1/676 CrossRef Search ADS   Siebenmorgen R., Voshchinnikov N. V., Bagnulo S., 2014, A&A , 561, A82 CrossRef Search ADS   Sokal A., 1997, in DeWitt-Morette C., Cartier P., Folacci A., eds, Monte Carlo Methods in Statistical Mechanics: Foundations and New Algorithms. Functional Integration. NATO ASI Series (Series B: Physics), vol 361 . Springer. Boston, MA Stepnik B. et al.  , 2003, A&A , 398, 551 CrossRef Search ADS   van der Tak F. F. S. et al.  , 2018, PASA , 35, 2 CrossRef Search ADS   Veneziani M., Piacentini F., Noriega-Crespo A., Carey S., Paladini R., Paradis D., 2013, ApJ , 772, 56 https://doi.org/10.1088/0004-637X/772/1/56 CrossRef Search ADS   Villani M., Larsson R., 2006, Commun. Stat. - Theory Methods , 35, 1123 https://doi.org/10.1080/03610920600672252 CrossRef Search ADS   Weingartner J. C., Draine B. T., 2001, ApJS , 134, 263 https://doi.org/10.1086/320852 CrossRef Search ADS   Wolfgang A., Lopez E., 2015, ApJ , 806, 183 https://doi.org/10.1088/0004-637X/806/2/183 CrossRef Search ADS   Wraith D., Kilbinger M., Benabed K., Cappé O., Cardoso J.-F., Fort G., Prunet S., Robert C. P., 2009, Phys. Rev. D , 80, 023507 https://doi.org/10.1103/PhysRevD.80.023507 CrossRef Search ADS   Wu R. et al.  , 2015, A&A , 575, A88 CrossRef Search ADS   Ysard N., Köhler M., Jones A., Miville-Deschênes M.-A., Abergel A., Fanciullo L., 2015, A&A , 577, A110 CrossRef Search ADS   Yu Y., Meng X.-L., 2011, JCGS , 20, 531 Zafar T., Watson D., 2013, A&A , 560, A26 CrossRef Search ADS   Zhukovska S., 2014, A&A , 562, A76 CrossRef Search ADS   Zubko V., Dwek E., Arendt R. G., 2004, ApJS , 152, 211 https://doi.org/10.1086/382351 CrossRef Search ADS   APPENDIX A: MODEL COMPUTATION AND INVERSION Calculating the full SED model at each iteration of the MCMC would be prohibitive, in terms of computing time. As discussed in Section 2.3, we rather interpolate a finely sampled grid of pre-computed templates. A1 The pre-computed model grid For each model component presented in Section 2.2, we generate a grid of specific luminosities, $$l_\nu ^{{{\rm mod}}}({\boldsymbol x},\nu _j)$$, as a function of each parameter xk. We separate components that can be linearly combined. For instance, for the deltaU and powerU components, we compute separate grids for each grain sub-species: neutral and ionized PAHS, small and large carbon grains, small and large silicates. The frequencies νj belong to a list of the most widely used IR photometric filters. Integration of the SED model into the filter bandpass and colour corrections are thus included in $$l_\nu ^{{{\rm mod}}}$$. To build the templates, we start from a coarse parameter grid, and add a mid-point if the linear interpolation of $$\ln l_\nu ^{{{\rm mod}}}$$ is less accurate than 10−3. We repeat this process until no more mid-points need to be added. A2 SED model evaluation At each iteration of the MCMC, we evaluate the template by performing a multidimensional linear interpolation of $$\ln l_\nu ^{{{\rm mod}}}({\boldsymbol x},\nu _j)$$, as a function of $${\boldsymbol x}$$. By construction of the grid, we know this interpolation will be more accurate than 10−3. A3 SED model inversion The model inversion used by ASIS (Section 3.3.1) is also performed with a multidimensional linear interpolation. The only difference here is that we now interpolate the parameter we are looking for, $$x_{k^\prime }$$, as a function of $$\ln l_\nu ^{{{\rm mod}}}({\boldsymbol x},\nu _j)$$ and of the other parameters, $$x_{k\ne k^\prime }$$. APPENDIX B: ADDITIONAL ELEMENTS ON THE HB METHOD B1 The split-normal distribution The split-normal distribution (Villani & Larsson 2006, used in Section 3.2.3) is defined as   \begin{eqnarray} p(x) = \left\{ \begin{array}{lc} A \times \exp \left[-\frac{1}{2}\left(\displaystyle\frac{x-x_0}{\lambda }\right)^2\right] &\quad {\rm if }\ x \le x_0\\ A \times \exp \left[-\frac{1}{2}\left(\displaystyle\frac{x-x_0}{\lambda \tau }\right)^2\right] &\quad {\rm if }\ x > x_0, \end{array}\right. \end{eqnarray} (B1)where   \begin{equation} A = \sqrt{\frac{2}{\pi }}\frac{1}{\lambda (1+\tau )}, \end{equation} (B2)and x0 is a position parameter, λ, a scale parameter, and τ, a shape parameter. These parameters are linked to the mean, μ, standard deviation, σ, and skewness, γ1, through the following set of equations:   \begin{equation} b = \frac{\pi -2}{\pi } (\tau -1)^2+\tau , \end{equation} (B3)  \begin{equation} \mu = x_0 + \sqrt{\frac{2}{\pi }}\lambda (\tau -1), \end{equation} (B4)  \begin{equation} \sigma = \sqrt{b}\lambda ^2, \end{equation} (B5)  \begin{equation} \gamma _1 = b^{-3/2}\sqrt{\frac{2}{\pi }}(\tau -1)\times \left[\left(\frac{4}{\pi }-1\right)(\tau -1)^2+\tau \right]. \end{equation} (B6)There is no simple inversion. We therefore solve τ numerically from equation (B6), and then inverse equations (B4) and (B5). B2 The Gibbs sampling Gibbs sampling consists in drawing each parameter, xk, one by one, from its conditional distribution, fixing the other parameters to their current value in the MCMC, $$p(x_k|x_{k^\prime \ne k})$$. In practice, this distribution is a complex function, depending on the SED model. We numerically perform this drawing as follows. Let us assume that parameter xk is bound to the interval $$[x_k^{{{min}}},x_k^{{{max}}}]$$. We first compute the cumulative distribution function as   \begin{equation} F(x_k|x_{k^\prime \ne k}) = \frac{\int _{x_k^{{{min}}}}^{x_k} p(y|x_{k^\prime \ne k}) {\,\rm d}y}{F(x_k^{{{max}}}|x_{k^\prime \ne k})}. \end{equation} (B7)We then draw a random variable θ, uniformly distributed between 0 and 1. The updated value of xk is derived by solving $$F(x_k|x_{k^\prime \ne k})=\theta$$, by linear interpolation of the xk grid as a function of ln F. Gibbs sampling is more adapted to our problem than the rejection strategy of the Metropolis–Hastings method (e.g. Foreman-Mackey et al. 2013) or than importance sampling (e.g. Wraith et al. 2009). Indeed, the latter methods are not efficient when the number of dimensions, Ndim (equation 32), becomes large. Several runs presented in this paper lead to Ndim ≃ 7000. To provide an accurate integration of equation (B7) in a reasonable computing time, we implement the following adaptative grid. We start from a regular grid of N(0) = 10 values, $$x_k^{(t)}$$, with t =1, …, N(0), spanning the whole range of the interval $$[x_k^{{{min}}},x_k^{{{max}}}]$$. We compute the normalization of equation (B7), $$F^{(0)}(x_k^{{{max}}}|x_{k^\prime \ne k})$$, as the sum of elemental trapeziums, ΔF(0)(x(t)). We then add a mid-point in each interval and compute the new trapeziums $$\Delta F^{(1)}(x_k^{(t)})$$. We stop the refinement of element t if   \begin{eqnarray} &&|\Delta F^{(1)}(x_k^{(t)}) +\Delta F^{(1)}\left(\frac{x_k^{(t)}+x_k^{(t+1)}}{2}\right)\nonumber\\ &&\quad -\Delta F^{(0)}(x_k^{(t)})| \le 10^{-3}\times \frac{\sum _{t^\prime =1}^{N^{(1)}} \Delta F^{(1)}(x_k^{(t^\prime )})}{\sqrt{N^{(1)}}}. \end{eqnarray} (B8)We iterate until no more refinement is needed. Most of the drawings require between 50 and 200 points. B3 Consistent treatment of external parameters There are several non-dusty parameters that we may want to correlate with the grain properties, like the gas column density, the metallicity, etc. In principle, we could study these extra correlations afterwards, using the final results of the MCMC. However, including these parameters within the MCMC sampler, as part of our hierarchical model, will help improve the correlation, as they will share a common prior distribution with the model parameters. Poorly constrained grain parameters could benefit from using these additional observations. Let us decompose the parameters of a source si into q1SED model parameters and q2external parameters:   \begin{eqnarray} {\boldsymbol x}_i = \left(\left\lbrace x_{i,k}^{{\rm {model}}}\right\rbrace _{k=1,q_1}, \left\lbrace x_{i,k}^{{{\rm extra}}}\right\rbrace _{k=1,q_2}\right) \ \,{\rm with }\ q=q_1+q_2.\nonumber\\ \end{eqnarray} (B9)For instance, $$\lbrace x_{i,k}^{{\rm{model}}}\rbrace _{k=1,q_1}$$ could be {ln Mi, ln Ti, βi}, (q1 = 3) and $$\lbrace x_{i,k}^{{\rm{extra}}}\rbrace _{k=1,q_2}$$ could be $$\lbrace \ln M^{{\rm H\,\,{\small I}}}_i,Z_i\rbrace$$ (q2 = 2). The SED parameters are constrained by the observed SED. The extra parameters are fixed in principle, as they have been estimated from an independent method. However, including them in the MCMC sampler will lead to a modification of their value consistent with their uncertainties. Let us note their observed value Qi, k, with an uncertainty $$\sigma _{i,k}^{{\rm{extra}}}$$. The likelihood of the extra parameter k, of a source si, is simply   \begin{equation} p\left(x_{i,k}^{\rm{extra}}|Q_{i,k}\right) = \mathcal {P}\left(Q_{i,k},\left(\sigma _{i,k}^{{\rm{extra}}}\right)^2\right), \end{equation} (B10)where $$\mathcal {P}(\bar{y},\sigma _y^2)$$ is one of the noise distributions of Section 3.2.3, with mean $$\bar{y}$$ and variance $$\sigma _y^2$$. The posterior distribution of this extra parameter is then   \begin{eqnarray} &p&\left(\!x_{i,k}^{\rm{extra}}|Q_{i,k},{\boldsymbol x}_i^{\rm{model}}, x^{\rm{extra}}_{i,k^\prime \ne k},{\boldsymbol \mu },\Sigma \right) \propto p\left(x_{i,k}^{{\it{extra}}}|Q_{i,k}\right)\nonumber\\ &&\times p\left({\boldsymbol x}_i|{\boldsymbol \mu }, \Sigma \right) \times p\left({\boldsymbol \mu }, \Sigma\right). \end{eqnarray} (B11)We sample these extra parameters, at each iteration of the MCMC, from equation (B11). APPENDIX C: GENERALIZED LEAST SQUARES WITH CORRELATED ERRORS We have developed a chi-squared minimization SED fitter, having the same interface as our HB code. It can fit any linear combination of the individual SED components of Section 2.2 to an observed SED. It is based on the MINPACK (Moré et al. 1984) Levenberg–Marquardt method, that we have converted in Fortran 90. We have added the possibility to limit, fix and tie parameters, similarly to what Markwardt (2009) did in IDL. C1 Total covariance matrix and chi-squared In most of our cases, the uncertainties on the observed fluxes, $${\boldsymbol L}_\nu ^{\rm{obs}}$$, come from two terms: (i) the noise, $${\boldsymbol \sigma }_\nu ^{\rm{noise}}$$, which is uncorrelated between frequencies, with diagonal covariance matrix $$\mathcal {V}_{\rm noise}$$; and (ii) correlated systematic calibration uncertainties, $${\boldsymbol \sigma }_\nu ^{\rm{syst}}$$, with non-diagonal covariance matrix $$\mathcal {V}_{\rm syst}$$. The total covariance matrix is simply   \begin{equation} \mathcal {V}_{\rm obs} = \mathcal {V}_{\rm noise} + \mathcal {V}_{\rm syst}. \end{equation} (C1) Properly taking into account the different sources of uncertainties requires to account for correlated terms by minimizing the ‘generalized least squares’ problem:   \begin{equation} \chi ^2 = {\boldsymbol r}^T\,\mathcal {V}_{\rm obs}^{-1}\,{\boldsymbol r}, \end{equation} (C2)where $${\boldsymbol r} = {\boldsymbol L}_\nu ^{{\rm{obs}}}-{\boldsymbol L}_\nu ^{{\rm{mod}}}$$, is the residual between the observations and the model $${\boldsymbol L}_\nu ^{{\rm{mod}}}$$. C2 Adapting the Levenberg–Marquardt algorithm The Levenberg–Marquardt algorithm (e.g. Press et al. 2007; Markwardt 2009, hereafter LM) is aimed at finding the best parameter vector $${\boldsymbol x}$$ minimizing the χ2, starting from an initial estimate $${\boldsymbol x}^{(0)}$$. Computing, for each new value of $${\boldsymbol x}^{(n+1)}={\boldsymbol x}^{(n)}+{\boldsymbol \delta x}^{(n+1)}$$, the Jacobian matrix of the model, at $${\boldsymbol x}^{(n)}$$:   \begin{equation} J_{k,j} = \frac{\mathrm{\partial} L_{\nu ,j}^{\rm{mod}}(x_k)}{\mathrm{\partial} x_k}, = - \frac{\mathrm{\partial} w_j}{\mathrm{\partial} x_k}, \end{equation} (C3)the LM method solves:   \begin{equation} \left(\mathcal {J}^T\,\mathcal {J}\right)\,{\boldsymbol \delta x} = \mathcal {J}^T\,{\boldsymbol w}, \end{equation} (C4)where the normalized residual is $${\boldsymbol w}={\boldsymbol r}/{\boldsymbol \sigma }_\nu ^{{\rm{obs}}}$$, noting the total (uncorrelated) uncertainty:   \begin{equation} {\boldsymbol \sigma }_\nu ^{{\rm{obs}}} = \sqrt{\left({\boldsymbol \sigma }_\nu ^{{\rm{noise}}}\right)^2 + \left({\boldsymbol \sigma }_\nu ^{{\rm{syst}}}\right)^2}. \end{equation} (C5)This vector does not account for correlated terms. The classical χ2, for non-correlated uncertainties, is $$\chi ^2=|{\boldsymbol w}|^2$$. When introducing correlated errors, the new problem to solve becomes   \begin{equation} \left(\mathcal {J}^T\,\mathcal {V}_{\rm obs}^{-1}\,\mathcal {J}\right)\,{\boldsymbol \delta x} = \mathcal {J}^T\,\left(\mathcal {V}_{\rm obs}^{-1}\,{\boldsymbol r}\right). \end{equation} (C6)The Cholesky decomposition (e.g. Press et al. 2007) of the covariance matrix results in a lower triangle matrix, $$\mathcal {L}$$, such that $$\mathcal {V}_{\rm obs} = \mathcal {L}\,\mathcal {L}^{\rm T}$$. This decomposition is always possible as the covariance matrix is positive-definite. We also have $$\mathcal {V}_{\rm obs}^{-1} = \left(\mathcal {L}^{-1}\right)^T\,\mathcal {L}^{-1}$$. The new system to be solved is   \begin{equation} \mathcal {\mathcal {J}}^{\rm T}\,\mathcal {\mathcal {J}}\,{\boldsymbol x} = \mathcal {\mathcal {J}}^{\rm T}\,{\boldsymbol t}, \end{equation} (C7)where $$\mathcal {\mathcal {J}} = L^{-1}\,\mathcal {J}$$ and $${\boldsymbol t}=L^{-1}{\boldsymbol r}$$. In practice, to account for correlated uncertainties in the LM method, one just needs to perform the following steps. Compute once and for all, at the beginning (before the iterations), the Cholesky decomposition of the covariance matrix, to obtain $$\mathcal {L}$$. Compute once and for all the inverse $$\mathcal {L}^{-1}$$. Set the output of the user-defined function in MINPACK, returning the weighted residual, to $${\boldsymbol t}=\mathcal {L}^{-1}\,{\boldsymbol r}$$ instead of $${\boldsymbol w}$$. C3 Uncertainties on derived quantities One of the product of the Levenberg–Marquardt method is the covariance matrix $$\mathcal {V}_{\rm par}$$ of the parameters $${\boldsymbol x}$$. We may want to also compute the uncertainties of quantities derived from these parameters, $$f({\boldsymbol x})$$. In the case of SED fitting, such quantities can be the total luminosity, the average starlight intensity, etc. The error on any function of the parameters is   \begin{equation} \sigma ^2_{f({\boldsymbol x})} = \left({ \nabla }f({\boldsymbol x})\right)^T \mathcal {V}_{\rm par}\,{\boldsymbol \nabla }f({\boldsymbol x}). \end{equation} (C8)For example if we have only two parameters, $${\boldsymbol x}=(a\,b)$$, then   \begin{equation} {\boldsymbol \nabla }f=\left( \begin{array}{c}\displaystyle\frac{\mathrm{\partial} f}{\mathrm{\partial} a} \\ \displaystyle\frac{\mathrm{\partial} f}{\mathrm{\partial} b} \end{array} \right) \quad {\rm and} \quad \mathcal {V}_{\rm par}=\left( \begin{array}{c c}\sigma _a^2 \quad& \rho \sigma _a\sigma _b \\ \rho \sigma _a\sigma _b \quad& \sigma _b^2 \end{array} \right), \end{equation} (C9)and equation (C8) gives the usual expression:   \begin{eqnarray} \sigma ^2_{f({\boldsymbol x})} = \left(\frac{\mathrm{\partial} f}{\mathrm{\partial} a}\right)^2\sigma _a^2 + \left(\frac{\mathrm{\partial} f}{\mathrm{\partial} b}\right)^2\sigma _b^2 + 2\left(\frac{\mathrm{\partial} f}{\mathrm{\partial} a}\right) \left(\frac{\mathrm{\partial} f}{\mathrm{\partial} b}\right)\rho \sigma _a\sigma _b.\nonumber\\ \end{eqnarray} (C10)Similarly, the covariance of two functions of the parameters $$f({\boldsymbol x})$$ and $$g({\boldsymbol x})$$ is   \begin{equation} {\rm cov}(f({\boldsymbol x}),g({\boldsymbol x})) = \left({\boldsymbol \nabla }f({\boldsymbol x})\right)^T\mathcal {V}_{\rm par} {\boldsymbol \nabla }g({\boldsymbol x}). \end{equation} (C11) C4 Monte Carlo initial conditions To avoid the LM algorithm to converge towards a local minimum, we generate Nini = 30 random initial conditions, drawn from a uniform distribution over the whole range of parameters. We then run the LM code from these initial conditions and keep only the parameters corresponding to the lowest of these Nini minimum chi-squared. APPENDIX D: COMPUTING TIMES FOR THE LEAST-SQUARES AND BAYESIAN METHODS Let us assume that we have a large number of sources, n ≫ 1, and that the bottleneck in terms of CPU is the calculation of one SED model at all the photometric filters, $${\boldsymbol L}_\nu ^{{\rm{mod}}}({\boldsymbol x};{\boldsymbol \nu })$$. The LM method (Appendix C) typically performs Niter ≃ 50–150 iterations. At each iteration, the model is evaluated n × q times to compute the gradient, by finite differences on each one of the q parameters. Since we run Nini = 30 initial conditions (Appendix C4), the number of model calculations is   \begin{equation} N_{\chi 2} \simeq N_{\rm ini}\times N_{\rm iter}\times n \times q. \end{equation} (D1) The parameter space for the HB method (equation 32) becomes Ndim ≃ n × q, when n ≫ 1. This is the number of Gibbs samplings of the parameters, per iteration. Each Gibbs sampling requires Nsamp ≃50–200 points (Appendix B2). For a chain of length NMCMC, the number of model calculations is thus   \begin{equation} N_{\rm HB} \simeq N_{\rm MCMC}\times n\times q\times N_{\rm samp}. \end{equation} (D2)Noting that Niter ≃ Nsamp, the ratio of χ2 and HB CPU times is   \begin{equation} \frac{{\rm CPU}_{\rm HB}}{{\rm CPU}_{\chi ^2}} \simeq \frac{N_{\rm MCMC}}{N_{\rm ini}}\simeq {3\times 10^{4}}. \end{equation} (D3)This approximation gives the correct order of magnitude. For instance, the central simulation of Section 4.1 had a ratio:   \begin{equation} \frac{{\rm CPU}_{\rm HB}}{{\rm CPU}_{\chi ^2}}\simeq \frac{3\,{\rm weeks}}{90\,{s}} \simeq 2\times 10^{4}. \end{equation} (D4) The simulations presented in this paper add up to a total of ≃105  h of CPU. It points out that the use of the HB method represents an important investment in terms of computing time. In terms of storage, the MCMCs of all the simulations in this study add up to ≃1 Tb of HDF55 files. APPENDIX E: ROUND-OFF ERRORS Round-off errors could be a potential problem for such a code, as it requires a large number of operations (equation 32). Round-off errors are however extremely difficult to track. We note that all the tests we performed in this manuscript lead to a good agreement with the true values of the parameters. Thus, round-off errors, if any, do not significantly affect the results. The reason is likely due to the stochasticity of the HB code. Indeed, each step of the MCMC is only weakly dependent on the accuracy of the previous step. In order to quantify the typical order of magnitude of rounding errors we have re-run the central simulation of Section 4.1 (warm SED, n = 100, fS/N = 3) in single precision (SP), while the rest of the runs in this paper were all performed in double precision (DP). If round-off errors were important, we would notice a significant difference between the results with the two precisions. Fig. E1 shows the distribution of the difference of the model parameters of every source, xi, in both precisions, divided by the uncertainty on these parameters, σi, in SP. It shows that for any parameter, reducing the floating number precision, from DP to SP, results in an absolute difference of only a few per cent of its uncertainty. The actual round-off errors of the DP parameters are expected to be smaller than this value, as the precision is higher than SP. Thus these errors can reasonably be considered negligible. Figure E1. View largeDownload slide Test on rounding errors. This is the distribution of the difference between the inferred model parameters of each source i, xi, in SP and in DP, divided by the uncertainty on the parameters, σi, in SP. This test is performed on the central simulation of Section 4.1 (warm SED, n = 100, fS/N = 3). Figure E1. View largeDownload slide Test on rounding errors. This is the distribution of the difference between the inferred model parameters of each source i, xi, in SP and in DP, divided by the uncertainty on the parameters, σi, in SP. This test is performed on the central simulation of Section 4.1 (warm SED, n = 100, fS/N = 3). © 2018 The Author(s) Published by Oxford University Press on behalf of the Royal Astronomical Society

Journal

Monthly Notices of the Royal Astronomical SocietyOxford University Press

Published: May 1, 2018

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off