Abstract We present the public release of Mr-Moose, a fitting procedure that is able to perform multi-wavelength and multi-object spectral energy distribution (SED) fitting in a Bayesian framework. This procedure is able to handle a large variety of cases, from an isolated source to blended multi-component sources from a heterogeneous data set (i.e. a range of observation sensitivities and spectral/spatial resolutions). Furthermore, Mr-Moose handles upper limits during the fitting process in a continuous way, allowing models to become gradually less probable as upper limits are approached. The aim is to propose a simple-to-use, yet highly versatile fitting tool for handling increasing source complexity when combining multi-wavelength data sets with fully customisable filter/model data bases. The complete control left to the user is one advantage, which avoids the traditional problems related to the ‘black box’ effect, where parameter or model tunings are impossible and can lead to overfitting and/or over-interpretation of the results. Also, while a basic knowledge of Python and statistics is required, the code aims to be sufficiently user-friendly for non-experts. We demonstrate the procedure on three cases: two artificially generated data sets and a previous result from the literature. In particular, the most complex case (inspired by a real source, combining Herschel, ALMA, and VLA data) in the context of extragalactic SED fitting makes Mr-Moose a particularly attractive SED fitting tool when dealing with partially blended sources, without the need for data deconvolution. Key words: methods: statistical, quasars: general, infrared: general, radio continuum: general, submillimetre: general 1 INTRODUCTION Confronting observations and models of our Universe is the very core of any fitting procedure. However, the fitting process faces an increasing number of challenges in adapting to and comparing the wide variety of models and observations. From the modelling side, the generation of large libraries of highly non-linear models allows for a more accurate description of the complexity of the physical processes at work in the sources. From the observer's side, new instruments and facilities continue to push further resolution – both spectrally and spatially – and sensitivity over the electromagnetic spectrum. Fitting sits at the frontier of these two sides, and interfacing them optimally is crucial. How can one obtain maximal information from the data (which can vary widely in quality and quantity) and compare it to the most appropriate models available? By optimal here, we mean the model that represents the data with the most fidelity or complexity without entering the regime of ‘overfitting’, where degeneracies between parameters dominate and, therefore, no meaningful constraints can be extracted from the data set (usually keeping in mind Occam's razor). This optimal regime can be quantified thanks to Bayes' theorem, where the likelihoods of different models can be compared to find the relative probability of one model with respect to another. The particular case of spectral energy distribution (SED) fitting is an old problem and can be traced back as far as the 1960s (e.g. Tinsley 1968). This technique has seen considerable development starting at the end of the twentieth century and is now widely used in most fields of astrophysics: from stars in our Galaxy to very distant galaxies during the epoch of reionization.
As data throughout the electromagnetic spectrum became readily available, more and more complex models were developed to predict galaxy emission, such as PEGASE (Fioc & Rocca-Volmerange 1997), Maraston (1998), Starburst99 (Leitherer et al. 1999), Bruzual & Charlot (2003), and MAPPINGS III (Groves, Dopita & Sutherland 2004), and AGN emission (Pier & Krolik 1992; Fritz, Franceschini & Hatziminaoglou 2006; Hönig et al. 2006; Nenkova et al. 2008; Stalevski et al. 2012; Siebenmorgen, Heymann & Efstathiou 2015), to name the most widely used. Note that these codes do not contain any fitting routines; their main intent is to provide the user with a vast library of templates exploring a large, often non-linear parameter space. As a result, SED fitting techniques remained primitive, mainly using minimizing algorithms and regression such as least-square or chi-square minimization. Since 2000, we have seen the rise of tools implementing more sophisticated statistical frameworks in order to explore degeneracies in the increasing complexity of SED fitting. It is important here to distinguish the two complementary aspects of SED fitting: the models used to predict the SED given a set of parameters and assumptions (Conroy 2013, for a recent review) and the algorithm used to estimate the parameters from the data (the fitting itself; see Sharma 2017, for a review of the different available algorithms). During the last decade, these two aspects were often combined to provide users with off-the-shelf SED fitting procedures covering various wavelength ranges: Magphys (da Cunha, Charlot & Elbaz 2008) for galaxy evolution in the UV-FIR range, BayesClumpy (Asensio Ramos & Ramos Almeida 2009) for AGN torus fitting in the UV-FIR range, CIGALE (Noll et al. 2009) for galaxy evolution with AGN implementation in the UV-FIR range, BRATS (Harwood et al. 2013) specialized in multi-radio frequency fitting of resolved sources, SEDfit (Sawicki 2012) for galaxy evolution in the optical/NIR implementing spatially resolved sources, SEDeblend (MacKenzie, Scott & Swinbank 2016) for SED fitting in the context of lensed sources with FIR observations, AGNFitter (Calistro Rivera et al. 2016) focusing on the AGN component in the UV-FIR range, etc., to name the most widely used in extragalactic astronomy.1 In an extragalactic context, most of these codes, such as Magphys, CIGALE, BayesClumpy or AGNFitter, rely on a relatively strong assumption: all the emission comes from a single unresolved source and cross-identification at different wavelengths is trivial. This simplification, while being necessary to enable the interpretation of extended multi-wavelength coverage (typically from UV to FIR) and to make use of the numerous source catalogues available in the literature, is also one of their most important shortcomings with the advent of high-resolution imaging across the electromagnetic spectrum. Indeed, as the wavelength range increases, resolution and sensitivity change drastically, and the assumption of a single source is no longer valid. Also, the other codes such as BRATS, SEDeblend, or SEDfit are specialized in relatively short wavelength ranges and/or very specific applications: BRATS for spectral index mapping in radio, SEDeblend for lensed sources in the FIR domain, and SEDfit in the optical/NIR domain for resolved galaxies. None of the aforementioned codes offers the possibility to fit partially resolved sources, from NIR to radio. To cite a more specific example as presented in Gullberg et al.
(2016): fitting simultaneously Spitzer (3–160 $\mu$m; Werner et al. 2004), Herschel (70–500 $\mu$m; Pilbratt et al. 2010), ALMA (100–700 GHz), and VLA data for distant, partially resolved galaxies illustrates exactly this point. On the one hand, the resolution of Herschel does not allow us to properly disentangle different spatial components (blending) but provides five data points in spectral space, particularly covering the thermal dust IR peak of the SED. On the other hand, ALMA is particularly suited to address the question of the spatial location, at the cost of a smaller spectral coverage. Also, the heating sources of the dust emitting at 70 $\mu$m and 3 mm are different (AGN or young stars) but are linked through the total energy emitted and their location. The same type of reasoning can be applied to the AGN radio emission (ν < 100 GHz, from the VLA for instance), often unresolved at low frequency and presenting multiple components at higher frequencies (core, jets, hotspots, and lobes). In the case of radio galaxies, these last two aspects even co-exist for a single source. Therefore, as these observations are complementary, each providing valuable spectral and spatial constraints on different overlapping components (some with non-detections), why not combine all the information together? This is where Mr-Moose intervenes, providing a framework to simultaneously combine spatial and spectral information into a single fitting routine. The paper is organized as follows: we explain the methodology and structure of the code as well as the description of the input and output files in Section 2. Section 3 presents three examples of application of Mr-Moose. In Section 4, we discuss the limitations and future developments and finally conclude in Section 5. 2 DESIGN AND STRUCTURE OF Mr-Moose Mr-Moose stands for Multi-Resolution and Multi-Object/Origin Spectral Energy distribution fitting procedure. The philosophy when designing Mr-Moose was to provide a tool for SED fitting, sufficiently simple to be used by non-SED-fitting experts, yet sufficiently versatile to handle as many cases as possible, particularly in the context of multi-wavelength samples. The main drivers for the design are as follows: to handle data with a large span of resolutions in order to combine interferometer/single-dish observations, to deal with upper limits consistently, allowing a fit with gradually decreasing probability when approaching an upper limit rather than an arbitrary cut at 3σ, to treat the fitting in a Bayesian framework in order to provide asymmetric uncertainties on parameters and to easily identify degeneracies as well as compare different model combinations, and to allow a wide variety of models as input, analytic models for this first release, and/or libraries for non-linear models in future updates. Mr-Moose is written in Python 2.7, developed in PyCharm Community Edition, and complies with the PEP8 coding standard. It relies as much as possible on already existing packages (see the full list in the Appendix) to further increase robustness. The code was designed to be user friendly but not user-opaque: a large (and necessary) degree of control over the data, model, and fit-setting inputs is given to the user to test the reliability of the output and therefore avoid the ‘black-box’ effect. However, the code aims to be sufficiently simple to be used with minimal statistical and computational knowledge. We first present the core of the fitting procedure, the maximum likelihood estimation, and then review the input and output files.
Table 1 summarizes the different file extensions used in Mr-Moose. We also report a simplified flowchart of the code structure in Fig. 1 and refer the user to a more detailed flowchart in the GitHub repository.2 Figure 1. Flow chart of Mr-Moose. Table 1. Summary of the convention for the file extensions, as used in Mr-Moose.
.fit (Input): File containing the settings for the fit.
.dat (Input): File containing the data.
.mod (Input): Model file containing the function, parameters, and their considered range.
.fil (Input): Filter files, containing the transmission curves.
.sav (Output): File containing the information from the fit, saved for future use, and storing the settings for the fit.
.sed (Output): File containing the flux per frequency (in Hz) of the best fit for all components of the SED, in units of erg s−1 cm−2 Hz−1.
.pdf (Output): Output files containing the plots.
.fits (Output): Modelization of the source(s) in each filter from the .dat file.
.py: Mr-Moose files containing the procedures.
2.1 The maximum likelihood estimation in Mr-Moose The code works in a Bayesian framework: the likelihood calculation is central in identifying the models most representative of the data, and provides marginalized probability density functions for each parameter in order to estimate uncertainties. Following Bayes' theorem written in terms relevant for SED fitting, the probability of a model M given the data D can be written as: \begin{eqnarray} P(M|D) = \frac{P(M) \, P(D|M)}{P(D)}, \end{eqnarray} (1) where P(M|D) is the posterior distribution (the probability of M after observing D), P(M) is the prior probability (the probability of M before observing D), P(D|M) is the likelihood (the probability of observing D given M), and P(D) is the model evidence (a measure of how well the model M predicts the data D, which can be considered as a normalization factor for the problem, often non-trivial to estimate). Therefore, P(M|D) is the final parameter distribution (that we are looking for) and P(D|M) is the likelihood to be determined and maximized (or minimized given our estimator, a slightly modified version of the goodness-of-fit). This likelihood (abbreviated P onwards) corresponds to the product of all probabilities of the model given the data and can be written as: \begin{eqnarray} P \propto \prod _{i}P_{i}\prod _{j}P_{j}, \end{eqnarray} (2) with i the detections and j the upper limits. We stress that the priors on the model parameters are set to uniform probability density functions (corresponding to uninformative priors). Each individual probability distribution, respectively for a detection and an upper limit (both following a normal distribution), is defined as follows: \begin{eqnarray} P_{i} \propto \exp \left[-\frac{1}{2}\left( \frac{S_{i}^{\rm data} - S_{i}^{\rm model}}{\sigma _i} \right)^2 \right]\Delta S_{i}, \end{eqnarray} (3) \begin{eqnarray} P_{j} \propto \int _{-\infty }^{S_{{\rm lim},j}^{\rm data}} \exp \left[-\frac{1}{2}\left( \frac{S_{j} - S_{j}^{\rm model}}{\sigma _j} \right)^2 \right]{\rm d}S_{j}, \end{eqnarray} (4) where $S_{i}^{\rm data}$ is the measured flux and σi its standard deviation, and $S_{i}^{\rm model}$ is the predicted flux, all in filter i for a detection and j for a non-detection. Note that ΔSi is the data offset from the true value of $S_{i}^{\rm data}$, infinitesimally small in the case of a detection, and dSj refers to the integration of the flux probability (over Sj) up to the detection threshold $S_{{\rm lim},j}^{\rm data}$ (see Sawicki 2012, their appendix A and fig. 6, for a graphical representation). Taking the logarithm of equation (2) and injecting equations (3) and (4), we obtain: \begin{eqnarray} \ln P \propto &&-\frac{1}{2} \sum _{i} \left( \frac{S^{\rm data}_{i}-S^{\rm model}_{i}}{\sigma _{i}} \right)^2 + \sum _{i} \ln \Delta S_{i} \nonumber\\ &&+\, \sum _{j} \ln \int _{-\infty }^{S_{{\rm lim},j}^{\rm data}} \exp \left[-\frac{1}{2}\left( \frac{S_{j} - S_{j}^{\rm model}}{\sigma _j} \right)^2 \right]{\rm d}S_{j}. \end{eqnarray} (5) The second term on the right-hand side of the equation being constant, maximizing the probability, P, corresponds to minimizing the following estimator: \begin{eqnarray} \chi ^2 &= & \sum _{i} \left( \frac{S^{\rm data}_{i}-S^{\rm model}_{i}}{\sigma _i} \right)^2 \nonumber\\ &&-\,2 \sum _{j} \ln \left\lbrace \sqrt{\frac{\pi }{2}} \sigma _{j} \left[1 + {\rm erf}\left(\frac{S^{\rm data}_{{\rm lim},j}-S^{\rm model}_{j}}{\sqrt{2}\sigma _{j}}\right)\right] \right\rbrace . \end{eqnarray} (6)
We rewrite the third term of equation (5) using the error function, erf, for computational convenience [see appendix A of Sawicki (2012) for details]. In practice, this term penalizes the model quickly but gradually once the model flux exceeds a threshold point (the upper limit) calculated by the user as the local noise level in the original image, in the corresponding filter (the same image at a given filter/frequency can provide a combination of detections and upper limits if the location of the components is known). This χ2 is only slightly different when upper limits are included in the calculation. We note that in the case where no upper limit is provided, the second term is null, and therefore the equation corresponds to the classical χ2 definition. In order to explore the parameter space and minimize the χ2 (equivalent to maximizing the posterior probability), we make use of the emcee Python package (Foreman-Mackey et al. 2013), allowing a choice of different samplers to probe the parameter space. The emcee package follows the classic Markov chain Monte Carlo (MCMC) approach, consisting of randomly moving walkers exploring the parameter space. The main goal is to sample the posterior probability density function of the parameters with a sufficiently high number of independent realizations. This is the best compromise between a brute-force approach, where each solution of the parameter space is calculated on a given grid, and a standard least-square minimization, where only the best solution is provided (which can be a local minimum). This approach also presents the advantage of estimating uncertainties on the model parameters, even for non-normal distributions, thanks to the exploration of the parameter space, but without the computing cost of a grid (which increases exponentially with the number of parameters). To explore and reach convergence on the best solution, the main idea of an MCMC approach is to set a walker in the parameter space, moving randomly with a given step size. For each new position, the likelihood function is evaluated, and the step is accepted if the probability is higher than at the previous position (and only occasionally otherwise). Hence the walker moves slowly towards the best solution, which maximizes the likelihood function. Several algorithms are available, the most famous being the Metropolis–Hastings (MH) algorithm. However, this algorithm can fail to converge quickly when the convergence parameters are not optimally tuned and the parameter space presents strong non-linear behaviour (in particular if the step length is not optimized for the problem). New algorithms were developed to address these caveats, reducing significantly the convergence time. In particular, emcee uses affine-invariant transformations (Goodman & Weare 2010; Foreman-Mackey et al. 2013) to sample the parameter space, using a stretch move based on the other walker positions. This has the advantage of being less sensitive to strongly correlated parameters, allowing a minimal loss of performance and therefore a quicker convergence. Also, this algorithm enables two valuable simplifications compared to the standard MH: (i) the performance of the code relies essentially on two parameters, a, a gain parameter, and n, the number of walkers, and (ii) possible parallelization, splitting the chains into complementary sub-ensembles to predict iteratively the next walker moves, enabling even larger performance gains for specific problems. We refer the reader to Foreman-Mackey et al. (2013) for more details on the algorithm.
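As an illustration, the estimator of equation (6) is straightforward to evaluate numerically. The following minimal sketch (not the Mr-Moose source itself) computes the log-likelihood for a set of detections and upper limits, assuming plain numpy arrays for the measured fluxes, model fluxes, and uncertainties, with the upper-limit term expressed through scipy's error function exactly as in equation (6).

import numpy as np
from scipy.special import erf

def lnlike(s_data, s_model, sigma, s_lim, s_model_lim, sigma_lim):
    """Log-likelihood of equation (6): detections plus upper limits.

    s_data, s_model, sigma        : arrays for the detected filters
    s_lim, s_model_lim, sigma_lim : arrays for the filters with upper limits
    (all fluxes in the same units, e.g. erg s^-1 cm^-2 Hz^-1)
    """
    # classical chi^2 term for the detections
    chi2_det = np.sum(((s_data - s_model) / sigma) ** 2)

    # upper-limit term: Gaussian integrated up to the limiting flux,
    # written with the error function (second term of equation 6);
    # it diverges to -inf when the model strongly violates the limit
    arg = (s_lim - s_model_lim) / (np.sqrt(2.0) * sigma_lim)
    chi2_lim = -2.0 * np.sum(
        np.log(np.sqrt(np.pi / 2.0) * sigma_lim * (1.0 + erf(arg))))

    # maximizing the likelihood is minimizing chi^2, so ln L = -chi^2 / 2
    return -0.5 * (chi2_det + chi2_lim)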
2.2 Inputs Mr-Moose requires four input files: a data file, a model file, a setting file, and the filter data base, each containing the necessary information to perform the fit (see Table 1). 2.2.1 Data file The data file requires the most attention in its design as it sets up the number of arrangements (the various data/model combinations). The Mr-Moose procedure does not check for inconsistencies in the data, and it is the responsibility of the user to provide reliable and relevant photometric points (well calibrated and corrected for any photometric effects). To help with this step, corresponding images are generated for each reported filter in the data file (see Section 2.3). Multi-wavelength SED fitting is particularly challenging in this regard; therefore, specific care should be taken in order to have consistent data throughout the fitting range (blending, filtering of scale/flux by interferometers, instrument corrections, aperture effects, absolute calibration uncertainties), by adding extra models/arrangements if necessary. However, we stress that starting with a simple case and gradually increasing the complexity of the fit is usually a rewarding approach. Table 2 summarizes the data file organization. Table 2. Structure of the .dat file. We provide here an example.
Filter (String): Filter name. It should match exactly the name of a filter in the data base (case sensitive), without the extension (***.fil); necessary to calculate the model flux during the fitting (Section 2.2.4).
lambda0 (Float): Central wavelength of the filter.
RA (String): Right ascension of the source in the $00^{\rm h}00^{\rm m}00{^{\rm s}_{.}}0$ format – used only in the .fits generation.
Dec. (String): Declination of the source in the $00^{\rm d}00^{\rm m}00{^{\rm s}_{.}}0$ format – used only in the .fits generation.
resolution (Float): Resolution of the instrument in arcsec – used only in the .fits generation.
det_type (String): Indicates if the reported value is a detection or an upper limit.
flux (Float): Observed flux in the given filter.
flux_error (Float): Uncertainty on the flux for the given filter.
arrangement (Integer): The number of the arrangement.
component (String): The name of the arrangement (will be used for the split plot).
component_number (List of integers): The list of models to fit for this data point (separated with a semi-colon).
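Because Mr-Moose does not validate the data file itself, a short pre-flight check can save a failed run. The sketch below is not part of the code: it assumes the .dat file can be read as a whitespace-separated ASCII table with the column names of Table 2, that the filter data base is a folder of '.fil' files, and that the det_type convention for a detection is the single letter 'd' (all of these should be checked against the examples in the repository).

import os
from astropy.table import Table

def check_data_file(dat_path, filter_dir="filters"):
    """Basic consistency checks on a Mr-Moose-style data file (illustrative only)."""
    data = Table.read(dat_path, format="ascii")

    # every filter quoted in the data file must exist in the data base
    available = {f[:-4] for f in os.listdir(filter_dir) if f.endswith(".fil")}
    missing = [row["Filter"] for row in data if row["Filter"] not in available]
    if missing:
        raise ValueError("Filters missing from the data base: %s" % missing)

    # detections need a positive uncertainty ('d' for detection is assumed here)
    for row in data:
        if row["det_type"] == "d" and row["flux_error"] <= 0:
            raise ValueError("Detection in %s has no valid uncertainty" % row["Filter"])

    return data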
2.2.2 Model file The model file contains the information on the parameters from the combination of the models to fit to the data. Table 3 shows the format of the model file. The function name refers to the function to be called during the fit and therefore should match exactly the name of the user-defined function (case sensitive). The function is to be defined in the models.py file. Ideally, it should be sufficiently well designed to avoid a drop in execution time, as each model is executed at each step for each walker; loops, conditions, and function calls should therefore be avoided as much as possible. The code already provides some functions, developed and tested for specific applications and examples (Gilli et al. 2014; Falkendal et al. 2018; Drouart et al., in preparation). We provide here a description of the models provided with the release of Mr-Moose: BB_law: a blackbody model directly called from the astropy package (Astropy Collaboration et al.
2013): \begin{eqnarray} S_{\nu } = N \frac{2h\nu ^3/c^{2}}{\exp (h\nu /kT)-1}, \end{eqnarray} (7) where Sν is the flux in erg s−1 cm−2 Hz−1, N the normalization to the data (a relatively complex term containing the scaling to the data and the solid angle covered by the source under the small-angular-size simplification, depending on each instrument for unresolved sources; see https://casper.berkeley.edu/astrobaki/index.php/Single_Dish_Basics for more details), ν the frequency in Hz, h the Planck constant, k the Boltzmann constant, T the temperature in K, and c the speed of light. AGN_law: an empirical AGN model designed to fit the IR AGN component, \begin{eqnarray} S_{\nu } = N \left(\frac{\nu _{\rm cut}}{\nu _{\rm obs}}\right)^{-\alpha } \exp \left(-\frac{\nu _{\rm cut}}{\nu _{\rm obs}}\right), \end{eqnarray} (8) where νcut is the cutoff frequency, α the spectral index, and νobs the observed frequency in Hz. sync_law: a single power-law model, a common form to represent synchrotron emission in the radio, defined as: \begin{eqnarray} S_{\nu } = N\nu ^{\alpha }. \end{eqnarray} (9) Finally, a modified blackbody function, as used in Gilli et al. (2014) for example 3 (see Section 3.3), \begin{eqnarray} S_{\nu } = N B_{\nu }(T)\left(1- \exp \left[-\left(\frac{\nu }{\nu _0}\right)^{\beta }\right]\right), \end{eqnarray} (10) where Bν is the blackbody function defined previously (as BB_law), β the dust emissivity, and ν0 the frequency at which the dust emission enters the optically thick regime (τ = (ν/ν0)β = 1). Table 3. Structure of the .mod file. It consists of a series of blocks, to be repeated as many times as necessary to add the models to be fitted to the data. We present an example in Sections 3.1 and 3.2. Note that the names of the parameters should be expressed in latex syntax if the user requires the parameters to be written in latex format in the plots (particularly the triangle plot).
# function_name    Number of parameters (n)
First parameter    Min value    Max value
Second parameter   Min value    Max value
⋮
n-th parameter     Min value    Max value
#
Each model exists in two versions: the standard one, where an array of frequencies, an array of parameters, and the redshift value are provided, and a version with redshift as a free parameter, obtained by simply adding _z at the end of the function name (see examples in the Mr-Moose repository), changing the all_same_redshift keyword, and providing the relevant redshift array (one redshift value per component). Note that when the redshift is a free parameter, the inputs are slightly different, with only an array of frequencies and an array of parameters.
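To illustrate what such user-defined functions in models.py look like, the sketch below implements the analytic laws of equations (7)–(9) with the signature described above (an array of observed frequencies, an array of parameters, and the redshift), with normalizations expressed in log10 as in the examples of Section 3. This is an illustrative sketch, not the distributed models.py: in particular, the redshift is applied here as a simple (1 + z) frequency shift, which may differ from the exact convention of the released functions.

import numpy as np
from astropy import constants as const

h = const.h.cgs.value      # Planck constant [erg s]
k_B = const.k_B.cgs.value  # Boltzmann constant [erg K^-1]
c = const.c.cgs.value      # speed of light [cm s^-1]

def sync_law(nu, params, z):
    """Single power law (equation 9). params = [log10_N, alpha]."""
    log_n, alpha = params
    nu_rest = nu * (1.0 + z)   # schematic redshift handling
    return 10.0 ** log_n * nu_rest ** alpha

def AGN_law(nu, params, z):
    """Empirical IR AGN law (equation 8). params = [log10_N, nu_cut, alpha]."""
    log_n, nu_cut, alpha = params
    nu_rest = nu * (1.0 + z)
    return 10.0 ** log_n * (nu_cut / nu_rest) ** (-alpha) * np.exp(-nu_cut / nu_rest)

def BB_law(nu, params, z):
    """Blackbody (equation 7). params = [log10_N, T] with T in K.
    The solid angle and flux scaling are absorbed in the normalization."""
    log_n, temp = params
    nu_rest = nu * (1.0 + z)
    b_nu = 2.0 * h * nu_rest ** 3 / c ** 2 / (np.exp(h * nu_rest / (k_B * temp)) - 1.0)
    return 10.0 ** log_n * b_nu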
We refer the user to the examples available in the repository. The library of models will increase in future updates depending on the needs; we refer to the user manual for an up-to-date library of available models. We note that the astropy package (Astropy Collaboration et al. 2013) provides a large library of the most commonly used functions in astronomy. Also, we suggest designing the models with parameters spanning a large range in log-scale (such as normalization factors, as done in the examples in this paper), making sure the corresponding function performs the adequate transformation to calculate the flux. The initial position of the walkers is set following the emcee documentation: a Gaussian ball of walkers, centred on the median of the parameter interval. This has important consequences and should be taken into account when defining this interval (see Section 4). 2.2.3 Setting file The setting file (see the full structure in Table 4) contains the parameters for Mr-Moose to perform the fit, such as the number of walkers (nwalkers), the number of steps (nsteps), and the limit (nsteps_cut) beyond which the chains are used to plot the results (the step from which convergence of the fit is assumed to be reached). We provide examples with typical numbers. For more information, we invite the user to read the recommendations of the emcee documentation. In a nutshell: the more walkers and steps, the better. However, a balance has to be found to keep the code executable on a given machine. From experience, one effective strategy consists in running some first blind tests to get a feeling of the likely requirements for the number of walkers and the number of steps. For a relatively simple case, we advise starting with 200 walkers and running for 500 steps, with a cut at 400 (nwalkers=200, nsteps=500, and nsteps_cut=400). This first run will probe the parameter space with 20 000 points ((nsteps-nsteps_cut) × nwalkers) and gives an indication of how many steps might be required to reach convergence. For a larger number of parameters, a larger number of walkers and steps is required to explore the parameter space more thoroughly (see example 1 versus example 2). Table 4. Structure of the .fit file. In order, the columns refer to the name of the variable as expressed in the code, its type, and its purpose.
source_file (String): The source file path where the data are to be read.
all_same_redshift (Boolean): Keyword to allow the redshift of the components to be a free parameter.
redshift (Float array): The redshift of each component in the system.
model_file (String): The model file where the parameters of the models and their ranges are selected (the parameters to fit).
nwalkers (Integer): The number of walkers for the fit (should typically be more than 100; the emcee documentation advises at least twice the number of free parameters). It plays a role in the speed at which the chain converges, so usually the more the better (within the limit of the computer memory).
nsteps (Integer): The number of steps for each walker to perform (can vary widely, from 200 to several thousand, based on the complexity of the problem).
nsteps_cut (Integer): Should be smaller than nsteps; it represents the number of steps to ignore when drawing the probability density functions of the parameters (it depends mainly on the convergence time and the number of walkers, as it requires a sufficiently large number of points to populate the parameter space after convergence).
percentiles (Float array): The percentiles of the distribution to be returned (used in the plots to show the quoted intervals). Must be odd-dimensioned and all numbers in the range 0 ≤ percentiles ≤ 1. The default values are chosen to represent probability density functions which are not normal, excluding the 10 per cent outliers.
skip_imaging (Boolean): Skip the creation of the data file images.
skip_fit (Boolean): Skip the fitting.
skip_MCChains (Boolean): Skip the generation of the MC-Chains plot.
skip_triangle (Boolean): Skip the generation of the triangle plot.
skip_SED (Boolean): Skip the generation of the SED plots.
unit_obs (String): Unit of the wavelength/frequency provided in the .dat file.
unit_flux (String): Unit of the flux provided in the .dat file.
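To make the quantities of Table 4 concrete, the following illustrative Python dictionary mirrors the settings recommended above for a simple case. The key spellings, file paths, redshift, and unit strings are assumptions of this sketch; the authoritative format is that of the example .fit files shipped with the code.

# Illustrative settings for a simple, single-redshift fit (values from Section 2.2.3).
settings = {
    "source_file": "data/my_source.dat",    # hypothetical path
    "model_file": "models/my_source.mod",   # hypothetical path
    "all_same_redshift": True,
    "redshift": [2.0],                      # one value per component (hypothetical)
    "nwalkers": 200,
    "nsteps": 500,
    "nsteps_cut": 400,                      # steps before this value are discarded
    "percentiles": [0.10, 0.25, 0.50, 0.75, 0.90],
    "skip_imaging": False,
    "skip_fit": False,
    "skip_MCChains": False,
    "skip_triangle": False,
    "skip_SED": False,
    "unit_obs": "Hz",                       # hypothetical unit strings
    "unit_flux": "erg/s/cm2/Hz",
}

# (nsteps - nsteps_cut) * nwalkers posterior samples are kept for the plots:
n_samples = (settings["nsteps"] - settings["nsteps_cut"]) * settings["nwalkers"]
print(n_samples)  # 20 000 for the values above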
2.2.4 Filter data base The filter data base is essential in the fitting routine, as it allows the flux of a given combination of components to be predicted at a given wavelength. For each step, the average flux of the model is calculated through the transmission curve of the filter given in the data file. This transmission curve represents the response of a telescope to a constant incident flux (in frequency space) over a given frequency range. The convolution of the incident flux with the telescope response is important, as any large variation of the SED within a spectral window can bias the estimated flux, for instance in the case of an emission line or a strong gradient within the averaged frequency range. This effect is particularly strong in the optical and NIR, where the telescope response varies significantly from filter to filter. During the fitting, we therefore calculate for each filter the corresponding telescope response convolved with the modelled source with the following equation: \begin{eqnarray} S_{i}=\frac{\int _{\nu _{\rm min}}^{\nu _{\rm max}} S_{\nu }^{\rm model}t_{\nu } {\rm d}\nu } {\int _{\nu _{\rm min}}^{\nu _{\rm max}} t_{\nu } {\rm d}\nu }, \end{eqnarray} (11) where Si is the averaged flux for a given filter i, ν the frequency, $S_{\nu }^{\rm model}$ the flux of the model, and tν the transmission curve. Each filter used in the data file must be present in the data base, or the code will fail to execute. The addition of new filters is easy: a simple text file with two columns, frequency and corresponding transmission, can be added to the filters folder. For the optical and NIR, given the large number of filters used in astronomy, we point the reader to the Asiago Database on Photometric Systems3 to add the relevant filters for a specific analysis (Fiorucci & Munari 2003). For other wavelengths, we refer the user to each instrument user manual to provide the relevant transmission curve. In the case of radio telescopes (ALMA, VLA, ATCA, etc.), the bandpass calibration corrects for any such effect; the telescope response can therefore be approximated as a gate function centred on the observed frequency and covering the frequency range of the correlators (which can be discontinuous, depending on the configuration required by the science case).4
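Equation (11) amounts to a transmission-weighted average of the model over the filter band. A minimal numpy sketch is given below; the boxcar helper illustrates the gate-function approximation mentioned above for radio interferometers and is an assumption of this example, not a Mr-Moose function.

import numpy as np

def filter_average_flux(nu, s_model, nu_filter, transmission):
    """Filter-averaged flux of equation (11).

    nu, s_model            : model SED sampled in frequency [Hz] and flux
                             (frequencies assumed in ascending order)
    nu_filter, transmission: transmission curve of the filter (a '.fil' file)
    """
    # interpolate the model on the filter sampling before integrating
    s_interp = np.interp(nu_filter, nu, s_model)
    return np.trapz(s_interp * transmission, nu_filter) / np.trapz(transmission, nu_filter)

def boxcar_filter(nu_min, nu_max, n_points=100):
    """Gate-function transmission, e.g. for a radio band after bandpass calibration."""
    nu_filter = np.linspace(nu_min, nu_max, n_points)
    return nu_filter, np.ones_like(nu_filter)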
2.3 Outputs Mr-Moose produces a series of outputs containing different information (see Tables 1 and 5). The outputs can be divided into two parts: the supporting files (‘.fits’, ‘.pkl’, ‘.sav’, ‘.sed’) and the result files (‘.pdf’). We focus first on the supporting files and their contents, which allow the user to dig further into the interpretation if necessary and allow for an easy reproduction of a given fit. We designed Mr-Moose to allow users familiar with the emcee package to use their custom tools on the ‘.pkl’ file. However, Mr-Moose also comes with plotting functions to display and interpret the results of the fit. Several ‘.pdf’ graphs are generated by the procedure: chain convergence, triangle, and SED plots. These plots work ‘in unison’ to understand the results of the fitting. Any unexpected behaviour in these plots should bring suspicion on the inputs, be it the data, the models used, the arrangements defined, or the settings of the fitting procedure. Table 5. List of the output files and their use.
triangle_plot: Probability density functions for each parameter and by pair. This plot is particularly efficient at revealing degeneracies between parameters (banana shapes) or parameters unconstrained by the fit (no single peak in the probability density function).
MCChains_plot: Values of the parameters at each step for each walker. This plot is particularly useful to show when convergence is reached, and is used to calculate all the quantities displayed in the other plots.
SED_fnu_plot: Plot in units of frequency and flux of all the data and models in the same graphical window. While relevant for a few components or arrangements, this plot rapidly becomes too complex for multi-component cases.
SED_fnu_splitplot: Split plot in units of frequency and flux for each arrangement defined in the data file.
SED_fnu_spaghettitplot: Plot showing all the possible parameter values on the SED after convergence (at each step for each walker). This plot translates the triangle plot on to the data, showing how constrained the model is by the data.
SED_fnu_splitspaghettitplot: Equivalent to the SED_fnu_splitplot but in the spaghetti style.
SED_file: File storing the SED of each component as plotted in the SED_fnu plot.
sampler_file: File storing the chains of the fitting.
save_struct: File storing the details of the fit for future reference and reproducibility, saving the fit settings dictionary.
2.3.1 Supporting files: chains, SED, fitting, and fits files The ‘.fits’ files are generated following the data provided in the ‘.dat’ file. For each filter, the procedure generates an image in the FITS format (commonly used in astronomy; Wells, Greisen & Harten 1981), containing all the components associated with this filter, making use of the RA, Dec., and resolution values provided in the ‘.dat’ file. The procedure creates an unresolved source, assuming a Gaussian point spread function with a full width at half-maximum given by the resolution parameter, at the given position (RA, Dec.). This allows us to check that the combination in the data file is correctly assigned, by direct comparison with the original images if available. We note that the pixel size is fixed to a fifth of the resolution parameter. A ‘.pkl’ file contains the sampler object from the emcee procedure, saved making use of the cPickle function from the pathos package, allowing the serialization necessary in the case of parallelized use. No modifications are implemented with respect to the original emcee sampler, and as such, emcee-familiar users can feed this output into custom graphical tools. This file contains the complete chain for each walker and other diagnostics provided by the emcee package. The ‘.sav’ file contains the information from the fitting, including the setting file and the model structures, modified during the fitting and storing the final results in a YAML format. The file contains the final percentiles for each parameter, the best-fitting values, and the names and paths of the output files generated by the procedure. A ‘.sed’ file is created reporting each component of the models from the global best fit: the first column is the frequency in Hz, and the following columns report the flux in erg s−1 cm−2 Hz−1.
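The ‘.fits’ check images described above can be reproduced with a few lines of astropy: the sketch below places a circular Gaussian of FWHM equal to the quoted resolution on a grid whose pixel scale is one fifth of that resolution, as in the text. It is an illustrative re-implementation, not the Mr-Moose routine, and the WCS handling is deliberately minimal.

import numpy as np
from astropy.io import fits

def make_check_image(resolution_arcsec, flux, npix=50, filename="check.fits"):
    """Write an unresolved source as a Gaussian PSF image (illustrative only)."""
    pix_scale = resolution_arcsec / 5.0                 # pixel = 1/5 of the resolution
    sigma_pix = resolution_arcsec / 2.355 / pix_scale   # FWHM -> sigma, in pixels

    y, x = np.mgrid[:npix, :npix]
    x0 = y0 = (npix - 1) / 2.0
    psf = np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2.0 * sigma_pix ** 2))
    image = flux * psf / psf.sum()                      # total flux conserved

    hdu = fits.PrimaryHDU(image.astype(np.float32))
    hdu.header["CDELT1"] = -pix_scale / 3600.0          # degrees per pixel (minimal WCS)
    hdu.header["CDELT2"] = pix_scale / 3600.0
    hdu.writeto(filename, overwrite=True)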
2.3.2 Result files: convergence, triangle, and SED plots The convergence plot shows the walkers' exploration of the parameter space at each iteration of the procedure, for each parameter. This plot helps us to identify when convergence is reached, and therefore to which value to set the nsteps_cut parameter used by the triangle and SED plots. When is convergence reached? The walkers should oscillate around given values in all windows simultaneously, revealing that they have converged into the minimum of the parameter space. Note that several minima can co-exist; the walkers can therefore be split between several values, but as long as the chains are stable around these values (for several hundred steps), convergence can be considered as reached. However, with the increasing complexity of larger parameter spaces, some walkers can be ‘stuck’, i.e. moving erratically, very slowly, or not moving at all. Including these walkers in the subsequent plots would therefore bias any interpretation of the probability density distribution. We report for completeness these ‘stuck’ walkers in the convergence plot (with a different colour coding) as a check of the fitting (see Section 3). These ‘stuck’ walkers are ignored in the triangle and SED plots. We filter them out by extracting only the chains from the sampler with a high acceptance fraction (meaning that the walkers are effectively ‘walking’). The default value for this selection threshold in Mr-Moose is set to half of the average acceptance fraction of all chains (AAF value; for instance, if AAF = 0.6, the threshold is defined as 0.3 and all chains below it are ignored in the following steps). We report both the AAF and the fraction of walkers filtered out by this cut (SW) in the convergence plot. For specific cases where the default cut is not optimal, Mr-Moose allows us to set a manual value (see the README). As a guide, the AAF should be in the 0.2–0.5 range and the stuck-walker fraction relatively low. For further details, we refer the reader to the emcee documentation. Briefly, in the case of extreme AAF and SW values, three solutions are to increase the number of walkers (nwalkers), to better define the initial parameter values through the parameter interval (see Section 4), or to simply increase the number of steps (nsteps).
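The ‘stuck walker’ filtering described above can be reproduced directly from the saved sampler. The sketch below assumes an emcee (version 2.x) EnsembleSampler restored from the ‘.pkl’ file and uses its acceptance_fraction and chain attributes; the cut at half the average acceptance fraction follows the default described in the text.

import numpy as np

def good_samples(sampler, nsteps_cut):
    """Keep only chains of walkers with a high acceptance fraction (AAF/2 cut)."""
    aaf = sampler.acceptance_fraction.mean()         # average acceptance fraction
    keep = sampler.acceptance_fraction > 0.5 * aaf   # default threshold in the text
    # sampler.chain has shape (nwalkers, nsteps, ndim) in emcee 2.x
    chains = sampler.chain[keep, nsteps_cut:, :]
    return chains.reshape(-1, chains.shape[-1])

# usage (sampler restored from the '.pkl' file beforehand):
#     samples = good_samples(sampler, nsteps_cut=400)
#     import corner; corner.corner(samples, quantiles=[0.25, 0.5, 0.75])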
The triangle plot shows several pieces of information from the fit, making use of the corner package. It uses all the information contained in the selected walkers' chains after the convergence value (nsteps_cut) provided in the setting file. It presents two types of plot, arranged in the form of a triangle: 2D paired-parameter plots and the marginalized distribution of each parameter. The 2D plots show the probability density functions by reporting each step of each walker for the projected parameters. Additional contours are added as an alternative to the dot representation, with contour values at 25 per cent, 50 per cent, and 75 per cent of the maximal value of the corresponding marginalized distribution. We stress that these contours do not correspond to the percentiles of the distribution, which are obtained by cumulative integration of the probability density function. Along the diagonal, the marginalized distributions of each parameter are presented as histograms. The percentile values defined in the setting file are also reported. We recommend the use of a slightly different version of the seven-figure summary (Bowley 1920), omitting the two extremes (the minimum and maximum values of the distribution) as they are particularly sensitive to outliers and ‘stuck’ walkers, therefore keeping the 10 per cent, 25 per cent, 50 per cent, 75 per cent, and 90 per cent percentiles. These usually give a good summary of different types of distributions (which are expected to deviate substantially from normal distributions), although this should not prevent us from carefully checking each distribution visually to identify problems. This triangle diagram is particularly helpful to (i) reveal any degeneracies among parameters and (ii) reveal whether, and how well, parameters are constrained given the data set. The SED plots show the fit of each arrangement to the data, the final results of the model and data combination. The final number of generated plots changes depending on the complexity of the source considered. First, the SED comes in two versions, a best-fitting and a ‘spaghetti’ version. The former shows the best overall fit to the data (the maximum likelihood reached in the chains), each colour corresponding to one model. The latter shows all the solutions provided by each walker at each step after the convergence cut (the same as for the triangle plot). This is particularly useful to see how well the models are constrained by the data, in combination with the triangle and convergence plots. We choose this representation over a shaded-area representation to better visualize the constraints from the data and to better highlight the different solutions in the case of several minima. For sources with several arrangements, split versions of the SED will also be produced to ease interpretation (see examples in Section 3), with the colour coding of the models conserved across the windows. 3 EXAMPLES In order to demonstrate the capabilities of Mr-Moose, we focus here on three specific extragalactic examples which were used to build this procedure (the code is provided with several examples of increasing complexity; we focus here on the artificial ones produced by the files example_1.py and example_6.py) as well as a previous result from the literature. For the sake of simplicity, we will refer to them as examples 1, 2, and 3, respectively. The first is the simplest case of fitting, with only one model on one data set, here synchrotron emission from a radio source in the 74 MHz–8.4 GHz range. The second case is much more complex: we here aim to disentangle the contributions of various components, spatially non-collocated and originating from different physical processes. We take the example of a powerful radio galaxy and a star-forming galaxy companion with separate emission from the lobes (of synchrotron origin), the dust heated by star formation, and the dust heated by the AGN, in the 8 $\mu$m–1.4 GHz wavelength range. The third example covers a real data set published in Gilli et al. (2014), fitting a modified blackbody in the 70 $\mu$m–3 mm range to reproduce the dust emission of a distant quasar, demonstrating the useful addition of upper limits in the fitting process. 3.1 Example 1: single source, single component This example is the simplest fitting imaginable for Mr-Moose, consisting of a single law fitted to a given data set. Although of limited interest, it allows the user to become familiar with the code. We benchmark Mr-Moose by generating a fake source and trying to recover its original parameters.
3.1.1 Fake source and input files

To generate the fake source data points, we assume a single, unresolved source at z = 0 emitting a single synchrotron law represented by the model sync_law, with α = −1 and log10 N = −22 (to recover a source with flux at the Jy level), where α is the slope and N the normalization (in log scale). We generate five data points by passing this SED through filters from the filter library and assign a given signal-to-noise ratio (here SNR = 5 for all points). We simulate further observational uncertainties by adding random Gaussian noise with a standard deviation corresponding to the provided SNR value. In practice, the code adds (or subtracts) flux to each data point, so that the points do not follow exactly the original synchrotron law defined previously. By adding this extra noise, the chance of recovering the exact parameter values decreases; this illustrates the real case of observations, where each point is drawn from a distribution. This fake data file (Listing 9) is, along with the setting-file (Listing 6) and the model file (Listing 1), generated by the example_1.py procedure (provided with Mr-Moose). These files are then fed into Mr-Moose to perform the SED fitting.

3.1.2 Output files and results

The code generates the output files previously described (see Section 2). We focus on the four '.pdf' plots: the convergence plot (Fig. 2), the triangle plot (Fig. 3), and two SED plots (in units of Fν, Fig. 4).

Figure 2. MC-Chains for each parameter for the given number of steps for example 1. Note the convergence reached roughly after step 100.

Figure 3. Triangle diagram for example 1. The diagonal represents the marginalized probability distribution. The vertical dashed lines correspond to the 10 per cent, 25 per cent, 50 per cent, 75 per cent, and 90 per cent percentiles of the distribution. The 50 per cent percentile value, along with the 25 per cent and 75 per cent percentile values, is reported in Table 6. The other diagram represents the 2D projection of the parameter space on to the corresponding parameters on the abscissa and ordinate axes. The contours represent 75 per cent, 50 per cent, and 25 per cent of the maximum of the marginalized distribution.

Figure 4. SEDs for example 1. Note that the uncertainties are plotted but are smaller than the data points, diamonds being detections. Left: best-fitting plot. The plain black line is the sum of the different components in each arrangement. Right: 'spaghetti' plot. The best fit here is reported as the black plain line for each component (defined in the .dat file).
The shaded area is the sum of each model representation after convergence (the values of the parameters at each step of each walker).

The MC-Chains plot (Fig. 2) shows a quick convergence (reached roughly at step 100). We define nsteps_cut=180 and apply it to the subsequent plots (triangle and SEDs). The triangle plot (Fig. 3) is the classical way to represent a multi-parameter space: the distribution of parameters is plotted in pairs, with the marginalized distribution of each parameter shown as a histogram along the diagonal of the triangle. The dashed lines are the percentiles provided by the user in the setting-file, here 10 per cent, 25 per cent, 50 per cent, 75 per cent, and 90 per cent, as recommended in Section 2. The parameters are well constrained within a small area of the parameter space. It is interesting to note that, even though the two marginalized distributions are relatively Gaussian-like, the 2D projection is far from being a symmetric cloud. The contours correspond here to 25 per cent, 50 per cent, and 75 per cent of the maximal value of the marginalized distributions. Overall, we note that the input parameters are recovered within the uncertainty interval (remembering that we added extra noise to the data; see Table 6).

Table 6. Components and parameters of example 1. *Value in logarithmic scale. The uncertainties quoted are the 25 and 75 percentiles.
Name       Parameter   True value   Fit value
sync_law   N           −22*         $-21.85^{+0.14}_{-0.15}$
           α           −1           $-1.02^{+0.02}_{-0.02}$

As presented in Section 2, the code generates two versions of the SED to visualize convergence and the effect of the uncertainties: the best-fitting and the 'spaghetti' versions allow us to visualize how well the data constrain the parameters of the model. Fig. 4 shows that the power-law parameters are well constrained, with only a small scatter of the model around the data points.

3.2 Example 2: multi-sources, multi-components

For the second example, we aim to demonstrate the application of Mr-Moose to a much more realistic situation, where the total emission received is a complex combination of several sources blended at different wavelengths, by using all the variations and freedom allowed by Mr-Moose. This fake system is adapted from a real science case (see Gullberg et al. 2016).
In the following paragraphs and Fig. 5, we summarize this complex source, consisting of two galaxies: one powerful radio galaxy with two bright radio lobes (FRII type, Fanaroff & Riley 1974) and a star-forming companion galaxy.

Figure 5. Scheme of the combination of observations at different wavelengths from the data file, to illustrate the considered example. The circle sizes are relative to the resolution of the observations but not to scale (see Fig. 6). Note that the Herschel, LABOCA, one ALMA, one VLA, and the ATCA observations do not resolve the different components. From the biggest to the smallest circles: Herschel, ATCA, VLA, LABOCA, ALMA; the double circles are VLA and ALMA (closer to the centre), and in the centre Spitzer.

3.2.1 Fake source and input files

The following characteristics and assumptions are used to define this system:
- the radio galaxy is composed of an AGN and a host galaxy, and is star-forming;
- the AGN is radio-loud, presenting an FRII morphology (two bright radio lobes);
- the companion is star-forming and hosts no AGN;
- the SED coverage spans the 8 μm–1.4 GHz range, with various resolutions and detections resulting from the different facilities used for the observations.

Fig. 6 presents the observations, each image generated by Mr-Moose before the fitting. To be more precise, the IR SED does not differentiate between the two galaxies, owing to the lack of resolution, and consists of three components: the cold dust of the companion, the cold dust of the host, and the AGN hosted in the radio galaxy. We also note that the 350 μm, 500 μm, and 870 μm points are only upper limits and, therefore, the sum of these three components cannot largely exceed these values. The synchrotron emission arises from two lobes, partially blended with the radio galaxy itself and with the companion, or unresolved at the lowest radio frequencies. Also, the synchrotron and dust emission contribute equally at certain frequencies (here 230 GHz).

Figure 6. Images generated from the information contained in the data file, assuming a Gaussian point spread function, all images scaled to the same coordinates. Frequency decreases towards the right and the top. Note that no sources are detected in the spire_350, spire_500, and laboca_870 images, while VLA_X and VLA_C present two components. North is up and east is left in all images.

We have a total of 14 observations in the 8 μm–1.4 GHz range, with 17 photometric data points.
The total number of components is therefore five: one AGN dust component, two cold dust components (one for the radio galaxy and one for the companion galaxy, assuming that the cold dust emission originates from young stars), and two synchrotron components (one per lobe). This corresponds to 10 free parameters, summarized in Table 7. Given the combination of resolution, sensitivity, frequency coverage, photometry, and components, we can define seven arrangements of the data/models, used to generate the fake data file and model file:
- Dust/AGN: combination of the dust emission from the companion and from the radio galaxy itself, consisting of the AGN and cold dust components (Spitzer, Herschel, and LABOCA; Siringo et al. 2009);
- W sync/comp (western synchrotron, companion): partially resolved at 229 GHz, combination of synchrotron and cold dust emission (ALMA band 6, 7.5 GHz bandwidth);
- E sync/host (eastern synchrotron, host galaxy): partially resolved at 245 GHz, combination of synchrotron and cold dust emission (ALMA band 6, 7.5 GHz bandwidth);
- total sync (total synchrotron): unresolved radio emission at 1.4 GHz, 4.7 GHz, and 102 GHz (VLA, ATCA, and ALMA band 3, with 50 MHz, 2 GHz, and 7.5 GHz bandwidths, respectively);
- W sync: resolved only at 4.8 GHz and 8.4 GHz (VLA C and X bands, with 50 MHz bandwidth);
- E sync: resolved only at 4.8 GHz and 8.4 GHz (VLA C and X bands, with 50 MHz bandwidth);
- total emission: combination of all components at 102 GHz (ALMA band 3, with 7.5 GHz bandwidth).

Table 7. Components and parameters of example 2. *All normalization values are logarithmic. The quoted uncertainties are the 25 and 75 percentiles.
Name          Parameter   True value   Fit value
AGN_law       NAGN        −32*         $-31.94^{+0.08}_{-0.09}$
              αAGN        −2           $-2.04^{+0.13}_{-0.12}$
BB_law(1)     NBB1        −22*         $-21.75^{+0.29}_{-0.22}$
              T1          40           $42.98^{+13.30}_{-14.82}$
BB_law(2)     NBB2        −22*         $-21.74^{+0.30}_{-0.18}$
              T2          60           $45.51^{+9.71}_{-16.63}$
sync_law(1)   Ns1         −22*         $-21.64^{+0.93}_{-0.87}$
              αs1         −1           $-1.06^{+0.09}_{-0.10}$
sync_law(2)   Ns2         −22*         $-21.88^{+0.91}_{-0.87}$
              αs2         −1.2         $-1.03^{+0.09}_{-0.10}$
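To make the notion of an arrangement more concrete, the toy sketch below (purely illustrative; it is not the Mr-Moose data- or model-file format, the parameter values and reference frequency are placeholders, and the filter convolution is omitted) shows how the flux compared with a given photometric point is the sum of only those model components assigned to that point.

def sync_law(nu, log_n, alpha, nu0=1.0e9):
    # simple power law with a hypothetical 1 GHz reference frequency
    return 10.0**log_n * (nu / nu0)**alpha

components = {
    "sync_east": lambda nu: sync_law(nu, -22.0, -1.0),
    "sync_west": lambda nu: sync_law(nu, -22.0, -1.2),
}

arrangements = {
    "total_sync": ["sync_east", "sync_west"],   # unresolved point, e.g. 1.4 GHz
    "W_sync": ["sync_west"],                    # resolved point, e.g. VLA C band
}

def model_flux(nu, arrangement):
    # the predicted flux is the sum of the components assigned to this arrangement
    return sum(components[name](nu) for name in arrangements[arrangement])

print(model_flux(1.4e9, "total_sync"), model_flux(4.8e9, "W_sync"))

In the real case, each arrangement would also carry the dust components where relevant, and each model would be integrated through the corresponding filter before being compared with the data point.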
We generate the fake files using the fake source procedure provided with Mr-Moose, generating, summing, and passing the different combinations through the adequate filters in the same fashion as presented in Section 3.1. Table 7 lists the values of each parameter and Fig. 6 shows the images corresponding to each filter in the data file. We choose nwalkers=400 to make sure that a sufficiently large number of walkers explore the parameter space, and a deliberately (too) large number of steps (nsteps=3500) to ensure that convergence is reached.

3.2.2 Output files

The code generates the output files previously described in Section 2. Given the higher number of arrangements, two new SED files are generated (compared to example 1), presenting a split version of the total SED. Indeed, when dealing with a complex system presenting emission from different origins, over-plotting everything on the same SED makes interpretation difficult. Therefore, Mr-Moose provides split SEDs, with the same scale and range for each arrangement (seven in this example) and with each model component colour coded. We describe here the results from the convergence plot (Fig. 7) and the triangle plot (Fig. 9), and finish with the SEDs (Figs 10 and 11).

Figure 7. MC-Chains for each parameter for the given number of steps for example 2. Note the convergence reached roughly after 1000 steps. The grey lines are the 'stuck' walkers, excluded from further analysis (see Section 3.2.2 and Fig. 8).

Fig. 7 shows the chains of walkers, where convergence is reached roughly after 1000 steps. Looking more closely, one notices that each parameter converges at a different time, the first ones being the most constrained parameters, such as NAGN or αAGN (compared to NBB1). Also from this plot, it is clear that the dust temperatures cannot be further constrained without extra information. We deliberately chose a larger number of steps to ensure that we would not miss a long convergence trend. The grey chains correspond to flagged walkers, below the acceptance fraction threshold discussed in Section 2. In this particular example, the default AAF threshold does not filter out some isolated walkers which did not converge even after this large number of steps. This is probably due to the increasing complexity of the parameter space, where several local minima coexist.
We therefore set a manual acceptance fraction threshold of 0.23. We choose this value because it filters out the low-end tail of the distribution of acceptance fraction values (see Fig. 8), corresponding to the isolated walkers in the parameter space (see the bottom of the NBB2 chains, at roughly step 1000). To plot the following SEDs and triangle diagram, we set the limit nsteps_cut=3450. A much lower cut, such as 3000, is possible but would clutter the following diagrams; given our 400 walkers, this still represents 20 000 points to sample our distributions.

Figure 8. Histogram of the acceptance fractions of the 400 chains. Note the cut at 0.23; chains under this value are reported in grey in Fig. 7 and are the 'stuck' walkers (see Section 3.2.2).

Fig. 9 provides us with information on the parameters. Looking at the histograms on the diagonal, we can see that most parameters are relatively well constrained, within the original values, except for the temperatures of the cold dust components, especially T1 (this will be particularly obvious in the SED plot). We also see that, with the increasing complexity of the system, the contours are very useful to highlight 'banana' shapes more clearly. This is a good demonstration of how Mr-Moose provides reliable results in the ambiguous case of a lack of data.

Figure 9. Triangle diagram for example 2. The diagonal represents the marginalized probability distributions. The vertical dashed lines correspond to the 10 per cent, 25 per cent, 50 per cent, 75 per cent, and 90 per cent percentiles of the distributions. The 50 per cent percentile values, along with the 25 per cent and 75 per cent percentile values, are reported in Table 7. The other diagrams represent the 2D projections of the parameter space on to the corresponding parameters on the abscissa and ordinate axes. The contours represent 75 per cent, 50 per cent, and 25 per cent of the maximum of the marginalized distribution.

Figs 10 and 11 show, respectively, the single and the split SEDs for the best-fitting and 'spaghetti' visualizations. In this case, overplotting all the information on a single SED is very confusing (see Fig. 10). While we keep this figure for its pedagogic value, we prefer to focus on the split version of the SED to interpret the fitting (see Fig. 11). In the top panel (best-fitting visualization), the procedure produces a very good fit. When incorporating the uncertainties on the parameters, in the bottom panel ('spaghetti' visualization), we see that the superposition of the different components explains the degeneracies observed in Fig. 9.
The large variation of the temperature allows for a large variation of the normalization and, in turn, a different setup and relative contribution of each component. This is a perfect illustration that (i) a simple minimum-χ2 value is of limited use, as it does not provide a fair view of the fit, (ii) extra information on the relative contribution of the different components is necessary to better constrain the fit, and (iii) even a sophisticated fitting approach does not remove the need for high-quality data and a careful examination of the source.

Figure 10. Total SED for each arrangement of data and models plotted in a single figure. Coloured lines refer to the different models and the black lines are the total of all components in each arrangement. This figure is reported for pedagogic purposes only as, given the increasing complexity of the system, a split SED is much clearer to disentangle the various contributions of the components to the data (see Fig. 11).

Figure 11. SEDs for example 2, split into subplots to illustrate each arrangement. Note that the uncertainties are plotted but are smaller than the data points. Diamonds are detections while downward triangles are upper limits. Top: best-fitting plot. The plain black line is the sum of the different components in each arrangement. Bottom: 'spaghetti' plot. The best fit here is reported as the black plain line for each component (defined in the .dat file). The colours refer to each model component, consistently between the subplots.

3.3 Example 3: real data set

We finally run Mr-Moose on a published data set to ensure that the code is able to recover previous results. We take the example of Gilli et al. (2014), as the photometry and parameter results are reported in the publication and allow for a direct comparison. Briefly, the data, a combination of ALMA and Herschel observations with upper limits and detections, are used to fit a modified blackbody component (presented in Section 2.2.2), in order to access the dust properties of a z = 4.75 quasar. We run the code in two configurations, a and b: one with only the detected filters and a second adding the upper limits to the fit. For both configurations we set β = 2 and ν0 = 1.5 × 10^12 Hz, as in the original publication. We list the files in the Appendix and run the procedure twice. We focus only on recovering the temperature parameter, as the other parameters are derived either from the best-fitting temperature or directly from the observed fluxes.
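For reference, one common parameterization of such a modified blackbody is sketched below; this is a hedged illustration only (the exact functional form coded in Mr-Moose is the one given in Section 2.2.2, and the observed frequencies used here are placeholders). The emission becomes optically thick at ν0, with the emissivity index β fixed to 2 as above.

import numpy as np

def modified_blackbody(nu, log_n, temp, beta=2.0, nu0=1.5e12):
    # nu in Hz, temp in K; N is a normalization in log scale, as elsewhere in the paper
    h, k = 6.626e-27, 1.381e-16                     # Planck and Boltzmann constants, cgs
    planck = nu**3 / (np.exp(h * nu / (k * temp)) - 1.0)
    tau = (nu / nu0)**beta                          # optical depth reaching unity at nu0
    return 10.0**log_n * (1.0 - np.exp(-tau)) * planck

# rest-frame frequencies probed at z = 4.75 for three hypothetical observed frequencies
nu_obs = np.array([100e9, 345e9, 857e9])            # Hz, placeholders only
nu_rest = nu_obs * (1.0 + 4.75)
print(modified_blackbody(nu_rest, -15.3, 59.0))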
For the case with detections only (a), the temperature is recovered within the uncertainties of the value quoted in the original paper (see Table 8 and Figs 12–14). Note also the asymmetric uncertainties obtained when the full information from the parameter space exploration is available. The uncertainties provided by Mr-Moose are defined differently, referring to the 25 per cent and 75 per cent percentiles of the distribution, whereas the value quoted in Gilli et al. (2014) is likely the standard 1σ symmetric uncertainty assuming a normal distribution.

Figure 12. MC-Chains for example 3. Note the convergence after roughly 100 steps. Top: detections only (a). Bottom: detections and upper limits (b).

Figure 13. Triangle diagram for example 3. Top: detections only (a). Bottom: detections and upper limits (b). The diagonal represents the marginalized probability distributions. The vertical dashed lines correspond to the 10 per cent, 25 per cent, 50 per cent, 75 per cent, and 90 per cent percentiles of the distribution. The 50 per cent percentile value, along with the 25 per cent and 75 per cent percentile values, is reported in Table 8. The other diagram represents the 2D projection of the parameter space on to the corresponding parameters on the abscissa and ordinate axes. The contours represent 75 per cent, 50 per cent, and 25 per cent of the maximum of the marginalized distribution.

Figure 14. SEDs for example 3. Top: detections only (a). Bottom: detections and upper limits (b). Left: best-fitting plot. Right: 'spaghetti' plot. The diamonds are the detections and the triangles are the 1σ upper limits. Note that the fitting can converge above the upper limits to a certain extent.

Table 8. Components and parameters of example 3. *Value in logarithmic scale. The uncertainties quoted are the 25 and 75 percentiles. See Section 3.3 for the details of configurations a and b.
Name   Parameter   Gilli et al. (2014)     config a                    config b
MBB    N*          n/a                     $-15.33^{+0.05}_{-0.04}$    $-15.22^{+0.05}_{-0.04}$
       T [K]       $58.5^{+5.8}_{-5.8}$    $59.14^{+3.61}_{-4.06}$     $51.08^{+3.01}_{-3.23}$
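Configuration b, discussed next, exercises the continuous treatment of upper limits (equation 6 in Section 2). As a rough illustration of how such a term can behave, the sketch below uses an error-function penalty that leaves the log-likelihood essentially unchanged while the model lies well below the limit and suppresses it increasingly steeply as the model approaches and exceeds the limit; the exact expression implemented in Mr-Moose is the one given in Section 2 and may differ in detail.

import numpy as np
from scipy.special import erf

def lnlike_point(model, flux, sigma, is_upper_limit):
    if not is_upper_limit:
        # detection: standard chi-square contribution
        return -0.5 * ((flux - model) / sigma)**2
    # upper limit: smooth one-sided penalty, ~0 below the limit, strongly negative above
    return np.log(0.5 * (1.0 - erf((model - flux) / (np.sqrt(2.0) * sigma))) + 1e-300)

# model well below, at, and above a 1-sigma upper limit of 1.0 +/- 0.2
for m in (0.3, 1.0, 1.6):
    print(m, lnlike_point(m, 1.0, 0.2, True))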
When including the upper limits (b), the result differs, with a slightly lower temperature and slightly smaller uncertainties. This is probably due to the inclusion of the upper limits as continuous data with decreasing probability, as defined in equation (6) (note also how the model is able to converge above the upper limits, which are represented at 1σ in Fig. 14, bottom right, rather than at 3σ as in fig. 2 of Gilli et al. 2014). This is a good demonstration of how constraining upper limits can add valuable information when performing SED fitting. As mentioned in the original paper, we stress that the uncertainties reported here are mainly fit uncertainties: they only represent the combined user knowledge of the uncertainties given in the data file. More generally, these statistical uncertainties do not incorporate any systematic uncertainties (which can dominate in certain cases). This is a very common problem in SED fitting; the final uncertainties obtained are limited by the systematic uncertainties associated with the model assumed to represent the data, or with the data themselves. We add this word of caution for users unfamiliar with SED fitting, as these effects are often more subtle (but can be larger than the statistical uncertainties!) and usually hidden in the model choice and/or the data. In this particular case, only a full comparison between a range of models (e.g. the common problem of the IMF in stellar population fitting) and a complete assessment of the data processing uncertainties (e.g. bad calibration or sky subtraction) can answer this particular question.

4 KNOWN LIMITATIONS AND FUTURE DEVELOPMENTS

At the publication of this paper, the code will be available online on the GitHub platform at the following url: https://github.com/gdrouart/MrMoose, under a GPLv3 license (allowing re-use and modifications; see License.txt in the Mr-Moose folder). While Mr-Moose is under continuous development and will see periodic upgrades, the code also has some limitations. We list some of the most important ones and refer the reader to the README for a more exhaustive list of known issues and limitations. The execution time is highly dependent on the complexity of the configuration: the more parameters/components/arrangements, the longer the time for the fitting to converge. Typical execution times range from a couple of minutes to several hours (see Table 9) on a reasonably recent machine. In particular, the code assumes that the user-defined models are as efficient as possible, because these models are called at each iteration for each walker and are therefore the bottleneck of the code. A large number of walkers and steps can also lead to significantly larger files, slowing the process.
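As a rough back-of-the-envelope check of the chain size (assuming 8-byte floats and that the full chain is kept in memory, which is a simplification of what the code actually stores):

nwalkers, nsteps, ndim = 400, 3500, 10          # configuration of example 2 (see Table 9)
size_mb = nwalkers * nsteps * ndim * 8 / 1e6    # bytes to megabytes
print("stored chain: about %.0f MB per source" % size_mb)   # ~112 MB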
Even if Mr-Moose is designed to run in one step, we recommend splitting the run into several steps: first generate the images only, to check the data file; then perform the fit in a second run; and generate the '.pdf' files in a third run (by changing the keyword values in the setting file). Choosing initial values close to the 'true' values is important, otherwise the code will take a long time to converge (if it converges at all). This choice of initial guess is particularly important when including upper limits. By design, when the parameter values produce a model above an upper limit, the χ2 in equation (6) very quickly tends to infinity because of the error-function term (see Section 2). Therefore, a slight underestimation of the parameters is preferable, particularly for any normalization parameters (to keep the second term of the χ2 in equation (6) close to null at the beginning). We recall that the initialization of the walkers follows the emcee recommendation: a Gaussian 'ball' of walkers centred on the median value of the parameter intervals. The code provides two levels of parallelization (multi-core processing) for the calculation of the likelihood: parallelization on one source and parallelization over a sample of sources. The former tends to deteriorate the performance, most likely because of the large overheads when the likelihood is relatively simple (see the emcee documentation). The latter is only applicable to a sample of sources and is therefore much more efficient in reducing the fitting time of an entire sample, using one core per source (Mr-Moose is provided with the wrapper herd.py to parallelize over a sample). However, specific care should be taken to avoid running out of memory, as the number of positions to store scales with the number of steps and walkers (and with the number of sources fitted simultaneously).

Table 9. Execution times for different configurations of Mr-Moose. All times are approximate, for execution on a relatively recent laptop (here a 3.1 GHz Intel Core i7 with 16 GB of DDR3 memory).
Example   nwalkers   nsteps   No. of parameters   Time
1         200        200      2                   ~2 min
2         400        3500     10                  ~24 h
3a        200        400      2                   ~5 min
3b        200        400      2                   ~10 min

We list here some of the future features already planned:
- allowing for a wider range of priors: in the current version, only uniform priors are implemented. We plan to add a variety of priors in order to incorporate prior knowledge more faithfully into the fitting procedure. This will also solve the initial-parameter-value limitation mentioned previously;
- a migration to Python 3 (once upgraded, all further developments will be applied to the Python 3 version in priority);
- implementing the use of discrete models, such as sophisticated AGN models or galaxy evolution models. This step requires the implementation of a methodology in which the samplers can explore a (partially) discrete parameter space.
Several solutions are considered (SED interpolation, brute force, branch-and-bound methods, nesting, etc.). Its implementation is beyond the scope of this paper; it will be the object of a major update of Mr-Moose.

5 CONCLUSION

We presented a new fitting tool allowing the user to fit SEDs for a variety of source configurations, from relatively simple to complex, in a Bayesian framework using the emcee package. The code is highly customizable and allows a wide range of configurations between data and models, while aiming to remain user-friendly. The code consistently handles upper limits along with detections, each data point being calculated through the filter data base, which is also managed and defined by the user. The large flexibility in creating models, data, and filters makes this code virtually adaptable to any case, independently of wavelength, instrument/telescope combination, or science case, providing a robust tool to interpret increasingly complex multi-wavelength data sets. In particular, the code appears ideally suited (but not limited) to fitting a combination of Spitzer, Herschel, ALMA, and JVLA data sets, and/or data sets containing constraining upper limits, even in the case of a single unresolved source.

ACKNOWLEDGEMENTS

GD would like to thank Steven Murray for insightful discussions on Bayesian inference and the Curtin PyClub for useful programming tips. We also thank Matt Lehnert, Carlos De Breuck, Joel Vernet, and Nick Seymour for advice and help in the conception of this code. GD acknowledges the ESO Scientific Visitor Programme for supporting the development of this code.

Footnotes
1 See also http://www.sedfitting.org/Fitting.html for a more exhaustive listing.
2 https://github.com/gdrouart/MrMoose
3 http://ulisse.pd.astro.it/Astro/ADPS/
4 Two functions in mm_utilities.py are provided to generate these gate filters.

REFERENCES
Asensio Ramos A., Ramos Almeida C., 2009, ApJ, 696, 2075
Astropy Collaboration et al., 2013, A&A, 558, A33
Bowley A. L., 1920, An Elementary Manual of Statistics, 3rd edn. P. S. King & Son, London
Bruzual G., Charlot S., 2003, MNRAS, 344, 1000
Calistro Rivera G., Lusso E., Hennawi J. F., Hogg D. W., 2016, ApJ, 833, 98
Conroy C., 2013, ARA&A, 51, 393
da Cunha E., Charlot S., Elbaz D., 2008, MNRAS, 388, 1595
Falkendal T. et al., 2018, A&A, submitted
Fanaroff B. L., Riley J. M., 1974, MNRAS, 167, 31P
Fioc M., Rocca-Volmerange B., 1997, A&A, 326, 950
Fiorucci M., Munari U., 2003, A&A, 401, 781
Foreman-Mackey D., Hogg D. W., Lang D., Goodman J., 2013, PASP, 125, 306
Fritz J., Franceschini A., Hatziminaoglou E., 2006, MNRAS, 366, 767
Gilli R. et al., 2014, A&A, 562, A67
Goodman J., Weare J., 2010, Commun. Appl. Math. Comput. Sci., 5, 65
Groves B. A., Dopita M. A., Sutherland R. S., 2004, ApJS, 153, 9
Gullberg B. et al., 2016, A&A, 586, A124
Harwood J. J., Hardcastle M. J., Croston J. H., Goodger J. L., 2013, MNRAS, 435, 3353
Hönig S. F., Beckert T., Ohnaka K., Weigelt G., 2006, A&A, 452, 459
Leitherer C. et al., 1999, ApJS, 123, 3
MacKenzie T. P., Scott D., Swinbank M., 2016, MNRAS, 463, 10
Maraston C., 1998, MNRAS, 300, 872
Nenkova M., Sirocky M. M., Ivezić Ž., Elitzur M., 2008, ApJ, 685, 147
Noll S., Burgarella D., Giovannoli E., Buat V., Marcillac D., Muñoz-Mateos J. C., 2009, A&A, 507, 1793
Pier E. A., Krolik J. H., 1992, ApJ, 401, 99
Pilbratt G. L. et al., 2010, A&A, 518, L1
Sawicki M., 2012, PASP, 124, 1208
Sharma S., 2017, ARA&A, 55, 213
Siebenmorgen R., Heymann F., Efstathiou A., 2015, A&A, 583, A120
Siringo G. et al., 2009, A&A, 497, 945
Stalevski M., Fritz J., Baes M., Nakos T., Popović L. Č., 2012, MNRAS, 420, 2756
Tinsley B. M., 1968, ApJ, 151, 547
Wells D. C., Greisen E. W., Harten R. H., 1981, A&AS, 44, 363
Werner M. W. et al., 2004, ApJS, 154, 1

LIST OF PACKAGES USED IN Mr-Moose AND REFERENCES

The list of packages, their versions, and their respective references:
astropy v2.0rc1, http://www.astropy.org, Astropy Collaboration et al. (2013)
corner v2.0.1, https://pypi.python.org/pypi/corner
emcee v2.2.1, http://dan.iel.fm/emcee/current/#, Foreman-Mackey et al. (2013)
guppy v0.1.10, https://pypi.python.org/pypi/guppy/
matplotlib v1.5.1, http://matplotlib.org/
numpy v1.9.1, http://www.numpy.org/
pathos v0.2.0, https://pypi.python.org/pypi/pathos/0.2.0
pycallgraph v1.0.1, http://pycallgraph.slowchop.com/en/master/#
PyYAML v3.12, http://pyyaml.org/
tqdm v4.8.4, https://github.com/tqdm/tqdm
scipy v0.14.0, https://www.scipy.org/

INPUT FILES FOR BOTH EXAMPLES.

© 2018 The Author(s) Published by Oxford University Press on behalf of the Royal Astronomical Society This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices) TI - Mr-Moose: an advanced SED-fitting tool for heterogeneous multi-wavelength data sets JO - Monthly Notices of the Royal Astronomical Society DO - 10.1093/mnras/sty831 DA - 2018-07-11 UR - https://www.deepdyve.com/lp/oxford-university-press/mr-moose-an-advanced-sed-fitting-tool-for-heterogeneous-multi-lIYNCItJMo SP - 4981 VL - 477 IS - 4 DP - DeepDyve ER -