TY - JOUR AU - Lee, K., J. AB - Abstract We develop two algorithms, based on maximum likelihood inference, for estimating the parameters of polarized radio sources which emit at a single rotation measure (RM), e.g. pulsars. These algorithms incorporate the flux density spectrum of the source, either a power law or a scaled version of the Stokes I spectrum, and a variation in sensitivity across the observing band. We quantify the detection significance and measurement uncertainties in the fitted parameters, and we derive weighted versions of the RM synthesis algorithm which, under certain conditions, maximize the likelihood. We use Monte Carlo simulations to compare injected and recovered source parameters for a range of signal-to-noise ratios, investigate the quality of standard methods for estimating measurement uncertainties and search for statistical biases. These simulations consider one frequency band each for the Australia Telescope Compact Array, the Square Kilometre Array (SKA) and the Low Frequency Array. We find that results obtained for one frequency band cannot be easily generalized, and that methods that were developed in the past for correcting bias in individual frequency channels do not apply to wide-band data sets. The standard method for estimating the measurement uncertainty in RM is not accurate for sources with non-zero spectral indices. Furthermore, dividing Stokes Q and U by Stokes I to correct for spectral index effects, in combination with RM synthesis, does not maximize the likelihood. polarization, methods: analytical, methods: data analysis, methods: numerical, methods: statistical 1 INTRODUCTION Faraday rotation of polarized radio waves provides us with an important tool for studying magnetic fields in ionized gas, both locally in the Milky Way (e.g. Gardner & Whiteoak 1963; Seielstad, Morris & Radhakrishnan 1964) and out to very high redshifts (e.g. Carilli, Owen & Harris 1994). When astrophysical Faraday rotation was first discovered by Cooper & Price (1962), and in the following decades, measuring Faraday rotation required re-observing a source at a number of different radio frequencies. With the advent of broad-band radio receivers it became possible to measure Faraday rotation with a single observation. Nowadays the two most popular methods for extracting information from such observations are rotation measure (RM) synthesis (Brentjens & de Bruyn 2005) and fitting models to measurements of Stokes Q and U as a function of frequency (‘QU fitting’; e.g. Farnsworth, Rudnick & Brown 2011; O'Sullivan et al. 2012). Because measuring Faraday rotation requires studying the same source over a wide range of frequencies, two effects can complicate the interpretation of the data: radio sources generally have a non-zero flux density spectral index, and the sensitivity of the observations can vary between frequencies. Such a variation in the sensitivity of the observations can arise, for example, because the system temperature varies across the observing band or because strong radio frequency interference makes data points unusable. Several authors have tried to mitigate the first effect by dividing the observed Stokes Q and U flux densities by the observed Stokes I flux density and the second effect by using one over the measured noise variance of each frequency channel as statistical weights. As far as we know, no mathematical proof for the latter method being correct has been published previously. Fitting the Stokes Q and U frequency spectra by maximizing the likelihood can solve both problems: one can fit for the polarized flux density spectrum of the source and use measurements of the noise variances in Stokes Q and U to incorporate a variation in sensitivity at different frequencies. In this paper, we will use semi-analytical techniques to determine the polarization properties of simple sources that emit at only one RM, \begin{eqnarray} \mathrm{RM}\, (\mathrm{rad\ m}^{-2})\ \approx \ 0.81 \int _\mathrm{source}^\mathrm{observer} n_\mathrm{e} B_\Vert \mathrm{d}l\,. \end{eqnarray} (1) Here ne is the free electron density (cm−3), |$B_\Vert$| the length of the magnetic field vector projected along the line of sight (μG) and dl an infinitesimal distance interval along the line of sight from the source to the observer (pc). We will refer to the integral in equation (1) as the RM of the emission, instead of the Faraday depth of the emission. In Appendix A, we argue why we prefer this nomenclature. In our paper, we will quantify the measurement uncertainties of the parameters that describe the radio signal and its detection significance. In two cases, we were able to derive analytical expressions for maximizing the likelihood, which greatly reduces the dimensionality of the search grid, thereby speeding up data processing. These analytical expressions shed light on the connection between the techniques of QU fitting and RM synthesis, and can be used as simple and fast tests for determining whether a source emits at a single RM. If the source fails this test, a more complex model might be required to describe the source. This is not the first time maximum likelihood (ML) techniques have been used to determine the (polarization) properties of radio sources: O'Sullivan et al. (2012), Scaife & Heald (2012) and Wehus, Fuskeland & Eriksen (2013) used brute force methods to search through parameter space, while Bell et al. (2013) used ML inference to improve the cleaning of RM spectra (see also section 2.2 in Sun et al. 2015). Montier et al. (2015b) discuss additional methods for estimating parameters of polarized radio sources whose emission is not affected by Faraday rotation. Finally, this paper provides background information on the algorithm that we used to determine the polarization properties of radio pulsars in the Galactic Centre (Schnitzeler et al. 2016, ‘S16’). This paper is structured as follows. In Section 2, we derive ML estimators for a radio source whose polarized flux density spectrum is either a power law or a scaled version of the Stokes I spectrum. We also derive a weighted form of RM synthesis that maximizes the likelihood for weakly polarized sources, and we show that using ratios of measured flux densities Q/I and U/I in RM synthesis does not maximize the likelihood. In Section 3, we use Monte Carlo simulations to investigate parameter distributions in the presence of noise, we test the accuracy of standard methods for calculating measurement uncertainties and we look for potential bias in the estimators we derived. Throughout our paper we will use |$\boldsymbol {L}\, =\, Q+{\rm i}\,U$| to indicate the linear polarization vector, as this variable name is commonly used instead of |$\boldsymbol {P}$| in pulsar observations. Furthermore, we will ignore the fact that frequency channels have a finite width, and that channel weighting functions should be included when considering sources with large |RM| (as we showed in Schnitzeler & Lee 2015) or with large variations in flux density across the frequency channels. 2 PARAMETER ESTIMATION The problem that we aim to solve can be summarized as follows. Given measurements of the Stokes parameters Q and U and their noise variances at various frequencies, find the parameters that describe the polarized radio signal: the intrinsic Stokes parameters Qref and Uref at a specific reference frequency, the shape of the polarized flux density spectrum and the RM of the signal. We will indicate the measured Stokes Q and U flux densities with Qobs and Uobs, and we will assume that measurements are available for Nch mutually independent frequency channels.1 Furthermore, we assume that there are no offsets in Q and U across the frequency band, and that the noise in each of the channels follows a Gaussian distribution with variance |$\sigma _{Q,i}^2$| and |$\sigma _{U,i}^2$| for Stokes Q and U, where i is the channel index. Different frequency channels can have different noise variances, and the noise variances in Stokes Q and U do not have to be the same. We include a factor η which corrects for a possible error in the scale of σQ,i and σU,i. The log likelihood for this observing setup is given by \begin{eqnarray} \log {\Lambda } = -\frac{1}{2}\sum _{i=1}^{N_\mathrm{ch}} \left[\left(\frac{Q_{\mathrm{mod},i}\,c_i - U_{\mathrm{mod},i}\,s_i - Q_{\mathrm{obs},i}}{\eta \,\sigma _{Q,i}}\right)^2 \right. \nonumber \\ &&+ \left.\left(\frac{Q_{\mathrm{mod},i}\,s_i + U_{\mathrm{mod},i}\,c_i - U_{\mathrm{obs},i}}{\eta \,\sigma _{U,i}}\right)^2 \right] \nonumber \\ &&- \sum _{i=1}^{N_\mathrm{ch}} [\log (\sigma _{Q,i}) + \log (\sigma _{U,i})] \nonumber \\ && -N_\mathrm{ch}[\log (2\pi)+2\log (\eta)]. \end{eqnarray} (2)Qmod,i and Umod,i describe the emitted (intrinsic) polarized flux density spectrum of the source. Faraday rotation of this signal is described by ci and si, which are shorthand for cos (2RMλ2) and sin (2RMλ2), respectively. λ2 is the mean wavelength squared of a frequency channel. When searching for sources with large positive or negative RMs or with steep flux density spectra, the four possible combinations of (Qmod,i, Umod,i) × (ci, si) in equation (2) should be replaced with their channel-averaged values (see also Schnitzeler & Lee 2015). Equation (2) requires solving for the Q and U flux density of each frequency channel, and for RM and η, which means that this system of equations is underdetermined. To reduce the dimensionality of the parameter space we consider two special cases: sources with a power-law polarized flux density spectrum (Section 2.1), and sources for which the polarized flux density spectrum is a scaled version of the Stokes I spectrum (Section 2.2). In Section 2.3, we derive expressions for RM synthesis that allow for a variation in the polarized flux density spectrum of the source and for a variation in sensitivity across the observing band, and we show under which conditions RM synthesis maximizes the likelihood. Finally, in Section 2.4, we discuss how the likelihood can be used to quantify the significance of the detection, i.e. whether the signal can be explained as purely due to noise. Correcting for spectral index effects by dividing Stokes Q and U by Stokes I requires (amongst others) that linear polarization and Stokes I are produced in the same part of the source, which is not guaranteed to be the case. For example, when studying active galactic nuclei the core can dominate the emission in Stokes I but show very little linear polarization (e.g. in gigahertz-peaked spectrum sources, O'Dea 1998). In such cases it makes little sense to divide Stokes Q and U by Stokes I. 2.1 Power-law polarized flux density spectrum In this case, the polarized flux density spectrum is a simple power law, and the linear Stokes parameters can be written as Qmod,i = Qref(νi/νref)α and Umod,i = Uref(νi/νref)α. The ML estimators for Qref, Uref and η, are found by taking the derivative of the log likelihood in equation (2) with respect to these parameters and setting the result equal to zero. We will indicate ML estimators by using hats, for example in |$\hat{Q}_\mathrm{ref}$| and |$\hat{U}_\mathrm{ref}$|⁠. The equations that maximize the likelihood for Stokes Qref and Uref can be written in matrix form as \begin{eqnarray} && { {\left(\begin{array}{cc}a_{1,1} & a_{1,2} \\ a_{1,2} & a_{2,2}\\ \end{array}\right)} {\left(\begin{array}{cc}\hat{Q}_\mathrm{ref} \\ \hat{U}_\mathrm{ref}\\ \end{array}\right)} = } \nonumber \\ & & \sum _{i=1}^{N_\mathrm{ch}} \left(\frac{\nu _i}{\nu _\mathrm{ref}}\right)^{\alpha } {\left(\begin{array}{cc}c_i & s_i \\ -s_i & c_i\\ \end{array}\right)} {\left(\begin{array}{cc}Q_{\mathrm{obs},i}/\sigma _{Q,i}^2 \\ U_{\mathrm{obs},i}/\sigma _{U,i}^2 \\ \end{array}\right)}, \end{eqnarray} (3) where \begin{eqnarray*} a_{1,1} & = & \sum _{i=1}^{N_\mathrm{ch}} \left(\frac{\nu _i}{\nu _\mathrm{ref}}\right)^{2\alpha }\left(\frac{c_i^2}{\sigma _{Q,i}^2} + \frac{s_i^2}{\sigma _{U,i}^2}\right)\!\!, \nonumber \\ a_{1,2} & = & \sum _{i=1}^{N_\mathrm{ch}} \left(\frac{\nu _i}{\nu _\mathrm{ref}}\right)^{2\alpha }\left(-\frac{c_i\,s_i}{\sigma _{Q,i}^2} + \frac{s_i\,c_i}{\sigma _{U,i}^2}\right)\!\!, \nonumber \\ a_{2,2} & = & \sum _{i=1}^{N_\mathrm{ch}} \left(\frac{\nu _i}{\nu _\mathrm{ref}}\right)^{2\alpha }\left(\frac{s_i^2}{\sigma _{Q,i}^2} + \frac{c_i^2}{\sigma _{U,i}^2}\right)\!\!. \nonumber \nonumber \end{eqnarray*} The ML estimators for Qref and Uref can then be found by matrix inversion, and the square of the ML estimator for η is \begin{eqnarray} (\hat{\eta})^2 = \frac{1}{2N_\mathrm{ch}}\sum _{i=1}^{N_\mathrm{ch}} \Bigg[\left(\frac{\nu _i}{\nu _\mathrm{ref}}\right)^{2\alpha } \left(\frac{\hat{Q}_\mathrm{ref}\,c_i - \hat{U}_\mathrm{ref}\,s_i - Q_{\mathrm{obs},i}}{\sigma _{Q,i}}\right)^2 \nonumber \\ && + \left(\frac{\hat{Q}_\mathrm{ref}\,s_i + \hat{U}_\mathrm{ref}\,c_i - U_{\mathrm{obs},i}}{\sigma _{U,i}}\right)^2 \Bigg]\!\!. \end{eqnarray} (4) Because the equations for |$\hat{Q}_\mathrm{ref}, \hat{U}_\mathrm{ref}$| and |$\hat{\eta }$| can be written in terms of α and RM, maximizing the likelihood requires searching through only a 2D grid (RM, α) instead of the 5D grid (RM, α, Qref, Uref and η). 2.2 Stokes L is a scaled version of Stokes I If the polarized flux density spectrum is a scaled version of the Stokes I spectrum (which is usually the case in simple polarized sources) then Qmod,i = q Imod,i and Umod,i = u Imod,i; q and u describe the intrinsic linear polarization properties of the source, and do not depend on frequency. If instead we make the assumption that the measured flux densities in Stokes Q and U are proportional to Stokes I then this implies also a correlation between the noise contributions to these Stokes parameters, which is contrary to our assumptions. Assuming Gaussian noise in Stokes I with a variance |$\sigma _{I,i}^2$|⁠, independent frequency channels, and no correlation between the Stokes parameters, including the observed Stokes I spectrum in the log likelihood adds an extra term \begin{eqnarray} \!-\frac{1}{2}\sum _{i=1}^{N_\mathrm{ch}}\left(\frac{I_{\mathrm{mod},i}-I_{\mathrm{obs},i}}{\sigma _{I,i}}\right)^2 - \sum _{i=1}^{N_\mathrm{ch}}\log {\sigma _{I,i}} - \frac{N_\mathrm{ch}}{2}\log {(2\pi)} \end{eqnarray} (5) to equation (2). To simplify our analysis we will assume that the measured noise variances in Stokes I are exact, and do not require a scale factor ηI. Since we are searching for ways to assign weights to individual frequency channels in RM synthesis based on the noise variance of each channel and the source spectral index, we will only consider algorithms that search through a 1D grid of trial RM values, similar to RM synthesis. In Appendix B, we derive the equations for the ML estimators of q and u. These equations involve polynomials of degree six or seven, depending on how the ‘nuisance parameters’ Imod,i are removed. There are no general solutions for such polynomials with arbitrary coefficients (based on the Abel–Ruffini theorem, Jacobson 2009), and numerical methods have to be used to maximize the likelihood. Furthermore, sums over all frequency channels of the ratios Qobs,i/Iobs,i and Uobs,i/Iobs,i do not occur in the equations for finding the ML estimators of q and u. Therefore, using these flux density ratios in RM synthesis to correct for spectral index effects does not lead to the ML estimators for q and u, which is undesirable. This result can be understood intuitively. Plotting Qobs,i and Uobs,i in the complex plane will show a scatter plot arranged along a segment of a spiral that can be extrapolated back to the origin (if α = 0 all data points scatter along a circle segment, and if RM = 0 the spiral becomes a radial line). Now consider plotting Qobs,i/Iobs,i and Uobs,i/Iobs,i in the complex plane, and compare the two figures. If the signal-to-noise level of the data is poor, dividing Qobs,i and Uobs,i by Iobs,i means dividing two noisy quantities by another noisy quantity. The noise in Iobs,i reshuffles many data points over the four quadrants, and one cannot hope to recover the correct (ML) estimators for q and u in this case. Perhaps at a sufficiently high signal-to-noise level (small scatter in the complex plane) applying RM synthesis to Qobs,i/Iobs,i and Uobs,i/Iobs,i can approximate maximizing the likelihood, but we did not explore this possibility further. Fortunately, under certain conditions the log likelihood we derived after marginalizing over Imod,i (equation B12) can be simplified to \begin{eqnarray} \log {\Lambda ^{\prime }} = -\frac{1}{2}\sum _{i=1}^{N_\mathrm{ch}} \left[\left(\frac{Q_{\mathrm{obs},i} - \alpha _i\,I_{\mathrm{obs},i}}{\eta \,\sigma _{L,i}}\right)^2 \right. \nonumber \\ && \left. + \left(\frac{U_{\mathrm{obs},i} - \beta _i\,I_{\mathrm{obs},i}}{\eta \,\sigma _{L,i}}\right)^2\right. \nonumber \\ && \left. + \left(\frac{\beta _i\,Q_{\mathrm{obs},i} - \alpha _i\,U_{\mathrm{obs},i}}{\eta \,\sigma _{L,i}}\right)^2 \right] \nonumber \\ && -\, 2\sum _{i=1}^{N_\mathrm{ch}} \log (\sigma _{L,i}) -N_\mathrm{ch}[\log (2\pi)+2\log (\eta)] \end{eqnarray} (6) [see appendix C for the derivation; αi and βi are defined in equation (B12)]. This equation is valid if in addition to L ∝ I also the noise variances in Stokes Q and U are equal (⁠|$\sigma _{L,i}^2 \equiv \sigma _{Q,i}^2 = \sigma _{U,i}^2$|⁠), the source is weakly polarized (Lmod ≪ Imod) and σI,i ≈ η σL,i. The only free parameters in equation (6) are q, u, RM and η, RM being the only non-linear parameter. The square of the ML estimator for η is \begin{eqnarray} (\hat{\eta })^2 = \frac{1}{2N_\mathrm{ch}}\sum _{i=1}^{N_\mathrm{ch}} \frac{1}{\sigma _{L,i}^2} \times \left[(Q_{\mathrm{obs},i} - (c_i \hat{q} - s_i \hat{u})\,I_{\mathrm{obs},i})^2 \right. \nonumber \\ && \left. +\, (U_{\mathrm{obs},i} - (s_i \hat{q} + c_i \hat{u})\,I_{\mathrm{obs},i} )^2 \right. \nonumber \\ && \left. +\, (\hat{q}(s_i\,Q_{\mathrm{obs},i} - c_i\,U_{\mathrm{obs},i}) + \hat{u}(c_i\,Q_{\mathrm{obs},i} + s_i\,U_{\mathrm{obs},i}))^2 \right]\!. \nonumber \\ \end{eqnarray} (7) In this case, the expressions for the ML estimators for q, u and η can be written as a function of RM, which means that the (slow) 2D grid search from Section 2.1 is reduced to a much faster 1D grid search over RM. This is where our ML-based method starts to show similarities with RM synthesis, and we will discuss this in more detail in Section 2.3. 2.3 ML optimization and RM synthesis 2.3.1 Power-law polarized flux density spectrum If the source has a power-law flux density spectrum, all noise variances in Q and U are equal (⁠|$\sigma _{Q,i}^2 = \sigma _{U,i}^2\equiv \sigma _{L,i}^2$|⁠), and one searches the (RM,α) grid only along the RM axis at α = 0 then \begin{eqnarray} {\left(\begin{array}{cc}\hat{Q}_\mathrm{ref} \\ \hat{U}_\mathrm{ref} \\ \end{array}\right)} = \sum _{i=1}^{N_\mathrm{ch}}\frac{1}{\sigma _{L,i}^2} {\left(\begin{array}{cc}c_i & s_i \\ -s_i & c_i\\ \end{array}\right)} {\left(\begin{array}{cc}Q_{\mathrm{obs},i} \\ U_{\mathrm{obs},i} \\ \end{array}\right)} \Bigg/\sum _{i=1}^{N_\mathrm{ch}}\frac{1}{\sigma _{L,i}^2}\,. \end{eqnarray} (8) In particular, if the noise variances do not depend on frequency \begin{eqnarray} {\left(\begin{array}{cc}\hat{Q}_\mathrm{ref} \\ \hat{U}_\mathrm{ref} \\ \end{array}\right)} = \frac{1}{N_\mathrm{ch}}\sum _{i=1}^{N_\mathrm{ch}} {\left(\begin{array}{cc}c_i & s_i \\ -s_i & c_i\\ \end{array}\right)} {\left(\begin{array}{cc}Q_{\mathrm{obs},i} \\ U_{\mathrm{obs},i} \\ \end{array}\right)}\,. \end{eqnarray} (9) The right-hand side of equation (8) expresses how the observed polarization vectors (Qobs,i, Uobs,i) are derotated and summed, using the noise variances as weights. If all noise variances are equal one recognizes the right-hand side of equation (9) as the net polarization vector that is calculated using RM synthesis. Summarizing, if the source has a power-law spectrum with a spectral index of zero, RM synthesis produces the ML estimators of Qref and Uref; if the noise variances vary across the frequency band then equation (8) can be used to include this variation to calculate the ML estimators for Qref and Uref. If RM synthesis [or its weighted form: equation (8)] is applied to a source with a non-zero spectral index then this can affect the derived Qref and Uref, the significance of the detection, and the confidence intervals in the source parameters. For example, based on our polarization observations of PSR J1746−2856 with the Australia Telescope Compact Array (ATCA) between 4.5 and 6.5 GHz (shown in fig. 2 from S16), we concluded that this pulsar has a polarization spectral index that is significantly different from zero (table 1 from S16). These observations are sensitive enough that the contour levels in the (RM,α) grid are compact, and the log likelihood reaches its maximum far from the line α = 0 in the (RM, α) grid (Fig. 1). If the source is detected at a sufficiently high signal-to-noise level then the axes of the contour ellipses in the log likelihood landscape tend to line up with the (RM,α) axes of the coordinate grid. Under these conditions it is possible that the correct RM and its associated measurement uncertainty can be found by applying RM synthesis even if the source has a non-zero spectral index. Figure 1. Open in new tabDownload slide Log likelihood contours for the observations of PSR J1746−2856 reported in S16. The red, blue and green contours indicate the 68, 95 and 99.7 per cent confidence intervals for one source parameter, respectively. The grid that we show in this figure is intended to help guide the eye; the grid we actually used for calculating the contour levels is much finer. The axis ratio of the contour ellipses depends on the frequency coverage of the observations. Figure 1. Open in new tabDownload slide Log likelihood contours for the observations of PSR J1746−2856 reported in S16. The red, blue and green contours indicate the 68, 95 and 99.7 per cent confidence intervals for one source parameter, respectively. The grid that we show in this figure is intended to help guide the eye; the grid we actually used for calculating the contour levels is much finer. The axis ratio of the contour ellipses depends on the frequency coverage of the observations. 2.3.2 Stokes L is a scaled version of Stokes I We find the ML estimators for q and u by taking the derivative of equation (6) with respect to these parameters and setting the resulting expressions equal to zero. Re-arranging the expressions for ∂log Λ΄/∂{q, u} gives the following equation: \begin{eqnarray} && { {\left(\begin{array}{cc}k_{1,1} & k_{1,2} \\ k_{1,2} & k_{2,2} \\ \end{array}\right)} {\left(\begin{array}{cc}\hat{q} \\ \hat{u} \\ \end{array}\right)} } \nonumber \\ && =\sum _{i=1}^{N_\mathrm{ch}}\frac{I^2_{\mathrm{obs},i}}{\sigma _{L,i}^2} {\left(\begin{array}{cc}c_i & s_i \\ -s_i & c_i\\ \end{array}\right)} {\left(\begin{array}{cc}Q_{\mathrm{obs},i}/I_{\mathrm{obs},i} \\ U_{\mathrm{obs},i}/I_{\mathrm{obs},i} \\ \end{array}\right)} \Big/\sum _{i=1}^{N_\mathrm{ch}}\frac{I^2_{\mathrm{obs},i}}{\sigma _{L,i}^2}. \end{eqnarray} (10) The matrix on the left-hand side of equation (10) has the following elements: \begin{eqnarray} k_{1,1} = 1+ \sum _{i=1}^{N_\mathrm{ch}} \frac{(Q_{\mathrm{obs},i}\, s_i - U_{\mathrm{obs},i}\, c_i)^2}{\sigma _{L,i}^2} \Bigg/\sum _{i=1}^{N_\mathrm{ch}}\frac{I^2_{\mathrm{obs},i}}{\sigma _{L,i}^2}, \end{eqnarray} (11) \begin{eqnarray} k_{1,2} =\sum _{i=1}^{N_\mathrm{ch}} \frac{(Q_{\mathrm{obs},i}c_i + U_{\mathrm{obs},i}s_i)(Q_{\mathrm{obs},i}s_i - U_{\mathrm{obs},i}c_i)}{\sigma _{L,i}^2} \Bigg/\sum _{i=1}^{N_\mathrm{ch}}\frac{I^2_{\mathrm{obs},i}}{\sigma _{L,i}^2}, \nonumber \\ \end{eqnarray} (12) \begin{eqnarray} k_{2,2} = 1+ \sum _{i=1}^{N_\mathrm{ch}} \frac{(Q_{\mathrm{obs},i}\, c_i + U_{\mathrm{obs},i}\, s_i)^2}{\sigma _{L,i}^2} \Bigg/\sum _{i=1}^{N_\mathrm{ch}}\frac{I^2_{\mathrm{obs},i}}{\sigma _{L,i}^2}. \end{eqnarray} (13) The right-hand side of equation (10) expresses how the measured polarization vectors (Qobs,i, Uobs,i) are derotated and summed, similar to equation (8) but with the weights 1/|$\sigma _{L,i}^2$| replaced with |$I^2_{\mathrm{obs},i}/\sigma _{L,i}^2$|⁠. If all noise variances and all Iobs,i are equal one recognizes the right-hand side of equation (10) as the net polarization vector that is calculated with RM synthesis. As far as we know, this is the first time that an expression has been derived algebraically for RM synthesis that simultaneously allows for a variation in the polarized flux density spectrum of the source and for a variation in the sensitivity across the frequency band. Inserting the solutions for |$\hat{q}$| and |$\hat{u}$| from equation (10) back into equations (B13) and (B14) shows that these |$\hat{q}$| and |$\hat{u}$| are not roots of the equations, and therefore do not maximize the likelihood. We did not investigate further how large the polarization fraction can be for equation (10) to be a useful approximation. One can show that if the signal-to-noise ratio in all Stokes parameters tends to infinity then the |$\hat{q}$| and |$\hat{u}$| from equation (10) are equal to the q and u of the injected signal, independent of the RM and polarization fraction of the source (polarization fractions of up to one are allowed). Numerical methods can be used to find the maximum in the log likelihood landscape, instead of relying on the approximations that we used to derive equation (10). This approach has the additional advantage of being able to handle also more complex situations, e.g. variations of the noise variances across the observing band. 2.4 Significance of the detection Detection statistics in the RM domain have been investigated previously by George, Stil & Keller (2012), Hales et al. (2012) and Macquart et al. (2012). Contrary to previous studies, we use the log likelihood also to calculate the detection significance. If log Λ0 is the log likelihood equivalent of equation (2), (B12) or (6), if there is no signal but only noise, then 2log (Λ/Λ0) follows a χ2 distribution with a number of degrees of freedom equal to the number of parameters in Λ minus the number of parameters in Λ0 (Wilks 1938). For this theorem to be true Nch ≫ 1, which is satisfied by wide-band receivers. The connection between the log likelihood ratio and the χ2 distribution makes it possible to assign to any signal a probability that a measurement is produced purely by noise. This method for calculating the detection significance from data in the frequency domain instead of the RM domain is both easier and more reliable, because data in different frequency channels are independent and because the noise properties of the individual channels are understood well. As a caveat, the log likelihood ratio can deviate from a χ2 distribution, as pointed out by, e.g. Chernoff (1954) and Protassov et al. (2002). We did not investigate this further in our paper. 3 POWER-LAW POLARIZED FLUX DENSITY SPECTRA: MEASUREMENT UNCERTAINTY AND BIAS In this section, we will investigate the distribution of the ML estimators for Stokes Qref and Uref, RM and α using Monte Carlo simulations of the noise. We use standard expressions to calculate the polarized flux density and intrinsic polarization angle of the emission from the ML estimators for Qref and Uref: \begin{eqnarray} L_\mathrm{ref} & = & \sqrt{\hat{Q}_\mathrm{ref}^2+\hat{U}_\mathrm{ref}^2}, \end{eqnarray} (14) \begin{eqnarray} \chi _\mathrm{ref} & = & \frac{1}{2}\mathrm{atan}\left(\frac{\hat{U}_\mathrm{ref}}{\hat{Q}_\mathrm{ref}}\right)\!\!, \end{eqnarray} (15) where χref is calculated taking into account which quadrant in the Q, U plane the polarization vector Qref + i Uref is in. Note that Lref and χref themselves are not ML estimators. The subscript ‘ref’ in ‘χref’ might be somewhat confusing. To clarify, χref is the intrinsic polarization angle, i.e. after correcting for Faraday rotation; ‘ref’ refers to the reference frequency for the power-law flux density spectrum (Section 2.1). In this section, we will test how accurate standard methods for estimating the measurement uncertainties are, and we will look for statistical bias.2 Although our focus will be on observations between 4.5 and 6.5 GHz, which we used in S16, we will also investigate two additional frequency bands using Monte Carlo simulations (Table 1). First we explain how we estimate the measurement uncertainties (Section 3.1), then we explain how we set up the Monte Carlo simulations (Section 3.2). In subsequent sections we analyse the influence of noise on the parameters we are interested in. We summarize our main results in Section 3.7. Table 1. Overview of our Monte Carlo simulations. The injected signal has an RM of 0 rad m−2, an intrinsic polarization angle χ0 = 0|$^\circ$|⁠, and a spectral index of 0, −2 or 2 (‘a’/‘d’, ‘b’ and ‘c’ runs, respectively). ‘Telescope’ lists an existing or planned telescope where this frequency coverage is available. ‘FWHM RMSF’ indicates the full width at half-maximum of the RM spread function. RMmax is defined as the |RM| where the recovered polarized flux density of a source has dropped to half the emitted flux density; see also footnote 1 in S16. To calculate this quantity we used Schnitzeler & Lee (2015) to simulate frequency channels with a finite width and uniform channel response function. . Telescope . Frequency . Channel . Channel . FWHM . RMmax . RM search . . . range . number . width . RMSF . . range . . . (MHz) . . (MHz) . (rad m−2) . (rad m−2) . (rad m−2) . Run 1a,b,c ATCAa 4473.5–6525.5 513 4 ≈1600 ≈4.0 × 105 ±8.0 × 104 Run 1d (i.d.) (i.d.) (i.d.) (i.d.) (i.d.) (i.d.) ±2500 Run 2a,b,c SKA1-midb 350–1760 1410 1 5 ≈11.6 × 103 ±270 Run 3a,b,c LOFAR-HBAc 110–240 2600 0.05 0.65 ≈900 ±32 . Telescope . Frequency . Channel . Channel . FWHM . RMmax . RM search . . . range . number . width . RMSF . . range . . . (MHz) . . (MHz) . (rad m−2) . (rad m−2) . (rad m−2) . Run 1a,b,c ATCAa 4473.5–6525.5 513 4 ≈1600 ≈4.0 × 105 ±8.0 × 104 Run 1d (i.d.) (i.d.) (i.d.) (i.d.) (i.d.) (i.d.) ±2500 Run 2a,b,c SKA1-midb 350–1760 1410 1 5 ≈11.6 × 103 ±270 Run 3a,b,c LOFAR-HBAc 110–240 2600 0.05 0.65 ≈900 ±32 aUsed in S16. bFrequency coverage for SKA1-mid band 1+2 (Dewdney et al. 2013). cvan Haarlem et al. (2013). Open in new tab Table 1. Overview of our Monte Carlo simulations. The injected signal has an RM of 0 rad m−2, an intrinsic polarization angle χ0 = 0|$^\circ$|⁠, and a spectral index of 0, −2 or 2 (‘a’/‘d’, ‘b’ and ‘c’ runs, respectively). ‘Telescope’ lists an existing or planned telescope where this frequency coverage is available. ‘FWHM RMSF’ indicates the full width at half-maximum of the RM spread function. RMmax is defined as the |RM| where the recovered polarized flux density of a source has dropped to half the emitted flux density; see also footnote 1 in S16. To calculate this quantity we used Schnitzeler & Lee (2015) to simulate frequency channels with a finite width and uniform channel response function. . Telescope . Frequency . Channel . Channel . FWHM . RMmax . RM search . . . range . number . width . RMSF . . range . . . (MHz) . . (MHz) . (rad m−2) . (rad m−2) . (rad m−2) . Run 1a,b,c ATCAa 4473.5–6525.5 513 4 ≈1600 ≈4.0 × 105 ±8.0 × 104 Run 1d (i.d.) (i.d.) (i.d.) (i.d.) (i.d.) (i.d.) ±2500 Run 2a,b,c SKA1-midb 350–1760 1410 1 5 ≈11.6 × 103 ±270 Run 3a,b,c LOFAR-HBAc 110–240 2600 0.05 0.65 ≈900 ±32 . Telescope . Frequency . Channel . Channel . FWHM . RMmax . RM search . . . range . number . width . RMSF . . range . . . (MHz) . . (MHz) . (rad m−2) . (rad m−2) . (rad m−2) . Run 1a,b,c ATCAa 4473.5–6525.5 513 4 ≈1600 ≈4.0 × 105 ±8.0 × 104 Run 1d (i.d.) (i.d.) (i.d.) (i.d.) (i.d.) (i.d.) ±2500 Run 2a,b,c SKA1-midb 350–1760 1410 1 5 ≈11.6 × 103 ±270 Run 3a,b,c LOFAR-HBAc 110–240 2600 0.05 0.65 ≈900 ±32 aUsed in S16. bFrequency coverage for SKA1-mid band 1+2 (Dewdney et al. 2013). cvan Haarlem et al. (2013). Open in new tab 3.1 Calculating measurement errors for individual observations If the signal-to-noise level is sufficiently high that confidence regions are simply connected then the 68 per cent confidence interval of each parameter can be found by determining where the log likelihood has decreased by 0.5 (Avni 1976; Cash 1976; Lampton, Margon & Bowyer 1976; Avni 1978). At low signal-to-noise levels the log likelihood landscape can show multiple peaks above the contour at max(log Λ)−0.5, in which case this condition is violated; this makes the method we describe for deriving measurement uncertainties based on the shape of the log likelihood landscape unreliable at low signal-to-noise levels. Close to the grid point with the global maximum in log Λ we fitted a 2D parabola to accurately determine the ML estimators for RM and α, and with these, |$\hat{Q}_\mathrm{ref}$|⁠, |$\hat{U}_\mathrm{ref}$| and |$\hat{\eta }$|⁠. To determine the uncertainties in RM and α we used the projection of the 2D ellipse where the likelihood has decreased by 0.5 on to the coordinate axes (e.g. fig. 15.6.4 in Press et al. 1992). Fig. 1 illustrates this for PSR J1746−2856; also other significance levels are shown. The ML estimators for Qref and Uref depend on RM and α, and we derived their measurement uncertainties by fitting a parabola to the log likelihood, varying only the parameter of interest and keeping the other parameters fixed at the values that maximize the likelihood. This is the method we used in S16, and will be using in the current paper. Alternatively, for a power-law polarized flux density spectrum we can calculate confidence intervals for |$\hat{Q}_\mathrm{ref}, \hat{U}_\mathrm{ref}$| and η more quickly from the covariance matrix |$\bf{C}$| that is the inverse of the Fisher information matrix, |$\bf{I}$|⁠. |$\bf{I}$| has elements \begin{eqnarray} \bf{I}_{i,j} = \Big <\frac{\mathrm{\partial} \log {\Lambda }}{\mathrm{\partial} \gamma _i}\,\frac{\mathrm{\partial} \log {\Lambda }}{\mathrm{\partial} \gamma _j}\Big >\!, \end{eqnarray} (16) where brackets indicate calculating the expectation value over the parameter space, using the likelihood as probability density function, and γi indicates either Qref, Uref or η. Since the deviations between the model and the observations in equation (2) are Gaussian, the matrix elements |$\bf{I}_{i,j}$| can be calculated using the Slepian–Bangs formula (Slepian 1954; Bangs 1971), which leads to \begin{equation} \bf{C} = {\left(\begin{array}{ccc}\kappa \boldsymbol {\delta _2}^\mathrm{T}\bf{D}^{-1} \boldsymbol {\delta _2} & -\kappa \boldsymbol {\delta _1}^\mathrm{T}\bf{D}^{-1} \boldsymbol {\delta _2} & 0 \\ -\kappa \boldsymbol {\delta _2}^\mathrm{T}\bf{D}^{-1} \boldsymbol {\delta _1} & \kappa \boldsymbol {\delta _1}^\mathrm{T}\bf{D}^{-1} \boldsymbol {\delta _1} & 0 \\ 0 & 0 & \eta ^4/N_\mathrm{ch} \\ \end{array}\right)}. \end{equation} (17) The vectors |$\boldsymbol {\delta _1}^\mathrm{T} = (c_1, \ldots , c_{N_\mathrm{ch}}, -s_1, \ldots , -s_{N_\mathrm{ch}})$| and |$\boldsymbol {\delta _2}^\mathrm{T} = (s_1, \ldots , s_{N_\mathrm{ch}}, c_1, \ldots , c_{N_\mathrm{ch}})$|⁠, the matrix \begin{equation} \bf{D} = \eta ^2 {\left(\begin{array}{cccccc}\sigma _{Q,1}^2 \\ & \ddots \\ & & \sigma _{Q,N_\mathrm{ch}}^2 \\ & & & \sigma _{U,1}^2 \\ & & & & \ddots \\ & & & & & \sigma _{U,N_\mathrm{ch}}^2 \\ \end{array}\right)} \end{equation} (18) (all off-diagonal elements being 0) and the scalar |$\kappa = 1/(\boldsymbol {\delta _1}^\mathrm{T}\bf{D}^{-1} \boldsymbol {\delta _1}\,\times \,\boldsymbol {\delta _2}^\mathrm{T}\bf{D}^{-1} \boldsymbol {\delta _2} - \boldsymbol {\delta _1}^\mathrm{T}\bf{D}^{-1} \boldsymbol {\delta _2}\,\times \,\boldsymbol {\delta _2}^\mathrm{T}\bf{D}^{-1} \boldsymbol {\delta _1})$|⁠. If the noise variances in Stokes Q and U are equal in all channels, but are allowed to vary between channels, then var|$(\hat{Q})$| = var|$(\hat{U}) = \eta ^2/\sum _{i=1}^{N_\mathrm{ch}} 1/\sigma _i^2$| and cov(⁠|$\hat{Q},\hat{U}$|⁠) = cov(⁠|$\hat{U},\hat{Q}$|⁠) = 0. Furthermore, if η = 1 and all channels have the same noise variance then var|$(\hat{Q})$| = var|$(\hat{U}) = \sigma ^2/N_\mathrm{ch}$|⁠. Starting by assuming a constant noise variance for Nch independent channels, Macquart et al. (2012) derived the same expression for the noise level in the amplitude of the RM spectrum. One can calculate the Fisher information also for equation (6), but this is very time consuming. In this case fitting parabolas to the log likelihood is an alternative, practical solution for estimating the measurement uncertainties in q, u and η. 3.2 Setup of the Monte Carlo simulations For our Monte Carlo simulations we choose the three frequency setups listed in Table 1. The sources that we simulate emit at an intrinsic polarization angle χ0 = 0|$^\circ$|⁠, an RM of 0 rad m−2, and have a polarized flux density spectral index of −2, 0 or 2. All sources produce the same frequency-band-averaged polarized flux density, ‘〈L〉Δν’, independent of their spectral index. To ensure this the polarized flux density at the reference frequency of the source, Lref, satisfies \begin{eqnarray} \langle L\rangle _{\Delta \nu } = L_\mathrm{ref}\frac{\nu _\mathrm{ref}}{\Delta \nu} \nonumber \\ && \left\lbrace \begin{array}{l@{\quad}l} \log {\left(\frac{\nu _2}{\nu _1}\right)} & \alpha = -1 \\ \frac{1}{\alpha +1}\left[\left(\frac{\nu _2}{\nu _\mathrm{ref}}\right)^{\alpha +1}-\left(\frac{\nu _1}{\nu _\mathrm{ref}}\right)^{\alpha +1}\right] & \alpha \ne -1. \end{array}\right. \end{eqnarray} (19) The observing band covers frequencies between ν1 and ν2, Δν = ν2 − ν1, for νref we use the median frequency of the band. All frequency channels are statistically independent and have Gaussian noise; the noise variances in Stokes Q and U are equal and constant. If all channels have the same noise variance then combining Nch frequency channels improves the signal-to-noise level of our observations by |$\sqrt{N_\mathrm{ch}}$| (Section 3.1). Therefore, to obtain signal-to-noise ratios in our Monte Carlo simulations of between zero and eight we set the standard deviation of the noise in each frequency channel equal to |$\sqrt{N_\mathrm{ch}}$|⁠. To each of the mock data sets that we generated this way we apply our method for finding the ML estimators of Qref, Uref, RM and α. The grid in (RM, α) is searched out to RMs of ±50 FWHM of the RM spread function and covers spectral indices between −6 and +2 (runs ‘a’,‘b’ or ‘d’) or between −12 and +12 (‘c’ runs); we also consider solutions to be valid if the ML estimators for RM or α fall within 1σ of these boundaries. The extent in α of the smaller grid reflects the range of spectral indices of known pulsars (Lorimer et al. 1995; Bates, Lorimer & Verbiest 2013); this range in α is wide enough that it also covers the spectral indices of most known active galactic nuclei. By using two different grid sizes we can test how important the choice for the size of the grid is; sources with a spectral index of +2 can only be investigated on the larger grid. On the smaller grid runs 1a,b,d are based on 104 realizations of the Monte Carlo simulation. All other runs, including all runs on the larger grid, are based on 2000 realizations. Unless mentioned explicitly, names of runs refer to simulations on the smaller grid. 3.3 A qualitative investigation of run 1a Fig. 2 shows the distributions of the ML estimators for Qref and Uref, RM and α for run 1a for various strengths of the injected signal. At low signal-to-noise ratios noise often produces a higher peak in log likelihood in the (RM, α) grid than the injected signal, and maximizing the log likelihood will then select the peak produced by noise (this also happens in RM synthesis: if the injected signal is weak compared to the noise level, selecting the highest polarized flux density in the RM spectrum misinterprets this peak as coming from the injected signal). This leads to broad distributions in the ML estimators for Qref, Uref, RM and α. The larger the (RM, α) grid, the higher the probability that noise will produce a higher log likelihood than the injected signal: we tested this with a simulation similar to run 1a except that we search α values up to ±12. The stronger the signal the more likely it is that the signal combined with noise produces the highest peak in log likelihood: this produces a transition between the ‘noise-dominated’ and ‘signal-dominated’ regimes. The latter occurs at an injected S/N of about 7 in Fig. 2, where the distributions of RM and α become more Gaussian and the Qref, Uref point cloud becomes distributed along the circle that indicates the signal-to-noise ratio of the injected signal. Figure 2. Open in new tabDownload slide Distribution of the ML estimators for Stokes Qref and Uref, RM and the spectral index α, for simulation 1a and injected signal strengths between 1 and 7 sigma (after combining all frequency channels). The radius of the circle in the Q, U diagrams is equal to the Lref/σ ratio of the injected signal. For a signal-to-noise ratio of 1 the distribution of RM is very broad, extending well beyond the horizontal range of the panel therefore the peak of this distribution is very low. Figure 2. Open in new tabDownload slide Distribution of the ML estimators for Stokes Qref and Uref, RM and the spectral index α, for simulation 1a and injected signal strengths between 1 and 7 sigma (after combining all frequency channels). The radius of the circle in the Q, U diagrams is equal to the Lref/σ ratio of the injected signal. For a signal-to-noise ratio of 1 the distribution of RM is very broad, extending well beyond the horizontal range of the panel therefore the peak of this distribution is very low. Fig. 2 shows asymmetries, most clearly in the distribution of α, which are related to the extent of the (RM,α) grid. At low signal levels the α distribution is very broad, and extends from the α = 0 of the injected signal down to α values of less than −10, but up to α values of only +5. This broad distribution is tapered at large positive or negative α values because we selected only solutions in our Monte Carlo simulations where the ML estimator of α lies between −6 and +2, or within 1σ of this grid boundary. A tail in the histogram of α values for weak injected signal persists even if we extend the search grid in α out to ±12. We found that the central hole in the Qref, Uref distribution from Fig. 2 is not present in simulations 2a and 3a. By changing the fractional bandwidth of the simulations while keeping the centre frequency fixed we found that this gap appears in observations with small fractional bandwidths. The gap in the Qref, Uref distribution is also filled in if we search out to α values of ±12. These results demonstrate that different choices for the frequency setup of observations or for the extent of the search grid in RM and α lead to different bias at low signal-to-noise ratios, in stark contrast with the old situation where bias in only a single frequency channel was considered (e.g. fig. 2 in Wardle & Kronberg 1974; Simmons & Stewart 1985). Therefore, one should be cautious about generalizing results for polarization bias in wide-band data sets. 3.4 Distributions of the polarized signal-to-noise ratio and of the detection significance Fig. 3 compares the injected to the recovered ML estimators for the polarized flux density divided by the standard deviation of the noise; also shown are predictions for the debiasing methods selected by Everett & Weisberg (2001) and George et al. (2012) (note that Everett et al. proposed their method for debiasing individual frequency channels, while we considered the entire frequency band). The bias correction selected by Everett et al. is based on work by Wardle & Kronberg (1974) and Simmons & Stewart (1985). We use box–whisker plots because these provide information also for asymmetric distributions and/or distributions with outliers. At high signal-to-noise levels the vertical dotted lines in each box–whisker symbol extend to what would be ±1σ in a 1D Gaussian distribution. Figure 3. Open in new tabDownload slide Box–whisker plot comparing the injected polarized signal-to-noise ratio with the recovered ML estimators for the signal-to-noise ratio for run 1a. Each whisker extends out to 1.48 times the median absolute deviation, which, for Gaussian distributions, is ≈1σ. Circles indicate the largest and smallest values encountered in the simulation. The red lines show the one-to-one line and predictions for the debiasing methods selected by Everett & Weisberg (2001, ‘EW’) and George et al. (2012, ‘GSK’). Figure 3. Open in new tabDownload slide Box–whisker plot comparing the injected polarized signal-to-noise ratio with the recovered ML estimators for the signal-to-noise ratio for run 1a. Each whisker extends out to 1.48 times the median absolute deviation, which, for Gaussian distributions, is ≈1σ. Circles indicate the largest and smallest values encountered in the simulation. The red lines show the one-to-one line and predictions for the debiasing methods selected by Everett & Weisberg (2001, ‘EW’) and George et al. (2012, ‘GSK’). Fig. 3 shows that weak signals (signal-to-noise ratio ≲3) are often reconstructed at a similar signal-to-noise ratio of about 3 independent of the strength of the injected signal (Hales et al. 2012 reached a similar conclusion in their analysis, see fig. 1 in their paper). Because the measured signal-to-noise ratio can be interpreted in this case by a wide range of injected signal-to-noise ratios, debiasing makes little sense. The height of the plateau at low signal-to-noise levels depends on the frequency setup of the simulation, as we found by comparing the equivalents of Fig. 3 between runs 1a, 2a and 3a: run 2a shows the weakest noise bias (median recovered signal-to-noise ratio of about 2). In run 1a, at signal-to-noise levels ≳5 the median of the distribution of reconstructed signal-to-noise ratios deviates little from the injected signal strength, indicating negligible bias. At lower signal-to-noise levels these medians follow the debiasing curve proposed by Everett et al. more closely than the debiasing curve proposed by George et al., even though the scatter in the recovered signal-to-noise values is much larger than the separation between the debiasing curves. As we noted at the end of the previous subsection, different frequency setups are affected by noise in different ways. Since George et al. proposed their debiasing method for observations centred on 1.4 GHz, using a narrow bandwidth, their debiasing method might not work as well for our simulation run 1a. If the source has a spectral index of −2 then the recovered signal-to-noise ratios do not increase as rapidly as the injected signal-to-noise ratios if the signal-to-noise ratio is high; the equivalent of Fig. 3 for run 2a even shows a local minimum. This bias is reduced strongly if the spectral index is +2 (runs 1–3). An important question is how the measured signal-to-noise ratio can be translated back into the signal-to-noise ratio of the injected signal. This is equivalent to drawing a horizontal line in Fig. 3 and determining the distribution of injected L/σ along this line. To solve this inverse problem we divided Fig. 3 into a fine grid (cell size 0.5 × 0.5) and counted the number of realizations of the Monte Carlo simulation in each cell. Then we read off along a horizontal line (i.e. for a given observed L/σ) that injected L/σ values contribute. Note that this method only works if the number of realizations of the Monte Carlo process is the same for each injected L/σ. Fig. 4 shows for five different strengths of the measured signal that strengths of the injected signal contribute. For weak signals, a Gaussian distribution is definitely not a good fit. If the measured signal-to-noise ratio is 5 then the distribution of injected L/σ values is described well by a Gaussian with a standard deviation of almost one (dashed green line); if the measured signal-to-noise ratio is even higher then this standard deviation is even closer to one. Also, if the measured L/σ is 5 or larger then the fitted Gaussians have a mean close to zero, which implies that in these cases the distribution of the injected L/σ values is centred on the value of L/σ of the measured signal. Therefore, for these signal-to-noise ratios the distributions of L/σ are almost bias free.3 We could not extend our analysis to even larger values for the measured signal-to-noise ratio because for such strong signals the distributions of injected L/σ extend beyond the grid that we used to make Fig. 4. Figure 4. Open in new tabDownload slide Translation from measured signal-to-noise ratios between 3 and 7 (indicated by different colours) back to injected signal-to-noise ratios for simulation 1a. These distributions correspond to horizontal cuts in Fig. 3 after gridding this figure using cells with a width of 0.5 units. The pdfs were centred on the value of the measured L/σ of each cut, i.e. this figure shows pdf[(L/σ)injected] − (L/σ)observed. We indicated this by using ‘shifted’ along the x-axis. For measured signal-to-noise ratios larger than about 5, the distribution of injected signal-to-noise values can be fitted well by a Gaussian, as indicated by the dashed line. The fitted Gaussian that we plotted has a standard deviation of 1.03. Figure 4. Open in new tabDownload slide Translation from measured signal-to-noise ratios between 3 and 7 (indicated by different colours) back to injected signal-to-noise ratios for simulation 1a. These distributions correspond to horizontal cuts in Fig. 3 after gridding this figure using cells with a width of 0.5 units. The pdfs were centred on the value of the measured L/σ of each cut, i.e. this figure shows pdf[(L/σ)injected] − (L/σ)observed. We indicated this by using ‘shifted’ along the x-axis. For measured signal-to-noise ratios larger than about 5, the distribution of injected signal-to-noise values can be fitted well by a Gaussian, as indicated by the dashed line. The fitted Gaussian that we plotted has a standard deviation of 1.03. Closely related to the distribution of the ML estimators for the polarized signal-to-noise ratio is the distribution of the detection significance. The latter is quantified by the difference in log likelihood for a realization with and without the injected signal, which can be thought of as the height of the peak in the log likelihood landscape compared to its surroundings. A stronger signal (higher signal-to-noise ratio) will produce a higher peak in log likelihood, leading to a higher detection significance. As Fig. 3 shows, this correlation between the detection significance and the signal-to-noise ratio of the measured signal breaks down if the injected signal is weak (L/σ ≲ 3) and the noise contribution is responsible for producing the highest peak in the log likelihood landscape. Because we are more familiar with quantifying detection significances using the error function, we calculate for each detection the value of Nσ for which the error function gives the same detection significance as the significance we derive from the log likelihood ratio. Fig. 5 shows the distributions of Nσ for a range of strengths of the injected signal in simulation 1a. If the injected signal is weak then noise can produce the highest log likelihood in a large (RM,α) grid, similar to how a large map of the sky can show bright spots due to noise. This leads to many realizations in the Monte Carlo process having a detection significance above ‘3σ’ even if the injected signal has only L/σ = 1. A sufficiently high detection threshold, e.g. ‘5σ’, will prevent such weak sources being interpreted as real detections, but at the same time Fig. 5 shows that some genuine signals will fall below this detection threshold and will therefore be discarded (false negatives). Fig. 5 also shows that for L/σ = 4, 5, the ‘Nσ’ coordinate of the mean of each distribution is smaller than the value of L/σ of the injected signal. This implies that quoting the measured L/σ of a detection overestimates the real detection significance because the centres of the distributions of injected and measured L/σ in Fig. 4 are almost the same for these two L/σ values. Figure 5. Open in new tabDownload slide Distribution of detection significances that we calculated from the log likelihood ratio for four strengths of the injected signal (simulation 1a, using a binsize of 0.2 along the horizontal axis). To make the detection significances easier to interpret, we calculate for each detection how far we have to integrate the error function to obtain the same detection significance (‘Nσ’, which is plotted along the horizontal axis). The distributions for signal-to-noise ratios of 0 and 1 are nearly identical; to avoid cluttering the plot we omit the distribution for a signal-to-noise ratio of 2. Figure 5. Open in new tabDownload slide Distribution of detection significances that we calculated from the log likelihood ratio for four strengths of the injected signal (simulation 1a, using a binsize of 0.2 along the horizontal axis). To make the detection significances easier to interpret, we calculate for each detection how far we have to integrate the error function to obtain the same detection significance (‘Nσ’, which is plotted along the horizontal axis). The distributions for signal-to-noise ratios of 0 and 1 are nearly identical; to avoid cluttering the plot we omit the distribution for a signal-to-noise ratio of 2. 3.5 Distribution of RMs Weak signals produce broad RM distributions, as demonstrated in Fig. 2. If we limit the search range in RM (run 1d) we can make the effects of noise on the RM distribution for different strengths of the injected signal even more clear, as shown in Fig. 6 (see also fig. 3 in George et al. 2012 and fig. 2 in Macquart et al. 2012). These broad distributions can be removed by selecting only realizations of the Monte Carlo simulations that have detection probabilities larger than ‘4σ’. As Fig. 6 shows this removes almost all the realizations belonging to the injected signal with L/σ = 1, but also if L/σ = 5 a large fraction of the realizations is removed. Why this happens can be understood from Fig. 5, which shows that even if the injected signal has L/σ = 5 a large proportion of the Monte Carlo realizations has a detection significance smaller than ‘4σ’. Figure 6. Open in new tabDownload slide Derived RMs for simulation 1d for injected signal-to-noise strengths of 1 (red, left y-axis) and 5 (blue, right y-axis). Dashed lines include only data points with an overall detection significance > ‘4σ’ (see main text). Figure 6. Open in new tabDownload slide Derived RMs for simulation 1d for injected signal-to-noise strengths of 1 (red, left y-axis) and 5 (blue, right y-axis). Dashed lines include only data points with an overall detection significance > ‘4σ’ (see main text). Often the error in the measured RM is derived from errRM = FWHM/(2 L/σ), where FWHM is the full width at half-maximum of the RM spread function (which can be calculated using equation 61 in Brentjens & de Bruyn 2005), L the measured polarized flux density and σ the rms noise level of the observations when integrating over the entire frequency band.4 In Fig. 7, we test if the distribution of RM/errRM derived from observations is biased, and if errRM is a good approximation for the error in RM. For large enough injected L/σ values the distribution of recovered RM/errRM is centred on the injected RM of 0 rad m−2, implying that this estimator is unbiased. As indicated by the black whiskers, the width of this distribution is (almost) one if we use the errors calculated from the log likelihood, as one would expect if the calculated errors are correct. However, the width of the distribution of RM/errRM is noticeably different from one if errRM is used. So far we have not found an explanation why errRM is only approximately correct; not debiasing L before calculating errRM can be ruled out because Fig. 3 shows that noise bias in L is negligible for L/σ ≳ 5. Figure 7. Open in new tabDownload slide Box–whisker plot showing the distribution of reconstructed |$\widehat{\mathrm{RM}}$| divided by either the measurement error as derived from fitting a 2D parabola to contours in the log Λ landscape (black lines) or by the standard error in RM, errRM (red lines; points are slightly offset along the horizontal axis). Simulations for run 1a; note that at low S/N, the error estimation in RM using log likelihood fitting is unreliable (Section 3.1). Figure 7. Open in new tabDownload slide Box–whisker plot showing the distribution of reconstructed |$\widehat{\mathrm{RM}}$| divided by either the measurement error as derived from fitting a 2D parabola to contours in the log Λ landscape (black lines) or by the standard error in RM, errRM (red lines; points are slightly offset along the horizontal axis). Simulations for run 1a; note that at low S/N, the error estimation in RM using log likelihood fitting is unreliable (Section 3.1). There are only small differences in the extent of the ‘1σ’ confidence regions in Fig. 7 between runs 1a, 2a and 3a if we estimate measurement errors in RM by fitting contours in the log likelihood landscape; using errRM instead leads to much larger differences. In fact, our Monte Carlo simulations show that using errRM can lead to ‘1σ’ whiskers that are much smaller or larger than ±1 in Fig. 7. If the source has a spectral index of −2 then the ‘1σ’ whiskers in Fig. 7 extend only out to about ±0.3 in run 2b, and out to about ±0.75 in run 3b, if we use errRM. Two effects play a role. First, a source with a negative spectral index has a smaller error in RM than a source with a positive spectral index if the same frequency setup is used in both cases: a negative spectral index implies that the source is bright at low frequencies (large λ2) where RMs can be determined accurately, while the opposite is true for a source with a positive spectral index. Since errRM is calculated assuming α = 0, this suggests that sources with negative (positive) α on average have smaller (larger) errors than the value calculated with errRM. When dividing the ML estimator for RM by errRM this leads to confidence regions that extend to less than (more than) ±1 along the vertical axis in Fig. 7. Secondly, errRM depends on the signal-to-noise ratio of the signal, and these can be strongly biased as we reported in Section 3.4. By replacing the measured signal-to-noise ratio in errRM with the injected signal-to-noise ratio we found that this second effect by itself is not sufficient to explain the behaviour of the distributions of RM/errRM. By comparing the equivalents of Fig. 7 for different simulation runs we find that the rate at which the error in RM decreases with increasing signal-to-noise level depends both on the frequency setup of the observations and on the spectral index of the source. For example, if α = 2 and we use errRM then the ‘1σ’ whiskers in Fig. 7 extend well beyond y = ±1, falling outside the plot range (run 2c) or extending out to y = ±1.4 (run 3c). If we determine the error in RM by fitting contours in log likelihood then in the case of run 2c the ‘1σ’ whiskers decrease towards y = ±1.3 at a signal-to-noise ratio of 8; we did not investigate if the whiskers converge to y = ±1 at even higher signal-to-noise ratios. We conclude that errRM provides a fast way for estimating the error in RM for a given frequency setup and observing time, but more accurate methods are required for determining measurement errors in RM for sources with non-zero spectral indices. 3.6 Distributions of the intrinsic polarization angle χref and the spectral index α Naghizadeh-Khouei & Clarke (1993) discussed the probability density function for the measured polarization angle of a single frequency channel. These authors show that at high signal-to-noise levels this distribution has a standard deviation of 1/(2(L/σ)ch) radians, where (L/σ)ch is the signal-to-noise ratio of a single frequency channel. However, the intrinsic polarization angle χref, which we derive from the ML estimators for Qref and Uref, has been corrected for Faraday rotation; therefore, the distribution of χref from our Monte Carlo simulations will differ from the probability density function derived by Naghizadeh-Khouei and Clarke (Fig. 8). To estimate the measurement uncertainty (1σ) in χref we rewrite equation A.20 from Brentjens & de Bruyn (2005), assuming uniformly spaced channels in wavelength squared, constant noise variances across the band (equal in Stokes Q and U), δλ2 ≪ 1 and Nch ≫ 1: \begin{eqnarray} \mathrm{err}_\chi = \frac{1}{2(L/\sigma )_\mathrm{ch}\sqrt{N_\mathrm{ch}}}\sqrt{1+3\left(\frac{\lambda ^2_\mathrm{max}+\lambda ^2_\mathrm{min}}{\lambda ^2_\mathrm{max}-\lambda ^2_\mathrm{min}}\right)^2}\,, \end{eqnarray} (20) where |$\lambda ^2_\mathrm{min}$| and |$\lambda ^2_\mathrm{max}$| are the lowest respectively the highest wavelength squared values in the observing band. One can interpret this equation as the product of the error in χref derived by Naghizadeh-Khouei and Clarke after summing Nch channels, modulated by a term that contains information on the frequency coverage of the observations (the ratio inside the square root is proportional to one over the fractional bandwidth expressed in units of wavelength squared). Figure 8. Open in new tabDownload slide Probability density functions of the intrinsic polarization angle χref (black lines) for four different strengths of the injected signal (run 1a). The solid red line shows the distribution from equation 3 in Naghizadeh-Khouei & Clarke (1993), the vertical red lines show the estimate for the measurement error |$[1/(2(L/\sigma)_\mathrm{ch})]/\sqrt{N_\mathrm{ch}}$| and the vertical blue lines show the prediction by equation (20). Figure 8. Open in new tabDownload slide Probability density functions of the intrinsic polarization angle χref (black lines) for four different strengths of the injected signal (run 1a). The solid red line shows the distribution from equation 3 in Naghizadeh-Khouei & Clarke (1993), the vertical red lines show the estimate for the measurement error |$[1/(2(L/\sigma)_\mathrm{ch})]/\sqrt{N_\mathrm{ch}}$| and the vertical blue lines show the prediction by equation (20). In simulations 1–3a,b,c the means of the distributions of χref lie between ±2.2|$^\circ$|⁠. Such small offsets imply that the bias in χref is small; in fact, these offsets could be largely due to the limited number of realizations of the Monte Carlo process. Using the signal-to-noise ratio of the injected signal in equation (20), the standard deviations of the χref distributions shown in Fig. 8 can be described well by equation (20), with the exception of the first panel where the signal-to-noise level is simply too low. In simulations 1–3a,b,c we found that injected signal-to-noise ratios of 5 and 7 can have differences between the prediction by equation (20) and the standard deviation of the χref distributions of up to four degrees; for weaker signals these differences can be (much) larger. Based on these observations we conclude that equation (20) is a useful approximation for estimating the error in the derived polarization angle if the signal is sufficiently strong. As demonstrated first by Jauncey (1967), maximizing the log likelihood for the coefficient α in a power-law distribution typically requires numerical techniques. The expression we derived for finding the ML estimator of α in the case of equal noise variances, \begin{eqnarray} && {\sum _{i=1}^{N_\mathrm{ch}} \left[ Q_{\mathrm{obs},i}(\hat{Q}_\mathrm{ref}\,c_i - \hat{U}_\mathrm{ref}\,s_i) \right. } \nonumber \\ && + \left. U_{\mathrm{obs},i}(\hat{Q}_\mathrm{ref}\,s_i + \hat{U}_\mathrm{ref}\,c_i) \right] \left(\frac{\nu _i}{\nu _\mathrm{ref}}\right)^\alpha \log \left(\frac{\nu _i}{\nu _\mathrm{ref}}\right)/\sigma _{L,i}^2 \nonumber \\ && \qquad= \sum _{i=1}^{N_\mathrm{ch}} (\hat{Q}_\mathrm{ref}^2 + \hat{U}_\mathrm{ref}^2) \left(\frac{\nu _i}{\nu _\mathrm{ref}}\right)^{2\alpha } \log \left(\frac{\nu _i}{\nu _\mathrm{ref}}\right)/\sigma _{L,i}^2 \,, \end{eqnarray} (21) also cannot be solved analytically. Therefore, we did not investigate further if analytical expressions can be found for the probability density function of α or the standard deviation of this distribution. 3.7 Summary of main results from Section 3 In this section, we investigated how source parameters determined from noisy data are affected by the frequency setup of the observations, the spectral index of the source and the extent of the search grid in RM and α. All three factors influence the results, which implies that debiasing schemes should include all these factors and that such schemes are difficult to generalize. Debiasing weak signals makes little sense, since the injected signal one wishes to recover from the observations gets lost in the noise. Citing the polarized signal-to-noise ratio of a measurement is a biased proxy for the detection significance; the log likelihood ratio is much more accurate. The measured signal-to-noise ratio is itself subject to noise, and can be explained by a range of possible injected signal-to-noise ratios. For sufficiently strong signals the latter distribution is Gaussian. We derive the equation for the error in RM that is often used in the literature, errRM = FWHM/(2L/σ), and show that this quantity systematically under- or overestimates the true measurement uncertainty in RM. Sources with spectral indices α ≠ 0 typically have measurement uncertainties in RM that are different from errRM. Finally, we derive an expression for the measurement uncertainty in the intrinsic polarization angle, errχ. Since this derivation is based on the same assumptions as our derivation of errRM, also errχ should be considered an approximation that can deviate from the true measurement error. 4 CONCLUSIONS In this paper, we developed two methods for determining the physical properties of radio sources that emit at one RM, i.e. by assuming a power-law polarized flux density spectrum or that the polarized flux density spectrum is a scaled version of the spectrum in Stokes I. We allow for a variation in the sensitivity (noise variance) across the observing band. By fitting contours in the log likelihood landscape we determine the ML estimators for the source parameters, the measurement uncertainties and the detection significance. Also we derive an expression to quickly estimate the measurement error in the intrinsic polarization angle. If the polarized flux density follows a power law with a spectral index α, then maximizing the likelihood requires numerically searching through a grid of (RM, α) values; the ML estimators for the Stokes Q and U flux densities at the reference frequency (Qref and Uref) and the multiplicative scale factor for the noise variance η can be calculated analytically for each grid point. In this case we show that for a source with α = 0, the ML estimators for Qref and Uref can be found by applying RM synthesis. We also show that a variation in the sensitivity across the observing band can be included in the RM synthesis formalism by using one over the noise variance in each frequency channel as weights. If the polarized flux density spectrum is a scaled version of the Stokes I spectrum then the ML estimators for RM and the polarization fractions q and u can only be found using numerical techniques. Applying RM synthesis to the observed flux density ratios Qobs,i/Iobs,i and Uobs,i/Iobs,i (where ‘i’ is the channel index) does not maximize the likelihood, because the equations for finding the ML estimators for q and u do not depend on these ratios of flux densities. However, for weakly polarized sources we derive a weighted form of RM synthesis that includes the Stokes I spectrum of the source and a variation in sensitivity across the frequency band. For sources with a power-law flux polarized density spectrum we use Monte Carlo simulations to investigate statistical bias and whether standard methods for estimating measurement uncertainties are accurate. We simulate different frequency setups (4 cm band on the ATCA, bands 1+2 on SKA1-mid and Low Frequency Array (LOFAR)-HBA), sources with spectral indices α of −2, 0 or +2, and different extents of the search grid in RM and α. We find that noise bias affects different frequency bands in different ways, which means that in the low signal-to-noise regime results for one frequency band cannot be generalized. For observations in the 4 cm band of the ATCA noise bias in the polarized flux density is negligible if the signal-to-noise ratio is larger than about 5. An observed signal-to-noise ratio can be the result of a range of injected signal-to-noise ratios plus noise. We found that this distribution of injected signal-to-noise ratios is approximately Gaussian if the measured signal-to-noise ratio is larger than 5. We also found that citing the polarized signal-to-noise ratio as a proxy for the detection significance overestimates this significance. At low signal-to-noise ratios the distribution of the ML estimators for RM and the spectral index is very wide. For weak signals, the highest peak in the likelihood can be produced purely by noise or a combination of noise plus the injected signal. In these situations, the ML estimators for the model parameters can be very different from the properties of the injected (true) signal. Also RM synthesis is susceptible to this, because in that case one interprets the highest peak in polarized flux density in the RM spectrum as being due to a real signal. The standard method for calculating the measurement error in RM only applies to sources with α = 0; in certain cases the combination of noise bias in the polarized flux density and spectral index effects leads to large differences between the true measurement error in RM and the error that is calculated using the standard method. Finally, we derive an equation for approximating the error in the intrinsic polarization angle, errχ, and show that this quantity depends both on the signal-to-noise ratio and on the frequency coverage of the observations. Acknowledgments We would like to thank Aristeidis Noutsos and Olaf Wucknitz (both at the Max Planck Institute for Radio Astronomy) for many fruitful discussions. We also thank the anonymous referee for their very constructive comments. KJL is supported by the National Basic Research Program of China, 973 Program (Grant no. 2015CB857101) and NSFC (Grant no. U15311243). The figures shown in our paper make use of colour tables that were developed by Paul Tol5 (Netherlands Institute for Space Research). 1 " Montier et al. (2015a) include the full covariance matrix for the Stokes parameters I, Q and U in their analysis of the polarization fraction and polarization angle of radio sources. However, the sources they consider are not affected by Faraday rotation. 2 " By measuring a physical quantity and the associated noise variance one can reconstruct a distribution of possible values for the parameter of interest. The aim of debiasing is to centre this distribution on the value of the injected signal; debiasing does not replace the distribution of possible values of the quantity of interest with a single, noise-free value. 3 " For measured L/σ of 5, 6, 7 the distributions of injected L/σ shown in Fig. 4 have means of −0.01, 0.03, 0.03. To understand whether these means are consistent with zero (indicating no bias), we applied the one-sample t-test. We assume that the distributions of injected L/σ values are close enough to being Gaussian that we can run this test. The t-values for the three distributions are −0.58, 2.50 and 3.41; larger values for |t| are encountered in 56, 1.2 and 0.07 per cent of cases, respectively, hinting at bias for the highest signal-to-noise levels. 4 " To derive this expression start with equation A.18 in Brentjens & de Bruyn 2005. If all channels have the same noise variance and the source has a spectral index of zero, and assuming a uniform coverage in wavelength squared instead of frequency that is sampled with Nch channels, one can show that |$\sigma _{\lambda ^2}^2 = N_\mathrm{ch}(N_\mathrm{ch}+1)(\delta \lambda ^2)^2/12$|⁠, with δλ2 the width of each channel. Since Δλ2 = Nchδλ2, |$\sigma _\mathrm{RM}= \sqrt{N_\mathrm{ch}/[(N_\mathrm{ch}-2)(N_\mathrm{ch}+1)]}(\sigma /L)(\sqrt{3}/\Delta \lambda ^2)$|⁠. For Nch ≫ 1 the square root |${\approx } 1/\sqrt{N_\mathrm{ch}}$|⁠, while |$\sqrt{3}/\Delta \lambda ^2$| is half the FWHM of the RM spread function. Noting that the signal-to-noise ratio after RM synthesis increases as |$\sqrt{N_\mathrm{ch}}$| (see e.g. Section 3.1), one then recovers the standard expression for the measurement error in RM after running RM synthesis. 5 " https://personal.sron.nl/∼pault/ REFERENCES Avni Y. , 1976 , ApJ , 210 , 642 Crossref Search ADS Avni Y. , 1978 , A&A , 66 , 307 Bangs W. J. , 1971 , PhD thesis , Yale Univ. , New Haven, CT, USA Google Scholar Google Preview OpenURL Placeholder Text WorldCat COPAC Bates S. D. , Lorimer D. R., Verbiest J. P. W., 2013 , MNRAS , 431 , 1352 Crossref Search ADS Bell M. R. , Oppermann N., Crai A., Enßlin T. A., 2013 , A&A , 551 , L7 Crossref Search ADS Brentjens M. A. , de Bruyn A. G., 2005 , A&A , 441 , 1217 Crossref Search ADS Burn B. J. , 1966 , MNRAS , 133 , 67 Crossref Search ADS Carilli C. L. , Owen F. N., Harris D. E., 1994 , AJ , 107 , 480 Crossref Search ADS Cash W. , 1976 , A&A , 52 , 307 Chernoff H. , 1954 , Ann. Math. Stat. , 25 , 573 Crossref Search ADS Cooper B. F. C. , Price R. M., 1962 , Nature , 195 , 1084 Crossref Search ADS Dewdney P. E. , Turner W., Millenaar R., McCool R., Lazio J., Cornwell T. J., 2013 , Technical report, SKA1 System Baseline Design . SKA Office Google Scholar Google Preview OpenURL Placeholder Text WorldCat COPAC Everett J. E. , Weisberg J. M., 2001 , ApJ , 553 , 341 Crossref Search ADS Farnsworth D. , Rudnick L., Brown S., 2011 , AJ , 141 , 191 Crossref Search ADS Gardner F. F. , Whiteoak J. B., 1963 , Nature , 197 , 1162 Crossref Search ADS George S. J. , Stil J. M., Keller B. W., 2012 , Publ. Astron. Soc. Aust. , 29 , 214 Crossref Search ADS Hales C. A. , Gaensler B. M., Norris R. P., Middelberg E., 2012 , MNRAS , 424 , 2160 Crossref Search ADS Huang L. , Shcherbakov R. V., 2011 , MNRAS , 416 , 2574 Crossref Search ADS Jacobson N. , 2009 , Basic Algebra 1 2nd edn. Dover Publications , New York Google Scholar Google Preview OpenURL Placeholder Text WorldCat COPAC Jauncey D. L. , 1967 , Nature , 216 , 877 Crossref Search ADS Jones T. W. , O’Dell S. L., 1977 , ApJ , 214 , 522 Crossref Search ADS Lampton M. , Margon B., Bowyer S., 1976 , ApJ , 208 , 177 Crossref Search ADS Lorimer D. R. , Yates J. A., Lyne A. G., Gould D. M., 1995 , MNRAS , 273 , 411 Crossref Search ADS Macquart J.-P. , Ekers R. D., Feain I., Johnston-Hollitt M., 2012 , ApJ , 750 , 139 Crossref Search ADS Montier L. , Plaszczynski S., Levrier F., Tristram M., Alina D., Ristorcelli I., Bernard J.-P., 2015a , A&A , 574 , A135 Crossref Search ADS Montier L. , Plaszczynski S., Levrier F., Tristram M., Alina D., Ristorcelli I., Bernard J.-P., Guillet V., 2015b , A&A , 574 , A136 Crossref Search ADS Naghizadeh-Khouei J. , Clarke D., 1993 , A&A , 274 , 968 O’Dea C. P. , 1998 , Publ. Astron. Soc. Aust. , 110 , 493 Crossref Search ADS O’Sullivan S. P. et al. , 2012 , MNRAS , 421 , 3300 Crossref Search ADS Press W. H. , Teukolsky S. A., Vetterling W. T., Flannery B. P., 1992 , Numerical Recipes in C. The Art of Scientific Computing , 2nd edn. Univ. Press , Cambridge Google Scholar Google Preview OpenURL Placeholder Text WorldCat COPAC Protassov R. , van Dyk D. A., Connors A., Kashyap V. L., Siemiginowska A., 2002 , ApJ , 571 , 545 Crossref Search ADS Sazonov V. N. , 1969 , Sov. Astron. , 13 , 396 Scaife A. M. M. , Heald G. H., 2012 , MNRAS , 423 , L30 Crossref Search ADS Schnitzeler D. H. F. M. , Lee K. J., 2015 , MNRAS , 447 , L26 Crossref Search ADS Schnitzeler D. H. F. M. , Banfield J. K., Lee K. J., 2015 , MNRAS , 450 , 3579 Crossref Search ADS Schnitzeler D. H. F. M. , Eatough R. P., Ferrière K., Kramer M., Lee K. J., Noutsos A., Shannon R. M., 2016 , MNRAS , 459 , 3005 Crossref Search ADS Seielstad G. A. , Morris D., Radhakrishnan V., 1964 , ApJ , 140 , 53 Crossref Search ADS Simmons J. F. L. , Stewart B. G., 1985 , A&A , 142 , 100 Slepian D. , 1954 , Trans. IRE Prof. Group Inf. Theory , 3 , 68 Crossref Search ADS Sun X. H. et al. , 2015 , AJ , 149 , 60 Crossref Search ADS van Haarlem M. P. et al. , 2013 , A&A , 556 , A2 Crossref Search ADS Wardle J. F. C. , Kronberg P. P., 1974 , ApJ , 194 , 249 Crossref Search ADS Wehus I. K. , Fuskeland U., Eriksen H. K., 2013 , ApJ , 763 , 138 Crossref Search ADS Wilks S. S. , 1938 , Ann. Math. Stat. , 9 , 60 Crossref Search ADS APPENDIX A: NOMENCLATURE In the scientific literature both ‘RM’ and ‘Faraday depth’ are used for the integral in equation (1), having been introduced by Gardner & Whiteoak (1963) and Burn (1966), respectively. In this appendix we explain why we prefer to use ‘RM’. Compared to our argument in Schnitzeler, Banfield & Lee (2015) we now allow for synchrotron emission and Faraday rotation to take place in the same part of the line-of-sight dl. It is not clear why Burn (1966) did not consider keeping the term ‘RM’. Considering the radiative transfer of polarized radio waves, as presented by e.g. Sazonov (1969), Jones & O'Dell (1977) and more recently by Huang & Shcherbakov (2011), it is sensible to use ‘RM’ instead of ‘Faraday depth’. Equation 1 of Jones & O'Dell (1977) showed that, in a non-relativistic, optically thin source (i.e. with absorption neglected) where Faraday conversion between linear and circular polarization is negligible (this is the medium considered by Burn in his paper), the radiative transfer of the linear polarization vector can be written as \begin{eqnarray} \frac{\mathrm{d}\boldsymbol {L}_\nu }{\mathrm{d}l}\, =\, 2\mathrm{i}\times 0.81n_\mathrm{e}B_\Vert \boldsymbol {L}_\nu + \epsilon _\nu \mathrm{e}^{2\mathrm{i}\chi _0}\,, \end{eqnarray} (A1) where εν is the volume emissivity of the source. All parameters in equation (A1) can vary along the line of sight. The physical interpretation of equation (A1) is that each infinitesimal path element dl rotates background synchrotron emission and adds its own locally generated emission. From the perspective of the polarization vector, from the moment it is emitted it will only encounter Faraday rotating screens when travelling in the direction of the observer. Before Burn wrote his paper, the tiny amount of Faraday rotation that each path-length dl adds, |$0.81n_\mathrm{e}B_\Vert \mathrm{d}l$| in equation (A1), would have been known as the RM of the infinitesimal path-length dl. Because radiative transfer can be expressed completely in terms of RM, Burn could have kept using this term and did not need to introduce the concept of Faraday depth. Of course, one cannot use ‘RM’ both for the integral in equation (1) and to indicate the change in polarization angle with increasing wavelength squared; the numerical value of this derivative is only equal to RM if the source emits at a single RM. We used the term ‘net RM’ in Schnitzeler et al. (2015) to indicate the change in polarization angle with wavelength squared: since the observed monochromatic polarization vector is the beam-averaged sum of all the polarization vectors emitted by the source, both the angle of this monochromatic polarization vector and also the change in angle with wavelength squared are net quantities that depend on the contributions by all infinitesimal source elements. APPENDIX B: Finding ML estimators for q and u if L ∝ I In this appendix, we derive the equations for finding the ML estimators of q and u if the polarized flux density spectrum of a source is a scaled version of its Stokes I spectrum. There are two ways for removing the nuisance parameters Imod,i, the modelled Stokes I flux density in each channel: either one finds the ML estimators for Imod,i (Section B1) or one marginalizes over these parameters (Section B2). Because each option leads to different equations for |$\hat{q}$| and |$\hat{u}$|⁠, we consider them in separate sections. B1 Solve for Imod,i We derive the ML estimators for η and Imod,i by taking the partial derivatives of the log likelihood with respect to these parameters and setting the result equal to zero. |$\hat{\eta }$| depends on q, u and Imod,i, while |$\hat{I}_\mathrm{mod,i}$| depends on |$\hat{\eta }$| and therefore recursively on Imod,i. Solving for Imod,i and subsequently finding the ML estimators for q and u is difficult. To investigate whether it is at all possible to find analytical solutions for the ML estimators of q and u we make the simplifying assumptions that (1) the measured noise variances are exact (η = 1) and (2) σQ,i = σU,i = σI,i = constant. In this case, it is easy to find the ML estimator for Imod,i. The partial derivatives of the log likelihood with respect to q and u have in their denominators σ2(q2 + u2 + 1)2 and in their numerators \begin{eqnarray} && {(-1+q^2-u^2)(\mathcal {QI}+u\,\mathcal {QU}) - qu^2(\mathcal {QQ}-\mathcal {UU}) } \nonumber \\ &&+\, 2qu\,\mathcal {UI} + q(\mathcal {II}-\mathcal {QQ}), \end{eqnarray} (B1) \begin{eqnarray} && {(1+q^2-u^2)(\mathcal {UI}+q\,\mathcal {QU}) - q^2u(\mathcal {QQ}-\mathcal {UU})} \nonumber \\ &&-\, 2qu\,\mathcal {QI} - u(\mathcal {II}-\mathcal {UU}), \end{eqnarray} (B2) where we used the following shorthand notation for the observables: \begin{eqnarray} Q_{\mathrm{derot},i} & = & c_i Q_{\mathrm{obs},i} + s_i U_{\mathrm{obs},i}, \end{eqnarray} (B3) \begin{eqnarray} U_{\mathrm{derot},i} & = & -s_i Q_{\mathrm{obs},i} + c_i U_{\mathrm{obs},i}, \end{eqnarray} (B4) \begin{eqnarray} \mathcal {QQ} & = & \sum _{i=1}^{N_\mathrm{ch}} Q_{\mathrm{derot},i}^2, \end{eqnarray} (B5) \begin{eqnarray} \mathcal {UU} & = & \sum _{i=1}^{N_\mathrm{ch}} U_{\mathrm{derot},i}^2, \end{eqnarray} (B6) \begin{eqnarray} \mathcal {II} & = & \sum _{i=1}^{N_\mathrm{ch}} I_{\mathrm{obs},i}^2, \end{eqnarray} (B7) \begin{eqnarray} \mathcal {QI} & = & \sum _{i=1}^{N_\mathrm{ch}} Q_{\mathrm{derot},i} I_{\mathrm{obs},i}, \end{eqnarray} (B8) \begin{eqnarray} \mathcal {QU} & = & \sum _{i=1}^{N_\mathrm{ch}} Q_{\mathrm{derot},i}U_{\mathrm{derot},i}, \end{eqnarray} (B9) \begin{eqnarray} \mathcal {UI} & = & \sum _{i=1}^{N_\mathrm{ch}} U_{\mathrm{derot},i} I_{\mathrm{obs},i}\,. \end{eqnarray} (B10) To maximize the likelihood it is sufficient to find the zero-points of the numerators of ∂log Λ/∂q and ∂log Λ/∂u, the denominators are never zero for real-valued q and u. Equations (B1) and (B2) are both equal to zero if each of the frequency channels satisfies either Iobs,i + q Qderot,i + u Uderot,i = 0 or simultaneously the following two equations: \begin{eqnarray} \left\lbrace \begin{array}{@{}l@{\quad }l@{}}-q (I_{\mathrm{obs},i} + u\,U_{\mathrm{derot},i}) + (1+u^2)\,Q_{\mathrm{derot},i} = 0,\\ -(1+q^2)\,U_{\mathrm{derot},i} + u\,(I_{\mathrm{obs},i} + q\,Q_{\mathrm{derot},i}) = 0\,. \end{array}\right. \end{eqnarray} (B11) Since RM synthesis involves sums over frequency channels, while these solutions do not, we do not consider these solutions to be the equivalents of RM synthesis that we are looking for. Equation (B1) is a polynomial of degree 2 in q, and one can easily solve for q as a function of u. Inserting either one of the solutions for q(u) into the equation for ∂log Λ/∂u leads to a fraction with in its numerator a polynomial of degree six in u and in its denominator a polynomial of degree eight. The Abel–Ruffini theorem states that there is no general algebraic solution for polynomials of degree five or higher with arbitrary coefficients (Jacobson 2009); indeed, we did not find an obvious solution for u in a reasonable amount of time, not even if we used the computer algebra system |$\small {Wolfram\,Mathematica}$|⁠. Based on this result we believe that also the more general situation (η ≠ 1 or arbitrary noise variances) cannot be solved algebraically. B2 Marginalize over Imod,i Marginalizing over the nuisance parameters Imod,i for all frequency channels leads to the new (marginal) likelihood \begin{eqnarray} \log {\Lambda ^{\prime }} & = & -\frac{1}{2}\sum _{i=1}^{N_\mathrm{ch}} [\mathrm{fac}_{1,i} + 2\log (\mathrm{fac}_{2,i})] \nonumber \\ && -\, \sum _{i=1}^{N_\mathrm{ch}} [ \log (\sigma _{Q,i}) + \log (\sigma _{U,i}) + \log (\sigma _{I,i}) ] \nonumber \\ && -\,N_\mathrm{ch}[\log (2\pi)+2\log (\eta)], \end{eqnarray} (B12) where \begin{eqnarray*} \mathrm{fac}_{1,i} & = & \left[(Q_{\mathrm{obs},i}-\alpha _i I_{\mathrm{obs},i})^2\sigma _{U,i}^2 \right. \nonumber \\ & & \left. +\, (U_{\mathrm{obs},i}-\beta _i I_{\mathrm{obs},i})^2\sigma _{Q,i}^2 \right. \nonumber \\ & & \left. +\, (\beta _i Q_{\mathrm{obs},i} - \alpha _i U_{\mathrm{obs},i} )^2 (\sigma _{I,i}/\eta )^2 \right]/\zeta _i^2\,, \nonumber \\ {\left(\begin{array}{cc}\alpha _i \\ \beta _i \end{array}\right)} & = & {\left(\begin{array}{cc}c_i & -s_i \\ s_i & c_i\\ \end{array}\right)} {\left(\begin{array}{cc}q \\ u \end{array}\right)}\,, \nonumber \\ \zeta _i^2 & = & \alpha _i^2\sigma _{I,i}^2\sigma _{U,i}^2 + \beta _i^2\sigma _{I,i}^2\sigma _{Q,i}^2 + \eta ^2\sigma _{Q,i}^2\sigma _{U,i}^2\, {\rm and}\nonumber\\ \mathrm{fac}_{2,i} & = & \sqrt{ \left(\frac{1}{\sigma _{I,i}}\right)^2 + \left(\frac{\alpha _i}{\eta \,\sigma _{Q,i}}\right)^2 + \left(\frac{\beta _i}{\eta \,\sigma _{U,i}}\right)^2\,. \nonumber } \end{eqnarray*} The free parameters q, u, and η occur in fac1,i, ζi and fac2,i therefore taking the derivative of equation (B12) with respect to these parameters in order to maximize the likelihood leads to non-linear equations in these parameters. We tested if analytical solutions can be found that maximize the likelihood if we make the same two simplifying assumptions as in Section B1, i.e. η = 1 and σQ,i = σU,i = σI,i ≡ σ (constant). Taking the derivative of equation (B12) with respect to q and u leads to two fractions with in their denominators σ2(q2 + u2 + 1)2 and in their numerators \begin{eqnarray} && {(1-q^2+u^2)(\mathcal {QI}+u\,\mathcal {QU}) + qu^2(\mathcal {QQ}-\mathcal {UU}-N_\mathrm{ch}\sigma ^2)} \nonumber \\ && -\, 2qu\,\mathcal {UI} - q(\mathcal {II}-\mathcal {QQ}_\mathrm{obs}-\mathcal {UU}_\mathrm{obs}-N_\mathrm{ch}\sigma ^2-\mathcal {UU}) \nonumber \\ && -\,q^3N_\mathrm{ch}\sigma ^2, \end{eqnarray} (B13) \begin{eqnarray} && {(1+q^2-u^2)(\mathcal {UI}+q\,\mathcal {QU}) + q^2u(-\mathcal {QQ}+\mathcal {UU}-N_\mathrm{ch}\sigma ^2)} \nonumber \\ && - \,2qu\,\mathcal {QI} - u(\mathcal {II}-\mathcal {QQ}_\mathrm{obs}-\mathcal {UU}_\mathrm{obs}+N_\mathrm{ch}\sigma ^2+\mathcal {QQ}) \nonumber \\ && -\,u^3N_\mathrm{ch}\sigma ^2, \end{eqnarray} (B14) where \begin{eqnarray} \mathcal {QQ}_\mathrm{obs} & = & \sum _{i=1}^{N_\mathrm{ch}} Q_{\mathrm{obs},i}^2, \end{eqnarray} (B15) \begin{eqnarray} \mathcal {UU}_\mathrm{obs} & = & \sum _{i=1}^{N_\mathrm{ch}} U_{\mathrm{obs},i}^2. \end{eqnarray} (B16) Equations (B13) and (B14) are mixed polynomials of degree three in both q and u. If we insert each of the three roots q(u) of the numerator of ∂log Λ/∂q into the expression for ∂log Λ/∂u this leads to a fraction with a polynomial of degree seven in u in its numerator and a polynomial of degree eight in its denominator. We did not find solutions for q and u in a reasonable amount of time. APPENDIX C: Deriving an approximation to the log likelihood if L ∝ I To derive equation (6) from equation (B12) we assume that the noise variances in Stokes Q and U are equal for each frequency channel but are allowed to vary between channels. We introduce |$\sigma _{L,i}^2 \equiv \sigma _{Q,i}^2 = \sigma _{U,i}^2$|⁠, p2 ≡ q2 + u2, |$L_{\mathrm{mod},i} \equiv \sqrt{Q_{\mathrm{mod},i}^2 + U_{\mathrm{mod},i}^2}$| and |$L_{\mathrm{obs},i} \equiv \sqrt{Q_{\mathrm{obs},i}^2 + U_{\mathrm{obs},i}^2}$|⁠. Then \begin{eqnarray} \alpha _i^2 + \beta _i^2 & = & p^2 = \left(\frac{L_{\mathrm{mod}}}{I_{\mathrm{mod}}}\right)^2\!\!, \end{eqnarray} (C1) \begin{eqnarray} \sigma _{I,i}\,\mathrm{fac}_{2,i} & = & \sqrt{1 + \left(\frac{\sigma _{I,i}}{\eta \sigma _{L,i}}\right)^2\left(\frac{L_{\mathrm{mod}}}{I_{\mathrm{mod}}}\right)^2}, \end{eqnarray} (C2) \begin{eqnarray} && {\mathrm{fac}_{1,i} = } \nonumber \\ && \left(\frac{Q_{\mathrm{obs},i} - \alpha _i\,I_{\mathrm{obs},i}}{\eta \,\sigma _{L,i}}\right)^2 + \left(\frac{U_{\mathrm{obs},i} - \beta _i\,I_{\mathrm{obs},i}}{\eta \,\sigma _{L,i}}\right)^2 \nonumber \\ && +\, \left(p^2\,I_{\mathrm{obs},i}^2 - 2(Q_{\mathrm{obs},i}\alpha _i + U_{\mathrm{obs},i}\beta _i)I_{\mathrm{obs},i} + L_{\mathrm{obs},i}^2\right) \nonumber \\ && \times \left(\frac{1}{p^2\sigma _{I,i}^2+\eta ^2\sigma _{L,i}^2} - \frac{1}{\eta ^2\sigma _{L,i}^2}\right) \nonumber \\ && + \left(\frac{\beta _i Q_{\mathrm{obs},i} - \alpha _i U_{\mathrm{obs},i}}{\eta \,\sigma _{L,i}}\right)^2\frac{\sigma _{I,i}^2}{p^2\sigma _{I,i}^2 + \eta ^2\sigma _{L,i}^2} \,. \end{eqnarray} (C3) If the source is weakly polarized (Lmod ≪ Imod) and σI,i ≈ η σL,i then p2 is approximately zero, and σI,i fac2,i ≈ 1 therefore log (σI,i fac2,i) ≈ 0. Furthermore, since \begin{eqnarray} \mathrm{fac}_{1,i} & \approx & \left(\frac{Q_{\mathrm{obs},i} - \alpha _i\,I_{\mathrm{obs},i}}{\eta \,\sigma _{L,i}}\right)^2 + \left(\frac{U_{\mathrm{obs},i} - \beta _i\,I_{\mathrm{obs},i}}{\eta \,\sigma _{L,i}}\right)^2 \nonumber \\ &&+ \left(\frac{\beta _i Q_{\mathrm{obs},i} - \alpha _i U_{\mathrm{obs},i}}{\eta \,\sigma _{L,i}}\right)^2\,, \end{eqnarray} (C4) equation (B12) simplifies to equation (6). © 2016 The Authors Published by Oxford University Press on behalf of the Royal Astronomical Society TI - Finding a faint polarized signal in wide-band radio data JF - Monthly Notices of the Royal Astronomical Society DO - 10.1093/mnras/stw3104 DA - 2017-04-01 UR - https://www.deepdyve.com/lp/oxford-university-press/finding-a-faint-polarized-signal-in-wide-band-radio-data-Hk2fX9WI9Q SP - 378 VL - 466 IS - 1 DP - DeepDyve ER -