# Redundant interferometric calibration as a complex optimization problem

Redundant interferometric calibration as a complex optimization problem Abstract Observations of the redshifted 21 cm line from the epoch of reionization have recently motivated the construction of low-frequency radio arrays with highly redundant configurations. These configurations provide an alternative calibration strategy – ‘redundant calibration’ – and boost sensitivity on specific spatial scales. In this paper, we formulate calibration of redundant interferometric arrays as a complex optimization problem. We solve this optimization problem via the Levenberg–Marquardt algorithm. This calibration approach is more robust to initial conditions than current algorithms and, by leveraging an approximate matrix inversion, allows for further optimization and an efficient implementation (‘redundant stefcal’). We also investigated using the preconditioned conjugate gradient method as an alternative to the approximate matrix inverse, but found that its computational performance is not competitive with respect to ‘redundant stefcal’. The efficient implementation of this new algorithm is made publicly available. instrumentation: interferometers, methods: data analysis, methods: numerical, techniques: interferometric, cosmology: observations 1 INTRODUCTION The quest for the redshifted 21 cm line from the epoch of reionization (EoR) is a frontier of modern observational cosmology and has motivated the construction of a series of new interferometric arrays operating at low frequencies over the last decade. EoR measurements are challenging because of the intrinsic faintness of the cosmological signal (see for instance, Furlanetto 2016; McQuinn 2016, for recent reviews) buried underneath foreground emission, which is a few orders of magnitude brighter than the EoR anywhere in the sky (e.g. Bernardi et al. 2009, 2010; Ghosh et al. 2012; Dillon et al. 2014; Parsons et al. 2014). As miscalibrated foreground emission leads to artefacts that can jeopardize the EoR signal (e.g. Grobler et al. 2014; Barry et al. 2016; Ewall-Wice et al. 2017), an exceptional interferometric calibration is required. Such calibration needs accurate knowledge of both the sky emission and the instrumental response (e.g. Smirnov 2011b). A skymodel is required for standard calibration as it would be an ill-posed problem if one tries to directly solve for all of the unknowns, i.e. the antenna gains and the uncorrupted visibilities, without first attempting to simplify the calibration problem. When the uncorrupted visibilities, however, are predicted from a predefined skymodel, the problem simplifies and becomes solvable. In contrast, if an array is deployed in a redundant configuration, i.e. with receiving elements placed on regular grids so that multiple baselines measure the same sky brightness emission, we circumvent the need for a skymodel altogether. This is true, since redundancy leads to fewer unknowns (i.e. the number of unique uv-modes) and if the array is redundant enough, it reduces the number of unknowns to such an extent that the calibration problem becomes solvable without having to predict the uncorrupted visibilities from a skymodel. Redundancy, therefore, is a promising path to achieve highly accurate interferometric calibration (Noordam & De Bruyn 1982; Pearson & Readhead 1984; Wieringa 1992; Liu et al. 2010; Noorishad et al. 2012; Marthi & Chengalur 2014; Sievers 2017). Moreover, redundant configurations provide a sensitivity boost on particular spatial scales that can be tuned to be the highest signal-to-noise ratio (SNR) EoR modes (Parsons et al. 2012; Dillon & Parsons 2016). These reasons motivated the design and deployment of EoR arrays in redundant configurations like the MIT-EoR (Zheng et al. 2014), the Precision Array to Probe the Epoch of Reionization (Ali et al. 2015), and the Hydrogen Epoch of Reionization Array (HERA; DeBoer et al. 2017). In this paper, we present a new algorithm to calibrate redundant arrays based on the complex optimization formalism recently introduced by Smirnov & Tasse (2015). With respect to current algorithms, it is more robust to initial conditions, while remaining comparatively fast. We also show that given certain approximations this new algorithm reduces to the redundant calibration equivalent of the stefcal algorithm (Salvini & Wijnholds 2014). We investigate the speed-up that the preconditioned conjugate gradient (PCG) method provides if it is employed by the new algorithm (Liu et al. 2010). A comparison between the computational complexity of the optimized new algorithm and redundant stefcal is also performed. A summary of the notation that is used within the paper is presented in Table 1. We also discuss the overset notation used within the paper in a bit more detail below (we use x as an operand proxy in this paper): $$\widetilde{x}$$ – This notation is used to denote a new scalar value that was derived from the scalar x using a proper mathematical definition. $$\overline{x}$$ – This notation denotes the conjugation of its operand (the conjugation of x). For the readers’ convenience, we will redefine this operator when it is first used within the paper. $$\widehat{x}$$ – This notation denotes that the quantity is an estimated value. $$\breve{\boldsymbol {x}}$$ – This notation denotes an augmented vector, i.e. $$\breve{\boldsymbol {x}} = [\boldsymbol {x}^T, \overline{\boldsymbol {x}}^T]^T$$. Table 1. Notation and frequently used symbols. x, $$\boldsymbol {x}$$, and $$\boldsymbol{X}$$  Scalar, vector, and matrix  $$\mathbb {N}$$  Set of natural numbers  $$\mathbb {C}$$  Set of complex numbers  $$\boldsymbol{I}$$  Identity matrix  αpq, ϕpq, ζpq, ξpq, and  Indexing functions  $$\psi _{pq}:\mathbb {N}^2\rightarrow \mathbb {N}$$    $$\overline{x}$$  Conjugation  $$\overline{\boldsymbol {x}}, \overline{\boldsymbol{X}}$$  Element-wise conjugation  x2  Square  $$\Vert \boldsymbol {x}\Vert _{\rm F}$$  Frobenius norm  $$\boldsymbol{X}^{-1}$$  Matrix inversion  ⊙  Hadamard product    (element-wise product)  $$\boldsymbol{X}^H$$  Hermitian transpose  $$\boldsymbol {x}^T, \boldsymbol{X}^T$$  Transpose  $$\langle \boldsymbol {x}\rangle _x$$  Averaging over x  $$\frac{\mathrm{\partial} }{\mathrm{\partial} z}$$  Wirtinger derivative  $$\widetilde{x}$$  A new quantity derived from x  $$\widehat{x}$$  Estimated quantity  $$\breve{\boldsymbol {x}}$$  Augmented vector $$[\boldsymbol {x}^T, \overline{\boldsymbol {x}}^T]^T$$  i  $$\sqrt{-1}$$  N, B, L, and P  Number of antennas, baselines,    redundant groups, and parameters  $$\boldsymbol {d}$$, $$\boldsymbol {v}$$, $$\boldsymbol {g}$$, $$\boldsymbol {y}$$, $$\boldsymbol {r}$$, and $$\boldsymbol {n}$$  Data, predicted, antenna,    true visibility, residual,    and noise vector  $$\boldsymbol {z}$$  $$\boldsymbol {z}= [\boldsymbol {g}^T, \boldsymbol {y}^T]^T$$  Λ  Objective function  λ and ρ  Damping factor and $$\rho = \frac{1}{1+\lambda }$$  $$\boldsymbol{J}$$  Jacobian matrix  $$\boldsymbol{H}$$  $$\boldsymbol{J}^H\boldsymbol{J}$$ (Hessian matrix)  $$\boldsymbol {\mathcal {H}}$$  Modified Hessian matrix  $$[\boldsymbol{X}]_{ij}$$  Element ij of matrix $$\boldsymbol{X}$$  xij  Composite antenna index  xi  Antenna or redundant group index  xk  Iteration number  $$\boldsymbol{M}$$  Preconditioner matrix  m  Number of non-zero entries    in a matrix  κ  Spectral condition number    of a matrix  γ  Sparsity factor of a matrix  Δx  Parameter update  ∃!  There exists a unique  x, $$\boldsymbol {x}$$, and $$\boldsymbol{X}$$  Scalar, vector, and matrix  $$\mathbb {N}$$  Set of natural numbers  $$\mathbb {C}$$  Set of complex numbers  $$\boldsymbol{I}$$  Identity matrix  αpq, ϕpq, ζpq, ξpq, and  Indexing functions  $$\psi _{pq}:\mathbb {N}^2\rightarrow \mathbb {N}$$    $$\overline{x}$$  Conjugation  $$\overline{\boldsymbol {x}}, \overline{\boldsymbol{X}}$$  Element-wise conjugation  x2  Square  $$\Vert \boldsymbol {x}\Vert _{\rm F}$$  Frobenius norm  $$\boldsymbol{X}^{-1}$$  Matrix inversion  ⊙  Hadamard product    (element-wise product)  $$\boldsymbol{X}^H$$  Hermitian transpose  $$\boldsymbol {x}^T, \boldsymbol{X}^T$$  Transpose  $$\langle \boldsymbol {x}\rangle _x$$  Averaging over x  $$\frac{\mathrm{\partial} }{\mathrm{\partial} z}$$  Wirtinger derivative  $$\widetilde{x}$$  A new quantity derived from x  $$\widehat{x}$$  Estimated quantity  $$\breve{\boldsymbol {x}}$$  Augmented vector $$[\boldsymbol {x}^T, \overline{\boldsymbol {x}}^T]^T$$  i  $$\sqrt{-1}$$  N, B, L, and P  Number of antennas, baselines,    redundant groups, and parameters  $$\boldsymbol {d}$$, $$\boldsymbol {v}$$, $$\boldsymbol {g}$$, $$\boldsymbol {y}$$, $$\boldsymbol {r}$$, and $$\boldsymbol {n}$$  Data, predicted, antenna,    true visibility, residual,    and noise vector  $$\boldsymbol {z}$$  $$\boldsymbol {z}= [\boldsymbol {g}^T, \boldsymbol {y}^T]^T$$  Λ  Objective function  λ and ρ  Damping factor and $$\rho = \frac{1}{1+\lambda }$$  $$\boldsymbol{J}$$  Jacobian matrix  $$\boldsymbol{H}$$  $$\boldsymbol{J}^H\boldsymbol{J}$$ (Hessian matrix)  $$\boldsymbol {\mathcal {H}}$$  Modified Hessian matrix  $$[\boldsymbol{X}]_{ij}$$  Element ij of matrix $$\boldsymbol{X}$$  xij  Composite antenna index  xi  Antenna or redundant group index  xk  Iteration number  $$\boldsymbol{M}$$  Preconditioner matrix  m  Number of non-zero entries    in a matrix  κ  Spectral condition number    of a matrix  γ  Sparsity factor of a matrix  Δx  Parameter update  ∃!  There exists a unique  View Large The paper is organized as follows: we review the complex calibration formalism by Smirnov & Tasse (2015) in Section 2, we extend it to the redundant case in Section 3, we present some computational complexity results in Section 4, and we present our conclusions in Section 5. 2 WIRTINGER CALIBRATION In a radio interferometer, the true sky visibilities ypq measured by a baseline formed by antenna p and q are always ‘corrupted’ by the non-ideal response of the receiver, which is often incorporated into a single, receiver-based complex number g (i.e. an antenna gain). The observed visibility dpq is therefore given by (Hamaker, Bregman & Sault 1996; Sault, Hamaker & Bregman 1996; Smirnov 2011a)   \begin{eqnarray} d_{pq} = g_{p}\overline{g_q} \, y_{pq} + n_{pq}, \end{eqnarray} (1)where $$\overline{x}$$ indicates complex conjugation and npq is the thermal noise component. The real and imaginary components of the thermal noise are normally distributed with a mean of zero and a standard deviation σ:   \begin{eqnarray} \sigma \propto \frac{T_{{\rm sys}}}{\sqrt{\Delta \nu \tau }}, \end{eqnarray} (2)where Tsys is equal to the system temperature, Δν is the observational bandwidth, and τ is the integration time per visibility. Considering the number of visibilities B (i.e. baselines) measured by an array of N elements $$B = \frac{N^2-N}{2}$$, equation (1) can be expressed in the following vector form:   \begin{eqnarray} \boldsymbol {d}= \boldsymbol {v}+ \boldsymbol {n}, \end{eqnarray} (3)where   \begin{eqnarray} \left[ \boldsymbol {d}\right]_{\alpha _{pq}} &=& d_{pq},\quad \left[ \boldsymbol {v}\right]_{\alpha _{pq}} = v_{pq}=g_p y_{pq} \overline{g_q}, \nonumber \\ \left[ \boldsymbol {n}\right]_{\alpha _{pq}} &=& n_{pq}, \end{eqnarray} (4)and   \begin{eqnarray} \alpha _{pq} = \left\lbrace \begin{array}{@{}l@{\quad }l@{}}(q-p) + (p-1)\left(N-\frac{1}{2}p \right) & {\rm if}\,p<q\\ 0 & {\rm otherwise} \end{array}\right.. \end{eqnarray} (5)The function αpq therefore maps composite antenna indices to unique single indices, i.e.   \begin{eqnarray} \lbrace \alpha _{12}, \alpha _{13}, \ldots , \alpha _{N-1N}\rbrace = \lbrace 1,2, \ldots ,B\rbrace . \end{eqnarray} (6)The vectors in equation (4) are column vectors of size B (i.e. p < q). It is important to point out here that all of the mathematical definitions in this paper assume that we are only considering composite indices belonging to the set {rs|r < s}. For the sake of mathematical completeness, however, αpq is defined for composite indices that do not belong to this set. Also note that, while the indexing function αpq does not attain the value zero, in practice the zero indexing value does play a very important role in some of the definitions used within this paper (see for example equation A10). Radio interferometric calibration aims to determine the best estimate of $$\boldsymbol {g}= [g_1,g_2, \ldots ,g_N]^T$$ in order to correct the data and, following equation (3), can be formulated as a non-linear least-squares optimization problem:   \begin{eqnarray} \min _{\boldsymbol {g}} \Lambda (\boldsymbol {g}) = \min _{\boldsymbol {g}} \Vert \boldsymbol {r}\Vert _{\rm F}^2 = \min _{\boldsymbol {g}} \Vert \boldsymbol {d}- \boldsymbol {v}(\boldsymbol {g})\Vert _{\rm F}^2, \end{eqnarray} (7)where Λ is the objective function, $$\boldsymbol {r}$$ is the residual vector, and $$\Vert \boldsymbol {x}\Vert _{\rm F}$$ denotes the Frobenius norm. In standard interferometric calibration, ypq is assumed to be known at some level, for instance through the observation of previously known calibration sources. Non-linear least-squares problems are generally solved by using gradient-based minimization algorithms [i.e. Gauss–Newton (GN) or Levenberg–Marquardt (LM)] that require the model [$$\boldsymbol {v}$$ in equation (7)] to be differentiable towards each parameter. When the least-squares problem is complex, it becomes less straightforward to apply these gradient-based minimization methods, as many complex functions are not differentiable if the classic notion of differentiation is used, i.e. $$\frac{\mathrm{\partial} \overline{z}}{\mathrm{\partial} z}$$ does not exist if $$z \in \mathbb {C}$$. In order to circumvent the differentiability conundrum associated with complex least-squares problems, standard interferometric calibration divides the complex optimization problem into its real and imaginary parts and solves for the real and imaginary parts of the unknown model parameters separately. Smirnov & Tasse (2015) showed, however, that this approach is not needed if complex calculus (Wirtinger 1927) is adopted. The Wirtinger derivatives are defined as   \begin{eqnarray} \frac{\mathrm{\partial} }{\mathrm{\partial} z} = \frac{1}{2}\left(\frac{\mathrm{\partial} }{\mathrm{\partial} x} - {\rm i} \frac{\mathrm{\partial} }{\mathrm{\partial} y} \right),\quad\frac{\mathrm{\partial} }{\mathrm{\partial} \overline{z}} = \frac{1}{2}\left(\frac{\mathrm{\partial} }{\mathrm{\partial} x} + {\rm i} \frac{\mathrm{\partial} }{\mathrm{\partial} y} \right), \end{eqnarray} (8)which lead to the following relations:   \begin{eqnarray} \frac{\mathrm{\partial} z}{\mathrm{\partial} z} = 1,\quad \frac{\mathrm{\partial} \overline{z}}{\mathrm{\partial} z} =0,\quad \frac{\mathrm{\partial} z}{\mathrm{\partial} \overline{z}} = 0,\quad \frac{\mathrm{\partial} \overline{z}}{\mathrm{\partial} \overline{z}} =1. \end{eqnarray} (9)If the gradient operator is defined using equation (8), the model $$\boldsymbol {v}$$ now becomes analytic in both $$\boldsymbol {g}$$ and $$\overline{\boldsymbol {g}}$$, and equation (9) can be used to derive the complex variants of the real-valued GN and LM algorithms. In the complex GN and LM algorithms, complex parameters and their conjugates are treated as separate variables. Assuming that ypq is known, equation (7) is recast as (Smirnov & Tasse 2015)   \begin{eqnarray} \min _{\breve{\boldsymbol {g}}} \Lambda (\breve{\boldsymbol {g}}) = \min _{\breve{\boldsymbol {g}}} \Vert \breve{\boldsymbol {r}}\Vert _{\rm F}^2 = \min _{\breve{\boldsymbol {g}}} \Vert \breve{\boldsymbol {d}} - \breve{\boldsymbol {v}}(\breve{\boldsymbol {g}})\Vert _{\rm F}^2, \end{eqnarray} (10)where $$\breve{\boldsymbol {r}} = [\boldsymbol {r}^T, \overline{\boldsymbol {r}}^T]^T$$, $$\breve{\boldsymbol {d}} = [\boldsymbol {d}^T, \overline{\boldsymbol {d}}^T]^T$$, $$\breve{\boldsymbol {v}} = [\boldsymbol {v}^T, \overline{\boldsymbol {v}}^T]^T$$, and $$\breve{\boldsymbol {g}} = [\boldsymbol {g}^T, \overline{\boldsymbol {g}}^T]^T$$. The complex GN update is therefore defined as   \begin{eqnarray} \Delta \breve{\boldsymbol {g}} = (\boldsymbol{J}^H\boldsymbol{J})^{-1}\boldsymbol{J}^H\breve{\boldsymbol {r}}, \end{eqnarray} (11)with   \begin{eqnarray} {\boldsymbol{J}}={\left[\begin{array}{c{@}{\quad}c}\boldsymbol {\mathcal {J}}_1 & \boldsymbol {\mathcal {J}}_2\\ \overline{\boldsymbol {\mathcal {J}}}_2 & \overline{\boldsymbol {\mathcal {J}}}_1 \end{array}\right]}, \end{eqnarray} (12)and   \begin{eqnarray} [\boldsymbol {\mathcal {J}}_1]_{\alpha _{pq},i} = \frac{\mathrm{\partial} v_{pq}}{\mathrm{\partial} g_i},\quad [\boldsymbol {\mathcal {J}}_2]_{\alpha _{pq},i} = \frac{\mathrm{\partial} v_{pq}}{\mathrm{\partial} \overline{g}_i}. \end{eqnarray} (13)The matrix $$\boldsymbol{J}$$ is generally referred to as the Jacobian1 matrix. The complex LM update is very similar, the major difference being the introduction of a damping parameter, λ:   \begin{eqnarray} \Delta \breve{\boldsymbol {g}} = (\boldsymbol{J}^H\boldsymbol{J}+ \lambda \boldsymbol{D})^{-1}\boldsymbol{J}^H\breve{\boldsymbol {r}}, \end{eqnarray} (14)where $$\boldsymbol{D}=\boldsymbol{I}{\odot }\boldsymbol{J}^H\boldsymbol{J}$$. In this paper, we use single subscript indices (e.g. i) to refer to a specific antenna (or as will become apparent later a specific redundant group) and composite subscript indices (e.g. pq) to refer to a specific baseline. If these indices appear in the definition of matrix elements, then their allowed ranges are determined by the dimension of the matrix that is being defined (see Table 2). Furthermore, the identity matrix is denoted by $$\boldsymbol{I}$$. Moreover, the Hadamard2 product and the Hermitian transpose are denoted by ⊙ and $$\boldsymbol{X}^H$$, respectively (Liu & Trenkler 2008). Note the use of the Wirtinger derivatives in equation (13). We will refer to $$\boldsymbol{J}^H\boldsymbol{J}$$ as the Hessian3 matrix $$\boldsymbol{H}$$ and to $$\boldsymbol{J}^H\boldsymbol{J}+ \lambda \boldsymbol{D}$$ as the modified Hessian matrix $$\boldsymbol {\mathcal {H}}$$ throughout this paper (Madsen, Nielsen & Tingleff 2004). Table 2. The dimensions of the Jacobian matrices, and their respective sub-matrices, defined in Sections 2 and 3. Matrix  Dimension  Wirtinger calibration  $$\boldsymbol{J}$$  2B × 2N  $$\boldsymbol {\mathcal {J}}_1$$  B × N  $$\boldsymbol {\mathcal {J}}_2$$  B × N  Redundant Wirtinger calibration  $$\boldsymbol{J}$$  2B × P  $$\boldsymbol {\mathcal {J}}_1$$  B × (N + L)  $$\boldsymbol {\mathcal {J}}_2$$  B × (N + L)  Matrix  Dimension  Wirtinger calibration  $$\boldsymbol{J}$$  2B × 2N  $$\boldsymbol {\mathcal {J}}_1$$  B × N  $$\boldsymbol {\mathcal {J}}_2$$  B × N  Redundant Wirtinger calibration  $$\boldsymbol{J}$$  2B × P  $$\boldsymbol {\mathcal {J}}_1$$  B × (N + L)  $$\boldsymbol {\mathcal {J}}_2$$  B × (N + L)  View Large Equation (11) or (14) can now be used iteratively to update the parameter vector $$\breve{\boldsymbol {g}}$$:   \begin{eqnarray} \breve{\boldsymbol {g}}_{k+1} = \breve{\boldsymbol {g}}_{k} + \Delta \breve{\boldsymbol {g}}_{k}, \end{eqnarray} (15)until convergence is reached. In the case of the GN algorithm, the parameter update step simplifies and becomes (Smirnov & Tasse 2015)   \begin{eqnarray} \breve{\boldsymbol {g}}_{k+1} = (\boldsymbol{J}^H\boldsymbol{J})^{-1}\boldsymbol{J}^H\breve{\boldsymbol {d}} + \frac{1}{2}\breve{\boldsymbol {g}}_{k}. \end{eqnarray} (16) Smirnov & Tasse (2015) realized that the diagonal entries of the Hessian matrix $$\boldsymbol{H}$$ are much more significant than its off-diagonal entries, i.e. $$\boldsymbol{H}$$ is nearly diagonal. By approximating $$\boldsymbol{H}$$ by its diagonal and substituting the approximate Hessian matrix, the LM parameter update step becomes (Smirnov & Tasse 2015)   \begin{eqnarray} \breve{\boldsymbol {g}}_{k+1} &\approx& \frac{1}{1+\lambda }\widetilde{\boldsymbol{H}}^{-1}\boldsymbol{J}^H\breve{\boldsymbol {d}} + \frac{\lambda }{1+\lambda } \breve{\boldsymbol {g}}_k, \nonumber \\ &=& \rho \widetilde{\boldsymbol{H}}^{-1}\boldsymbol{J}^H\breve{\boldsymbol {d}} + (1-\rho )\breve{\boldsymbol {g}}_k, \end{eqnarray} (17)where $$\rho = \frac{1}{1+\lambda }$$. Note that equations (16) and (17) are not dependent on $$\breve{\boldsymbol {r}}$$. Interestingly enough, if λ = 0 we obtain the odd parameter update step of stefcal,4 and if λ = 1 (which corresponds to $$\rho =\frac{1}{2}$$), we obtain the even parameter update step of stefcal5 (Mitchell et al. 2008; Salvini & Wijnholds 2014). In the stefcal algorithm, the measurement equation (equation 1) is linearized by assuming that the gains are known, but that their conjugates are not. Under this assumption, the system of equations become linear and the conjugates of the gains can be obtained in a straightforward manner. Starting from the latest value of the gain conjugates, an updated estimate of the gains can be obtained iteratively until convergence is reached. Alternating between solving and fixing different sets of parameters (which is exactly what stefcal does) is referred to as the alternating direction implicit (ADI) method. The stefcal algorithm reduces the computational complexity of calibration from O(N3) to O(N2). 3 REDUNDANT WIRTINGER CALIBRATION Interferometric baselines are redundant when they sample the exact same visibilities in the uv-plane, i.e. if baseline pq and rs are redundant, then ypq = yrs. A redundant array layout allows us to solve for the unknown observed visibilities themselves in addition to the antenna gains (see equation 4). This is true, since in the case of a redundant layout, equation (4) is already an overdetermined system even before having predicted visibilities from a pre-existing skymodel. It is convenient to group redundant visibilities together and label each group using a single index rather than using their antenna pairs as in equation (1). We introduce a function ϕ that maps the antenna pair associated with a specific baseline to its corresponding redundant baseline group, i.e. if baseline pq and rs are redundant, then ϕpq = ϕrs (implying that they belong to the same group). To be exact, ϕ maps the composite index pq to its group index only if pq ∈ {rs|r ≤ s}. If pq ∉ {rs|r ≤ s}, then the composite index pq is mapped to zero. The function ϕpq is, therefore, not symmetric. Equation (1) can be rewritten for a redundant array as   \begin{eqnarray} d_{pq} = g_{p}\overline{g_q}y_{\phi _{pq}} + n_{pq}, \end{eqnarray} (18)with the same vector form as equation (3) if   \begin{eqnarray} \left[ \boldsymbol {d}\right]_{\alpha _{pq}} &=& d_{pq},\quad \left[ \boldsymbol {v}\right]_{\alpha _{pq}} = v_{pq}=g_p y_{\phi _{pq}} \overline{g_q}, \nonumber \\ \left[ \boldsymbol {n}\right]_{\alpha _{pq}} &=& n_{pq}, \end{eqnarray} (19)where the vectors in equation (19) are column vectors of size B (i.e. p < q). We also introduce the following symmetric variant of ϕpq:   \begin{eqnarray} \zeta _{pq} = \left\lbrace \begin{array}{l@{\quad}l}\phi _{pq}\,&{\rm if}\,p \le q\\ \phi _{qp}&{\rm if}\,\,p>q \end{array}\right., \end{eqnarray} (20)and we will refer to ζpq as the symmetric geometric function. It is possible to construct a simple analytic expression for ζpq for an east–west regular array, i.e. ζpq = |q − p|. It becomes, however, increasingly difficult to construct analytic expressions of ζpq for more complicated array layouts. The empirically constructed symmetric geometry functions of three different redundant layouts are displayed in Fig. 1. We denote the range of ζpq with $$\mathcal {R}(\zeta _{pq})$$. The maximal element that ζpq can ascertain is denoted by L and can be interpreted as the greatest number of unique redundant baseline groups that can be formed for a given array layout. Figure 1. View largeDownload slide Three different redundant antenna layouts: hexagonal (top left), square (top middle), and regular east–west (top right) with their associated symmetric redundancy geometry functions ζpq (bottom panels). We used 91 antennas to construct the hexagonal layout, while 100 antennas were used in the square and east–west layouts. In the case of the east–west layout, we only plot the positions of the first 10 antennas. The maximal numbers of redundant baseline groups L that can be formed for the hexagonal, square, and east–west layouts are 165, 180, and 99, respectively. The analytic expressions of L for a hexagonal, square, and east–west layout are $$L = 2N-\frac{1}{2}\sqrt{12N-3}-\frac{1}{2}$$, $$L=2N-2\sqrt{N}$$, and L = N − 1, respectively. Figure 1. View largeDownload slide Three different redundant antenna layouts: hexagonal (top left), square (top middle), and regular east–west (top right) with their associated symmetric redundancy geometry functions ζpq (bottom panels). We used 91 antennas to construct the hexagonal layout, while 100 antennas were used in the square and east–west layouts. In the case of the east–west layout, we only plot the positions of the first 10 antennas. The maximal numbers of redundant baseline groups L that can be formed for the hexagonal, square, and east–west layouts are 165, 180, and 99, respectively. The analytic expressions of L for a hexagonal, square, and east–west layout are $$L = 2N-\frac{1}{2}\sqrt{12N-3}-\frac{1}{2}$$, $$L=2N-2\sqrt{N}$$, and L = N − 1, respectively. We can now formulate redundant calibration as a least-squares problem:   \begin{eqnarray} \min _{\boldsymbol {z}} \Lambda (\boldsymbol {z}) = \min _{\boldsymbol {z}} \Vert \boldsymbol {r}\Vert _{\rm F}^2 = \min _{\boldsymbol {z}} \Vert \boldsymbol {d}- \boldsymbol {v}(\boldsymbol {z})\Vert _{\rm F}^2, \end{eqnarray} (21)where   \begin{eqnarray} \boldsymbol {g}&=&[g_1, \ldots ,g_N]^T,\quad \boldsymbol {y} = [y_1, \ldots ,y_L]^T, \nonumber \\ \boldsymbol {z}&=& [\boldsymbol {g}^T, \boldsymbol {y}^T]^T. \end{eqnarray} (22)The number of model parameters to be solved for is now P = 2(N + L), since redundant calibration is a complex problem. Note that equation (21) is only solvable (i.e. the array is redundant enough) if L + N ≤ B. In the literature, equation (21) is solved by splitting the problem into its real and imaginary parts. The real and imaginary parts of the unknown parameters are then solved for separately (Wieringa 1992; Liu et al. 2010; Zheng et al. 2014). Currently, the above is achieved by using the real-valued GN algorithm (Kurien et al. 2016). We, instead, intend to formulate the redundant calibration problem using Wirtinger calculus and recast equation (21) as   \begin{eqnarray} \min _{\breve{\boldsymbol {z}}} \Vert \breve{\boldsymbol {r}}\Vert = \min _{\breve{\boldsymbol {z}}} \Vert \breve{\boldsymbol {d}} - \breve{\boldsymbol {v}}(\breve{\boldsymbol {z}})\Vert , \end{eqnarray} (23)where $$\breve{\boldsymbol {z}} = [\boldsymbol {z}^T, \overline{\boldsymbol {z}}^T]^T$$. We derive the complex Jacobian associated with equation (23) to be   \begin{eqnarray} \boldsymbol{J}={\left[\begin{array}{c{@}{\quad}c}\boldsymbol {\mathcal {J}}_1 & \boldsymbol {\mathcal {J}}_2\\ \overline{\boldsymbol {\mathcal {J}}}_2 & \overline{\boldsymbol {\mathcal {J}}}_1 \end{array}\right]}, \end{eqnarray} (24)where   \begin{eqnarray} [\boldsymbol {\mathcal {J}}_1]_{\alpha _{pq},i} =\left\lbrace \begin{array}{l{@}{\quad}l}\displaystyle\frac{\mathrm{\partial} v_{pq}}{\mathrm{\partial} g_i} & {\rm if}\,i\le N \\ \displaystyle\frac{\mathrm{\partial} v_{pq}}{\mathrm{\partial} y_{i-N}} & {\rm otherwise} \end{array}\right., \end{eqnarray} (25)and   \begin{eqnarray} [\boldsymbol {\mathcal {J}}_2]_{\alpha _{pq},i} =\left\lbrace \begin{array}{l{@}{\quad}l}\displaystyle\frac{\mathrm{\partial} v_{pq}}{\mathrm{\partial} \overline{g}_i} & {\rm if}\,i\le N \\ \displaystyle\frac{\mathrm{\partial} v_{pq}}{\mathrm{\partial} \overline{y}_{i-N}} & {\rm otherwise} \end{array}\right.. \end{eqnarray} (26) Algorithm 1 View largeDownload slide Constructing $$\boldsymbol {\mathcal {J}}_1$$ Algorithm 1 View largeDownload slide Constructing $$\boldsymbol {\mathcal {J}}_1$$ Note that we employ the same subscript indexing notation that we used in Section 2 in equations (25) and (26). To further aid the reader in understanding this indexing notation, we refer him/her to Fig. 2 (also see Algorithm 1), which depicts a flow diagram of the matrix construction procedure with which $$\boldsymbol {\mathcal {J}}_1$$ can be constructed. The range of values the indices in equations (25) and (26) can attain should also be clear to the reader after having inspected Fig. 2 and Table 2. Figure 2. View largeDownload slide A flow chart representing the procedure one would follow to partially construct the Jacobian matrix (i.e. $$\boldsymbol {\mathcal {J}}_1$$) associated with Wirtinger redundant calibration. The red circle represents the start of the flow diagram. The blue circle represents the end of the diagram. Blue diamonds denote loop conditionals, while green diamonds denote simple conditionals. The diagram elements following the red arrows just below a loop conditional element all form part of the main body of the loop that has the conditional statement of the aforementioned loop conditional element in its definition. The pseudo-code associated with this flow diagram is given in Algorithm 1. Figure 2. View largeDownload slide A flow chart representing the procedure one would follow to partially construct the Jacobian matrix (i.e. $$\boldsymbol {\mathcal {J}}_1$$) associated with Wirtinger redundant calibration. The red circle represents the start of the flow diagram. The blue circle represents the end of the diagram. Blue diamonds denote loop conditionals, while green diamonds denote simple conditionals. The diagram elements following the red arrows just below a loop conditional element all form part of the main body of the loop that has the conditional statement of the aforementioned loop conditional element in its definition. The pseudo-code associated with this flow diagram is given in Algorithm 1. We can now calculate the GN and LM updates to be   \begin{eqnarray} \Delta \breve{\boldsymbol {z}} = (\boldsymbol{J}^H\boldsymbol{J})^{-1}\boldsymbol{J}^H\breve{\boldsymbol {r}} \end{eqnarray} (27)and   \begin{eqnarray} \Delta \breve{\boldsymbol {z}} = (\boldsymbol{J}^H\boldsymbol{J}+ \lambda \boldsymbol{D})^{-1}\boldsymbol{J}^H\breve{\boldsymbol {r}}, \end{eqnarray} (28)respectively. As in Section 2, equation (27) can be used to iteratively update our parameter vector:   \begin{eqnarray} \breve{\boldsymbol {z}}_{k+1} = \breve{\boldsymbol {z}}_{k} + \Delta \breve{\boldsymbol {z}}_{k}. \end{eqnarray} (29)The analytic expressions for $$\boldsymbol{J}$$, $$\boldsymbol{H}$$, and $$\boldsymbol{J}^H\breve{\boldsymbol {r}}$$ can be found in Appendix A. Appendix A also contains two useful identities involving $$\boldsymbol{J}$$ and $$\boldsymbol{H}$$. In the case of the GN algorithm, we can simplify equation (29) even further. Replacing $$\breve{\boldsymbol {r}}$$ with $$\breve{\boldsymbol {d}}-\breve{\boldsymbol {v}}$$ in equation (27) results in   \begin{eqnarray} \Delta \breve{\boldsymbol {z}} = (\boldsymbol{J}^H\boldsymbol{J})^{-1}\boldsymbol{J}^H(\breve{\boldsymbol {d}}-\breve{\boldsymbol {v}}). \end{eqnarray} (30)If we substitute the first identity of equation (A20) into equation (30) and we simplify the result, we obtain   \begin{eqnarray} \Delta \breve{\boldsymbol {z}} = (\boldsymbol{J}^H\boldsymbol{J})^{-1}\boldsymbol{J}^H\breve{\boldsymbol {d}}-\frac{1}{3}\breve{\boldsymbol {z}}. \end{eqnarray} (31)If we use the above-simplified update in equation (29), it reduces to   \begin{eqnarray} \breve{\boldsymbol {z}}_{k+1} = (\boldsymbol{J}^H\boldsymbol{J})^{-1}\boldsymbol{J}^H\breve{\boldsymbol {d}} + \frac{2}{3}\breve{\boldsymbol {z}}_{k}. \end{eqnarray} (32)Equation (32) is the redundant equivalent of equation (16), and it shows us that in the case of redundant calibration, we can calculate the GN parameter update step without calculating the residual. Fig. 3 shows that the Hessian matrix $$\boldsymbol{H}$$ is nearly diagonal and sparse for both the regular east–west and hexagonal layouts we considered. We therefore follow the approach of Smirnov & Tasse (2015) and approximate the Hessian matrix $$\boldsymbol{H}$$ with its diagonal. If we substitute $$\boldsymbol{J}^H\boldsymbol{J}$$ with $$\widetilde{\boldsymbol{H}}=\boldsymbol{H}{\odot }\boldsymbol{I}$$ and replace $$\breve{\boldsymbol {r}}$$ with $$\breve{\boldsymbol {d}} - \breve{\boldsymbol {v}}$$ in equation (27), we obtain   \begin{eqnarray} \Delta \breve{\boldsymbol {z}} \approx \widetilde{\boldsymbol{H}}^{-1}\boldsymbol{J}^H(\breve{\boldsymbol {d}}-\breve{\boldsymbol {v}}). \end{eqnarray} (33)Utilizing the second identity in equation (A20) allows us to simplify equation (33) to   \begin{eqnarray} \Delta \breve{\boldsymbol {z}} \approx \widetilde{\boldsymbol{H}}^{-1}\boldsymbol{J}^H\breve{\boldsymbol {d}}-\breve{\boldsymbol {z}}, \end{eqnarray} (34)which leads to   \begin{eqnarray} \breve{\boldsymbol {z}}_{k+1} \approx \widetilde{\boldsymbol{H}}^{-1}\boldsymbol{J}^H\breve{\boldsymbol {d}}. \end{eqnarray} (35)Using equation (28), we follow the same procedure and obtain a similar result for LM   \begin{eqnarray} \breve{\boldsymbol {z}}_{k+1} &\approx& \frac{1}{1+\lambda }\widetilde{\boldsymbol{H}}^{-1}\boldsymbol{J}^H\breve{\boldsymbol {d}} + \frac{\lambda }{1+\lambda } \breve{\boldsymbol {z}}_k, \end{eqnarray} (36)  \begin{eqnarray} \hphantom{\breve{\boldsymbol {z}}_{k+1}}= \rho \widetilde{\boldsymbol{H}}^{-1}\boldsymbol{J}^H\breve{\boldsymbol {d}} + (1-\rho )\breve{\boldsymbol {z}}_k. \end{eqnarray} (37)The analytic expression of $$\boldsymbol{J}^H\breve{\boldsymbol {d}}$$ will be very similar to the analytic expression of $$\boldsymbol{J}^H\breve{\boldsymbol {r}}$$, the only difference being that in equation (A18) the letter r would be replaced by a d. If we substitute the analytic expression of $$\boldsymbol{J}^H\breve{\boldsymbol {d}}$$ and $$\widetilde{\boldsymbol{H}}^{-1}$$ (which can easily be constructed using Appendix A) into equation (37), we obtain the following two update rules:   \begin{eqnarray} g_{i}^{k+1} = \rho \frac{\sum _{j\ne i} g_j^k \widetilde{y}_{ij}^{ \!\!k} d_{ij}}{\sum _{j\ne i} |g_j^k|^2|y_{\zeta _{ij}}^k|^2} + (1-\rho ) g_i^k \end{eqnarray} (38)and   \begin{eqnarray} y_{i}^{k+1} = \rho \frac{\sum _{rs \in \mathcal {RS}_i} \overline{g}_r^k g_s^k d_{rs}}{\sum _{rs \in \mathcal {RS}_i}|g_r^k|^2|g_s^k|^2} + (1-\rho ) y_i^k. \end{eqnarray} (39)The index set $$\mathcal {RS}_i$$ and the quantity $$\widetilde{y}_{ij}$$ are defined in equations (A14) and (A19), respectively. The computational complexity of inverting $$\widetilde{\boldsymbol{H}}$$ is O(P). We note that equation (38) is the gain estimator associated with stefcal. Figure 3. View largeDownload slide The number of analytic terms out of which the entries of the Hessian $$\boldsymbol{H}$$ consist for two different geometric layouts, namely a regular east–west grid with N = 5 (left-hand panel) and a hexagonal grid with N = 7. The diagonal entries of these two Hessians are clearly more significant than their off-diagonal entries. Moreover, these two Hessians also contain many zero entries. Note that the locations of the zero entries are dependent on the geometry of the array layout. Figure 3. View largeDownload slide The number of analytic terms out of which the entries of the Hessian $$\boldsymbol{H}$$ consist for two different geometric layouts, namely a regular east–west grid with N = 5 (left-hand panel) and a hexagonal grid with N = 7. The diagonal entries of these two Hessians are clearly more significant than their off-diagonal entries. Moreover, these two Hessians also contain many zero entries. Note that the locations of the zero entries are dependent on the geometry of the array layout. Equations (38) and (39) were obtained by Marthi & Chengalur (2014) by taking the derivative of the objective function Λ relative to the elements of $$\boldsymbol {g}$$ and $$\boldsymbol {y}$$, setting the intermediate results to zero and then solving for the unknown parameters (i.e. using the gradient descent algorithm). We note that their derivation is less general. The LM algorithm has better convergence properties than gradient descent and encompasses the gradient descent algorithm as a special case. In Appendix B, we show that equations (38) and (39) can also be derived using the ADI method. For this reason, we refer to the implementation of the pseudo-LM calibration scheme derived above, i.e. equations (38) and (39), as redundant stefcal throughout the rest of the paper. Interestingly, Marthi & Chengalur (2014) were not the first to make use of the ADI method to perform redundant calibration; a slower alternative ADI-based calibration algorithm is presented in Wijnholds & Noorishad (2012). The choice of the ρ parameter is somewhat unconstrained. In this paper, we chose ρ by adopting the same strategy that is used by stefcal and Marthi & Chengalur (2014), i.e. we chose ρ to be equal to $$\frac{1}{3}$$ (λ = 2). We also carried out simulations to validate this choice. We generated a skymodel that comprised of 100 flat spectrum sources distributed over a 3° by 3° sky patch. The flux density of each source was drawn from a power-law distribution with a slope of 2 and the source position was drawn from a uniform distribution. We also made use of multiple fictitious telescope layouts each one having a hexagonal geometry (see the upper-left image of Fig. 1 for an example layout). The largest hexagonal array that we used has 217 antennas, with a minimum and maximum baseline of 20 and 320 m, respectively. We corrupted visibilities by applying gain errors (see Tables 3 and 4) and calibrated the corrupted visibilities using redundant stefcal. Solutions were independently derived for each time-step and channel for five realizations. Table 3. The gain error models used in this paper. We have used the symbol x here as a proxy as it can either refer to time slots or channels. We either performed our simulations over multiple time slots and one frequency channel or one time slot and multiple frequency channels (see Table 4). Moreover, c in the leftmost column denotes the speed of light. We adopted a sinusoidal error model, similar to the sinusoidal error models used within meqtrees (Noordam & Smirnov 2010), as well as a phase slope across frequency that mimics a real case of physical delays between different antennas. In the first model, we model a gain error with amplitude around one and an additional phase error. In the second model, we chose A in such a way that we do not unintentionally produce nonsensical gain values, i.e. a zero gain value. For the third model, the value of τ was chosen so that the number of phase wraps that occur across the observing band is restricted to a realistic number. Number tag  1  2  3  Model  Sinusoidal: amplitude and phase  Sinusoidal: real and imaginary parts  Linear phase slope  Function  (A + 1)ejP  $$A\cos (2\pi fx+B)+1.5A+jC\sin (2\pi fx+D)$$  ejP  Parameters  $$A=a\cos (2\pi fx +b)$$  f = 5  P = τx    $$P =c \cos (2\pi fx +d)$$  A, C ∼ U[0.5, 10]  $$\tau = \frac{l}{c}$$    f = 5  $$B,D\sim U[0,2\pi ]$$  l ∼ U[5, 50] (m)    a ∼ U[0.8, 0.9]        c ∼ U[0.5, 5]        $$b,d\sim U[0,2\pi ]$$      Number tag  1  2  3  Model  Sinusoidal: amplitude and phase  Sinusoidal: real and imaginary parts  Linear phase slope  Function  (A + 1)ejP  $$A\cos (2\pi fx+B)+1.5A+jC\sin (2\pi fx+D)$$  ejP  Parameters  $$A=a\cos (2\pi fx +b)$$  f = 5  P = τx    $$P =c \cos (2\pi fx +d)$$  A, C ∼ U[0.5, 10]  $$\tau = \frac{l}{c}$$    f = 5  $$B,D\sim U[0,2\pi ]$$  l ∼ U[5, 50] (m)    a ∼ U[0.8, 0.9]        c ∼ U[0.5, 5]        $$b,d\sim U[0,2\pi ]$$      View Large Table 4. We generated results using two main setups. We either used one frequency channel and multiple time slots or one time slot and multiple frequency channels. The most important parameters used in realizing these two major setups are presented here. We chose the observational frequency band of setup 1 to coincide with the HERA array. To broaden the scope of our analysis, we chose the observational frequency of our second setup to be equal to 1.4 GHz, which is a typical observing frequency of the Westerbork Synthesis Radio Telescope.   Setup 1  Setup 2  Num. channels  1024  1  ν-range  100–200 MHz  1.4 GHz  Num. time slots  1  50    Setup 1  Setup 2  Num. channels  1024  1  ν-range  100–200 MHz  1.4 GHz  Num. time slots  1  50  View Large Note that our simulations are almost ideal as we did not include a primary beam response, nor did we incorporate time and frequency smearing into our simulation. We also did not explicitly define our noise in terms of the integration time, channel bandwidth, and Tsys; we instead follow the same approach described in Liu et al. (2010) and Marthi & Chengalur (2014) and make use of the definition of SNR to introduce noise into our visibilities. We use the following definition of SNR (Liu et al. 2010; Marthi & Chengalur 2014):   \begin{eqnarray} {\rm SNR} = 10\log \left(\frac{\langle \boldsymbol {v}{\odot }\overline{\boldsymbol {v}}\rangle _{\nu ,t,pq}}{\langle \boldsymbol {n}{\odot }\overline{\boldsymbol {n}}\rangle _{\nu ,t,pq}} \right), \end{eqnarray} (40)where $$\langle \boldsymbol {x}\rangle _{\nu ,t,pq}$$ denotes averaging over frequency, time, and baseline. It should be pointed out, however, that by producing visibilities with different SNR values, we are effectively either changing our integration time or our channel bandwidth or both (assuming a single Tsys value for our instrument). Fig. 4 shows the simulation results as a function of SNR and number of antennas. The accuracy of our solutions is quantified through the percentage error   \begin{eqnarray} \beta = \frac{\Vert \boldsymbol {v}- \widehat{\boldsymbol {v}}\Vert _{\rm F}^2}{\Vert \boldsymbol {v}\Vert _{\rm F}^2}, \end{eqnarray} (41)where $$\widehat{\boldsymbol {v}}$$ is the redundant stefcal parameter estimate. Figure 4. View largeDownload slide We plot the percentage error β between the simulated visibilities and the visibilities solved for by redundant stefcal for different SNR values as a function of the number of antennas (N) in the array. Figure 4. View largeDownload slide We plot the percentage error β between the simulated visibilities and the visibilities solved for by redundant stefcal for different SNR values as a function of the number of antennas (N) in the array. The error magnitude follows the expected behaviour, i.e. it decreases as a function of SNR and number of antennas N. Interestingly, it reduces to a few per cent when N > 120 for essentially any choice of SNR. 4 PCG METHOD Liu et al. (2010) suggested that the execution speed of redundant calibration could be reduced using the conjugate gradient (CG) method (Hestenes & Stiefel 1952), which would be computationally advantageous, since the Hessian matrix associated with redundant calibration (see Fig. 3) is sparse (Reid 1971). In this section, we study the computational complexity of the CG method when it is used to invert the modified Hessian matrix (see equation 28), in particular when preconditioning is used (i.e. the PCG method). Interestingly, empirical tests suggest that the unmodified Hessian itself is singular. It is therefore important to mention that the CG method can pseudo-invert the unmodified Hessian as well, i.e. the CG method can be directly applied to equation (27), because the Hessian is a positive semi-definite Hermitian matrix and the vector $$\boldsymbol{J}^H\breve{\boldsymbol {r}}$$ is an element of its column range (Lu & Chen 2018). The computational complexity of the CG method is   \begin{eqnarray} O(\sqrt{\kappa }m), \end{eqnarray} (42)where m denotes the number of non-zero entries and κ denotes the spectral condition number of the matrix that is to be inverted. The spectral condition number κ of the matrix $$\boldsymbol{A}$$ is defined as   \begin{eqnarray} \kappa (\boldsymbol{A}) = \frac{\iota _{{\rm max}}}{\iota _{{\rm min}}}, \end{eqnarray} (43)where ιmax and ιmin denote the largest and the smallest eigenvalue of $$\boldsymbol{A}$$, respectively. Preconditioning is a technique used to improve the spectral condition number of a matrix. Let us consider a generic system of linear equations   \begin{eqnarray} \boldsymbol{A}\boldsymbol {x}= \boldsymbol {b}, \end{eqnarray} (44)and a positive-definite Hermitian matrix $$\boldsymbol{M}$$ so that   \begin{eqnarray} \boldsymbol{M}^{-1}\boldsymbol{A}\boldsymbol {x}= \boldsymbol{M}^{-1}\boldsymbol {b}. \end{eqnarray} (45) The matrix $$\boldsymbol{M}$$ is a good preconditioner if   \begin{eqnarray} \kappa (\boldsymbol{M}^{-1}\boldsymbol{A}) \ll \kappa (\boldsymbol{A}), \end{eqnarray} (46)i.e. if it lowers the condition number of a matrix. If $$\boldsymbol{A}$$ is a nearly diagonal matrix, the Jacobian preconditioner is a natural choice of $$\boldsymbol{M}$$ and can be computed as   \begin{eqnarray} \boldsymbol{M}= \boldsymbol{A}{\odot }\boldsymbol{I}. \end{eqnarray} (47) In order to quantify the effectiveness of the CG method in redundant calibration, we investigate the spectral condition number and the sparsity of the modified Hessian $$\boldsymbol {\mathcal {H}}$$ (i.e. λ = 2). We generated simulated (i.e. corrupted) visibilities according to models described in Table 3. We used the complex LM algorithm described in Section 3 to calibrate the corrupted visibilities. To invert the modified Hessian, we used the CG method, with and without a Jacobian preconditioner. Fig. 5(a) shows us that preconditioning reduces the condition number of $$\boldsymbol {\mathcal {H}}$$ to a small constant value and therefore effectively eliminates it from equation (42), i.e. equation (42) reduces to   \begin{eqnarray} O(m). \end{eqnarray} (48)This result is confirmed by Fig. 5(b). Fig. 5(b) shows us that the number of major iterations needed by the PCG method to invert $$\boldsymbol {\mathcal {H}}$$ is independent of the number of antennas in the array and that it is much less than the dimension of $$\boldsymbol {\mathcal {H}}$$. Figure 5. View largeDownload slide The graphs presented here were generated using the simulations discussed in Section 3. Left: spectral condition number κ of the modified Hessian $$\boldsymbol {\mathcal {H}}$$ as a function of N, before (magenta and red curves) and after (blue curve) preconditioning, at different SNR values. Right: number of major iterations required by the CG method to invert $$\boldsymbol {\mathcal {H}}$$ as a function of the number of antennas N in the array, before (magenta and red curves) and after preconditioning (blue and green curves), at different SNR values. Both plots show that the Jacobian preconditioner definitely speeds up the CG method. Figure 5. View largeDownload slide The graphs presented here were generated using the simulations discussed in Section 3. Left: spectral condition number κ of the modified Hessian $$\boldsymbol {\mathcal {H}}$$ as a function of N, before (magenta and red curves) and after (blue curve) preconditioning, at different SNR values. Right: number of major iterations required by the CG method to invert $$\boldsymbol {\mathcal {H}}$$ as a function of the number of antennas N in the array, before (magenta and red curves) and after preconditioning (blue and green curves), at different SNR values. Both plots show that the Jacobian preconditioner definitely speeds up the CG method. Let us now shift our attention towards the remaining factor m. To get a useful idea of the computational complexity of CG, we must relate m and P. This can be achieved by utilizing the modified Hessian's measure of sparsity γ:   \begin{eqnarray} \gamma = \left(1 - \frac{m}{P^2} \right), \end{eqnarray} (49)where m is the number of non-zero entries in $$\boldsymbol {\mathcal {H}}$$ and P2 is total number of matrix elements. For a regular east–west geometric configuration, the sparsity and the asymptotic sparsity of $$\boldsymbol {\mathcal {H}}$$ can be derived analytically:   \begin{eqnarray} \gamma = \frac{5N^2-7N+3}{8N^2-8N+2} \qquad \gamma _{\infty } = \lim _{N\rightarrow \infty }\gamma = \frac{5}{8} . \end{eqnarray} (50) For more complicated array layouts, however, there is no straightforward analytical solution and we empirically determined the sparsity ratios for three different geometric layouts as a function of the number antennas in the array (see Fig. 6a). Figure 6. View largeDownload slide Left: the sparsity ratio γ of the modified Hessian $$\boldsymbol {\mathcal {H}}$$ as a function of the number of antennas N for a hexagonal (blue circles), square (green crosses), and regular east–west (red circles) array geometry. The red dashed and black dotted lines show the analytical expression for γ and its limit for the regular east–west grid case (see the text for details). Right: the order of the computational cost c for inverting $$\boldsymbol {\mathcal {H}}$$ as a function of N for different array geometries (same colour scheme as in the left-hand panel). The red dashed and black dotted lines are the analytical expression of c and its limit in the east–west regular grid case. Figure 6. View largeDownload slide Left: the sparsity ratio γ of the modified Hessian $$\boldsymbol {\mathcal {H}}$$ as a function of the number of antennas N for a hexagonal (blue circles), square (green crosses), and regular east–west (red circles) array geometry. The red dashed and black dotted lines show the analytical expression for γ and its limit for the regular east–west grid case (see the text for details). Right: the order of the computational cost c for inverting $$\boldsymbol {\mathcal {H}}$$ as a function of N for different array geometries (same colour scheme as in the left-hand panel). The red dashed and black dotted lines are the analytical expression of c and its limit in the east–west regular grid case. It now follows that   \begin{eqnarray} P^{c} = m = (1 - \gamma )P^2, \end{eqnarray} (51)which leads to   \begin{eqnarray} c = \log _{P}(1 - \gamma ) + 2 \qquad c_{\infty } = \lim _{N\rightarrow \infty } c = 2. \end{eqnarray} (52)The computational complexity is therefore asymptotically bounded by O(P2), although it converges very slowly to its asymptotic value, and in general is equal to O(Pc) (with c < 2). In the case of a hexagonal geometric layout with N < 200, we have that c ∼ 1.7 (see Fig. 6b). We are now finally able to compare the computational complexity of redundant stefcal and the PCG method. Redundant stefcal is computationally inexpensive as it just needs to invert a diagonal matrix; however, the PCG inversion is accurate and, therefore, may require fewer iterations, ultimately leading to a faster convergence. We computed the theoretical number of redundant stefcal iterations Δk in excess of the number of LM (implementing PCG) iterations required for LM to outperform redundant stefcal  \begin{eqnarray} (k+\Delta k)P > kP^c, \qquad \Delta k > k(P^{c-1}-1). \end{eqnarray} (53)We then compared Δk with the empirically obtained average excess number of iterations. The results are displayed in Fig. 7, which shows that redundant stefcal outperforms the PCG method. We note that, in this comparison, we have only taken into account the cost of inverting $$\boldsymbol {\mathcal {H}}$$ and ignored the cost of preconditioning. Figure 7. View largeDownload slide The graphs presented here were generated using the simulations discussed in Section 3. Left: number of LM iterations required by redundant stefcal (green and blue curves) and the PCG (magenta and red curves) methods to converge to a parameter error tolerance of 10−6 whilst using different SNR values as a function of the number of antennas N in the array. Right: average number of LM iterations (difference between the redundant stefcal and PCG curves in the left-hand panel) saved (green and blue curve) by computing the full inverse with the PCG method whilst using different SNR values. The black curve is Δk, the theoretical number of redundant stefcal iterations that are needed in excess of the number of LM iterations used for the LM algorithm to outperform redundant stefcal. For the black curve, we assumed c = 1.7 and that k could be approximated with the magenta curve plotted in the left-hand panel. Figure 7. View largeDownload slide The graphs presented here were generated using the simulations discussed in Section 3. Left: number of LM iterations required by redundant stefcal (green and blue curves) and the PCG (magenta and red curves) methods to converge to a parameter error tolerance of 10−6 whilst using different SNR values as a function of the number of antennas N in the array. Right: average number of LM iterations (difference between the redundant stefcal and PCG curves in the left-hand panel) saved (green and blue curve) by computing the full inverse with the PCG method whilst using different SNR values. The black curve is Δk, the theoretical number of redundant stefcal iterations that are needed in excess of the number of LM iterations used for the LM algorithm to outperform redundant stefcal. For the black curve, we assumed c = 1.7 and that k could be approximated with the magenta curve plotted in the left-hand panel. 5 CONCLUSION In this paper, we have formulated the calibration of redundant interferometric arrays using the complex optimization formalism (Smirnov & Tasse 2015). We derived the associated complex Jacobian and the GN and LM parameter update steps. We also showed that the LM parameter update step can be simplified to obtain an ADI type of algorithm (Marthi & Chengalur 2014; Salvini & Wijnholds 2014). Our code implementation of this algorithm (redundant stefcal) is publicly available at https://github.com/Trienko/heracommissioning/blob/master/code/stef.py. We note that, in its current implementation, redundant stefcal does not solve for the degeneracies inherent to redundant calibration (Zheng et al. 2014; Kurien et al. 2016; Dillon et al. 2017), which will be the subject of future work. Compared to current redundant calibration algorithms, redundant stefcal is more robust to initial conditions and allows for easier parallelization. We investigated the computational cost of redundant stefcal and compared it with the performance of the PCG method (as suggested by Liu et al. 2010). We found that, although the PCG method greatly improves the speed of redundant calibration, it still significantly underperforms when compared to redundant stefcal. The characteristics of redundant stefcal make it an appealing calibration algorithm for large redundant arrays like HERA (DeBoer et al. 2017), CHIME (Bandura et al. 2014), HIRAX (Newburgh et al. 2016), or even hybrid arrays like the MWA (Tingay et al. 2013) in its updated phase. 6 FUTURE OUTLOOK We plan to apply redundant stefcal to HERA data in the near future. The reason for applying redundant stefcal to real data is twofold. First, we wish to validate the theoretical computational complexity of redundant stefcal (derived earlier in the paper). Secondly, we would like to test whether it could be used to calibrate HERA in near-realtime. We are also interested in parallelizing our current implementation and to see how much can be gained by doing so. We would also like to conduct a perturbation analysis, similar to the one conducted by Liu et al. (2010), to estimate the error that is introduced into our estimated gains and visibilities when the array contains baselines that are not perfectly redundant. We are also interested in quantifying the error that is introduced into the estimated gains and visibilities because of differing primary beam patterns between array elements (Noorishad et al. 2012). ACKNOWLEDGEMENTS This work is based upon research supported by the South African Research Chairs Initiative of the Department of Science and Technology and National Research Foundation. GB acknowledges support from the Royal Society and the Newton Fund under grant NA150184. This work is also based on the research supported in part by the National Research Foundation of South Africa (grant No. 103424). We would like to thank the anonymous reviewer for their useful comments, which greatly improved the quality of the paper. Footnotes 1 A matrix composed of first-order partial derivatives. 2 Element-wise product. 3 A square matrix composed of second-order partial derivatives. 4 k ∈ {0, 2, …}. 5 k ∈ {1, 3, …}. REFERENCES Ali Z. S. et al.  , 2015, ApJ , 809, 61 CrossRef Search ADS   Bandura K. et al.  , 2014, Proc. SPIE , 9145, 914522 Barry N., Hazelton B., Sullivan I., Morales M. F., Pober J. C., 2016, MNRAS , 461, 3135 CrossRef Search ADS   Bernardi G. et al.  , 2009, A&A , 500, 965 CrossRef Search ADS   Bernardi G. et al.  , 2010, A&A , 522, A67 CrossRef Search ADS   DeBoer D. R. et al.  , 2017, PASP , 129, 045001 CrossRef Search ADS   Dillon J. S., Parsons A. R., 2016, ApJ , 826, 181 CrossRef Search ADS   Dillon J. S. et al.  , 2014, Phys. Rev. D , 89, 023002 CrossRef Search ADS   Dillon J. S. et al.  , 2017, preprint (arXiv:1712.07212) Ewall-Wice A., Dillon J. S., Liu A., Hewitt J., 2017, MNRAS , 470, 1849 CrossRef Search ADS   Furlanetto S. R., 2016, in Mesinger A., ed., Astrophysics and Space Science Library, Vol. 423, Understanding the Epoch of Cosmic Reionization: Challenges and Progress . Springer-Verlag, Berlin, p. 247 Ghosh A., Prasad J., Bharadwaj S., Ali S. S., Chengalur J. N., 2012, MNRAS , 426, 3295 CrossRef Search ADS   Grobler T. L., Nunhokee C. D., Smirnov O. M., van Zyl A. J., de Bruyn A. G., 2014, MNRAS , 439, 4030 CrossRef Search ADS   Hamaker J. P., Bregman J. D., Sault R. J., 1996, A&AS , 117, 137 CrossRef Search ADS   Hestenes M. R., Stiefel E., 1952, J. Res. Natl. Bur. Stand. , 49, 409 CrossRef Search ADS   Kurien B. G., Tarokh V., Rachlin Y., Shah V. N., Ashcom J. B., 2016, MNRAS , 461, 3585 CrossRef Search ADS   Liu S., Trenkler G., 2008, Int. J. Inf. Syst. Sci , 4, 160 Liu A., Tegmark M., Morrison S., Lutomirski A., Zaldarriaga M., 2010, MNRAS , 408, 1029 CrossRef Search ADS   Lu Z., Chen X., 2018, Math. Oper. Res. , 43, 275 McQuinn M., 2016, ARA&A , 54, 313 CrossRef Search ADS   Madsen K., Nielsen H. B., Tingleff O., 2004, Methods for Non-Linear Least Squares Problems , 2nd edn. Informatics and Mathematical Modelling, Technical University of Denmark, DTU, Lyngby Marthi V. R., Chengalur J., 2014, MNRAS , 437, 524 CrossRef Search ADS   Mitchell D. A., Greenhill L. J., Wayth R. B., Sault R. J., Lonsdale C. J., Cappallo R. J., Morales M. F., Ord S. M., 2008, IEEE J. Sel. Top. Signal Process. , 2, 707 CrossRef Search ADS   Newburgh L. B. et al.  , 2016, Proc. SPIE , 9906, 99065X Noordam J., De Bruyn A., 1982, Nature , 299, 597 CrossRef Search ADS   Noordam J. E., Smirnov O. M., 2010, A&A , 524, A61 CrossRef Search ADS   Noorishad P., Wijnholds S. J., van Ardenne A., van der Hulst J., 2012, A&A , 545, A108 CrossRef Search ADS   Parsons A., Pober J., McQuinn M., Jacobs D., Aguirre J., 2012, ApJ , 753, 81 CrossRef Search ADS   Parsons A. R. et al.  , 2014, ApJ , 788, 106 CrossRef Search ADS   Pearson T., Readhead A., 1984, ARA&A , 22, 97 CrossRef Search ADS   Reid J. K. ed., 1971, Proc. Conf. on Large Sparse Sets of Linear Equations, On the Method of Conjugate Gradients for the Solution of Large Sparse Systems of Linear Equations . Academic Press, New York, p. 231 Salvini S., Wijnholds S. J., 2014, A&A , 571, A97 CrossRef Search ADS   Sault R. J., Hamaker J. P., Bregman J. D., 1996, A&AS , 117, 149 CrossRef Search ADS   Sievers J., 2017, preprint (arXiv:1701.01860) Smirnov O. M., 2011a, A&A , 527, A106 CrossRef Search ADS   Smirnov O. M., 2011b, A&A , 527, A108 CrossRef Search ADS   Smirnov O., Tasse C., 2015, MNRAS , 449, 2668 CrossRef Search ADS   Tingay S. J. et al.  , 2013, PASA , 30, e007 CrossRef Search ADS   Wieringa M. H., 1992, Exp. Astron. , 2, 203 CrossRef Search ADS   Wijnholds S. J., Noorishad P., 2012, in Bartleson K., ed., Signal Processing Conference (EUSIPCO), Statistically Optimal Self-Calibration of Regular Imaging Arrays. IEEE, Piscataway, NJ , p. 1304 Wirtinger W., 1927, Math. Ann. , 97, 357 CrossRef Search ADS   Zheng H. et al.  , 2014, MNRAS , 445, 1084 CrossRef Search ADS   APPENDIX A: DERIVATION OF REDUNDANT WIRTINGER CALIBRATION If we apply the definition in equation (24) to equation (23), we obtain the following analytic result:   \begin{eqnarray} \boldsymbol{J}={\left[\begin{array}{c{@}{\quad}c}\boldsymbol{M}& \boldsymbol{N} \\ \overline{\boldsymbol{N}} & \overline{\boldsymbol{M}} \end{array}\right]}, \end{eqnarray} (A1)where   \begin{eqnarray} \boldsymbol{M}={\left[\begin{array}{c{@}{\quad}c}\boldsymbol{O}& \boldsymbol{P}\end{array}\right]} \end{eqnarray} (A2)and   \begin{eqnarray} \boldsymbol{N}={\left[\begin{array}{c{@}{\quad}c}\boldsymbol{Q}& \boldsymbol{0}\end{array}\right]}. \end{eqnarray} (A3)Moreover,   \begin{eqnarray} \left[ \boldsymbol{O}\right]_{\alpha _{pq},j} =\left\lbrace \begin{array}{l{@}{\quad}l}y_{\phi _{pq}}\overline{g_q} & {\rm if}\,p=j \\ 0 & {\rm otherwise} \end{array}\right., \end{eqnarray} (A4)  \begin{eqnarray} \left[ \boldsymbol{P}\right]_{\alpha _{pq},j} =\left\lbrace \begin{array}{l{@}{\quad }l}g_p\overline{g_q} & {\rm if}\,\phi _{pq}=j \\ 0 & {\rm otherwise} \end{array}\right., \end{eqnarray} (A5)and   \begin{eqnarray} \left[ \boldsymbol{Q}\right]_{\alpha _{pq},j} =\left\lbrace \begin{array}{l{@}{\quad}l}g_py_{\phi _{pq}} & {\rm if}\,q=j \\ 0 & {\rm otherwise} \end{array}\right.. \end{eqnarray} (A6) We use $$\boldsymbol{0}$$ to denote an all-zero matrix. It is now trivial to compute the Hessian $$\boldsymbol{H}$$ by using equation (A1). If we substitute equation (A1) into $$\boldsymbol{J}^H\boldsymbol{J}$$, we obtain   \begin{eqnarray} \boldsymbol{H}= \boldsymbol{J}^H\boldsymbol{J}= {\left[\begin{array}{l{@}{\quad}l}\boldsymbol{A}& \boldsymbol{B}\\ \overline{\boldsymbol{B}} & \overline{\boldsymbol{A}} \end{array}\right]}, \end{eqnarray} (A7)where   \begin{eqnarray} \boldsymbol{A}={\left[\begin{array}{l{@}{\quad}l}\boldsymbol{C}& \boldsymbol{D}\\ \boldsymbol{D}^H & \boldsymbol{E}\end{array}\right]}, \qquad \boldsymbol{B}={\left[\begin{array}{l{@}{\quad}l}\boldsymbol{F}& \boldsymbol{G}\\ \boldsymbol{G}^T & \boldsymbol{0}\end{array}\right]}, \end{eqnarray} (A8)  \begin{eqnarray} [\boldsymbol{C}]_{ij} = \left\lbrace \begin{array}{l{@}{\quad}l}\sum _{k \ne i} \left| g_k \right|^2 \left| y_{\zeta _{ik}} \right|^2 & {\rm if}\,i=j \\ 0 & {\rm otherwise} \end{array}\right., \end{eqnarray} (A9)  \begin{eqnarray} [\boldsymbol{D}]_{ij} = \left\lbrace \begin{array}{l{@}{\quad}l} g_i \overline{y}_j \left| g_{\psi _{ij}} \right|^2 & {\rm if}\,\psi _{ij}\ne 0 \\ 0 & {\rm otherwise} \end{array}\right., \end{eqnarray} (A10)  \begin{eqnarray} [\boldsymbol{E}]_{ij} = \left\lbrace \begin{array}{@{}l@{\quad }l@{}}\sum _{rs \in \mathcal {RS}_i} \left| g_r \right|^2 \left| g_s \right|^2 & {\rm if}\,i=j \\ 0 & {\rm otherwise} \end{array}\right., \end{eqnarray} (A11)  \begin{eqnarray} [\boldsymbol{F}]_{ij} = \left\lbrace \begin{array}{l{@}{\quad}l} g_i g_j \left| y_{\zeta _{ij}} \right|^2 & {\rm if}\,i \ne j \\ 0 & {\rm otherwise} \end{array}\right., \end{eqnarray} (A12)and   \begin{eqnarray} [\boldsymbol{G}]_{ij} = \left\lbrace \begin{array}{l{@}{\quad}l} g_i y_j \left| g_{\xi _{ij}} \right|^2 & {\rm if}\,\xi _{ij}\ne 0 \\ 0 & {\rm otherwise} \end{array}\right.. \end{eqnarray} (A13)Moreover,   \begin{eqnarray} \mathcal {RS}_i = \left\lbrace rs\in \mathbb {N}^2|(\phi _{rs} = i) \right\rbrace , \end{eqnarray} (A14)  \begin{eqnarray} \xi _{ij} = \left\lbrace \begin{array}{@{}l@{\quad }l@{}}p&{\rm if}\,\exists ! p \in \mathbb {N} s.t. (\phi _{pi} = j)\\ 0&{\rm otherwise} \end{array}\right., \end{eqnarray} (A15)and   \begin{eqnarray} \psi _{ij} = \left\lbrace \begin{array}{l{@}{\quad}l} q&{\rm if}\,\exists ! q \in \mathbb {N} s.t. (\phi _{iq} = j) \\ 0&{\rm otherwise} \end{array}\right.. \end{eqnarray} (A16) Furthermore, substituting equation (A1) into $$\boldsymbol{J}^H\breve{\boldsymbol {r}}$$ results in   \begin{eqnarray} \boldsymbol{J}^H\breve{\boldsymbol {r}} ={\left[\begin{array}{l}\boldsymbol {a}\\ \boldsymbol {b}\\ \overline{\boldsymbol {a}}\\ \overline{\boldsymbol {b}} \end{array}\right]}, \end{eqnarray} (A17)where   \begin{eqnarray} \left[ \boldsymbol {a}\right]_i = \sum _{k\ne i} g_k \widetilde{y}_{ik}r_{ik}, \qquad \left[ \boldsymbol {b}\right]_i = \sum _{rs\in \mathcal {RS}_i}\overline{g}_r g_s r_{rs}, \end{eqnarray} (A18)and   \begin{eqnarray} \widetilde{y}_{ik} = \left\lbrace \begin{array}{@{}l@{\quad }l@{}}\overline{y}_{\zeta _{ik}} & {\rm if}\,\,k > i\\ y_{\zeta _{ik}} & {\rm otherwise} \end{array}\right.. \end{eqnarray} (A19)Additionally, $$\boldsymbol {a}$$ and $$\boldsymbol {b}$$ are both column vectors. The lengths of $$\boldsymbol {a}$$ and $$\boldsymbol {b}$$ are N and L, respectively. The dimensions of the matrices we defined in this section are presented in Table A1. Table A1. The dimensions of the matrices defined in Appendix A. Matrix  Dimension  $$\boldsymbol{J}$$  2B × P  $$\boldsymbol{M}$$  B × (N + L)  $$\boldsymbol{N}$$  B × (N + L)  $$\boldsymbol{O}$$  B × N  $$\boldsymbol{P}$$  B × L  $$\boldsymbol{Q}$$  B × N  $$\boldsymbol{0}$$  B × L  $$\boldsymbol{H}$$  P × P  $$\boldsymbol{A}$$  (N + L) × (N + L)  $$\boldsymbol{B}$$  (N + L) × (N + L)  $$\boldsymbol{C}$$  N × N  $$\boldsymbol{D}$$  N × L  $$\boldsymbol{E}$$  L × L  $$\boldsymbol{F}$$  N × N  $$\boldsymbol{G}$$  N × L  $$\boldsymbol{0}$$  L × L  Matrix  Dimension  $$\boldsymbol{J}$$  2B × P  $$\boldsymbol{M}$$  B × (N + L)  $$\boldsymbol{N}$$  B × (N + L)  $$\boldsymbol{O}$$  B × N  $$\boldsymbol{P}$$  B × L  $$\boldsymbol{Q}$$  B × N  $$\boldsymbol{0}$$  B × L  $$\boldsymbol{H}$$  P × P  $$\boldsymbol{A}$$  (N + L) × (N + L)  $$\boldsymbol{B}$$  (N + L) × (N + L)  $$\boldsymbol{C}$$  N × N  $$\boldsymbol{D}$$  N × L  $$\boldsymbol{E}$$  L × L  $$\boldsymbol{F}$$  N × N  $$\boldsymbol{G}$$  N × L  $$\boldsymbol{0}$$  L × L  View Large Furthermore,   \begin{eqnarray} \frac{1}{3}\boldsymbol{J}\breve{\boldsymbol {z}} = \breve{\boldsymbol {v}}, \qquad \boldsymbol{J}^H\breve{\boldsymbol {v}} = (\boldsymbol{I}{\odot }\boldsymbol{H})\breve{\boldsymbol {z}}. \end{eqnarray} (A20)The above identities can be trivially established by mechanically showing that the left-hand side of each expression in equation (A20) is equal to its right-hand side. APPENDIX B: ADI The basic skymodel-based stefcal update step is equal to the leftmost term in equation (38) (barring ρ; Salvini & Wijnholds 2014). Assume without any loss of generality that the array is in an east–west regular grid. Furthermore, assume that $$\boldsymbol{d}$$ (see equation 3) has been reordered and that the result of this reordering is the following:   \begin{eqnarray} \widetilde{\boldsymbol{d}} = \left[d_{12}, \ldots ,d_{N-1,N},d_{13}, \ldots ,d_{N-2,N}, \ldots ,d_{1N}\right]^T. \end{eqnarray} (B1)The vector $$\widetilde{\boldsymbol {n}}$$ should be interpreted in a similar manner. Equation (3) can now be rewritten as   \begin{eqnarray} \widetilde{\boldsymbol{d}} = \boldsymbol{J}\boldsymbol{y} + \widetilde{\boldsymbol {n}}, \end{eqnarray} (B2)if we assume that $$\boldsymbol{g}$$ and its conjugate are known vectors. In equation (B2),   \begin{eqnarray} \boldsymbol{J} = {\left[\begin{array}{l{@}{\qquad}l{@}{\qquad}l{@}{\qquad}l}g_1\overline{g}_2 & 0 & \cdots & 0\\ g_2\overline{g}_3 & 0 & \cdots & 0\\ \vdots & 0 & \cdots & 0\\ g_{N-1}\overline{g}_N & 0 & \cdots & 0\\ 0 & g_1\overline{g}_3 & \cdots & 0\\ 0 & \vdots & \cdots & 0\\ 0 & g_{N-2}\overline{g}_N & \cdots & 0\\ 0 & 0 & \cdots & 0\\ & \vdots & \\ 0 & 0 & \cdots & g_1\overline{g}_N\\ \end{array}\right]}. \end{eqnarray} (B3)We can now estimate $$\boldsymbol{y}$$ with   \begin{eqnarray} \boldsymbol{y} = (\boldsymbol{J}^H\boldsymbol{J})^{-1}\boldsymbol{J}^H\widetilde{\boldsymbol{d}}, \end{eqnarray} (B4)where   \begin{eqnarray} [\boldsymbol{J}^H\boldsymbol{J}]_{ij} = \left\lbrace \begin{array}{@{}l@{\quad }l@{}}\sum _{rs\in \mathcal {RS}_i} |g_r|^2|g_s|^2 &{\rm if}\,\,i=j\\ 0&{\rm otherwise} \end{array}\right., \end{eqnarray} (B5)and   \begin{eqnarray} [\boldsymbol{J}^H\widetilde{\boldsymbol{d}}]_i = \sum _{rs\in \mathcal {RS}_i} \overline{g}_r g_s d_{rs}. \end{eqnarray} (B6)Substituting equations (B5) and (B6) into equation (B4) and simplifying the result leads to the leftmost term in equation (39) (if we bar ρ and consider only the ith entry of $$\boldsymbol {y}$$). © 2018 The Author(s) Published by Oxford University Press on behalf of the Royal Astronomical Society http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Monthly Notices of the Royal Astronomical Society Oxford University Press

# Redundant interferometric calibration as a complex optimization problem

, Volume 476 (2) – May 1, 2018
11 pages

/lp/ou_press/redundant-interferometric-calibration-as-a-complex-optimization-glErTDdN5h
Publisher
The Royal Astronomical Society
ISSN
0035-8711
eISSN
1365-2966
D.O.I.
10.1093/mnras/sty357
Publisher site
See Article on Publisher Site

### Abstract

Abstract Observations of the redshifted 21 cm line from the epoch of reionization have recently motivated the construction of low-frequency radio arrays with highly redundant configurations. These configurations provide an alternative calibration strategy – ‘redundant calibration’ – and boost sensitivity on specific spatial scales. In this paper, we formulate calibration of redundant interferometric arrays as a complex optimization problem. We solve this optimization problem via the Levenberg–Marquardt algorithm. This calibration approach is more robust to initial conditions than current algorithms and, by leveraging an approximate matrix inversion, allows for further optimization and an efficient implementation (‘redundant stefcal’). We also investigated using the preconditioned conjugate gradient method as an alternative to the approximate matrix inverse, but found that its computational performance is not competitive with respect to ‘redundant stefcal’. The efficient implementation of this new algorithm is made publicly available. instrumentation: interferometers, methods: data analysis, methods: numerical, techniques: interferometric, cosmology: observations 1 INTRODUCTION The quest for the redshifted 21 cm line from the epoch of reionization (EoR) is a frontier of modern observational cosmology and has motivated the construction of a series of new interferometric arrays operating at low frequencies over the last decade. EoR measurements are challenging because of the intrinsic faintness of the cosmological signal (see for instance, Furlanetto 2016; McQuinn 2016, for recent reviews) buried underneath foreground emission, which is a few orders of magnitude brighter than the EoR anywhere in the sky (e.g. Bernardi et al. 2009, 2010; Ghosh et al. 2012; Dillon et al. 2014; Parsons et al. 2014). As miscalibrated foreground emission leads to artefacts that can jeopardize the EoR signal (e.g. Grobler et al. 2014; Barry et al. 2016; Ewall-Wice et al. 2017), an exceptional interferometric calibration is required. Such calibration needs accurate knowledge of both the sky emission and the instrumental response (e.g. Smirnov 2011b). A skymodel is required for standard calibration as it would be an ill-posed problem if one tries to directly solve for all of the unknowns, i.e. the antenna gains and the uncorrupted visibilities, without first attempting to simplify the calibration problem. When the uncorrupted visibilities, however, are predicted from a predefined skymodel, the problem simplifies and becomes solvable. In contrast, if an array is deployed in a redundant configuration, i.e. with receiving elements placed on regular grids so that multiple baselines measure the same sky brightness emission, we circumvent the need for a skymodel altogether. This is true, since redundancy leads to fewer unknowns (i.e. the number of unique uv-modes) and if the array is redundant enough, it reduces the number of unknowns to such an extent that the calibration problem becomes solvable without having to predict the uncorrupted visibilities from a skymodel. Redundancy, therefore, is a promising path to achieve highly accurate interferometric calibration (Noordam & De Bruyn 1982; Pearson & Readhead 1984; Wieringa 1992; Liu et al. 2010; Noorishad et al. 2012; Marthi & Chengalur 2014; Sievers 2017). Moreover, redundant configurations provide a sensitivity boost on particular spatial scales that can be tuned to be the highest signal-to-noise ratio (SNR) EoR modes (Parsons et al. 2012; Dillon & Parsons 2016). These reasons motivated the design and deployment of EoR arrays in redundant configurations like the MIT-EoR (Zheng et al. 2014), the Precision Array to Probe the Epoch of Reionization (Ali et al. 2015), and the Hydrogen Epoch of Reionization Array (HERA; DeBoer et al. 2017). In this paper, we present a new algorithm to calibrate redundant arrays based on the complex optimization formalism recently introduced by Smirnov & Tasse (2015). With respect to current algorithms, it is more robust to initial conditions, while remaining comparatively fast. We also show that given certain approximations this new algorithm reduces to the redundant calibration equivalent of the stefcal algorithm (Salvini & Wijnholds 2014). We investigate the speed-up that the preconditioned conjugate gradient (PCG) method provides if it is employed by the new algorithm (Liu et al. 2010). A comparison between the computational complexity of the optimized new algorithm and redundant stefcal is also performed. A summary of the notation that is used within the paper is presented in Table 1. We also discuss the overset notation used within the paper in a bit more detail below (we use x as an operand proxy in this paper): $$\widetilde{x}$$ – This notation is used to denote a new scalar value that was derived from the scalar x using a proper mathematical definition. $$\overline{x}$$ – This notation denotes the conjugation of its operand (the conjugation of x). For the readers’ convenience, we will redefine this operator when it is first used within the paper. $$\widehat{x}$$ – This notation denotes that the quantity is an estimated value. $$\breve{\boldsymbol {x}}$$ – This notation denotes an augmented vector, i.e. $$\breve{\boldsymbol {x}} = [\boldsymbol {x}^T, \overline{\boldsymbol {x}}^T]^T$$. Table 1. Notation and frequently used symbols. x, $$\boldsymbol {x}$$, and $$\boldsymbol{X}$$  Scalar, vector, and matrix  $$\mathbb {N}$$  Set of natural numbers  $$\mathbb {C}$$  Set of complex numbers  $$\boldsymbol{I}$$  Identity matrix  αpq, ϕpq, ζpq, ξpq, and  Indexing functions  $$\psi _{pq}:\mathbb {N}^2\rightarrow \mathbb {N}$$    $$\overline{x}$$  Conjugation  $$\overline{\boldsymbol {x}}, \overline{\boldsymbol{X}}$$  Element-wise conjugation  x2  Square  $$\Vert \boldsymbol {x}\Vert _{\rm F}$$  Frobenius norm  $$\boldsymbol{X}^{-1}$$  Matrix inversion  ⊙  Hadamard product    (element-wise product)  $$\boldsymbol{X}^H$$  Hermitian transpose  $$\boldsymbol {x}^T, \boldsymbol{X}^T$$  Transpose  $$\langle \boldsymbol {x}\rangle _x$$  Averaging over x  $$\frac{\mathrm{\partial} }{\mathrm{\partial} z}$$  Wirtinger derivative  $$\widetilde{x}$$  A new quantity derived from x  $$\widehat{x}$$  Estimated quantity  $$\breve{\boldsymbol {x}}$$  Augmented vector $$[\boldsymbol {x}^T, \overline{\boldsymbol {x}}^T]^T$$  i  $$\sqrt{-1}$$  N, B, L, and P  Number of antennas, baselines,    redundant groups, and parameters  $$\boldsymbol {d}$$, $$\boldsymbol {v}$$, $$\boldsymbol {g}$$, $$\boldsymbol {y}$$, $$\boldsymbol {r}$$, and $$\boldsymbol {n}$$  Data, predicted, antenna,    true visibility, residual,    and noise vector  $$\boldsymbol {z}$$  $$\boldsymbol {z}= [\boldsymbol {g}^T, \boldsymbol {y}^T]^T$$  Λ  Objective function  λ and ρ  Damping factor and $$\rho = \frac{1}{1+\lambda }$$  $$\boldsymbol{J}$$  Jacobian matrix  $$\boldsymbol{H}$$  $$\boldsymbol{J}^H\boldsymbol{J}$$ (Hessian matrix)  $$\boldsymbol {\mathcal {H}}$$  Modified Hessian matrix  $$[\boldsymbol{X}]_{ij}$$  Element ij of matrix $$\boldsymbol{X}$$  xij  Composite antenna index  xi  Antenna or redundant group index  xk  Iteration number  $$\boldsymbol{M}$$  Preconditioner matrix  m  Number of non-zero entries    in a matrix  κ  Spectral condition number    of a matrix  γ  Sparsity factor of a matrix  Δx  Parameter update  ∃!  There exists a unique  x, $$\boldsymbol {x}$$, and $$\boldsymbol{X}$$  Scalar, vector, and matrix  $$\mathbb {N}$$  Set of natural numbers  $$\mathbb {C}$$  Set of complex numbers  $$\boldsymbol{I}$$  Identity matrix  αpq, ϕpq, ζpq, ξpq, and  Indexing functions  $$\psi _{pq}:\mathbb {N}^2\rightarrow \mathbb {N}$$    $$\overline{x}$$  Conjugation  $$\overline{\boldsymbol {x}}, \overline{\boldsymbol{X}}$$  Element-wise conjugation  x2  Square  $$\Vert \boldsymbol {x}\Vert _{\rm F}$$  Frobenius norm  $$\boldsymbol{X}^{-1}$$  Matrix inversion  ⊙  Hadamard product    (element-wise product)  $$\boldsymbol{X}^H$$  Hermitian transpose  $$\boldsymbol {x}^T, \boldsymbol{X}^T$$  Transpose  $$\langle \boldsymbol {x}\rangle _x$$  Averaging over x  $$\frac{\mathrm{\partial} }{\mathrm{\partial} z}$$  Wirtinger derivative  $$\widetilde{x}$$  A new quantity derived from x  $$\widehat{x}$$  Estimated quantity  $$\breve{\boldsymbol {x}}$$  Augmented vector $$[\boldsymbol {x}^T, \overline{\boldsymbol {x}}^T]^T$$  i  $$\sqrt{-1}$$  N, B, L, and P  Number of antennas, baselines,    redundant groups, and parameters  $$\boldsymbol {d}$$, $$\boldsymbol {v}$$, $$\boldsymbol {g}$$, $$\boldsymbol {y}$$, $$\boldsymbol {r}$$, and $$\boldsymbol {n}$$  Data, predicted, antenna,    true visibility, residual,    and noise vector  $$\boldsymbol {z}$$  $$\boldsymbol {z}= [\boldsymbol {g}^T, \boldsymbol {y}^T]^T$$  Λ  Objective function  λ and ρ  Damping factor and $$\rho = \frac{1}{1+\lambda }$$  $$\boldsymbol{J}$$  Jacobian matrix  $$\boldsymbol{H}$$  $$\boldsymbol{J}^H\boldsymbol{J}$$ (Hessian matrix)  $$\boldsymbol {\mathcal {H}}$$  Modified Hessian matrix  $$[\boldsymbol{X}]_{ij}$$  Element ij of matrix $$\boldsymbol{X}$$  xij  Composite antenna index  xi  Antenna or redundant group index  xk  Iteration number  $$\boldsymbol{M}$$  Preconditioner matrix  m  Number of non-zero entries    in a matrix  κ  Spectral condition number    of a matrix  γ  Sparsity factor of a matrix  Δx  Parameter update  ∃!  There exists a unique  View Large The paper is organized as follows: we review the complex calibration formalism by Smirnov & Tasse (2015) in Section 2, we extend it to the redundant case in Section 3, we present some computational complexity results in Section 4, and we present our conclusions in Section 5. 2 WIRTINGER CALIBRATION In a radio interferometer, the true sky visibilities ypq measured by a baseline formed by antenna p and q are always ‘corrupted’ by the non-ideal response of the receiver, which is often incorporated into a single, receiver-based complex number g (i.e. an antenna gain). The observed visibility dpq is therefore given by (Hamaker, Bregman & Sault 1996; Sault, Hamaker & Bregman 1996; Smirnov 2011a)   \begin{eqnarray} d_{pq} = g_{p}\overline{g_q} \, y_{pq} + n_{pq}, \end{eqnarray} (1)where $$\overline{x}$$ indicates complex conjugation and npq is the thermal noise component. The real and imaginary components of the thermal noise are normally distributed with a mean of zero and a standard deviation σ:   \begin{eqnarray} \sigma \propto \frac{T_{{\rm sys}}}{\sqrt{\Delta \nu \tau }}, \end{eqnarray} (2)where Tsys is equal to the system temperature, Δν is the observational bandwidth, and τ is the integration time per visibility. Considering the number of visibilities B (i.e. baselines) measured by an array of N elements $$B = \frac{N^2-N}{2}$$, equation (1) can be expressed in the following vector form:   \begin{eqnarray} \boldsymbol {d}= \boldsymbol {v}+ \boldsymbol {n}, \end{eqnarray} (3)where   \begin{eqnarray} \left[ \boldsymbol {d}\right]_{\alpha _{pq}} &=& d_{pq},\quad \left[ \boldsymbol {v}\right]_{\alpha _{pq}} = v_{pq}=g_p y_{pq} \overline{g_q}, \nonumber \\ \left[ \boldsymbol {n}\right]_{\alpha _{pq}} &=& n_{pq}, \end{eqnarray} (4)and   \begin{eqnarray} \alpha _{pq} = \left\lbrace \begin{array}{@{}l@{\quad }l@{}}(q-p) + (p-1)\left(N-\frac{1}{2}p \right) & {\rm if}\,p<q\\ 0 & {\rm otherwise} \end{array}\right.. \end{eqnarray} (5)The function αpq therefore maps composite antenna indices to unique single indices, i.e.   \begin{eqnarray} \lbrace \alpha _{12}, \alpha _{13}, \ldots , \alpha _{N-1N}\rbrace = \lbrace 1,2, \ldots ,B\rbrace . \end{eqnarray} (6)The vectors in equation (4) are column vectors of size B (i.e. p < q). It is important to point out here that all of the mathematical definitions in this paper assume that we are only considering composite indices belonging to the set {rs|r < s}. For the sake of mathematical completeness, however, αpq is defined for composite indices that do not belong to this set. Also note that, while the indexing function αpq does not attain the value zero, in practice the zero indexing value does play a very important role in some of the definitions used within this paper (see for example equation A10). Radio interferometric calibration aims to determine the best estimate of $$\boldsymbol {g}= [g_1,g_2, \ldots ,g_N]^T$$ in order to correct the data and, following equation (3), can be formulated as a non-linear least-squares optimization problem:   \begin{eqnarray} \min _{\boldsymbol {g}} \Lambda (\boldsymbol {g}) = \min _{\boldsymbol {g}} \Vert \boldsymbol {r}\Vert _{\rm F}^2 = \min _{\boldsymbol {g}} \Vert \boldsymbol {d}- \boldsymbol {v}(\boldsymbol {g})\Vert _{\rm F}^2, \end{eqnarray} (7)where Λ is the objective function, $$\boldsymbol {r}$$ is the residual vector, and $$\Vert \boldsymbol {x}\Vert _{\rm F}$$ denotes the Frobenius norm. In standard interferometric calibration, ypq is assumed to be known at some level, for instance through the observation of previously known calibration sources. Non-linear least-squares problems are generally solved by using gradient-based minimization algorithms [i.e. Gauss–Newton (GN) or Levenberg–Marquardt (LM)] that require the model [$$\boldsymbol {v}$$ in equation (7)] to be differentiable towards each parameter. When the least-squares problem is complex, it becomes less straightforward to apply these gradient-based minimization methods, as many complex functions are not differentiable if the classic notion of differentiation is used, i.e. $$\frac{\mathrm{\partial} \overline{z}}{\mathrm{\partial} z}$$ does not exist if $$z \in \mathbb {C}$$. In order to circumvent the differentiability conundrum associated with complex least-squares problems, standard interferometric calibration divides the complex optimization problem into its real and imaginary parts and solves for the real and imaginary parts of the unknown model parameters separately. Smirnov & Tasse (2015) showed, however, that this approach is not needed if complex calculus (Wirtinger 1927) is adopted. The Wirtinger derivatives are defined as   \begin{eqnarray} \frac{\mathrm{\partial} }{\mathrm{\partial} z} = \frac{1}{2}\left(\frac{\mathrm{\partial} }{\mathrm{\partial} x} - {\rm i} \frac{\mathrm{\partial} }{\mathrm{\partial} y} \right),\quad\frac{\mathrm{\partial} }{\mathrm{\partial} \overline{z}} = \frac{1}{2}\left(\frac{\mathrm{\partial} }{\mathrm{\partial} x} + {\rm i} \frac{\mathrm{\partial} }{\mathrm{\partial} y} \right), \end{eqnarray} (8)which lead to the following relations:   \begin{eqnarray} \frac{\mathrm{\partial} z}{\mathrm{\partial} z} = 1,\quad \frac{\mathrm{\partial} \overline{z}}{\mathrm{\partial} z} =0,\quad \frac{\mathrm{\partial} z}{\mathrm{\partial} \overline{z}} = 0,\quad \frac{\mathrm{\partial} \overline{z}}{\mathrm{\partial} \overline{z}} =1. \end{eqnarray} (9)If the gradient operator is defined using equation (8), the model $$\boldsymbol {v}$$ now becomes analytic in both $$\boldsymbol {g}$$ and $$\overline{\boldsymbol {g}}$$, and equation (9) can be used to derive the complex variants of the real-valued GN and LM algorithms. In the complex GN and LM algorithms, complex parameters and their conjugates are treated as separate variables. Assuming that ypq is known, equation (7) is recast as (Smirnov & Tasse 2015)   \begin{eqnarray} \min _{\breve{\boldsymbol {g}}} \Lambda (\breve{\boldsymbol {g}}) = \min _{\breve{\boldsymbol {g}}} \Vert \breve{\boldsymbol {r}}\Vert _{\rm F}^2 = \min _{\breve{\boldsymbol {g}}} \Vert \breve{\boldsymbol {d}} - \breve{\boldsymbol {v}}(\breve{\boldsymbol {g}})\Vert _{\rm F}^2, \end{eqnarray} (10)where $$\breve{\boldsymbol {r}} = [\boldsymbol {r}^T, \overline{\boldsymbol {r}}^T]^T$$, $$\breve{\boldsymbol {d}} = [\boldsymbol {d}^T, \overline{\boldsymbol {d}}^T]^T$$, $$\breve{\boldsymbol {v}} = [\boldsymbol {v}^T, \overline{\boldsymbol {v}}^T]^T$$, and $$\breve{\boldsymbol {g}} = [\boldsymbol {g}^T, \overline{\boldsymbol {g}}^T]^T$$. The complex GN update is therefore defined as   \begin{eqnarray} \Delta \breve{\boldsymbol {g}} = (\boldsymbol{J}^H\boldsymbol{J})^{-1}\boldsymbol{J}^H\breve{\boldsymbol {r}}, \end{eqnarray} (11)with   \begin{eqnarray} {\boldsymbol{J}}={\left[\begin{array}{c{@}{\quad}c}\boldsymbol {\mathcal {J}}_1 & \boldsymbol {\mathcal {J}}_2\\ \overline{\boldsymbol {\mathcal {J}}}_2 & \overline{\boldsymbol {\mathcal {J}}}_1 \end{array}\right]}, \end{eqnarray} (12)and   \begin{eqnarray} [\boldsymbol {\mathcal {J}}_1]_{\alpha _{pq},i} = \frac{\mathrm{\partial} v_{pq}}{\mathrm{\partial} g_i},\quad [\boldsymbol {\mathcal {J}}_2]_{\alpha _{pq},i} = \frac{\mathrm{\partial} v_{pq}}{\mathrm{\partial} \overline{g}_i}. \end{eqnarray} (13)The matrix $$\boldsymbol{J}$$ is generally referred to as the Jacobian1 matrix. The complex LM update is very similar, the major difference being the introduction of a damping parameter, λ:   \begin{eqnarray} \Delta \breve{\boldsymbol {g}} = (\boldsymbol{J}^H\boldsymbol{J}+ \lambda \boldsymbol{D})^{-1}\boldsymbol{J}^H\breve{\boldsymbol {r}}, \end{eqnarray} (14)where $$\boldsymbol{D}=\boldsymbol{I}{\odot }\boldsymbol{J}^H\boldsymbol{J}$$. In this paper, we use single subscript indices (e.g. i) to refer to a specific antenna (or as will become apparent later a specific redundant group) and composite subscript indices (e.g. pq) to refer to a specific baseline. If these indices appear in the definition of matrix elements, then their allowed ranges are determined by the dimension of the matrix that is being defined (see Table 2). Furthermore, the identity matrix is denoted by $$\boldsymbol{I}$$. Moreover, the Hadamard2 product and the Hermitian transpose are denoted by ⊙ and $$\boldsymbol{X}^H$$, respectively (Liu & Trenkler 2008). Note the use of the Wirtinger derivatives in equation (13). We will refer to $$\boldsymbol{J}^H\boldsymbol{J}$$ as the Hessian3 matrix $$\boldsymbol{H}$$ and to $$\boldsymbol{J}^H\boldsymbol{J}+ \lambda \boldsymbol{D}$$ as the modified Hessian matrix $$\boldsymbol {\mathcal {H}}$$ throughout this paper (Madsen, Nielsen & Tingleff 2004). Table 2. The dimensions of the Jacobian matrices, and their respective sub-matrices, defined in Sections 2 and 3. Matrix  Dimension  Wirtinger calibration  $$\boldsymbol{J}$$  2B × 2N  $$\boldsymbol {\mathcal {J}}_1$$  B × N  $$\boldsymbol {\mathcal {J}}_2$$  B × N  Redundant Wirtinger calibration  $$\boldsymbol{J}$$  2B × P  $$\boldsymbol {\mathcal {J}}_1$$  B × (N + L)  $$\boldsymbol {\mathcal {J}}_2$$  B × (N + L)  Matrix  Dimension  Wirtinger calibration  $$\boldsymbol{J}$$  2B × 2N  $$\boldsymbol {\mathcal {J}}_1$$  B × N  $$\boldsymbol {\mathcal {J}}_2$$  B × N  Redundant Wirtinger calibration  $$\boldsymbol{J}$$  2B × P  $$\boldsymbol {\mathcal {J}}_1$$  B × (N + L)  $$\boldsymbol {\mathcal {J}}_2$$  B × (N + L)  View Large Equation (11) or (14) can now be used iteratively to update the parameter vector $$\breve{\boldsymbol {g}}$$:   \begin{eqnarray} \breve{\boldsymbol {g}}_{k+1} = \breve{\boldsymbol {g}}_{k} + \Delta \breve{\boldsymbol {g}}_{k}, \end{eqnarray} (15)until convergence is reached. In the case of the GN algorithm, the parameter update step simplifies and becomes (Smirnov & Tasse 2015)   \begin{eqnarray} \breve{\boldsymbol {g}}_{k+1} = (\boldsymbol{J}^H\boldsymbol{J})^{-1}\boldsymbol{J}^H\breve{\boldsymbol {d}} + \frac{1}{2}\breve{\boldsymbol {g}}_{k}. \end{eqnarray} (16) Smirnov & Tasse (2015) realized that the diagonal entries of the Hessian matrix $$\boldsymbol{H}$$ are much more significant than its off-diagonal entries, i.e. $$\boldsymbol{H}$$ is nearly diagonal. By approximating $$\boldsymbol{H}$$ by its diagonal and substituting the approximate Hessian matrix, the LM parameter update step becomes (Smirnov & Tasse 2015)   \begin{eqnarray} \breve{\boldsymbol {g}}_{k+1} &\approx& \frac{1}{1+\lambda }\widetilde{\boldsymbol{H}}^{-1}\boldsymbol{J}^H\breve{\boldsymbol {d}} + \frac{\lambda }{1+\lambda } \breve{\boldsymbol {g}}_k, \nonumber \\ &=& \rho \widetilde{\boldsymbol{H}}^{-1}\boldsymbol{J}^H\breve{\boldsymbol {d}} + (1-\rho )\breve{\boldsymbol {g}}_k, \end{eqnarray} (17)where $$\rho = \frac{1}{1+\lambda }$$. Note that equations (16) and (17) are not dependent on $$\breve{\boldsymbol {r}}$$. Interestingly enough, if λ = 0 we obtain the odd parameter update step of stefcal,4 and if λ = 1 (which corresponds to $$\rho =\frac{1}{2}$$), we obtain the even parameter update step of stefcal5 (Mitchell et al. 2008; Salvini & Wijnholds 2014). In the stefcal algorithm, the measurement equation (equation 1) is linearized by assuming that the gains are known, but that their conjugates are not. Under this assumption, the system of equations become linear and the conjugates of the gains can be obtained in a straightforward manner. Starting from the latest value of the gain conjugates, an updated estimate of the gains can be obtained iteratively until convergence is reached. Alternating between solving and fixing different sets of parameters (which is exactly what stefcal does) is referred to as the alternating direction implicit (ADI) method. The stefcal algorithm reduces the computational complexity of calibration from O(N3) to O(N2). 3 REDUNDANT WIRTINGER CALIBRATION Interferometric baselines are redundant when they sample the exact same visibilities in the uv-plane, i.e. if baseline pq and rs are redundant, then ypq = yrs. A redundant array layout allows us to solve for the unknown observed visibilities themselves in addition to the antenna gains (see equation 4). This is true, since in the case of a redundant layout, equation (4) is already an overdetermined system even before having predicted visibilities from a pre-existing skymodel. It is convenient to group redundant visibilities together and label each group using a single index rather than using their antenna pairs as in equation (1). We introduce a function ϕ that maps the antenna pair associated with a specific baseline to its corresponding redundant baseline group, i.e. if baseline pq and rs are redundant, then ϕpq = ϕrs (implying that they belong to the same group). To be exact, ϕ maps the composite index pq to its group index only if pq ∈ {rs|r ≤ s}. If pq ∉ {rs|r ≤ s}, then the composite index pq is mapped to zero. The function ϕpq is, therefore, not symmetric. Equation (1) can be rewritten for a redundant array as   \begin{eqnarray} d_{pq} = g_{p}\overline{g_q}y_{\phi _{pq}} + n_{pq}, \end{eqnarray} (18)with the same vector form as equation (3) if   \begin{eqnarray} \left[ \boldsymbol {d}\right]_{\alpha _{pq}} &=& d_{pq},\quad \left[ \boldsymbol {v}\right]_{\alpha _{pq}} = v_{pq}=g_p y_{\phi _{pq}} \overline{g_q}, \nonumber \\ \left[ \boldsymbol {n}\right]_{\alpha _{pq}} &=& n_{pq}, \end{eqnarray} (19)where the vectors in equation (19) are column vectors of size B (i.e. p < q). We also introduce the following symmetric variant of ϕpq:   \begin{eqnarray} \zeta _{pq} = \left\lbrace \begin{array}{l@{\quad}l}\phi _{pq}\,&{\rm if}\,p \le q\\ \phi _{qp}&{\rm if}\,\,p>q \end{array}\right., \end{eqnarray} (20)and we will refer to ζpq as the symmetric geometric function. It is possible to construct a simple analytic expression for ζpq for an east–west regular array, i.e. ζpq = |q − p|. It becomes, however, increasingly difficult to construct analytic expressions of ζpq for more complicated array layouts. The empirically constructed symmetric geometry functions of three different redundant layouts are displayed in Fig. 1. We denote the range of ζpq with $$\mathcal {R}(\zeta _{pq})$$. The maximal element that ζpq can ascertain is denoted by L and can be interpreted as the greatest number of unique redundant baseline groups that can be formed for a given array layout. Figure 1. View largeDownload slide Three different redundant antenna layouts: hexagonal (top left), square (top middle), and regular east–west (top right) with their associated symmetric redundancy geometry functions ζpq (bottom panels). We used 91 antennas to construct the hexagonal layout, while 100 antennas were used in the square and east–west layouts. In the case of the east–west layout, we only plot the positions of the first 10 antennas. The maximal numbers of redundant baseline groups L that can be formed for the hexagonal, square, and east–west layouts are 165, 180, and 99, respectively. The analytic expressions of L for a hexagonal, square, and east–west layout are $$L = 2N-\frac{1}{2}\sqrt{12N-3}-\frac{1}{2}$$, $$L=2N-2\sqrt{N}$$, and L = N − 1, respectively. Figure 1. View largeDownload slide Three different redundant antenna layouts: hexagonal (top left), square (top middle), and regular east–west (top right) with their associated symmetric redundancy geometry functions ζpq (bottom panels). We used 91 antennas to construct the hexagonal layout, while 100 antennas were used in the square and east–west layouts. In the case of the east–west layout, we only plot the positions of the first 10 antennas. The maximal numbers of redundant baseline groups L that can be formed for the hexagonal, square, and east–west layouts are 165, 180, and 99, respectively. The analytic expressions of L for a hexagonal, square, and east–west layout are $$L = 2N-\frac{1}{2}\sqrt{12N-3}-\frac{1}{2}$$, $$L=2N-2\sqrt{N}$$, and L = N − 1, respectively. We can now formulate redundant calibration as a least-squares problem:   \begin{eqnarray} \min _{\boldsymbol {z}} \Lambda (\boldsymbol {z}) = \min _{\boldsymbol {z}} \Vert \boldsymbol {r}\Vert _{\rm F}^2 = \min _{\boldsymbol {z}} \Vert \boldsymbol {d}- \boldsymbol {v}(\boldsymbol {z})\Vert _{\rm F}^2, \end{eqnarray} (21)where   \begin{eqnarray} \boldsymbol {g}&=&[g_1, \ldots ,g_N]^T,\quad \boldsymbol {y} = [y_1, \ldots ,y_L]^T, \nonumber \\ \boldsymbol {z}&=& [\boldsymbol {g}^T, \boldsymbol {y}^T]^T. \end{eqnarray} (22)The number of model parameters to be solved for is now P = 2(N + L), since redundant calibration is a complex problem. Note that equation (21) is only solvable (i.e. the array is redundant enough) if L + N ≤ B. In the literature, equation (21) is solved by splitting the problem into its real and imaginary parts. The real and imaginary parts of the unknown parameters are then solved for separately (Wieringa 1992; Liu et al. 2010; Zheng et al. 2014). Currently, the above is achieved by using the real-valued GN algorithm (Kurien et al. 2016). We, instead, intend to formulate the redundant calibration problem using Wirtinger calculus and recast equation (21) as   \begin{eqnarray} \min _{\breve{\boldsymbol {z}}} \Vert \breve{\boldsymbol {r}}\Vert = \min _{\breve{\boldsymbol {z}}} \Vert \breve{\boldsymbol {d}} - \breve{\boldsymbol {v}}(\breve{\boldsymbol {z}})\Vert , \end{eqnarray} (23)where $$\breve{\boldsymbol {z}} = [\boldsymbol {z}^T, \overline{\boldsymbol {z}}^T]^T$$. We derive the complex Jacobian associated with equation (23) to be   \begin{eqnarray} \boldsymbol{J}={\left[\begin{array}{c{@}{\quad}c}\boldsymbol {\mathcal {J}}_1 & \boldsymbol {\mathcal {J}}_2\\ \overline{\boldsymbol {\mathcal {J}}}_2 & \overline{\boldsymbol {\mathcal {J}}}_1 \end{array}\right]}, \end{eqnarray} (24)where   \begin{eqnarray} [\boldsymbol {\mathcal {J}}_1]_{\alpha _{pq},i} =\left\lbrace \begin{array}{l{@}{\quad}l}\displaystyle\frac{\mathrm{\partial} v_{pq}}{\mathrm{\partial} g_i} & {\rm if}\,i\le N \\ \displaystyle\frac{\mathrm{\partial} v_{pq}}{\mathrm{\partial} y_{i-N}} & {\rm otherwise} \end{array}\right., \end{eqnarray} (25)and   \begin{eqnarray} [\boldsymbol {\mathcal {J}}_2]_{\alpha _{pq},i} =\left\lbrace \begin{array}{l{@}{\quad}l}\displaystyle\frac{\mathrm{\partial} v_{pq}}{\mathrm{\partial} \overline{g}_i} & {\rm if}\,i\le N \\ \displaystyle\frac{\mathrm{\partial} v_{pq}}{\mathrm{\partial} \overline{y}_{i-N}} & {\rm otherwise} \end{array}\right.. \end{eqnarray} (26) Algorithm 1 View largeDownload slide Constructing $$\boldsymbol {\mathcal {J}}_1$$ Algorithm 1 View largeDownload slide Constructing $$\boldsymbol {\mathcal {J}}_1$$ Note that we employ the same subscript indexing notation that we used in Section 2 in equations (25) and (26). To further aid the reader in understanding this indexing notation, we refer him/her to Fig. 2 (also see Algorithm 1), which depicts a flow diagram of the matrix construction procedure with which $$\boldsymbol {\mathcal {J}}_1$$ can be constructed. The range of values the indices in equations (25) and (26) can attain should also be clear to the reader after having inspected Fig. 2 and Table 2. Figure 2. View largeDownload slide A flow chart representing the procedure one would follow to partially construct the Jacobian matrix (i.e. $$\boldsymbol {\mathcal {J}}_1$$) associated with Wirtinger redundant calibration. The red circle represents the start of the flow diagram. The blue circle represents the end of the diagram. Blue diamonds denote loop conditionals, while green diamonds denote simple conditionals. The diagram elements following the red arrows just below a loop conditional element all form part of the main body of the loop that has the conditional statement of the aforementioned loop conditional element in its definition. The pseudo-code associated with this flow diagram is given in Algorithm 1. Figure 2. View largeDownload slide A flow chart representing the procedure one would follow to partially construct the Jacobian matrix (i.e. $$\boldsymbol {\mathcal {J}}_1$$) associated with Wirtinger redundant calibration. The red circle represents the start of the flow diagram. The blue circle represents the end of the diagram. Blue diamonds denote loop conditionals, while green diamonds denote simple conditionals. The diagram elements following the red arrows just below a loop conditional element all form part of the main body of the loop that has the conditional statement of the aforementioned loop conditional element in its definition. The pseudo-code associated with this flow diagram is given in Algorithm 1. We can now calculate the GN and LM updates to be   \begin{eqnarray} \Delta \breve{\boldsymbol {z}} = (\boldsymbol{J}^H\boldsymbol{J})^{-1}\boldsymbol{J}^H\breve{\boldsymbol {r}} \end{eqnarray} (27)and   \begin{eqnarray} \Delta \breve{\boldsymbol {z}} = (\boldsymbol{J}^H\boldsymbol{J}+ \lambda \boldsymbol{D})^{-1}\boldsymbol{J}^H\breve{\boldsymbol {r}}, \end{eqnarray} (28)respectively. As in Section 2, equation (27) can be used to iteratively update our parameter vector:   \begin{eqnarray} \breve{\boldsymbol {z}}_{k+1} = \breve{\boldsymbol {z}}_{k} + \Delta \breve{\boldsymbol {z}}_{k}. \end{eqnarray} (29)The analytic expressions for $$\boldsymbol{J}$$, $$\boldsymbol{H}$$, and $$\boldsymbol{J}^H\breve{\boldsymbol {r}}$$ can be found in Appendix A. Appendix A also contains two useful identities involving $$\boldsymbol{J}$$ and $$\boldsymbol{H}$$. In the case of the GN algorithm, we can simplify equation (29) even further. Replacing $$\breve{\boldsymbol {r}}$$ with $$\breve{\boldsymbol {d}}-\breve{\boldsymbol {v}}$$ in equation (27) results in   \begin{eqnarray} \Delta \breve{\boldsymbol {z}} = (\boldsymbol{J}^H\boldsymbol{J})^{-1}\boldsymbol{J}^H(\breve{\boldsymbol {d}}-\breve{\boldsymbol {v}}). \end{eqnarray} (30)If we substitute the first identity of equation (A20) into equation (30) and we simplify the result, we obtain   \begin{eqnarray} \Delta \breve{\boldsymbol {z}} = (\boldsymbol{J}^H\boldsymbol{J})^{-1}\boldsymbol{J}^H\breve{\boldsymbol {d}}-\frac{1}{3}\breve{\boldsymbol {z}}. \end{eqnarray} (31)If we use the above-simplified update in equation (29), it reduces to   \begin{eqnarray} \breve{\boldsymbol {z}}_{k+1} = (\boldsymbol{J}^H\boldsymbol{J})^{-1}\boldsymbol{J}^H\breve{\boldsymbol {d}} + \frac{2}{3}\breve{\boldsymbol {z}}_{k}. \end{eqnarray} (32)Equation (32) is the redundant equivalent of equation (16), and it shows us that in the case of redundant calibration, we can calculate the GN parameter update step without calculating the residual. Fig. 3 shows that the Hessian matrix $$\boldsymbol{H}$$ is nearly diagonal and sparse for both the regular east–west and hexagonal layouts we considered. We therefore follow the approach of Smirnov & Tasse (2015) and approximate the Hessian matrix $$\boldsymbol{H}$$ with its diagonal. If we substitute $$\boldsymbol{J}^H\boldsymbol{J}$$ with $$\widetilde{\boldsymbol{H}}=\boldsymbol{H}{\odot }\boldsymbol{I}$$ and replace $$\breve{\boldsymbol {r}}$$ with $$\breve{\boldsymbol {d}} - \breve{\boldsymbol {v}}$$ in equation (27), we obtain   \begin{eqnarray} \Delta \breve{\boldsymbol {z}} \approx \widetilde{\boldsymbol{H}}^{-1}\boldsymbol{J}^H(\breve{\boldsymbol {d}}-\breve{\boldsymbol {v}}). \end{eqnarray} (33)Utilizing the second identity in equation (A20) allows us to simplify equation (33) to   \begin{eqnarray} \Delta \breve{\boldsymbol {z}} \approx \widetilde{\boldsymbol{H}}^{-1}\boldsymbol{J}^H\breve{\boldsymbol {d}}-\breve{\boldsymbol {z}}, \end{eqnarray} (34)which leads to   \begin{eqnarray} \breve{\boldsymbol {z}}_{k+1} \approx \widetilde{\boldsymbol{H}}^{-1}\boldsymbol{J}^H\breve{\boldsymbol {d}}. \end{eqnarray} (35)Using equation (28), we follow the same procedure and obtain a similar result for LM   \begin{eqnarray} \breve{\boldsymbol {z}}_{k+1} &\approx& \frac{1}{1+\lambda }\widetilde{\boldsymbol{H}}^{-1}\boldsymbol{J}^H\breve{\boldsymbol {d}} + \frac{\lambda }{1+\lambda } \breve{\boldsymbol {z}}_k, \end{eqnarray} (36)  \begin{eqnarray} \hphantom{\breve{\boldsymbol {z}}_{k+1}}= \rho \widetilde{\boldsymbol{H}}^{-1}\boldsymbol{J}^H\breve{\boldsymbol {d}} + (1-\rho )\breve{\boldsymbol {z}}_k. \end{eqnarray} (37)The analytic expression of $$\boldsymbol{J}^H\breve{\boldsymbol {d}}$$ will be very similar to the analytic expression of $$\boldsymbol{J}^H\breve{\boldsymbol {r}}$$, the only difference being that in equation (A18) the letter r would be replaced by a d. If we substitute the analytic expression of $$\boldsymbol{J}^H\breve{\boldsymbol {d}}$$ and $$\widetilde{\boldsymbol{H}}^{-1}$$ (which can easily be constructed using Appendix A) into equation (37), we obtain the following two update rules:   \begin{eqnarray} g_{i}^{k+1} = \rho \frac{\sum _{j\ne i} g_j^k \widetilde{y}_{ij}^{ \!\!k} d_{ij}}{\sum _{j\ne i} |g_j^k|^2|y_{\zeta _{ij}}^k|^2} + (1-\rho ) g_i^k \end{eqnarray} (38)and   \begin{eqnarray} y_{i}^{k+1} = \rho \frac{\sum _{rs \in \mathcal {RS}_i} \overline{g}_r^k g_s^k d_{rs}}{\sum _{rs \in \mathcal {RS}_i}|g_r^k|^2|g_s^k|^2} + (1-\rho ) y_i^k. \end{eqnarray} (39)The index set $$\mathcal {RS}_i$$ and the quantity $$\widetilde{y}_{ij}$$ are defined in equations (A14) and (A19), respectively. The computational complexity of inverting $$\widetilde{\boldsymbol{H}}$$ is O(P). We note that equation (38) is the gain estimator associated with stefcal. Figure 3. View largeDownload slide The number of analytic terms out of which the entries of the Hessian $$\boldsymbol{H}$$ consist for two different geometric layouts, namely a regular east–west grid with N = 5 (left-hand panel) and a hexagonal grid with N = 7. The diagonal entries of these two Hessians are clearly more significant than their off-diagonal entries. Moreover, these two Hessians also contain many zero entries. Note that the locations of the zero entries are dependent on the geometry of the array layout. Figure 3. View largeDownload slide The number of analytic terms out of which the entries of the Hessian $$\boldsymbol{H}$$ consist for two different geometric layouts, namely a regular east–west grid with N = 5 (left-hand panel) and a hexagonal grid with N = 7. The diagonal entries of these two Hessians are clearly more significant than their off-diagonal entries. Moreover, these two Hessians also contain many zero entries. Note that the locations of the zero entries are dependent on the geometry of the array layout. Equations (38) and (39) were obtained by Marthi & Chengalur (2014) by taking the derivative of the objective function Λ relative to the elements of $$\boldsymbol {g}$$ and $$\boldsymbol {y}$$, setting the intermediate results to zero and then solving for the unknown parameters (i.e. using the gradient descent algorithm). We note that their derivation is less general. The LM algorithm has better convergence properties than gradient descent and encompasses the gradient descent algorithm as a special case. In Appendix B, we show that equations (38) and (39) can also be derived using the ADI method. For this reason, we refer to the implementation of the pseudo-LM calibration scheme derived above, i.e. equations (38) and (39), as redundant stefcal throughout the rest of the paper. Interestingly, Marthi & Chengalur (2014) were not the first to make use of the ADI method to perform redundant calibration; a slower alternative ADI-based calibration algorithm is presented in Wijnholds & Noorishad (2012). The choice of the ρ parameter is somewhat unconstrained. In this paper, we chose ρ by adopting the same strategy that is used by stefcal and Marthi & Chengalur (2014), i.e. we chose ρ to be equal to $$\frac{1}{3}$$ (λ = 2). We also carried out simulations to validate this choice. We generated a skymodel that comprised of 100 flat spectrum sources distributed over a 3° by 3° sky patch. The flux density of each source was drawn from a power-law distribution with a slope of 2 and the source position was drawn from a uniform distribution. We also made use of multiple fictitious telescope layouts each one having a hexagonal geometry (see the upper-left image of Fig. 1 for an example layout). The largest hexagonal array that we used has 217 antennas, with a minimum and maximum baseline of 20 and 320 m, respectively. We corrupted visibilities by applying gain errors (see Tables 3 and 4) and calibrated the corrupted visibilities using redundant stefcal. Solutions were independently derived for each time-step and channel for five realizations. Table 3. The gain error models used in this paper. We have used the symbol x here as a proxy as it can either refer to time slots or channels. We either performed our simulations over multiple time slots and one frequency channel or one time slot and multiple frequency channels (see Table 4). Moreover, c in the leftmost column denotes the speed of light. We adopted a sinusoidal error model, similar to the sinusoidal error models used within meqtrees (Noordam & Smirnov 2010), as well as a phase slope across frequency that mimics a real case of physical delays between different antennas. In the first model, we model a gain error with amplitude around one and an additional phase error. In the second model, we chose A in such a way that we do not unintentionally produce nonsensical gain values, i.e. a zero gain value. For the third model, the value of τ was chosen so that the number of phase wraps that occur across the observing band is restricted to a realistic number. Number tag  1  2  3  Model  Sinusoidal: amplitude and phase  Sinusoidal: real and imaginary parts  Linear phase slope  Function  (A + 1)ejP  $$A\cos (2\pi fx+B)+1.5A+jC\sin (2\pi fx+D)$$  ejP  Parameters  $$A=a\cos (2\pi fx +b)$$  f = 5  P = τx    $$P =c \cos (2\pi fx +d)$$  A, C ∼ U[0.5, 10]  $$\tau = \frac{l}{c}$$    f = 5  $$B,D\sim U[0,2\pi ]$$  l ∼ U[5, 50] (m)    a ∼ U[0.8, 0.9]        c ∼ U[0.5, 5]        $$b,d\sim U[0,2\pi ]$$      Number tag  1  2  3  Model  Sinusoidal: amplitude and phase  Sinusoidal: real and imaginary parts  Linear phase slope  Function  (A + 1)ejP  $$A\cos (2\pi fx+B)+1.5A+jC\sin (2\pi fx+D)$$  ejP  Parameters  $$A=a\cos (2\pi fx +b)$$  f = 5  P = τx    $$P =c \cos (2\pi fx +d)$$  A, C ∼ U[0.5, 10]  $$\tau = \frac{l}{c}$$    f = 5  $$B,D\sim U[0,2\pi ]$$  l ∼ U[5, 50] (m)    a ∼ U[0.8, 0.9]        c ∼ U[0.5, 5]        $$b,d\sim U[0,2\pi ]$$      View Large Table 4. We generated results using two main setups. We either used one frequency channel and multiple time slots or one time slot and multiple frequency channels. The most important parameters used in realizing these two major setups are presented here. We chose the observational frequency band of setup 1 to coincide with the HERA array. To broaden the scope of our analysis, we chose the observational frequency of our second setup to be equal to 1.4 GHz, which is a typical observing frequency of the Westerbork Synthesis Radio Telescope.   Setup 1  Setup 2  Num. channels  1024  1  ν-range  100–200 MHz  1.4 GHz  Num. time slots  1  50    Setup 1  Setup 2  Num. channels  1024  1  ν-range  100–200 MHz  1.4 GHz  Num. time slots  1  50  View Large Note that our simulations are almost ideal as we did not include a primary beam response, nor did we incorporate time and frequency smearing into our simulation. We also did not explicitly define our noise in terms of the integration time, channel bandwidth, and Tsys; we instead follow the same approach described in Liu et al. (2010) and Marthi & Chengalur (2014) and make use of the definition of SNR to introduce noise into our visibilities. We use the following definition of SNR (Liu et al. 2010; Marthi & Chengalur 2014):   \begin{eqnarray} {\rm SNR} = 10\log \left(\frac{\langle \boldsymbol {v}{\odot }\overline{\boldsymbol {v}}\rangle _{\nu ,t,pq}}{\langle \boldsymbol {n}{\odot }\overline{\boldsymbol {n}}\rangle _{\nu ,t,pq}} \right), \end{eqnarray} (40)where $$\langle \boldsymbol {x}\rangle _{\nu ,t,pq}$$ denotes averaging over frequency, time, and baseline. It should be pointed out, however, that by producing visibilities with different SNR values, we are effectively either changing our integration time or our channel bandwidth or both (assuming a single Tsys value for our instrument). Fig. 4 shows the simulation results as a function of SNR and number of antennas. The accuracy of our solutions is quantified through the percentage error   \begin{eqnarray} \beta = \frac{\Vert \boldsymbol {v}- \widehat{\boldsymbol {v}}\Vert _{\rm F}^2}{\Vert \boldsymbol {v}\Vert _{\rm F}^2}, \end{eqnarray} (41)where $$\widehat{\boldsymbol {v}}$$ is the redundant stefcal parameter estimate. Figure 4. View largeDownload slide We plot the percentage error β between the simulated visibilities and the visibilities solved for by redundant stefcal for different SNR values as a function of the number of antennas (N) in the array. Figure 4. View largeDownload slide We plot the percentage error β between the simulated visibilities and the visibilities solved for by redundant stefcal for different SNR values as a function of the number of antennas (N) in the array. The error magnitude follows the expected behaviour, i.e. it decreases as a function of SNR and number of antennas N. Interestingly, it reduces to a few per cent when N > 120 for essentially any choice of SNR. 4 PCG METHOD Liu et al. (2010) suggested that the execution speed of redundant calibration could be reduced using the conjugate gradient (CG) method (Hestenes & Stiefel 1952), which would be computationally advantageous, since the Hessian matrix associated with redundant calibration (see Fig. 3) is sparse (Reid 1971). In this section, we study the computational complexity of the CG method when it is used to invert the modified Hessian matrix (see equation 28), in particular when preconditioning is used (i.e. the PCG method). Interestingly, empirical tests suggest that the unmodified Hessian itself is singular. It is therefore important to mention that the CG method can pseudo-invert the unmodified Hessian as well, i.e. the CG method can be directly applied to equation (27), because the Hessian is a positive semi-definite Hermitian matrix and the vector $$\boldsymbol{J}^H\breve{\boldsymbol {r}}$$ is an element of its column range (Lu & Chen 2018). The computational complexity of the CG method is   \begin{eqnarray} O(\sqrt{\kappa }m), \end{eqnarray} (42)where m denotes the number of non-zero entries and κ denotes the spectral condition number of the matrix that is to be inverted. The spectral condition number κ of the matrix $$\boldsymbol{A}$$ is defined as   \begin{eqnarray} \kappa (\boldsymbol{A}) = \frac{\iota _{{\rm max}}}{\iota _{{\rm min}}}, \end{eqnarray} (43)where ιmax and ιmin denote the largest and the smallest eigenvalue of $$\boldsymbol{A}$$, respectively. Preconditioning is a technique used to improve the spectral condition number of a matrix. Let us consider a generic system of linear equations   \begin{eqnarray} \boldsymbol{A}\boldsymbol {x}= \boldsymbol {b}, \end{eqnarray} (44)and a positive-definite Hermitian matrix $$\boldsymbol{M}$$ so that   \begin{eqnarray} \boldsymbol{M}^{-1}\boldsymbol{A}\boldsymbol {x}= \boldsymbol{M}^{-1}\boldsymbol {b}. \end{eqnarray} (45) The matrix $$\boldsymbol{M}$$ is a good preconditioner if   \begin{eqnarray} \kappa (\boldsymbol{M}^{-1}\boldsymbol{A}) \ll \kappa (\boldsymbol{A}), \end{eqnarray} (46)i.e. if it lowers the condition number of a matrix. If $$\boldsymbol{A}$$ is a nearly diagonal matrix, the Jacobian preconditioner is a natural choice of $$\boldsymbol{M}$$ and can be computed as   \begin{eqnarray} \boldsymbol{M}= \boldsymbol{A}{\odot }\boldsymbol{I}. \end{eqnarray} (47) In order to quantify the effectiveness of the CG method in redundant calibration, we investigate the spectral condition number and the sparsity of the modified Hessian $$\boldsymbol {\mathcal {H}}$$ (i.e. λ = 2). We generated simulated (i.e. corrupted) visibilities according to models described in Table 3. We used the complex LM algorithm described in Section 3 to calibrate the corrupted visibilities. To invert the modified Hessian, we used the CG method, with and without a Jacobian preconditioner. Fig. 5(a) shows us that preconditioning reduces the condition number of $$\boldsymbol {\mathcal {H}}$$ to a small constant value and therefore effectively eliminates it from equation (42), i.e. equation (42) reduces to   \begin{eqnarray} O(m). \end{eqnarray} (48)This result is confirmed by Fig. 5(b). Fig. 5(b) shows us that the number of major iterations needed by the PCG method to invert $$\boldsymbol {\mathcal {H}}$$ is independent of the number of antennas in the array and that it is much less than the dimension of $$\boldsymbol {\mathcal {H}}$$. Figure 5. View largeDownload slide The graphs presented here were generated using the simulations discussed in Section 3. Left: spectral condition number κ of the modified Hessian $$\boldsymbol {\mathcal {H}}$$ as a function of N, before (magenta and red curves) and after (blue curve) preconditioning, at different SNR values. Right: number of major iterations required by the CG method to invert $$\boldsymbol {\mathcal {H}}$$ as a function of the number of antennas N in the array, before (magenta and red curves) and after preconditioning (blue and green curves), at different SNR values. Both plots show that the Jacobian preconditioner definitely speeds up the CG method. Figure 5. View largeDownload slide The graphs presented here were generated using the simulations discussed in Section 3. Left: spectral condition number κ of the modified Hessian $$\boldsymbol {\mathcal {H}}$$ as a function of N, before (magenta and red curves) and after (blue curve) preconditioning, at different SNR values. Right: number of major iterations required by the CG method to invert $$\boldsymbol {\mathcal {H}}$$ as a function of the number of antennas N in the array, before (magenta and red curves) and after preconditioning (blue and green curves), at different SNR values. Both plots show that the Jacobian preconditioner definitely speeds up the CG method. Let us now shift our attention towards the remaining factor m. To get a useful idea of the computational complexity of CG, we must relate m and P. This can be achieved by utilizing the modified Hessian's measure of sparsity γ:   \begin{eqnarray} \gamma = \left(1 - \frac{m}{P^2} \right), \end{eqnarray} (49)where m is the number of non-zero entries in $$\boldsymbol {\mathcal {H}}$$ and P2 is total number of matrix elements. For a regular east–west geometric configuration, the sparsity and the asymptotic sparsity of $$\boldsymbol {\mathcal {H}}$$ can be derived analytically:   \begin{eqnarray} \gamma = \frac{5N^2-7N+3}{8N^2-8N+2} \qquad \gamma _{\infty } = \lim _{N\rightarrow \infty }\gamma = \frac{5}{8} . \end{eqnarray} (50) For more complicated array layouts, however, there is no straightforward analytical solution and we empirically determined the sparsity ratios for three different geometric layouts as a function of the number antennas in the array (see Fig. 6a). Figure 6. View largeDownload slide Left: the sparsity ratio γ of the modified Hessian $$\boldsymbol {\mathcal {H}}$$ as a function of the number of antennas N for a hexagonal (blue circles), square (green crosses), and regular east–west (red circles) array geometry. The red dashed and black dotted lines show the analytical expression for γ and its limit for the regular east–west grid case (see the text for details). Right: the order of the computational cost c for inverting $$\boldsymbol {\mathcal {H}}$$ as a function of N for different array geometries (same colour scheme as in the left-hand panel). The red dashed and black dotted lines are the analytical expression of c and its limit in the east–west regular grid case. Figure 6. View largeDownload slide Left: the sparsity ratio γ of the modified Hessian $$\boldsymbol {\mathcal {H}}$$ as a function of the number of antennas N for a hexagonal (blue circles), square (green crosses), and regular east–west (red circles) array geometry. The red dashed and black dotted lines show the analytical expression for γ and its limit for the regular east–west grid case (see the text for details). Right: the order of the computational cost c for inverting $$\boldsymbol {\mathcal {H}}$$ as a function of N for different array geometries (same colour scheme as in the left-hand panel). The red dashed and black dotted lines are the analytical expression of c and its limit in the east–west regular grid case. It now follows that   \begin{eqnarray} P^{c} = m = (1 - \gamma )P^2, \end{eqnarray} (51)which leads to   \begin{eqnarray} c = \log _{P}(1 - \gamma ) + 2 \qquad c_{\infty } = \lim _{N\rightarrow \infty } c = 2. \end{eqnarray} (52)The computational complexity is therefore asymptotically bounded by O(P2), although it converges very slowly to its asymptotic value, and in general is equal to O(Pc) (with c < 2). In the case of a hexagonal geometric layout with N < 200, we have that c ∼ 1.7 (see Fig. 6b). We are now finally able to compare the computational complexity of redundant stefcal and the PCG method. Redundant stefcal is computationally inexpensive as it just needs to invert a diagonal matrix; however, the PCG inversion is accurate and, therefore, may require fewer iterations, ultimately leading to a faster convergence. We computed the theoretical number of redundant stefcal iterations Δk in excess of the number of LM (implementing PCG) iterations required for LM to outperform redundant stefcal  \begin{eqnarray} (k+\Delta k)P > kP^c, \qquad \Delta k > k(P^{c-1}-1). \end{eqnarray} (53)We then compared Δk with the empirically obtained average excess number of iterations. The results are displayed in Fig. 7, which shows that redundant stefcal outperforms the PCG method. We note that, in this comparison, we have only taken into account the cost of inverting $$\boldsymbol {\mathcal {H}}$$ and ignored the cost of preconditioning. Figure 7. View largeDownload slide The graphs presented here were generated using the simulations discussed in Section 3. Left: number of LM iterations required by redundant stefcal (green and blue curves) and the PCG (magenta and red curves) methods to converge to a parameter error tolerance of 10−6 whilst using different SNR values as a function of the number of antennas N in the array. Right: average number of LM iterations (difference between the redundant stefcal and PCG curves in the left-hand panel) saved (green and blue curve) by computing the full inverse with the PCG method whilst using different SNR values. The black curve is Δk, the theoretical number of redundant stefcal iterations that are needed in excess of the number of LM iterations used for the LM algorithm to outperform redundant stefcal. For the black curve, we assumed c = 1.7 and that k could be approximated with the magenta curve plotted in the left-hand panel. Figure 7. View largeDownload slide The graphs presented here were generated using the simulations discussed in Section 3. Left: number of LM iterations required by redundant stefcal (green and blue curves) and the PCG (magenta and red curves) methods to converge to a parameter error tolerance of 10−6 whilst using different SNR values as a function of the number of antennas N in the array. Right: average number of LM iterations (difference between the redundant stefcal and PCG curves in the left-hand panel) saved (green and blue curve) by computing the full inverse with the PCG method whilst using different SNR values. The black curve is Δk, the theoretical number of redundant stefcal iterations that are needed in excess of the number of LM iterations used for the LM algorithm to outperform redundant stefcal. For the black curve, we assumed c = 1.7 and that k could be approximated with the magenta curve plotted in the left-hand panel. 5 CONCLUSION In this paper, we have formulated the calibration of redundant interferometric arrays using the complex optimization formalism (Smirnov & Tasse 2015). We derived the associated complex Jacobian and the GN and LM parameter update steps. We also showed that the LM parameter update step can be simplified to obtain an ADI type of algorithm (Marthi & Chengalur 2014; Salvini & Wijnholds 2014). Our code implementation of this algorithm (redundant stefcal) is publicly available at https://github.com/Trienko/heracommissioning/blob/master/code/stef.py. We note that, in its current implementation, redundant stefcal does not solve for the degeneracies inherent to redundant calibration (Zheng et al. 2014; Kurien et al. 2016; Dillon et al. 2017), which will be the subject of future work. Compared to current redundant calibration algorithms, redundant stefcal is more robust to initial conditions and allows for easier parallelization. We investigated the computational cost of redundant stefcal and compared it with the performance of the PCG method (as suggested by Liu et al. 2010). We found that, although the PCG method greatly improves the speed of redundant calibration, it still significantly underperforms when compared to redundant stefcal. The characteristics of redundant stefcal make it an appealing calibration algorithm for large redundant arrays like HERA (DeBoer et al. 2017), CHIME (Bandura et al. 2014), HIRAX (Newburgh et al. 2016), or even hybrid arrays like the MWA (Tingay et al. 2013) in its updated phase. 6 FUTURE OUTLOOK We plan to apply redundant stefcal to HERA data in the near future. The reason for applying redundant stefcal to real data is twofold. First, we wish to validate the theoretical computational complexity of redundant stefcal (derived earlier in the paper). Secondly, we would like to test whether it could be used to calibrate HERA in near-realtime. We are also interested in parallelizing our current implementation and to see how much can be gained by doing so. We would also like to conduct a perturbation analysis, similar to the one conducted by Liu et al. (2010), to estimate the error that is introduced into our estimated gains and visibilities when the array contains baselines that are not perfectly redundant. We are also interested in quantifying the error that is introduced into the estimated gains and visibilities because of differing primary beam patterns between array elements (Noorishad et al. 2012). ACKNOWLEDGEMENTS This work is based upon research supported by the South African Research Chairs Initiative of the Department of Science and Technology and National Research Foundation. GB acknowledges support from the Royal Society and the Newton Fund under grant NA150184. This work is also based on the research supported in part by the National Research Foundation of South Africa (grant No. 103424). We would like to thank the anonymous reviewer for their useful comments, which greatly improved the quality of the paper. Footnotes 1 A matrix composed of first-order partial derivatives. 2 Element-wise product. 3 A square matrix composed of second-order partial derivatives. 4 k ∈ {0, 2, …}. 5 k ∈ {1, 3, …}. REFERENCES Ali Z. S. et al.  , 2015, ApJ , 809, 61 CrossRef Search ADS   Bandura K. et al.  , 2014, Proc. SPIE , 9145, 914522 Barry N., Hazelton B., Sullivan I., Morales M. F., Pober J. C., 2016, MNRAS , 461, 3135 CrossRef Search ADS   Bernardi G. et al.  , 2009, A&A , 500, 965 CrossRef Search ADS   Bernardi G. et al.  , 2010, A&A , 522, A67 CrossRef Search ADS   DeBoer D. R. et al.  , 2017, PASP , 129, 045001 CrossRef Search ADS   Dillon J. S., Parsons A. R., 2016, ApJ , 826, 181 CrossRef Search ADS   Dillon J. S. et al.  , 2014, Phys. Rev. D , 89, 023002 CrossRef Search ADS   Dillon J. S. et al.  , 2017, preprint (arXiv:1712.07212) Ewall-Wice A., Dillon J. S., Liu A., Hewitt J., 2017, MNRAS , 470, 1849 CrossRef Search ADS   Furlanetto S. R., 2016, in Mesinger A., ed., Astrophysics and Space Science Library, Vol. 423, Understanding the Epoch of Cosmic Reionization: Challenges and Progress . Springer-Verlag, Berlin, p. 247 Ghosh A., Prasad J., Bharadwaj S., Ali S. S., Chengalur J. N., 2012, MNRAS , 426, 3295 CrossRef Search ADS   Grobler T. L., Nunhokee C. D., Smirnov O. M., van Zyl A. J., de Bruyn A. G., 2014, MNRAS , 439, 4030 CrossRef Search ADS   Hamaker J. P., Bregman J. D., Sault R. J., 1996, A&AS , 117, 137 CrossRef Search ADS   Hestenes M. R., Stiefel E., 1952, J. Res. Natl. Bur. Stand. , 49, 409 CrossRef Search ADS   Kurien B. G., Tarokh V., Rachlin Y., Shah V. N., Ashcom J. B., 2016, MNRAS , 461, 3585 CrossRef Search ADS   Liu S., Trenkler G., 2008, Int. J. Inf. Syst. Sci , 4, 160 Liu A., Tegmark M., Morrison S., Lutomirski A., Zaldarriaga M., 2010, MNRAS , 408, 1029 CrossRef Search ADS   Lu Z., Chen X., 2018, Math. Oper. Res. , 43, 275 McQuinn M., 2016, ARA&A , 54, 313 CrossRef Search ADS   Madsen K., Nielsen H. B., Tingleff O., 2004, Methods for Non-Linear Least Squares Problems , 2nd edn. Informatics and Mathematical Modelling, Technical University of Denmark, DTU, Lyngby Marthi V. R., Chengalur J., 2014, MNRAS , 437, 524 CrossRef Search ADS   Mitchell D. A., Greenhill L. J., Wayth R. B., Sault R. J., Lonsdale C. J., Cappallo R. J., Morales M. F., Ord S. M., 2008, IEEE J. Sel. Top. Signal Process. , 2, 707 CrossRef Search ADS   Newburgh L. B. et al.  , 2016, Proc. SPIE , 9906, 99065X Noordam J., De Bruyn A., 1982, Nature , 299, 597 CrossRef Search ADS   Noordam J. E., Smirnov O. M., 2010, A&A , 524, A61 CrossRef Search ADS   Noorishad P., Wijnholds S. J., van Ardenne A., van der Hulst J., 2012, A&A , 545, A108 CrossRef Search ADS   Parsons A., Pober J., McQuinn M., Jacobs D., Aguirre J., 2012, ApJ , 753, 81 CrossRef Search ADS   Parsons A. R. et al.  , 2014, ApJ , 788, 106 CrossRef Search ADS   Pearson T., Readhead A., 1984, ARA&A , 22, 97 CrossRef Search ADS   Reid J. K. ed., 1971, Proc. Conf. on Large Sparse Sets of Linear Equations, On the Method of Conjugate Gradients for the Solution of Large Sparse Systems of Linear Equations . Academic Press, New York, p. 231 Salvini S., Wijnholds S. J., 2014, A&A , 571, A97 CrossRef Search ADS   Sault R. J., Hamaker J. P., Bregman J. D., 1996, A&AS , 117, 149 CrossRef Search ADS   Sievers J., 2017, preprint (arXiv:1701.01860) Smirnov O. M., 2011a, A&A , 527, A106 CrossRef Search ADS   Smirnov O. M., 2011b, A&A , 527, A108 CrossRef Search ADS   Smirnov O., Tasse C., 2015, MNRAS , 449, 2668 CrossRef Search ADS   Tingay S. J. et al.  , 2013, PASA , 30, e007 CrossRef Search ADS   Wieringa M. H., 1992, Exp. Astron. , 2, 203 CrossRef Search ADS   Wijnholds S. J., Noorishad P., 2012, in Bartleson K., ed., Signal Processing Conference (EUSIPCO), Statistically Optimal Self-Calibration of Regular Imaging Arrays. IEEE, Piscataway, NJ , p. 1304 Wirtinger W., 1927, Math. Ann. , 97, 357 CrossRef Search ADS   Zheng H. et al.  , 2014, MNRAS , 445, 1084 CrossRef Search ADS   APPENDIX A: DERIVATION OF REDUNDANT WIRTINGER CALIBRATION If we apply the definition in equation (24) to equation (23), we obtain the following analytic result:   \begin{eqnarray} \boldsymbol{J}={\left[\begin{array}{c{@}{\quad}c}\boldsymbol{M}& \boldsymbol{N} \\ \overline{\boldsymbol{N}} & \overline{\boldsymbol{M}} \end{array}\right]}, \end{eqnarray} (A1)where   \begin{eqnarray} \boldsymbol{M}={\left[\begin{array}{c{@}{\quad}c}\boldsymbol{O}& \boldsymbol{P}\end{array}\right]} \end{eqnarray} (A2)and   \begin{eqnarray} \boldsymbol{N}={\left[\begin{array}{c{@}{\quad}c}\boldsymbol{Q}& \boldsymbol{0}\end{array}\right]}. \end{eqnarray} (A3)Moreover,   \begin{eqnarray} \left[ \boldsymbol{O}\right]_{\alpha _{pq},j} =\left\lbrace \begin{array}{l{@}{\quad}l}y_{\phi _{pq}}\overline{g_q} & {\rm if}\,p=j \\ 0 & {\rm otherwise} \end{array}\right., \end{eqnarray} (A4)  \begin{eqnarray} \left[ \boldsymbol{P}\right]_{\alpha _{pq},j} =\left\lbrace \begin{array}{l{@}{\quad }l}g_p\overline{g_q} & {\rm if}\,\phi _{pq}=j \\ 0 & {\rm otherwise} \end{array}\right., \end{eqnarray} (A5)and   \begin{eqnarray} \left[ \boldsymbol{Q}\right]_{\alpha _{pq},j} =\left\lbrace \begin{array}{l{@}{\quad}l}g_py_{\phi _{pq}} & {\rm if}\,q=j \\ 0 & {\rm otherwise} \end{array}\right.. \end{eqnarray} (A6) We use $$\boldsymbol{0}$$ to denote an all-zero matrix. It is now trivial to compute the Hessian $$\boldsymbol{H}$$ by using equation (A1). If we substitute equation (A1) into $$\boldsymbol{J}^H\boldsymbol{J}$$, we obtain   \begin{eqnarray} \boldsymbol{H}= \boldsymbol{J}^H\boldsymbol{J}= {\left[\begin{array}{l{@}{\quad}l}\boldsymbol{A}& \boldsymbol{B}\\ \overline{\boldsymbol{B}} & \overline{\boldsymbol{A}} \end{array}\right]}, \end{eqnarray} (A7)where   \begin{eqnarray} \boldsymbol{A}={\left[\begin{array}{l{@}{\quad}l}\boldsymbol{C}& \boldsymbol{D}\\ \boldsymbol{D}^H & \boldsymbol{E}\end{array}\right]}, \qquad \boldsymbol{B}={\left[\begin{array}{l{@}{\quad}l}\boldsymbol{F}& \boldsymbol{G}\\ \boldsymbol{G}^T & \boldsymbol{0}\end{array}\right]}, \end{eqnarray} (A8)  \begin{eqnarray} [\boldsymbol{C}]_{ij} = \left\lbrace \begin{array}{l{@}{\quad}l}\sum _{k \ne i} \left| g_k \right|^2 \left| y_{\zeta _{ik}} \right|^2 & {\rm if}\,i=j \\ 0 & {\rm otherwise} \end{array}\right., \end{eqnarray} (A9)  \begin{eqnarray} [\boldsymbol{D}]_{ij} = \left\lbrace \begin{array}{l{@}{\quad}l} g_i \overline{y}_j \left| g_{\psi _{ij}} \right|^2 & {\rm if}\,\psi _{ij}\ne 0 \\ 0 & {\rm otherwise} \end{array}\right., \end{eqnarray} (A10)  \begin{eqnarray} [\boldsymbol{E}]_{ij} = \left\lbrace \begin{array}{@{}l@{\quad }l@{}}\sum _{rs \in \mathcal {RS}_i} \left| g_r \right|^2 \left| g_s \right|^2 & {\rm if}\,i=j \\ 0 & {\rm otherwise} \end{array}\right., \end{eqnarray} (A11)  \begin{eqnarray} [\boldsymbol{F}]_{ij} = \left\lbrace \begin{array}{l{@}{\quad}l} g_i g_j \left| y_{\zeta _{ij}} \right|^2 & {\rm if}\,i \ne j \\ 0 & {\rm otherwise} \end{array}\right., \end{eqnarray} (A12)and   \begin{eqnarray} [\boldsymbol{G}]_{ij} = \left\lbrace \begin{array}{l{@}{\quad}l} g_i y_j \left| g_{\xi _{ij}} \right|^2 & {\rm if}\,\xi _{ij}\ne 0 \\ 0 & {\rm otherwise} \end{array}\right.. \end{eqnarray} (A13)Moreover,   \begin{eqnarray} \mathcal {RS}_i = \left\lbrace rs\in \mathbb {N}^2|(\phi _{rs} = i) \right\rbrace , \end{eqnarray} (A14)  \begin{eqnarray} \xi _{ij} = \left\lbrace \begin{array}{@{}l@{\quad }l@{}}p&{\rm if}\,\exists ! p \in \mathbb {N} s.t. (\phi _{pi} = j)\\ 0&{\rm otherwise} \end{array}\right., \end{eqnarray} (A15)and   \begin{eqnarray} \psi _{ij} = \left\lbrace \begin{array}{l{@}{\quad}l} q&{\rm if}\,\exists ! q \in \mathbb {N} s.t. (\phi _{iq} = j) \\ 0&{\rm otherwise} \end{array}\right.. \end{eqnarray} (A16) Furthermore, substituting equation (A1) into $$\boldsymbol{J}^H\breve{\boldsymbol {r}}$$ results in   \begin{eqnarray} \boldsymbol{J}^H\breve{\boldsymbol {r}} ={\left[\begin{array}{l}\boldsymbol {a}\\ \boldsymbol {b}\\ \overline{\boldsymbol {a}}\\ \overline{\boldsymbol {b}} \end{array}\right]}, \end{eqnarray} (A17)where   \begin{eqnarray} \left[ \boldsymbol {a}\right]_i = \sum _{k\ne i} g_k \widetilde{y}_{ik}r_{ik}, \qquad \left[ \boldsymbol {b}\right]_i = \sum _{rs\in \mathcal {RS}_i}\overline{g}_r g_s r_{rs}, \end{eqnarray} (A18)and   \begin{eqnarray} \widetilde{y}_{ik} = \left\lbrace \begin{array}{@{}l@{\quad }l@{}}\overline{y}_{\zeta _{ik}} & {\rm if}\,\,k > i\\ y_{\zeta _{ik}} & {\rm otherwise} \end{array}\right.. \end{eqnarray} (A19)Additionally, $$\boldsymbol {a}$$ and $$\boldsymbol {b}$$ are both column vectors. The lengths of $$\boldsymbol {a}$$ and $$\boldsymbol {b}$$ are N and L, respectively. The dimensions of the matrices we defined in this section are presented in Table A1. Table A1. The dimensions of the matrices defined in Appendix A. Matrix  Dimension  $$\boldsymbol{J}$$  2B × P  $$\boldsymbol{M}$$  B × (N + L)  $$\boldsymbol{N}$$  B × (N + L)  $$\boldsymbol{O}$$  B × N  $$\boldsymbol{P}$$  B × L  $$\boldsymbol{Q}$$  B × N  $$\boldsymbol{0}$$  B × L  $$\boldsymbol{H}$$  P × P  $$\boldsymbol{A}$$  (N + L) × (N + L)  $$\boldsymbol{B}$$  (N + L) × (N + L)  $$\boldsymbol{C}$$  N × N  $$\boldsymbol{D}$$  N × L  $$\boldsymbol{E}$$  L × L  $$\boldsymbol{F}$$  N × N  $$\boldsymbol{G}$$  N × L  $$\boldsymbol{0}$$  L × L  Matrix  Dimension  $$\boldsymbol{J}$$  2B × P  $$\boldsymbol{M}$$  B × (N + L)  $$\boldsymbol{N}$$  B × (N + L)  $$\boldsymbol{O}$$  B × N  $$\boldsymbol{P}$$  B × L  $$\boldsymbol{Q}$$  B × N  $$\boldsymbol{0}$$  B × L  $$\boldsymbol{H}$$  P × P  $$\boldsymbol{A}$$  (N + L) × (N + L)  $$\boldsymbol{B}$$  (N + L) × (N + L)  $$\boldsymbol{C}$$  N × N  $$\boldsymbol{D}$$  N × L  $$\boldsymbol{E}$$  L × L  $$\boldsymbol{F}$$  N × N  $$\boldsymbol{G}$$  N × L  $$\boldsymbol{0}$$  L × L  View Large Furthermore,   \begin{eqnarray} \frac{1}{3}\boldsymbol{J}\breve{\boldsymbol {z}} = \breve{\boldsymbol {v}}, \qquad \boldsymbol{J}^H\breve{\boldsymbol {v}} = (\boldsymbol{I}{\odot }\boldsymbol{H})\breve{\boldsymbol {z}}. \end{eqnarray} (A20)The above identities can be trivially established by mechanically showing that the left-hand side of each expression in equation (A20) is equal to its right-hand side. APPENDIX B: ADI The basic skymodel-based stefcal update step is equal to the leftmost term in equation (38) (barring ρ; Salvini & Wijnholds 2014). Assume without any loss of generality that the array is in an east–west regular grid. Furthermore, assume that $$\boldsymbol{d}$$ (see equation 3) has been reordered and that the result of this reordering is the following:   \begin{eqnarray} \widetilde{\boldsymbol{d}} = \left[d_{12}, \ldots ,d_{N-1,N},d_{13}, \ldots ,d_{N-2,N}, \ldots ,d_{1N}\right]^T. \end{eqnarray} (B1)The vector $$\widetilde{\boldsymbol {n}}$$ should be interpreted in a similar manner. Equation (3) can now be rewritten as   \begin{eqnarray} \widetilde{\boldsymbol{d}} = \boldsymbol{J}\boldsymbol{y} + \widetilde{\boldsymbol {n}}, \end{eqnarray} (B2)if we assume that $$\boldsymbol{g}$$ and its conjugate are known vectors. In equation (B2),   \begin{eqnarray} \boldsymbol{J} = {\left[\begin{array}{l{@}{\qquad}l{@}{\qquad}l{@}{\qquad}l}g_1\overline{g}_2 & 0 & \cdots & 0\\ g_2\overline{g}_3 & 0 & \cdots & 0\\ \vdots & 0 & \cdots & 0\\ g_{N-1}\overline{g}_N & 0 & \cdots & 0\\ 0 & g_1\overline{g}_3 & \cdots & 0\\ 0 & \vdots & \cdots & 0\\ 0 & g_{N-2}\overline{g}_N & \cdots & 0\\ 0 & 0 & \cdots & 0\\ & \vdots & \\ 0 & 0 & \cdots & g_1\overline{g}_N\\ \end{array}\right]}. \end{eqnarray} (B3)We can now estimate $$\boldsymbol{y}$$ with   \begin{eqnarray} \boldsymbol{y} = (\boldsymbol{J}^H\boldsymbol{J})^{-1}\boldsymbol{J}^H\widetilde{\boldsymbol{d}}, \end{eqnarray} (B4)where   \begin{eqnarray} [\boldsymbol{J}^H\boldsymbol{J}]_{ij} = \left\lbrace \begin{array}{@{}l@{\quad }l@{}}\sum _{rs\in \mathcal {RS}_i} |g_r|^2|g_s|^2 &{\rm if}\,\,i=j\\ 0&{\rm otherwise} \end{array}\right., \end{eqnarray} (B5)and   \begin{eqnarray} [\boldsymbol{J}^H\widetilde{\boldsymbol{d}}]_i = \sum _{rs\in \mathcal {RS}_i} \overline{g}_r g_s d_{rs}. \end{eqnarray} (B6)Substituting equations (B5) and (B6) into equation (B4) and simplifying the result leads to the leftmost term in equation (39) (if we bar ρ and consider only the ith entry of $$\boldsymbol {y}$$). © 2018 The Author(s) Published by Oxford University Press on behalf of the Royal Astronomical Society

### Journal

Monthly Notices of the Royal Astronomical SocietyOxford University Press

Published: May 1, 2018

## You’re reading a free preview. Subscribe to read the entire article.

### DeepDyve is your personal research library

It’s your single place to instantly
that matters to you.

over 12 million articles from more than
10,000 peer-reviewed journals.

All for just $49/month ### Explore the DeepDyve Library ### Unlimited reading Read as many articles as you need. Full articles with original layout, charts and figures. Read online, from anywhere. ### Stay up to date Keep up with your field with Personalized Recommendations and Follow Journals to get automatic updates. ### Organize your research It’s easy to organize your research with our built-in tools. ### Your journals are on DeepDyve Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more. All the latest content is available, no embargo periods. ### DeepDyve Freelancer ### DeepDyve Pro Price FREE$49/month

\$360/year
Save searches from Google Scholar, PubMed
Create lists to organize your research
Export lists, citations