# Demixing sines and spikes: Robust spectral super-resolution in the presence of outliers

## Abstract

We consider the problem of super-resolving the line spectrum of a multisinusoidal signal from a finite number of samples, some of which may be completely corrupted. Measurements of this form can be modeled as an additive mixture of a sinusoidal and a sparse component. We propose to demix the two components and super-resolve the spectrum of the multisinusoidal signal by solving a convex program. Our main theoretical result is that, up to logarithmic factors, this approach is guaranteed to be successful with high probability for a number of spectral lines that is linear in the number of measurements, even if a constant fraction of the data are outliers. The result holds under the assumption that the phases of the sinusoidal and sparse components are random and the line spectrum satisfies a minimum-separation condition. We show that the method can be implemented via semi-definite programming, explain how to adapt it in the presence of dense perturbations and explore its connection to atomic-norm denoising. In addition, we propose a fast greedy demixing method that provides good empirical results when coupled with a local non-convex-optimization step.

## 1. Introduction

The goal of spectral super-resolution is to estimate the spectrum of a multisinusoidal signal from a finite number of samples. This is a problem of crucial importance in signal processing applications, such as target identification from radar measurements [3,21], digital filter design [59], underwater acoustics [2], seismic imaging [6], nuclear magnetic resonance spectroscopy [72] and power electronics [43]. In this paper, we study spectral super-resolution in the presence of perturbations that completely corrupt a subset of the data.
The corrupted samples can be interpreted as outliers that do not follow the same multisinusoidal model as the rest of the measurements, and they significantly complicate the task of super-resolving the spectrum of the signal of interest. Depending on the application, outliers may appear due to sensor failures, interference from other signals or impulsive noise. For instance, radar measurements can be corrupted by lightning discharges, spurious radio emissions or telephone switching transients [36,45].

Figure 1 illustrates the problem of performing spectral super-resolution in the presence of outliers. The top row shows a superposition of sinusoids and its corresponding sparse spectrum. In the second row, the multisinusoidal signal is sampled at the Nyquist rate over a finite interval, which induces spectral aliasing and makes it challenging to resolve the individual spectral lines. The sparse signal in the third row represents an additive perturbation that corrupts some of the samples. Finally, the bottom row shows the available measurements: a mixture of sines (samples from the multisinusoidal signal) and spikes (the sparse perturbation). Our objective is to demix these two components and super-resolve the spectrum of the sines.

Fig. 1. The top row shows a multisinusoidal signal (left) and its sparse spectrum (right). The minimum separation of the spectrum is $$2.8 / (n - 1)$$ (see Section 2.2). On the second row, truncating the signal to a finite interval after measuring $$n := 101$$ samples at the Nyquist rate (left) results in aliasing in the frequency domain (right). The third row shows some impulsive noise (left) and its corresponding spectrum (right). The last row shows the superposition of the multisinusoidal signal and the sparse noise, which yields a mixture of sines and spikes depicted in the time (left) and frequency domains (right).
For ease of visualization, the amplitudes of the spectrum of the sines and of the spikes are real (we only show half of the spectrum and half of the spikes because their amplitudes and positions are symmetric).

Broadly speaking, there are three main approaches to spectral super-resolution: linear non-parametric methods [62], techniques based on Prony’s method [27,62] and optimization-based methods [4,38,65]. The first three rows of Fig. 2 show the results of applying a representative of each approach to a spectral super-resolution problem when there are no outliers in the data (left column) and when there are (right column).

Fig. 2. Estimate of the sparse spectrum of the multisinusoidal signal from Fig. 1, when outliers are absent from the data (left column) and when they are present (right column). The estimates are shown in red; the true location of the spectra is shown in blue.
Methods that do not account for outliers fail to recover all the spectral lines when impulsive noise corrupts the data, whereas an optimization-based estimator incorporating a sparse-noise model still achieves exact recovery.

In the absence of corruptions, the periodogram (a linear non-parametric technique that uses windowing to reduce spectral aliasing [41]) locates most of the relevant frequencies, albeit at a coarse resolution. In contrast, both the Prony-based approach, represented by the Multiple Signal Classification (MUSIC) algorithm [5,57], and the optimization-based method, based on total-variation norm minimization [4,13,65], recover the true spectrum of the signal perfectly. All these techniques are designed to allow for small Gaussian-like perturbations to the data, and hence, their performance degrades gracefully when such noise is present (not shown in the figure). However, as we can see in the right column of Fig. 2, when outliers are present in the data their performance is severely affected: none of the methods detect the fourth spectral line of the signal, and they all hallucinate two large spurious spectral lines to the right of the true spectrum.

The subject of this paper is an optimization-based method that leverages sparsity-inducing norms to perform spectral super-resolution and simultaneously detect outliers in the data. The bottom row of Fig. 
2 shows that this approach is capable of super-resolving the spectrum of the multisinusoidal signal in Fig. 1 exactly from the corrupted measurements, in contrast to techniques that do not account for the presence of outliers in the data.

Below is a brief road map of the paper. Section 2 describes our methods and main results. In Section 2.1, we introduce a mathematical model of the spectral super-resolution problem. Section 2.2 justifies the need for a minimum-separation condition on the spectrum of the signal for spectral super-resolution to be well posed. In Section 2.3, we present our optimization-based method and provide a theoretical characterization of its performance. Section 2.4 discusses the robustness of the technique to the choice of regularization parameter. Section 2.5 explains how to adapt the method when the data are perturbed by dense noise. Section 2.6 establishes a connection between our method and atomic-norm denoising. Finally, in Section 2.7 we review the related literature.

Our main theoretical contribution, Theorem 2.2, establishes that solving the convex program introduced in Section 2.3 makes it possible to super-resolve up to $$k$$ spectral lines exactly in the presence of $$s$$ outliers (i.e. when $$s$$ measurements are completely corrupted) with high probability from a number of measurements that is linear in both $$k$$ and $$s$$, up to logarithmic factors. Section 3 is dedicated to the proof of this result, which is non-asymptotic and holds under several assumptions that are described in Section 2.3.

Section 4 focuses on demixing algorithms. In Sections 4.1 and 4.2, we explain how to implement the methods discussed in Sections 2.3 and 2.5, respectively, by recasting the dual of the corresponding optimization problems as a tractable semi-definite program (SDP). In Section 4.3, we propose a greedy demixing technique that achieves good empirical results when combined with a local non-convex-optimization step.
Section 4.4 describes the implementation of atomic-norm denoising in the presence of outliers using semi-definite programming. Matlab code of all the algorithms discussed in this section is available in the Supplementary Material.

Section 5 reports numerical experiments illustrating the performance of the proposed approach. In Section 5.1, we investigate under what conditions our optimization-based method achieves exact demixing empirically. In Section 5.2, we compare atomic-norm denoising to an alternative approach based on matrix completion. We conclude the paper by outlining several future research directions in Section 6.

## 2. Robust spectral super-resolution via convex programming

### 2.1 Mathematical model

We model the multisinusoidal signal of interest as a superposition of $$k$$ complex exponentials

$$g(t) := \sum_{j=1}^{k} \boldsymbol{x}_j \exp\left({i 2 \pi f_j t}\right), \tag{2.1}$$

where $$\boldsymbol{x} \in \mathbb{C}^{k}$$ is the vector of complex amplitudes and $$\boldsymbol{x}_j$$ is its $$j$$th entry. The spectrum of $$g$$ consists of spectral lines, modeled by Dirac measures that are supported on a subset $$T:=\left\{{f_1, \ldots, f_k}\right\}$$ of the unit interval $$\left[{0,1}\right]$$,

$$\mu = \sum_{f_j \in T} \boldsymbol{x}_j \, \delta\left({f - f_j}\right), \tag{2.2}$$

where $$\delta \left({f - f_j}\right)$$ denotes a Dirac measure located at $$f_j$$. Sparse spectra of this form are often called line spectra in the literature. Note that a simple change of variable allows this model to be applied to signals with spectra restricted to any interval $$\left[{f_{\min},f_{\max}}\right]$$.

By the Nyquist–Shannon sampling theorem, we can recover $$g$$, and consequently $$\mu$$, from an infinite sequence of regularly spaced samples $$\left\{{g\left({l}\right),\; l \in \mathbb{Z}}\right\}$$ by sinc interpolation. The aim of spectral super-resolution is to estimate the support of the line spectrum $$T$$ and the amplitude vector $$\boldsymbol{x}$$ from a finite set of $$n$$ contiguous samples instead.
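As a minimal sketch of the signal model (2.1), the following Python snippet evaluates a multisinusoidal signal at $$n$$ contiguous integer locations. The function name `sample_multisinusoid`, the choice of sampling at $$l = 1, \ldots, n$$ and the example frequencies and amplitudes are illustrative assumptions, not part of the original text.

```python
import cmath

def sample_multisinusoid(freqs, amps, n):
    """Return [g(1), ..., g(n)] for g(t) = sum_j amps[j] * exp(i 2 pi freqs[j] t).

    freqs: frequencies f_j in the unit interval [0, 1]
    amps:  complex amplitudes x_j (same length as freqs)
    n:     number of contiguous samples (sampled at l = 1..n, an assumption)
    """
    return [
        sum(x * cmath.exp(2j * cmath.pi * f * l) for f, x in zip(freqs, amps))
        for l in range(1, n + 1)
    ]

# Hypothetical example: two spectral lines, eight samples.
g = sample_multisinusoid(freqs=[0.1, 0.35], amps=[1.0 + 0.0j, 0.5j], n=8)
print(len(g))
```

Only finitely many samples are produced here, which is exactly the regime in which the spectrum can no longer be recovered by sinc interpolation.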
Note that $$\left\{{g\left({l}\right), \; l \in \mathbb{Z}}\right\}$$ are the Fourier series coefficients of $$\mu$$, so mathematically we seek to recover an atomic measure from a subset of its Fourier coefficients.

As described in the introduction, we are interested in tackling this problem when a subset of the data is completely corrupted. These corruptions are modeled as additive impulsive noise, represented by a sparse vector $$\boldsymbol{z} \in \mathbb{C}^{n}$$ with $$s$$ non-zero entries. The data are consequently of the form

$$\boldsymbol{y}_l = g\left({l}\right) + \boldsymbol{z}_l, \quad 1 \leq l \leq n. \tag{2.3}$$

To represent the measurement model more compactly, we define an operator $$\mathcal{F}_{n}$$ that maps a measure to its first $$n$$ Fourier series coefficients,

$$\boldsymbol{y} = \mathcal{F}_{n} \, \mu + \boldsymbol{z}. \tag{2.4}$$

Intuitively, $$\mathcal{F}_{n}$$ maps the spectrum $$\mu$$ to $$n$$ regularly spaced samples of the signal $$g$$ in the time domain.

### 2.2 Minimum-separation condition

Even in the absence of any noise, the problem of recovering a signal from $$n$$ samples is vastly underdetermined: we can fill in the missing samples $$g\left({0}\right), g \left({-1}\right), \ldots$$ and $$g\left({n+1}\right), g \left({n+2}\right), \ldots$$ any way we like and then apply sinc interpolation to obtain an estimate that is consistent with the data. For the inverse problem to make sense, we need to leverage additional assumptions about the structure of the signal. In spectral super-resolution, the usual assumption is that the spectrum of the signal is sparse. This is reminiscent of compressed sensing [17], where signals are recovered robustly from randomized measurements by exploiting a sparsity prior. A crucial insight underlying compressed-sensing theory is that the randomized operator obeys the restricted isometry property (RIP), which ensures that the measurements preserve the energy of any sparse signal with high probability [18]. Unfortunately, this is not the case for our measurement operator of interest.
The reason is that signals consisting of clustered spectral lines may lie almost in the null space of the sampling operator, even if the number of spectral lines is small. Additional conditions beyond sparsity are necessary to ensure that the problem is well posed. To this end, we define the minimum separation of the support of a signal, as introduced in [12].

Definition 2.1 (Minimum separation) For a set of points $$T \subset \left[{0,1}\right]$$, the minimum separation (or minimum distance) is defined as the closest distance between any two elements from $$T$$,

$$\Delta\left({T}\right) = \inf_{\left({f_1, f_2}\right) \in T \, : \, f_1 \neq f_2} \left|{f_2 - f_1}\right|. \tag{2.5}$$

To be clear, this is the wrap-around distance so that the distance between $$f_1 = 0$$ and $$f_2 = 3/4$$ is equal to $$1/4$$.

If the minimum distance is too small with respect to the number of measurements, then it may be impossible to resolve a signal even under very small levels of noise. A fundamental limit in this sense is $${\it{\Delta}}^{\ast} := \frac{2}{n-1}$$, which is the width of the main lobe of the periodized sinc kernel that is convolved with the spectrum when we truncate the number of samples to $$n$$. This limit arises because for minimum separations just below $${\it{\Delta}}^{\ast} / 2$$ there exist signals that are almost suppressed by the sampling operator $$\mathcal{F}_{n}$$. If such a signal $$d$$ corresponds to the difference between two different signals $$s_1$$ and $$s_2$$ so that $$s_1 - s_2 = d$$, it will be very challenging to distinguish $$s_1$$ and $$s_2$$ from the available data.$$^{1}$$ This phenomenon can be characterized theoretically in an asymptotic setting using Slepian’s prolate-spheroidal sequences [58] (see also Section 3.2 in [12]). More recently, Theorem 1.3 of [49] provides a non-asymptotic analysis, and other works have obtained lower bounds on the minimum separation necessary for convex-programming approaches to succeed [33,64].
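The wrap-around distance in Definition 2.1 is easy to get wrong in practice, so here is a minimal Python sketch of how $$\Delta(T)$$ could be computed. The helper name `minimum_separation` and the convention of returning infinity for a single-point support are assumptions made for illustration; the example separation $$2.8/(n-1)$$ with $$n = 101$$ is the one quoted in the caption of Fig. 1.

```python
def minimum_separation(support):
    """Wrap-around minimum separation of a set of frequencies in [0, 1).

    Sorting the points reduces the problem to checking consecutive gaps,
    plus the gap that wraps around from the last point to the first.
    """
    pts = sorted(set(f % 1.0 for f in support))
    if len(pts) < 2:
        return float("inf")  # convention assumed here: infimum over an empty set
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(pts[0] + 1.0 - pts[-1])  # wrap-around gap
    return min(gaps)

# Example from the text: the distance between 0 and 3/4 is 1/4.
print(minimum_separation([0.0, 0.75]))

# Compare a separation of 2.8/(n-1) with the limit Delta* = 2/(n-1) for n = 101.
n = 101
delta_star = 2.0 / (n - 1)
print(minimum_separation([0.2, 0.2 + 2.8 / (n - 1)]) > delta_star)
```

The sort-then-scan approach costs $$O(k \log k)$$ operations for $$k$$ frequencies, compared with $$O(k^2)$$ for checking all pairs.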
### 2.3 Robust spectral super-resolution via convex programming

Spectral super-resolution in the presence of outliers boils down to estimating $$\mu$$ and $$\boldsymbol{z}$$ in the mixture model (2.4). Without additional constraints, this is not very ambitious: data consistency is trivially achieved, for instance, by setting the sines to zero and declaring every sample to be a spike. Our goal is to fit the two components in the simplest way possible, i.e. so that the spectrum of the multisinusoidal signal (the sines) is restricted to a small number of frequencies and the impulsive noise (the spikes) only affects a small subset of the data.

Many modern signal processing methods rely on the design of cost functions that (1) encode prior knowledge about signal structure and (2) can be minimized efficiently. In particular, penalizing the $$\ell_1$$ norm is an efficient and robust method for obtaining sparse estimates in denoising [25], regression [69] and inverse problems such as compressed sensing [19,28]. In order to fit a mixture model where both the spikes and the spectrum of the sines are sparse, we propose minimizing a cost function that penalizes the $$\ell_1$$ norm of both components (or rather a continuous counterpart of the $$\ell_1$$ norm in the case of the spectrum, as we explain below). We would like to note that this approach was introduced by some of the authors of the present paper in [38,68], but without any theoretical analysis, and applied to multiple target tracking from radar measurements in [75]. Similar ideas have been previously leveraged to separate low-rank and sparse matrices [14,23], perform compressed sensing from corrupted data [44] and demix signals that are sparse in different bases [48].

Recall that the spectrum of the sinusoidal component in our mixture model is modeled as a measure that is supported on a continuous interval. Its $$\ell_1$$ norm is therefore not well defined.
In order to promote sparsity in the estimate, we resort instead to a continuous version of the $$\ell_1$$ norm: the total variation (TV) norm.$$^{2}$$ If we consider the space of measures supported on the unit interval, this norm is dual to the infinity norm, so that

$$\left|\left|{\nu}\right|\right|_{\mathrm{TV}} := \sup_{\left|\left|{h}\right|\right|_{\infty} \leq 1, \; h \in C\left({\mathbb{T}}\right)} \mathrm{Re}\left[{\int_{\mathbb{T}} \overline{h\left({f}\right)} \, \nu\left({\mathrm{d}f}\right)}\right], \tag{2.6}$$

for any measure $$\nu$$ (for a different definition see Section A in the Appendix of [12]). In the case of a superposition of Dirac deltas as in (2.2), the TV norm is equal to the $$\ell_1$$ norm of the coefficients, i.e. $$\left|\left|{ \mu }\right|\right| _{\mathrm{TV}}=\left|\left|{ \boldsymbol{x}}\right|\right| _{1}$$. Spectral super-resolution via TV norm minimization, introduced in [12,26] (see also [11]), has been shown to achieve exact recovery under a minimum separation of $$\frac{2.52}{ n-1 }$$ in [38] and to be robust to missing data in [66].

Our proposed method minimizes the sum of the $$\ell_1$$ norm of the spikes and the TV norm of the spectrum of the sines subject to a data-consistency constraint:

$$\min_{\tilde{\mu}, \, \tilde{\boldsymbol{z}}} \; \left|\left|{\tilde{\mu}}\right|\right|_{\mathrm{TV}} + \lambda \left|\left|{\tilde{\boldsymbol{z}}}\right|\right|_{1} \quad \text{subject to} \quad \mathcal{F}_{n} \, \tilde{\mu} + \tilde{\boldsymbol{z}} = \boldsymbol{y}. \tag{2.7}$$

Here $$\lambda > 0$$ is a regularization parameter that governs the weight of each penalty term. This optimization program is convex. Section 4.1 explains how to solve it by reformulating its dual as an SDP. Our main theoretical result is that solving (2.7) achieves perfect demixing with high probability under certain assumptions.

Theorem 2.2 (Proof in Section 3) Suppose that we observe $$n$$ samples of the form

$$\boldsymbol{y} = \mathcal{F}_{n} \, \mu + \boldsymbol{z}, \tag{2.8}$$

where each entry in $$\boldsymbol{z}$$ is non-zero with probability $$\frac{s}{n}$$ (independently of each other) and the support $$T:=\left\{{f_1, \ldots, f_k}\right\}$$ of

$$\mu := \sum_{j=1}^{k} \boldsymbol{x}_j \, \delta\left({f - f_j}\right), \tag{2.9}$$

has a minimum separation lower bounded by

$$\Delta_{\min} := \frac{2.52}{n-1}. \tag{2.10}$$

If the phases of the entries in $$\boldsymbol{x} \in \mathbb{C}^{k}$$ and the non-zero entries in $$\boldsymbol{z} \in \mathbb{C}^{n}$$ are i.i.d. 
random variables uniformly distributed in $$\left[{0,2\pi}\right]$$, then the solution to Problem (2.7) with $$\lambda = 1/\sqrt{n}$$ is exactly equal to $$\mu$$ and $$\boldsymbol{z}$$ with probability $$1-\epsilon$$ for any $$\epsilon>0$$ as long as

$$k \leq C_k \left({\log \frac{n}{\epsilon}}\right)^{-2} n, \tag{2.11}$$

$$s \leq C_s \left({\log \frac{n}{\epsilon}}\right)^{-2} n, \tag{2.12}$$

for fixed numerical constants $$C_k$$ and $$C_s$$ and $$n \geq 2 \times 10^3$$.

The theorem guarantees that our method is able to super-resolve a number of spectral lines that is proportional to the number of measurements, even if the data contain a constant fraction of outliers, up to logarithmic factors. The proof is presented in Section 3; it is based on the construction of a random trigonometric polynomial that certifies exact demixing. Our result is non-asymptotic and holds with high probability under several assumptions, which we now discuss in more detail.

The support of the sparse corruptions follows a Bernoulli model, where each entry is non-zero with probability $$s/n$$ independently of the others. This model is essentially equivalent to choosing the support of the outliers uniformly at random from all possible subsets of cardinality $$s$$, as shown in Section 7.1 of [14] (see also [17, Section 2.3] and [20, Section 8.1]).

The phases of the amplitudes of the spectral lines are assumed to be i.i.d. uniform random variables (note, however, that the amplitudes can take any value). Modeling the phase of the spectral components of a multisinusoidal signal in this way is a common assumption in signal processing, see, for example, [62, Chapter 4.1].

The phases of the amplitudes of the additive corruptions are also assumed to be i.i.d. uniform random variables (the amplitudes can again take any value). If we constrain the corruptions to be real, the derandomization argument in [14, Section 2.2] makes it possible to obtain guarantees for arbitrary sign patterns.
We have already discussed the minimum-separation condition on the spectrum of the multisinusoidal component in Section 2.2.

Our assumptions model a non-adversarial situation where the outliers are not designed to cancel out the samples from the multisinusoidal signal. In the absence of any such assumption, it is possible to concoct instances for which the demixing problem is ill posed, even if the number of spectral lines and outliers is small. We illustrate this with a simple example, based on the picket-fence sequence used as an extremal function for signal-decomposition uncertainty principles in [29,30]. Consider $$k'$$ spectral lines with equal amplitudes and an equispaced support

$$\mu' := \frac{1}{k'} \sum_{j=0}^{k'-1} \delta\left({f - j/k'}\right). \tag{2.13}$$

The samples of the corresponding multisinusoidal signal $$g'$$ are zero except at multiples of $$k'$$,

$$g'\left({l}\right) = \begin{cases} 1 & \text{if } l/k' \in \mathbb{Z}, \\ 0 & \text{otherwise}. \end{cases} \tag{2.14}$$

If we choose the corruptions $$\boldsymbol{z'}$$ to cancel out these non-zero samples,

$$\boldsymbol{z'}_l = \begin{cases} -1 & \text{if } l/k' \in \mathbb{Z}, \\ 0 & \text{otherwise}, \end{cases} \tag{2.15}$$

then the corresponding measurements are all equal to zero! For these data, the demixing problem is obviously impossible to solve by any method. Set $$k':= \sqrt{n}$$ so that the number of measurements $$n$$ equals $$\left({k'}\right)^2$$. Then the number of outliers is just $$n/k' = \sqrt{n}$$ and the minimum separation between the spectral lines is $$1/\sqrt{n}$$, which amply satisfies the minimum-separation condition (2.10). This shows that additional assumptions beyond the minimum-separation condition are necessary for the inverse problem to make sense. A related phenomenon arises in compressed sensing, where random measurement schemes avoid similar adversarial situations (see [17, Section 1.3] and [70]). An interesting subject for future research is whether it is possible to establish the guarantees for exact demixing provided by Theorem 2.2 without random assumptions on the phase of the different components, or if these assumptions are necessary for the demixing problem to be well posed.
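The picket-fence counterexample in (2.13)–(2.15) can be checked numerically. The following Python sketch, under the assumption that samples are taken at $$l = 1, \ldots, n$$ with $$n = (k')^2$$ as in the measurement model (2.3), builds the samples of $$g'$$, the adversarial corruptions $$\boldsymbol{z'}$$ and the resulting data $$\boldsymbol{y} = \boldsymbol{g'} + \boldsymbol{z'}$$. The function name `picket_fence_measurements` is hypothetical.

```python
import cmath

def picket_fence_measurements(k_prime):
    """Build the picket-fence example with n = k_prime**2 samples at l = 1..n.

    Returns (g, z, y): samples of the multisinusoidal signal, the adversarial
    sparse corruption, and their sum (the observed data).
    """
    n = k_prime ** 2
    g, z = [], []
    for l in range(1, n + 1):
        # Samples of mu' = (1/k') * sum_j delta(f - j/k'): a sum of roots of unity.
        g_l = sum(
            cmath.exp(2j * cmath.pi * l * j / k_prime) for j in range(k_prime)
        ) / k_prime
        g.append(g_l)
        # Corruption that cancels the non-zero samples, located at multiples of k'.
        z.append(-1.0 if l % k_prime == 0 else 0.0)
    y = [g_l + z_l for g_l, z_l in zip(g, z)]
    return g, z, y

g, z, y = picket_fence_measurements(k_prime=4)
print(max(abs(v) for v in y))       # numerically zero: the data carry no information
print(sum(1 for v in z if v != 0))  # number of outliers, equal to k' = sqrt(n)
```

With only $$\sqrt{n}$$ outliers the data vanish up to floating-point error, which is why some non-adversarial assumption on the corruptions is needed.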
### 2.4 Regularization parameter

A question of practical importance is whether the performance of our demixing method is robust to the choice of the regularization parameter $$\lambda$$ in Problem (2.7). Theorem 2.2 indicates that this is the case in the following sense. If we set $$\lambda$$ to a fixed value that is proportional to $$1/\sqrt{n}$$,$$^{3}$$ then exact demixing occurs for a number of spectral lines $$k$$ and a number of outliers $$s$$ that range from zero to a certain maximum value proportional to $$n$$ (up to logarithmic factors). In this section, we provide additional theoretical evidence for the robustness of our method to the choice of $$\lambda$$. If exact recovery occurs for a certain pair $$\left\{{\mu, \boldsymbol{z}}\right\}$$ and a certain $$\lambda$$, then it will also succeed for any trimmed version $$\left\{{\mu', \boldsymbol{z'}}\right\}$$ (obtained by removing some elements of the support of $$\mu$$ or $$\boldsymbol{z}$$, or both) for the same value of $$\lambda$$.

Lemma 2.3 (Proof in Section A) Let $$\boldsymbol{z}$$ be a vector with support $${\it{\Omega}}$$ and let $$\mu$$ be an arbitrary measure such that

$$\boldsymbol{y} = \mathcal{F}_{n} \, \mu + \boldsymbol{z}. \tag{2.16}$$

Assume that the pair $$\left\{{\mu,\boldsymbol{z}}\right\}$$ is the unique solution to Problem (2.7) and consider the data

$$\boldsymbol{y'} = \mathcal{F}_{n} \, \mu' + \boldsymbol{z'}. \tag{2.17}$$

Here $$\mu'$$ is a trimmed version of $$\mu$$: it is equal to $$\mu$$ on a subset of its support $$T' \subseteq T$$ and is zero everywhere else. Similarly, $$\boldsymbol{z'}$$ equals $$\boldsymbol{z}$$ on a subset of entries $${\it{\Omega}}' \subseteq {\it{\Omega}}$$ and is zero otherwise. For any choice of $$T'$$ and $${\it{\Omega}}'$$, $$\left\{{\mu',\boldsymbol{z'}}\right\}$$ is the unique solution to Problem (2.7) if we set the data vector to equal $$\boldsymbol{y'}$$ for the same value of $$\lambda$$.

This result and its proof are inspired by Theorem 2.2 in [14].
As illustrated by Figs 12 and 13, our numerical experiments corroborate the lemma: we consistently observe that if exact demixing occurs for most signals with a certain number of spectral lines and outliers, then it also occurs for most signals with fewer spectral lines and fewer corruptions (as long as the minimum separation is the same) for a fixed value of $$\lambda$$.

### 2.5 Stability to dense perturbations

One of the advantages of our optimization-based framework is that we can account for additional assumptions on the problem structure by modifying either the cost function or the constraints of the optimization problem used to perform demixing. In most applications of spectral super-resolution, the data will deviate from the multisinusoidal model (2.1) because of measurement noise and other perturbations, even in the absence of outliers. We model such deviations as a dense additive perturbation $$\boldsymbol{w}$$, such that $$\left|\left|{ \boldsymbol{w}}\right|\right| _{2} \leq \sigma$$ for a certain noise level $$\sigma$$,

$$\boldsymbol{y} = \mathcal{F}_{n} \, \mu + \boldsymbol{z} + \boldsymbol{w}. \tag{2.18}$$

Problem (2.7) can be adapted to this measurement model by relaxing the equality constraint that enforces data consistency to an inequality that takes into account the noise level,

$$\min_{\tilde{\mu}, \, \tilde{\boldsymbol{z}}} \; \left|\left|{\tilde{\mu}}\right|\right|_{\mathrm{TV}} + \lambda \left|\left|{\tilde{\boldsymbol{z}}}\right|\right|_{1} \quad \text{subject to} \quad \left|\left|{\boldsymbol{y} - \mathcal{F}_{n} \, \tilde{\mu} - \tilde{\boldsymbol{z}}}\right|\right|_{2} \leq \sigma. \tag{2.19}$$

Just like Problem (2.7), this optimization problem can be solved by recasting its dual as a tractable SDP, as we explain in detail in Section 4.2.

### 2.6 Atomic-norm denoising

Our demixing method is closely related to atomic-norm denoising of multisinusoidal samples. Consider the $$n$$-dimensional vector $$\boldsymbol{g} := \mathcal{F}_n \, \mu$$ containing clean samples from a signal $$g$$ defined by (2.1).
The assumption that the spectrum $$\mu$$ of $$g$$ consists of $$k$$ spectral lines is equivalent to $$\boldsymbol{g}$$ having a sparse representation in an infinite dictionary of $$n$$-dimensional sinusoidal atoms $$\boldsymbol{a} \left({f, \phi}\right) \in \mathbb{C}^{n}$$ parameterized by frequency $$f \in [0, 1)$$ and phase $$\phi \in [0, 2\pi)$$,

$$\boldsymbol{a}\left({f, \phi}\right)_l := \frac{1}{\sqrt{n}} e^{i \phi} e^{i 2 \pi l f}, \quad 1 \leq l \leq n. \tag{2.20}$$

Indeed, $$\boldsymbol{g}$$ can be expressed as a linear combination of $$k$$ atoms

$$\boldsymbol{g} = \sqrt{n} \sum_{j=1}^{k} \left|{\boldsymbol{x}_j}\right| \boldsymbol{a}\left({f_j, \phi_j}\right), \quad \boldsymbol{x}_j := \left|{\boldsymbol{x}_j}\right| e^{i \phi_j}. \tag{2.21}$$

This representation can be leveraged in an optimization framework using the atomic norm, an idea introduced in [22] and first applied to spectral super-resolution in [4]. The atomic norm induced by a set of atoms $$\mathcal{A}$$ is equal to the gauge of $$\mathcal{A}$$ defined by

$$\left|\left|{\boldsymbol{u}}\right|\right|_{\mathcal{A}} := \inf \left\{{t > 0 \, : \, \boldsymbol{u} \in t \, \mathrm{conv}\left({\mathcal{A}}\right)}\right\}, \tag{2.22}$$

which is a norm as long as $$\mathcal{A}$$ is centrally symmetric around the origin (as is the case for (2.20)). Geometrically, the unit ball of the atomic norm is the convex hull of the atoms in $$\mathcal{A}$$, just like the $$\ell_1$$ norm ball is the convex hull of unit-norm one-sparse vectors. As a result, signals consisting of a small number of atoms tend to have a smaller atomic norm (just like sparse vectors tend to have a smaller $$\ell_1$$ norm).

Consider the problem of denoising the samples of $$g$$ from corrupted data of the form (2.4),

$$\boldsymbol{y} = \boldsymbol{g} + \boldsymbol{z}. \tag{2.23}$$

To be clear, the aim is now to separate $$\boldsymbol{g}$$ from the corruption vector $$\boldsymbol{z}$$ instead of directly estimating the spectrum of $$\boldsymbol{g}$$. In order to demix the two signals, we penalize the atomic norm of the multisinusoidal component and the $$\ell_1$$ norm of the sparse component,

$$\min_{\tilde{\boldsymbol{g}}, \, \tilde{\boldsymbol{z}}} \; \frac{1}{\sqrt{n}} \left|\left|{\tilde{\boldsymbol{g}}}\right|\right|_{\mathcal{A}} + \lambda \left|\left|{\tilde{\boldsymbol{z}}}\right|\right|_{1} \quad \text{subject to} \quad \tilde{\boldsymbol{g}} + \tilde{\boldsymbol{z}} = \boldsymbol{y}, \tag{2.24}$$

where $$\lambda > 0$$ is a regularization parameter. Problems (2.7) and (2.24) are closely related.
Their convex cost functions are designed to exploit sparsity assumptions on the spectrum of $$g$$ and on the corruption vector $$\boldsymbol{z}$$ in ways that are essentially equivalent. More formally, both problems have the same dual, as implied by the following lemma and Lemma 4.1.

Lemma 2.4 (Proof in Section B.1) The dual of Problem (2.24) is

$$\max_{\boldsymbol{\eta} \in \mathbb{C}^{n}} \; \left \langle{ \boldsymbol{y}}, { \boldsymbol{\eta}}\right \rangle \quad \text{subject to} \quad \left|\left|{\mathcal{F}_{n}^{\ast} \, \boldsymbol{\eta}}\right|\right|_{\infty} \leq 1, \tag{2.25}$$

$$\left|\left|{\boldsymbol{\eta}}\right|\right|_{\infty} \leq \lambda, \tag{2.26}$$

where the inner product is defined as $$\left \langle{ \boldsymbol{y}}, { \boldsymbol{\eta}}\right \rangle : = \mathrm{Re}\left({\boldsymbol{y}^{\ast}\boldsymbol{\eta}}\right)$$.

The fact that the two optimization problems share the same dual has an important consequence established in Section B.2: the same dual certificate can be used to prove that they achieve exact demixing. As a result, the proof of Theorem 2.2 immediately implies that solving Problem (2.24) is successful in separating $$\boldsymbol{g}$$ and $$\boldsymbol{z}$$ under the conditions described in Section 2.3.

Corollary 2.5 (Proof in Section B.2) Under the assumptions of Theorem 2.2, $$\boldsymbol{g} := \mathcal{F}_n \, \mu$$ and $$\boldsymbol{z}$$ are the unique solutions to Problem (2.24).

Problem (2.24) can be adapted to denoise data that are perturbed by both outliers and dense noise, following the measurement model (2.18). Inspired by previous work on line-spectra denoising via atomic-norm minimization [4,65], we remove the equality constraint and add a regularization term to ensure consistency with the data,

$$\min_{\tilde{\boldsymbol{g}}, \, \tilde{\boldsymbol{z}}} \; \frac{1}{\sqrt{n}} \left|\left|{\tilde{\boldsymbol{g}}}\right|\right|_{\mathcal{A}} + \lambda \left|\left|{\tilde{\boldsymbol{z}}}\right|\right|_{1} + \frac{\gamma}{2} \left|\left|{\boldsymbol{y} - \tilde{\boldsymbol{g}} - \tilde{\boldsymbol{z}}}\right|\right|_{2}^{2}, \tag{2.27}$$

where $$\gamma > 0$$ is a regularization parameter with a role analogous to $$\sigma$$ in Problem (2.19). In Section 4.4, we discuss how to implement atomic-norm denoising by reformulating Problems (2.24) and (2.27) as SDPs.
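To make the atomic representation in (2.20) and (2.21) concrete, the Python sketch below builds sinusoidal atoms and checks that their weighted sum reproduces the samples of the signal. It assumes the atoms are normalized by $$1/\sqrt{n}$$ (so that each has unit $$\ell_2$$ norm) and that the phase $$\phi_j$$ is the argument of $$\boldsymbol{x}_j$$; the function names `atom` and `samples_from_atoms` and the example values are hypothetical.

```python
import cmath
import math

def atom(f, phi, n):
    """Sinusoidal atom a(f, phi) with entries exp(i phi) exp(i 2 pi l f) / sqrt(n), l = 1..n."""
    return [
        cmath.exp(1j * phi) * cmath.exp(2j * math.pi * l * f) / math.sqrt(n)
        for l in range(1, n + 1)
    ]

def samples_from_atoms(freqs, amps, n):
    """Assemble g = sqrt(n) * sum_j |x_j| a(f_j, phi_j), with phi_j the phase of x_j."""
    g = [0j] * n
    for f, x in zip(freqs, amps):
        # cmath.phase returns a value in (-pi, pi], equivalent to [0, 2 pi) modulo 2 pi.
        a = atom(f, cmath.phase(x), n)
        for l in range(n):
            g[l] += math.sqrt(n) * abs(x) * a[l]
    return g

# Hypothetical example: compare with direct evaluation of the signal model.
freqs, amps, n = [0.1, 0.4], [2.0 + 0.0j, -1.0j], 16
g_atoms = samples_from_atoms(freqs, amps, n)
g_direct = [
    sum(x * cmath.exp(2j * math.pi * f * l) for f, x in zip(freqs, amps))
    for l in range(1, n + 1)
]
print(max(abs(a - b) for a, b in zip(g_atoms, g_direct)))  # close to machine precision
```

The non-negative weights $$\sqrt{n} \, |\boldsymbol{x}_j|$$ are what the atomic norm penalizes, which mirrors the $$\ell_1$$ norm of the amplitudes in the TV-norm formulation.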
### 2.7 Related work

Most previous works analyzing the problem of demixing sines and spikes make the assumption that the frequencies of the sinusoidal component lie on a grid with step size $$1/n$$, where $$n$$ is the number of samples. In that case, demixing reduces to a discrete sparse decomposition problem in a dictionary formed by the concatenation of an identity and a discrete Fourier transform matrix [30]. Bounds on the coherence of this dictionary can be used to derive guarantees for basis pursuit [29] and also techniques based on Prony’s method [31]. Coherence-based bounds do not reflect the fact that most sparse subsets of the dictionary are well conditioned [70], which can be exploited to obtain stronger guarantees for $$\ell_1$$ norm-based methods under random assumptions [44,63]. In this paper, we depart from this previous literature by considering a sinusoidal component whose spectrum may lie on arbitrary points of the unit interval.

Our work draws from recent developments on the super-resolution of point sources and line spectra via convex optimization. In [12] (see also [26]), the authors establish that TV minimization achieves exact recovery of measures satisfying a minimum separation of $$\frac{4}{n-1}$$, a result that is sharpened to $$\frac{2.52}{n-1}$$ in [38]. In [66] the method is adapted to a compressed-sensing setting, where a large fraction of the measurements may be missing. The proof of Theorem 2.2 builds upon the techniques developed in [12,38,66]. We would like to point out that stability guarantees for TV norm-based approaches established in subsequent works [1,13,33,37,65] hold only for small perturbations, and do not apply when the data may be perturbed by sparse noise of arbitrary amplitude, as is the case in this paper.
In [24], a spectral super-resolution approach based on robust low-rank matrix recovery is shown to be robust to outliers under some incoherence assumptions, which are empirically related to our minimum-separation condition (see Section A in [24]). Ignoring logarithmic factors, the guarantees in [24] allow for exact denoising of up to $$\mathcal{O}\left({\sqrt{n}}\right)$$ spectral lines in the presence of $$\mathcal{O}\left({n}\right)$$ outliers, where $$n$$ is the number of measurements. Corollary 2.5, which follows from our main result Theorem 2.2, establishes that our approach succeeds in denoising up to $$\mathcal{O}\left({n}\right)$$ spectral lines also in the presence of $$\mathcal{O}\left({n}\right)$$ outliers (again ignoring logarithmic factors). In Section 5.2, we compare both techniques empirically. Finally, we would like to mention another method exploiting optimization and low-rank matrix structure [74] and an alternative approach to gridless spectral super-resolution [60], which has been recently adapted to account for missing data and impulsive noise [73]. In both cases, no theoretical results guaranteeing exact recovery in the presence of outliers are provided.

3. Proof of Theorem 2.2

3.1 Dual polynomial

We prove Theorem 2.2 by constructing a trigonometric polynomial whose existence certifies that solving Problem (2.7) achieves exact demixing. We refer to this object as a dual polynomial, because its vector of coefficients is a solution to the dual of Problem (2.7). This vector is known as a dual certificate in the compressed-sensing literature [17].

Proposition 3.1 (Proof in Section C) Let $$T \subset \left[{0,1}\right]$$ be the non-zero support of $$\mu$$ and $${\it{\Omega}} \subset \left\{{1,2,\ldots,n}\right\}$$ the non-zero support of $$\boldsymbol{z}$$.
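If there exists a trigonometric polynomial of the form $$Q(f) = \mathcal{F}_n^{\ast} \, \boldsymbol{q}$$ (3.1) $$= \sum_{j=1}^{n} q_j e^{-i 2 \pi j f},$$ (3.2) which satisfies $$Q(f_j) = \frac{x_j}{|x_j|}, \quad \forall f_j \in T,$$ (3.3) $$|Q(f)| < 1, \quad \forall f \in T^c,$$ (3.4) $$q_l = \lambda \frac{z_l}{|z_l|}, \quad \forall l \in {\it{\Omega}},$$ (3.5) $$|q_l| < \lambda, \quad \forall l \in {\it{\Omega}}^c,$$ (3.6) then $$\left({\mu,\boldsymbol{z}}\right)$$ is the unique solution to Problem (2.7) as long as $$k+s \leq n$$.

The dual polynomial can be interpreted as a subgradient of the TV norm at the measure $$\mu$$, in the sense that $$\left|\left|{\mu + \nu}\right|\right| _{\mathrm{TV}} \geq \left|\left|{\mu}\right|\right| _{\mathrm{TV}} + \left \langle{Q}, {\nu}\right \rangle, \qquad \left \langle{Q}, {\nu}\right \rangle := \mathrm{Re}\left[{\int_{[0,1]} \overline{Q(f)} \, \mathrm{d}\nu(f)}\right],$$ (3.7) for any measure $$\nu$$ supported in the unit interval. In addition, weighting the coefficients of $$Q$$ by $$1/\lambda$$ yields a subgradient of the $$\ell_1$$ norm at the vector $$\boldsymbol{z}$$. This means that for any other feasible pair $$\left({ \mu', \boldsymbol{z}'}\right)$$ such that $$\boldsymbol{y} = \mathcal{F}_n \, \mu' + \boldsymbol{z}'$$ $$\left|\left|{\mu'}\right|\right| _{\mathrm{TV}} + \lambda \left|\left|{\boldsymbol{z}'}\right|\right| _{1} \geq \left|\left|{\mu}\right|\right| _{\mathrm{TV}} + \left \langle{Q}, {\mu' - \mu}\right \rangle + \lambda \left|\left|{\boldsymbol{z}}\right|\right| _{1} + \lambda \left \langle{\tfrac{1}{\lambda} \boldsymbol{q}}, {\boldsymbol{z}' - \boldsymbol{z}}\right \rangle$$ (3.8) $$\geq \left|\left|{\mu}\right|\right| _{\mathrm{TV}} + \left \langle{\mathcal{F}_n^{\ast} \, \boldsymbol{q}}, {\mu' - \mu}\right \rangle + \lambda \left|\left|{\boldsymbol{z}}\right|\right| _{1} + \left \langle{\boldsymbol{q}}, {\boldsymbol{z}' - \boldsymbol{z}}\right \rangle$$ (3.9) $$= \left|\left|{\mu}\right|\right| _{\mathrm{TV}} + \lambda \left|\left|{\boldsymbol{z}}\right|\right| _{1} + \left \langle{\boldsymbol{q}}, {\mathcal{F}_n \left({\mu' - \mu}\right) + \boldsymbol{z}' - \boldsymbol{z}}\right \rangle$$ (3.10) $$= \left|\left|{\mu}\right|\right| _{\mathrm{TV}} + \lambda \left|\left|{\boldsymbol{z}}\right|\right| _{1} \quad \text{since } \mathcal{F}_n \, \mu' + \boldsymbol{z}' = \mathcal{F}_n \, \mu + \boldsymbol{z}.$$ (3.11) The existence of $$Q$$ thus implies that $$\left({\mu,\boldsymbol{z}}\right)$$ is a solution to Problem (2.7). In fact, as stated in Proposition 3.1, it implies that $$\left({\mu,\boldsymbol{z}}\right)$$ is the unique solution. The rest of this section is devoted to showing that a dual polynomial exists with high probability, as formalized by the following proposition.

Proposition 3.2 (Existence of dual polynomial) Under the assumptions of Theorem 2.2, there exists a dual polynomial associated to $$\mu$$ and $$\boldsymbol{z}$$ with probability at least $$1-\epsilon$$.

In order to simplify notation in the sequel, we define the vectors $$\boldsymbol{h} \in \mathbb{C}^{k}$$ and $$\boldsymbol{r} \in \mathbb{C}^{s}$$ and an integer $$m$$ such that $$h_j := \frac{x_j}{|x_j|}, \quad 1 \leq j \leq k,$$ (3.12) $$r_l := \frac{z_l}{|z_l|}, \quad l \in {\it{\Omega}},$$ (3.13) $$m := \begin{cases} \frac{n-1}{2} & \text{if } n \text{ is odd}, \\ \frac{n}{2} - 1 & \text{if } n \text{ is even}. \end{cases}$$ (3.14) Applying a simple change of variable, we express $$Q$$ as $$Q(f) = \sum_{l=-m}^{m} q_l e^{-i 2 \pi l f}.$$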
(3.15) In a nutshell, our goal is (1) to construct a polynomial of this form so that $$Q$$ interpolates $$\boldsymbol{h}$$ on $$T$$ and $$\boldsymbol{q}$$ interpolates $$\lambda \boldsymbol{r}$$ on $${\it{\Omega}}$$, and (2) to verify that the magnitude of $$Q$$ is strictly bounded by one on $$T^c$$ and the magnitude of $$\boldsymbol{q}$$ is strictly bounded by $$\lambda$$ on $${\it{\Omega}}^c$$.

3.2 Construction via interpolation

We now take a brief detour to introduce a basic technique for the construction of dual polynomials. Consider the spectral super-resolution problem when the data are of the form $$\boldsymbol{\bar{y}} := \mathcal{F}_n \, \mu$$, i.e. when there are no outliers. A simple corollary to Proposition 3.1 is that the existence of a dual polynomial of the form $$\bar{Q}(f) = \sum_{l=-m}^{m} \bar{q}_l e^{-i 2 \pi l f}$$ (3.16) such that $$\bar{Q}(f_j) = h_j, \quad \forall f_j \in T,$$ (3.17) $$|\bar{Q}(f)| < 1, \quad \forall f \in T^c,$$ (3.18) implies that TV norm minimization achieves exact recovery in the absence of noise. In this section, we describe how to construct such a polynomial using interpolation. This technique was introduced in [12] to obtain guarantees for super-resolution under a minimum-separation condition. The basic idea is to use a kernel $$\bar{ K }$$ and its derivative $$\bar{ K}^{\left({1}\right)}$$ to interpolate $$\boldsymbol{h}$$ while forcing the derivative of the polynomial to equal zero on $$T$$. Setting the derivative to zero induces a local extremum, which ensures that the magnitude of the polynomial stays below one in the vicinity of $$T$$ (see Fig. 11 in [38] for an illustration). More formally, $$\bar{Q}(f) := \sum_{j=1}^{k} \bar{\alpha}_j \bar{K}(f - f_j) + \kappa \sum_{j=1}^{k} \bar{\beta}_j \bar{K}^{\left({1}\right)}(f - f_j),$$ (3.19) where $$\kappa := \frac{1}{\sqrt{\left|{\bar{K}^{\left({2}\right)}(0)}\right|}}$$ (3.20) is the inverse of the square root of the magnitude of the second derivative of the kernel at the origin. This quantity will appear often in the proof to simplify notation.
$$\boldsymbol{\bar{\alpha}} \in \mathbb{C}^{k}$$ and $$\boldsymbol{\bar{\beta}} \in \mathbb{C}^{k}$$ are coefficient vectors set so that $$\bar{Q}(f_j) = h_j, \quad f_j \in T,$$ (3.21) $$\bar{Q}_R^{\left({1}\right)}(f_j) + i \bar{Q}_I^{\left({1}\right)}(f_j) = 0, \quad f_j \in T,$$ (3.22) where $$\bar{ Q}_R^{\left({1}\right)}$$ denotes the real part of $$\bar{Q}^{\left({1}\right)}$$ and $$\bar{ Q}_I^{\left({1}\right)}$$ the imaginary part. In matrix form, $$\boldsymbol{\bar{\alpha}}$$ and $$\boldsymbol{\bar{\beta}}$$ are the solution to the system $$\begin{bmatrix} \bar{D}_0 & \bar{D}_1 \\ -\bar{D}_1 & \bar{D}_2 \end{bmatrix} \begin{bmatrix} \boldsymbol{\bar{\alpha}} \\ \boldsymbol{\bar{\beta}} \end{bmatrix} = \begin{bmatrix} \boldsymbol{h} \\ \boldsymbol{0} \end{bmatrix},$$ (3.23) where $$\left({\bar{D}_0}\right)_{jl} = \bar{K}(f_j - f_l), \quad \left({\bar{D}_1}\right)_{jl} = \kappa \bar{K}^{\left({1}\right)}(f_j - f_l), \quad \left({\bar{D}_2}\right)_{jl} = -\kappa^2 \bar{K}^{\left({2}\right)}(f_j - f_l).$$ (3.24) In [12], $$\bar{Q}$$ is shown to be a valid dual polynomial for a minimum separation equal to $$\frac{4}{n-1}$$ when the interpolation kernel is a squared Fejér kernel. The required minimum separation is sharpened to $$\frac{2.52}{n-1}$$ in [38] using a different kernel, which will be our choice for $$\bar{K}$$ in this paper. Consider the Dirichlet kernel of order $$\tilde{m} >0$$ $$\mathcal{D}_{\tilde{m}}(f) := \frac{1}{2\tilde{m}+1} \sum_{l=-\tilde{m}}^{\tilde{m}} e^{i 2 \pi l f} = \begin{cases} 1 & \text{if } f = 0, \\ \frac{\sin\left({(2\tilde{m}+1) \pi f}\right)}{(2\tilde{m}+1) \sin\left({\pi f}\right)} & \text{otherwise}. \end{cases}$$ (3.25) Following [38], we define $$\bar{K}$$ as the product of three different Dirichlet kernels with different orders $$\bar{K}(f) := \mathcal{D}_{0.247 \, m}(f) \, \mathcal{D}_{0.339 \, m}(f) \, \mathcal{D}_{0.414 \, m}(f)$$ (3.26) $$= \sum_{l=-m}^{m} c_l e^{i 2 \pi l f},$$ (3.27) where $$\boldsymbol{c} \in \mathbb{C}^{n}$$ is the convolution of the Fourier coefficients of the three Dirichlet kernels. The choice of the width of the three kernels might seem rather arbitrary; it is chosen to optimize the bound on the minimum separation by achieving a good trade-off between the spikiness of $$\bar{K}$$ in the vicinity of the origin and its asymptotic decay [38]. For simplicity, we assume that $${0.247} \, m$$, $${0.339} \, m$$ and $${0.414} \, m$$ are all integers. Figure 3 shows $$\bar{K}$$ and its first derivative.

Fig. 3. The top row shows the interpolating kernel $$K$$ and $$K^{\left({1}\right)}$$ compared with a scaled version of $$\bar{K}$$ and $$\bar{K}^{\left({1}\right)}$$.
In the second row, we see the asymptotic decay of the magnitudes of both kernels and their derivatives. The left image in the bottom row illustrates the construction of $$K$$: the Fourier coefficients $$\boldsymbol{c}$$ of $$\bar{K}$$ that lie in $${\it{\Omega}}$$ are set to zero. On the right, we can see the Fourier coefficients of $$K^{\left({1}\right)}$$ and a scaled version of $$\bar{K}^{\left({1}\right)}$$.

We end the section with two lemmas bounding $$\kappa$$ and the magnitude of the coefficients of $$\boldsymbol{q}$$, which will be useful at different points of the proof.

Lemma 3.3 If $$m \geq 10^3$$, the constant $$\kappa$$, defined by (3.20), satisfies $$\frac{0.467}{m} \leq \kappa \leq \frac{0.468}{m}.$$ (3.28) Proof. The bound follows from the fact that $$\mathcal{D}_{\tilde{m}}^{\left({2}\right)} \left({0}\right) = -4 \pi^2 \tilde{m} \left({1+\tilde{m}}\right)/3$$ and equation (C.19) in [38] (see also Lemma 4.8 in [38]). □

Lemma 3.4 (Proof in Section D) The coefficients of $$\bar{ K }$$ satisfy $$\left|\left|{\boldsymbol{c}}\right|\right| _{\infty} \leq \frac{1.3}{m}.$$ (3.29)

3.3 Interpolation with a random kernel

The trigonometric polynomial $$\bar{Q}$$ defined in the previous section is not a valid certificate when outliers are present in the data; it does not satisfy (3.5) and (3.6).
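As a numerical sanity check on the kernel of Section 3.2, the sketch below builds the coefficient vector of $$\bar{K}$$ by convolving the coefficients of the three Dirichlet kernels and evaluates $$\kappa$$ together with the quantities bounded in Lemmas 3.3 and 3.4. It assumes that $$\kappa$$ is the inverse square root of $$|\bar{K}^{(2)}(0)|$$, which is the reading consistent with Lemma 3.3; the function name `kernel_coefficients` and the choice $$m = 1000$$ are illustrative and not taken from the paper's Matlab code.

```python
import numpy as np

def kernel_coefficients(m, fractions=(0.247, 0.339, 0.414)):
    """Coefficients c_l, l = -m..m, of the product of three Dirichlet kernels."""
    orders = [int(round(a * m)) for a in fractions]
    if sum(orders) != m:
        raise ValueError("rounded kernel orders must add up to m")
    c = np.ones(1)
    for order in orders:
        # A Dirichlet kernel of order p has 2p + 1 equal coefficients 1 / (2p + 1);
        # multiplying kernels in f convolves their coefficient sequences.
        c = np.convolve(c, np.full(2 * order + 1, 1.0 / (2 * order + 1)))
    return c

m = 1000                      # illustrative value with integer kernel orders
c = kernel_coefficients(m)
l = np.arange(-m, m + 1)

# Second derivative of sum_l c_l exp(i 2 pi l f) at f = 0.
second_derivative = -np.sum((2 * np.pi * l) ** 2 * c)
# Assumed definition: kappa = |K''(0)|^(-1/2).
kappa = 1.0 / np.sqrt(abs(second_derivative))

print(f"kappa * m    = {kappa * m:.4f}")
print(f"m * max|c_l| = {m * np.abs(c).max():.4f}")
```

For other values of $$m$$ the rounded orders may not add up to $$m$$, in which case the sketch raises an error rather than silently changing the degree of the kernel.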
In order to adapt the construction so that it meets these conditions, we draw upon techniques developed in [66], which studies spectral super-resolution in a compressed-sensing scenario where a subset $$\mathcal{S}$$ of the samples is missing. To prove that TV norm minimization succeeds in such a scenario, the authors of [66] construct a bounded polynomial with coefficients restricted to the complement of $$\mathcal{S}$$, which interpolates the sign pattern of the line spectra on their support. This is achieved using an interpolation kernel with coefficients supported on $$\mathcal{S}^c$$.

We denote our dual-polynomial candidate by $$Q$$. Let us begin by decomposing $$Q$$ into two components $$Q(f) := Q_{\mathrm{aux}}(f) + R(f),$$ (3.30) such that the coefficients of the first component are restricted to $${\it{\Omega}}^c$$, $$Q_{\mathrm{aux}}(f) := \sum_{l \in {\it{\Omega}}^c} q_l e^{-i 2 \pi l f},$$ (3.31) and the coefficients of the second component are restricted to $${\it{\Omega}}$$ and fixed to equal $$\lambda \boldsymbol{r}$$ (recall that $$\lambda = 1/\sqrt{n}$$), $$R(f) := \frac{1}{\sqrt{n}} \sum_{l \in {\it{\Omega}}} r_l e^{-i 2 \pi l f}.$$ (3.32) This immediately guarantees that $$Q$$ satisfies (3.5). Now our task is to construct $$Q_{\mathrm{aux}}$$, so that $$Q$$ also meets the rest of the conditions in Proposition 3.1. Following the interpolation technique described in Section 3.2, we constrain $$Q$$ to interpolate $$\boldsymbol{h}$$ and have zero derivative in $$T$$, $$Q(f_j) = h_j, \quad f_j \in T,$$ (3.33) $$Q_R^{\left({1}\right)}(f_j) + i Q_I^{\left({1}\right)}(f_j) = 0, \quad f_j \in T.$$ (3.34) Given that $$R$$ is fixed, this is equivalent to $$Q_{\mathrm{aux}}(f_j) = h_j - R(f_j), \quad f_j \in T,$$ (3.35) $$\left({Q_{\mathrm{aux}}}\right)_R^{\left({1}\right)}(f_j) + i \left({Q_{\mathrm{aux}}}\right)_I^{\left({1}\right)}(f_j) = -R_R^{\left({1}\right)}(f_j) - i R_I^{\left({1}\right)}(f_j), \quad f_j \in T,$$ (3.36) where the subscript $$R$$ indicates the real part of a number and the subscript $$I$$ the imaginary part. This interpolation problem is very similar to the one that arises in compressed sensing off the grid [66]: we must interpolate a certain vector with a polynomial whose coefficients are restricted to a certain subset, in our case $${\it{\Omega}}^c$$.
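Following [66] we employ an interpolation kernel $$K$$ obtained by selecting the coefficients of $$\bar{K}$$ in $${\it{\Omega}}^c$$, $$K(f) := \sum_{l \in {\it{\Omega}}^c} c_l e^{i 2 \pi l f}$$ (3.37) $$= \sum_{l=-m}^{m} \delta_{{\it{\Omega}}^c}(l) \, c_l e^{i 2 \pi l f},$$ (3.38) where $$\delta_{{\it{\Omega}}^c}(l)$$ is an indicator random variable that is equal to one if $$l \in {\it{\Omega}}^c$$ and to zero otherwise. Under the assumptions of Theorem 2.2, these are independent Bernoulli random variables with parameter $$\frac{n-s}{n}$$, so that the mean of $$K$$ is equal to a scaled version of $$\bar{K}$$, $$\mathrm{E}\left({K(f)}\right) = \frac{n-s}{n} \sum_{l=-m}^{m} c_l e^{i 2 \pi l f}$$ (3.39) $$= \frac{n-s}{n} \bar{K}(f).$$ (3.40) $$K$$ and its derivatives concentrate around $$\bar{K}$$ and its derivatives (scaled by $$\frac{n-s}{n}$$) near the origin, but they do not display the same asymptotic decay. This is illustrated in Fig. 3.

Using $$K$$ and its first derivative $$K^{\left({1}\right)}$$ to construct $$Q_{\mathrm{aux}}$$ ensures that its non-zero coefficients are restricted to $${\it{\Omega}}^c$$. In more detail, $$Q_{\mathrm{aux}}$$ is a linear combination of shifted and scaled copies of $$K$$ and $$K^{\left({1}\right)}$$, $$Q_{\mathrm{aux}}(f) := \sum_{j=1}^{k} \alpha_j K(f - f_j) + \kappa \beta_j K^{\left({1}\right)}(f - f_j),$$ (3.41) where $$\boldsymbol{\alpha} \in \mathbb{C}^{k}$$ and $$\boldsymbol{\beta} \in \mathbb{C}^{k}$$ are chosen to satisfy (3.35) and (3.36). The corresponding system of equations (3.35) and (3.36) can be recast in matrix form: $$\begin{bmatrix} D_0 & D_1 \\ -D_1 & D_2 \end{bmatrix} \begin{bmatrix} \boldsymbol{\alpha} \\ \boldsymbol{\beta} \end{bmatrix} = \begin{bmatrix} \boldsymbol{h} \\ \boldsymbol{0} \end{bmatrix} - \frac{1}{\sqrt{n}} B_{{\it{\Omega}}} \boldsymbol{r},$$ (3.42) where $$\left({D_0}\right)_{jl} = K(f_j - f_l), \quad \left({D_1}\right)_{jl} = \kappa K^{\left({1}\right)}(f_j - f_l), \quad \left({D_2}\right)_{jl} = -\kappa^2 K^{\left({2}\right)}(f_j - f_l).$$ (3.43) Note that we have expressed the values of $$R$$ and $$R^{\left({1}\right)}$$ in $$T$$ in terms of $$\boldsymbol{r}$$, $$\frac{1}{\sqrt{n}} B_{{\it{\Omega}}} \boldsymbol{r} = \begin{bmatrix} R(f_1) & R(f_2) & \cdots & R(f_k) & -\kappa R^{\left({1}\right)}(f_1) & -\kappa R^{\left({1}\right)}(f_2) & \cdots & -\kappa R^{\left({1}\right)}(f_k) \end{bmatrix}^T,$$ (3.44) where $$\boldsymbol{b}(l) := \begin{bmatrix} e^{-i 2 \pi l f_1} & e^{-i 2 \pi l f_2} & \cdots & e^{-i 2 \pi l f_k} & i 2 \pi l \kappa \, e^{-i 2 \pi l f_1} & \cdots & i 2 \pi l \kappa \, e^{-i 2 \pi l f_k} \end{bmatrix}^T,$$ (3.45) $$B_{{\it{\Omega}}} := \begin{bmatrix} \boldsymbol{b}(i_1) & \boldsymbol{b}(i_2) & \cdots & \boldsymbol{b}(i_s) \end{bmatrix}, \quad {\it{\Omega}} = \left\{{i_1, i_2, \ldots, i_s}\right\}.$$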
(3.46) Solving this system of equations yields $$\boldsymbol{\alpha}$$ and $$\boldsymbol{\beta}$$, and fixes the dual-polynomial candidate, $$Q(f) := \sum_{j=1}^{k} \alpha_j K(f - f_j) + \kappa \sum_{j=1}^{k} \beta_j K^{\left({1}\right)}(f - f_j) + R(f)$$ (3.47) $$= \boldsymbol{v}_0(f)^T D^{-1} \left({\begin{bmatrix} \boldsymbol{h} \\ \boldsymbol{0} \end{bmatrix} - \frac{1}{\sqrt{n}} B_{{\it{\Omega}}} \boldsymbol{r}}\right) + R(f),$$ (3.48) where we define $$\boldsymbol{v}_{\ell}(f) := \kappa^{\ell} \begin{bmatrix} K^{\left({\ell}\right)}(f - f_1) & \cdots & K^{\left({\ell}\right)}(f - f_k) & \kappa K^{\left({\ell+1}\right)}(f - f_1) & \cdots & \kappa K^{\left({\ell+1}\right)}(f - f_k) \end{bmatrix}^T$$ for $$\ell=0,1,2, \ldots$$ In the next section, we establish that a polynomial of this form is guaranteed to be a valid certificate with high probability. Figure 4 illustrates our construction for a specific example (note that for ease of visualization $$\boldsymbol{h}$$ is real instead of complex).

Fig. 4. Illustration of our construction of a dual-polynomial candidate $$Q$$. The first row shows $$R$$, the component that results from fixing the coefficients of $$Q$$ in $${\it{\Omega}}$$ to equal $$\boldsymbol{r}$$. The second row shows $$Q_{\mathrm{aux}}$$, the component built to ensure that $$Q$$ interpolates $$\boldsymbol{h}$$ by correcting for the presence of $$R$$. On the right image of the second row, we see that the coefficients of $$Q_{\mathrm{aux}}$$ are indeed restricted to $${\it{\Omega}}^c$$. Finally, the last row shows that $$Q$$ satisfies all of the conditions in Proposition 3.1.
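The construction of Section 3.3 can be reproduced numerically on a small instance. The sketch below is only an illustration under stated assumptions: the sizes, frequencies and random seed are arbitrary, the kernel orders are rounded to integers, $$\kappa$$ is taken to be $$|\bar{K}^{(2)}(0)|^{-1/2}$$, and the coefficients of the random kernel are attached to the exponentials $$e^{-i 2 \pi l f}$$ so that they share the index set of $$\boldsymbol{q}$$ (the deterministic kernel is unchanged by this choice because $$\boldsymbol{c}$$ is symmetric). The linear system is assembled directly from conditions (3.33) and (3.34) instead of through the matrices $$D_0$$, $$D_1$$ and $$D_2$$. Conditions (3.3) and (3.5) hold by construction; conditions (3.4) and (3.6) are only measured, since the theory does not guarantee them for such a small $$n$$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, s = 101, 4, 10                     # illustrative sizes (assumptions)
m = (n - 1) // 2
lam = 1 / np.sqrt(n)
l = np.arange(-m, m + 1)                 # coefficient indices after the change of variable

# Coefficients c of the deterministic kernel: convolution of three Dirichlet kernels.
c = np.ones(1)
for order in (round(0.247 * m), round(0.339 * m), round(0.414 * m)):
    c = np.convolve(c, np.full(2 * order + 1, 1 / (2 * order + 1)))
kappa = 1 / np.sqrt(np.sum((2 * np.pi * l) ** 2 * c))

freqs = np.array([0.10, 0.35, 0.60, 0.85])        # support T, well separated
h = np.exp(2j * np.pi * rng.random(k))            # sign pattern of the spectrum
omega = rng.choice(n, size=s, replace=False)      # positions of the outliers within l
r = np.exp(2j * np.pi * rng.random(s))            # sign pattern of the outliers
in_omega = np.zeros(n, dtype=bool)
in_omega[omega] = True

q_R = np.zeros(n, dtype=complex)
q_R[omega] = lam * r                              # coefficients of R, as in (3.32)
w = np.where(in_omega, 0.0, c)                    # coefficients of the random kernel K
E = np.exp(2j * np.pi * np.outer(l, freqs))       # E[l, j] = exp(i 2 pi l f_j)
g = -2j * np.pi * l * kappa                       # kappa * d/df acting on exp(-i 2 pi l f)
A = E.conj().T                                    # evaluates a coefficient vector on T

# Rows: Q_aux(f_j) and kappa * Q_aux'(f_j); columns: alpha and beta.
M = np.block([
    [A @ (w[:, None] * E), A @ ((w * g)[:, None] * E)],
    [A @ ((g * w)[:, None] * E), A @ ((g * w * g)[:, None] * E)],
])
rhs = np.concatenate([h - A @ q_R, -(A @ (g * q_R))])
alpha, beta = np.split(np.linalg.solve(M, rhs), 2)
q = w * (E @ alpha + g * (E @ beta)) + q_R        # coefficients of the candidate Q

def evaluate(coeffs, f):
    """Evaluate sum_l coeffs[l] exp(-i 2 pi l f) at the points f."""
    return np.exp(-2j * np.pi * np.outer(np.atleast_1d(f), l)) @ coeffs

grid = np.linspace(0, 1, 20001)
dist = np.abs(grid[:, None] - freqs[None, :])
dist = np.minimum(dist, 1 - dist).min(axis=1)     # wrap-around distance to T
off_support = dist > 0.5 / n
print("max |Q| away from T        :", np.abs(evaluate(q, grid))[off_support].max())
print("max |q_l| / lambda off Omega:", np.abs(q[~in_omega]).max() / lam)
```

If either printed value is not below one for a given draw, the candidate is not a valid certificate for that instance, which is compatible with the probabilistic nature of Proposition 3.2.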
Before ending this section, we record three useful lemmas concerning $$\boldsymbol{b}$$, $$B_{{\it{\Omega}}}$$ and $$\boldsymbol{v_{\ell}}$$. The first bounds the $$\ell_2$$ norm of $$\boldsymbol{b}$$.

Lemma 3.5 If $$m \geq 10^3$$, for $$-m \leq l \leq m$$ $$\left|\left|{\boldsymbol{b}(l)}\right|\right| _{2}^{2} \leq 10 \, k.$$ (3.49) Proof. $$\left|\left|{\boldsymbol{b}(l)}\right|\right| _{2}^{2} \leq k \left({1 + \max_{-m \leq l \leq m} \left({2 \pi l \kappa}\right)^2}\right) \leq 9.65 \, k \quad \text{by Lemma 3.3}.$$ (3.50) □

The second yields a bound on the operator norm of $$B_{{\it{\Omega}}}$$ that holds with high probability.

Lemma 3.6 (Proof in Section E) Under the assumptions of Theorem 2.2, the event $$\mathcal{E}_{B} := \left\{{\left\|{B_{{\it{\Omega}}}}\right\| > C_{B} \left({\log \frac{n}{\epsilon}}\right)^{-\frac{1}{2}} \sqrt{n}}\right\},$$ (3.51) where $$C_{B}$$ is a numerical constant defined by (H.41), occurs with probability at most $$\epsilon / 5$$.

The third allows us to control the behavior of $$\boldsymbol{v_{\ell}}$$, establishing that it does not deviate much from $$\boldsymbol{\bar{v}}_{\ell}(f) := \kappa^{\ell} \begin{bmatrix} \bar{K}^{\left({\ell}\right)}(f - f_1) & \cdots & \bar{K}^{\left({\ell}\right)}(f - f_k) & \kappa \bar{K}^{\left({\ell+1}\right)}(f - f_1) & \cdots & \kappa \bar{K}^{\left({\ell+1}\right)}(f - f_k) \end{bmatrix}^T$$ on a fine grid with high probability.

Lemma 3.7 (Proof in Section F) Let $$\mathcal{G} \subseteq \left[{0,1}\right]$$ be an equispaced grid with cardinality $$400 \, n^2$$. Under the assumptions of Theorem 2.2, the event $$\mathcal{E}_{v} := \left\{{\left|\left|{\boldsymbol{v}_{\ell}(f) - \frac{n-s}{n} \boldsymbol{\bar{v}}_{\ell}(f)}\right|\right| _{2} > C_{\boldsymbol{v}} \left({\log \frac{n}{\epsilon}}\right)^{-\frac{1}{2}}, \text{ for all } f \in \mathcal{G} \text{ and } \ell \in \left\{{0,1,2,3}\right\}}\right\},$$ (3.52) where $$C_{\boldsymbol{v}}$$ is a numerical constant defined by (H.45), has probability bounded by $$\epsilon / 5$$.

3.4 Proof of Proposition 3.2

This section summarizes the remaining steps to establish that our proposed construction yields a valid certificate. A detailed description of each step is included in the Appendix. First, we show that the system of equations (3.42) has a unique solution with high probability, so that $$Q$$ is well defined. To alleviate notation, let $$D := \begin{bmatrix} D_0 & D_1 \\ -D_1 & D_2 \end{bmatrix}, \quad \bar{D} := \begin{bmatrix} \bar{D}_0 & \bar{D}_1 \\ -\bar{D}_1 & \bar{D}_2 \end{bmatrix}.$$ (3.53) The following result implies that $$D$$ concentrates around a scaled version of $$\bar{D}$$. As a result, it is invertible, and we can bound the operator norm of its inverse leveraging results from [38].
Lemma 3.8 (Proof in Section G) Under the assumptions of Theorem 2.2, the event $$\mathcal{E}_{D} := \left\{{\left\|{D - \frac{n-s}{n} \bar{D}}\right\| \geq \frac{n-s}{4n} \min \left\{{1, \frac{C_{D}}{4} \left({\log \frac{n}{\epsilon}}\right)^{-\frac{1}{2}}}\right\}}\right\}$$ (3.54) occurs with probability at most $$\epsilon / 5$$. In addition, within the event $$\mathcal{E}_{D}^c$$, $$D$$ is invertible and $$\left\|{D^{-1}}\right\| \leq 8,$$ (3.55) $$\left\|{D^{-1} - \frac{n}{n-s} \bar{D}^{-1}}\right\| \leq C_{D} \left({\log \frac{n}{\epsilon}}\right)^{-\frac{1}{2}},$$ (3.56) where $$C_{D}$$ is a numerical constant defined by (H.49).

An immediate consequence of the lemma is that there exists a solution to the system (3.42) and therefore (3.3) holds as long as $$\mathcal{E}_{D}^c$$ occurs.

Corollary 3.9 In $$\mathcal{E}_{D}^c$$, $$Q$$ is well defined and $$Q\left({ f_j }\right) = \boldsymbol{h}_j$$ for all $$f_j \in T$$.

All that remains is to establish that $$Q$$ meets conditions (3.4) and (3.6); recall that (3.5) is satisfied by construction. To prove (3.4), we apply a technique from [66]. We first show that $$Q$$ and its derivatives concentrate around $$\bar{Q}$$ and its derivatives, respectively, on a fine grid. Then we leverage Bernstein’s inequality to demonstrate that both polynomials and their respective derivatives are close on the whole unit interval. Finally, we borrow some bounds on $$\bar{Q}$$ and its second derivative from [38] to complete the proof. The details can be found in Section H of the Appendix.

Proposition 3.10 (Proof in Section H) Conditioned on $$\mathcal{E}_{B}^{c} \cap \mathcal{E}_{D}^{c} \cap \mathcal{E}_{v}^{c}$$ $$|Q(f)| < 1 \quad \text{for all } f \in T^c,$$ (3.57) with probability at least $$1-\epsilon/5$$ under the assumptions of Theorem 2.2.

Finally, the following proposition establishes that the remaining condition (3.6) holds in $$\mathcal{E}_{B}^{c} \cap \mathcal{E}_{D}^{c} \cap \mathcal{E}_{v}^{c}$$ with high probability. The proof uses Hoeffding’s inequality combined with Lemmas 3.8 and 3.9 to control the magnitude of the coefficients of $$\boldsymbol{q}$$.
Proposition 3.11 (Proof in Section I) Conditioned on $$\mathcal{E}_{B}^{c} \cap \mathcal{E}_{D}^{c} \cap \mathcal{E}_{v}^{c}$$ $$|q_l| < \frac{1}{\sqrt{n}} \quad \text{for all } l \in {\it{\Omega}}^c,$$ (3.58) with probability at least $$1-\epsilon/5$$ under the assumptions of Theorem 2.2.

Now, to complete the proof, let us define $$\mathcal{E}_{Q}$$ to be the event that (3.4) holds and $$\mathcal{E}_{q}$$ the event that (3.6) holds. Applying De Morgan’s laws, the union bound and the fact that for any pair of events $$\mathcal{E}_A$$ and $$\mathcal{E}_B$$ $$\mathrm{P}\left({\mathcal{E}_A}\right) \leq \mathrm{P}\left({\mathcal{E}_A | \mathcal{E}_B^c}\right) + \mathrm{P}\left({\mathcal{E}_B}\right),$$ (3.59) we have $$\mathrm{P}\left({\left({\mathcal{E}_{Q} \cap \mathcal{E}_{q}}\right)^c}\right) = \mathrm{P}\left({\mathcal{E}_{Q}^c \cup \mathcal{E}_{q}^c}\right)$$ (3.60) $$\leq \mathrm{P}\left({\mathcal{E}_{Q}^c \cup \mathcal{E}_{q}^c | \mathcal{E}_{B}^{c} \cap \mathcal{E}_{D}^{c} \cap \mathcal{E}_{v}^{c}}\right) + \mathrm{P}\left({\mathcal{E}_{B} \cup \mathcal{E}_{D} \cup \mathcal{E}_{v}}\right)$$ (3.61) $$\leq \mathrm{P}\left({\mathcal{E}_{Q}^c | \mathcal{E}_{B}^{c} \cap \mathcal{E}_{D}^{c} \cap \mathcal{E}_{v}^{c}}\right) + \mathrm{P}\left({\mathcal{E}_{q}^c | \mathcal{E}_{B}^{c} \cap \mathcal{E}_{D}^{c} \cap \mathcal{E}_{v}^{c}}\right) + \mathrm{P}\left({\mathcal{E}_{B}}\right) + \mathrm{P}\left({\mathcal{E}_{D}}\right) + \mathrm{P}\left({\mathcal{E}_{v}}\right)$$ (3.62) $$\leq \epsilon$$ (3.63) by Lemmas 3.6, 3.7 and 3.8 and Propositions 3.10 and 3.11. We conclude that our construction yields a valid certificate with probability at least $$1-\epsilon$$.

4. Algorithms

In this section, we discuss how to implement the techniques described in Section 2. In addition, we introduce a greedy demixing method that yields good empirical results. Matlab code implementing all the algorithms presented below is available in the Supplementary Material. The code makes it possible to reproduce the figures in this section, which illustrate the performance of the different approaches through a running example.

4.1 Demixing via semi-definite programming

The main obstacle to solving Problem (2.7) is that the primal variable $$\tilde{\mu}$$ is infinite dimensional. One could tackle this issue by discretizing the possible support of $$\tilde{\mu}$$ and replacing its TV norm by the $$\ell_1$$ norm of the corresponding vector [67]. Here, we present an alternative approach, originally proposed in [38], which solves the infinite-dimensional optimization problem directly without resorting to discretization. The approach, inspired by a method for TV norm minimization [12] (see also [4]), relies on the fact that the dual of Problem (2.7) can be recast as a finite-dimensional SDP.
To simplify notation, we introduce the operator $$\mathcal{T}$$. For any vector $$\boldsymbol{u}$$ whose first entry $$\boldsymbol{u}_1$$ is positive and real, $$\mathcal{T}\left({\boldsymbol{u}}\right)$$ is a Hermitian Toeplitz matrix whose first row is equal to $$\boldsymbol{u}^T$$. The adjoint of $$\mathcal{T}$$ with respect to the usual matrix inner product $$\left \langle{M_1}, {M_2}\right \rangle=\text{Tr}\left({M_1^{\ast}M_2}\right)$$ extracts the sums of the diagonal and of the different off-diagonal elements of a matrix $$\mathcal{T}^{\ast}\left({M}\right)_j = \sum_{i=1}^{n-j+1} M_{i, i+j-1}.$$ (4.1)

Lemma 4.1 The dual of Problem (2.7) is $$\max_{\boldsymbol{\eta} \in \mathbb{C}^n} \left \langle{ \boldsymbol{y}}, { \boldsymbol{\eta}}\right \rangle \quad \text{subject to} \quad \left|\left|{\mathcal{F}_n^{\ast} \, \boldsymbol{\eta}}\right|\right| _{\infty} \leq 1,$$ (4.2) $$\left|\left|{\boldsymbol{\eta}}\right|\right| _{\infty} \leq \lambda,$$ (4.3) where the inner product is defined as $$\left \langle{ \boldsymbol{y}}, { \boldsymbol{\eta}}\right \rangle : = \mathrm{Re}\left({\boldsymbol{y}^{\ast}\boldsymbol{\eta}}\right)$$. This problem is equivalent to the SDP $$\max_{\boldsymbol{\eta} \in \mathbb{C}^n, \, {\it{\Lambda}} \in \mathbb{C}^{n \times n}} \left \langle{ \boldsymbol{y}}, { \boldsymbol{\eta}}\right \rangle \quad \text{subject to} \quad \begin{bmatrix} {\it{\Lambda}} & \boldsymbol{\eta} \\ \boldsymbol{\eta}^{\ast} & 1 \end{bmatrix} \succeq 0, \quad \mathcal{T}^{\ast}\left({{\it{\Lambda}}}\right) = \begin{bmatrix} 1 \\ \boldsymbol{0} \end{bmatrix}, \quad \left|\left|{\boldsymbol{\eta}}\right|\right| _{\infty} \leq \lambda,$$ (4.4) where $$\boldsymbol{0} \in \mathbb{C}^{n-1}$$ is a vector of zeros.

Lemma 4.1, which follows from Lemma 4.3 below, shows that it is tractable to compute the $$n$$-dimensional solution to the dual of Problem (2.7). However, our goal is to obtain the primal solution, which represents the estimate of the line spectrum and the sparse corruptions. The following lemma, which is a consequence of Lemma 4.4, establishes that we can decode the support of the primal solution from the dual solution.

Lemma 4.2 Let $$\hat{\mu} = \sum_{f_j \in \widehat{T}} \hat{x}_j \delta\left({f - f_j}\right),$$ (4.5) and $$\boldsymbol{\hat{z}}$$ be a solution to (2.7), such that $$\widehat{T}$$ and $$\widehat{{\it{\Omega}}}$$ are the non-zero supports of the line spectrum $$\hat{\mu}$$ and the spikes $$\boldsymbol{\hat{z}}$$, respectively. If $$\boldsymbol{ \hat{\eta} } \in \mathbb{C}^n$$ is a corresponding dual solution, then for any $$f_j$$ in $$\widehat{T}$$ $$\left({\mathcal{F}_n^{\ast} \, \boldsymbol{\hat{\eta}}}\right)\left({f_j}\right) = \frac{\hat{x}_j}{|\hat{x}_j|}$$ (4.6) and for any $$l$$ in $$\widehat{{\it{\Omega}}}$$ $$\hat{\eta}_l = \lambda \frac{\hat{z}_l}{|\hat{z}_l|}.$$
(4.7) In other words, the weighted dual solution $$\lambda^{-1} \boldsymbol{ \hat{\eta} }$$ and the corresponding polynomial $$\mathcal{F}_{n}^{\ast} \, \boldsymbol{ \hat{\eta} }$$ interpolate the sign patterns of the primal-solution components $$\boldsymbol{\hat{z}}$$ and $$\hat{\mu}$$ on their respective supports, as illustrated in the top row of Fig. 5. This suggests estimating the support of the line spectrum and the outliers in the following way. 1. Solve (4.4) to obtain a dual solution $$\boldsymbol{ \hat{\eta} }$$ and compute $$\mathcal{F}_n^{\ast} \, \boldsymbol{ \hat{\eta} }$$. 2. Set the estimated support of the spikes $$\widehat{{\it{\Omega}}}$$ to the set of points where $$\left|{\boldsymbol{ \hat{\eta} }}\right|$$ equals $$\lambda$$. 3. Set the estimated support of the line spectrum $$\widehat{T}$$ to the set of points where $$\left|{ \mathcal{F}_n^{\ast} \, \boldsymbol{ \hat{\eta} } }\right|$$ equals one. 4. Estimate the amplitudes of $$\hat{\mu}$$ and $$\boldsymbol{\hat{z}}$$ on $$\widehat{T}$$ and $$\widehat{{\it{\Omega}}}$$, respectively, by solving the system of linear equations $$\boldsymbol{y} = \mathcal{F}_n \hat{\mu} + \boldsymbol{\hat{z}}$$.

Fig. 5. Demixing of the signal in Fig. 1 by semi-definite programming. Top left: the polynomial $$\mathcal{F}_n^{\ast} \, \boldsymbol{ \hat{\eta} }$$ (light red), where $$\boldsymbol{ \hat{\eta} }$$ is a solution of Problem (4.4), interpolates the sign of the line spectrum of the sines (dashed red) on their support. Top right: $$\lambda^{-1} \boldsymbol{ \hat{\eta} }$$ interpolates the sign pattern of the spikes on their support. Bottom: locating the support of $$\mu$$ and $$\boldsymbol{z}$$ makes it possible to demix very accurately (the circular markers represent the original spectrum of the sines and the original spikes and the crosses the corresponding estimates). The parameter $$\lambda$$ is set to $$1/\sqrt{n}$$.

Figure 5 shows the results obtained by this method on the data described in Fig. 1: both components are recovered very accurately. However, we caution the reader that while the primal solution $$(\hat{\mu}, \hat{\boldsymbol{z}})$$ is generally unique, the dual solutions are non-unique, and some of the dual solutions might produce spurious frequencies and spikes in Steps 2 and 3. In fact, the dual solutions form a convex set, and only those in the interior of this convex set give exact supports $$\widehat{{\it{\Omega}}}$$ and $$\widehat{T}$$, while those on the boundary generate spurious estimates. When the SDP (4.4) is solved using interior point algorithms, as is the case in CVX, a dual solution in the interior is returned, generating correct supports as shown in Fig. 5. Refer to [66] for a rigorous treatment of this topic for the related missing-data case. This technical complication does not seriously affect our estimates of the supports, since the amplitudes inferred in Step 4 will be zero for the extra frequencies and spikes, which provides a means to eliminate them.

4.2 Demixing in the presence of dense perturbations

As described in Section 2.5, our demixing method can be adapted to the presence of dense noise in the data by relaxing the equality constraint in Problem (2.7) to an inequality constraint.
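Returning briefly to Step 4 of the procedure in Section 4.1, the following sketch illustrates why spurious support estimates can be pruned after amplitude estimation. It assumes noiseless data with samples $$\boldsymbol{y}_l = \sum_j x_j e^{i 2 \pi l f_j} + \boldsymbol{z}_l$$ for $$1 \leq l \leq n$$, and it replaces the SDP output by hand-picked estimated supports that contain one extra frequency and one extra spike; all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 101
sample_idx = np.arange(1, n + 1)

def sinusoid_matrix(freqs):
    """Columns are the samples l = 1..n of exp(i 2 pi l f) for each frequency f."""
    return np.exp(2j * np.pi * np.outer(sample_idx, freqs))

# Ground truth: three spectral lines and three outliers (assumed values).
true_freqs = np.array([0.12, 0.40, 0.71])
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
true_spikes = np.array([5, 40, 77])              # zero-based sample positions
z = np.zeros(n, dtype=complex)
z[true_spikes] = 5 * (rng.standard_normal(3) + 1j * rng.standard_normal(3))
y = sinusoid_matrix(true_freqs) @ x + z

# Estimated supports containing one spurious frequency and one spurious spike.
est_freqs = np.append(true_freqs, 0.90)
est_spikes = np.append(true_spikes, 60)

# Step 4: least-squares fit of y on the atoms indexed by the estimated supports.
atoms = np.hstack([sinusoid_matrix(est_freqs), np.eye(n)[:, est_spikes]])
coeffs, *_ = np.linalg.lstsq(atoms, y, rcond=None)
x_hat, z_hat = coeffs[:len(est_freqs)], coeffs[len(est_freqs):]

print("amplitude at spurious frequency:", abs(x_hat[-1]))
print("amplitude at spurious spike    :", abs(z_hat[-1]))
```

Because the selected atoms are linearly independent in this example, the fit is unique and assigns numerically negligible amplitudes to the two spurious atoms, which can then be discarded by thresholding.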
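The only effect on the dual of the optimization problem, which can still be reformulated as an SDP, is an extra term in the cost function.

Lemma 4.3 (Proof in Section J.1) The dual of Problem (2.19) is $$\max_{\boldsymbol{\eta} \in \mathbb{C}^n} \left \langle{ \boldsymbol{y}}, { \boldsymbol{\eta}}\right \rangle - \sigma \left|\left|{\boldsymbol{\eta}}\right|\right| _{2}$$ (4.8) $$\text{subject to} \quad \left|\left|{\mathcal{F}_n^{\ast} \, \boldsymbol{\eta}}\right|\right| _{\infty} \leq 1,$$ (4.9) $$\left|\left|{\boldsymbol{\eta}}\right|\right| _{\infty} \leq \lambda.$$ (4.10) This problem is equivalent to the SDP $$\max_{\boldsymbol{\eta} \in \mathbb{C}^n, \, {\it{\Lambda}} \in \mathbb{C}^{n \times n}} \left \langle{ \boldsymbol{y}}, { \boldsymbol{\eta}}\right \rangle - \sigma \left|\left|{\boldsymbol{\eta}}\right|\right| _{2} \quad \text{subject to} \quad \begin{bmatrix} {\it{\Lambda}} & \boldsymbol{\eta} \\ \boldsymbol{\eta}^{\ast} & 1 \end{bmatrix} \succeq 0,$$ (4.11) $$\mathcal{T}^{\ast}\left({{\it{\Lambda}}}\right) = \begin{bmatrix} 1 \\ \boldsymbol{0} \end{bmatrix},$$ (4.12) $$\left|\left|{\boldsymbol{\eta}}\right|\right| _{\infty} \leq \lambda,$$ (4.13) where $$\boldsymbol{0} \in \mathbb{C}^{n-1}$$ is a vector of zeros.

As in the case without dense noise, the support of the primal solution of Problem (2.19) can be decoded from the dual solution. This is justified by the following lemma, which establishes that the weighted dual solution $$\lambda^{-1} \boldsymbol{ \hat{\eta} }$$ and the corresponding polynomial $$\mathcal{F}_{n}^{\ast} \, \boldsymbol{ \hat{\eta} }$$ interpolate the sign patterns of the primal-solution components $$\boldsymbol{\hat{z}}$$ and $$\hat{\mu}$$ on their respective supports.

Lemma 4.4 (Proof in Section J.2) Let $$\hat{\mu} = \sum_{f_j \in \widehat{T}} \hat{x}_j \delta\left({f - f_j}\right),$$ (4.14) and $$\boldsymbol{\hat{z}}$$ be a solution to (2.19), such that $$\widehat{T}$$ and $$\widehat{{\it{\Omega}}}$$ are the non-zero supports of the line spectrum $$\hat{\mu}$$ and the spikes $$\boldsymbol{\hat{z}}$$, respectively. If $$\boldsymbol{ \hat{\eta} } \in \mathbb{C}^n$$ is a corresponding dual solution, then for any $$f_j$$ in $$\widehat{T}$$ $$\left({\mathcal{F}_n^{\ast} \, \boldsymbol{\hat{\eta}}}\right)\left({f_j}\right) = \frac{\hat{x}_j}{|\hat{x}_j|}$$ (4.15) and for any $$l$$ in $$\widehat{{\it{\Omega}}}$$ $$\hat{\eta}_l = \lambda \frac{\hat{z}_l}{|\hat{z}_l|}.$$ (4.16)

Figure 6 shows the magnitude of the dual solutions for different values of additive noise. Motivated by the lemma, we propose to estimate the support of the outliers using $$\boldsymbol{ \hat{\eta} }$$ and the support of the spectral lines using $$\left|{\mathcal{F}_n^{\ast} \, \boldsymbol{ \hat{\eta} }}\right|$$. Our method to perform spectral super-resolution in the presence of outliers and dense noise consequently consists of the following steps: 1.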
Solve (4.11) to obtain a dual solution $$\boldsymbol{ \hat{\eta} }$$ and compute $$\mathcal{F}_n^{\ast} \, \boldsymbol{ \hat{\eta} }$$.
2. Set the estimated support of the spikes $$\widehat{{\it{\Omega}}}$$ to the set of points where $$\left|{\boldsymbol{ \hat{\eta} }}\right|$$ equals $$\lambda$$.
3. Set the estimated support of the spectrum $$\widehat{T}$$ to the set of points where $$\left|{ \mathcal{F}_n^{\ast} \, \boldsymbol{ \hat{\eta} } }\right|$$ equals one.
4. Estimate the amplitudes of $$\hat{\mu}$$ by solving a least-squares problem using only the data that do not lie in the estimated support of the spikes $$\widehat{{\it{\Omega}}}$$.

Fig. 6. The left column shows the magnitude of the solution to Problem (B.5) (top row) and to Problem (4.8) for different noise levels (second and third rows). $$\left|{\boldsymbol{ \hat{\eta} }}\right|$$ is represented by red lines. Additionally, the support of the sparse perturbation $$\boldsymbol{z}$$ is marked in blue. The right column shows the trigonometric polynomial corresponding to the dual solutions in red, as well as the support of the spectrum of the multisinusoidal components in blue. The data are the same as in Fig. 1 (except for the added noise, which is i.i.d. Gaussian). The parameters $$\lambda$$ and $$\sigma$$ are set to $$1/\sqrt{n}$$ and $$1.5 \, \left|\left|{\boldsymbol{w}}\right|\right| _{2}$$, respectively. Note that in practice, the value of the noise level would have to be estimated, for example by cross validation.

Figure 7 shows the result of applying our method to data that include additive i.i.d. Gaussian noise with a signal-to-noise ratio (SNR) of 30 and 15 dB. Despite the presence of the dense noise, our method is able to detect all spectral lines at 30 dB and all but one at 15 dB. Additionally, it detects most of the spikes correctly: at 30 dB it detects a spurious spike and at 15 dB it misses one. Note that the spike that is not detected when the SNR is 15 dB has a magnitude small enough for it to be considered part of the dense noise.

Fig. 7. The top row shows the results of applying SDP-based spectral super-resolution in the presence of both dense noise and outliers (bottom row) for two different dense noise levels (left and right columns). The second row shows the magnitude of the data, the location of the outliers and the outlier estimate produced by the method. In the bottom row, we can see the magnitude of the sparse and dense noise (note that when the SNR is 15 dB, the smallest sparse-noise component is below the dense noise level). The signal is the same as in Fig. 1, and the data are the same as in Fig. 6. The parameter $$\sigma$$ is set to $$1.5 \, \left|\left|{\boldsymbol{w}}\right|\right| _{2}$$ and $$\lambda$$ is set to $$1/\sqrt{n}$$.

4.3 Greedy demixing enhanced by local non-convex optimization

In this section, we propose an alternative method for spectral super-resolution in the presence of outliers, which is significantly faster than the SDP-based approach described in the previous sections. In the spirit of matching-pursuit methods [47,51], the algorithm selects the spectral lines of the signal and the locations of the outliers in a greedy fashion. This is equivalent to choosing atoms from a dictionary of the form
$$\mathcal{D} := \left\{ \boldsymbol{a}\left(f, 0\right), \; f \in \left[0, 1\right] \right\} \cup \left\{ \boldsymbol{e}\left(l\right), \; 1 \leq l \leq n \right\}.$$ (4.17)
The dictionary includes the multisinusoidal atoms $$\boldsymbol{a} \left({ f, 0 }\right)$$ defined in (2.20) and $$n$$ spiky atoms $$\boldsymbol{e}\left({l}\right) \in \mathbb{R}^{n}$$, which are equal to the one-sparse standard basis vectors. By (2.23), if the data $$\boldsymbol{y}$$ are of the form (2.3) then they have a $$\left({k+s}\right)$$-sparse representation in terms of the atoms in $$\mathcal{D}$$. Greedy demixing aims to find this sparse representation iteratively.
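As a rough illustration of greedy selection over the dictionary $$\mathcal{D}$$, the following Python sketch implements a plain matching-pursuit-style loop: pick the atom most correlated with the residual, refit all selected atoms by least squares, and update the residual. It is a simplified stand-in rather than the procedure of this section. Several things are assumptions made for the example: the sinusoidal atoms are taken to be unit-norm vectors with entries $$e^{i 2 \pi l f}/\sqrt{n}$$ for $$l = 0, \ldots, n-1$$ (the exact normalization and indexing of (2.20) are not restated here), frequencies are searched only on an oversampled FFT grid, and the function names `sinusoid_atom`, `select_atom` and `greedy_demix` are hypothetical.

```python
import numpy as np

def sinusoid_atom(f, n):
    """Unit-norm sinusoidal atom (assumed form; indices 0..n-1 are an assumption)."""
    return np.exp(2j * np.pi * f * np.arange(n)) / np.sqrt(n)

def select_atom(r, oversampling=8):
    """Return ('sine', f) or ('spike', l) for the atom most correlated with r."""
    n = len(r)
    grid_size = n * oversampling
    # |<a(f,0), r>| on the grid f = j / grid_size, via a zero-padded FFT.
    corr = np.abs(np.fft.fft(r, grid_size)) / np.sqrt(n)
    j = int(np.argmax(corr))
    # The correlation with the spiky atom e(l) is |r_l|.
    l = int(np.argmax(np.abs(r)))
    if corr[j] >= np.abs(r[l]):
        return "sine", j / grid_size
    return "spike", l

def greedy_demix(y, n_atoms, oversampling=8):
    """Greedy selection with least-squares refits (no pruning, no local search)."""
    n = len(y)
    freqs, spikes = [], []
    r = np.array(y, dtype=complex)
    for _ in range(n_atoms):
        kind, value = select_atom(r, oversampling)
        (freqs if kind == "sine" else spikes).append(value)
        # Refit the coefficients of all selected atoms and update the residual.
        columns = [sinusoid_atom(f, n) for f in freqs]
        columns += [np.eye(n)[:, l] for l in spikes]
        A = np.column_stack(columns)
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ coef
    return freqs, spikes, r
```

Because frequencies are restricted to a grid and each atom is chosen one at a time, this sketch should not be expected to recover a general mixture exactly; it only shows how sinusoidal and spiky atoms compete for selection.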
Inspired by recent work on atomic-norm minimization based on the conditional-gradient method [8,52,53], our greedy-demixing procedure includes selection, pruning and local-optimization steps (see also [34,35,61] for spectral super-resolution algorithms that leverage a local optimization step similar to ours).
1. Initialization: The residual $$\boldsymbol{r} \in \mathbb{C}^{n}$$ is initialized to equal the data vector $$\boldsymbol{y}$$. The sets of estimated spectral lines $$\widehat{T}$$ and spikes $$\widehat{{\it{\Omega}}}$$ are initialized to equal the empty set.
2. Selection: At each iteration we compute the atom in $$\mathcal{D}$$ that has the highest correlation with the current residual $$\boldsymbol{r}$$ and update either $$\widehat{T}$$ or $$\widehat{{\it{\Omega}}}$$. For the spiky atoms the highest correlation is simply $$\left|\left|{\boldsymbol{r}}\right|\right| _{\infty}$$. For the sinusoidal atoms, we compute the highest correlation by first determining the location $$f_{\mathrm{grid}}$$ of the maximum of the function $$\mathrm{{corr}}\left({f}\right):= \left|{\left \langle{\boldsymbol{a}\left({f,0}\right)}, {\boldsymbol{r}}\right \rangle}\right|$$ on a fine grid, which can be done efficiently by computing an oversampled fast Fourier transform, and then finding a local maximum of the function $$\mathrm{{corr}}\left({f}\right)$$ using a local search method initialized at $$f_{\mathrm{grid}}$$.
3. Pruning: After adding a new atom to $$\widehat{T}$$ or $$\widehat{{\it{\Omega}}}$$, we compute the coefficients corresponding to the selected atoms using a least-squares fit. We then remove any atoms whose corresponding coefficients are smaller than a threshold $$\tau > 0$$.
4. Local optimization: We fix the number of selected sinusoidal atoms $$\hat{k}:=|\widehat{T}|$$, and optimize their locations to update $$\widehat{T}$$ by finding a local minimum of the least-squares cost function
$$\mathrm{ls}\left(f_1, \ldots, f_{\hat{k}}\right) := \min_{\boldsymbol{\hat{x}} \in \mathbb{C}^{\hat{k}}, \, \boldsymbol{\hat{z}} \in \mathbb{C}^{|\widehat{{\it{\Omega}}}|}} \left\| \boldsymbol{y} - \sqrt{n} \sum_{j=1}^{\hat{k}} \hat{x}_j \, \boldsymbol{a}\left(f_j, 0\right) - \sum_{l \in \widehat{{\it{\Omega}}}} \hat{z}_l \, \boldsymbol{e}\left(l\right) \right\|_2,$$ (4.18)
using a local search method (see footnote 5) initialized at the current estimate $$\widehat{T}$$. Alternatively, one can use other methods such as gradient descent to find a local minimum of the non-convex function.
5. The residual is updated by computing the coefficients corresponding to the currently selected atoms using least squares and subtracting the resulting approximation from $$\boldsymbol{y}$$.

This algorithm can be applied without any modification to data that are perturbed by dense noise. In Figs 8 and 9, we illustrate the performance of the method on the same data used in Figs 5 and 7. Figure 8 shows what happens if we omit the local optimization step: the algorithm does not yield exact demixing even in the absence of dense noise. In contrast, in Fig. 9, we see that greedy demixing combined with local optimization recovers the two mixed components exactly when no additional noise perturbs the data. In addition, the procedure is robust to the presence of dense noise, as shown in the last two rows of Fig. 9.

Fig. 8. Greedy demixing without a local optimization step. The signal is the same as in Fig. 1, and the noisy data are the same as in Figs 6 and 7. The thresholding parameter $$\tau$$ is set depending on the noise level: at 30 dB and in the absence of dense noise it is set small enough not to eliminate the spectral line with the smallest coefficient in the pruning step, whereas at 15 dB, it is set so as not to discard the spectral line with the second smallest coefficient.

Fig. 9. Greedy demixing with a local optimization step. The signal is the same as in Fig. 1, and the noisy data are the same as in Figs 6–8. The thresholding parameter $$\tau$$ is set as described in the caption of Fig. 8.

Intuitively, the greedy method without local optimization cannot achieve exact recovery because it optimizes the position of each spectral line one by one and eventually cannot make further progress. The local optimization step refines the fit by optimizing over the positions of the spectral lines simultaneously. This succeeds when the initialization is close enough to a good local minimum of the cost function. Our experiments seem to indicate that the greedy scheme provides such an initialization. As illustrated in Fig. 10, the greedy scheme is significantly faster than the SDP-based approach described earlier. These preliminary empirical results show the potential of coupling greedy approaches with local non-convex optimization. Establishing guarantees for such demixing procedures is an interesting research direction.

Fig. 10. Comparison of average running times for the SDP-based demixing approach described in Section 4.1 and greedy demixing with a local optimization step over 10 trials (the error bars show 95% confidence intervals). The number of spectral lines and the number of outliers both equal $$10$$. The amplitudes of both components are i.i.d. Gaussian. The minimum separation of the spectral lines is $$2.8/(n+1)$$. Both algorithms achieve exact recovery in all instances. The experiments were carried out on a laptop with an Intel Core i5-5300 CPU at 2.3 GHz and 12 GB of RAM.

4.4 Atomic-norm denoising

In this section, we discuss how to implement the atomic-norm based denoising procedure described in Section 2.6. Our method relies on the fact that the atomic norm has a semi-definite characterization when the dictionary contains sinusoidal atoms of the form (2.20). This is established in the following proposition, which we borrow from [4,66].

Proposition 4.5 (Proposition 2.1 [66], [4]) For $$\boldsymbol{g} \in \mathbb{C}^{n}$$
$$\|\boldsymbol{g}\|_{\mathcal{A}} = \inf_{t \in \mathbb{R}, \, \boldsymbol{u} \in \mathbb{C}^{n}} \left\{ \frac{n \, u_1 + t}{2} \; : \; \begin{bmatrix} \mathcal{T}\left(\boldsymbol{u}\right) & \boldsymbol{g} \\ \boldsymbol{g}^{\ast} & t \end{bmatrix} \succeq 0 \right\},$$ (4.19)
where the operator $$\mathcal{T}$$ is defined in Section 4.1.

This result allows us to rewrite (2.24) as the SDP
$$\min_{t \in \mathbb{R}, \, \boldsymbol{u} \in \mathbb{C}^{n}, \, \boldsymbol{\tilde{g}} \in \mathbb{C}^{n}, \, \boldsymbol{\tilde{z}} \in \mathbb{C}^{n}} \frac{n \, u_1 + t}{2 \sqrt{n}} + \lambda \|\boldsymbol{\tilde{z}}\|_1 \quad \text{subject to} \quad \begin{bmatrix} \mathcal{T}\left(\boldsymbol{u}\right) & \boldsymbol{\tilde{g}} \\ \boldsymbol{\tilde{g}}^{\ast} & t \end{bmatrix} \succeq 0,$$ (4.20)
$$\boldsymbol{\tilde{g}} + \boldsymbol{\tilde{z}} = \boldsymbol{y},$$ (4.21)
which is precisely the dual program of (4.4). Similarly, Problem (2.27) can be reformulated as the SDP
$$\min_{t \in \mathbb{R}, \, \boldsymbol{u} \in \mathbb{C}^{n}, \, \boldsymbol{\tilde{g}} \in \mathbb{C}^{n}, \, \boldsymbol{\tilde{z}} \in \mathbb{C}^{n}} \frac{n \, u_1 + t}{2 \sqrt{n}} + \lambda \|\boldsymbol{\tilde{z}}\|_1 + \frac{\gamma}{2} \|\boldsymbol{y} - \boldsymbol{\tilde{g}} - \boldsymbol{\tilde{z}}\|_2^2 \quad \text{subject to} \quad \begin{bmatrix} \mathcal{T}\left(\boldsymbol{u}\right) & \boldsymbol{\tilde{g}} \\ \boldsymbol{\tilde{g}}^{\ast} & t \end{bmatrix} \succeq 0.$$ (4.22)
This problem can be solved efficiently using the alternating direction method of multipliers [4] (see also [4] for a similar implementation of SDP-based atomic-norm denoising for the case without outliers), as described in detail in Section J.3 of the Appendix.
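The semi-definite characterization in Proposition 4.5 can be sanity-checked numerically in the simplest case of a single atom. The sketch below builds $$\boldsymbol{g} = c \, \boldsymbol{a}(f, 0)$$ and exhibits a feasible pair $$(\boldsymbol{u}, t)$$ for (4.19) whose cost equals $$|c|$$, which is the atomic norm of a single unit-norm atom scaled by $$c$$. It only verifies feasibility and the cost of one candidate point; it does not solve the SDP. The atom is assumed to have entries $$e^{i 2 \pi l f}/\sqrt{n}$$ for $$l = 0, \ldots, n-1$$, the cost is read as $$(n u_1 + t)/2$$, and $$\mathcal{T}(\boldsymbol{u})$$ is assumed to be the Hermitian Toeplitz matrix with first entry $$u_1$$ on its diagonal; all variable names are illustrative.

```python
import numpy as np

n, f, c = 16, 0.2, 3.0 - 4.0j                  # |c| = 5
v = np.exp(2j * np.pi * f * np.arange(n))      # unnormalized sinusoid
g = c * v / np.sqrt(n)                         # g = c * a(f, 0), assumed unit-norm atom

# Candidate feasible point: T(u) = alpha * v v^*, with n * alpha = t = |c|.
alpha = abs(c) / n
T = alpha * np.outer(v, v.conj())              # rank-one, Hermitian, Toeplitz
t = abs(c)

# T has constant diagonals, so it is of the form T(u) with u_1 = alpha.
is_toeplitz = all(
    np.allclose(np.diag(T, d), np.diag(T, d)[0]) for d in range(-n + 1, n)
)

# Block matrix from the constraint in (4.19); it should be positive semidefinite.
M = np.block([[T, g[:, None]], [g.conj()[None, :], np.array([[t]])]])
min_eig = np.linalg.eigvalsh(M).min()

# Cost of this feasible point, (n * u_1 + t) / 2.
cost = (n * T[0, 0].real + t) / 2
```

Here the block matrix is rank one, so its smallest eigenvalue is zero up to rounding, and the cost of the candidate point coincides with $$|c|$$.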
Figure 11 shows the results of applying this method to denoise the data used in Figs 7–9. In the absence of dense noise, the approach yields perfect denoising (not shown in the figure). When dense noise perturbs the data, the method is still able to perform effective denoising, correcting for the presence of the outliers.

Fig. 11. Denoising via atomic-norm minimization in the presence of both outliers and dense noise. The signal is the same as in Fig. 1 and the data are the same as in Figs 6 and 7. The parameter $$\lambda$$ is set to $$1/\sqrt{n}$$, whereas $$\gamma$$ is set to $$1/\left|\left|{\boldsymbol{w}}\right|\right| _{2}$$ (in practice, we would have to estimate the noise level or set the parameter via cross validation).

5. Numerical Experiments

5.1 Demixing via semi-definite programming

In this section, we investigate the performance of the SDP-based approach described in Section 4.1. To do this, we apply it to data of the form (2.3), varying the different parameters of interest. Fixing either the number of spectral lines $$k$$ or the number of outliers $$s$$ allows us to visualize the performance of the method for a range of values of the line spectrum’s minimum separation $${\it{\Delta}}$$ (defined by (2.5)). The results are shown in Fig. 12. We observe that in every instance there is a rapid phase transition between the values at which the method always achieves exact demixing and the values at which it fails.
The minimum separation at which this phase transition takes place is between $$1/{\left( n-1 \right)\!}$$ and $$2/{\left( n-1 \right)\!}$$, which is smaller than the minimum separation required by Theorem 2.2. We conjecture that if we allow for arbitrary sign patterns, the phase transition would occur near $$2/{\left( n-1 \right)\!}$$. In fact, if we constrain the amplitudes of the spectral lines to be real instead of complex, the phase transition occurs at a higher minimum separation, as shown in [38, Fig. 7].

Fig. 12. Graphs showing the fraction of times Problem (2.7) achieves exact demixing over 10 trials with random signs and supports for different numbers of spectral lines $$k$$ (left column) and outliers $$s$$ (right column), as well as different values of the minimum separation of the spectral lines. Each row shows results for a different number of measurements. The value of the regularization parameter $$\lambda$$ is 0.1 for the left column and 0.15 for the right column. The simulations are carried out using CVX [39].

In order to investigate the effect of the regularization parameter on the performance of the algorithm, we fix $${\it{\Delta}}$$ and perform demixing for different values of $$k$$ and $$s$$. The results are shown in Fig. 13. As suggested by Lemma 2.3, for fixed $$s$$ the method succeeds for all values of $$k$$ below a certain limit, and vice versa when we vary $$s$$.
Since $$\lambda$$ weighs the effect of the terms that promote sparsity of the two different components in our mixture model, it is no surprise that varying it affects the trade-off between the number of spectral lines and of spikes that we can demix. For smaller $$\lambda$$ the sparsity-inducing term affecting the multisinusoidal component is stronger, so the method succeeds for mixtures with smaller $$k$$ and larger $$s$$. Analogously, for larger $$\lambda$$ the sparsity-inducing term affecting the outlier component is stronger, so the method succeeds for mixtures with larger $$k$$ and smaller $$s$$.

Fig. 13. Graphs showing the fraction of times Problem (2.7) achieves exact demixing over 10 trials with random signs and supports for different numbers of spectral lines $$k$$ and outliers $$s$$. The minimum separation of the spectral lines is $$2 / (n-1)$$. Each column shows results for a different value of the regularization parameter $$\lambda$$. Each row shows results for a different number of measurements $$n$$. The simulations are carried out using CVX [39].

5.2 Comparison with matrix-completion based denoising

In this section, we compare the SDP-based atomic-norm denoising method described in Section 4.4 to the matrix-completion based denoising method from [24]. Both algorithms are implemented using CVX [39] and applied to data following model (2.23).
In general, we observe that both methods either succeed, achieving extremely small errors (the relative MSE, see footnote 6, is smaller than $$10^{-8}$$), or fail, producing very large errors. We compare the performance by recording whether the methods succeed or fail in denoising randomly generated signals for different numbers of spectral lines $$k$$ and outliers $$s$$. To provide a more complete picture, we repeat the simulations for different values of the regularization parameters ($$\lambda$$ for atomic-norm denoising and $$\theta$$ for matrix-completion denoising) that govern the sparsity-inducing terms of the corresponding optimization problems. The values of $$\lambda$$ and $$\theta$$ are chosen separately to yield the best possible performance. Figure 14 shows the results. We observe that atomic-norm denoising consistently outperforms matrix-completion denoising across regimes in which the methods achieve different trade-offs between the values of $$k$$ and $$s$$. In addition, atomic-norm denoising is faster: the average running time for each trial is 3.25 s with a standard deviation of 0.30 s, whereas the average running time for the matrix-completion approach is 11.1 s with a standard deviation of 1.32 s. The experiments were carried out on an Intel Xeon desktop computer with a 3.5 GHz CPU and 24 GB of RAM.

Fig. 14. Graphs showing the fraction of times Problem (2.7) (top row), and the matrix-completion approach from [24] (bottom row) achieve exact denoising for different values of their respective regularization parameters over 10 trials with random signs and supports. The minimum separation of the spectral lines is $$2 / (n-1)$$ and the number of data is $$n=61$$. The simulations are carried out using CVX [39].

6. Conclusion and future research directions

In this work, we propose an optimization-based method for spectral super-resolution in the presence of outliers and characterize its performance theoretically. In addition, we describe how to implement the approach using semi-definite programming, discuss its connection to atomic-norm denoising and present a greedy demixing algorithm with promising empirical performance. Our results suggest the following directions for future research.

Proving a result similar to Theorem 2.2 without the assumption that the phases of the different components are random. This would require showing that the dual-polynomial construction in Section 3.3 is valid, without leveraging the concentration bounds that we use for our proof. It is unclear whether this is possible because the interpolation kernel $$K$$ does not display a good asymptotic decay, as shown in Fig. 3. Note that if the amplitudes of the sparse noise $${\boldsymbol{z}}$$ are constrained to be real, then a derandomization argument similar to the one in [14, Theorem 2.1] allows us to establish the same guarantees as Theorem 2.2 for a sparse perturbation that has an arbitrary deterministic sign pattern.

Deriving guarantees for spectral super-resolution via the approach described in Section 2.5 in the presence of dense and sparse noise. To achieve this, one could combine our dual polynomial construction with the techniques developed in [13,37,65].
In addition, it would be interesting to investigate the application of the method when the level of dense noise is unknown, as in [10].

Developing fast algorithms to solve the SDPs in Sections 4.1 and 4.2. We have found that the alternating direction method of multipliers (ADMM) is effective for denoising, but the dual variable converges too slowly for it to be effective in super-resolving the line spectrum.

Investigating whether greedy demixing techniques, like the one in Section 4.3, can achieve the same performance as our convex-programming approach both empirically and theoretically.

Considering other structured noise models, beyond sparse perturbations, which could be learnt from data by leveraging techniques such as dictionary learning [46,50]. For instance, this could make it possible to deal with recurring interferences in radar applications.

Supplementary Materials

Code to replicate the experiments in the paper is available at IMAIAI online.

Funding

National Science Foundation (DMS-1616340 to C.F., CCF-1464205 to G.T.).

Appendix A. Proof of Lemma 2.3

For any vector $${\boldsymbol{u}}$$ and any atomic measure $$\nu$$, we denote by $${\boldsymbol{u}}_{{\mathcal{{S}}}}$$ and $$\nu_{{\mathcal{{S}}}}$$ the restriction of $${\boldsymbol{u}}$$ and $$\nu$$ to the subset of their support indexed by a set $${\mathcal{{S}}}$$. Let $$\{\hat{\mu}, \boldsymbol{\hat{z}}\}$$ be any solution to Problem (2.7) applied to $${\boldsymbol{y'}}$$. The pair $$\{\hat{\mu}+\mu_{T/T'}, \boldsymbol{\hat{z}}+{\boldsymbol{z}}_{{\it{\Omega}}/{\it{\Omega}}'}\}$$ is feasible for Problem (2.7) applied to $${\boldsymbol{y}}$$ since
$$\mathcal{F}_n \, \hat{\mu} + \mathcal{F}_n \, \mu_{T/T'} + \boldsymbol{\hat{z}} + \boldsymbol{z}_{{\it{\Omega}}/{\it{\Omega}}'} = \boldsymbol{y'} + \mathcal{F}_n \, \mu_{T/T'} + \boldsymbol{z}_{{\it{\Omega}}/{\it{\Omega}}'}$$ (A.1)
$$= \mathcal{F}_n \, \mu' + \mathcal{F}_n \, \mu_{T/T'} + \boldsymbol{z'} + \boldsymbol{z}_{{\it{\Omega}}/{\it{\Omega}}'}$$ (A.2)
$$= \mathcal{F}_n \, \mu + \boldsymbol{z}$$ (A.3)
$$= \boldsymbol{y}.$$
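(A.4) By the triangle inequality and the assumption that $$\{\mu, \boldsymbol{z}\}$$ is the unique solution to Problem (2.7) applied to $${\boldsymbol{y}}$$, this implies
$$\|\mu\|_{\mathrm{TV}} + \lambda \|\boldsymbol{z}\|_1 < \|\hat{\mu} + \mu_{T/T'}\|_{\mathrm{TV}} + \lambda \|\boldsymbol{\hat{z}} + \boldsymbol{z}_{{\it{\Omega}}/{\it{\Omega}}'}\|_1$$ (A.5)
$$\leq \|\hat{\mu}\|_{\mathrm{TV}} + \|\mu_{T/T'}\|_{\mathrm{TV}} + \lambda \|\boldsymbol{\hat{z}}\|_1 + \lambda \|\boldsymbol{z}_{{\it{\Omega}}/{\it{\Omega}}'}\|_1,$$ (A.6)
unless $$\hat{\mu}+\mu_{T/T'} = \mu$$ and $$\boldsymbol{\hat{z}}+{\boldsymbol{z}}_{{\it{\Omega}}/{\it{\Omega}}'} = {\boldsymbol{z}}$$, so that
$$\|\mu'\|_{\mathrm{TV}} + \lambda \|\boldsymbol{z'}\|_1 = \|\mu\|_{\mathrm{TV}} - \|\mu_{T/T'}\|_{\mathrm{TV}} + \lambda \|\boldsymbol{z}\|_1 - \lambda \|\boldsymbol{z}_{{\it{\Omega}}/{\it{\Omega}}'}\|_1$$ (A.7)
$$< \|\hat{\mu}\|_{\mathrm{TV}} + \lambda \|\boldsymbol{\hat{z}}\|_1,$$ (A.8)
unless $$\hat{\mu} = \mu'$$ and $$\boldsymbol{\hat{z}} = {\boldsymbol{z'}}$$. We conclude that $$\{\mu', \boldsymbol{z'}\}$$ must be the unique solution to Problem (2.7) applied to $${\boldsymbol{y'}}$$.

Appendix B. Atomic-norm denoising

B.1 Proof of Lemma 2.4

We define a scaled norm $$\|\cdot\|_{{\mathcal{A}}'} := \|\cdot\|_{\mathcal{A}} / \sqrt{n}$$. The dual norm of $$\|\cdot\|_{{\mathcal{A}}'}$$ is
$$\|\boldsymbol{\eta}\|_{\mathcal{A}'}^{\ast} = \sup_{\|\boldsymbol{\tilde{g}}\|_{\mathcal{A}} \leq \sqrt{n}} \left\langle \boldsymbol{\eta}, \boldsymbol{\tilde{g}} \right\rangle$$ (B.1)
$$= \sup_{\phi \in [0, 2\pi), \, f \in [0,1]} \left\langle \boldsymbol{\eta}, \sqrt{n} \, e^{i \phi} \boldsymbol{a}\left(f, 0\right) \right\rangle$$ (B.2)
$$= \sup_{f \in [0,1]} \left| \left\langle \boldsymbol{\eta}, \sqrt{n} \, \boldsymbol{a}\left(f, 0\right) \right\rangle \right|$$ (B.3)
$$= \|\mathcal{F}_n^{\ast} \, \boldsymbol{\eta}\|_{\infty}.$$ (B.4)
The result now follows from the fact that the dual of (2.24) is
$$\max_{\boldsymbol{\eta} \in \mathbb{C}^{n}} \left\langle \boldsymbol{y}, \boldsymbol{\eta} \right\rangle \quad \text{subject to} \quad \|\boldsymbol{\eta}\|_{\mathcal{A}'}^{\ast} \leq 1,$$ (B.5)
$$\|\boldsymbol{\eta}\|_{\infty} \leq \lambda,$$ (B.6)
by a standard argument [22, Section 2.1].

B.2 Proof of Corollary 2.5

The corollary is a direct consequence of the following lemma, which establishes that the dual polynomial whose existence we establish in Proposition 3.2 also guarantees that solving Problem (2.24) achieves exact demixing.

Lemma B.1 If there exists a trigonometric polynomial $$Q$$ satisfying the conditions listed in Proposition 3.1, then $${\boldsymbol{g}}$$ and $${\boldsymbol{z}}$$ are the unique solutions to Problem (2.24).

Proof. In the case of the atoms defined by (2.20), the atomic norm is given by
$$\|\boldsymbol{u}\|_{\mathcal{A}} = \inf_{\{\tilde{x}_j \geq 0\}, \, \{\phi_j \in [0, 2\pi)\}, \, \{f_j \in [0,1]\}} \left\{ \sum_j \tilde{x}_j \; : \; \boldsymbol{u} = \sum_j \tilde{x}_j \, \boldsymbol{a}\left(f_j, \phi_j\right) \right\},$$ (B.7)
so that
$$\|\boldsymbol{g}\|_{\mathcal{A}} \leq \|\boldsymbol{x}\|_1 \quad \text{due to (2.21)}$$ (B.8)
$$= \|\mu\|_{\mathrm{TV}}.$$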
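(B.9) By construction,
$$\left\langle \boldsymbol{q}, \boldsymbol{y} \right\rangle = \left\langle \boldsymbol{q}, \boldsymbol{g} + \boldsymbol{z} \right\rangle$$ (B.10)
$$= \left\langle \mathcal{F}_n^{\ast} \, \boldsymbol{q}, \mu \right\rangle + \left\langle \boldsymbol{q}, \boldsymbol{z} \right\rangle$$ (B.11)
$$= \int_{[0,1]} \overline{Q(f)} \, \mathrm{d}\mu(f) + \lambda \sum_{l=1}^{s} |z_l|$$ (B.12)
$$= \|\mu\|_{\mathrm{TV}} + \lambda \|\boldsymbol{z}\|_1.$$ (B.13)
Consider an arbitrary feasible pair $$\{\boldsymbol{g'}, \boldsymbol{z'}\}$$ different from $$\{\boldsymbol{g}, \boldsymbol{z}\}$$, such that $${\boldsymbol{z'}}$$ has non-zero support $${\it{\Omega}}'$$ and
$$\boldsymbol{g'} = \sqrt{n} \sum_{f_j \in T'} x'_j \, \boldsymbol{a}\left(f_j, 0\right), \qquad \|\boldsymbol{g'}\|_{\mathcal{A}} := \sum_{f_j \in T'} |x'_j|$$ (B.14)
for a sequence of complex coefficients $${\boldsymbol{x'}}$$ and a set of frequency locations $$T' \subseteq {\left[{0,1}\right]\!}$$. Note that as long as $$k + s \leq n$$ (recall that $$k := {\left|{T}\right|\!}$$ and $$s:={\left|{{\it{\Omega}}}\right|\!}$$) then either $$T \neq T'$$ or $${\it{\Omega}} \neq {\it{\Omega}}'$$. The reason is that under that condition any set formed by $$k$$ atoms of the form $${\boldsymbol{a}}{\left( f_j,0 \right)\!}$$ and $$s$$ vectors with cardinality one is linearly independent (this is equivalent to the matrix $$[ F_T \quad {\it{I}}_{{\it{\Omega}}} ]$$ in Section C.1 being full rank), so that if both $$T = T'$$ and $${\it{\Omega}} = {\it{\Omega}}'$$ then $${\boldsymbol{g}} + {\boldsymbol{z}}= {\boldsymbol{g'}} + {\boldsymbol{z'}}$$ would imply that $${\boldsymbol{g}}= {\boldsymbol{g'}}$$ and $${\boldsymbol{z}}= {\boldsymbol{z'}}$$ (and we are assuming this is not the case). By conditions (3.3) and (3.4)
$$\sqrt{n} \left\langle \boldsymbol{q}, \boldsymbol{a}\left(f_j, 0\right) \right\rangle = Q(f_j)$$ (B.15)
$$= \frac{x_j}{|x_j|}, \quad \forall f_j \in T,$$ (B.16)
$$\left| \sqrt{n} \left\langle \boldsymbol{q}, \boldsymbol{a}\left(f, 0\right) \right\rangle \right| = |Q(f)|$$ (B.17)
$$< 1, \quad \forall f \in T^c.$$ (B.18)
We have
$$\|\boldsymbol{g}\|_{\mathcal{A}} + \lambda \|\boldsymbol{z}\|_1 \leq \left\langle \boldsymbol{q}, \boldsymbol{y} \right\rangle \quad \text{by (B.9) and (B.13)}$$ (B.19)
$$= \left\langle \boldsymbol{q}, \boldsymbol{g'} \right\rangle + \left\langle \boldsymbol{q}, \boldsymbol{z'} \right\rangle$$ (B.20)
$$= \sqrt{n} \sum_{f_j \in T'} x'_j \left\langle \boldsymbol{q}, \boldsymbol{a}\left(f_j, 0\right) \right\rangle + \left\langle \boldsymbol{q}_{{\it{\Omega}}'}, \boldsymbol{z'} \right\rangle$$ (B.21)
$$< \sum_{f_j \in T'} |x'_j| + \lambda \sum_{l \in {\it{\Omega}}'} |z'_l|$$ (B.22)
$$= \|\boldsymbol{g'}\|_{\mathcal{A}} + \lambda \|\boldsymbol{z'}\|_1,$$ (B.23)
where (B.22) follows from conditions (3.5) and (3.6), (B.16), (B.18) and the fact that either $$T \neq T'$$ or $${\it{\Omega}} \neq {\it{\Omega}}'$$. We conclude that $$\{\boldsymbol{g}, \boldsymbol{z}\}$$ must be the unique solution to Problem (2.24). □

Appendix C. Proof of Proposition 3.1

For any vector $${\boldsymbol{u}}$$ and any atomic measure $$\nu$$, we denote by $${\boldsymbol{u}}_{{\mathcal{{S}}}}$$ and $$\nu_{{\mathcal{{S}}}}$$ the restriction of $${\boldsymbol{u}}$$ and $$\nu$$ to the subset of their support indexed by a set $${\mathcal{{S}}}$$ ($${\boldsymbol{u}}_{{\mathcal{{S}}}}$$ has the same dimension as $${\boldsymbol{u}}$$ and $$\nu_{{\mathcal{{S}}}}$$ is still a measure in the unit interval). Let us consider an arbitrary feasible pair $$\mu'$$ and $${\boldsymbol{z'}}$$, such that $$\mu'\neq \mu$$ or $${\boldsymbol{z'}}\neq {\boldsymbol{z}}$$. Due to the constraints of the optimization problem, $$\mu'$$ and $${\boldsymbol{z'}}$$ satisfy
$$\boldsymbol{y} = \mathcal{F}_n \, \mu + \boldsymbol{z} = \mathcal{F}_n \, \mu' + \boldsymbol{z'}.$$ (C.1)
The following lemma establishes that $$\mu_{T^c}'$$ and $${\boldsymbol{z}}_{{\it{\Omega}}^c}'$$ cannot both equal zero.

Lemma C.1 (Proof in Section C.1) If $$\{\mu', \boldsymbol{z'}\}$$ is feasible and $$\mu_{T^c}'$$ and $${\boldsymbol{z}}_{{\it{\Omega}}^c}'$$ both equal zero, then $$\mu=\mu'$$ and $${\boldsymbol{z}}={\boldsymbol{z'}}$$.

This lemma and the existence of $$Q$$ imply that the cost function evaluated at $$\{\mu', \boldsymbol{z'}\}$$ is larger than at $$\{\mu, \boldsymbol{z}\}$$:
$$\|\mu'\|_{\mathrm{TV}} + \lambda \|\boldsymbol{z'}\|_1 = \|\mu'_{T}\|_{\mathrm{TV}} + \|\mu'_{T^c}\|_{\mathrm{TV}} + \lambda \|\boldsymbol{z}'_{{\it{\Omega}}}\|_1 + \lambda \|\boldsymbol{z}'_{{\it{\Omega}}^c}\|_1$$
$$> \|\mu'_{T}\|_{\mathrm{TV}} + \left\langle Q, \mu'_{T^c} \right\rangle + \lambda \|\boldsymbol{z}'_{{\it{\Omega}}}\|_1 + \left\langle \boldsymbol{q}, \boldsymbol{z}'_{{\it{\Omega}}^c} \right\rangle \quad \text{by Lemma C.1, (3.4) and (3.6)}$$ (C.2)
$$\geq \left\langle Q, \mu' \right\rangle + \left\langle \boldsymbol{q}, \boldsymbol{z'} \right\rangle \quad \text{by (3.3) and (3.5)}$$ (C.3)
$$= \left\langle \mathcal{F}_n^{\ast} \, \boldsymbol{q}, \mu' \right\rangle + \left\langle \boldsymbol{q}, \boldsymbol{z'} \right\rangle$$ (C.4)
$$= \left\langle \boldsymbol{q}, \mathcal{F}_n \, \mu' + \boldsymbol{z'} \right\rangle$$ (C.5)
$$= \left\langle \boldsymbol{q}, \mathcal{F}_n \, \mu + \boldsymbol{z} \right\rangle \quad \text{by (C.1)}$$ (C.6)
$$= \left\langle \mathcal{F}_n^{\ast} \, \boldsymbol{q}, \mu \right\rangle + \left\langle \boldsymbol{q}, \boldsymbol{z} \right\rangle$$ (C.7)
$$= \left\langle Q, \mu \right\rangle + \left\langle \boldsymbol{q}, \boldsymbol{z} \right\rangle$$ (C.8)
$$= \|\mu\|_{\mathrm{TV}} + \lambda \|\boldsymbol{z}\|_1 \quad \text{by (3.3) and (3.5)}.$$ (C.9)
We conclude that $$\{\mu, \boldsymbol{z}\}$$ must be the unique solution.

C.1. Proof of Lemma C.1

If $$\mu_{T^c}'$$ and $${\boldsymbol{z}}_{{\it{\Omega}}^c}'$$ both equal zero, then
$$\mathcal{F}_n \, \mu + \boldsymbol{z} - \mathcal{F}_n \, \mu'_{T} - \boldsymbol{z}'_{{\it{\Omega}}} = \mathcal{F}_n \, \mu' + \boldsymbol{z'} - \mathcal{F}_n \, \mu'_{T} - \boldsymbol{z}'_{{\it{\Omega}}} \quad \text{by (C.1)}$$ (C.10)
$$= \mathcal{F}_n \, \mu'_{T^c} + \boldsymbol{z}'_{{\it{\Omega}}^c}$$ (C.11)
$$= 0.$$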
(C.12) We index the entries of $${\it{\Omega}} := {\left\{ {i_1,i_2, \ldots,i_s}\right\}\!}$$ and define the matrix $$[ F_T \quad {\it{I}}_{{\it{\Omega}}} ] \in {\mathbb{C}}^{n \times {\left( k+s \right)}}$$, where
$$\left(F_T\right)_{lj} = e^{i 2 \pi l f_j} \quad \text{for } 1 \leq l \leq n, \; 1 \leq j \leq k,$$ (C.13)
$$\left({\it{I}}_{{\it{\Omega}}}\right)_{lj} = \begin{cases} 1 & \text{if } l = i_j \\ 0 & \text{otherwise} \end{cases} \quad \text{for } 1 \leq l \leq n, \; 1 \leq j \leq s.$$ (C.14)
If $$k + s \leq n$$ then $$[ F_T \quad {\it{I}}_{{\it{\Omega}}} ]$$ is full rank (this follows from the fact that $$F_T$$ is a submatrix of a Vandermonde matrix). Equation (C.12) implies
$$\begin{bmatrix} F_T & {\it{I}}_{{\it{\Omega}}} \end{bmatrix} \begin{bmatrix} \boldsymbol{x} - \boldsymbol{x}' \\ \mathcal{P}_{{\it{\Omega}}} \, \boldsymbol{z} - \mathcal{P}_{{\it{\Omega}}} \, \boldsymbol{z}' \end{bmatrix} = 0,$$ (C.15)
where $${\mathcal{{P}}}_{{\it{\Omega}}} {\boldsymbol{u}}' \in {\mathbb{C}}^s$$ is the subvector of $${\boldsymbol{u}}'$$ containing the entries indexed by $${\it{\Omega}}$$ and $${\boldsymbol{x}}' \in {\mathbb{C}}^{k}$$ is the vector containing the amplitudes of $$\mu'$$ (recall that by assumption $$\mu_{T^c}'=0$$). Since the matrix is full rank, both blocks of the vector in (C.15) vanish, and we conclude that $$\mu=\mu'$$ and $${\boldsymbol{z}}={\boldsymbol{z'}}$$.

Appendix D. Proof of Lemma 3.4

The vector of coefficients $${\boldsymbol{c}}$$ equals the convolution of three rectangles of widths $$2 \,{\cdot}\, {0.247} \, m + 1$$, $$2 \,{\cdot}\, {0.339} \, m + 1$$ and $$2 \cdot {0.414} \, m + 1$$ and amplitudes $${\left( 2 \cdot {0.247} \, m + 1 \right)\!}^{-1}$$, $${\left( 2 \cdot {0.339} \, m + 1 \right)\!}^{-1}$$ and $${\left( 2 \cdot {0.414} \, m + 1 \right)\!}^{-1}$$. Some simple computations show that the amplitude of the convolution of three rectangles with unit amplitudes and widths $$a_1 < a_2<a_3$$ is bounded by $$a_1 a_2$$. An immediate consequence is that the amplitude of $${\boldsymbol{c}}$$ is bounded by
$$\|\boldsymbol{c}\|_{\infty} \leq \frac{\left(2 \cdot 0.247 \, m + 1\right)\left(2 \cdot 0.339 \, m + 1\right)}{\left(2 \cdot 0.247 \, m + 1\right)\left(2 \cdot 0.339 \, m + 1\right)\left(2 \cdot 0.414 \, m + 1\right)}$$ (D.1)
$$\leq \frac{1}{2 \cdot 0.414 \, m + 1}$$ (D.2)
$$\leq \frac{1.3}{m}.$$ (D.3)

Appendix E. Proof of Lemma 3.6

To bound the operator norm of $$B_{{\it{\Omega}}}$$, we control the behavior of
$$H := B_{{\it{\Omega}}} B_{{\it{\Omega}}}^{\ast}$$ (E.1)
$$= \sum_{l \in {\it{\Omega}}} \boldsymbol{b}(l) \boldsymbol{b}(l)^{\ast},$$ (E.2)
which concentrates around a scaled version of
$$\bar{H} := \sum_{l=-m}^{m} \boldsymbol{b}(l) \boldsymbol{b}(l)^{\ast}.$$ (E.3)
The following lemma bounds the operator norm of $$\bar{H}$$.
Lemma E.1 (Proof in Section E.1) Under the assumptions of Theorem 2.2
$$\left\| \bar{H} \right\| \leq 260 \, \pi^2 \, n \log k.$$ (E.4)

By (2.12) $$s \leq C_s \, n {\left({ \log k \log \frac{n}{\epsilon}}\right)\!}^{-1}$$ which together with the lemma implies
$$\left\| \frac{s}{n} \bar{H} \right\| \leq \frac{C_B^2 \, n}{2} \left( \log \frac{n}{\epsilon} \right)^{-1}$$ (E.5)
if we set $$C_s$$ small enough. The following lemma uses the matrix Bernstein inequality to control the deviation of $$H$$ from a scaled version of $$\bar{H}$$.

Lemma E.2 (Proof in Section E.2) Under the assumptions of Theorem 2.2
$$\left\| H - \frac{s}{n} \bar{H} \right\| \leq \frac{C_B^2 \, n}{2} \left( \log \frac{n}{\epsilon} \right)^{-1}$$ (E.6)
with probability at least $$1- \epsilon /5$$.

We conclude that
$$\left\| B_{{\it{\Omega}}} \right\| \leq \sqrt{\left\| H \right\|}$$ (E.7)
$$\leq \sqrt{ \frac{s}{n} \left\| \bar{H} \right\| + \left\| H - \frac{s}{n} \bar{H} \right\| }$$ (E.8)
$$\leq C_B \sqrt{n} \left( \log \frac{n}{\epsilon} \right)^{-\frac{1}{2}}$$ (E.9)
with probability at least $$1- \epsilon /5$$ by the triangle inequality.

E.1 Proof of Lemma E.1

We express the matrix $$\bar{H}$$ in terms of the Dirichlet kernel $${\mathcal{{D}}}_m$$ of order $$m$$ defined in (3.25) and its derivatives,
$$\bar{H} = n \begin{bmatrix} \bar{H}_0 & \bar{H}_1 \\ -\bar{H}_1 & \bar{H}_2 \end{bmatrix},$$ (E.10)
where
$$\left(\bar{H}_0\right)_{jl} = \mathcal{D}_m\left(f_j - f_l\right), \quad \left(\bar{H}_1\right)_{jl} = \kappa \, \mathcal{D}_m^{(1)}\left(f_j - f_l\right), \quad \left(\bar{H}_2\right)_{jl} = -\kappa^2 \, \mathcal{D}_m^{(2)}\left(f_j - f_l\right).$$ (E.11)
In order to bound the operator norm of $$\bar{H}$$ we first establish some bounds on $${\mathcal{{D}}}_m ^{{\left( \ell \right)\!}}$$ for $$\ell=0,1,2$$. Due to how the kernel is normalized in (3.25), the magnitude of $${\mathcal{{D}}}_m$$ is bounded by one. This yields a uniform bound on the magnitude of its derivatives by Bernstein’s polynomial inequality.

Theorem E.3 (Bernstein’s polynomial inequality [56]) For any complex-valued polynomial $$P$$ of degree $$N$$
$$\sup_{|z| \leq 1} \left| P^{(1)}(z) \right| \leq N \sup_{|z| \leq 1} \left| P(z) \right|.$$ (E.12)

Applying the theorem, we have
$$\left| \mathcal{D}_m^{(\ell)}(f) \right| \leq \left(2 \pi m\right)^{\ell}.$$ (E.13)
The following lemma allows us to control the tail of the Dirichlet kernel and its derivatives.

Lemma E.4 ([38, Section C.4]) If $$m \geq 10^3$$, for $$f \geq 80 /m$$
$$\left| \mathcal{D}_m^{(\ell)}(f) \right| \leq \frac{1.1 \cdot 2^{\ell-2} \, \pi^{\ell} \, m^{\ell-1}}{f}.$$ (E.14)

We now combine these two bounds to control the sum of the magnitudes of $${\mathcal{{D}}}_m^{{\left( \ell \right)\!}}$$ when evaluated at $$T$$ for $$\ell = 0,1,2$$.
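By the minimum-separation condition (2.10), if we fix $$f_i \in T$$ then there are at most 126 other frequencies in $$T$$ that are at a distance of $$80/m$$ or less from $$f_i$$. We bound those terms using (E.13) and deal with the rest by applying Lemma E.4,
$$\sup_{f_i} \sum_{j=1}^{k} \kappa^{\ell} \left| \mathcal{D}_m^{(\ell)}\left(f_i - f_j\right) \right| \leq 126 \, \pi^{\ell} \kappa^{\ell} \sup_{f} \left| \mathcal{D}_m^{(\ell)}(f) \right| + 2 \kappa^{\ell} \sum_{j=1}^{k} \sup_{|f| \geq j {\it{\Delta}}_{\min}} \left| \mathcal{D}_m^{(\ell)}(f) \right|$$ (E.15)
$$\leq 126 \, \pi^{\ell} + \frac{1}{m^{\ell}} \sum_{j=1}^{k} \frac{1.1 \, \pi^{\ell} \, m^{\ell-1}}{4 \, j \, {\it{\Delta}}_{\min}} \quad \text{by Lemma 3.3 and (E.13)}$$ (E.16)
$$\leq 130 \, \pi^{\ell} \log k \quad \text{since } {\it{\Delta}}_{\min} := \frac{1.26}{m} \text{ and } \sum_{j=1}^{k} \frac{1}{j} \leq 1 + \log k \leq 2 \log k$$ (E.17)
as long as $$k$$ is larger than 2 (the argument can be easily modified if this is not the case). By Gershgorin’s circle theorem, the eigenvalues of $$\bar{H}$$, and consequently its operator norm, are bounded by
$$n \max_{i} \left\{ \sum_{j=1}^{k} \left| \mathcal{D}_m\left(f_i - f_j\right) \right| + \sum_{j=1}^{k} \kappa \left| \mathcal{D}_m^{(1)}\left(f_i - f_j\right) \right|, \right.$$ (E.18)
$$\left. \sum_{j=1}^{k} \kappa \left| \mathcal{D}_m^{(1)}\left(f_i - f_j\right) \right| + \sum_{j=1}^{k} \kappa^2 \left| \mathcal{D}_m^{(2)}\left(f_i - f_j\right) \right| \right\} \leq 260 \, \pi^2 \, n \log k.$$ (E.19)

E.2 Proof of Lemma E.2

Under the assumptions of Theorem 2.2
$$H = \sum_{l=-m}^{m} \delta_{{\it{\Omega}}}(l) \, \boldsymbol{b}(l) \boldsymbol{b}(l)^{\ast},$$ (E.20)
where $$\delta_{{\it{\Omega}}}{\left( -m \right)\!}$$, $$\delta_{{\it{\Omega}}}{\left( -m+1 \right)\!}$$, ..., $$\delta_{{\it{\Omega}}}{\left( m \right)\!}$$ are i.i.d. Bernoulli random variables with parameter $$\frac{s}{n}$$. We control this sum of independent random matrices using the matrix Bernstein inequality.

Theorem E.5 (Matrix Bernstein inequality [71, Theorem 1.4]) Let $${\left\{ {X_l}\right\}\!}$$ be a finite sequence of independent zero-mean self-adjoint random matrices of dimension $$d$$ such that $${\left\lVert{X_l}\right\rVert} \leq B$$ almost surely for a certain constant $$B$$. For all $$t \geq 0$$ and a positive constant $$\sigma^2$$
$$\mathrm{P} \left\{ \left\| \sum_{l=-m}^{m} X_l \right\| \geq t \right\} \leq d \exp \left( \frac{-t^2/2}{\sigma^2 + B t / 3} \right) \quad \text{as long as} \quad \left\| \sum_{l=-m}^{m} \mathrm{E}\left(X_l^2\right) \right\| \leq \sigma^2.$$ (E.21)

We apply the matrix Bernstein inequality to the finite sequence of independent self-adjoint zero-mean random matrices of the form
$$X_l := \left( \delta_{{\it{\Omega}}}(l) - \frac{s}{n} \right) \boldsymbol{b}(l) \boldsymbol{b}(l)^{\ast}, \quad -m \leq l \leq m.$$ (E.22)
These random matrices satisfy
$$H - \frac{s}{n} \bar{H} = \sum_{l=-m}^{m} X_l.$$ (E.23)
By Lemma 3.5
$$\left\| X_l \right\| \leq \sup_{-m \leq l \leq m} \|\boldsymbol{b}(l)\|_2^2$$ (E.24)
$$\leq B := 10 \, k.$$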
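(E.25)

In addition,

$$\sigma^2 := \left\lVert \sum_{l=-m}^{m} \mathbb{E} \left( X_l^2 \right) \right\rVert$$ (E.26)

$$= \left\lVert \sum_{l=-m}^{m} \mathbb{E} \left( \left( \delta_{{\it{\Omega}}}(l) - \frac{s}{n} \right)^2 \right) {\left|\left|{ {\boldsymbol{b}}(l) }\right|\right| _{2}\!}^2 \, {\boldsymbol{b}}(l) {\boldsymbol{b}}(l)^{\ast} \right\rVert$$ (E.27)

$$\leq 10 \, k \, \frac{s}{n} \left\lVert \bar{H} \right\rVert$$ (E.28)

$$\leq 10 \, C_B^2 \, n \, k {\left({ \log \frac{n}{\epsilon}}\right)\!}^{-1}$$ (E.29)

by Lemma 3.5, (E.5) and the fact that the variance of a Bernoulli random variable of parameter $$p$$ equals $$p {\left( 1-p \right)\!}$$. Setting $$t := \frac{C_B^2 \, n}{2} {\left({ \log \frac{n}{\epsilon}}\right)\!}^{-1}$$ in Theorem E.5 and using the upper bound (E.29) as the value of $$\sigma^2$$, so that $$\sigma^2 = 20 \, k \, t$$, yields

$$\mathrm{P} \left\{ \left\lVert H - \frac{s}{n} \bar{H} \right\rVert \geq t \right\} \leq 2k \exp \left( \frac{-t^2/2}{\sigma^2 + Bt/3} \right)$$ (E.30)

$$= 2k \exp \left( - \frac{3t}{140 \, k} \right).$$ (E.31)

The probability is smaller than or equal to $$\epsilon/5$$ as long as

$$k \leq \frac{3 \, C_B^2 \, n}{280} {\left({ \log \frac{10 k}{\epsilon} \log \frac{n}{\epsilon}}\right)\!}^{-1},$$ (E.32)

which holds by (2.11) if we set $$C_k$$ small enough.

Appendix F. Proof of Lemma 3.7

The proof uses the following concentration bound that controls the deviation of a sum of independent vectors.

Theorem F.1 (Vector Bernstein inequality [15, Theorem 2.6], [40, Theorem 12]) Let $${\mathcal{{U}}} \subset \mathbb{R}^d$$ be a finite sequence of independent zero-mean random vectors such that $${\left|\left|{ {\boldsymbol{u}} }\right|\right| _{2}\!} \leq B$$ almost surely for all $${\boldsymbol{u}} \in {\mathcal{{U}}}$$ and $$\sum_{ {\boldsymbol{u}} \in {\mathcal{{U}}}} \mathbb{E} {\left|\left|{ {\boldsymbol{u}} }\right|\right| _{2}\!}^2 \leq \sigma^2$$, where $$B$$ and $$\sigma^2$$ are positive constants. Then

$$\mathrm{P} \left( \left|\left| \sum_{{\boldsymbol{u}} \in {\mathcal{{U}}}} {\boldsymbol{u}} \right|\right|_2 \geq t \right) \leq \exp \left( - \frac{t^2}{8 \sigma^2} + \frac{1}{4} \right) \quad \text{for } 0 \leq t \leq \frac{\sigma^2}{B}.$$ (F.1)

By the definitions of $$\bar{K}$$, $$K$$ and $${\boldsymbol{b}}$$ in (3.27), (3.38) and (3.45),

$${\boldsymbol{{\bar{v}_{\ell}}}}(f) = \sum_{l=-m}^{m} (i 2 \pi \kappa l)^{\ell} \, {\boldsymbol{c}}_l \, e^{i 2 \pi l f} \, {\boldsymbol{b}}(l),$$ (F.2)

$${\boldsymbol{{v_{\ell}}}}(f) = \sum_{l=-m}^{m} \delta_{{\it{\Omega}}^c}(l) \, (i 2 \pi \kappa l)^{\ell} \, {\boldsymbol{c}}_l \, e^{i 2 \pi l f} \, {\boldsymbol{b}}(l),$$ (F.3)

where by assumption $$\delta_{{\it{\Omega}}^c} {\left( -m \right)\!}, \ldots, \delta_{{\it{\Omega}}^c} {\left( m \right)\!}$$ are i.i.d. Bernoulli random variables with parameter $$p := \frac{n-s}{n}$$. This implies that the finite collection of zero-mean random vectors of the form

$${\boldsymbol{u}}(\ell, l) := \left( \delta_{{\it{\Omega}}^c}(l) - p \right) (i 2 \pi \kappa l)^{\ell} \, {\boldsymbol{c}}_l \, e^{i 2 \pi l f} \, {\boldsymbol{b}}(l)$$ (F.4)

satisfies

$${\boldsymbol{{v_{\ell}}}}(f) - p \, {\boldsymbol{{\bar{v}_{\ell}}}}(f) = \sum_{l=-m}^{m} {\boldsymbol{u}}(\ell, l).$$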
(F.5) We have   ||u(ℓ,l)||2 ≤π3||c||∞sup−m≤l≤m||b(l)||2by Lemma (3.3) and ℓ≤3 (F.6)   ≤B:=128kmby Lemmas 3.4 and 3.5,  (F.7) as well as   ∑l=−mmE||u(ℓ,l)||22 =∑l=−mmE((δΩc(l)−p)2)(2πκl)2ℓ|cl|2||b(l)||22 (F.8)   ≤π6nE((δΩc(1)−p)2)||c||∞2sup−m≤l≤m||b(l)||22by Lemma (3.3) (F.9)   ≤σ2:=3.25104km, (F.10) where the last inequality follow from Lemmas 3.4 and 3.5 and $$\mathbb{E} \left({\left({ p - \delta_{{\it{\Omega}}^c}{\left( l \right)\!}}\right)^2}\right) = p{\left( 1-p \right)\!}$$. By the vector Bernstein inequality for $$0 \leq t \leq \sigma^2/B$$ and the union bound, we have   P(supf∈G‖vℓ(f)−pv¯ℓ(f)‖2≥t,ℓ∈{0,1,2,3})≤4|G|exp⁡(−t28σ2+14). (F.11) To make the right-hand side smaller than $$\epsilon /5$$, we fix $$t$$ to equal   t :=σ8(14+log⁡20|G|ϵ). (F.12) This choice of $$t$$ is valid because   tσ =8(14+log⁡20|G|ϵ) (F.13)   ≤74+16log⁡n+8log⁡1ϵ (F.14)   ≤0.315n+8log⁡1ϵ (F.15)  ≤0.32n. (F.16) Inequality (F.15) follows from the fact that $$\sqrt{74 + 16 \log n} \leq 0.315 \sqrt{n}$$ for $$n \geq 2\, 10^3$$. Inequality (F.16) holds by (2.11) and (2.12) as long as we set $$C_k$$ and $$C_s$$ small enough, and either $$k \geq 1$$ or $$s \geq 1$$. This establishes that $$t/\sigma$$ is smaller than $$0.32 \sqrt{n} \leq \sigma/B$$. We conclude that the desired bound holds as long as   Cv(log⁡nϵ)−12 ≥t≥2103kn(14+log⁡8103n2ϵ), (F.17) which is the case by (2.11) if we set $$C_k$$ small enough. Appendix G. Proof of Lemma 3.8 The proof is based on the proof of Lemma 4.4 in [66]. The following lemma establishes that $$\bar{D}$$ is invertible and close to the identity. Lemma G.1 (Proof in Section G.1) Under the assumptions of Theorem 2.2   ‖I−D¯‖ ≤0.468, (G.1)  ‖D¯‖ ≤1.468, (G.2)  ‖D¯−1‖ ≤1.88. 
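(G.3)

By the definition of $$\bar{K}$$ and $$K$$ in (3.27) and (3.38), respectively, we can write $$D$$ and $$\bar{D}$$ as sums of self-adjoint matrices,

$$\bar{D} = \sum_{l=-m}^{m} {\boldsymbol{c}}_l \, {\boldsymbol{b}}(l) {\boldsymbol{b}}(l)^{\ast},$$ (G.4)

$$D = \sum_{l=-m}^{m} \delta_{\Omega^c}(l) \, {\boldsymbol{c}}_l \, {\boldsymbol{b}}(l) {\boldsymbol{b}}(l)^{\ast},$$ (G.5)

where by assumption $$\delta_{\Omega^c} {\left( -m \right)\!}$$,..., $$\delta_{\Omega^c} {\left( m \right)\!}$$ are i.i.d. Bernoulli random variables with parameter $$p := \frac{n-s}{n}$$. In the following lemma, we leverage the matrix Bernstein inequality to establish that $$D$$ concentrates around $$p \, \bar{D}$$.

Lemma G.2 (Proof in Section G.2) Under the assumptions of Theorem 2.2

$$\left\lVert D - p \bar{D} \right\rVert \geq \frac{p}{4} \min \left\{ 1, \frac{C_D}{4} \right\} {\left({ \log \frac{n}{\epsilon}}\right)\!}^{-\frac{1}{2}}$$ (G.6)

with probability at most $$\epsilon /5$$.

Applying the triangle inequality together with Lemma G.1 allows us to lower-bound the smallest singular value of $$D$$ when the event in (G.6) does not occur, that is, when $$\left\lVert D - p \bar{D} \right\rVert \leq p/4$$:

$$\frac{\sigma_{\min}(D)}{p} \geq \sigma_{\min}(I) - \left\lVert I - \bar{D} \right\rVert - \frac{1}{p} \left\lVert D - p \bar{D} \right\rVert$$ (G.7)

$$\geq 0.282.$$ (G.8)

This proves that $$D$$ is invertible. To complete the proof we borrow two inequalities from [66].

Lemma G.3 ([66, Appendix E]) For any matrices $$A$$ and $$B$$ such that $$B$$ is invertible and

$$\left\lVert A - B \right\rVert \left\lVert B^{-1} \right\rVert \leq \frac{1}{2}$$ (G.9)

we have

$$\left\lVert A^{-1} \right\rVert \leq 2 \left\lVert B^{-1} \right\rVert,$$ (G.10)

$$\left\lVert A^{-1} - B^{-1} \right\rVert \leq 2 \left\lVert B^{-1} \right\rVert^2 \left\lVert A - B \right\rVert.$$ (G.11)

We set $$A:= D$$ and $$B:=p\bar{D}$$. By Lemmas G.1 and G.2,

$$\left\lVert D - p \bar{D} \right\rVert \left\lVert (p \bar{D})^{-1} \right\rVert \leq \frac{1}{2}$$ (G.12)

with probability at least $$1-\epsilon/5$$. Lemmas G.1, G.2 and G.3 then imply

$$\left\lVert D^{-1} \right\rVert \leq 2 \left\lVert (p \bar{D})^{-1} \right\rVert$$ (G.13)

$$\leq \frac{4}{p},$$ (G.14)

$$\left\lVert D^{-1} - (p \bar{D})^{-1} \right\rVert \leq 2 \left\lVert (p \bar{D})^{-1} \right\rVert^2 \left\lVert D - p \bar{D} \right\rVert$$ (G.15)

$$\leq \frac{C_D}{2p} {\left({ \log \frac{n}{\epsilon}}\right)\!}^{-\frac{1}{2}}$$ (G.16)

with the same probability. Finally, if $$s \leq n/2$$, which is the case by (2.12), we have $$1/p \leq 2$$ and the proof is complete.

G.1 Proof of Lemma G.1

The following bounds on the submatrices of $$\bar{D}$$ are obtained by combining Lemma 3.3 with some results borrowed from [38].

Lemma G.4 ([38, Section 4.2]) Under the assumptions of Theorem 2.2

$$\left\lVert I - \bar{D}_0 \right\rVert_{\infty} \leq 1.855 \cdot 10^{-2},$$ (G.17)

$$\left\lVert \bar{D}_1 \right\rVert_{\infty} \leq 5.148 \cdot 10^{-2},$$ (G.18)

$$\left\lVert I - \bar{D}_2 \right\rVert_{\infty} \leq 0.416.$$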
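(G.19)

Following a similar argument as in Appendix C of [66] yields the desired result:

$$\left\lVert I - \bar{D} \right\rVert \leq \left\lVert I - \bar{D} \right\rVert_{\infty}$$ (G.20)

$$\leq \max \left\{ \left\lVert I - \bar{D}_0 \right\rVert_{\infty} + \left\lVert \bar{D}_1 \right\rVert_{\infty}, \; \left\lVert I - \bar{D}_2 \right\rVert_{\infty} + \left\lVert \bar{D}_1 \right\rVert_{\infty} \right\}$$ (G.21)

$$\leq 0.468,$$ (G.22)

$$\left\lVert \bar{D} \right\rVert \leq 1 + \left\lVert I - \bar{D} \right\rVert \leq 1.468,$$ (G.23)

$$\left\lVert \bar{D}^{-1} \right\rVert \leq \frac{1}{1 - \left\lVert I - \bar{D} \right\rVert_{\infty}} \leq 1.88.$$ (G.24)

G.2 Proof of Lemma G.2

We define

$$X_l := \left( p - \delta_{\Omega^c}(l) \right) {\boldsymbol{c}}_l \, {\boldsymbol{b}}(l) {\boldsymbol{b}}(l)^{\ast},$$ (G.25)

which has zero mean since

$$\mathbb{E}(X_l) = \left( p - \mathbb{E} \left( \delta_{\Omega^c}(l) \right) \right) {\boldsymbol{c}}_l \, {\boldsymbol{b}}(l) {\boldsymbol{b}}(l)^{\ast}$$ (G.26)

$$= 0.$$ (G.27)

By the proofs of Lemmas 3.4 and 3.5, for any $$-m \leq l \leq m$$,

$$\left\lVert X_l \right\rVert \leq \max_{-m \leq l \leq m} \left\lVert {\boldsymbol{c}}_l \, {\boldsymbol{b}}(l) {\boldsymbol{b}}(l)^{\ast} \right\rVert$$ (G.28)

$$\leq {\left|\left|{ {\boldsymbol{c}} }\right|\right| _{\infty}\!} \max_{-m \leq l \leq m} {\left|\left|{ {\boldsymbol{b}}(l) }\right|\right| _{2}\!}^2$$ (G.29)

$$\leq B := \frac{12.6 \, k}{m}.$$ (G.30)

Also, $${\mathbb{E}} {\left({{\left({ p - \delta_{\Omega^c}{\left( {l}\right)}}\right)\!}^2}\right)\!} = p{\left( 1-p \right)\!}$$, which implies

$$\mathbb{E}(X_l^2) = p (1-p) \, {\boldsymbol{c}}_l^2 \, {\left|\left|{ {\boldsymbol{b}}(l) }\right|\right| _{2}\!}^2 \, {\boldsymbol{b}}(l) {\boldsymbol{b}}(l)^{\ast}.$$ (G.31)

Since $${\boldsymbol{c}}_l \geq 0$$ for all $$l$$ ($${\boldsymbol{c}}$$ is the convolution of three positive rectangular pulses),

$$\sum_{l=-m}^{m} {\boldsymbol{c}}_l^2 \, {\left|\left|{ {\boldsymbol{b}}(l) }\right|\right| _{2}\!}^2 \, {\boldsymbol{b}}(l) {\boldsymbol{b}}(l)^{\ast} \preceq {\left|\left|{ {\boldsymbol{c}} }\right|\right| _{\infty}\!} \max_{-m \leq l \leq m} {\left|\left|{ {\boldsymbol{b}}(l) }\right|\right| _{2}\!}^2 \sum_{l=-m}^{m} {\boldsymbol{c}}_l \, {\boldsymbol{b}}(l) {\boldsymbol{b}}(l)^{\ast}$$ (G.32)

$$\preceq \frac{12.6 \, k}{m} \bar{D} \quad \text{by Lemmas 3.4 and 3.5,}$$ (G.33)

so that

$$\left\lVert \sum_{l=-m}^{m} \mathbb{E}(X_l^2) \right\rVert \leq p \left\lVert \sum_{l=-m}^{m} {\boldsymbol{c}}_l^2 \, {\left|\left|{ {\boldsymbol{b}}(l) }\right|\right| _{2}\!}^2 \, {\boldsymbol{b}}(l) {\boldsymbol{b}}(l)^{\ast} \right\rVert$$ (G.34)

$$\leq \frac{12.6 \, p \, k \left\lVert \bar{D} \right\rVert}{m}$$ (G.35)

$$\leq \sigma^2 := \frac{18.5 \, p \, k}{m} \quad \text{by Lemma G.1.}$$ (G.36)

Setting $$t = \frac{p}{4}C_{\min} {\left({\log \frac{n}{\epsilon} }\right)\!}^{-\frac{1}{2} }$$ where $$C_{\min}:=\min {\left\{ {1,C_{D}/4}\right\}\!}$$, the matrix Bernstein inequality from Theorem E.5 implies that

$$\Pr \left\{ \left\lVert D - p \bar{D} \right\rVert > t \right\} \leq 2k \exp \left( - \frac{C_{\min}^2 \, p \, m}{32 \, k} \left( 18.5 \log \frac{n}{\epsilon} + 1.05 \, C_{\min} \sqrt{\log \frac{n}{\epsilon}} \right)^{-1} \right) \leq 2k \exp \left( - \frac{C_D' \, (n-s)}{k \log \frac{n}{\epsilon}} \right)$$ (G.37)

for a small enough constant $$C_{D}'$$. This probability is smaller than $$\epsilon/5$$ as long as

$$k \leq \frac{C_D' \, n}{2} {\left({ \log \frac{10 k}{\epsilon} \log \frac{n}{\epsilon}}\right)\!}^{-1},$$ (G.38)

$$s \leq \frac{n}{2},$$ (G.39)

which holds by (2.11) and (2.12) if we set $$C_k$$ and $$C_s$$ small enough.

Appendix H.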
Proof of Proposition 3.10 We begin by expressing $$Q^{{\left( \ell \right)\!}}$$ and $$\bar{Q}^{{\left( \ell \right)\!}}$$ in terms of $$\boldsymbol{h}$$ and $$\boldsymbol{r}$$,   κℓQ¯(ℓ)(f) :=κℓ∑j=1kα¯jK¯(ℓ)(f−fj)+κℓ+1∑j=1kβ¯jK¯(ℓ+1)(f−fj) (H.1)   =v¯ℓ(f)TD¯−1[h0], (H.2)  κℓQ(ℓ)(f) :=κℓ∑j=1kαjK(ℓ)(f−fj)+κℓ+1∑j=1kβjK(ℓ+1)(f−fj)+κℓR(ℓ)(f) (H.3)   =vℓ(f)TD−1([h0]−1nBΩr)+κℓR(ℓ)(f). (H.4) The difference between $$Q^{{\left( \ell \right)\!}}$$ and $$\bar{ Q }^{{\left( \ell \right)\!}}$$ can be decomposed into several terms,   κℓQ(ℓ)(f) =κℓQ¯(ℓ)(f)+κℓR(ℓ)(f)+I1(ℓ)(f)+I2(ℓ)(f)+I3(ℓ)(f), (H.5)  I1(ℓ)(f) :=−1nvℓ(f)TD−1BΩr, (H.6)  I2(ℓ)(f) :=(vℓ(f)−n−snv¯ℓ(f))TD−1[h0], (H.7)  I3(ℓ)(f) :=n−snv¯ℓ(f)T(D−1−nn−sD¯−1)[h0]. (H.8) The following lemma provides bounds on these terms that hold with high probability in every point of a grid $${\mathcal{{G}}}$$ that discretizes the unit interval. Lemma H.1 (Proof in Section H.1) Conditioned on $${\mathcal{{E}}}_{B}^{c} \cap {\mathcal{{E}}}_{D}^{c} \cap {\mathcal{{E}}}_{v}^{c}$$, the events   ER :={supf∈G|κℓR(ℓ)(f)|≥10−28,ℓ=0,1,2,3} (H.9) and   Ei :={supf∈G|Ii(ℓ)(f)|≥10−28,ℓ=0,1,2,3}i=1,2,3, (H.10) where $${\mathcal{{G}}} \subseteq \left[ 0,1 \right]\!$$ is an equispaced grid with cardinality $${\left|{ {\mathcal{{G}}} }\right|\!} = 400 n^2$$ occur each with probability at most $$\epsilon / 20$$ under the assumptions of Theorem 2.2. By the triangle inequality, Lemma H.1 implies   supf∈G|κℓQ(ℓ)(f)−κℓQ¯(ℓ)(f)|≤10−22 (H.11) with probability at least $$1-\epsilon /5$$ conditioned on $${\mathcal{{E}}}_{B}^{c} \cap {\mathcal{{E}}}_{D}^{c} \cap {\mathcal{{E}}}_{v}^{c}$$. We have controlled the deviation between $$Q^{{\left( \ell \right)\!}}$$ and $$\bar{ Q }^{{\left( \ell \right)\!}}$$ on a fine grid. The following result extends the bound to the whole unit interval. Lemma H.2 (Proof in Section H.3) Under the assumptions of Theorem 2.2   |κℓQ(ℓ)(f)−κℓQ¯(ℓ)(f)|≤10−2 for ℓ∈{0,1,2}. 
(H.12) This bound suffices to establish the desired result for values of $$f$$ that lie away from $$T$$. Let us define   Snear :={f||f−fj|≤0.09for some fj∈T}, (H.13)  Sfar :=[0,1]/Snear. (H.14) Section 4 of [38] provides a bound on $$\bar{Q}$$ which holds over all of $${\mathcal{{S}}}_{\mathrm{far}}$$ under the minimum-separation condition (2.10) (see Fig. 12 in [38] as well as the code that supplements [38]). Proposition H.3 (Bound on $$\bar Q$$ [38, Section 4]) Under the assumptions of Theorem 2.2   |Q¯(f)| <0.99f∈Sfar. (H.15) Combining Lemma H.2 and Proposition H.3   |Q(f)| ≤|Q¯(f)|+10−2 (H.16)   <1for all f∈Sfar. (H.17) To bound $$Q$$ in $${\mathcal{{S}}}_{\mathrm{near}}$$ we recall that by Corollary 3.9 in $${\mathcal{{E}}}_{D}^c$$$${\left|{Q{\left( f_j \right)\!}}\right|\!}^2=1$$ and   d|Q(fj)|2df =2QR(1)(fj)QR(fj)+2QI(1)(fj)QI(fj) (H.18)   =0 (H.19) for every $$f_j$$ in $$T$$. Let $$\tilde{f}$$ be the element in $$T$$ that is closest to an arbitrary $$f$$ belonging to $${\mathcal{{S}}}_{\mathrm{near}}$$. The second-order bound   |Q(f)|2 ≤1+(f−f~)2supf∈Sneard2|Q(f)|2df2 (H.20) implies that we only need to show that $${\left|{Q}\right|\!}^2$$ is concave in $${\mathcal{{S}}}_{\mathrm{near}}$$ to complete the proof. First, we bound the derivatives of $$\bar{Q}$$ and $$Q$$ using Bernstein’s polynomial inequality. Lemma H.4 Under the assumptions of Theorem 2.2, for any $$\ell =0,1,2, \ldots$$  supf∈[0,1]|κℓQ¯(ℓ)(f)|≤1, (H.21)  supf∈[0,1]|κℓQ(ℓ)(f)|≤1.01. (H.22) Proof. $$\bar{Q}$$ is a trigonometric polynomial of degree $$m$$ and its magnitude is bounded by one (see Proposition 2.3 in [38]). Combining Theorem E.3 and Lemma 3.3 yields (H.21). The triangle inequality, Lemma H.2 and (H.21) imply (H.22). □ Section 4 of [38] also provides a bound on the second derivative of $${\left|{\bar{Q}}\right|\!}^2$$, which holds over all of $${\mathcal{{S}}}_{\mathrm{near}}$$ under the minimum-separation condition (2.10) (again, see Fig. 
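12 in [38] as well as the code that supplements [38]).

Proposition H.5 (Bound on the second derivative of $${\left|{\bar{Q}}\right|\!}^2$$ [38, Section 4]) Under the assumptions of Theorem 2.2

$$\frac{\mathrm{d}^2 \left| \bar{Q}(f) \right|^2}{\mathrm{d} f^2} \leq -0.8 \, m^2, \quad f \in {\mathcal{{S}}}_{\mathrm{near}}.$$ (H.23)

Combining Proposition H.5, Lemma H.4 and the triangle inequality, as well as the lower bound on $$\kappa$$ from Lemma 3.3, allows us to conclude that the second derivative of $${\left|{Q}\right|\!}^2$$ is negative in $${\mathcal{{S}}}_{\mathrm{near}}$$. Indeed, for any $$f \in {\mathcal{{S}}}_{\mathrm{near}}$$

$$\frac{\kappa^2}{2} \frac{\mathrm{d}^2 \left| Q(f) \right|^2}{\mathrm{d} f^2} = \kappa^2 Q_R^{(2)}(f) \, Q_R(f) + \kappa^2 Q_I^{(2)}(f) \, Q_I(f) + \left| \kappa Q^{(1)}(f) \right|^2$$

$$\leq \frac{\kappa^2}{2} \frac{\mathrm{d}^2 \left| \bar{Q}(f) \right|^2}{\mathrm{d} f^2} + 2 \left| \kappa^2 Q^{(2)}(f) - \kappa^2 \bar{Q}^{(2)}(f) \right| \sup_{f'} \left| Q(f') \right| + 2 \left| Q(f) - \bar{Q}(f) \right| \sup_{f'} \left| \kappa^2 \bar{Q}^{(2)}(f') \right|$$ (H.24)

$$+ \, 2 \left| \kappa Q^{(1)}(f) - \kappa \bar{Q}^{(1)}(f) \right| \left( \sup_{f'} \left| \kappa Q^{(1)}(f') \right| + \sup_{f'} \left| \kappa \bar{Q}^{(1)}(f') \right| \right)$$ (H.25)

$$\leq -0.087 + 2 \cdot 10^{-2} \left( 4 + 2 \cdot 10^{-2} \right)$$ (H.26)

$$< 0.$$ (H.27)

H.1 Proof of Lemma H.1

Following an argument used in [66] (see also [16]), we use Hoeffding’s inequality to bound the different terms.

Theorem H.6 (Hoeffding’s inequality) Let the components of $${\boldsymbol{{\tilde{u}}}}$$ be sampled i.i.d. from a symmetric distribution on the complex unit circle. For any $$\tilde{\epsilon} >0$$ and any vector $${\boldsymbol{u}}$$

$$\Pr \left( \left| \left\langle {\boldsymbol{{\tilde{u}}}}, {\boldsymbol{u}} \right\rangle \right| \geq \tilde{\epsilon} \right) \leq 4 \exp \left( - \frac{\tilde{\epsilon}^2}{4 {\left|\left|{ {\boldsymbol{u}} }\right|\right| _{2}\!}^2} \right).$$ (H.28)

Corollary H.7 Let the components of $${\boldsymbol{{\tilde{u}}}}$$ be sampled i.i.d. from a symmetric distribution on the complex unit circle. For any finite collection of vectors $${\mathcal{{U}}}$$ with cardinality $$4 {\left|{{\mathcal{{G}}}}\right|\!} = 1600 n^2$$, the event

$${\mathcal{{E}}} := \left\{ \left| \left\langle {\boldsymbol{{\tilde{u}}}}, {\boldsymbol{u}} \right\rangle \right| > \frac{10^{-2}}{8} \text{ for some } {\boldsymbol{u}} \in {\mathcal{{U}}} \right\}$$ (H.29)

has probability at most $$\epsilon / 20$$ as long as

$${\left|\left|{ {\boldsymbol{u}} }\right|\right| _{2}\!}^2 \leq C_{{\mathcal{{U}}}}^2 {\left({ \log \frac{n}{\epsilon}}\right)\!}^{-1} \quad \text{for all } {\boldsymbol{u}} \in {\mathcal{{U}}},$$ (H.30)

where $$C_{{\mathcal{{U}}}} : =1/5000$$.

Proof. The result follows directly from Theorem H.6 and the union bound.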
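As a numerical illustration of this union-bound argument (a sketch, not part of the original proof), the snippet below multiplies the Hoeffding tail (H.28) by the $$1600 n^2$$ vectors in the collection, using the threshold $$10^{-2}/8$$ and the norm bound (H.30) with $$C_{{\mathcal{{U}}}} = 1/5000$$. The function name `union_bound_failure_probability` and the sample values of $$n$$ and $$\epsilon$$ are hypothetical choices made for the example.

```python
import math

def union_bound_failure_probability(n, eps, c_u=1.0 / 5000, threshold=1e-2 / 8):
    """Union bound over 1600 n^2 vectors of the Hoeffding tail (H.28),
    with the squared norms at their largest value allowed by (H.30)."""
    norm_sq = c_u ** 2 / math.log(n / eps)  # upper bound on ||u||_2^2
    per_vector = 4.0 * math.exp(-threshold ** 2 / (4.0 * norm_sq))
    return 1600 * n ** 2 * per_vector

# With these constants the exponent equals threshold^2 / (4 C_U^2) * log(n / eps),
# i.e. about 9.77 * log(n / eps), so each tail is at most 4 * (eps / n)^9.77.
exponent_constant = (1e-2 / 8) ** 2 / (4.0 * (1.0 / 5000) ** 2)

checks = {
    eps: union_bound_failure_probability(2000, eps) <= eps / 20
    for eps in (0.5, 0.1, 1e-3)
}
```

For the values tried here the bound is far below $$\epsilon/20$$, which reflects how conservative the constant $$C_{{\mathcal{{U}}}}$$ is.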
□ Bound on $$\Pr \left({{\mathcal{{E}}}_{R} | {\mathcal{{E}}}_{B}^{c} \cap {\mathcal{{E}}}_{D}^{c} \cap {\mathcal{{E}}}_{v}^{c}}\right)\!$$ We consider the family of vectors   u(ℓ,f):=κℓn[(i2πl1)ℓei2πl1f(i2πl2)ℓei2πl2f⋯(i2πls)ℓei2πlsf]T, (H.31) where $$\ell \in {\left\{ {0,1,2,3}\right\}\!}$$ and $$f$$ belongs to $${\mathcal{{G}}}$$, so that $${\left|{{\mathcal{{U}}}}\right|\!} = 4 {\left|{{\mathcal{{G}}}}\right|\!}$$. We have   ||u(ℓ,f)||22 ≤κ2ℓ(2πm)2ℓsn (H.32)   ≤π6snby Lemma 3.3 (H.33)   ≤CU2(log⁡nϵ)−1by (2.12) if we set Cs small enough. (H.34) The desired result follows by Corollary H.7 because   κℓR(ℓ)(f)= ⟨r, u(ℓ,f)⟩ . (H.35) Bound on $$\Pr \left({{\mathcal{{E}}}_{1} | {\mathcal{{E}}}_{B}^{c} \cap {\mathcal{{E}}}_{D}^{c} \cap {\mathcal{{E}}}_{v}^{c}}\right)\!$$ We have   I1(ℓ)(f) =⟨u(ℓ,f),r⟩,u(ℓ,f):=−1nBΩ∗D−1vℓ(f), (H.36) where $$\ell \in {\left\{ {0,1,2,3}\right\}\!}$$ and $$f$$ belongs to $${\mathcal{{G}}}$$, so that $${\left|{{\mathcal{{U}}}}\right|\!} = 4 {\left|{{\mathcal{{G}}}}\right|\!}$$. To bound $${\left|\left|{ {\boldsymbol{u}}{\left( \ell, f \right)\!} }\right|\right| _{2}\!}$$, we leverage a bound on the $$\ell_2$$ norm of $${\boldsymbol{{ v_{\ell}}}}$$ which follows from Lemma 3.7 and the following bound on the $$\ell_2$$ norm of $${\boldsymbol{{\bar{v}_{\ell}}}}$$. Lemma H.8 (Proof in Section H.2) Under the assumptions of Theorem 2.2, there is a fixed numerical constant $$C_{{\boldsymbol{{\bar{v}}}}}$$ such that for any $$f$$  ||v¯ℓ(f)||2 ≤Cv¯. (H.37) Corollary H.9 In $${\mathcal{{E}}}_{v}^c$$ for any $$f \in {\mathcal{{G}}}$$  ||vℓ(f)||2 ≤Cv¯+Cv. (H.38) Proof. The result follows from the lemma, the triangle inequality and Lemma 3.7. □ Combining Lemma 3.8 and Corollary H.9 yields   ||u(ℓ,f)||2 ≤1n‖BΩ‖‖D−1‖||vℓ(f)||2 (H.39)   ≤8(Cv¯+Cv)‖BΩ‖n (H.40) in $${\mathcal{{E}}}_{D}^{c} \cap {\mathcal{{E}}}_{v}^{c}$$. 
Corollary H.7 implies the desired result if   ‖BΩ‖ ≤CB(log⁡nϵ)−12n,CB:=CU8(Cv¯+Cv), (H.41) which is the case in $${\mathcal{{E}}}_{B}^{c}$$ by Lemma 3.6. Bound on $$\Pr \left({{\mathcal{{E}}}_{2} | {\mathcal{{E}}}_{B}^{c} \cap {\mathcal{{E}}}_{D}^{c} \cap {\mathcal{{E}}}_{v}^{c}}\right)\!$$ We have   I2(ℓ)(f) =⟨u,(ℓ,f)⟩h,u(ℓ,f):=PD−1(vℓ(f)−n−snv¯ℓ(f)), (H.42) where $$P \in \mathbb{R}^{k \times 2k}$$ is the projection matrix that selects the first $$k$$ entries in a vector, $$\ell \in {\left\{ {0,1,2,3}\right\}\!}$$ and $$f$$ belongs to $${\mathcal{{G}}}$$, so that $${\left|{{\mathcal{{U}}}}\right|\!} = 4 {\left|{{\mathcal{{G}}}}\right|\!}$$. Since $${\left\lVert{{P}}\right\rVert}=1$$, by Lemma 3.8 in $${\mathcal{{E}}}_{D}^{c}$$  ||u(ℓ,f)||2 ≤‖P‖‖D−1‖||vℓ(f)−n−snv¯ℓ(f)||2 (H.43)   ≤8||vℓ(f)−n−snv¯ℓ(f)||2. (H.44) The desired result holds if   ||vℓ(f)−n−snv¯ℓ(f)||2 ≤Cv(log⁡nϵ)−12,Cv:=CU8, (H.45) which is the case in $${\mathcal{{E}}}_{v}^{c}$$ by Lemma 3.7. Bound on $$\Pr \left({{\mathcal{{E}}}_{3} | {\mathcal{{E}}}_{B}^{c} \cap {\mathcal{{E}}}_{D}^{c} \cap {\mathcal{{E}}}_{v}^{c}}\right)\!$$ We have   I3(ℓ)(f) =⟨u(ℓ,f),h⟩,u(ℓ,f):=n−snP(D−1−nn−sD¯−1)v¯ℓ(f), (H.46) where $$\ell \in {\left\{ {0,1,2,3}\right\}\!}$$ and $$f$$ belongs to $${\mathcal{{G}}}$$, so that $${\left|{{\mathcal{{U}}}}\right|\!} = 4 {\left|{{\mathcal{{G}}}}\right|\!}$$. Since $${\left\lVert{{P}}\right\rVert}=1$$, by Lemma 3.7   ||u(ℓ,f)||2 ≤‖P‖‖D−1−nn−sD¯−1‖||v¯ℓ(f)||2 (H.47)   ≤Cv¯‖D−1−nn−sD¯−1‖. (H.48) The desired result holds if   ‖D−1−nn−sD¯−1‖ ≤CD(log⁡nϵ)−12,CD:=CUCv¯, (H.49) for a fixed numerical constant $$C_{D}$$, which is the case in $${\mathcal{{E}}}_{D}^{c}$$ by Lemma 3.8. H.2 Proof of Lemma H.8 We use the $$\ell_1$$ norm to bound the $$\ell_2$$ norm of $${\boldsymbol{{\bar{v}_{\ell}}}}{\left( f \right)\!}$$:   ||v¯ℓ(f)||2 ≤||v¯ℓ(f)||1 (H.50)   =∑j=1kκℓ|K¯(ℓ)(f−fj)|+∑j=1kκℓ+1|K¯(ℓ+1)(f−fj)|. (H.51) To bound the sum on the right we leverage some results from [38]. 
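Lemma H.10

$$\kappa^{\ell} \left| \bar{K}^{(\ell)}(f) \right| \leq \begin{cases} C_1 & \forall f \in \left[ -\frac{1}{2}, \frac{1}{2} \right], \\ C_2 \, m^{-3} \left| f \right|^{-3} & \text{if } \frac{80}{m} \leq \left| f \right| \leq \frac{1}{2}, \end{cases}$$ (H.52)

for suitably chosen numerical constants $$C_1$$ and $$C_2$$.

Proof. The constant bound on the kernel follows from Corollary 4.5, Lemma 4.6 and Lemma C.2 in [38] (see also Figures 14 and 15 in the same paper). The bound for large $$f$$ follows from Lemma C.2 in [38]. □

By the minimum-separation condition (2.10), there are at most 127 elements of $$T$$ that are at a distance of $$80/m$$ or less from $$f$$. We use the first bound in (H.52) to control the contribution of those elements and the second bound to deal with the remaining terms,

$$\sum_{j=1}^{k} \kappa^{\ell} \left| \bar{K}^{(\ell)}(f - f_j) \right| \leq \sum_{j : \left| f - f_j \right| < \frac{80}{m}} C_1 + \sum_{j : \frac{80}{m} \leq \left| f - f_j \right| \leq \frac{1}{2}} \frac{C_2}{m^3 \left| f - f_j \right|^3}$$ (H.53)

$$\leq 127 \, C_1 + 2 \, C_2 \sum_{j=1}^{\infty} \frac{1}{m^3 \left( j {\it{\Delta}}_{\min} \right)^3}$$ (H.54)

$$\leq 127 \, C_1 + 2 \, C_2 \sum_{j=1}^{\infty} \frac{1}{j^3}$$ (H.55)

$$= 127 \, C_1 + 2 \, C_2 \, \zeta(3),$$ (H.56)

where $$\zeta {\left( 3 \right)\!}$$ is Apéry’s constant, which is bounded by 1.21. This completes the proof.

H.3 Proof of Lemma H.2

The proof follows a similar argument to the proof of Proposition 4.12 in [66]. We begin by bounding the deviations of $$Q^{{\left( \ell \right)\!}}$$ and $$\bar{Q}^{{\left( \ell \right)\!}}$$ on neighboring points.

Lemma H.11 (Proof in Section H.3.1) Under the assumptions of Theorem 2.2, for any $$f_1$$, $$f_2$$ in the unit interval

$$\left| \kappa^{\ell} Q^{(\ell)}(f_2) - \kappa^{\ell} Q^{(\ell)}(f_1) \right| \leq n^2 \left| f_2 - f_1 \right|,$$ (H.57)

$$\left| \kappa^{\ell} \bar{Q}^{(\ell)}(f_2) - \kappa^{\ell} \bar{Q}^{(\ell)}(f_1) \right| \leq n^2 \left| f_2 - f_1 \right|.$$ (H.58)

For any $$f$$ in the unit interval, there exists a grid point $$f_{{\mathcal{{G}}}}$$ such that the distance between the two points is smaller than the step size $${\left( 400 \, n^2 \right)\!}^{-1}$$. This allows us to establish the desired result by combining (H.11) with Lemma H.11 and the triangle inequality,

$$\left| \kappa^{\ell} Q^{(\ell)}(f) - \kappa^{\ell} \bar{Q}^{(\ell)}(f) \right| \leq \left| \kappa^{\ell} Q^{(\ell)}(f) - \kappa^{\ell} Q^{(\ell)}(f_{{\mathcal{{G}}}}) \right| + \left| \kappa^{\ell} Q^{(\ell)}(f_{{\mathcal{{G}}}}) - \kappa^{\ell} \bar{Q}^{(\ell)}(f_{{\mathcal{{G}}}}) \right|$$ (H.59)

$$+ \left| \kappa^{\ell} \bar{Q}^{(\ell)}(f_{{\mathcal{{G}}}}) - \kappa^{\ell} \bar{Q}^{(\ell)}(f) \right|$$ (H.60)

$$\leq 2 n^2 \left| f - f_{{\mathcal{{G}}}} \right| + 5 \cdot 10^{-3}$$ (H.61)

$$\leq 10^{-2}.$$ (H.62)

H.3.1 Proof of Lemma H.11

We first derive a coarse uniform bound on $$Q^{{\left( \ell \right)\!}}$$ for $$\ell \in {\left\{ {0,1,2,3}\right\}\!}$$.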
For this, we need bounds on the $$\ell_2$$ norm of $${\boldsymbol{{v_{\ell}}}} {\left( f \right)\!}$$ and the magnitude of $$R^{{\left( \ell \right)\!}} {\left( f \right)\!}$$ that hold over the whole unit interval, not only on a discrete grid. By the definitions of $$K$$ and $${\boldsymbol{b}}{\left( j \right)\!}$$ in (3.38) and (3.45), for any $$f$$  ||vℓ(f)||2 =||∑l∈Ωc(i2πκl)ℓclei2πlfb(l)||2 (H.63)   ≤πℓn||c||∞sup−m≤l≤m||b(l)||2by Lemma 3.3 (H.64)   ≤1.3π3n10kmby Lemmas 3.4 and (3.5) (H.65)   ≤256k. (H.66) Similarly, for any $$f$$  |κℓR(ℓ)(f)| =|λκℓ∑l∈Ω(−i2πl)ℓrle−i2πlf| (H.67)   ≤κℓ(2π)ℓn∑l∈Ωlℓ (H.68)   ≤κℓ(2π)ℓsmℓn (H.69)   ≤4π3snby Lemma 3.3. (H.70) We also derive a coarse bound on the operator norm $$B_{{\it \Omega}}$$  ‖BΩ‖ ≤‖H¯‖ (H.71)   ≤260π2nlog⁡kby Lemma E.1,  (H.72) which holds because $$B_{{\it \Omega}}$$ is a submatrix of a matrix $$\bar{B}$$ such that $$\bar{H}=\bar{B}\bar{B}^{\ast}$$. These bounds together with (H.4), the Cauchy–Schwarz inequality and the triangle inequality imply that in $${\mathcal{{E}}}_{D}^c$$  |κℓQ(ℓ)(f)| ≤||vℓ(f)||2‖D−1‖(||h||2+1n‖BΩ‖||r||2)+|κℓR(ℓ)(f)| (H.73)   ≤5105(k+kslog⁡k) (H.74)   ≤n7by (2.11)and (2.12) if we set Ck and Cs small enough. (H.75) Finally, if we interpret $$Q^{\left({\ell}\right)\!}\left({z}\right)\!$$ as a function of $$z \in \mathbb{C}$$, a generalization of the mean-value theorem yields   |κℓQ(ℓ)(f2)−κℓQ(ℓ)(f1)|≤κℓ|ei2πf2−ei2πf1|supz′⁡ |dQ(ℓ)(z′)dz| (H.76)   ≤2π|f2−f1|κsupf|κℓ+1Q(ℓ+1)(f)| (H.77)   ≤n2|f2−f1|by (H.75) for ℓ∈{0,1,2}. (H.78) The bound on the deviation of $$\bar{Q}^{\ell}$$ is obtained using exactly the same argument together with the bound (H.21). In the case of $$\bar{Q}$$ the bound is extremely coarse, but it suffices for our purpose. Appendix I. Proof of Proposition 3.11 Let $$l$$ be an arbitrary element of $${\it{\Omega}}^c$$. 
We express the corresponding coefficient $${\boldsymbol{q}}_{l}$$ in terms of the sign patterns $$\boldsymbol{h}$$ and $$\boldsymbol{r}$$,   ql =cl(∑j=1kαjei2πlfj+i2πlκ∑j=1kβjei2πlfj) (I.1)   =clb(l)∗[αβ] (I.2)   =clb(l)∗D−1([h0]−1nBΩr) (I.3)   =cl(⟨PD−1b(l),h⟩+1n⟨BΩ∗D−1b(l),r⟩), (I.4) where $$P \in \mathbb{R}^{k \times 2k}$$ is the projection matrix that selects the first $$k$$ entries in a vector. The bounds   ||PD−1b(l)||22 ≤‖P‖2‖D−1‖2||b(l)||22 (I.5)   ≤640kin EDc by Lemmas 3.5 and 3.8 (I.6)   ≤0.182nlog⁡40ϵby (2.11) if we set Ck small enough, (I.7) and   ||BΩ∗D−1b(l)||22 ≤‖BΩ‖2‖D−1‖2||b(l)||22 (I.8)   ≤640CB2knin EBc∩EDc by Lemmas 3.6 and 3.8 (I.9)   ≤0.182n2log⁡40ϵby (2.11) if we set Ck small enough, (I.10) imply by Hoeffding’s inequality (Theorem H.6) that the probability of each of the events   |⟨PD−1b(l),h⟩| >0.18n, (I.11)  |⟨BΩ∗D−1b(l),r⟩| >0.18n (I.12) is bounded by $$\epsilon / 10$$. By Lemma 3.4 and the union bound, this implies   |ql| ≤||c||∞(|⟨D−1b(l),[h0]⟩|+|⟨BΩ∗D−1b(l),r⟩|n) (I.13)   ≤2.6n(0.18n+0.18n) (I.14)   <1n (I.15) with probability at least $$1-\epsilon/5$$. Appendix J. Algorithms J.1 Proof of Lemma 4.3 The problem is equivalent to   minμ~,z~,u||μ~||TV+λ||z~||1 subject to ||y−u||22≤σ2 (J.1)  Fnμ~+z~=u, (J.2) where we have introduced an auxiliary primal variable $${\boldsymbol{u}}\in \mathbb{C}^{n}$$. Let us define the dual variables $${\boldsymbol{\eta}} \in \mathbb{C}^{n}$$ and $$\nu \geq 0$$. The Lagrangian is equal to   L(μ~,z~,η) =||μ~||TV+λ||z~||1+⟨u−Fnμ~−z~,η⟩+ν(||y−u||22−σ2) (J.3)   =||μ~||TV−⟨μ~,Fn∗η⟩+λ||z~||1−⟨z~,η⟩+⟨u,η⟩+ν(||y−u||22−σ2), (J.4) where $$\eta \in \mathbb{C}^{n}$$ is the dual variable. To compute the Lagrange dual function, we minimize the value of the Lagrangian over the primal variables [9]. The minimum of   ||μ~||TV−⟨μ~,Fn∗η⟩ (J.5) over $$\tilde{ \mu }$$ is $$-\infty$$ unless (4.9) holds. Moreover, if (4.9) holds then the minimum is at $$\tilde{\mu}=0$$ by Hölder’s inequality. 
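The step behind this claim can be spelled out as a short worked inequality. It is written under the assumption that condition (4.9), which is not restated in this appendix, is the constraint $${\left|\left|{ \mathcal{F}_n^{\ast} {\boldsymbol{\eta}} }\right|\right| _{\infty}\!} \leq 1$$, and that the inner product is the real-valued one used in the Lagrangian.

```latex
% Hoelder's inequality for measures: |<mu, g>| <= ||g||_infty ||mu||_TV.
\begin{aligned}
\| \tilde{\mu} \|_{\mathrm{TV}} - \langle \tilde{\mu}, \mathcal{F}_n^{\ast} \boldsymbol{\eta} \rangle
  &\geq \| \tilde{\mu} \|_{\mathrm{TV}}
      - \| \mathcal{F}_n^{\ast} \boldsymbol{\eta} \|_{\infty} \, \| \tilde{\mu} \|_{\mathrm{TV}} \\
  &= \left( 1 - \| \mathcal{F}_n^{\ast} \boldsymbol{\eta} \|_{\infty} \right) \| \tilde{\mu} \|_{\mathrm{TV}}
   \;\geq\; 0
  \qquad \text{if } \| \mathcal{F}_n^{\ast} \boldsymbol{\eta} \|_{\infty} \leq 1 .
\end{aligned}
```

The lower bound is attained at $$\tilde{\mu} = 0$$. If instead $$\left| (\mathcal{F}_n^{\ast} {\boldsymbol{\eta}})(f_0) \right| > 1$$ at some $$f_0$$, a point mass at $$f_0$$ with a suitable phase and growing amplitude drives the expression to $$-\infty$$.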
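Similarly, minimizing

$$\lambda {\left|\left|{ {\boldsymbol{{\tilde{z}}}} }\right|\right| _{1}\!} - \left\langle {\boldsymbol{{\tilde{z}}}}, {\boldsymbol{\eta}} \right\rangle$$ (J.6)

over $${\boldsymbol{{\tilde{z}}}}$$ yields $$-\infty$$ unless (4.10) holds, whereas if (4.10) holds the minimum is attained at $${\boldsymbol{{\tilde{z}}}}=0$$. All that remains is to minimize

$$\left\langle {\boldsymbol{u}}, {\boldsymbol{\eta}} \right\rangle + \nu \left( {\left|\left|{ {\boldsymbol{y}} - {\boldsymbol{u}} }\right|\right| _{2}\!}^2 - \sigma^2 \right)$$ (J.7)

with respect to $${\boldsymbol{u}}$$ (note that (4.9) and (4.10) do not involve $${\boldsymbol{u}}$$). The function is convex with respect to $${\boldsymbol{u}}$$, so we set the gradient to zero to deduce that the minimum is at $${\boldsymbol{u}} = {\boldsymbol{y}} - \frac{1}{2\nu} {\boldsymbol{\eta}}$$. Plugging in this value yields the Lagrange dual function

$$\left\langle {\boldsymbol{y}}, {\boldsymbol{\eta}} \right\rangle - \frac{1}{4 \nu} {\left|\left|{ {\boldsymbol{\eta}} }\right|\right| _{2}\!}^2 - \nu \sigma^2.$$ (J.8)

The dual problem consists of maximizing the Lagrange dual function subject to $$\nu \geq 0$$, (4.9) and (4.10). For any fixed value of $${\boldsymbol{\eta}}$$, maximizing over $$\nu$$ is easy: the expression is concave in $$\nu$$ on the half line $$\nu > 0$$ and its derivative is zero at $$\nu = {\left|\left|{ {\boldsymbol{\eta}} }\right|\right| _{2}\!} / 2\sigma$$. Plugging this into (J.8) yields the dual problem (4.8). The reformulation of (4.8) as a semi-definite program is an immediate consequence of the following proposition.

Proposition J.1 (Semi-definite characterization [32, Theorem 4.24], [38, Proposition 2.4]) Let $${\boldsymbol{\eta}} \in \mathbb{C}^{n }$$. Then

$$\left| (\mathcal{F}_n^{\ast} {\boldsymbol{\eta}})(f) \right| \leq 1 \quad \text{for all } f \in [0,1]$$

if and only if there exists a Hermitian matrix $${\it{\Lambda}} \in \mathbb{C}^{n \times n}$$ obeying

$$\begin{bmatrix} {\it{\Lambda}} & {\boldsymbol{\eta}} \\ {\boldsymbol{\eta}}^{\ast} & 1 \end{bmatrix} \succeq 0, \qquad {\mathcal{{T}}}^{\ast}({\it{\Lambda}}) = \begin{bmatrix} 1 \\ {\boldsymbol{0}} \end{bmatrix},$$ (J.9)

where $${\boldsymbol{0}} \in \mathbb{C}^{n-1}$$ is a vector of zeros.

J.2 Proof of Lemma 4.4

The interior of the feasible set of Problem (4.8) contains the origin and is therefore non-empty, so strong duality holds by a generalized Slater condition [54], and we have

$$\sum_{f_j \in \hat{T}} \left| \hat{{\boldsymbol{x}}}_j \right| + \lambda \sum_{l \in \hat{{\it{\Omega}}}} \left| \hat{{\boldsymbol{z}}}_l \right| = {\left|\left|{ \hat{\mu} }\right|\right| _{\mathrm{TV}}\!} + \lambda {\left|\left|{ {\boldsymbol{{\hat{z}}}} }\right|\right| _{1}\!} = \left\langle \hat{{\boldsymbol{\eta}}}, {\boldsymbol{y}} \right\rangle - \sigma {\left|\left|{ \hat{{\boldsymbol{\eta}}} }\right|\right| _{2}\!}$$ (J.10)

$$\leq \left\langle \hat{{\boldsymbol{\eta}}}, {\boldsymbol{y}} \right\rangle - \left\langle \hat{{\boldsymbol{\eta}}}, {\boldsymbol{y}} - \mathcal{F}_n \, \hat{\mu} - {\boldsymbol{{\hat{z}}}} \right\rangle$$ (J.11)

$$= \left\langle \hat{{\boldsymbol{\eta}}}, \mathcal{F}_n \, \hat{\mu} + {\boldsymbol{{\hat{z}}}} \right\rangle$$ (J.12)

$$= \operatorname{Re} \left[ \sum_{f_j \in \hat{T}} \left| \hat{{\boldsymbol{x}}}_j \right| \overline{(\mathcal{F}_n^{\ast} \hat{{\boldsymbol{\eta}}})(f_j)} \, \frac{\hat{{\boldsymbol{x}}}_j}{\left| \hat{{\boldsymbol{x}}}_j \right|} + \sum_{l \in \hat{{\it{\Omega}}}} \left| \hat{{\boldsymbol{z}}}_l \right| \overline{\hat{{\boldsymbol{\eta}}}_l} \, \frac{\hat{{\boldsymbol{z}}}_l}{\left| \hat{{\boldsymbol{z}}}_l \right|} \right].$$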
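(J.13)

The inequality (J.11) follows from the Cauchy–Schwarz inequality because $${\left\{ {\hat{\mu}, {\boldsymbol{{\hat{z}}}}}\right\}\!}$$ is primal feasible, and hence $${\left|\left|{{\boldsymbol{y}}-\mathcal{F}_{n} \, \hat{\mu} - {\boldsymbol{{\hat{z}}}}}\right|\right| _{2}\!} \leq \sigma$$. Due to the constraints (4.9) and (4.10) and Hölder’s inequality, the inequality that we have established is only possible if (4.15) and (4.16) hold. The proof is complete.

J.3 Atomic-norm denoising via the alternating direction method of multipliers

We rewrite Problem (4.22) as

$$\min_{\substack{t \in \mathbb{R}, \; {\boldsymbol{u}} \in \mathbb{C}^{n}, \; {\boldsymbol{{\tilde{g}}}} \in \mathbb{C}^{n}, \; {\boldsymbol{{\tilde{z}}}} \in \mathbb{C}^{n} \\ {\it{\Psi}} \in \mathbb{C}^{(n+1) \times (n+1)}}} \; \frac{\xi}{2} \left( n {\boldsymbol{u}}_1 + t \right) + \lambda' {\left|\left|{ {\boldsymbol{{\tilde{z}}}} }\right|\right| _{1}\!} + \frac{1}{2} {\left|\left|{ {\boldsymbol{y}} - {\boldsymbol{{\tilde{g}}}} - {\boldsymbol{{\tilde{z}}}} }\right|\right| _{2}\!}^2 \quad \text{subject to} \quad {\it{\Psi}} = \begin{bmatrix} {\mathcal{{T}}}({\boldsymbol{u}}) & {\boldsymbol{{\tilde{g}}}} \\ {\boldsymbol{{\tilde{g}}}}^{\ast} & t \end{bmatrix},$$ (J.14)

$${\it{\Psi}} \succeq 0,$$ (J.15)

where $$\xi := \frac{1}{\gamma\sqrt{n}}$$ and $$\lambda' := \frac{\lambda}{\gamma}$$. The augmented Lagrangian for this problem is of the form

$$\mathcal{L}_{\rho} \left( t, {\boldsymbol{u}}, {\boldsymbol{{\tilde{g}}}}, {\boldsymbol{{\tilde{z}}}}, \Upsilon, {\it{\Psi}} \right) := \frac{\xi}{2} \left( n {\boldsymbol{u}}_1 + t \right) + \lambda' {\left|\left|{ {\boldsymbol{{\tilde{z}}}} }\right|\right| _{1}\!} + \frac{1}{2} {\left|\left|{ {\boldsymbol{y}} - {\boldsymbol{{\tilde{g}}}} - {\boldsymbol{{\tilde{z}}}} }\right|\right| _{2}\!}^2 + \left\langle \Upsilon, {\it{\Psi}} - \begin{bmatrix} {\mathcal{{T}}}({\boldsymbol{u}}) & {\boldsymbol{{\tilde{g}}}} \\ {\boldsymbol{{\tilde{g}}}}^{\ast} & t \end{bmatrix} \right\rangle$$ (J.16)

$$+ \frac{\rho}{2} \left\lVert {\it{\Psi}} - \begin{bmatrix} {\mathcal{{T}}}({\boldsymbol{u}}) & {\boldsymbol{{\tilde{g}}}} \\ {\boldsymbol{{\tilde{g}}}}^{\ast} & t \end{bmatrix} \right\rVert_{\mathrm{F}}^2,$$ (J.17)

where $$\rho > 0$$ is a parameter. The alternating direction method of multipliers (ADMM) minimizes the augmented Lagrangian by iteratively applying the updates:

$$t^{(l+1)} := \arg\min_{t} \mathcal{L}_{\rho} \left( t, {\boldsymbol{u}}^{(l)}, {\boldsymbol{{\tilde{g}}}}^{(l)}, {\boldsymbol{{\tilde{z}}}}^{(l)}, \Upsilon^{(l)}, {\it{\Psi}}^{(l)} \right),$$ (J.18)

$${\boldsymbol{u}}^{(l+1)} := \arg\min_{{\boldsymbol{u}}} \mathcal{L}_{\rho} \left( t^{(l)}, {\boldsymbol{u}}, {\boldsymbol{{\tilde{g}}}}^{(l)}, {\boldsymbol{{\tilde{z}}}}^{(l)}, \Upsilon^{(l)}, {\it{\Psi}}^{(l)} \right),$$ (J.19)

$${\boldsymbol{{\tilde{g}}}}^{(l+1)} := \arg\min_{{\boldsymbol{{\tilde{g}}}}} \mathcal{L}_{\rho} \left( t^{(l)}, {\boldsymbol{u}}^{(l)}, {\boldsymbol{{\tilde{g}}}}, {\boldsymbol{{\tilde{z}}}}^{(l)}, \Upsilon^{(l)}, {\it{\Psi}}^{(l)} \right),$$ (J.20)

$${\boldsymbol{{\tilde{z}}}}^{(l+1)} := \arg\min_{{\boldsymbol{{\tilde{z}}}}} \mathcal{L}_{\rho} \left( t^{(l)}, {\boldsymbol{u}}^{(l)}, {\boldsymbol{{\tilde{g}}}}^{(l)}, {\boldsymbol{{\tilde{z}}}}, \Upsilon^{(l)}, {\it{\Psi}}^{(l)} \right),$$ (J.21)

$${\it{\Psi}}^{(l+1)} := \arg\min_{{\it{\Psi}}} \mathcal{L}_{\rho} \left( t^{(l)}, {\boldsymbol{u}}^{(l)}, {\boldsymbol{{\tilde{g}}}}^{(l)}, {\boldsymbol{{\tilde{z}}}}^{(l)}, \Upsilon^{(l)}, {\it{\Psi}} \right),$$ (J.22)

$$\Upsilon^{(l+1)} := \Upsilon^{(l)} + \rho \left( {\it{\Psi}}^{(l+1)} - \begin{bmatrix} {\mathcal{{T}}}({\boldsymbol{u}}^{(l+1)}) & {\boldsymbol{{\tilde{g}}}}^{(l+1)} \\ ({\boldsymbol{{\tilde{g}}}}^{(l+1)})^{\ast} & t^{(l+1)} \end{bmatrix} \right),$$ (J.23)

where $$l$$ indicates the iteration number. We refer the interested reader to the tutorial [7] and references therein for a justification of these steps and more information on ADMM. For the method to be practical, we need an efficient implementation of all the updates. The augmented Lagrangian is convex and differentiable with respect to $$t$$, $${\boldsymbol{{u}}}$$ and $${\boldsymbol{{\tilde{g}}}}$$, so for these variables we just need to compute the gradient and set it to zero.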
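For concreteness, the stationarity conditions for $$t$$ and $${\boldsymbol{{\tilde{g}}}}$$ can be written out as follows. This is a sketch under the assumption that the inner product between matrices is the real part of the trace inner product; $${\it{\Psi}}_{n+1}$$, $$\Upsilon_{n+1}$$ denote the bottom-right scalar entries and $${\boldsymbol{\psi}}$$, $${\boldsymbol{\upsilon}}$$ the last columns (without their final entry) of $${\it{\Psi}}^{(l)}$$ and $$\Upsilon^{(l)}$$.

```latex
% Terms of the augmented Lagrangian that involve t, differentiated with respect to t:
\frac{\partial \mathcal{L}_{\rho}}{\partial t}
  = \frac{\xi}{2} - \Upsilon_{n+1} + \rho \left( t - \Psi_{n+1} \right) = 0 .

% Terms that involve \tilde{g}; it appears twice in the block matrix (as \tilde{g} and
% \tilde{g}^{\ast}), which produces the factors of 2:
\nabla_{\tilde{g}} \mathcal{L}_{\rho}
  = \tilde{g} - \left( y - \tilde{z} \right) - 2 \upsilon + 2 \rho \left( \tilde{g} - \psi \right) = 0 .
```

Both conditions are linear in the variable being updated, so each can be solved in closed form.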
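This yields the closed-form updates:

$$t^{(l+1)} = {\it{\Psi}}_{n+1}^{(l)} + \frac{1}{\rho} \left( \Upsilon_{n+1}^{(l)} - \frac{\xi}{2} \right),$$ (J.24)

$${\boldsymbol{u}}^{(l+1)} = M \, {\mathcal{{T}}}^{\ast} \left( {\it{\Psi}}_0^{(l)} + \frac{\Upsilon_0^{(l)}}{\rho} \right) - \frac{\xi}{2 \rho} \, {\boldsymbol{e}}(1),$$ (J.25)

$${\boldsymbol{{\tilde{g}}}}^{(l+1)} = \frac{1}{2 \rho + 1} \left( {\boldsymbol{y}} - {\boldsymbol{{\tilde{z}}}}^{(l)} + 2 \rho \, {\boldsymbol{\psi}}^{(l)} + 2 \, {\boldsymbol{\upsilon}}^{(l)} \right),$$ (J.26)

where $${\boldsymbol{{e}}}\left({1}\right): = [1,0,0,\,{\ldots}\,,0]^T$$, $${\mathcal{{T}}}^{\ast}$$ outputs a vector whose $$j$$th element is the trace of the $$(j-1)$$th subdiagonal of the input matrix, $$M$$ is a diagonal matrix such that

$$M_{j,j} = \frac{1}{n-j+1}, \quad j = 1, \ldots, n,$$ (J.27)

and

$${\it{\Psi}}^{(l)} := \begin{bmatrix} {\it{\Psi}}_0^{(l)} & {\boldsymbol{\psi}}^{(l)} \\ ({\boldsymbol{\psi}}^{(l)})^{\ast} & {\it{\Psi}}_{n+1}^{(l)} \end{bmatrix}, \qquad \Upsilon^{(l)} := \begin{bmatrix} \Upsilon_0^{(l)} & {\boldsymbol{\upsilon}}^{(l)} \\ ({\boldsymbol{\upsilon}}^{(l)})^{\ast} & \Upsilon_{n+1}^{(l)} \end{bmatrix}.$$ (J.28)

Here $${\it{\Psi}} _{0}^{\left({l}\right)}$$ and $$\Upsilon_{0}^{\left({l}\right)}$$ are $$n \times n$$ matrices, $${\boldsymbol{{\psi}}}^{\left({l}\right)}$$ and $${\boldsymbol{{\upsilon}}}^{\left({l}\right)}$$ are $$n$$-dimensional vectors, and $${\it{\Psi}}_{n+1}^{\left({l}\right)}$$ and $$\Upsilon _{n+1}^{\left({l}\right)}$$ are scalars. Updating $${\boldsymbol{{\tilde{z}}}}$$ requires solving the problem

$$\min_{{\boldsymbol{{\tilde{z}}}}} \; \lambda' {\left|\left|{ {\boldsymbol{{\tilde{z}}}} }\right|\right| _{1}\!} + \frac{1}{2} {\left|\left|{ {\boldsymbol{y}} - {\boldsymbol{{\tilde{g}}}}^{(l)} - {\boldsymbol{{\tilde{z}}}} }\right|\right| _{2}\!}^2,$$ (J.29)

which is easily achieved by applying a proximal operator

$${\boldsymbol{{\tilde{z}}}}^{(l+1)} := \operatorname{prox}_{\lambda'} \left( {\boldsymbol{y}} - {\boldsymbol{{\tilde{g}}}}^{(l)} \right),$$ (J.30)

where for $$1\leq j \leq n$$

$$\operatorname{prox}_{\lambda'} ({\boldsymbol{{\tilde{z}}}})_j := \begin{cases} \operatorname{sign}({\boldsymbol{{\tilde{z}}}}_j) \left( \left| {\boldsymbol{{\tilde{z}}}}_j \right| - \lambda' \right) & \text{if } \left| {\boldsymbol{{\tilde{z}}}}_j \right| > \lambda', \\ 0 & \text{otherwise.} \end{cases}$$ (J.31)

Finally, the update of $${\it{\Psi}}^{\left({l}\right)}$$ amounts to a projection onto the positive semi-definite cone

$${\it{\Psi}}^{(l+1)} = \arg\min_{{\it{\Psi}} \succeq 0} \left\lVert {\it{\Psi}} - \begin{bmatrix} {\mathcal{{T}}}({\boldsymbol{u}}^{(l)}) & {\boldsymbol{{\tilde{g}}}}^{(l)} \\ ({\boldsymbol{{\tilde{g}}}}^{(l)})^{\ast} & t^{(l)} \end{bmatrix} + \frac{1}{\rho} \Upsilon^{(l)} \right\rVert_{\mathrm{F}}^2,$$ (J.32)

which can be accomplished by computing the eigenvalue decomposition of the matrix and setting all negative eigenvalues to zero.

Footnotes

1 For a concrete example of two signals with a minimum separation of $$0.9 {\it{\Delta}}^{\ast}$$ that are almost indistinguishable from data consisting of $$n = 2 \cdot 10^{3}$$ samples, see Fig. 2 of [38].

2 Total variation often also refers to the $$\ell_1$$ norm of the discontinuities of a piecewise constant function, which is a popular regularizer in image processing and other applications [55].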
3 To be precise, Theorem 2.2 assumes $$\lambda:=1/\sqrt{n}$$, but one can check that the whole proof goes through if we set $$\lambda$$ to $$c/\sqrt{n}$$ for any positive constant $$c$$. The only effect is a change in the constants $$C_s$$ and $$C_k$$ in (2.11) and (2.12). 4 To avoid this assumption, one can adapt the width of the three kernels so that the length of their convolution equals $$2m$$ and then recompute the bounds that we borrow from [38]. 5 We use the Matlab function fminsearch based on the simplex search method [42]. 6The relative MSE is defined as the ratio between the $$\ell_2$$ norm of the difference between the clean samples $${\boldsymbol{g}}$$ and the estimate divided by $${\left|\left|{{\boldsymbol{g}}}\right|\right| _{2}\!}$$. References 1. Azais J.-M., De Castro Y. & Gamboa F. ( 2015) Spike detection from inaccurate samplings. Appl. Comput. Harmon. Anal. , 38, 177– 195. Google Scholar CrossRef Search ADS   2. Beatty L. G., George J. D. & Robinson A. Z. ( 1978) Use of the complex exponential expansion as a signal representation for underwater acoustic calibration. J. Acoust. Soc. Am. , 63, 1782– 1794. Google Scholar CrossRef Search ADS   3. Berni A. J. ( 1975) Target identification by natural resonance estimation. IEEE Trans. Aerosp. Electron. Syst. , 11, 147– 154. Google Scholar CrossRef Search ADS   4. Bhaskar B., Tang G. & Recht B. ( 2013) Atomic norm denoising with applications to line spectral estimation. IEEE Trans. Sig. Proc. , 61, 5987– 5999. Google Scholar CrossRef Search ADS   5. Bienvenu G. ( 1979) Influence of the spatial coherence of the background noise on high resolution passive methods. Proceedings of the International Conference on Acoustics, Speech and Signal Processing , vol. 4. pp. 306– 309. 6. Borcea L., Papanicolaou G., Tsogka C. & Berryman J. ( 2002) Imaging and time reversal in random media. Inverse Prob. , 18, 1247. Google Scholar CrossRef Search ADS   7. Boyd S., Parikh N., Chu E., Peleato B. & Eckstein J. 
( 2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning , 3, 1– 122. Google Scholar CrossRef Search ADS   8. Boyd N., Schiebinger G. & Recht B. ( 2017) The alternating descent conditional gradient method for sparse inverse problems. SIAM J. Optimiz. , 27, 616– 639. Google Scholar CrossRef Search ADS   9. Boyd S. P. & Vandenberghe L. ( 2004) Convex Optimization . Cambridge University Press. Google Scholar CrossRef Search ADS   10. Boyer C., De Castro Y. & Salmon J. ( 2016) Adapting to unknown noise level in sparse deconvolution. arXiv preprint arXiv:1606.04760 . 11. Bredies K. & Pikkarainen H. K. ( 2013) Inverse problems in spaces of measures. ESAIM: Control, Optimisation and Calculus of Variations , 19, 190– 218. Google Scholar CrossRef Search ADS   12. Candès E. J. & Fernandez-Granda C. ( 2014) Towards a mathematical theory of super-resolution. Commun. Pure Appl. Math. , 67, 906– 956. Google Scholar CrossRef Search ADS   13. Candès E. J. & Fernandez-Granda C. ( 2013) Super-resolution from noisy data. J. Fourier Anal. Appl. , 19, 1229– 1254. Google Scholar CrossRef Search ADS   14. Candès E. J., Li X., Ma Y. & Wright J. ( 2011) Robust principal component analysis? J. ACM , 58, 11. Google Scholar CrossRef Search ADS   15. Candes E. J. & Plan Y. ( 2011) A probabilistic and ripless theory of compressed sensing. IEEE Trans. Inf. Theory , 57, 7235– 7254. Google Scholar CrossRef Search ADS   16. Candès E. J. & Romberg J. ( 2007) Sparsity and incoherence in compressive sampling. Inverse Probl. , 23, 969– 985. Google Scholar CrossRef Search ADS   17. Candès E. J., Romberg J. & Tao T. ( 2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory , 52, 489– 509. Google Scholar CrossRef Search ADS   18. Candès E. J. & Tao T. ( 2005) Decoding by linear programming. IEEE Trans. Inf. Theory , 51, 4203– 4215. 
19. Candès E. J. & Tao T. (2006) Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inf. Theory, 52, 5406–5425.
20. Candès E. J. & Tao T. (2010) The power of convex relaxation: near-optimal matrix completion. IEEE Trans. Inf. Theory, 56, 2053–2080.
21. Carriere R. & Moses R. L. (1992) High resolution radar target modeling using a modified Prony estimator. IEEE Trans. Antennas Propag., 40, 13–18.
22. Chandrasekaran V., Recht B., Parrilo P. A. & Willsky A. S. (2012) The convex geometry of linear inverse problems. Found. Comput. Math., 12, 805–849.
23. Chandrasekaran V., Sanghavi S., Parrilo P. A. & Willsky A. S. (2011) Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim., 21, 572–596.
24. Chen Y. & Chi Y. (2014) Robust spectral compressed sensing via structured matrix completion. IEEE Trans. Inf. Theory, 60, 6576–6601.
25. Chen S. S., Donoho D. L. & Saunders M. A. (2001) Atomic decomposition by basis pursuit. SIAM Rev., 43, 129–159.
26. De Castro Y. & Gamboa F. (2012) Exact reconstruction using Beurling minimal extrapolation. J. Math. Anal. Appl., 395, 336–354.
27. De Prony B. G. R. (1795) Essai éxperimental et analytique: sur les lois de la dilatabilité de fluides élastique et sur celles de la force expansive de la vapeur de l’alkool, à différentes températures. J. de l’école Polytechnique, 1, 24–76.
28. Donoho D. L. (2006) Compressed sensing. IEEE Trans. Inf. Theory, 52, 1289–1306.
29. Donoho D. L. & Huo X. (2001) Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inf. Theory, 47, 2845–2862.
30. Donoho D. L. & Stark P. B. (1989) Uncertainty principles and signal recovery. SIAM J. Appl. Math., 49, 906–931.
31. Dragotti P. L. & Lu Y. M. (2014) On sparse representation in Fourier and local bases. IEEE Trans. Inf. Theory, 60, 7888–7899.
32. Dumitrescu B. (2007) Positive Trigonometric Polynomials and Signal Processing Applications. Springer.
33. Duval V. & Peyré G. (2014) Exact support recovery for sparse spikes deconvolution. Found. Comput. Math., 15, 1–41.
34. Eftekhari A. & Wakin M. B. (2015) Greed is super: a fast algorithm for super-resolution. arXiv preprint arXiv:1511.03385.
35. Fannjiang A. & Liao W. (2012) Coherence pattern-guided compressive sensing with unresolved grids. SIAM J. Imag. Sci., 5, 179–202.
36. Faxin Y., Yiying S. & Yongtan L. (2001) An effective method of anti-impulsive-disturbance for ship-target detection in HF radar. Radar, 2001 CIE International Conference on, Proceedings. IEEE. pp. 372–375.
37. Fernandez-Granda C. (2013) Support detection in super-resolution. Proceedings of the 10th International Conference on Sampling Theory and Applications. pp. 145–148.
38. Fernandez-Granda C. (2016) Super-resolution of point sources via convex programming. Information Inference. https://doi.org/10.1093/imaiai/iaw005.
39. Grant M., Boyd S. & Ye Y. (2008) CVX: Matlab software for disciplined convex programming.
40. Gross D. (2009) Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inf. Theory, 57, 1548–1566.
41. Harris F. (1978) On the use of windows for harmonic analysis with the discrete Fourier transform. IEEE Proc., 66, 51–83.
42. Lagarias J. C., Reeds J. A., Wright M. H. & Wright P. E.
(1998) Convergence properties of the Nelder–Mead simplex method in low dimensions. SIAM J. Optim., 9, 112–147.
43. Leonowicz Z., Lobos T. & Rezmer J. (2003) Advanced spectrum estimation methods for signal analysis in power electronics. IEEE Trans. Ind. Electron., 50, 514–519.
44. Li X. (2013) Compressed sensing and matrix completion with constant proportion of corruptions. Constr. Approx., 37, 73–99.
45. Lu X., Wang J., Ponsford A. M. & Kirlin R. L. (2010) Impulsive noise excision and performance analysis. 2010 IEEE Radar Conference. Washington, DC: IEEE. pp. 1295–1300.
46. Mairal J., Bach F. & Ponce J., et al. (2014) Sparse modeling for image and vision processing. Foundations and Trends® in Computer Graphics and Vision, 8, 85–283.
47. Mallat S. G. & Zhang Z. (1993) Matching pursuits with time-frequency dictionaries. IEEE Trans. Sig. Proc., 41, 3397–3415.
48. McCoy M. B. & Tropp J. A. (2014) Sharp recovery bounds for convex demixing, with applications. Found. Comput. Math., 14, 503–567.
49. Moitra A. (2015) Super-resolution, extremal functions and the condition number of Vandermonde matrices. Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC). ACM, pp. 821–830.
50. Olshausen B. A. & Field D. (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607–609.
51. Pati Y. C., Rezaiifar R. & Krishnaprasad P. (1993) Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. Signals, Systems and Computers, 1993. 1993 Conference Record of The Twenty-Seventh Asilomar Conference on. IEEE. pp. 40–44.
52. Rao N., Shah P. & Wright S.
(2014) Forward-backward greedy algorithms for signal demixing. Signals, Systems and Computers, 2014 48th Asilomar Conference on. IEEE. pp. 437–441.
53. Rao N., Shah P. & Wright S. (2015) Forward–backward greedy algorithms for atomic norm regularization. IEEE Trans. Sig. Proc., 63, 5798–5811.
54. Rockafellar R. (1974) Conjugate Duality and Optimization. Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics.
55. Rudin L. I., Osher S. & Fatemi E. (1992) Nonlinear total variation based noise removal algorithms. Physica D, 60, 259–268.
56. Schaeffer A. C. (1941) Inequalities of A. Markoff and S. Bernstein for polynomials and related functions. Bull. Amer. Math. Soc., 47, 565–579.
57. Schmidt R. (1986) Multiple emitter location and signal parameter estimation. IEEE Trans. Antennas Propag., 34, 276–280.
58. Slepian D. (1978) Prolate spheroidal wave functions, Fourier analysis, and uncertainty. V – The discrete case. Bell Syst. Tech. J., 57, 1371–1430.
59. Smith J. O. (2008) Introduction to Digital Filters: with Audio Applications, vol. 2. Julius Smith.
60. Stoica P., Babu P. & Li J. (2011) New method of sparse parameter estimation in separable models and its use for spectral analysis of irregularly sampled data. IEEE Trans. Sig. Proc., 59, 35–47.
61. Stoica P., Moses R., Friedlander B. & Soderstrom T. (1989) Maximum likelihood estimation of the parameters of multiple sinusoids from noisy measurements. IEEE Trans. Acoust. Speech Sig. Proc., 37, 378–392.
62. Stoica P. & Moses R. L. (2005) Spectral Analysis of Signals, 1st edn. Upper Saddle River, NJ: Prentice Hall.
63. Su D. (2016) Compressed sensing with corrupted Fourier measurements.
arXiv preprint arXiv:1607.04926.
64. Tang G. (2015) Resolution limits for atomic decompositions via Markov-Bernstein type inequalities. Proceedings of the 10th International Conference on Sampling Theory and Applications. pp. 548–552.
65. Tang G., Bhaskar B. & Recht B. (2015) Near minimax line spectral estimation. IEEE Trans. Inf. Theory, 61, 499–512.
66. Tang G., Bhaskar B., Shah P. & Recht B. (2013) Compressed sensing off the grid. IEEE Trans. Inf. Theory, 59, 7465–7490.
67. Tang G., Bhaskar B. N. & Recht B. (2013) Sparse recovery over continuous dictionaries-just discretize. 2013 Asilomar Conference on Signals, Systems and Computers. pp. 1043–1047.
68. Tang G., Shah P., Bhaskar B. N. & Recht B. (2014) Robust line spectral estimation. Signals, Systems and Computers, 2014 48th Asilomar Conference on. IEEE. pp. 301–305.
69. Tibshirani R. (1996) Regression shrinkage and selection via the lasso. J. Royal Stat. Soc. Ser. B, 58, 267–288.
70. Tropp J. A. (2008) On the linear independence of spikes and sines. J. Fourier Anal. Appl., 14, 838–858.
71. Tropp J. A. (2011) User-friendly tail bounds for sums of random matrices. Found. Comput. Math., 12, 389–434.
72. Viti V., Petrucci C. & Barone P. (1997) Prony methods in NMR spectroscopy. Int. J. Imaging Syst. Technol., 8, 565–571.
73. Yang Z. & Xie L. On gridless sparse methods for line spectral estimation from complete and incomplete data. IEEE Trans. Sig. Proc., 63, 3139–3153.
74. Zeng W.-J., So H. & Huang L. (2013) $$\ell_p$$-MUSIC: Robust direction-of-arrival estimator for impulsive noise environments. IEEE Trans. Sig. Proc., 61, 4296–4308.
75. Zheng L. & Wang X. (2017) Improved NN-JPDAF for joint multiple target tracking and feature extraction.
arXiv preprint arXiv:1703.08254.

© The authors 2017. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices). For permissions, please e-mail: journals.permissions@oup.com

Information and Inference: A Journal of the IMA, Oxford University Press

Volume 7 (1) – Mar 1, 2018
64 pages

Publisher: Oxford University Press
ISSN: 2049-8764
eISSN: 2049-8772
DOI: 10.1093/imaiai/iax005

### Abstract

We consider the problem of super-resolving the line spectrum of a multisinusoidal signal from a finite number of samples, some of which may be completely corrupted. Measurements of this form can be modeled as an additive mixture of a sinusoidal and a sparse component. We propose to demix the two components and super-resolve the spectrum of the multisinusoidal signal by solving a convex program. Our main theoretical result is that, up to logarithmic factors, this approach is guaranteed to succeed with high probability for a number of spectral lines that is linear in the number of measurements, even if a constant fraction of the data are outliers. The result holds under the assumption that the phases of the sinusoidal and sparse components are random and the line spectrum satisfies a minimum-separation condition. We show that the method can be implemented via semi-definite programming, explain how to adapt it in the presence of dense perturbations and explore its connection to atomic-norm denoising. In addition, we propose a fast greedy demixing method that provides good empirical results when coupled with a local non-convex-optimization step.

### 1. Introduction

The goal of spectral super-resolution is to estimate the spectrum of a multisinusoidal signal from a finite number of samples. This is a problem of crucial importance in signal processing applications, such as target identification from radar measurements [3,21], digital filter design [59], underwater acoustics [2], seismic imaging [6], nuclear magnetic resonance spectroscopy [72] and power electronics [43].

In this paper, we study spectral super-resolution in the presence of perturbations that completely corrupt a subset of the data. The corrupted samples can be interpreted as outliers that do not follow the same multisinusoidal model as the rest of the measurements, and significantly complicate the task of super-resolving the spectrum of the signal of interest.
Depending on the application, outliers may appear due to sensor failures, interference from other signals or impulsive noise. For instance, radar measurements can be corrupted by lightning discharges, spurious radio emissions or telephone switching transients [36,45].

Figure 1 illustrates the problem of performing spectral super-resolution in the presence of outliers. The top row shows a superposition of sinusoids and its corresponding sparse spectrum. In the second row, the multisinusoidal signal is sampled at the Nyquist rate over a finite interval, which induces spectral aliasing and makes it challenging to resolve the individual spectral lines. The sparse signal in the third row represents an additive perturbation that corrupts some of the samples. Finally, the bottom row shows the available measurements: a mixture of sines (samples from the multisinusoidal signal) and spikes (the sparse perturbation). Our objective is to demix these two components and super-resolve the spectrum of the sines.

Fig. 1. The top row shows a multisinusoidal signal (left) and its sparse spectrum (right). The minimum separation of the spectrum is $$2.8 / (n - 1)$$ (see Section 2.2). On the second row, truncating the signal to a finite interval after measuring $$n:= 101$$ samples at the Nyquist rate (left) results in aliasing in the frequency domain (right). The third row shows some impulsive noise (left) and its corresponding spectrum (right). The last row shows the superposition of the multisinusoidal signal and the sparse noise, which yields a mixture of sines and spikes depicted in the time (left) and frequency domains (right). For ease of visualization, the amplitudes of the spectrum of the sines and of the spikes are real (we only show half of the spectrum and half of the spikes because their amplitudes and positions are symmetric).

Broadly speaking, there are three main approaches to spectral super-resolution: linear non-parametric methods [62], techniques based on Prony’s method [27,62] and optimization-based methods [4,38,65]. The first three rows of Fig. 2 show the results of applying a representative of each approach to a spectral super-resolution problem when there are no outliers in the data (left column) and when there are (right column).

Fig. 2. Estimate of the sparse spectrum of the multisinusoidal signal from Fig. 1, when outliers are absent from the data (left column) and when they are present (right column). The estimates are shown in red; the true location of the spectra is shown in blue. Methods that do not account for outliers fail to recover all the spectral lines when impulsive noise corrupts the data, whereas an optimization-based estimator incorporating a sparse-noise model still achieves exact recovery.

In the absence of corruptions, the periodogram, a linear non-parametric technique that uses windowing to reduce spectral aliasing [41], locates most of the relevant frequencies, albeit at a coarse resolution. In contrast, both the Prony-based approach, represented by the Multiple Signal Classification (MUSIC) algorithm [5,57], and the optimization-based method, based on total-variation norm minimization [4,13,65], recover the true spectrum of the signal perfectly. All these techniques are designed to allow for small Gaussian-like perturbations to the data, and hence their performance degrades gracefully when such noise is present (not shown in the figure). However, as we can see in the right column of Fig. 2, when outliers are present in the data their performance is severely affected: none of the methods detect the fourth spectral line of the signal, and they all hallucinate two large spurious spectral lines to the right of the true spectrum.

The subject of this paper is an optimization-based method that leverages sparsity-inducing norms to perform spectral super-resolution and simultaneously detect outliers in the data. The bottom row of Fig. 2 shows that this approach is capable of super-resolving the spectrum of the multisinusoidal signal in Fig. 1 exactly from the corrupted measurements, in contrast to techniques that do not account for the presence of outliers in the data.

Below is a brief road map of the paper. Section 2 describes our methods and main results. In Section 2.1, we introduce a mathematical model of the spectral super-resolution problem. Section 2.2 justifies the need for a minimum-separation condition on the spectrum of the signal for spectral super-resolution to be well posed.
In Section 2.3, we present our optimization-based method and provide a theoretical characterization of its performance. Section 2.4 discusses the robustness of the technique to the choice of regularization parameter. Section 2.5 explains how to adapt the method when the data are perturbed by dense noise. Section 2.6 establishes a connection between our method and atomic-norm denoising. Finally, in Section 2.7 we review the related literature.

Our main theoretical contribution, Theorem 2.2, establishes that solving the convex program introduced in Section 2.3 makes it possible to super-resolve up to $$k$$ spectral lines exactly in the presence of $$s$$ outliers (i.e. when $$s$$ measurements are completely corrupted) with high probability from a number of measurements that is linear in both $$k$$ and $$s$$, up to logarithmic factors. Section 3 is dedicated to the proof of this result, which is non-asymptotic and holds under several assumptions that are described in Section 2.3.

Section 4 focuses on demixing algorithms. In Sections 4.1 and 4.2, we explain how to implement the methods discussed in Sections 2.3 and 2.5, respectively, by recasting the dual of the corresponding optimization problems as a tractable semi-definite program (SDP). In Section 4.3, we propose a greedy demixing technique that achieves good empirical results when combined with a local non-convex-optimization step. Section 4.4 describes the implementation of atomic-norm denoising in the presence of outliers using semi-definite programming. Matlab code for all the algorithms discussed in this section is available in the Supplementary Material.

Section 5 reports numerical experiments illustrating the performance of the proposed approach. In Section 5.1, we investigate under what conditions our optimization-based method achieves exact demixing empirically. In Section 5.2, we compare atomic-norm denoising to an alternative approach based on matrix completion.
We conclude the paper by outlining several future research directions in Section 6.

### 2. Robust spectral super-resolution via convex programming

#### 2.1 Mathematical model

We model the multisinusoidal signal of interest as a superposition of $$k$$ complex exponentials

$$g(t) := \sum_{j=1}^{k} \boldsymbol{x}_j \exp\left({i 2 \pi f_j t}\right), \tag{2.1}$$

where $$\boldsymbol{x} \in \mathbb{C}^{k}$$ is the vector of complex amplitudes and $$\boldsymbol{x}_j$$ is its $$j$$th entry. The spectrum of $$g$$ consists of spectral lines, modeled by Dirac measures that are supported on a subset $$T:=\left\{{f_1, \ldots, f_k}\right\}$$ of the unit interval $$\left[{0,1}\right]$$

$$\mu = \sum_{f_j \in T} \boldsymbol{x}_j \delta\left({f - f_j}\right), \tag{2.2}$$

where $$\delta \left({f - f_j}\right)$$ denotes a Dirac measure located at $$f_j$$. Sparse spectra of this form are often called line spectra in the literature. Note that a simple change of variable allows us to apply this model to signals with spectra restricted to any interval $$\left[{f_{\min},f_{\max}}\right]$$.

By the Nyquist–Shannon sampling theorem, we can recover $$g$$, and consequently $$\mu$$, from an infinite sequence of regularly spaced samples $$\left\{{g\left({l}\right),\; l \in \mathbb{Z}}\right\}$$ by sinc interpolation. The aim of spectral super-resolution is to estimate the support of the line spectrum $$T$$ and the amplitude vector $$\boldsymbol{x}$$ from a finite set of $$n$$ contiguous samples instead. Note that $$\left\{{g\left({l}\right), \; l \in \mathbb{Z}}\right\}$$ are the Fourier series coefficients of $$\mu$$, so mathematically we seek to recover an atomic measure from a subset of its Fourier coefficients.

As described in the introduction, we are interested in tackling this problem when a subset of the data is completely corrupted. These corruptions are modeled as additive impulsive noise, represented by a sparse vector $$\boldsymbol{z} \in \mathbb{C}^{n}$$ with $$s$$ non-zero entries. The data are consequently of the form

$$\boldsymbol{y}_l = g\left({l}\right) + \boldsymbol{z}_l, \quad 1 \leq l \leq n. \tag{2.3}$$

To represent the measurement model more compactly, we define an operator $$\mathcal{F}_{n}$$ that maps a measure to its first $$n$$ Fourier series coefficients,

$$\boldsymbol{y} = \mathcal{F}_{n} \mu + \boldsymbol{z}. \tag{2.4}$$

Intuitively, $$\mathcal{F}_{n}$$ maps the spectrum $$\mu$$ to $$n$$ regularly spaced samples of the signal $$g$$ in the time domain.

#### 2.2 Minimum-separation condition

Even in the absence of any noise, the problem of recovering a signal from $$n$$ samples is vastly underdetermined: we can fill in the missing samples $$g\left({0}\right), g \left({-1}\right), \ldots$$ and $$g\left({n+1}\right), g \left({n+2}\right), \ldots$$ any way we like and then apply sinc interpolation to obtain an estimate that is consistent with the data. For the inverse problem to make sense, we need to leverage additional assumptions about the structure of the signal. In spectral super-resolution, the usual assumption is that the spectrum of the signal is sparse. This is reminiscent of compressed sensing [17], where signals are recovered robustly from randomized measurements by exploiting a sparsity prior. A crucial insight underlying compressed-sensing theory is that the randomized operator obeys the restricted isometry property (RIP), which ensures that the measurements preserve the energy of any sparse signal with high probability [18]. Unfortunately, this is not the case for our measurement operator of interest. The reason is that signals consisting of clustered spectral lines may lie almost in the null space of the sampling operator, even if the number of spectral lines is small. Additional conditions beyond sparsity are necessary to ensure that the problem is well posed. To this end, we define the minimum separation of the support of a signal, as introduced in [12].

Definition 2.1 (Minimum separation). For a set of points $$T \subset \left[{0,1}\right]$$, the minimum separation (or minimum distance) is defined as the closest distance between any two elements from $$T$$,

$$\Delta\left({T}\right) = \inf_{\left({f_1, f_2}\right) \in T \, : \, f_1 \neq f_2} \left|{f_2 - f_1}\right|. \tag{2.5}$$

To be clear, this is the wrap-around distance, so that the distance between $$f_1 = 0$$ and $$f_2 = 3/4$$ is equal to $$1/4$$.

If the minimum distance is too small with respect to the number of measurements, then it may be impossible to resolve a signal even under very small levels of noise. A fundamental limit in this sense is $$\Delta^{\ast} := \frac{2}{n-1}$$, which is the width of the main lobe of the periodized sinc kernel that is convolved with the spectrum when we truncate the number of samples to $$n$$. This limit arises because for minimum separations just below $$\Delta^{\ast} / 2$$ there exist signals that are almost suppressed by the sampling operator $$\mathcal{F}_{n}$$. If such a signal $$d$$ corresponds to the difference between two different signals $$s_1$$ and $$s_2$$, so that $$s_1 - s_2 = d$$, it will be very challenging to distinguish $$s_1$$ and $$s_2$$ from the available data.$$^{1}$$ This phenomenon can be characterized theoretically in an asymptotic setting using Slepian’s prolate-spheroidal sequences [58] (see also Section 3.2 in [12]). More recently, Theorem 1.3 of [49] provides a non-asymptotic analysis, and other works have obtained lower bounds on the minimum separation necessary for convex-programming approaches to succeed [33,64].

#### 2.3 Robust spectral super-resolution via convex programming

Spectral super-resolution in the presence of outliers boils down to estimating $$\mu$$ and $$\boldsymbol{z}$$ in the mixture model (2.4). Without additional constraints, this is not very ambitious: data consistency is trivially achieved, for instance, by setting the sines to zero and declaring every sample to be a spike. Our goal is to fit the two components in the simplest way possible, i.e. so that the spectrum of the multisinusoidal signal (the sines) is restricted to a small number of frequencies and the impulsive noise (the spikes) only affects a small subset of the data.
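As a rough illustration of the mixture model (2.3) and of the wrap-around distance in Definition 2.1, the following Python sketch generates corrupted samples and computes the minimum separation of a support. The function names `min_separation` and `sample_mixture`, the chosen frequencies, the spike magnitudes and the random seed are illustrative assumptions; they do not come from the paper or its Supplementary Material.

```python
import numpy as np

def min_separation(freqs):
    """Wrap-around minimum separation of a set of frequencies in [0, 1)."""
    f = np.sort(np.mod(freqs, 1.0))
    # Gaps between sorted neighbours, plus the gap wrapping from the last to the first.
    gaps = np.diff(np.concatenate([f, [f[0] + 1.0]]))
    return gaps.min()

def sample_mixture(freqs, amps, n, spike_idx, spike_vals):
    """Return (y, g, z) with y_l = g(l) + z_l for l = 1, ..., n, as in (2.3)."""
    l = np.arange(1, n + 1)                           # sample locations 1, ..., n
    g = np.exp(2j * np.pi * np.outer(l, freqs)) @ amps
    z = np.zeros(n, dtype=complex)
    z[spike_idx] = spike_vals                         # array index i stores sample l = i + 1
    return g + z, g, z

# Hypothetical example values (assumptions, chosen only for illustration).
rng = np.random.default_rng(0)
n = 101
freqs = np.array([0.10, 0.30, 0.55, 0.80])
amps = np.exp(2j * np.pi * rng.random(freqs.size))    # unit magnitude, random phase
spike_idx = rng.choice(n, size=5, replace=False)      # 5 corrupted samples
spike_vals = 2.0 * np.exp(2j * np.pi * rng.random(5)) # magnitude 2, random phase

y, g, z = sample_mixture(freqs, amps, n, spike_idx, spike_vals)
delta = min_separation(freqs)
```

With these hypothetical values the support has minimum separation 0.2, well above $$2.52/(n-1)$$, and five of the 101 samples are corrupted.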
Many modern signal processing methods rely on the design of cost functions that (1) encode prior knowledge about signal structure and (2) can be minimized efficiently. In particular, penalizing the $$\ell_1$$ norm is an efficient and robust method for obtaining sparse estimates in denoising [25], regression [69] and inverse problems such as compressed sensing [19,28]. In order to fit a mixture model where both the spikes and the spectrum of the sines are sparse, we propose minimizing a cost function that penalizes the $$\ell_1$$ norm of both components (or rather a continuous counterpart of the $$\ell_1$$ norm in the case of the spectrum, as we explain below). We would like to note that this approach was introduced by some of the authors of the present paper in [38,68], but without any theoretical analysis, and applied to multiple target tracking from radar measurements in [75]. Similar ideas have been previously leveraged to separate low-rank and sparse matrices [14,23], perform compressed sensing from corrupted data [44] and demix signals that are sparse in different bases [48].

Recall that the spectrum of the sinusoidal component in our mixture model is modeled as a measure that is supported on a continuous interval. Its $$\ell_1$$ norm is therefore not well defined. In order to promote sparsity in the estimate, we resort instead to a continuous version of the $$\ell_1$$ norm: the total variation (TV) norm.$$^{2}$$ If we consider the space of measures supported on the unit interval, this norm is dual to the infinity norm, so that

$$\left|\left|{ \nu }\right|\right| _{\mathrm{TV}} := \sup_{\left|\left|{ h }\right|\right| _{\infty} \leq 1, \; h \in C\left({\mathbb{T}}\right)} \operatorname{Re} \left[{ \int_{\mathbb{T}} \overline{h\left({f}\right)} \, \nu \left({\mathrm{d}f}\right) }\right], \tag{2.6}$$

for any measure $$\nu$$ (for a different definition see Section A in the Appendix of [12]). In the case of a superposition of Dirac deltas as in (2.2), the TV norm is equal to the $$\ell_1$$ norm of the coefficients, i.e. $$\left|\left|{ \mu }\right|\right| _{\mathrm{TV}}=\left|\left|{ \boldsymbol{x}}\right|\right| _{1}$$.
Spectral super-resolution via TV norm minimization, introduced in [12,26] (see also [11]), has been shown to achieve exact recovery under a minimum separation of $$\frac{2.52}{ n-1 }$$ in [38] and to be robust to missing data in [66]. Our proposed method minimizes the sum of the $$\ell_1$$ norm of the spikes and the TV norm of the spectrum of the sines subject to a data-consistency constraint:

$$\min_{\tilde{\mu}, \, \tilde{\boldsymbol{z}}} \; \left|\left|{ \tilde{\mu} }\right|\right| _{\mathrm{TV}} + \lambda \left|\left|{ \tilde{\boldsymbol{z}} }\right|\right| _{1} \quad \text{subject to} \quad \mathcal{F}_{n} \tilde{\mu} + \tilde{\boldsymbol{z}} = \boldsymbol{y}. \tag{2.7}$$

Here $$\lambda > 0$$ is a regularization parameter that governs the weight of each penalty term. This optimization program is convex. Section 4.1 explains how to solve it by reformulating its dual as an SDP. Our main theoretical result is that solving (2.7) achieves perfect demixing with high probability under certain assumptions.

Theorem 2.2 (Proof in Section 3). Suppose that we observe $$n$$ samples of the form

$$\boldsymbol{y} = \mathcal{F}_{n} \mu + \boldsymbol{z}, \tag{2.8}$$

where each entry in $$\boldsymbol{z}$$ is non-zero with probability $$\frac{s}{n}$$ (independently of each other) and the support $$T:=\left\{{f_1, \ldots, f_k}\right\}$$ of

$$\mu := \sum_{j=1}^{k} \boldsymbol{x}_j \delta\left({f - f_j}\right), \tag{2.9}$$

has a minimum separation lower bounded by

$$\Delta_{\min} := \frac{2.52}{n-1}. \tag{2.10}$$

If the phases of the entries in $$\boldsymbol{x} \in \mathbb{C}^{k}$$ and the non-zero entries in $$\boldsymbol{z} \in \mathbb{C}^{n}$$ are i.i.d. random variables uniformly distributed in $$\left[{0,2\pi}\right]$$, then the solution to Problem (2.7) with $$\lambda = 1/\sqrt{n}$$ is exactly equal to $$\mu$$ and $$\boldsymbol{z}$$ with probability $$1-\epsilon$$ for any $$\epsilon>0$$ as long as

$$k \leq C_k \left({\log \frac{n}{\epsilon}}\right)^{-2} n, \tag{2.11}$$

$$s \leq C_s \left({\log \frac{n}{\epsilon}}\right)^{-2} n, \tag{2.12}$$

for fixed numerical constants $$C_k$$ and $$C_s$$ and $$n \geq 2 \times 10^3$$.

The theorem guarantees that our method is able to super-resolve a number of spectral lines that is proportional to the number of measurements, even if the data contain a constant fraction of outliers, up to logarithmic factors.
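To make the cost function in (2.7) concrete, the sketch below evaluates it at two data-consistent pairs: the true sines and spikes, and the trivial pair that sets the sines to zero and declares every sample a spike. It only compares objective values at these two feasible points and does not solve the convex program. The helper `demixing_cost`, the frequencies, the amplitudes and the seed are assumptions made for illustration, and $$n = 101$$ is smaller than the range covered by Theorem 2.2.

```python
import numpy as np

def demixing_cost(x, z, lam):
    """Objective of (2.7) when the spectrum is a sum of Dirac deltas with
    amplitudes x, so that its TV norm equals the l1 norm of x."""
    return np.abs(x).sum() + lam * np.abs(z).sum()

# Hypothetical example values (assumptions, chosen only for illustration).
rng = np.random.default_rng(1)
n = 101
lam = 1.0 / np.sqrt(n)                               # value of lambda used in Theorem 2.2
freqs = np.array([0.10, 0.30, 0.55, 0.80])
x = np.exp(2j * np.pi * rng.random(freqs.size))      # unit magnitude, random phase

z = np.zeros(n, dtype=complex)
idx = rng.choice(n, size=5, replace=False)
z[idx] = 2.0 * np.exp(2j * np.pi * rng.random(5))    # 5 spikes of magnitude 2

l = np.arange(1, n + 1)
y = np.exp(2j * np.pi * np.outer(l, freqs)) @ x + z  # data y = F_n mu + z

cost_true = demixing_cost(x, z, lam)                 # true sines plus a few spikes
cost_trivial = demixing_cost(np.zeros(0), y, lam)    # no sines, every sample is a spike
```

For this instance the trivial pair is feasible but more expensive than the true pair, which is the behaviour the two sparsity-inducing penalties are meant to encourage. The comparison says nothing about whether the true pair is the minimizer; that is the content of Theorem 2.2.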
The proof is presented in Section 3; it is based on the construction of a random trigonometric polynomial that certifies exact demixing. Our result is non-asymptotic and holds with high probability under several assumptions, which we now discuss in more detail. The support of the sparse corruptions follows a Bernoulli model, where each entry is non-zero with probability $$s/n$$ independently of the others. This model is essentially equivalent to choosing the support of the outliers uniformly at random from all possible subsets of cardinality $$s$$, as shown in Section 7.1 of [14] (see also [17, Section 2.3] and [20, Section 8.1]). The phases of the amplitudes of the spectral lines are assumed to be i.i.d. uniform random variables (note, however, that the magnitudes of the amplitudes can take any value). Modeling the phase of the spectral components of a multisinusoidal signal in this way is a common assumption in signal processing; see, for example, [62, Chapter 4.1]. The phases of the amplitudes of the additive corruptions are also assumed to be i.i.d. uniform random variables (the magnitudes can again take any value). If we constrain the corruptions to be real, the derandomization argument in [14, Section 2.2] makes it possible to obtain guarantees for arbitrary sign patterns. We have already discussed the minimum-separation condition on the spectrum of the multisinusoidal component in Section 2.2. Our assumptions model a non-adversarial situation where the outliers are not designed to cancel out the samples from the multisinusoidal signal. In the absence of any such assumption, it is possible to concoct instances for which the demixing problem is ill posed, even if the number of spectral lines and outliers is small. We illustrate this with a simple example, based on the picket-fence sequence used as an extremal function for signal-decomposition uncertainty principles in [29,30]. Consider $$k'$$ spectral lines with equal amplitudes and an equispaced support   $$\mu' := \frac{1}{k'} \sum_{j=0}^{k'-1} \delta\left({f - j/k'}\right).$$ (2.13) The samples of the corresponding multisinusoidal signal $$g'$$ are zero except at multiples of $$k'$$   $$g'(l) = \begin{cases} 1 & \text{if } l/k' \in \mathbb{Z}, \\ 0 & \text{otherwise.} \end{cases}$$ (2.14) If we choose the corruptions $$\boldsymbol{z'}$$ to cancel out these non-zero samples   $$\boldsymbol{z'}_l = \begin{cases} -1 & \text{if } l/k' \in \mathbb{Z}, \\ 0 & \text{otherwise,} \end{cases}$$ (2.15) then the corresponding measurements are all equal to zero. For these data, the demixing problem cannot be solved by any method. Set $$k':= \sqrt{n}$$ so that the number of measurements $$n$$ equals $$\left({k'}\right)^2$$. Then the number of outliers is just $$n/k' = \sqrt{n}$$ and the minimum separation between the spectral lines is $$1/\sqrt{n}$$, which amply satisfies the minimum-separation condition (2.10). This shows that additional assumptions beyond the minimum-separation condition are necessary for the inverse problem to make sense. A related phenomenon arises in compressed sensing, where random measurement schemes avoid similar adversarial situations (see [17, Section 1.3] and [70]). An interesting subject for future research is whether it is possible to establish the guarantees for exact demixing provided by Theorem 2.2 without random assumptions on the phase of the different components, or if these assumptions are necessary for the demixing problem to be well posed. 2.4 Regularization parameter A question of practical importance is whether the performance of our demixing method is robust to the choice of the regularization parameter $$\lambda$$ in Problem (2.7). Theorem 2.2 indicates that this is the case in the following sense. If we set $$\lambda$$ to a fixed value that is proportional to $$1/\sqrt{n}$$,3 then exact demixing occurs for a number of spectral lines $$k$$ and a number of outliers $$s$$ that range from zero to a certain maximum value proportional to $$n$$ (up to logarithmic factors). In this section, we provide additional theoretical evidence for the robustness of our method to the choice of $$\lambda$$.
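As a brief aside, the picket-fence counterexample (2.13)-(2.15) of Section 2.3 is easy to verify numerically. The sketch below is an illustration under assumptions: it indexes the samples from $$0$$ to $$n-1$$ and picks $$k' = 8$$ arbitrarily.

```python
import numpy as np

k_prime = 8
n = k_prime ** 2                      # n = (k')^2 measurements
l = np.arange(n)                      # sample indices (assumed to start at 0)

# Samples of k' equispaced spectral lines with amplitude 1/k', as in (2.13).
freqs = np.arange(k_prime) / k_prime
g = np.exp(2j * np.pi * np.outer(l, freqs)).sum(axis=1) / k_prime

# Picket fence (2.14): one at multiples of k', zero elsewhere.
picket = (l % k_prime == 0).astype(float)

# Adversarial corruptions (2.15) cancel every non-zero sample.
z = -picket
y = g + z

print(np.abs(g - picket).max())       # numerically zero
print(np.abs(y).max())                # numerically zero: the data carry no information
print(int(np.count_nonzero(z)))       # number of outliers, n / k' = sqrt(n)
```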
If exact recovery occurs for a certain pair $$\left\{{\mu, \boldsymbol{z}}\right\}$$ and a certain $$\lambda$$, then it will also succeed for any trimmed version $$\left\{{\mu', \boldsymbol{z'}}\right\}$$ (obtained by removing some elements of the support of $$\mu$$ or $$\boldsymbol{z}$$, or both) for the same value of $$\lambda$$. Lemma 2.3 (Proof in Section A) Let $$\boldsymbol{z}$$ be a vector with support $${\it{\Omega}}$$ and let $$\mu$$ be an arbitrary measure such that   $$\boldsymbol{y} = \mathcal{F}_n \, \mu + \boldsymbol{z}.$$ (2.16) Assume that the pair $$\left\{{\mu,\boldsymbol{z}}\right\}$$ is the unique solution to Problem (2.7) and consider the data   $$\boldsymbol{y'} = \mathcal{F}_n \, \mu' + \boldsymbol{z'}.$$ (2.17) Here $$\mu'$$ is a trimmed version of $$\mu$$: it is equal to $$\mu$$ on a subset of its support $$T' \subseteq T$$ and is zero everywhere else. Similarly, $$\boldsymbol{z'}$$ equals $$\boldsymbol{z}$$ on a subset of entries $${\it{\Omega}}' \subseteq {\it{\Omega}}$$ and is zero otherwise. For any choice of $$T'$$ and $${\it{\Omega}}'$$, $$\left\{{\mu',\boldsymbol{z'}}\right\}$$ is the unique solution to Problem (2.7) if we set the data vector to equal $$\boldsymbol{y'}$$ for the same value of $$\lambda$$. This result and its proof are inspired by Theorem 2.2 in [14]. As illustrated by Figs 12 and 13, our numerical experiments corroborate the lemma: we consistently observe that if exact demixing occurs for most signals with a certain number of spectral lines and outliers, then it also occurs for most signals with fewer spectral lines and fewer corruptions (as long as the minimum separation is the same) for a fixed value of $$\lambda$$. 2.5 Stability to dense perturbations One of the advantages of our optimization-based framework is that we can account for additional assumptions on the problem structure by modifying either the cost function or the constraints of the optimization problem used to perform demixing.
In most applications of spectral super-resolution, the data will deviate from the multisinusoidal model (2.1) because of measurement noise and other perturbations, even in the absence of outliers. We model such deviations as a dense additive perturbation $$\boldsymbol{w}$$, such that $$\left|\left|{ \boldsymbol{w}}\right|\right| _{2} \leq \sigma$$ for a certain noise level $$\sigma$$,   $$\boldsymbol{y} = \mathcal{F}_n \, \mu + \boldsymbol{z} + \boldsymbol{w}.$$ (2.18) Problem (2.7) can be adapted to this measurement model by relaxing the equality constraint that enforces data consistency to an inequality that takes into account the noise level   $$\min_{\tilde{\mu}, \, \boldsymbol{\tilde{z}}} \; \left|\left|{\tilde{\mu}}\right|\right|_{\mathrm{TV}} + \lambda \left|\left|{\boldsymbol{\tilde{z}}}\right|\right|_{1} \quad \text{subject to} \quad \left|\left|{\boldsymbol{y} - \mathcal{F}_n \, \tilde{\mu} - \boldsymbol{\tilde{z}}}\right|\right|_{2} \leq \sigma.$$ (2.19) Just like Problem (2.7), this optimization problem can be solved by recasting its dual as a tractable SDP, as we explain in detail in Section 4.2. 2.6 Atomic-norm denoising Our demixing method is closely related to atomic-norm denoising of multisinusoidal samples. Consider the $$n$$-dimensional vector $$\boldsymbol{g} := \mathcal{F}_n \, \mu$$ containing clean samples from a signal $$g$$ defined by (2.1). The assumption that the spectrum $$\mu$$ of $$g$$ consists of $$k$$ spectral lines is equivalent to $$\boldsymbol{g}$$ having a sparse representation in an infinite dictionary of $$n$$-dimensional sinusoidal atoms $$\boldsymbol{a} \left({f, \phi}\right) \in \mathbb{C}^{n}$$ parameterized by frequency $$f \in [0, 1)$$ and phase $$\phi \in [0, 2\pi)$$,   $$\boldsymbol{a}\left({f, \phi}\right)_l := \frac{1}{\sqrt{n}} \, e^{i \phi} e^{i 2 \pi l f}, \quad 1 \leq l \leq n.$$ (2.20) Indeed, $$\boldsymbol{g}$$ can be expressed as a linear combination of $$k$$ atoms   $$\boldsymbol{g} = \sqrt{n} \sum_{j=1}^{k} \left|{\boldsymbol{x}_j}\right| \boldsymbol{a}\left({f_j, \phi_j}\right), \quad \boldsymbol{x}_j := \left|{\boldsymbol{x}_j}\right| e^{i \phi_j}.$$ (2.21) This representation can be leveraged in an optimization framework using the atomic norm, an idea introduced in [22] and first applied to spectral super-resolution in [4].
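The atomic representation (2.21) can be checked directly. The sketch below is illustrative only: it assumes that the samples are $$\boldsymbol{g}_l = \sum_j \boldsymbol{x}_j e^{i 2 \pi l f_j}$$ for $$1 \leq l \leq n$$, takes $$\phi_j$$ to be the phase of $$\boldsymbol{x}_j$$, and uses arbitrary example frequencies and amplitudes.

```python
import numpy as np

def atom(f, phi, n):
    """Sinusoidal atom a(f, phi) of (2.20), a unit-norm vector in C^n."""
    l = np.arange(1, n + 1)
    return np.exp(1j * phi) * np.exp(2j * np.pi * l * f) / np.sqrt(n)

n = 32
f = np.array([0.1, 0.35, 0.8])                       # example frequencies
x = np.array([1.5 * np.exp(0.3j), 0.7 * np.exp(2.0j), 2.0 * np.exp(-1.1j)])

# Direct evaluation of the samples of the multisinusoidal signal.
l = np.arange(1, n + 1)
g_direct = np.exp(2j * np.pi * np.outer(l, f)) @ x

# Same vector written as a combination of k atoms, with phi_j = angle(x_j).
g_atoms = np.sqrt(n) * sum(abs(xj) * atom(fj, np.angle(xj), n)
                           for fj, xj in zip(f, x))

print(np.abs(g_direct - g_atoms).max())              # numerically zero
```

Because each atom has unit $$\ell_2$$ norm, the weights $$\sqrt{n} \left|{\boldsymbol{x}_j}\right|$$ sum to $$\sqrt{n}$$ times the TV norm of $$\mu$$, which is why the atomic norm appears with a factor $$1/\sqrt{n}$$ in the cost functions of this section.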
The atomic norm induced by a set of atoms $$\mathcal{A}$$ is equal to the gauge of $$\mathcal{A}$$ defined by   $$\left|\left|{\boldsymbol{u}}\right|\right|_{\mathcal{A}} := \inf \left\{{t > 0 \, : \, \boldsymbol{u} \in t \, \mathrm{conv}\left({\mathcal{A}}\right)}\right\},$$ (2.22) which is a norm as long as $$\mathcal{A}$$ is centrally symmetric around the origin (as is the case for (2.20)). Geometrically, the unit ball of the atomic norm is the convex hull of the atoms in $$\mathcal{A}$$, just like the $$\ell_1$$ norm ball is the convex hull of unit-norm one-sparse vectors. As a result, signals consisting of a small number of atoms tend to have a smaller atomic norm (just like sparse vectors tend to have a smaller $$\ell_1$$ norm). Consider the problem of denoising the samples of $$g$$ from corrupted data of the form (2.4),   $$\boldsymbol{y} = \boldsymbol{g} + \boldsymbol{z}.$$ (2.23) To be clear, the aim is now to separate $$\boldsymbol{g}$$ from the corruption vector $$\boldsymbol{z}$$ instead of directly estimating the spectrum of $$\boldsymbol{g}$$. In order to demix the two signals, we penalize the atomic norm of the multisinusoidal component and the $$\ell_1$$ norm of the sparse component,   $$\min_{\boldsymbol{\tilde{g}}, \, \boldsymbol{\tilde{z}}} \; \frac{1}{\sqrt{n}} \left|\left|{\boldsymbol{\tilde{g}}}\right|\right|_{\mathcal{A}} + \lambda \left|\left|{\boldsymbol{\tilde{z}}}\right|\right|_{1} \quad \text{subject to} \quad \boldsymbol{\tilde{g}} + \boldsymbol{\tilde{z}} = \boldsymbol{y},$$ (2.24) where $$\lambda > 0$$ is a regularization parameter. Problems (2.7) and (2.24) are closely related. Their convex cost functions are designed to exploit sparsity assumptions on the spectrum of $$g$$ and on the corruption vector $$\boldsymbol{z}$$ in ways that are essentially equivalent. More formally, both problems have the same dual, as implied by the following lemma and Lemma 4.1. Lemma 2.4 (Proof in Section B.1) The dual of Problem (2.24) is   $$\max_{\boldsymbol{\eta} \in \mathbb{C}^{n}} \; \left \langle{ \boldsymbol{y}}, { \boldsymbol{\eta}}\right \rangle \quad \text{subject to} \quad \left|\left|{\mathcal{F}_n^{\ast} \, \boldsymbol{\eta}}\right|\right|_{\infty} \leq 1,$$ (2.25)   $$\left|\left|{\boldsymbol{\eta}}\right|\right|_{\infty} \leq \lambda,$$ (2.26) where the inner product is defined as $$\left \langle{ \boldsymbol{y}}, { \boldsymbol{\eta}}\right \rangle : = \mathrm{Re}\left({\boldsymbol{y}^{\ast}\boldsymbol{\eta}}\right)$$. The fact that the two optimization problems share the same dual has an important consequence established in Section B.2: the same dual certificate can be used to prove that they achieve exact demixing.
As a result, the proof of Theorem 2.2 immediately implies that solving Problem (2.24) is successful in separating $$\boldsymbol{g}$$ and $$\boldsymbol{z}$$ under the conditions described in Section 2.3. Corollary 2.5 (Proof in Section B.2) Under the assumptions of Theorem 2.2, $$\boldsymbol{g} := \mathcal{F}_n \, \mu$$ and $$\boldsymbol{z}$$ are the unique solutions to Problem (2.24). Problem (2.24) can be adapted to denoise data that are perturbed by both outliers and dense noise, following the measurement model (2.18). Inspired by previous work on line-spectra denoising via atomic-norm minimization [4,65], we remove the equality constraint and add a regularization term to ensure consistency with the data,   $$\min_{\boldsymbol{\tilde{g}}, \, \boldsymbol{\tilde{z}}} \; \frac{1}{\sqrt{n}} \left|\left|{\boldsymbol{\tilde{g}}}\right|\right|_{\mathcal{A}} + \lambda \left|\left|{\boldsymbol{\tilde{z}}}\right|\right|_{1} + \frac{\gamma}{2} \left|\left|{\boldsymbol{y} - \boldsymbol{\tilde{g}} - \boldsymbol{\tilde{z}}}\right|\right|_{2}^{2},$$ (2.27) where $$\gamma > 0$$ is a regularization parameter with a role analogous to $$\sigma$$ in Problem (2.19). In Section 4.4, we discuss how to implement atomic-norm denoising by reformulating Problems (2.24) and (2.27) as SDPs. 2.7 Related work Most previous works analyzing the problem of demixing sines and spikes make the assumption that the frequencies of the sinusoidal component lie on a grid with step size $$1/n$$, where $$n$$ is the number of samples. In that case, demixing reduces to a discrete sparse decomposition problem in a dictionary formed by the concatenation of an identity and a discrete Fourier transform matrix [30]. Bounds on the coherence of this dictionary can be used to derive guarantees for basis pursuit [29] and also techniques based on Prony’s method [31]. Coherence-based bounds do not reflect the fact that most sparse subsets of the dictionary are well conditioned [70], which can be exploited to obtain stronger guarantees for $$\ell_1$$ norm-based methods under random assumptions [44,63]. In this paper, we depart from this previous literature by considering a sinusoidal component whose spectrum may lie on arbitrary points of the unit interval.
Our work draws from recent developments on the super-resolution of point sources and line spectra via convex optimization. In [12] (see also [26]), the authors establish that TV minimization achieves exact recovery of measures satisfying a minimum separation of $$\frac{4}{n-1}$$, a result that is sharpened to $$\frac{2.52}{n-1}$$ in [38]. In [66] the method is adapted to a compressed-sensing setting, where a large fraction of the measurements may be missing. The proof of Theorem 2.2 builds upon the techniques developed in [12,38,66]. We would like to point out that stability guarantees for TV norm-based approaches established in subsequent works [1,13,33,37,65] hold only for small perturbations, and do not apply when the data may be perturbed by sparse noise of arbitrary amplitude, as is the case in this paper. In [24], a spectral super-resolution approach based on robust low-rank matrix recovery is shown to be robust to outliers under some incoherence assumptions, which are empirically related to our minimum-separation condition (see Section A in [24]). Ignoring logarithmic factors, the guarantees in [24] allow for exact denoising of up to $$\mathcal{O}\left({\sqrt{n}}\right)$$ spectral lines in the presence of $$\mathcal{O}\left({n}\right)$$ outliers, where $$n$$ is the number of measurements. Corollary 2.5, which follows from our main result Theorem 2.2, establishes that our approach succeeds in denoising up to $$\mathcal{O}\left({n}\right)$$ spectral lines also in the presence of $$\mathcal{O}\left({n}\right)$$ outliers (again ignoring logarithmic factors). In Section 5.2, we compare both techniques empirically. Finally, we would like to mention another method exploiting optimization and low-rank matrix structure [74] and an alternative approach to gridless spectral super-resolution [60], which has been recently adapted to account for missing data and impulsive noise [73].
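In both cases, no theoretical results guaranteeing exact recovery in the presence of outliers are provided. 3. Proof of Theorem 2.2 3.1 Dual polynomial We prove Theorem 2.2 by constructing a trigonometric polynomial whose existence certifies that solving Problem (2.7) achieves exact demixing. We refer to this object as a dual polynomial, because its vector of coefficients is a solution to the dual of Problem (2.7). This vector is known as a dual certificate in the compressed-sensing literature [17]. Proposition 3.1 (Proof in Section C) Let $$T \subset \left[{0,1}\right]$$ be the non-zero support of $$\mu$$ and $${\it{\Omega}} \subset \left\{{1,2,\ldots,n}\right\}$$ the non-zero support of $$\boldsymbol{z}$$. If there exists a trigonometric polynomial of the form   $$Q(f) = \mathcal{F}_n^{\ast} \, \boldsymbol{q}$$ (3.1)   $$= \sum_{j=1}^{n} \boldsymbol{q}_j \, e^{-i 2 \pi j f},$$ (3.2) which satisfies   $$Q(f_j) = \frac{\boldsymbol{x}_j}{\left|{\boldsymbol{x}_j}\right|}, \quad \forall f_j \in T,$$ (3.3)   $$\left|{Q(f)}\right| < 1, \quad \forall f \in T^c,$$ (3.4)   $$\boldsymbol{q}_l = \lambda \frac{\boldsymbol{z}_l}{\left|{\boldsymbol{z}_l}\right|}, \quad \forall l \in {\it{\Omega}},$$ (3.5)   $$\left|{\boldsymbol{q}_l}\right| < \lambda, \quad \forall l \in {\it{\Omega}}^c,$$ (3.6) then $$\left({\mu,\boldsymbol{z}}\right)$$ is the unique solution to Problem (2.7) as long as $$k+s \leq n$$. The dual polynomial can be interpreted as a subgradient of the TV norm at the measure $$\mu$$, in the sense that   $$\left|\left|{\mu + \nu}\right|\right|_{\mathrm{TV}} \geq \left|\left|{\mu}\right|\right|_{\mathrm{TV}} + \left \langle{Q}, {\nu}\right \rangle, \quad \left \langle{Q}, {\nu}\right \rangle := \mathrm{Re}\left[{\int_{[0,1]} \overline{Q(f)} \, \mathrm{d}\nu(f)}\right],$$ (3.7) for any measure $$\nu$$ supported in the unit interval. In addition, weighting the coefficients of $$Q$$ by $$1/\lambda$$ yields a subgradient of the $$\ell_1$$ norm at the vector $$\boldsymbol{z}$$. This means that for any other feasible pair $$\left({ \mu', \boldsymbol{z}'}\right)$$ such that $$\boldsymbol{y} = \mathcal{F}_n \, \mu' + \boldsymbol{z}'$$   $$\left|\left|{\mu'}\right|\right|_{\mathrm{TV}} + \lambda \left|\left|{\boldsymbol{z}'}\right|\right|_{1} \geq \left|\left|{\mu}\right|\right|_{\mathrm{TV}} + \left \langle{Q}, {\mu' - \mu}\right \rangle + \lambda \left|\left|{\boldsymbol{z}}\right|\right|_{1} + \lambda \left \langle{\tfrac{1}{\lambda} \boldsymbol{q}}, {\boldsymbol{z}' - \boldsymbol{z}}\right \rangle$$ (3.8)   $$\geq \left|\left|{\mu}\right|\right|_{\mathrm{TV}} + \left \langle{\mathcal{F}_n^{\ast} \, \boldsymbol{q}}, {\mu' - \mu}\right \rangle + \lambda \left|\left|{\boldsymbol{z}}\right|\right|_{1} + \left \langle{\boldsymbol{q}}, {\boldsymbol{z}' - \boldsymbol{z}}\right \rangle$$ (3.9)   $$= \left|\left|{\mu}\right|\right|_{\mathrm{TV}} + \lambda \left|\left|{\boldsymbol{z}}\right|\right|_{1} + \left \langle{\boldsymbol{q}}, {\mathcal{F}_n \left({\mu' - \mu}\right) + \boldsymbol{z}' - \boldsymbol{z}}\right \rangle$$ (3.10)   $$= \left|\left|{\mu}\right|\right|_{\mathrm{TV}} + \lambda \left|\left|{\boldsymbol{z}}\right|\right|_{1} \quad \text{since } \mathcal{F}_n \, \mu' + \boldsymbol{z}' = \mathcal{F}_n \, \mu + \boldsymbol{z}.$$ (3.11) The existence of $$Q$$ thus implies that $$\left({\mu,\boldsymbol{z}}\right)$$ is a solution to Problem (2.7). In fact, as stated in Proposition 3.1, it implies that $$\left({\mu,\boldsymbol{z}}\right)$$ is the unique solution.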
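Conditions (3.3)-(3.6) are straightforward to test numerically for a candidate coefficient vector. The sketch below is a hypothetical helper (the name `is_dual_certificate`, the grid size and the tolerance are assumptions); it checks (3.4) only on a finite grid, so a positive answer is evidence rather than a proof. As a toy instance it certifies a single spectral line with no outliers, using coefficients $$\boldsymbol{q}_j = h \, e^{i 2 \pi j f_1}/n$$, for which $$Q$$ is a shifted and normalized Dirichlet-type kernel.

```python
import numpy as np

def is_dual_certificate(q, lam, T, h, Omega, r, grid_size=2000, tol=1e-8):
    """Check (3.3)-(3.6) for coefficients q; |Q| < 1 is tested on a grid only.

    T: non-empty array of frequencies, h: target signs on T,
    Omega: 1-based outlier indices, r: target signs on Omega.
    """
    q = np.asarray(q, dtype=complex)
    n = len(q)
    j = np.arange(1, n + 1)
    Q = lambda f: np.exp(-2j * np.pi * np.outer(np.atleast_1d(f), j)) @ q
    T = np.asarray(T, dtype=float)
    idx = np.asarray(Omega, dtype=int) - 1
    in_Omega = np.zeros(n, dtype=bool)
    in_Omega[idx] = True

    interpolates = np.allclose(Q(T), h, atol=tol)                     # (3.3)
    grid = np.arange(grid_size) / grid_size
    dist = np.abs(grid[:, None] - T[None, :])
    dist = np.minimum(dist, 1.0 - dist)                               # wrap-around distance
    off_support = dist.min(axis=1) > tol
    bounded = bool(np.all(np.abs(Q(grid[off_support])) < 1.0))        # (3.4) on the grid
    signs = np.allclose(q[idx], lam * np.asarray(r, dtype=complex), atol=tol)  # (3.5)
    small = bool(np.all(np.abs(q[~in_Omega]) < lam))                  # (3.6)
    return bool(interpolates and bounded and signs and small)

# Toy instance: one spectral line at f_1 = 0.3 with sign h, no outliers.
n, f1, h = 16, 0.3, np.exp(0.7j)
lam = 1.0 / np.sqrt(n)
q = h * np.exp(2j * np.pi * np.arange(1, n + 1) * f1) / n
print(is_dual_certificate(q, lam, [f1], [h], [], []))       # expected: True
print(is_dual_certificate(2 * q, lam, [f1], [h], [], []))   # expected: False, (3.3) fails
```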
The rest of this section is devoted to showing that a dual polynomial exists with high probability, as formalized by the following proposition. Proposition 3.2 (Existence of dual polynomial) Under the assumptions of Theorem 2.2, there exists a dual polynomial associated to $$\mu$$ and $$\boldsymbol{z}$$ with probability at least $$1-\epsilon$$. In order to simplify notation in the sequel, we define the vectors $$\boldsymbol{h} \in \mathbb{C}^{k}$$ and $$\boldsymbol{r} \in \mathbb{C}^{s}$$ and an integer $$m$$ such that   $$\boldsymbol{h}_j := \frac{\boldsymbol{x}_j}{\left|{\boldsymbol{x}_j}\right|}, \quad 1 \leq j \leq k,$$ (3.12)   $$\boldsymbol{r}_l := \frac{\boldsymbol{z}_l}{\left|{\boldsymbol{z}_l}\right|}, \quad l \in {\it{\Omega}},$$ (3.13)   $$m := \begin{cases} \frac{n-1}{2} & \text{if } n \text{ is odd}, \\ \frac{n}{2} - 1 & \text{if } n \text{ is even}. \end{cases}$$ (3.14) Applying a simple change of variable, we express $$Q$$ as   $$Q(f) = \sum_{l=-m}^{m} \boldsymbol{q}_l \, e^{-i 2 \pi l f}.$$ (3.15) In a nutshell, our goal is (1) to construct a polynomial of this form so that $$Q$$ interpolates $$\boldsymbol{h}$$ on $$T$$ and $$\boldsymbol{q}$$ interpolates $$\lambda \boldsymbol{r}$$ on $${\it{\Omega}}$$, and (2) to verify that the magnitude of $$Q$$ is strictly bounded by one on $$T^c$$ and the magnitude of $$\boldsymbol{q}$$ is strictly bounded by $$\lambda$$ on $${\it{\Omega}}^c$$. 3.2 Construction via interpolation We now take a brief detour to introduce a basic technique for the construction of dual polynomials. Consider the spectral super-resolution problem when the data are of the form $$\boldsymbol{\bar{y}} := \mathcal{F}_n \, \mu$$, i.e. when there are no outliers. A simple corollary to Proposition 3.1 is that the existence of a dual polynomial of the form   $$\bar{Q}(f) = \sum_{l=-m}^{m} \boldsymbol{\bar{q}}_l \, e^{-i 2 \pi l f}$$ (3.16) such that   $$\bar{Q}(f_j) = \boldsymbol{h}_j, \quad \forall f_j \in T,$$ (3.17)   $$\left|{\bar{Q}(f)}\right| < 1, \quad \forall f \in T^c,$$ (3.18) implies that TV norm minimization achieves exact recovery in the absence of noise. In this section, we describe how to construct such a polynomial using interpolation. This technique was introduced in [12] to obtain guarantees for super-resolution under a minimum-separation condition.
The basic idea is to use a kernel $$\bar{ K }$$ and its derivative $$\bar{ K}^{\left({1}\right)}$$ to interpolate $$\boldsymbol{h}$$ while forcing the derivative of the polynomial to equal zero on $$T$$. Setting the derivative to zero induces a local extremum, which ensures that the magnitude of the polynomial stays bounded by one in the vicinity of $$T$$ (see Fig. 11 in [38] for an illustration). More formally,   $$\bar{Q}(f) := \sum_{j=1}^{k} \boldsymbol{\bar{\alpha}}_j \bar{K}\left({f - f_j}\right) + \kappa \sum_{j=1}^{k} \boldsymbol{\bar{\beta}}_j \bar{K}^{\left({1}\right)}\left({f - f_j}\right),$$ (3.19) where   $$\kappa := \frac{1}{\sqrt{\left|{\bar{K}^{\left({2}\right)}\left({0}\right)}\right|}}$$ (3.20) is the inverse of the square root of the magnitude of the second derivative of the kernel at the origin. This quantity will appear often in the proof to simplify notation. $$\boldsymbol{\bar{\alpha}} \in \mathbb{C}^{k}$$ and $$\boldsymbol{\bar{\beta}} \in \mathbb{C}^{k}$$ are coefficient vectors set so that   $$\bar{Q}(f_j) = \boldsymbol{h}_j, \quad f_j \in T,$$ (3.21)   $$\bar{Q}_R^{\left({1}\right)}(f_j) + i \, \bar{Q}_I^{\left({1}\right)}(f_j) = 0, \quad f_j \in T,$$ (3.22) where $$\bar{ Q}_R^{\left({1}\right)}$$ denotes the real part of $$\bar{Q}^{\left({1}\right)}$$ and $$\bar{ Q}_I^{\left({1}\right)}$$ the imaginary part. In matrix form, $$\boldsymbol{\bar{\alpha}}$$ and $$\boldsymbol{\bar{\beta}}$$ are the solution to the system   $$\begin{bmatrix} \bar{D}_0 & \bar{D}_1 \\ -\bar{D}_1 & \bar{D}_2 \end{bmatrix} \begin{bmatrix} \boldsymbol{\bar{\alpha}} \\ \boldsymbol{\bar{\beta}} \end{bmatrix} = \begin{bmatrix} \boldsymbol{h} \\ \boldsymbol{0} \end{bmatrix},$$ (3.23) where   $$\left({\bar{D}_0}\right)_{jl} = \bar{K}\left({f_j - f_l}\right), \quad \left({\bar{D}_1}\right)_{jl} = \kappa \, \bar{K}^{\left({1}\right)}\left({f_j - f_l}\right), \quad \left({\bar{D}_2}\right)_{jl} = -\kappa^2 \bar{K}^{\left({2}\right)}\left({f_j - f_l}\right).$$ (3.24) In [12], $$\bar{Q}$$ is shown to be a valid dual polynomial for a minimum separation equal to $$\frac{4}{n-1}$$ when the interpolation kernel is a squared Fejér kernel. The required minimum separation is sharpened to $$\frac{2.52}{n-1}$$ in [38] using a different kernel, which will be our choice for $$\bar{K}$$ in this paper. Consider the Dirichlet kernel of order $$\tilde{m} >0$$   $$\mathcal{D}_{\tilde{m}}(f) := \frac{1}{2\tilde{m}+1} \sum_{l=-\tilde{m}}^{\tilde{m}} e^{i 2 \pi l f} = \begin{cases} 1 & \text{if } f = 0, \\ \frac{\sin\left({\left({2\tilde{m}+1}\right) \pi f}\right)}{\left({2\tilde{m}+1}\right) \sin\left({\pi f}\right)} & \text{otherwise}. \end{cases}$$ (3.25) Following [38], we define $$\bar{K}$$ as the product of three different Dirichlet kernels with different orders   $$\bar{K}(f) := \mathcal{D}_{0.247 \, m}(f) \, \mathcal{D}_{0.339 \, m}(f) \, \mathcal{D}_{0.414 \, m}(f)$$ (3.26)   $$= \sum_{l=-m}^{m} \boldsymbol{c}_l \, e^{i 2 \pi l f},$$ (3.27) where $$\boldsymbol{c} \in \mathbb{C}^{n}$$ is the convolution of the Fourier coefficients of the three Dirichlet kernels. The choice of the width of the three kernels might seem rather arbitrary; it is chosen to optimize the bound on the minimum separation by achieving a good trade-off between the spikiness of $$\bar{K}$$ in the vicinity of the origin and its asymptotic decay [38]. For simplicity, we assume that $${0.247} \, m$$, $${0.339} \, m$$ and $${0.414} \, m$$ are all integers.4 Figure 3 shows $$\bar{K}$$ and its first derivative. Fig. 3. The top row shows the interpolating kernel $$K$$ and $$K^{\left({1}\right)}$$ compared with a scaled version of $$\bar{K}$$ and $$\bar{K}^{\left({1}\right)}$$. In the second row, we see the asymptotic decay of the magnitudes of both kernels and their derivatives. The left image in the bottom row illustrates the construction of $$K$$: the Fourier coefficients $$\boldsymbol{c}$$ of $$\bar{K}$$ that lie in $${\it{\Omega}}$$ are set to zero. On the right, we can see the Fourier coefficients of $$K^{\left({1}\right)}$$ and a scaled version of $$\bar{K}^{\left({1}\right)}$$.
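The interpolation construction of this section can be reproduced numerically. The following sketch is an illustration, not the authors' Matlab code: it takes $$m = 1000$$ (so that the three Dirichlet orders are integers), builds $$\boldsymbol{c}$$ by convolution, computes $$\kappa$$ as the inverse square root of $$|\bar{K}^{(2)}(0)|$$, and solves the system (3.23) for a hypothetical four-point support with random signs. The support, the random seed and all function names are assumptions.

```python
import numpy as np

m = 1000                       # n = 2m + 1 = 2001 samples
orders = [247, 339, 414]       # 0.247 m, 0.339 m, 0.414 m, integers for this m

# Each Dirichlet kernel (3.25) has constant Fourier coefficients 1/(2*order+1);
# the coefficients c of the product (3.26) are their convolution.
c = np.ones(1)
for order in orders:
    c = np.convolve(c, np.full(2 * order + 1, 1.0 / (2 * order + 1)))
l = np.arange(-m, m + 1)       # c[i] multiplies exp(i 2 pi l[i] f)
w = 2j * np.pi * l
coeffs = [c.astype(complex), c * w, c * w**2]   # K-bar and its first two derivatives

def kernel(f, d=0):
    """d-th derivative (d = 0, 1, 2) of K-bar evaluated at the points f."""
    f = np.atleast_1d(f)
    return np.exp(2j * np.pi * np.outer(f, l)) @ coeffs[d]

kappa = 1.0 / np.sqrt(abs(kernel(0.0, 2)[0]))   # normalization constant kappa

# Hypothetical support with minimum separation 0.0015 >= 2.52/(n-1) = 0.00126.
T = np.array([0.2, 0.2015, 0.5, 0.9])
rng = np.random.default_rng(0)
h = np.exp(1j * rng.uniform(0, 2 * np.pi, size=len(T)))   # unit-modulus signs

diff = (T[:, None] - T[None, :]).ravel()
k = len(T)
D0 = kernel(diff, 0).reshape(k, k)
D1 = kappa * kernel(diff, 1).reshape(k, k)
D2 = -kappa**2 * kernel(diff, 2).reshape(k, k)
A = np.block([[D0, D1], [-D1, D2]])             # system matrix of (3.23)
sol = np.linalg.solve(A, np.concatenate([h, np.zeros(k)]))
alpha, beta = sol[:k], sol[k:]

def Q_bar(f, d=0):
    """d-th derivative (d = 0, 1) of the interpolating polynomial (3.19)."""
    f = np.atleast_1d(f)
    shifts = (f[:, None] - T[None, :]).ravel()
    K_d = kernel(shifts, d).reshape(len(f), k)
    K_d1 = kernel(shifts, d + 1).reshape(len(f), k)
    return K_d @ alpha + kappa * (K_d1 @ beta)

print(kappa * m)                               # lies between 0.467 and 0.468
print(np.abs(Q_bar(T) - h).max())              # interpolation error, numerically zero
print(np.abs(kappa * Q_bar(T, 1)).max())       # scaled derivative on T, numerically zero
```

For this value of $$m$$ the computed $$\kappa$$ and the largest entry of $$\boldsymbol{c}$$ are consistent with the bounds stated in Lemmas 3.3 and 3.4. Checking that $$|\bar{Q}|$$ stays below one away from $$T$$ would additionally require evaluating `Q_bar` on a fine grid, which the sketch omits to keep memory use small.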
We end the section with two lemmas bounding $$\kappa$$ and the magnitude of the coefficients $$\boldsymbol{c}$$ of $$\bar{K}$$, which will be useful at different points of the proof. Lemma 3.3 If $$m \geq 10^3$$, the constant $$\kappa$$, defined by (3.20), satisfies   $$\frac{0.467}{m} \leq \kappa \leq \frac{0.468}{m}.$$ (3.28) Proof. The bound follows from the fact that $$\mathcal{D}_{\tilde{m}}^{\left({2}\right)} \left({0}\right) = -4 \pi^2 \tilde{m} \left({1+\tilde{m}}\right)/3$$ and equation (C.19) in [38] (see also Lemma 4.8 in [38]). □ Lemma 3.4 (Proof in Section D) The coefficients of $$\bar{ K }$$ satisfy   $$\left|\left|{\boldsymbol{c}}\right|\right|_{\infty} \leq \frac{1.3}{m}.$$ (3.29) 3.3 Interpolation with a random kernel The trigonometric polynomial $$\bar{Q}$$ defined in the previous section is not a valid certificate when outliers are present in the data; it does not satisfy (3.5) and (3.6). In order to adapt the construction so that it meets these conditions, we draw upon techniques developed in [66], which studies spectral super-resolution in a compressed-sensing scenario where a subset $$\mathcal{S}$$ of the samples is missing. To prove that TV norm minimization succeeds in such a scenario, the authors of [66] construct a bounded polynomial with coefficients restricted to the complement of $$\mathcal{S}$$, which interpolates the sign pattern of the line spectra on their support. This is achieved using an interpolation kernel with coefficients supported on $$\mathcal{S}^c$$. We denote our dual-polynomial candidate by $$Q$$. Let us begin by decomposing $$Q$$ into two components   $$Q(f) := Q_{\mathrm{aux}}(f) + R(f),$$ (3.30) such that the coefficients of the first component are restricted to $${\it{\Omega}}^c$$,   $$Q_{\mathrm{aux}}(f) := \sum_{l \in {\it{\Omega}}^c} \boldsymbol{q}_l \, e^{-i 2 \pi l f},$$ (3.31) and the coefficients of the second component are restricted to $${\it{\Omega}}$$ and fixed to equal $$\lambda \boldsymbol{r}$$ (recall that $$\lambda = 1/\sqrt{n}$$),   $$R(f) := \frac{1}{\sqrt{n}} \sum_{l \in {\it{\Omega}}} \boldsymbol{r}_l \, e^{-i 2 \pi l f}.$$ (3.32) This immediately guarantees that $$Q$$ satisfies (3.5).
Now our task is to construct $$Q_{\mathrm{aux}}$$, so that $$Q$$ also meets the remaining conditions in Proposition 3.1. Following the interpolation technique described in Section 3.2, we constrain $$Q$$ to interpolate $$\boldsymbol{h}$$ and have zero derivative in $$T$$,   $$Q(f_j) = \boldsymbol{h}_j, \quad f_j \in T,$$ (3.33)   $$Q_R^{\left({1}\right)}(f_j) + i \, Q_I^{\left({1}\right)}(f_j) = 0, \quad f_j \in T.$$ (3.34) Given that $$R$$ is fixed, this is equivalent to   $$Q_{\mathrm{aux}}(f_j) = \boldsymbol{h}_j - R(f_j), \quad f_j \in T,$$ (3.35)   $$\left({Q_{\mathrm{aux}}}\right)_R^{\left({1}\right)}(f_j) + i \left({Q_{\mathrm{aux}}}\right)_I^{\left({1}\right)}(f_j) = -R_R^{\left({1}\right)}(f_j) - i \, R_I^{\left({1}\right)}(f_j), \quad f_j \in T,$$ (3.36) where the subscript $$R$$ indicates the real part of a number and the subscript $$I$$ the imaginary part. This interpolation problem is very similar to the one that arises in compressed sensing off the grid [66]: we must interpolate a certain vector with a polynomial whose coefficients are restricted to a certain subset, in our case $${\it{\Omega}}^c$$. Following [66] we employ an interpolation kernel $$K$$ obtained by selecting the coefficients of $$\bar{K}$$ in $${\it{\Omega}}^c$$,   $$K(f) := \sum_{l \in {\it{\Omega}}^c} \boldsymbol{c}_l \, e^{i 2 \pi l f}$$ (3.37)   $$= \sum_{l=-m}^{m} \delta_{{\it{\Omega}}^c}(l) \, \boldsymbol{c}_l \, e^{i 2 \pi l f},$$ (3.38) where $$\delta_{{\it{\Omega}}^c}(l)$$ is an indicator random variable that is equal to one if $$l \in {\it{\Omega}}^c$$ and to zero otherwise. Under the assumptions of Theorem 2.2, these are independent Bernoulli random variables with parameter $$\frac{n-s}{n}$$, so that the mean of $$K$$ is equal to a scaled version of $$\bar{K}$$,   $$\mathrm{E}\left({K(f)}\right) = \frac{n-s}{n} \sum_{l=-m}^{m} \boldsymbol{c}_l \, e^{i 2 \pi l f}$$ (3.39)   $$= \frac{n-s}{n} \bar{K}(f).$$ (3.40) $$K$$ and its derivatives concentrate around $$\bar{K}$$ and its derivatives (scaled by $$\frac{n-s}{n}$$) near the origin, but they do not display the same asymptotic decay. This is illustrated in Fig. 3. Using $$K$$ and its first derivative $$K^{\left({1}\right)}$$ to construct $$Q_{\mathrm{aux}}$$ ensures that its non-zero coefficients are restricted to $${\it{\Omega}}^c$$.
In more detail, $$Q_{\mathrm{aux}}$$ is a linear combination of shifted and scaled copies of $$K$$ and $$K^{\left({1}\right)}$$,   $$Q_{\mathrm{aux}}(f) := \sum_{j=1}^{k} \boldsymbol{\alpha}_j K\left({f - f_j}\right) + \kappa \, \boldsymbol{\beta}_j K^{\left({1}\right)}\left({f - f_j}\right),$$ (3.41) where $$\boldsymbol{\alpha} \in \mathbb{C}^{k}$$ and $$\boldsymbol{\beta} \in \mathbb{C}^{k}$$ are chosen to satisfy (3.35) and (3.36). The corresponding system of equations (3.35) and (3.36) can be recast in matrix form:   $$\begin{bmatrix} D_0 & D_1 \\ -D_1 & D_2 \end{bmatrix} \begin{bmatrix} \boldsymbol{\alpha} \\ \boldsymbol{\beta} \end{bmatrix} = \begin{bmatrix} \boldsymbol{h} \\ \boldsymbol{0} \end{bmatrix} - \frac{1}{\sqrt{n}} B_{{\it{\Omega}}} \boldsymbol{r},$$ (3.42) where   $$\left({D_0}\right)_{jl} = K\left({f_j - f_l}\right), \quad \left({D_1}\right)_{jl} = \kappa \, K^{\left({1}\right)}\left({f_j - f_l}\right), \quad \left({D_2}\right)_{jl} = -\kappa^2 K^{\left({2}\right)}\left({f_j - f_l}\right).$$ (3.43) Note that we have expressed the values of $$R$$ and $$R^{\left({1}\right)}$$ in $$T$$ in terms of $$\boldsymbol{r}$$,   $$\frac{1}{\sqrt{n}} B_{{\it{\Omega}}} \boldsymbol{r} = \begin{bmatrix} R(f_1) & R(f_2) & \cdots & R(f_k) & -\kappa R^{\left({1}\right)}(f_1) & -\kappa R^{\left({1}\right)}(f_2) & \cdots & -\kappa R^{\left({1}\right)}(f_k) \end{bmatrix}^T,$$ (3.44) where   $$\boldsymbol{b}(l) := \begin{bmatrix} e^{-i 2 \pi l f_1} & e^{-i 2 \pi l f_2} & \cdots & e^{-i 2 \pi l f_k} & i 2 \pi l \kappa \, e^{-i 2 \pi l f_1} & \cdots & i 2 \pi l \kappa \, e^{-i 2 \pi l f_k} \end{bmatrix}^T,$$ (3.45)   $$B_{{\it{\Omega}}} := \begin{bmatrix} \boldsymbol{b}(i_1) & \boldsymbol{b}(i_2) & \cdots & \boldsymbol{b}(i_s) \end{bmatrix}, \quad {\it{\Omega}} = \left\{{i_1, i_2, \ldots, i_s}\right\}.$$ (3.46) Solving this system of equations yields $$\boldsymbol{\alpha}$$ and $$\boldsymbol{\beta}$$, and fixes the dual-polynomial candidate,   $$Q(f) := \sum_{j=1}^{k} \boldsymbol{\alpha}_j K\left({f - f_j}\right) + \kappa \sum_{j=1}^{k} \boldsymbol{\beta}_j K^{\left({1}\right)}\left({f - f_j}\right) + R(f)$$ (3.47)   $$= \boldsymbol{v_0}(f)^T D^{-1} \left({\begin{bmatrix} \boldsymbol{h} \\ \boldsymbol{0} \end{bmatrix} - \frac{1}{\sqrt{n}} B_{{\it{\Omega}}} \boldsymbol{r}}\right) + R(f),$$ (3.48) where we define   $$\boldsymbol{v_{\ell}}(f) := \kappa^{\ell} \begin{bmatrix} K^{\left({\ell}\right)}\left({f - f_1}\right) & \cdots & K^{\left({\ell}\right)}\left({f - f_k}\right) & \kappa K^{\left({\ell+1}\right)}\left({f - f_1}\right) & \cdots & \kappa K^{\left({\ell+1}\right)}\left({f - f_k}\right) \end{bmatrix}^T$$ for $$\ell=0,1,2, \ldots$$ In the next section, we establish that a polynomial of this form is guaranteed to be a valid certificate with high probability. Figure 4 illustrates our construction for a specific example (note that for ease of visualization $$\boldsymbol{h}$$ is real instead of complex). Fig. 4. Illustration of our construction of a dual-polynomial candidate $$Q$$. The first row shows $$R$$, the component that results from fixing the coefficients of $$Q$$ in $${\it{\Omega}}$$ to equal $$\boldsymbol{r}$$. The second row shows $$Q_{\mathrm{aux}}$$, the component built to ensure that $$Q$$ interpolates $$\boldsymbol{h}$$ by correcting for the presence of $$R$$. On the right image of the second row, we see that the coefficients of $$Q_{\mathrm{aux}}$$ are indeed restricted to $${\it{\Omega}}^c$$. Finally, the last row shows that $$Q$$ satisfies all of the conditions in Proposition 3.1. Before ending this section, we record three useful lemmas concerning $$\boldsymbol{b}$$, $$B_{{\it{\Omega}}}$$ and $$\boldsymbol{v_{\ell}}$$. The first bounds the $$\ell_2$$ norm of $$\boldsymbol{b}$$. Lemma 3.5 If $$m \geq 10^3$$, for $$-m \leq l \leq m$$   $$\left|\left|{\boldsymbol{b}(l)}\right|\right|_{2}^{2} \leq 10 \, k.$$ (3.49) Proof.   $$\left|\left|{\boldsymbol{b}(l)}\right|\right|_{2}^{2} \leq k \left({1 + \max_{-m \leq l \leq m} \left({2 \pi l \kappa}\right)^2}\right) \leq 9.65 \, k \quad \text{by Lemma 3.3}.$$ (3.50) □ The second yields a bound on the operator norm of $$B_{{\it{\Omega}}}$$ that holds with high probability. Lemma 3.6 (Proof in Section E) Under the assumptions of Theorem 2.2, the event   $$\mathcal{E}_{B} := \left\{{\left|\left|{B_{{\it{\Omega}}}}\right|\right| > C_{B} \left({\log \frac{n}{\epsilon}}\right)^{-\frac{1}{2}} \sqrt{n}}\right\},$$ (3.51) where $$C_{B}$$ is a numerical constant defined by (H.41), occurs with probability at most $$\epsilon / 5$$. The third allows us to control the behavior of $$\boldsymbol{v_{\ell}}$$, establishing that it does not deviate much from   $$\boldsymbol{\bar{v}_{\ell}}(f) := \kappa^{\ell} \begin{bmatrix} \bar{K}^{\left({\ell}\right)}\left({f - f_1}\right) & \cdots & \bar{K}^{\left({\ell}\right)}\left({f - f_k}\right) & \kappa \bar{K}^{\left({\ell+1}\right)}\left({f - f_1}\right) & \cdots & \kappa \bar{K}^{\left({\ell+1}\right)}\left({f - f_k}\right) \end{bmatrix}^T$$ on a fine grid with high probability. Lemma 3.7 (Proof in Section F) Let $$\mathcal{G} \subseteq \left[{0,1}\right]$$ be an equispaced grid with cardinality $$400 \, n^2$$.
Under the assumptions of Theorem 2.2, the event   $$\mathcal{E}_{v} := \left\{{\left|\left|{\boldsymbol{v_{\ell}}(f) - \frac{n-s}{n} \boldsymbol{\bar{v}_{\ell}}(f)}\right|\right|_{2} > C_{\boldsymbol{v}} \left({\log \frac{n}{\epsilon}}\right)^{-\frac{1}{2}}, \; \text{for all } f \in \mathcal{G} \text{ and } \ell \in \left\{{0,1,2,3}\right\}}\right\},$$ (3.52) where $$C_{\boldsymbol{v}}$$ is a numerical constant defined by (H.45), has probability bounded by $$\epsilon / 5$$. 3.4 Proof of Proposition 3.2 This section summarizes the remaining steps to establish that our proposed construction yields a valid certificate. A detailed description of each step is included in the Appendix. First, we show that the system of equations (3.42) has a unique solution with high probability, so that $$Q$$ is well defined. To alleviate notation, let   $$D := \begin{bmatrix} D_0 & D_1 \\ -D_1 & D_2 \end{bmatrix}, \quad \bar{D} := \begin{bmatrix} \bar{D}_0 & \bar{D}_1 \\ -\bar{D}_1 & \bar{D}_2 \end{bmatrix}.$$ (3.53) The following result implies that $$D$$ concentrates around a scaled version of $$\bar{D}$$. As a result, it is invertible, and we can bound the operator norm of its inverse leveraging results from [38]. Lemma 3.8 (Proof in Section G) Under the assumptions of Theorem 2.2, the event   $$\mathcal{E}_{D} := \left\{{\left|\left|{D - \frac{n-s}{n} \bar{D}}\right|\right| \geq \frac{n-s}{4n} \min \left\{{1, \frac{C_{D}}{4} \left({\log \frac{n}{\epsilon}}\right)^{-\frac{1}{2}}}\right\}}\right\}$$ (3.54) occurs with probability at most $$\epsilon / 5$$. In addition, within the event $$\mathcal{E}_{D}^c$$, $$D$$ is invertible and   $$\left|\left|{D^{-1}}\right|\right| \leq 8,$$ (3.55)   $$\left|\left|{D^{-1} - \frac{n}{n-s} \bar{D}^{-1}}\right|\right| \leq C_{D} \left({\log \frac{n}{\epsilon}}\right)^{-\frac{1}{2}},$$ (3.56) where $$C_{D}$$ is a numerical constant defined by (H.49). An immediate consequence of the lemma is that there exists a solution to the system (3.42) and therefore (3.3) holds as long as $$\mathcal{E}_{D}^c$$ occurs. Corollary 3.9 In $$\mathcal{E}_{D}^c$$, $$Q$$ is well defined and $$Q\left({ f_j }\right) = \boldsymbol{h}_j$$ for all $$f_j \in T$$. All that remains is to establish that $$Q$$ meets conditions (3.4) and (3.6); recall that (3.5) is satisfied by construction. To prove (3.4), we apply a technique from [66]. We first show that $$Q$$ and its derivatives concentrate around $$\bar{Q}$$ and its derivatives, respectively, on a fine grid. Then we leverage Bernstein’s inequality to demonstrate that both polynomials and their respective derivatives are close on the whole unit interval.
Finally, we borrow some bounds on $$\bar{Q}$$ and its second derivative from [38] to complete the proof. The details can be found in Section H of the Appendix. Proposition 3.10 (Proof in Section H) Conditioned on $$\mathcal{E}_{B}^{c} \cap \mathcal{E}_{D}^{c} \cap \mathcal{E}_{v}^{c}$$   $$\left|{Q(f)}\right| < 1 \quad \text{for all } f \in T^c,$$ (3.57) with probability at least $$1-\epsilon/5$$ under the assumptions of Theorem 2.2. Finally, the following proposition establishes that the remaining condition (3.6) holds in $$\mathcal{E}_{B}^{c} \cap \mathcal{E}_{D}^{c} \cap \mathcal{E}_{v}^{c}$$ with high probability. The proof uses Hoeffding’s inequality combined with Lemmas 3.8 and 3.9 to control the magnitude of the coefficients of $$\boldsymbol{q}$$. Proposition 3.11 (Proof in Section I) Conditioned on $$\mathcal{E}_{B}^{c} \cap \mathcal{E}_{D}^{c} \cap \mathcal{E}_{v}^{c}$$   $$\left|{\boldsymbol{q}_l}\right| < \frac{1}{\sqrt{n}} \quad \text{for all } l \in {\it{\Omega}}^c,$$ (3.58) with probability at least $$1-\epsilon/5$$ under the assumptions of Theorem 2.2. Now, to complete the proof, let us define $$\mathcal{E}_{Q}$$ to be the event that (3.4) holds and $$\mathcal{E}_{q}$$ the event that (3.6) holds. Applying De Morgan’s laws, the union bound and the fact that for any pair of events $$\mathcal{E}_A$$ and $$\mathcal{E}_B$$   $$\mathrm{P}\left({\mathcal{E}_A}\right) \leq \mathrm{P}\left({\mathcal{E}_A | \mathcal{E}_B^c}\right) + \mathrm{P}\left({\mathcal{E}_B}\right),$$ (3.59) we have   $$\mathrm{P}\left({\left({\mathcal{E}_Q \cap \mathcal{E}_q}\right)^c}\right) = \mathrm{P}\left({\mathcal{E}_Q^c \cup \mathcal{E}_q^c}\right)$$ (3.60)   $$\leq \mathrm{P}\left({\mathcal{E}_Q^c \cup \mathcal{E}_q^c \, | \, \mathcal{E}_B^c \cap \mathcal{E}_D^c \cap \mathcal{E}_v^c}\right) + \mathrm{P}\left({\mathcal{E}_B \cup \mathcal{E}_D \cup \mathcal{E}_v}\right)$$ (3.61)   $$\leq \mathrm{P}\left({\mathcal{E}_Q^c \, | \, \mathcal{E}_B^c \cap \mathcal{E}_D^c \cap \mathcal{E}_v^c}\right) + \mathrm{P}\left({\mathcal{E}_q^c \, | \, \mathcal{E}_B^c \cap \mathcal{E}_D^c \cap \mathcal{E}_v^c}\right) + \mathrm{P}\left({\mathcal{E}_B}\right) + \mathrm{P}\left({\mathcal{E}_D}\right) + \mathrm{P}\left({\mathcal{E}_v}\right)$$ (3.62)   $$\leq \epsilon$$ (3.63) by Lemmas 3.6, 3.7 and 3.8 and Propositions 3.10 and 3.11. We conclude that our construction yields a valid certificate with probability at least $$1-\epsilon$$. 4. Algorithms In this section, we discuss how to implement the techniques described in Section 2. In addition, we introduce a greedy demixing method that yields good empirical results. Matlab code implementing all the algorithms presented below is available in the Supplementary Material.
The code makes it possible to reproduce the figures in this section, which illustrate the performance of the different approaches through a running example. 4.1 Demixing via semi-definite programming The main obstacle to solving Problem (2.7) is that the primal variable $$\tilde{\mu}$$ is infinite dimensional. One could tackle this issue by discretizing the possible support of $$\tilde{\mu}$$ and replacing its TV norm by the $$\ell_1$$ norm of the corresponding vector [67]. Here, we present an alternative approach, originally proposed in [38], which solves the infinite-dimensional optimization problem directly without resorting to discretization. The approach, inspired by a method for TV norm minimization [12] (see also [4]), relies on the fact that the dual of Problem (2.7) can be recast as a finite-dimensional SDP. To simplify notation, we introduce the operator $$\mathcal{T}$$. For any vector $$\boldsymbol{u}$$ whose first entry $$\boldsymbol{u}_1$$ is positive and real, $$\mathcal{T}\left({\boldsymbol{u}}\right)$$ is a Hermitian Toeplitz matrix whose first row is equal to $$\boldsymbol{u}^T$$. The adjoint of $$\mathcal{T}$$ with respect to the usual matrix inner product $$\left \langle{M_1}, {M_2}\right \rangle=\text{Tr}\left({M_1^{\ast}M_2}\right)$$ extracts the sums of the diagonal and of the different off-diagonal elements of a matrix   $$\mathcal{T}^{\ast}\left({M}\right)_j = \sum_{i=1}^{n-j+1} M_{i, i+j-1}.$$ (4.1) Lemma 4.1 The dual of Problem (2.7) is   $$\max_{\boldsymbol{\eta} \in \mathbb{C}^{n}} \; \left \langle{ \boldsymbol{y}}, { \boldsymbol{\eta}}\right \rangle \quad \text{subject to} \quad \left|\left|{\mathcal{F}_n^{\ast} \, \boldsymbol{\eta}}\right|\right|_{\infty} \leq 1,$$ (4.2)   $$\left|\left|{\boldsymbol{\eta}}\right|\right|_{\infty} \leq \lambda,$$ (4.3) where the inner product is defined as $$\left \langle{ \boldsymbol{y}}, { \boldsymbol{\eta}}\right \rangle : = \mathrm{Re}\left({\boldsymbol{y}^{\ast}\boldsymbol{\eta}}\right)$$. This problem is equivalent to the SDP   $$\max_{\boldsymbol{\eta} \in \mathbb{C}^{n}, \; {\it{\Lambda}} \in \mathbb{C}^{n \times n}} \; \left \langle{ \boldsymbol{y}}, { \boldsymbol{\eta}}\right \rangle \quad \text{subject to} \quad \begin{bmatrix} {\it{\Lambda}} & \boldsymbol{\eta} \\ \boldsymbol{\eta}^{\ast} & 1 \end{bmatrix} \succeq 0, \quad \mathcal{T}^{\ast}\left({{\it{\Lambda}}}\right) = \begin{bmatrix} 1 \\ \boldsymbol{0} \end{bmatrix}, \quad \left|\left|{\boldsymbol{\eta}}\right|\right|_{\infty} \leq \lambda,$$ (4.4) where $$\boldsymbol{0} \in \mathbb{C}^{n-1}$$ is a vector of zeros. Lemma 4.1, which follows from Lemma 4.3 below, shows that it is tractable to compute the $$n$$-dimensional solution to the dual of Problem (2.7).
However, our goal is to obtain the primal solution, which represents the estimate of the line spectrum and the sparse corruptions. The following lemma, which is a consequence of Lemma 4.4, establishes that we can decode the support of the primal solution from the dual solution.

Lemma 4.2 Let

$$\hat{\mu} = \sum_{f_j \in \widehat{T}} \hat{x}_j \, \delta\left(f - f_j\right),$$ (4.5)

and $$\boldsymbol{\hat{z}}$$ be a solution to (2.7), such that $$\widehat{T}$$ and $$\widehat{{\it{\Omega}}}$$ are the non-zero supports of the line spectrum $$\hat{\mu}$$ and the spikes $$\boldsymbol{\hat{z}}$$, respectively. If $$\boldsymbol{ \hat{\eta} } \in \mathbb{C}^n$$ is a corresponding dual solution, then for any $$f_j$$ in $$\widehat{T}$$

$$\left(\mathcal{F}_n^{\ast} \, \boldsymbol{\hat{\eta}}\right)\left(f_j\right) = \frac{\hat{x}_j}{\left|\hat{x}_j\right|}$$ (4.6)

and for any $$l$$ in $$\widehat{{\it{\Omega}}}$$

$$\hat{\eta}_l = \lambda \frac{\hat{z}_l}{\left|\hat{z}_l\right|}.$$ (4.7)

In other words, the weighted dual solution $$\lambda^{-1} \boldsymbol{ \hat{\eta} }$$ and the corresponding polynomial $$\mathcal{F}_{n}^{\ast} \, \boldsymbol{ \hat{\eta} }$$ interpolate the sign patterns of the primal-solution components $$\boldsymbol{\hat{z}}$$ and $$\hat{\mu}$$ on their respective supports, as illustrated in the top row of Fig. 5. This suggests estimating the support of the line spectrum and the outliers in the following way.

1. Solve (4.4) to obtain a dual solution $$\boldsymbol{ \hat{\eta} }$$ and compute $$\mathcal{F}_n^{\ast} \, \boldsymbol{ \hat{\eta} }$$.
2. Set the estimated support of the spikes $$\widehat{{\it{\Omega}}}$$ to the set of points where $$\left|{\boldsymbol{ \hat{\eta} }}\right|$$ equals $$\lambda$$.
3. Set the estimated support of the line spectrum $$\widehat{T}$$ to the set of points where $$\left|{ \mathcal{F}_n^{\ast} \, \boldsymbol{ \hat{\eta} } }\right|$$ equals one.
4. Estimate the amplitudes of $$\hat{\mu}$$ and $$\boldsymbol{\hat{z}}$$ on $$\hat{T}$$ and $$\hat{{\it{\Omega}}}$$, respectively, by solving the system of linear equations $$\boldsymbol{y} = \mathcal{F}_n \hat{\mu} + \hat{\boldsymbol{z}}$$.

Fig. 5.
Demixing of the signal in Fig. 1 by semi-definite programming. Top left: the polynomial $$\mathcal{F}_n^{\ast} \, \boldsymbol{ \hat{\eta} }$$ (light red), where $$\boldsymbol{ \hat{\eta} }$$ is a solution of Problem (4.4), interpolates the sign of the line spectrum of the sines (dashed red) on their support. Top right: $$\lambda^{-1} \boldsymbol{ \hat{\eta} }$$ interpolates the sign pattern of the spikes on their support. Bottom: locating the support of $$\mu$$ and $$\boldsymbol{z}$$ makes it possible to demix the two components very accurately (the circular markers represent the original spectrum of the sines and the original spikes, and the crosses the corresponding estimates). The parameter $$\lambda$$ is set to $$1/\sqrt{n}$$.

Figure 5 shows the results obtained by this method on the data described in Fig. 1: both components are recovered very accurately. However, we caution the reader that while the primal solution $$(\hat{\mu}, \hat{\boldsymbol{z}})$$ is generally unique, the dual solutions are non-unique, and some of the dual solutions might produce spurious frequencies and spikes in Steps 2 and 3.
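Steps 2 and 3 of the procedure above can be sketched numerically as follows. This is a minimal illustration and not the authors' implementation: it assumes that the dual polynomial is $$(\mathcal{F}_n^{\ast} \boldsymbol{\eta})(f) = \sum_l \eta_l e^{-i 2 \pi l f}$$ with $$l = 0, \ldots, n-1$$ (the sign and index conventions are assumptions, since they are fixed outside this section), it replaces exact root finding by a search over a fine grid, and the names `dual_polynomial`, `decode_supports`, `grid_size` and `tol` are hypothetical.

```python
import cmath
import math


def dual_polynomial(eta, f):
    """Evaluate sum_l eta_l * exp(-i 2 pi l f) for l = 0..n-1 (assumed convention)."""
    return sum(e * cmath.exp(-2j * math.pi * l * f) for l, e in enumerate(eta))


def decode_supports(eta, lam, grid_size=1024, tol=1e-6):
    """Return (spike indices, grid frequencies) read off a dual vector eta.

    Spikes: entries whose magnitude reaches lam (Step 2).
    Frequencies: local maxima on the grid where the polynomial magnitude
    reaches one (Step 3). Both comparisons use the tolerance tol.
    """
    spikes = [l for l, e in enumerate(eta) if abs(e) >= lam - tol]
    mags = [abs(dual_polynomial(eta, g / grid_size)) for g in range(grid_size)]
    freqs = []
    for g, m in enumerate(mags):
        left = mags[g - 1]                 # wraps around at g = 0
        right = mags[(g + 1) % grid_size]  # wraps around at the last grid point
        if m >= 1 - tol and m >= left and m >= right:
            freqs.append(g / grid_size)
    return spikes, freqs


# Toy example: |eta_l| = lam for both entries, and |0.5 + 0.5 e^{-i 2 pi f}| = 1 only at f = 0.
spikes, freqs = decode_supports([0.5, 0.5], lam=0.5, grid_size=64)
```

A tolerance is needed because a numerical solver only attains the bounds $$\lambda$$ and $$1$$ approximately; how to choose it in practice is not addressed by this sketch.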
In fact, the dual solutions form a convex set, and only those in the interior of this convex set give exact supports $$\hat{{\it{\Omega}}}$$ and $$\hat{T}$$, while those on the boundary generate spurious estimates. When the SDP (4.4) is solved using interior point algorithms, as is the case in CVX, a dual solution in the interior is returned, generating correct supports as shown in Fig. 5. Refer to [66] for a rigorous treatment of this topic for the related missing-data case. This technical complication does not seriously affect our estimates of the supports, since the amplitudes inferred in Step 4 will be zero for the extra frequencies and spikes, providing a means to eliminate them.

4.2 Demixing in the presence of dense perturbations

As described in Section 2.5, our demixing method can be adapted to the presence of dense noise in the data by relaxing the equality constraint in Problem (2.7) to an inequality constraint. The only effect on the dual of the optimization problem, which can still be reformulated as an SDP, is an extra term in the cost function.

Lemma 4.3 (Proof in Section J.1) The dual of Problem (2.19) is

$$\max_{\boldsymbol{\eta} \in \mathbb{C}^{n}} \left\langle \boldsymbol{y}, \boldsymbol{\eta} \right\rangle - \sigma \|\boldsymbol{\eta}\|_{2}$$ (4.8)

$$\text{subject to} \quad \|\mathcal{F}_n^{\ast} \, \boldsymbol{\eta}\|_{\infty} \leq 1,$$ (4.9)

$$\|\boldsymbol{\eta}\|_{\infty} \leq \lambda.$$ (4.10)

This problem is equivalent to the SDP

$$\max_{\boldsymbol{\eta} \in \mathbb{C}^{n}, \, {\it{\Lambda}} \in \mathbb{C}^{n \times n}} \left\langle \boldsymbol{y}, \boldsymbol{\eta} \right\rangle - \sigma \|\boldsymbol{\eta}\|_{2} \quad \text{subject to} \quad \begin{bmatrix} {\it{\Lambda}} & \boldsymbol{\eta} \\ \boldsymbol{\eta}^{\ast} & 1 \end{bmatrix} \succeq 0,$$ (4.11)

$$\mathcal{T}^{\ast}\left({\it{\Lambda}}\right) = \begin{bmatrix} 1 \\ \boldsymbol{0} \end{bmatrix},$$ (4.12)

$$\|\boldsymbol{\eta}\|_{\infty} \leq \lambda,$$ (4.13)

where $$\boldsymbol{0} \in \mathbb{C}^{n-1}$$ is a vector of zeros.

As in the case without dense noise, the support of the primal solution of Problem (2.19) can be decoded from the dual solution. This is justified by the following lemma, which establishes that the weighted dual solution $$\lambda^{-1} \boldsymbol{ \hat{\eta} }$$ and the corresponding polynomial $$\mathcal{F}_{n}^{\ast} \, \boldsymbol{ \hat{\eta} }$$ interpolate the sign patterns of the primal-solution components $$\boldsymbol{\hat{z}}$$ and $$\hat{\mu}$$ on their respective supports.
Lemma 4.4 (Proof in Section J.2) Let

$$\hat{\mu} = \sum_{f_j \in \widehat{T}} \hat{x}_j \, \delta\left(f - f_j\right),$$ (4.14)

and $$\boldsymbol{\hat{z}}$$ be a solution to (2.19), such that $$\widehat{T}$$ and $$\widehat{{\it{\Omega}}}$$ are the non-zero supports of the line spectrum $$\hat{\mu}$$ and the spikes $$\boldsymbol{\hat{z}}$$, respectively. If $$\boldsymbol{ \hat{\eta} } \in \mathbb{C}^n$$ is a corresponding dual solution, then for any $$f_j$$ in $$\widehat{T}$$

$$\left(\mathcal{F}_n^{\ast} \, \boldsymbol{\hat{\eta}}\right)\left(f_j\right) = \frac{\hat{x}_j}{\left|\hat{x}_j\right|}$$ (4.15)

and for any $$l$$ in $$\widehat{{\it{\Omega}}}$$

$$\hat{\eta}_l = \lambda \frac{\hat{z}_l}{\left|\hat{z}_l\right|}.$$ (4.16)

Figure 6 shows the magnitude of the dual solutions for different levels of additive noise. Motivated by the lemma, we propose to estimate the support of the outliers using $$\boldsymbol{ \hat{\eta} }$$ and the support of the spectral lines using $$\left|{\mathcal{F}_n^{\ast} \, \boldsymbol{ \hat{\eta} }}\right|$$. Our method to perform spectral super-resolution in the presence of outliers and dense noise consequently consists of the following steps:

1. Solve (4.11) to obtain a dual solution $$\boldsymbol{ \hat{\eta} }$$ and compute $$\mathcal{F}_n^{\ast} \, \boldsymbol{ \hat{\eta} }$$.
2. Set the estimated support of the spikes $$\widehat{{\it{\Omega}}}$$ to the set of points where $$\left|{\boldsymbol{ \hat{\eta} }}\right|$$ equals $$\lambda$$.
3. Set the estimated support of the spectrum $$\widehat{T}$$ to the set of points where $$\left|{ \mathcal{F}_n^{\ast} \, \boldsymbol{ \hat{\eta} } }\right|$$ equals one.
4. Estimate the amplitudes of $$\hat{\mu}$$ by solving a least-squares problem using only the data that do not lie in the estimated support of the spikes $$\widehat{{\it{\Omega}}}$$.

Fig. 6. The left column shows the magnitude of the solution to Problem (B.5) (top row) and to Problem (4.8) for different noise levels (second and third rows). $$\left|{\boldsymbol{ \hat{\eta} }}\right|$$ is represented by red lines. Additionally, the support of the sparse perturbation $$\boldsymbol{z}$$ is marked in blue.
The right column shows the trigonometric polynomial corresponding to the dual solutions in red, as well as the support of the spectrum of the multisinusoidal components in blue. The data are the same as in Fig. 1 (except for the added noise, which is i.i.d. Gaussian). The parameters $$\lambda$$ and $$\sigma$$ are set to $$1/\sqrt{n}$$ and $$1.5 \, \left|\left|{\boldsymbol{w}}\right|\right| _{2}$$, respectively. Note that in practice, the value of the noise level would have to be estimated, for example by cross validation.

Figure 7 shows the result of applying our method to data that include additive i.i.d. Gaussian noise with a signal-to-noise ratio (SNR) of 30 and 15 dB. Despite the presence of the dense noise, our method is able to detect all spectral lines at 30 dB and all but one at 15 dB. Additionally, it is capable of detecting most of the spikes correctly: at 30 dB it detects a spurious spike and at 15 dB it misses one. Note that the spike that is not detected when the SNR is 15 dB has a magnitude small enough for it to be considered part of the dense noise.

Fig. 7.
The top row shows the results of applying SDP-based spectral super-resolution in the presence of both dense noise and outliers (bottom row) for two different dense noise levels (left and right columns). The second row shows the magnitude of the data, the location of the outliers and the outlier estimate produced by the method. In the bottom row, we can see the magnitude of the sparse and dense noise (note that when the SNR is 15 dB, the smallest sparse-noise component is below the dense noise level). The signal is the same as in Fig. 1, and the data are the same as in Fig. 6. The parameter $$\sigma$$ is set to $$1.5 \, \left|\left|{\boldsymbol{w}}\right|\right| _{2}$$ and $$\lambda$$ is set to $$1/\sqrt{n}$$.

4.3 Greedy demixing enhanced by local non-convex optimization

In this section, we propose an alternative method for spectral super-resolution in the presence of outliers, which is significantly faster than the SDP-based approach described in the previous sections. In the spirit of matching-pursuit methods [47,51], the algorithm selects the spectral lines of the signal and the locations of the outliers in a greedy fashion.
This is equivalent to choosing atoms from a dictionary of the form

$$\mathcal{D} := \left\{ \boldsymbol{a}\left(f, 0\right), \; f \in \left[0,1\right] \right\} \cup \left\{ \boldsymbol{e}\left(l\right), \; 1 \leq l \leq n \right\}.$$ (4.17)

The dictionary includes the multisinusoidal atoms $$\boldsymbol{a} \left({ f, 0 }\right)$$ defined in (2.20) and $$n$$ spiky atoms $$\boldsymbol{e}\left({l}\right) \in \mathbb{R}^{n}$$, which are equal to the one-sparse standard basis vectors. By (2.23), if the data $$\boldsymbol{y}$$ are of the form (2.3) then they have a $$\left({k+s}\right)$$-sparse representation in terms of the atoms in $$\mathcal{D}$$. Greedy demixing aims to find this sparse representation iteratively. Inspired by recent work on atomic-norm minimization based on the conditional-gradient method [8,52,53], our greedy-demixing procedure includes selection, pruning and local-optimization steps (see also [34,35,61], for spectral super-resolution algorithms that leverage a local optimization step similar to ours).

1. Initialization: The residual $$\boldsymbol{r} \in \mathbb{C}^{n}$$ is initialized to equal the data vector $$\boldsymbol{y}$$. The sets of estimated spectral lines $$\widehat{T}$$ and spikes $$\widehat{{\it{\Omega}}}$$ are initialized to equal the empty set.
2. Selection: At each iteration we compute the atom in $$\mathcal{D}$$ that has the highest correlation with the current residual $$\boldsymbol{r}$$ and update either $$\widehat{T}$$ or $$\widehat{{\it{\Omega}}}$$. For the spiky atoms the correlation is just equal to $$\left|\left|{\boldsymbol{r}}\right|\right| _{\infty}$$.
For the sinusoidal atoms, we compute the highest correlation by first determining the location $$f_{\mathrm{grid}}$$ of the maximum of the function $$\mathrm{{corr}}\left({f}\right):= \left|{\left \langle{\boldsymbol{a}\left({f,0}\right)}, {\boldsymbol{r}}\right \rangle}\right|$$ on a fine grid, which can be done efficiently by computing an oversampled fast Fourier transform, and then finding a local maximum of the function $$\mathrm{{corr}}\left({f}\right)$$ using a local search method initialized at $$f_{\mathrm{grid}}$$.
3. Pruning: After adding a new atom to $$\widehat{T}$$ or $$\widehat{{\it{\Omega}}}$$, we compute the coefficients corresponding to the selected atoms using a least squares fit. We then remove any atoms whose corresponding coefficients are smaller than a threshold $$\tau > 0$$.
4. Local optimization: We fix the number of selected sinusoidal atoms $$\hat{k}:=|\widehat{T}|$$, and optimize their locations to update $$\widehat{T}$$ by finding a local minimum of the least squares cost function

$$\mathrm{ls}\left(f_1, \ldots, f_{\hat{k}}\right) := \min_{\boldsymbol{\hat{x}} \in \mathbb{C}^{\hat{k}}, \, \boldsymbol{\hat{z}} \in \mathbb{C}^{|\widehat{{\it{\Omega}}}|}} \left\| \boldsymbol{y} - \sqrt{n} \sum_{j=1}^{\hat{k}} \hat{x}_j \, \boldsymbol{a}\left(f_j, 0\right) - \sum_{l \in \widehat{{\it{\Omega}}}} \hat{z}_l \, \boldsymbol{e}\left(l\right) \right\|_{2},$$ (4.18)

using a local search method$$^{5}$$ initialized at the current estimate $$\widehat{T}$$. Alternatively, one can use other methods such as gradient descent to find a local minimum of the non-convex function.
5. The residual is updated by computing the coefficients corresponding to the currently selected atoms using least squares and subtracting the resulting approximation from $$\boldsymbol{y}$$.

This algorithm can be applied without any modification to data that are perturbed by dense noise. In Figs 8 and 9, we illustrate the performance of the method on the same data used in Figs 5 and 7. Figure 8 shows what happens if we omit the local optimization step: the algorithm does not yield exact demixing even in the absence of dense noise. In contrast, in Fig. 9, we see that greedy demixing combined with local optimization recovers the two mixed components exactly when no additional noise perturbs the data. In addition, the procedure is robust to the presence of dense noise, as shown in the last two rows of Fig. 9.

Fig. 8. Greedy demixing without a local optimization step. The signal is the same as in Fig. 1, and the noisy data are the same as in Figs 6 and 7. The thresholding parameter $$\tau$$ is set depending on the noise level: at 30 dB and in the absence of dense noise it is set small enough not to eliminate the spectral line with the smallest coefficient in the pruning step, whereas at 15 dB, it is set so as not to discard the spectral line with the second smallest coefficient.

Fig. 9. Greedy demixing with a local optimization step. The signal is the same as in Fig. 1, and the noisy data are the same as in Figs 6–8. The thresholding parameter $$\tau$$ is set as described in the caption of Fig. 8.

Intuitively, the greedy method is not able to achieve exact recovery because it optimizes the position of each spectral line one by one, and eventually cannot make further progress.
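The selection step of the greedy procedure (Step 2 in the list above) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' Matlab code: it assumes atoms of the form $$\boldsymbol{a}(f,0)_l = e^{i 2 \pi l f}/\sqrt{n}$$ with $$l = 0, \ldots, n-1$$, evaluates the correlation on the grid with a direct sum (an oversampled FFT would give the same values more efficiently), and uses a ternary search as the local search method, which presumes that the correlation has a single peak between the two grid points adjacent to $$f_{\mathrm{grid}}$$. The names `sinusoid_correlation`, `select_atom`, `oversampling` and `iters` are hypothetical.

```python
import cmath
import math


def sinusoid_correlation(r, f):
    """corr(f) = |<a(f,0), r>| for a(f,0)_l = exp(i 2 pi l f) / sqrt(n) (assumed atoms)."""
    n = len(r)
    s = sum(x * cmath.exp(-2j * math.pi * l * f) for l, x in enumerate(r))
    return abs(s) / math.sqrt(n)


def select_atom(r, oversampling=8, iters=60):
    """Return ("sine", frequency, corr) or ("spike", index, corr) for residual r."""
    n = len(r)
    grid = oversampling * n
    # Coarse search: location of the maximum of corr on a fine grid.
    g_best = max(range(grid), key=lambda g: sinusoid_correlation(r, g / grid))
    lo, hi = (g_best - 1) / grid, (g_best + 1) / grid
    # Local search: ternary search for a local maximum inside the bracket.
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if sinusoid_correlation(r, m1) < sinusoid_correlation(r, m2):
            lo = m1
        else:
            hi = m2
    f_hat = ((lo + hi) / 2) % 1.0
    corr_sine = sinusoid_correlation(r, f_hat)
    # Spiky atoms: the best correlation is the largest residual magnitude.
    l_hat = max(range(n), key=lambda l: abs(r[l]))
    corr_spike = abs(r[l_hat])
    if corr_sine >= corr_spike:
        return ("sine", f_hat, corr_sine)
    return ("spike", l_hat, corr_spike)


# Usage: a pure sinusoid at frequency 0.3 sampled at n = 16 points.
r = [cmath.exp(2j * math.pi * l * 0.3) for l in range(16)]
kind, location, corr = select_atom(r)
```

The pruning, local optimization and residual update steps additionally require a complex least squares solver and are not covered by this sketch.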
The local optimization step refines the fit by optimizing over the positions of the spectral lines simultaneously. This succeeds when the initialization is close enough to a good local minimum of the cost function. Our experiments seem to indicate that the greedy scheme provides such an initialization.

As illustrated in Fig. 10, the greedy scheme is significantly faster than the SDP-based approach described earlier. These preliminary empirical results show the potential of coupling greedy approaches with local non-convex optimization. Establishing guarantees for such demixing procedures is an interesting research direction.

Fig. 10. Comparison of average running times for the SDP-based demixing approach described in Section 4.1 and greedy demixing with a local optimization step over 10 trials (the error bars show 95% confidence intervals). The number of spectral lines and the number of outliers are both equal to $$10$$. The amplitudes of both components are i.i.d. Gaussian. The minimum separation of the spectral lines is $$2.8/(n+1)$$. Both algorithms achieve exact recovery in all instances. The experiments were carried out on a laptop with an Intel Core i5-5300 CPU at 2.3 GHz and 12 GB of RAM.

4.4 Atomic-norm denoising

In this section, we discuss how to implement the atomic-norm based denoising procedure described in Section 2.6.
Our method relies on the fact that the atomic norm has a semi-definite characterization when the dictionary contains sinusoidal atoms of the form (2.20). This is established in the following proposition, which we borrow from [4,66].

Proposition 4.5 (Proposition 2.1 [66], [4]) For $$\boldsymbol{g} \in \mathbb{C}^{n}$$

$$\|\boldsymbol{g}\|_{\mathcal{A}} = \inf_{t \in \mathbb{R}, \, \boldsymbol{u} \in \mathbb{C}^{n}} \left\{ \frac{n \boldsymbol{u}_1 + t}{2} \; : \; \begin{bmatrix} \mathcal{T}\left(\boldsymbol{u}\right) & \boldsymbol{g} \\ \boldsymbol{g}^{\ast} & t \end{bmatrix} \succeq 0 \right\},$$ (4.19)

where the operator $$\mathcal{T}$$ is defined in Section 4.1.

This result allows us to rewrite (2.24) as the SDP

$$\min_{t \in \mathbb{R}, \, \boldsymbol{u} \in \mathbb{C}^{n}, \, \boldsymbol{\tilde{g}} \in \mathbb{C}^{n}, \, \boldsymbol{\tilde{z}} \in \mathbb{C}^{n}} \frac{n \boldsymbol{u}_1 + t}{2 \sqrt{n}} + \lambda \|\boldsymbol{\tilde{z}}\|_{1} \quad \text{subject to} \quad \begin{bmatrix} \mathcal{T}\left(\boldsymbol{u}\right) & \boldsymbol{\tilde{g}} \\ \boldsymbol{\tilde{g}}^{\ast} & t \end{bmatrix} \succeq 0,$$ (4.20)

$$\boldsymbol{\tilde{g}} + \boldsymbol{\tilde{z}} = \boldsymbol{y},$$ (4.21)

which is precisely the dual program of (4.4). Similarly, Problem (2.27) can be reformulated as the SDP

$$\min_{t \in \mathbb{R}, \, \boldsymbol{u} \in \mathbb{C}^{n}, \, \boldsymbol{\tilde{g}} \in \mathbb{C}^{n}, \, \boldsymbol{\tilde{z}} \in \mathbb{C}^{n}} \frac{n \boldsymbol{u}_1 + t}{2 \sqrt{n}} + \lambda \|\boldsymbol{\tilde{z}}\|_{1} + \frac{\gamma}{2} \|\boldsymbol{y} - \boldsymbol{\tilde{g}} - \boldsymbol{\tilde{z}}\|_{2}^{2} \quad \text{subject to} \quad \begin{bmatrix} \mathcal{T}\left(\boldsymbol{u}\right) & \boldsymbol{\tilde{g}} \\ \boldsymbol{\tilde{g}}^{\ast} & t \end{bmatrix} \succeq 0.$$ (4.22)

This problem can be solved efficiently using the alternating direction method of multipliers [4] (see also [4] for a similar implementation of SDP-based atomic-norm denoising for the case without outliers), as described in detail in Section J.3 of the Appendix.

Figure 11 shows the results of applying this method to denoise the data used in Figs 7–9. In the absence of dense noise, the approach yields perfect denoising (not shown in the figure). When dense noise perturbs the data, the method is still able to perform effective denoising, correcting for the presence of the outliers.

Fig. 11. Denoising via atomic-norm minimization in the presence of both outliers and dense noise. The signal is the same as in Fig. 1 and the data are the same as in Figs 6 and 7. The parameter $$\lambda$$ is set to $$1/\sqrt{n}$$, whereas $$\gamma$$ is set to $$1/\left|\left|{\boldsymbol{w}}\right|\right| _{2}$$ (in practice, we would have to estimate the noise level or set the parameter via cross validation).
5. Numerical Experiments

5.1 Demixing via semi-definite programming

In this section, we investigate the performance of the method described in Section 4.1. To do this, we apply it to data of the form (2.3), varying the different parameters of interest. Fixing either the number of spectral lines $$k$$ or the number of outliers $$s$$ allows us to visualize the performance of the method for a range of values of the line spectrum’s minimum separation $${\it{\Delta}}$$ (defined by (2.5)). The results are shown in Fig. 12. We observe that in every instance there is a rapid phase transition between the values at which the method always achieves exact demixing and the values at which it fails. The minimum separation at which this phase transition takes place is between $$1/{\left( n-1 \right)\!}$$ and $$2/{\left( n-1 \right)\!}$$, which is smaller than the minimum separation required by Theorem 2.2. We conjecture that if we allow for arbitrary sign patterns, the phase transition would occur near $$2/{\left( n-1 \right)\!}$$. In fact, if we constrain the amplitudes of the spectral lines to be real instead of complex, the phase transition occurs at a higher minimum separation, as shown in [38, Fig. 7].

Fig. 12. Graphs showing the fraction of times Problem (2.7) achieves exact demixing over 10 trials with random signs and supports for different numbers of spectral lines $$k$$ (left column) and outliers $$s$$ (right column), as well as different values of the minimum separation of the spectral lines. Each row shows results for a different number of measurements. The value of the regularization parameter $$\lambda$$ is 0.1 for the left column and 0.15 for the right column.
The simulations are carried out using CVX [39].

In order to investigate the effect of the regularization parameter on the performance of the algorithm, we fix $${\it{\Delta}}$$ and perform demixing for different values of $$k$$ and $$s$$. The results are shown in Fig. 13. As suggested by Lemma 2.3, for fixed $$s$$ the method succeeds for all values of $$k$$ below a certain limit, and vice versa when we vary $$s$$. Since $$\lambda$$ weighs the effect of the terms that promote sparsity of the two different components in our mixture model, it is no surprise that varying it affects the trade-off between the number of spectral lines and of spikes that we can demix. For smaller $$\lambda$$ the sparsity-inducing term affecting the multisinusoidal component is stronger, so the method succeeds for mixtures with smaller $$k$$ and larger $$s$$. Analogously, for larger $$\lambda$$ the sparsity-inducing term affecting the outlier component is stronger, so the method succeeds for mixtures with larger $$k$$ and smaller $$s$$.

Fig. 13. Graphs showing the fraction of times Problem (2.7) achieves exact demixing over 10 trials with random signs and supports for different numbers of spectral lines $$k$$ and outliers $$s$$. The minimum separation of the spectral lines is $$2 / (n-1)$$. Each column shows results for a different value of the regularization parameter $$\lambda$$.
Each row shows results for a different number of measurements $$n$$. The simulations are carried out using CVX [39].

5.2 Comparison with matrix-completion based denoising

In this section, we compare the SDP-based atomic-norm denoising method described in Section 4.4 to the matrix-completion based denoising method from [24]. Both algorithms are implemented using CVX [39] and applied to data following model (2.23). In general, we observe that both methods either succeed, achieving extremely small errors (the relative MSE$$^{6}$$ is smaller than $$10^{-8}$$), or fail, producing very large errors. We compare the performance by recording whether the methods succeed or fail in denoising randomly generated signals for different numbers of spectral lines $$k$$ and outliers $$s$$. To provide a more complete picture, we repeat the simulations for different values of the regularization parameters ($$\lambda$$ for atomic-norm denoising and $$\theta$$ for matrix-completion denoising) that govern the sparsity-inducing terms of the corresponding optimization problems. The values of $$\lambda$$ and $$\theta$$ are chosen separately to yield the best possible performance. Figure 14 shows the results. We observe that atomic-norm denoising consistently outperforms matrix-completion denoising across regimes in which the methods achieve different trade-offs between the values of $$k$$ and $$s$$.
In addition, atomic-norm denoising is faster: the average running time for each trial is 3.25 s with a standard deviation of 0.30 s, whereas the average running time for the matrix-completion approach is 11.1 s with a standard deviation of 1.32 s. The experiments were carried out on an Intel Xeon desktop computer with a 3.5 GHz CPU and 24 GB of RAM.

Fig. 14. Graphs showing the fraction of times Problem (2.7) (top row), and the matrix-completion approach from [24] (bottom row) achieve exact denoising for different values of their respective regularization parameters over 10 trials with random signs and supports. The minimum separation of the spectral lines is $$2 / (n-1)$$ and the number of data is $$n=61$$. The simulations are carried out using CVX [39].

6. Conclusion and future research directions

In this work, we propose an optimization-based method for spectral super-resolution in the presence of outliers and characterize its performance theoretically. In addition, we describe how to implement the approach using semi-definite programming, discuss its connection to atomic-norm denoising and present a greedy demixing algorithm with promising empirical performance. Our results suggest the following directions for future research.

Proving a result similar to Theorem 2.2 without the assumption that the phases of the different components are random.
This would require showing that the dual-polynomial construction in Section 3.3 is valid, without leveraging the concentration bounds that we use for our proof. It is unclear whether this is possible because the interpolation kernel $$K$$ does not display a good asymptotic decay, as shown in Fig. 3. Note that if the amplitudes of the sparse noise $${\boldsymbol{z}}$$ are constrained to be real, then a derandomization argument similar to the one in [14, Theorem 2.1] makes it possible to establish the same guarantees as Theorem 2.2 for a sparse perturbation that has an arbitrary deterministic sign pattern.

Deriving guarantees for spectral super-resolution via the approach described in Section 2.5 in the presence of dense and sparse noise. To achieve this, one could combine our dual polynomial construction with the techniques developed in [13,37,65]. In addition, it would be interesting to investigate the application of the method when the level of dense noise is unknown, as in [10].

Developing fast algorithms to solve the SDPs in Sections 4.1 and 4.2. We have found that the alternating direction method of multipliers (ADMM) is effective for denoising, but the dual variable converges too slowly for it to be effective in super-resolving the line spectrum.

Investigating whether greedy demixing techniques, like the one in Section 4.3, can achieve the same performance as our convex-programming approach both empirically and theoretically.

Considering other structured noise models, beyond sparse perturbations, which could be learnt from data by leveraging techniques such as dictionary learning [46,50]. For instance, this could make it possible to deal with recurring interferences in radar applications.

Supplementary Materials

Code to replicate the experiments in the paper is available at IMAIAI online.

Funding

National Science Foundation (DMS-1616340 to C.F., CCF-1464205 to G.T.).

Appendix A.
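Proof of Lemma 2.3

For any vector $${\boldsymbol{u}}$$ and any atomic measure $$\nu$$, we denote by $${\boldsymbol{u}}_{{\mathcal{{S}}}}$$ and $$\nu_{{\mathcal{{S}}}}$$ the restriction of $${\boldsymbol{u}}$$ and $$\nu$$ to the subset of their support indexed by a set $${\mathcal{{S}}}$$. Let $${\left\{ {\hat{\mu},{\boldsymbol{{ \hat{z} }}}}\right\}\!}$$ be any solution to Problem (2.7) applied to $${\boldsymbol{y'}}$$. The pair $${\left\{ {\hat{\mu}+\mu_{T/T'},{\boldsymbol{{ \hat{z} }}}+{\boldsymbol{ z }}_{{\it{\Omega}}/{\it{\Omega}}'}}\right\}\!}$$ is feasible for Problem (2.7) applied to $${\boldsymbol{y}}$$ since

$$\mathcal{F}_n \hat{\mu} + \mathcal{F}_n \mu_{T/T'} + \boldsymbol{\hat{z}} + \boldsymbol{z}_{{\it{\Omega}}/{\it{\Omega}}'} = \boldsymbol{y'} + \mathcal{F}_n \mu_{T/T'} + \boldsymbol{z}_{{\it{\Omega}}/{\it{\Omega}}'}$$ (A.1)

$$= \mathcal{F}_n \mu' + \mathcal{F}_n \mu_{T/T'} + \boldsymbol{z'} + \boldsymbol{z}_{{\it{\Omega}}/{\it{\Omega}}'}$$ (A.2)

$$= \mathcal{F}_n \mu + \boldsymbol{z}$$ (A.3)

$$= \boldsymbol{y}.$$ (A.4)

By the triangle inequality and the assumption that $${\left\{ {\mu,{\boldsymbol{z}}}\right\}\!}$$ is the unique solution to Problem (2.7) applied to $${\boldsymbol{y}}$$, this implies

$$\|\mu\|_{\mathrm{TV}} + \lambda \|\boldsymbol{z}\|_{1} < \|\hat{\mu} + \mu_{T/T'}\|_{\mathrm{TV}} + \lambda \|\boldsymbol{\hat{z}} + \boldsymbol{z}_{{\it{\Omega}}/{\it{\Omega}}'}\|_{1}$$ (A.5)

$$\leq \|\hat{\mu}\|_{\mathrm{TV}} + \|\mu_{T/T'}\|_{\mathrm{TV}} + \lambda \|\boldsymbol{\hat{z}}\|_{1} + \lambda \|\boldsymbol{z}_{{\it{\Omega}}/{\it{\Omega}}'}\|_{1},$$ (A.6)

unless $$\hat{\mu}+\mu_{T/T'} = \mu$$ and $${\boldsymbol{{ \hat{z} }}}+{\boldsymbol{z}}_{{\it{\Omega}}/{\it{\Omega}}'} = {\boldsymbol{z}}$$, so that

$$\|\mu'\|_{\mathrm{TV}} + \lambda \|\boldsymbol{z'}\|_{1} = \|\mu\|_{\mathrm{TV}} - \|\mu_{T/T'}\|_{\mathrm{TV}} + \lambda \|\boldsymbol{z}\|_{1} - \lambda \|\boldsymbol{z}_{{\it{\Omega}}/{\it{\Omega}}'}\|_{1}$$ (A.7)

$$< \|\hat{\mu}\|_{\mathrm{TV}} + \lambda \|\boldsymbol{\hat{z}}\|_{1},$$ (A.8)

unless $$\hat{\mu} = \mu'$$ and $${\boldsymbol{{ \hat{z} }}} = {\boldsymbol{z'}}$$. We conclude that $${\left\{ {\mu',{\boldsymbol{z'}}}\right\}\!}$$ must be the unique solution to Problem (2.7) applied to $${\boldsymbol{y'}}$$.

Appendix B. Atomic-norm denoising

B.1 Proof of Lemma 2.4

We define a scaled norm $$\|\cdot\|_{{\mathcal{A}}'} := \|\cdot\|_{\mathcal{A}} / \sqrt{n}$$. The dual norm of $$\|\cdot\|_{{\mathcal{A}}'}$$ is

$$\|\boldsymbol{\eta}\|_{{\mathcal{A}}'}^{\ast} = \sup_{\|\boldsymbol{\tilde{g}}\|_{\mathcal{A}} \leq \sqrt{n}} \left\langle \boldsymbol{\eta}, \boldsymbol{\tilde{g}} \right\rangle$$ (B.1)

$$= \sup_{\phi \in [0, 2\pi), \, f \in [0,1]} \left\langle \boldsymbol{\eta}, \sqrt{n} \, e^{i \phi} \boldsymbol{a}\left(f, 0\right) \right\rangle$$ (B.2)

$$= \sup_{f \in [0,1]} \left| \left\langle \boldsymbol{\eta}, \sqrt{n} \, \boldsymbol{a}\left(f, 0\right) \right\rangle \right|$$ (B.3)

$$= \|\mathcal{F}_n^{\ast} \, \boldsymbol{\eta}\|_{\infty}.$$ (B.4)

The result now follows from the fact that the dual of (2.24) is

$$\max_{\boldsymbol{\eta} \in \mathbb{C}^{n}} \left\langle \boldsymbol{y}, \boldsymbol{\eta} \right\rangle \quad \text{subject to} \quad \|\boldsymbol{\eta}\|_{{\mathcal{A}}'}^{\ast} \leq 1,$$ (B.5)

$$\|\boldsymbol{\eta}\|_{\infty} \leq \lambda,$$ (B.6)

by a standard argument [22, Section 2.1].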
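The step from (B.2) to (B.3) in the proof of Lemma 2.4 can be spelled out as follows. This is a short sketch that assumes the real inner product $$\left\langle \boldsymbol{u}, \boldsymbol{v} \right\rangle := \mathrm{Re}\left(\boldsymbol{u}^{\ast} \boldsymbol{v}\right)$$ from Lemma 4.1; the scalar $$c$$ is notation introduced here only for illustration. For a fixed frequency $$f$$, let $$c := \sqrt{n} \, \boldsymbol{\eta}^{\ast} \boldsymbol{a}\left(f, 0\right) \in \mathbb{C}$$. Then

$$\left\langle \boldsymbol{\eta}, \sqrt{n} \, e^{i \phi} \boldsymbol{a}\left(f, 0\right) \right\rangle = \mathrm{Re}\left(e^{i \phi} c\right) \leq \left|c\right|,$$

with equality when $$\phi = -\arg\left(c\right)$$ modulo $$2\pi$$, so that

$$\sup_{\phi \in [0, 2\pi)} \left\langle \boldsymbol{\eta}, \sqrt{n} \, e^{i \phi} \boldsymbol{a}\left(f, 0\right) \right\rangle = \left|c\right| = \sqrt{n} \left| \boldsymbol{\eta}^{\ast} \boldsymbol{a}\left(f, 0\right) \right|.$$

Taking the supremum over $$f$$ gives the quantity written in (B.3), where the absolute value is understood to act on the complex number $$c$$. If, in addition, the atoms are normalized so that the entries of $$\sqrt{n} \, \boldsymbol{a}\left(f, 0\right)$$ are complex exponentials of unit modulus (an assumption about (2.20), suggested by the factors of $$\sqrt{n}$$ above), then $$\left|c\right|$$ is the magnitude of a trigonometric polynomial with coefficients given by $$\boldsymbol{\eta}$$, which is consistent with (B.4).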
B.2 Proof of Corollary 2.5 The corollary is a direct consequence of the following lemma, which establishes that the dual polynomial whose existence we establish in Proposition 3.2 also guarantees that solving Problem (2.24) achieves exact demixing. Lemma B.1 If there exists a trigonometric polynomial $$Q$$ satisfying the conditions listed in Proposition 3.1, then $${\boldsymbol{g}}$$ and $${\boldsymbol{z}}$$ are the unique solutions to Problem (2.24). Proof. In the case of the atoms defined by (2.20), the atomic norm is given by   ||u||A =inf{x~j≥0},{ϕj∈[0,2π)}{fj∈[0,1]}{∑jx~j:u=∑jx~ja(fj,ϕj)}, (B.7) so that   ||g||A ≤||x||1due to (2.21) (B.8)   =||μ||TV. (B.9) By construction,   ⟨q,y⟩ =⟨q,g+z⟩ (B.10)   =⟨Fn∗q,μ⟩+⟨q,z⟩ (B.11)   =∫[0,1]Q(f)¯dμ(f)+λ∑l=1s|zl| (B.12)   =||μ||TV+λ||z||1. (B.13) Consider an arbitrary feasible pair $${\left\{ { {\boldsymbol{g'}}, {\boldsymbol{z'}}}\right\}\!}$$ different from $${\left\{ { {\boldsymbol{g}}, {\boldsymbol{z}}}\right\}\!}$$, such that $${\boldsymbol{z'}}$$ has non-zero support $${\it{\Omega}}'$$ and   g′=n∑fj∈T′x′ja(fj,0),||g′||A:=∑fj∈T′|x′j| (B.14) for a sequence of complex coefficients $${\boldsymbol{x'}}$$ and a set of frequency locations $$T' \subseteq {\left[{0,1}\right]\!}$$. Note that as long as $$k + s \leq n$$ (recall that $$k := {\left|{T}\right|\!}$$ and $$s:={\left|{{\it{\Omega}}}\right|\!}$$) then either $$T \neq T'$$ or $${\it{\Omega}} \neq {\it{\Omega}}'$$. 
The reason is that under that condition any set formed by $$k$$ atoms of the form $${\boldsymbol{a}}{\left( f_j,0 \right)\!}$$ and $$s$$ vectors with cardinality one is linearly independent (this is equivalent to the matrix $$[ F_T \quad {\it{I}}_{{\it{\Omega}}} ]$$ in Section C.1 being full rank), so that if both $$T = T'$$ and $${\it{\Omega}} = {\it{\Omega}}'$$ then $${\boldsymbol{g}} + {\boldsymbol{z}}= {\boldsymbol{g'}} + {\boldsymbol{z}}$$ would imply that $${\boldsymbol{g}}= {\boldsymbol{g'}}$$ and $${\boldsymbol{z}}= {\boldsymbol{z}}$$ (and we are assuming this is not the case). By conditions (3.3) and (3.4)   n⟨q,a(fj,0)⟩ =Q(fj) (B.15)   =xj|xj|,∀fj∈T, (B.16)  n⟨q,a(fj,0)⟩ =|Q(f)| (B.17)   <1,∀f∈Tc. (B.18) We have   ||g||A+λ||z||1 ≤⟨q,y⟩by (B.9) and (B.13) (B.19)   =⟨q,g′⟩+⟨q,z′⟩ (B.20)   =n∑fj∈T′x′j⟨q,a(f,0)⟩+⟨qΩ′,z′⟩ (B.21)   <n∑fj∈T′|x′j|+λ∑l∈Ω′|z′j| (B.22)   =||g′||A+λ||z′||1, (B.23) where (B.22) follows from conditions (3.5) and (3.6), (B.16), (B.18) and the fact that either $$T \neq T'$$ or $${\it{\Omega}} \neq {\it{\Omega}}'$$. We conclude that $${\left\{ { {\boldsymbol{g}}, {\boldsymbol{z}}}\right\}\!}$$ must be the unique solution to Problem (2.24). □ Appendix C. Proof of Proposition 3.1 For any vector $${\boldsymbol{u}}$$ and any atomic measure $$\nu$$, we denote by $${\boldsymbol{u}}_{{\mathcal{{S}}}}$$ and $$\nu_{{\mathcal{{S}}}}$$ the restriction of $${\boldsymbol{u}}$$ and $$\nu$$ to the subset of their support indexed by a set $${\mathcal{{S}}}$$ ($${\boldsymbol{u}}_{{\mathcal{{S}}}}$$ has the same dimension as $${\boldsymbol{u}}$$ and $$\nu_{{\mathcal{{S}}}}$$ is still a measure in the unit interval). Let us consider an arbitrary feasible pair $$\mu'$$ and $${\boldsymbol{z'}}$$, such that $$\mu'\neq \mu$$ or $${\boldsymbol{z'}}\neq {\boldsymbol{z}}$$. Due to the constraints of the optimization problem, $$\mu'$$ and $${\boldsymbol{z'}}$$ satisfy   y=Fnμ+z=Fnμ′+z′. 
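(C.1) The following lemma establishes that $$\mu_{T^c}'$$ and $${\boldsymbol{z}}_{\Omega^c}'$$ cannot both equal zero.

Lemma C.1 (Proof in Section C.1) If $$\{\mu', \boldsymbol{z'}\}$$ is feasible and $$\mu_{T^c}'$$ and $${\boldsymbol{z}}_{\Omega^c}'$$ both equal zero, then $$\mu=\mu'$$ and $${\boldsymbol{z}}={\boldsymbol{z'}}$$.

This lemma and the existence of $$Q$$ imply that the cost function evaluated at $$\{\mu', \boldsymbol{z'}\}$$ is larger than at $$\{\mu, \boldsymbol{z}\}$$:

$$\begin{aligned} \|\mu'\|_{\mathrm{TV}} + \lambda \|\boldsymbol{z'}\|_1 &= \|\mu_{T}'\|_{\mathrm{TV}} + \|\mu_{T^c}'\|_{\mathrm{TV}} + \lambda \|\boldsymbol{z}_{\Omega}'\|_1 + \lambda \|\boldsymbol{z}_{\Omega^c}'\|_1 \\ &> \|\mu_{T}'\|_{\mathrm{TV}} + \langle Q, \mu_{T^c}' \rangle + \lambda \|\boldsymbol{z}_{\Omega}'\|_1 + \langle \boldsymbol{q}, \boldsymbol{z}_{\Omega^c}' \rangle \quad \text{by Lemma C.1, (3.4) and (3.6)} && \text{(C.2)} \\ &\geq \langle Q, \mu' \rangle + \langle \boldsymbol{q}, \boldsymbol{z'} \rangle \quad \text{by (3.3) and (3.5)} && \text{(C.3)} \\ &= \langle \mathcal{F}_n^{\ast} \boldsymbol{q}, \mu' \rangle + \langle \boldsymbol{q}, \boldsymbol{z'} \rangle && \text{(C.4)} \\ &= \langle \boldsymbol{q}, \mathcal{F}_n \mu' + \boldsymbol{z'} \rangle && \text{(C.5)} \\ &= \langle \boldsymbol{q}, \mathcal{F}_n \mu + \boldsymbol{z} \rangle \quad \text{by (C.1)} && \text{(C.6)} \\ &= \langle \mathcal{F}_n^{\ast} \boldsymbol{q}, \mu \rangle + \langle \boldsymbol{q}, \boldsymbol{z} \rangle && \text{(C.7)} \\ &= \langle Q, \mu \rangle + \langle \boldsymbol{q}, \boldsymbol{z} \rangle && \text{(C.8)} \\ &= \|\mu\|_{\mathrm{TV}} + \lambda \|\boldsymbol{z}\|_1 \quad \text{by (3.3) and (3.5)}. && \text{(C.9)} \end{aligned}$$

We conclude that $$\{\mu, \boldsymbol{z}\}$$ must be the unique solution.

C.1. Proof of Lemma C.1

If $$\mu_{T^c}'$$ and $${\boldsymbol{z}}_{\Omega^c}'$$ both equal zero, then

$$\begin{aligned} \mathcal{F}_n \mu + \boldsymbol{z} - \mathcal{F}_n \mu_{T}' - \boldsymbol{z}_{\Omega}' &= \mathcal{F}_n \mu' + \boldsymbol{z'} - \mathcal{F}_n \mu_{T}' - \boldsymbol{z}_{\Omega}' \quad \text{by (C.1)} && \text{(C.10)} \\ &= \mathcal{F}_n \mu_{T^c}' + \boldsymbol{z}_{\Omega^c}' && \text{(C.11)} \\ &= 0. && \text{(C.12)} \end{aligned}$$

We index the entries of $$\Omega := \{i_1, i_2, \ldots, i_s\}$$ and define the matrix $$[ F_T \quad I_{\Omega} ] \in \mathbb{C}^{n \times (k+s)}$$, where

$$\begin{aligned} (F_T)_{lj} &= e^{i 2 \pi l f_j} \quad \text{for } 1 \leq l \leq n,\; 1 \leq j \leq k, && \text{(C.13)} \\ (I_{\Omega})_{lj} &= \begin{cases} 1 & \text{if } l = i_j, \\ 0 & \text{otherwise} \end{cases} \quad \text{for } 1 \leq l \leq n,\; 1 \leq j \leq s. && \text{(C.14)} \end{aligned}$$

If $$k + s \leq n$$ then $$[ F_T \quad I_{\Omega} ]$$ is full rank (this follows from the fact that $$F_T$$ is a submatrix of a Vandermonde matrix).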
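The matrix $$[ F_T \quad I_{\Omega} ]$$ defined in (C.13) and (C.14) is easy to build numerically. The following sketch constructs it for one small example and computes its rank; the function name `stacked_matrix` and the specific frequencies and outlier locations are hypothetical choices made only for illustration, and a single numerical check of this kind is of course not a proof.

```python
import numpy as np

def stacked_matrix(freqs, omega, n):
    """Build [F_T  I_Omega] following (C.13)-(C.14), with rows l = 1, ..., n.

    `omega` holds 1-based outlier locations i_1, ..., i_s (hypothetical helper).
    """
    freqs = np.asarray(freqs, dtype=float)
    omega = np.asarray(omega, dtype=int)
    rows = np.arange(1, n + 1)[:, None]
    F_T = np.exp(2j * np.pi * rows * freqs[None, :])
    I_Omega = np.zeros((n, omega.size))
    # Column j has a single one in row i_j (converted to 0-based indexing).
    I_Omega[omega - 1, np.arange(omega.size)] = 1.0
    return np.hstack([F_T, I_Omega])

# Example with n = 12, k = 3 frequencies and s = 4 outliers, so k + s <= n.
M = stacked_matrix([0.1, 0.35, 0.8], [2, 5, 9, 11], n=12)
print(M.shape, np.linalg.matrix_rank(M))
```

For this choice the uncorrupted rows include three consecutive indices, so the corresponding rows of $$F_T$$ form a scaled Vandermonde matrix with distinct nodes and the stacked matrix has rank $$k + s = 7$$.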
Equation (C.12) implies

$$\begin{bmatrix} F_T & I_{\Omega} \end{bmatrix} \begin{bmatrix} \boldsymbol{x} - \boldsymbol{x}' \\ \mathcal{P}_{\Omega} \boldsymbol{z} - \mathcal{P}_{\Omega} \boldsymbol{z}' \end{bmatrix} = 0, \qquad \text{(C.15)}$$

where $$\mathcal{P}_{\Omega} \boldsymbol{u}' \in \mathbb{C}^s$$ is the subvector of $$\boldsymbol{u}'$$ containing the entries indexed by $$\Omega$$ and $$\boldsymbol{x}' \in \mathbb{C}^{k}$$ is the vector containing the amplitudes of $$\mu'$$ (recall that by assumption $$\mu_{T^c}'=0$$). Since the matrix is full rank, $$\boldsymbol{x} = \boldsymbol{x}'$$ and $$\mathcal{P}_{\Omega} \boldsymbol{z} = \mathcal{P}_{\Omega} \boldsymbol{z}'$$. We conclude that $$\mu=\mu'$$ and $${\boldsymbol{z}}={\boldsymbol{z'}}$$.

Appendix D. Proof of Lemma 3.4

The vector of coefficients $${\boldsymbol{c}}$$ equals the convolution of three rectangles of widths $$2 \cdot 0.247 \, m + 1$$, $$2 \cdot 0.339 \, m + 1$$ and $$2 \cdot 0.414 \, m + 1$$ and amplitudes $$(2 \cdot 0.247 \, m + 1)^{-1}$$, $$(2 \cdot 0.339 \, m + 1)^{-1}$$ and $$(2 \cdot 0.414 \, m + 1)^{-1}$$. Some simple computations show that the amplitude of the convolution of three rectangles with unit amplitudes and widths $$a_1 < a_2 < a_3$$ is bounded by $$a_1 a_2$$. An immediate consequence is that the amplitude of $${\boldsymbol{c}}$$ is bounded by

$$\begin{aligned} \|\boldsymbol{c}\|_{\infty} &\leq \frac{(2 \cdot 0.247 \, m + 1)(2 \cdot 0.339 \, m + 1)}{(2 \cdot 0.247 \, m + 1)(2 \cdot 0.339 \, m + 1)(2 \cdot 0.414 \, m + 1)} && \text{(D.1)} \\ &\leq \frac{1}{2 \cdot 0.414 \, m + 1} && \text{(D.2)} \\ &\leq \frac{1.3}{m}. && \text{(D.3)} \end{aligned}$$

Appendix E. Proof of Lemma 3.6

To bound the operator norm of $$B_{\Omega}$$, we control the behavior of

$$\begin{aligned} H &:= B_{\Omega} B_{\Omega}^{\ast} && \text{(E.1)} \\ &= \sum_{l \in \Omega} \boldsymbol{b}(l) \boldsymbol{b}(l)^{\ast}, && \text{(E.2)} \end{aligned}$$

which concentrates around a scaled version of

$$\bar{H} := \sum_{l=-m}^{m} \boldsymbol{b}(l) \boldsymbol{b}(l)^{\ast}. \qquad \text{(E.3)}$$

The following lemma bounds the operator norm of $$\bar{H}$$.

Lemma E.1 (Proof in Section E.1) Under the assumptions of Theorem 2.2

$$\left\| \bar{H} \right\| \leq 260 \pi^2 \, n \log k. \qquad \text{(E.4)}$$

By (2.12), $$s \leq C_s \, n \left( \log k \log \frac{n}{\epsilon} \right)^{-1}$$, which together with the lemma implies

$$\left\| \frac{s}{n} \bar{H} \right\| \leq \frac{C_B^2 \, n}{2} \left( \log \frac{n}{\epsilon} \right)^{-1} \qquad \text{(E.5)}$$

if we set $$C_s$$ small enough. The following lemma uses the matrix Bernstein inequality to control the deviation of $$H$$ from a scaled version of $$\bar{H}$$.
Lemma E.2 (Proof in Section E.2) Under the assumptions of Theorem 2.2

$$\left\| H - \frac{s}{n} \bar{H} \right\| \leq \frac{C_B^2 \, n}{2} \left( \log \frac{n}{\epsilon} \right)^{-1} \qquad \text{(E.6)}$$

with probability at least $$1- \epsilon /5$$.

We conclude that

$$\begin{aligned} \left\| B_{\Omega} \right\| &\leq \sqrt{\left\| H \right\|} && \text{(E.7)} \\ &\leq \sqrt{\frac{s}{n} \left\| \bar{H} \right\| + \left\| H - \frac{s}{n} \bar{H} \right\|} && \text{(E.8)} \\ &\leq C_B \sqrt{n} \left( \log \frac{n}{\epsilon} \right)^{-\frac{1}{2}} && \text{(E.9)} \end{aligned}$$

with probability at least $$1- \epsilon /5$$ by the triangle inequality.

E.1 Proof of Lemma E.1

We express the matrix $$\bar{H}$$ in terms of the Dirichlet kernel $$\mathcal{D}_m$$ of order $$m$$ defined in (3.25) and its derivatives,

$$\bar{H} = n \begin{bmatrix} \bar{H}_0 & \bar{H}_1 \\ -\bar{H}_1 & \bar{H}_2 \end{bmatrix}, \qquad \text{(E.10)}$$

where

$$(\bar{H}_0)_{jl} = \mathcal{D}_m(f_j - f_l), \quad (\bar{H}_1)_{jl} = \kappa \, \mathcal{D}_m^{(1)}(f_j - f_l), \quad (\bar{H}_2)_{jl} = -\kappa^2 \mathcal{D}_m^{(2)}(f_j - f_l). \qquad \text{(E.11)}$$

In order to bound the operator norm of $$\bar{H}$$ we first establish some bounds on $$\mathcal{D}_m^{(\ell)}$$ for $$\ell=0,1,2$$. Due to how the kernel is normalized in (3.25), the magnitude of $$\mathcal{D}_m$$ is bounded by one. This yields a uniform bound on the magnitude of its derivatives by Bernstein’s polynomial inequality.

Theorem E.3 (Bernstein’s polynomial inequality [56]) For any complex-valued polynomial $$P$$ of degree $$N$$

$$\sup_{|z| \leq 1} \left| P^{(1)}(z) \right| \leq N \sup_{|z| \leq 1} |P(z)|. \qquad \text{(E.12)}$$

Applying the theorem, we have

$$\left| \mathcal{D}_m^{(\ell)}(f) \right| \leq (2 \pi m)^{\ell}. \qquad \text{(E.13)}$$

The following lemma allows us to control the tail of the Dirichlet kernel and its derivatives.

Lemma E.4 ([38, Section C.4]) If $$m \geq 10^3$$, for $$f \geq 80/m$$

$$\left| \mathcal{D}_m^{(\ell)}(f) \right| \leq \frac{1.1 \cdot 2^{\ell-2} \pi^{\ell} m^{\ell-1}}{f}. \qquad \text{(E.14)}$$

We now combine these two bounds to control the sum of the magnitudes of $$\mathcal{D}_m^{(\ell)}$$ when evaluated at $$T$$ for $$\ell = 0,1,2$$. By the minimum-separation condition (2.10), if we fix $$f_i \in T$$ then there are at most 126 other frequencies in $$T$$ that are at a distance of $$80/m$$ or less from $$f_i$$.
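The count of 126 can be checked with a one-line computation. The sketch below assumes the minimum separation $$\Delta_{\min} = 1.26/m$$ used later in this proof and ignores wrap-around on the unit interval; the function name is hypothetical. Since the $$j$$th closest frequency on either side of $$f_i$$ is at distance at least $$j \Delta_{\min}$$, it lies within $$80/m$$ only if $$j \leq 80/1.26$$.

```python
import math

def max_close_neighbors(radius=80.0, min_separation=1.26):
    """Largest number of other frequencies within radius/m of a fixed f_i.

    Both arguments are in units of 1/m, so the value of m cancels out.
    """
    # The j-th neighbor on each side is at distance >= j * min_separation.
    per_side = math.floor(radius / min_separation)
    return 2 * per_side

print(max_close_neighbors())  # 126
```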
We bound those terms using (E.13) and deal with the rest by applying Lemma E.4,   supfi∑j=1kκℓ|Dm(ℓ)(fi−fj)| ≤126πℓκℓsupf|Dm(ℓ)(f)|+2κℓ∑j=1ksup|f|≥jΔmin|Dm(ℓ)(f)| (E.15)   ≤126πℓ+1m(l)∑j=1k1.1πℓmℓ−14jΔminby Lemma 3.3 and (E.13) (E.16)   ≤130πℓlog⁡ksince Δmin:=1.26m and ∑j=1k1j≤1+log⁡k≤2log⁡k (E.17) as long as $$k$$ is larger than 2 (the argument can be easily modified if this is not the case). By Gershgorin’s circle theorem, the eigenvalues of $$\bar{H}$$, and consequently its operator norm, are bounded by   nmaxi{ ∑j=1k|Dm(fi−fj)|+∑j=1kκ|Dm(1)(fi−fj)|, (E.18)   ∑j=1kκ|Dm(1)(fi−fj)|+∑j=1kκ2|Dm(2)(fi−fj)|}≤260π2nlog⁡k. (E.19) E.2 Proof of Lemma E.2 Under the assumptions of Theorem 2.2   H =∑l=−mmδΩ(l)b(l)b(l)∗, (E.20) where $$\delta_{{\it{\Omega}}}{\left( -m \right)\!}$$, $$\delta_{{\it{\Omega}}}{\left( -m+1 \right)\!}$$,..., $$\delta_{{\it{\Omega}}}{\left( m \right)\!}$$ are i.i.d. Bernouilli random variables with parameter $$\frac{s}{n}$$. We control this sum of independent random matrices using the matrix Bernstein inequality. Theorem E.5 (Matrix Bernstein inequality [71, Theorem 1.4]) Let $${\left\{ {X_l}\right\}\!}$$ be a finite sequence of independent zero-mean self-adjoint random matrices of dimension $$d$$ such that $${\left\lVert{X_l}\right\rVert} \leq B$$ almost surely for a certain constant $$B$$. For all $$t \geq 0$$ and a positive constant $$\sigma^2$$  P{‖∑l=−mmXl‖≥t}≤dexp⁡(−t2/2σ2+Bt/3)as long as‖∑l=−mmE(Xl2)‖≤σ2. (E.21) We apply the matrix Bernstein inequality to the finite sequence of independent adjoint zero-mean random matrices of the form   Xl:=(δΩ(l)−sn)b(l)b(l)∗,−m≤l≤m. (E.22) These random matrices satisfy   H−snH¯ =∑l=−mmXl. (E.23) By Lemma 3.5   ‖Xl‖ ≤sup−m≤l≤m||b(l)||22 (E.24)   ≤B:=10k. 
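(E.25) In addition,

$$\begin{aligned} \sigma^2 &:= \left\| \sum_{l=-m}^{m} \mathbb{E}\left( X_l^2 \right) \right\| && \text{(E.26)} \\ &= \left\| \sum_{l=-m}^{m} \mathbb{E}\left( \left( \delta_{\Omega}(l) - \frac{s}{n} \right)^2 \right) \|\boldsymbol{b}(l)\|_2^2 \, \boldsymbol{b}(l) \boldsymbol{b}(l)^{\ast} \right\| && \text{(E.27)} \\ &\leq 10 \, k \, \frac{s}{n} \left\| \bar{H} \right\| && \text{(E.28)} \\ &\leq 10 \, C_B^2 \, n \, k \left( \log \frac{n}{\epsilon} \right)^{-1} && \text{(E.29)} \end{aligned}$$

by Lemma 3.5, (E.5) and the fact that the variance of a Bernoulli random variable of parameter $$p$$ equals $$p(1-p)$$. Setting $$t := \frac{C_B^2 \, n}{2} \left( \log \frac{n}{\epsilon} \right)^{-1}$$ in Theorem E.5, so that $$\sigma^2 = 20 \, k \, t$$, yields

$$\begin{aligned} \mathrm{P}\left\{ \left\| H - \frac{s}{n} \bar{H} \right\| \geq t \right\} &\leq 2k \exp\left( \frac{-t^2/2}{\sigma^2 + Bt/3} \right) && \text{(E.30)} \\ &= 2k \exp\left( -\frac{3t}{140 \, k} \right). && \text{(E.31)} \end{aligned}$$

The probability is smaller than or equal to $$\epsilon/5$$ as long as

$$k \leq \frac{3 \, C_B^2 \, n}{280} \left( \log \frac{10k}{\epsilon} \log \frac{n}{\epsilon} \right)^{-1}, \qquad \text{(E.32)}$$

which holds by (2.11) if we set $$C_k$$ small enough.

Appendix F. Proof of Lemma 3.7

The proof uses the following concentration bound that controls the deviation of a sum of independent vectors.

Theorem F.1 (Vector Bernstein inequality [15, Theorem 2.6], [40, Theorem 12]) Let $$\mathcal{U} \subset \mathbb{R}^d$$ be a finite sequence of independent zero-mean random vectors such that $$\|\boldsymbol{u}\|_2 \leq B$$ almost surely for all $$\boldsymbol{u} \in \mathcal{U}$$ and $$\sum_{\boldsymbol{u} \in \mathcal{U}} \mathbb{E} \|\boldsymbol{u}\|_2^2 \leq \sigma^2$$, where $$B$$ and $$\sigma^2$$ are positive constants. Then

$$\mathrm{P}\left( \left\| \sum_{\boldsymbol{u} \in \mathcal{U}} \boldsymbol{u} \right\|_2 \geq t \right) \leq \exp\left( -\frac{t^2}{8 \sigma^2} + \frac{1}{4} \right) \quad \text{for } 0 \leq t \leq \frac{\sigma^2}{B}. \qquad \text{(F.1)}$$

By the definitions of $$\bar{K}$$, $$K$$ and $${\boldsymbol{b}}$$ in (3.27), (3.38) and (3.45),

$$\begin{aligned} \boldsymbol{\bar{v}}_{\ell}(f) &= \sum_{l=-m}^{m} (i 2 \pi \kappa l)^{\ell} \, c_l \, e^{i 2 \pi l f} \boldsymbol{b}(l), && \text{(F.2)} \\ \boldsymbol{v}_{\ell}(f) &= \sum_{l=-m}^{m} \delta_{\Omega^c}(l) (i 2 \pi \kappa l)^{\ell} \, c_l \, e^{i 2 \pi l f} \boldsymbol{b}(l), && \text{(F.3)} \end{aligned}$$

where by assumption $$\delta_{\Omega^c}(-m), \ldots, \delta_{\Omega^c}(m)$$ are i.i.d. Bernoulli random variables with parameter $$p := \frac{n-s}{n}$$. This implies that the finite collection of zero-mean random vectors of the form

$$\boldsymbol{u}(\ell, l) := \left( \delta_{\Omega^c}(l) - p \right) (i 2 \pi \kappa l)^{\ell} \, c_l \, e^{i 2 \pi l f} \boldsymbol{b}(l) \qquad \text{(F.4)}$$

satisfies

$$\boldsymbol{v}_{\ell}(f) - p \, \boldsymbol{\bar{v}}_{\ell}(f) = \sum_{l=-m}^{m} \boldsymbol{u}(\ell, l).$$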
(F.5) We have   ||u(ℓ,l)||2 ≤π3||c||∞sup−m≤l≤m||b(l)||2by Lemma (3.3) and ℓ≤3 (F.6)   ≤B:=128kmby Lemmas 3.4 and 3.5,  (F.7) as well as   ∑l=−mmE||u(ℓ,l)||22 =∑l=−mmE((δΩc(l)−p)2)(2πκl)2ℓ|cl|2||b(l)||22 (F.8)   ≤π6nE((δΩc(1)−p)2)||c||∞2sup−m≤l≤m||b(l)||22by Lemma (3.3) (F.9)   ≤σ2:=3.25104km, (F.10) where the last inequality follow from Lemmas 3.4 and 3.5 and $$\mathbb{E} \left({\left({ p - \delta_{{\it{\Omega}}^c}{\left( l \right)\!}}\right)^2}\right) = p{\left( 1-p \right)\!}$$. By the vector Bernstein inequality for $$0 \leq t \leq \sigma^2/B$$ and the union bound, we have   P(supf∈G‖vℓ(f)−pv¯ℓ(f)‖2≥t,ℓ∈{0,1,2,3})≤4|G|exp⁡(−t28σ2+14). (F.11) To make the right-hand side smaller than $$\epsilon /5$$, we fix $$t$$ to equal   t :=σ8(14+log⁡20|G|ϵ). (F.12) This choice of $$t$$ is valid because   tσ =8(14+log⁡20|G|ϵ) (F.13)   ≤74+16log⁡n+8log⁡1ϵ (F.14)   ≤0.315n+8log⁡1ϵ (F.15)  ≤0.32n. (F.16) Inequality (F.15) follows from the fact that $$\sqrt{74 + 16 \log n} \leq 0.315 \sqrt{n}$$ for $$n \geq 2\, 10^3$$. Inequality (F.16) holds by (2.11) and (2.12) as long as we set $$C_k$$ and $$C_s$$ small enough, and either $$k \geq 1$$ or $$s \geq 1$$. This establishes that $$t/\sigma$$ is smaller than $$0.32 \sqrt{n} \leq \sigma/B$$. We conclude that the desired bound holds as long as   Cv(log⁡nϵ)−12 ≥t≥2103kn(14+log⁡8103n2ϵ), (F.17) which is the case by (2.11) if we set $$C_k$$ small enough. Appendix G. Proof of Lemma 3.8 The proof is based on the proof of Lemma 4.4 in [66]. The following lemma establishes that $$\bar{D}$$ is invertible and close to the identity. Lemma G.1 (Proof in Section G.1) Under the assumptions of Theorem 2.2   ‖I−D¯‖ ≤0.468, (G.1)  ‖D¯‖ ≤1.468, (G.2)  ‖D¯−1‖ ≤1.88. 
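(G.3) By the definition of $$\bar{K}$$ and $$K$$ in (3.27) and (3.38), respectively, we can write $$D$$ and $$\bar{D}$$ as sums of self-adjoint matrices,

$$\begin{aligned} \bar{D} &= \sum_{l=-m}^{m} c_l \, \boldsymbol{b}(l) \boldsymbol{b}(l)^{\ast}, && \text{(G.4)} \\ D &= \sum_{l=-m}^{m} \delta_{\Omega^c}(l) \, c_l \, \boldsymbol{b}(l) \boldsymbol{b}(l)^{\ast}, && \text{(G.5)} \end{aligned}$$

where by assumption $$\delta_{\Omega^c}(-m), \ldots, \delta_{\Omega^c}(m)$$ are i.i.d. Bernoulli random variables with parameter $$p := \frac{n-s}{n}$$. In the following lemma, we leverage the matrix Bernstein inequality to establish that $$D$$ concentrates around $$p \, \bar{D}$$.

Lemma G.2 (Proof in Section G.2) Under the assumptions of Theorem 2.2

$$\left\| D - p \bar{D} \right\| \geq \frac{p}{4} \min\left\{ 1, \frac{C_D}{4} \left( \log \frac{n}{\epsilon} \right)^{-\frac{1}{2}} \right\}, \qquad \text{(G.6)}$$

with probability at most $$\epsilon /5$$.

Applying the triangle inequality together with Lemma G.1 allows us to lower bound the smallest singular value of $$D$$ under the assumption that the event in (G.6) does not occur,

$$\begin{aligned} \frac{\sigma_{\min}(D)}{p} &\geq \sigma_{\min}(I) - \left\| I - \bar{D} \right\| - \frac{1}{p} \left\| D - p \bar{D} \right\| && \text{(G.7)} \\ &\geq 0.282. && \text{(G.8)} \end{aligned}$$

This proves that $$D$$ is invertible. To complete the proof we borrow two inequalities from [66].

Lemma G.3 ([66, Appendix E]) For any matrices $$A$$ and $$B$$ such that $$B$$ is invertible and

$$\left\| A - B \right\| \left\| B^{-1} \right\| \leq \frac{1}{2} \qquad \text{(G.9)}$$

we have

$$\begin{aligned} \left\| A^{-1} \right\| &\leq 2 \left\| B^{-1} \right\|, && \text{(G.10)} \\ \left\| A^{-1} - B^{-1} \right\| &\leq 2 \left\| B^{-1} \right\|^2 \left\| A - B \right\|. && \text{(G.11)} \end{aligned}$$

We set $$A := D$$ and $$B := p\bar{D}$$. By Lemmas G.1 and G.2,

$$\left\| D - p \bar{D} \right\| \left\| (p \bar{D})^{-1} \right\| \leq \frac{1}{2}, \qquad \text{(G.12)}$$

with probability at least $$1-\epsilon/5$$. Lemmas G.1, G.2 and G.3 then imply

$$\begin{aligned} \left\| D^{-1} \right\| &\leq 2 \left\| (p \bar{D})^{-1} \right\| && \text{(G.13)} \\ &\leq \frac{4}{p}, && \text{(G.14)} \\ \left\| D^{-1} - (p \bar{D})^{-1} \right\| &\leq 2 \left\| (p \bar{D})^{-1} \right\|^2 \left\| D - p \bar{D} \right\| && \text{(G.15)} \\ &\leq \frac{C_D}{2p} \left( \log \frac{n}{\epsilon} \right)^{-\frac{1}{2}}, && \text{(G.16)} \end{aligned}$$

with the same probability. Finally, if $$s \leq n/2$$, which is the case by (2.12), we have $$1/p \leq 2$$ and the proof is complete.

G.1 Proof of Lemma G.1

The following bounds on the submatrices of $$\bar{D}$$ are obtained by combining Lemma 3.3 with some results borrowed from [38].

Lemma G.4 ([38, Section 4.2]) Under the assumptions of Theorem 2.2

$$\begin{aligned} \left\| I - \bar{D}_0 \right\|_{\infty} &\leq 1.855 \cdot 10^{-2}, && \text{(G.17)} \\ \left\| \bar{D}_1 \right\|_{\infty} &\leq 5.148 \cdot 10^{-2}, && \text{(G.18)} \\ \left\| I - \bar{D}_2 \right\|_{\infty} &\leq 0.416. \end{aligned}$$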
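(G.19) Following a similar argument as in Appendix C of [66] yields the desired result:

$$\begin{aligned} \left\| I - \bar{D} \right\| &\leq \left\| I - \bar{D} \right\|_{\infty} && \text{(G.20)} \\ &\leq \max\left\{ \left\| I - \bar{D}_0 \right\|_{\infty} + \left\| \bar{D}_1 \right\|_{\infty}, \; \left\| I - \bar{D}_2 \right\|_{\infty} + \left\| \bar{D}_1 \right\|_{\infty} \right\} && \text{(G.21)} \\ &\leq 0.468, && \text{(G.22)} \\ \left\| \bar{D} \right\| &\leq 1 + \left\| I - \bar{D} \right\| \leq 1.468, && \text{(G.23)} \\ \left\| \bar{D}^{-1} \right\| &\leq \frac{1}{1 - \left\| I - \bar{D} \right\|_{\infty}} \leq 1.88. && \text{(G.24)} \end{aligned}$$

G.2 Proof of Lemma G.2

We define

$$X_l := \left( p - \delta_{\Omega^c}(l) \right) c_l \, \boldsymbol{b}(l) \boldsymbol{b}(l)^{\ast}, \qquad \text{(G.25)}$$

which has zero mean since

$$\begin{aligned} \mathbb{E}(X_l) &= \left( p - \mathbb{E}\left( \delta_{\Omega^c}(l) \right) \right) c_l \, \boldsymbol{b}(l) \boldsymbol{b}(l)^{\ast} && \text{(G.26)} \\ &= 0. && \text{(G.27)} \end{aligned}$$

By Lemmas 3.4 and 3.5, for any $$-m \leq l \leq m$$,

$$\begin{aligned} \left\| X_l \right\| &\leq \max_{-m \leq l \leq m} \left\| c_l \, \boldsymbol{b}(l) \boldsymbol{b}(l)^{\ast} \right\| && \text{(G.28)} \\ &\leq \|\boldsymbol{c}\|_{\infty} \max_{-m \leq l \leq m} \|\boldsymbol{b}(l)\|_2^2 && \text{(G.29)} \\ &\leq B := \frac{12.6 \, k}{m}. && \text{(G.30)} \end{aligned}$$

Also, $$\mathbb{E}\left( \left( p - \delta_{\Omega^c}(l) \right)^2 \right) = p(1-p)$$, which implies

$$\mathbb{E}\left( X_l^2 \right) = p(1-p) \, c_l^2 \, \|\boldsymbol{b}(l)\|_2^2 \, \boldsymbol{b}(l) \boldsymbol{b}(l)^{\ast}. \qquad \text{(G.31)}$$

Since $$c_l \geq 0$$ for all $$l$$ ($${\boldsymbol{c}}$$ is the convolution of three positive rectangular pulses),

$$\begin{aligned} \sum_{l=-m}^{m} c_l^2 \, \|\boldsymbol{b}(l)\|_2^2 \, \boldsymbol{b}(l) \boldsymbol{b}(l)^{\ast} &\preceq \|\boldsymbol{c}\|_{\infty} \max_{-m \leq l \leq m} \|\boldsymbol{b}(l)\|_2^2 \sum_{l=-m}^{m} c_l \, \boldsymbol{b}(l) \boldsymbol{b}(l)^{\ast} && \text{(G.32)} \\ &\preceq \frac{12.6 \, k}{m} \bar{D} \quad \text{by Lemmas 3.4 and 3.5}, && \text{(G.33)} \end{aligned}$$

so that

$$\begin{aligned} \left\| \sum_{l=-m}^{m} \mathbb{E}\left( X_l^2 \right) \right\| &\leq p \left\| \sum_{l=-m}^{m} c_l^2 \, \|\boldsymbol{b}(l)\|_2^2 \, \boldsymbol{b}(l) \boldsymbol{b}(l)^{\ast} \right\| && \text{(G.34)} \\ &\leq \frac{12.6 \, p \, k \left\| \bar{D} \right\|}{m} && \text{(G.35)} \\ &\leq \sigma^2 := \frac{18.5 \, p \, k}{m} \quad \text{by Lemma G.1}. && \text{(G.36)} \end{aligned}$$

Setting $$t = \frac{p}{4} C_{\min} \left( \log \frac{n}{\epsilon} \right)^{-\frac{1}{2}}$$ where $$C_{\min} := \min\{1, C_D/4\}$$, the matrix Bernstein inequality from Theorem E.5 implies that

$$\begin{aligned} \Pr\left\{ \left\| D - p \bar{D} \right\| > t \right\} &\leq 2k \exp\left( -\frac{C_{\min}^2 \, p \, m}{32 \, k} \left( 18.5 \log \frac{n}{\epsilon} + 1.05 \, C_{\min} \sqrt{\log \frac{n}{\epsilon}} \right)^{-1} \right) \\ &\leq 2k \exp\left( -\frac{C_D' (n-s)}{k \log \frac{n}{\epsilon}} \right) && \text{(G.37)} \end{aligned}$$

for a small enough constant $$C_D'$$. This probability is smaller than $$\epsilon/5$$ as long as

$$\begin{aligned} k &\leq \frac{C_D' \, n}{2} \left( \log \frac{10k}{\epsilon} \log \frac{n}{\epsilon} \right)^{-1}, && \text{(G.38)} \\ s &\leq \frac{n}{2}, && \text{(G.39)} \end{aligned}$$

which holds by (2.11) and (2.12) if we set $$C_k$$ and $$C_s$$ small enough.

Appendix H.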
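Proof of Proposition 3.10

We begin by expressing $$Q^{(\ell)}$$ and $$\bar{Q}^{(\ell)}$$ in terms of $$\boldsymbol{h}$$ and $$\boldsymbol{r}$$,

$$\begin{aligned} \kappa^{\ell} \bar{Q}^{(\ell)}(f) &:= \kappa^{\ell} \sum_{j=1}^{k} \bar{\alpha}_j \bar{K}^{(\ell)}(f - f_j) + \kappa^{\ell+1} \sum_{j=1}^{k} \bar{\beta}_j \bar{K}^{(\ell+1)}(f - f_j) && \text{(H.1)} \\ &= \boldsymbol{\bar{v}}_{\ell}(f)^T \bar{D}^{-1} \begin{bmatrix} \boldsymbol{h} \\ \boldsymbol{0} \end{bmatrix}, && \text{(H.2)} \\ \kappa^{\ell} Q^{(\ell)}(f) &:= \kappa^{\ell} \sum_{j=1}^{k} \alpha_j K^{(\ell)}(f - f_j) + \kappa^{\ell+1} \sum_{j=1}^{k} \beta_j K^{(\ell+1)}(f - f_j) + \kappa^{\ell} R^{(\ell)}(f) && \text{(H.3)} \\ &= \boldsymbol{v}_{\ell}(f)^T D^{-1} \left( \begin{bmatrix} \boldsymbol{h} \\ \boldsymbol{0} \end{bmatrix} - \frac{1}{\sqrt{n}} B_{\Omega} \boldsymbol{r} \right) + \kappa^{\ell} R^{(\ell)}(f). && \text{(H.4)} \end{aligned}$$

The difference between $$Q^{(\ell)}$$ and $$\bar{Q}^{(\ell)}$$ can be decomposed into several terms,

$$\begin{aligned} \kappa^{\ell} Q^{(\ell)}(f) &= \kappa^{\ell} \bar{Q}^{(\ell)}(f) + \kappa^{\ell} R^{(\ell)}(f) + I_1^{(\ell)}(f) + I_2^{(\ell)}(f) + I_3^{(\ell)}(f), && \text{(H.5)} \\ I_1^{(\ell)}(f) &:= -\frac{1}{\sqrt{n}} \boldsymbol{v}_{\ell}(f)^T D^{-1} B_{\Omega} \boldsymbol{r}, && \text{(H.6)} \\ I_2^{(\ell)}(f) &:= \left( \boldsymbol{v}_{\ell}(f) - \frac{n-s}{n} \boldsymbol{\bar{v}}_{\ell}(f) \right)^T D^{-1} \begin{bmatrix} \boldsymbol{h} \\ \boldsymbol{0} \end{bmatrix}, && \text{(H.7)} \\ I_3^{(\ell)}(f) &:= \frac{n-s}{n} \boldsymbol{\bar{v}}_{\ell}(f)^T \left( D^{-1} - \frac{n}{n-s} \bar{D}^{-1} \right) \begin{bmatrix} \boldsymbol{h} \\ \boldsymbol{0} \end{bmatrix}. && \text{(H.8)} \end{aligned}$$

The following lemma provides bounds on these terms that hold with high probability at every point of a grid $$\mathcal{G}$$ that discretizes the unit interval.

Lemma H.1 (Proof in Section H.1) Conditioned on $$\mathcal{E}_{B}^{c} \cap \mathcal{E}_{D}^{c} \cap \mathcal{E}_{v}^{c}$$, the events

$$\mathcal{E}_R := \left\{ \sup_{f \in \mathcal{G}} \left| \kappa^{\ell} R^{(\ell)}(f) \right| \geq \frac{10^{-2}}{8}, \; \ell = 0,1,2,3 \right\} \qquad \text{(H.9)}$$

and

$$\mathcal{E}_i := \left\{ \sup_{f \in \mathcal{G}} \left| I_i^{(\ell)}(f) \right| \geq \frac{10^{-2}}{8}, \; \ell = 0,1,2,3 \right\}, \quad i = 1,2,3, \qquad \text{(H.10)}$$

where $$\mathcal{G} \subseteq [0,1]$$ is an equispaced grid with cardinality $$|\mathcal{G}| = 400 \, n^2$$, each occur with probability at most $$\epsilon / 20$$ under the assumptions of Theorem 2.2.

By the triangle inequality, Lemma H.1 implies

$$\sup_{f \in \mathcal{G}} \left| \kappa^{\ell} Q^{(\ell)}(f) - \kappa^{\ell} \bar{Q}^{(\ell)}(f) \right| \leq \frac{10^{-2}}{2} \qquad \text{(H.11)}$$

with probability at least $$1-\epsilon /5$$ conditioned on $$\mathcal{E}_{B}^{c} \cap \mathcal{E}_{D}^{c} \cap \mathcal{E}_{v}^{c}$$. We have controlled the deviation between $$Q^{(\ell)}$$ and $$\bar{Q}^{(\ell)}$$ on a fine grid. The following result extends the bound to the whole unit interval.

Lemma H.2 (Proof in Section H.3) Under the assumptions of Theorem 2.2

$$\left| \kappa^{\ell} Q^{(\ell)}(f) - \kappa^{\ell} \bar{Q}^{(\ell)}(f) \right| \leq 10^{-2} \quad \text{for } \ell \in \{0,1,2\}.$$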
(H.12) This bound suffices to establish the desired result for values of $$f$$ that lie away from $$T$$. Let us define

$$\begin{aligned} \mathcal{S}_{\mathrm{near}} &:= \left\{ f \; | \; |f - f_j| \leq 0.09 \text{ for some } f_j \in T \right\}, && \text{(H.13)} \\ \mathcal{S}_{\mathrm{far}} &:= [0,1] / \mathcal{S}_{\mathrm{near}}. && \text{(H.14)} \end{aligned}$$

Section 4 of [38] provides a bound on $$\bar{Q}$$ which holds over all of $$\mathcal{S}_{\mathrm{far}}$$ under the minimum-separation condition (2.10) (see Fig. 12 in [38] as well as the code that supplements [38]).

Proposition H.3 (Bound on $$\bar Q$$ [38, Section 4]) Under the assumptions of Theorem 2.2

$$\left| \bar{Q}(f) \right| < 0.99, \quad f \in \mathcal{S}_{\mathrm{far}}. \qquad \text{(H.15)}$$

Combining Lemma H.2 and Proposition H.3,

$$\begin{aligned} |Q(f)| &\leq \left| \bar{Q}(f) \right| + 10^{-2} && \text{(H.16)} \\ &< 1 \quad \text{for all } f \in \mathcal{S}_{\mathrm{far}}. && \text{(H.17)} \end{aligned}$$

To bound $$Q$$ in $$\mathcal{S}_{\mathrm{near}}$$ we recall that, by Corollary 3.9, in $$\mathcal{E}_{D}^c$$ we have $$|Q(f_j)|^2 = 1$$ and

$$\begin{aligned} \frac{\mathrm{d} |Q(f_j)|^2}{\mathrm{d} f} &= 2 \, Q_R^{(1)}(f_j) \, Q_R(f_j) + 2 \, Q_I^{(1)}(f_j) \, Q_I(f_j) && \text{(H.18)} \\ &= 0 && \text{(H.19)} \end{aligned}$$

for every $$f_j$$ in $$T$$. Let $$\tilde{f}$$ be the element in $$T$$ that is closest to an arbitrary $$f$$ belonging to $$\mathcal{S}_{\mathrm{near}}$$. The second-order bound

$$|Q(f)|^2 \leq 1 + (f - \tilde{f})^2 \sup_{f \in \mathcal{S}_{\mathrm{near}}} \frac{\mathrm{d}^2 |Q(f)|^2}{\mathrm{d} f^2} \qquad \text{(H.20)}$$

implies that we only need to show that $$|Q|^2$$ is concave in $$\mathcal{S}_{\mathrm{near}}$$ to complete the proof. First, we bound the derivatives of $$\bar{Q}$$ and $$Q$$ using Bernstein’s polynomial inequality.

Lemma H.4 Under the assumptions of Theorem 2.2, for any $$\ell = 0,1,2, \ldots$$

$$\begin{aligned} \sup_{f \in [0,1]} \left| \kappa^{\ell} \bar{Q}^{(\ell)}(f) \right| &\leq 1, && \text{(H.21)} \\ \sup_{f \in [0,1]} \left| \kappa^{\ell} Q^{(\ell)}(f) \right| &\leq 1.01. && \text{(H.22)} \end{aligned}$$

Proof. $$\bar{Q}$$ is a trigonometric polynomial of degree $$m$$ and its magnitude is bounded by one (see Proposition 2.3 in [38]). Combining Theorem E.3 and Lemma 3.3 yields (H.21). The triangle inequality, Lemma H.2 and (H.21) imply (H.22). □

Section 4 of [38] also provides a bound on the second derivative of $$|\bar{Q}|^2$$, which holds over all of $$\mathcal{S}_{\mathrm{near}}$$ under the minimum-separation condition (2.10) (again, see Fig.
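12 in [38] as well as the code that supplements [38]).

Proposition H.5 (Bound on the second derivative of $$|\bar{Q}|^2$$ [38, Section 4]) Under the assumptions of Theorem 2.2

$$\frac{\mathrm{d}^2 \left| \bar{Q}(f) \right|^2}{\mathrm{d} f^2} \leq -0.8 \, m^2, \quad f \in \mathcal{S}_{\mathrm{near}}. \qquad \text{(H.23)}$$

Combining Proposition H.5, Lemma H.4 and the triangle inequality, as well as the lower bound on $$\kappa$$ from Lemma 3.3, allows us to conclude that the second derivative of $$|Q|^2$$ is negative in $$\mathcal{S}_{\mathrm{near}}$$. Indeed, for any $$f \in \mathcal{S}_{\mathrm{near}}$$

$$\begin{aligned} \frac{\kappa^2}{2} \frac{\mathrm{d}^2 |Q(f)|^2}{\mathrm{d} f^2} &= \kappa^2 Q_R^{(2)}(f) \, Q_R(f) + \kappa^2 Q_I^{(2)}(f) \, Q_I(f) + \left| \kappa \, Q^{(1)}(f) \right|^2 \\ &\leq \frac{\kappa^2}{2} \frac{\mathrm{d}^2 \left| \bar{Q}(f) \right|^2}{\mathrm{d} f^2} + 2 \left| \kappa^2 Q^{(2)}(f) - \kappa^2 \bar{Q}^{(2)}(f) \right| \sup_{f'} |Q(f')| + 2 \left| Q(f) - \bar{Q}(f) \right| \sup_{f'} \left| \kappa^2 \bar{Q}^{(2)}(f') \right| && \text{(H.24)} \\ &\quad + 2 \left| \kappa \, Q^{(1)}(f) - \kappa \, \bar{Q}^{(1)}(f) \right| \left( \sup_{f'} \left| \kappa \, Q^{(1)}(f') \right| + \sup_{f'} \left| \kappa \, \bar{Q}^{(1)}(f') \right| \right) && \text{(H.25)} \\ &\leq -0.087 + 2 \cdot 10^{-2} \left( 4 + 2 \cdot 10^{-2} \right) && \text{(H.26)} \\ &< 0. && \text{(H.27)} \end{aligned}$$

H.1 Proof of Lemma H.1

Following an argument used in [66] (see also [16]), we use Hoeffding’s inequality to bound the different terms.

Theorem H.6 (Hoeffding’s inequality) Let the components of $$\boldsymbol{\tilde{u}}$$ be sampled i.i.d. from a symmetric distribution on the complex unit circle. For any $$\tilde{\epsilon} > 0$$ and any vector $${\boldsymbol{u}}$$

$$\Pr\left( \left| \langle \boldsymbol{\tilde{u}}, \boldsymbol{u} \rangle \right| \geq \tilde{\epsilon} \right) \leq 4 \exp\left( -\frac{\tilde{\epsilon}^2}{4 \|\boldsymbol{u}\|_2^2} \right). \qquad \text{(H.28)}$$

Corollary H.7 Let the components of $$\boldsymbol{\tilde{u}}$$ be sampled i.i.d. from a symmetric distribution on the complex unit circle. For any finite collection of vectors $$\mathcal{U}$$ with cardinality $$4 |\mathcal{G}| = 1600 \, n^2$$, the event

$$\mathcal{E} := \left\{ \left| \langle \boldsymbol{\tilde{u}}, \boldsymbol{u} \rangle \right| > \frac{10^{-2}}{8} \text{ for some } \boldsymbol{u} \in \mathcal{U} \right\} \qquad \text{(H.29)}$$

has probability at most $$\epsilon / 20$$ as long as

$$\|\boldsymbol{u}\|_2^2 \leq C_{\mathcal{U}}^2 \left( \log \frac{n}{\epsilon} \right)^{-1} \quad \text{for all } \boldsymbol{u} \in \mathcal{U}, \qquad \text{(H.30)}$$

where $$C_{\mathcal{U}} := 1/5000$$.

Proof. The result follows directly from Theorem H.6 and the union bound.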
□

Bound on $$\Pr\left( \mathcal{E}_{R} \, | \, \mathcal{E}_{B}^{c} \cap \mathcal{E}_{D}^{c} \cap \mathcal{E}_{v}^{c} \right)$$

We consider the family of vectors

$$\boldsymbol{u}(\ell, f) := \frac{\kappa^{\ell}}{\sqrt{n}} \begin{bmatrix} (i 2 \pi l_1)^{\ell} e^{i 2 \pi l_1 f} & (i 2 \pi l_2)^{\ell} e^{i 2 \pi l_2 f} & \cdots & (i 2 \pi l_s)^{\ell} e^{i 2 \pi l_s f} \end{bmatrix}^T, \qquad \text{(H.31)}$$

where $$\ell \in \{0,1,2,3\}$$ and $$f$$ belongs to $$\mathcal{G}$$, so that $$|\mathcal{U}| = 4 |\mathcal{G}|$$. We have

$$\begin{aligned} \|\boldsymbol{u}(\ell, f)\|_2^2 &\leq \kappa^{2\ell} (2 \pi m)^{2\ell} \frac{s}{n} && \text{(H.32)} \\ &\leq \frac{\pi^6 s}{n} \quad \text{by Lemma 3.3} && \text{(H.33)} \\ &\leq C_{\mathcal{U}}^2 \left( \log \frac{n}{\epsilon} \right)^{-1} \quad \text{by (2.12) if we set } C_s \text{ small enough}. && \text{(H.34)} \end{aligned}$$

The desired result follows by Corollary H.7 because

$$\kappa^{\ell} R^{(\ell)}(f) = \langle \boldsymbol{r}, \boldsymbol{u}(\ell, f) \rangle. \qquad \text{(H.35)}$$

Bound on $$\Pr\left( \mathcal{E}_{1} \, | \, \mathcal{E}_{B}^{c} \cap \mathcal{E}_{D}^{c} \cap \mathcal{E}_{v}^{c} \right)$$

We have

$$I_1^{(\ell)}(f) = \langle \boldsymbol{u}(\ell, f), \boldsymbol{r} \rangle, \qquad \boldsymbol{u}(\ell, f) := -\frac{1}{\sqrt{n}} B_{\Omega}^{\ast} D^{-1} \boldsymbol{v}_{\ell}(f), \qquad \text{(H.36)}$$

where $$\ell \in \{0,1,2,3\}$$ and $$f$$ belongs to $$\mathcal{G}$$, so that $$|\mathcal{U}| = 4 |\mathcal{G}|$$. To bound $$\|\boldsymbol{u}(\ell, f)\|_2$$, we leverage a bound on the $$\ell_2$$ norm of $$\boldsymbol{v}_{\ell}$$ which follows from Lemma 3.7 and the following bound on the $$\ell_2$$ norm of $$\boldsymbol{\bar{v}}_{\ell}$$.

Lemma H.8 (Proof in Section H.2) Under the assumptions of Theorem 2.2, there is a fixed numerical constant $$C_{\boldsymbol{\bar{v}}}$$ such that for any $$f$$

$$\|\boldsymbol{\bar{v}}_{\ell}(f)\|_2 \leq C_{\boldsymbol{\bar{v}}}. \qquad \text{(H.37)}$$

Corollary H.9 In $$\mathcal{E}_{v}^c$$, for any $$f \in \mathcal{G}$$

$$\|\boldsymbol{v}_{\ell}(f)\|_2 \leq C_{\boldsymbol{\bar{v}}} + C_{\boldsymbol{v}}. \qquad \text{(H.38)}$$

Proof. The result follows from Lemma H.8, the triangle inequality and Lemma 3.7. □

Combining Lemma 3.8 and Corollary H.9 yields

$$\begin{aligned} \|\boldsymbol{u}(\ell, f)\|_2 &\leq \frac{1}{\sqrt{n}} \left\| B_{\Omega} \right\| \left\| D^{-1} \right\| \|\boldsymbol{v}_{\ell}(f)\|_2 && \text{(H.39)} \\ &\leq \frac{8 \left( C_{\boldsymbol{\bar{v}}} + C_{\boldsymbol{v}} \right) \left\| B_{\Omega} \right\|}{\sqrt{n}} && \text{(H.40)} \end{aligned}$$

in $$\mathcal{E}_{D}^{c} \cap \mathcal{E}_{v}^{c}$$.
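Corollary H.7 implies the desired result if

$$\left\| B_{\Omega} \right\| \leq C_B \left( \log \frac{n}{\epsilon} \right)^{-\frac{1}{2}} \sqrt{n}, \qquad C_B := \frac{C_{\mathcal{U}}}{8 \left( C_{\boldsymbol{\bar{v}}} + C_{\boldsymbol{v}} \right)}, \qquad \text{(H.41)}$$

which is the case in $$\mathcal{E}_{B}^{c}$$ by Lemma 3.6.

Bound on $$\Pr\left( \mathcal{E}_{2} \, | \, \mathcal{E}_{B}^{c} \cap \mathcal{E}_{D}^{c} \cap \mathcal{E}_{v}^{c} \right)$$

We have

$$I_2^{(\ell)}(f) = \langle \boldsymbol{u}(\ell, f), \boldsymbol{h} \rangle, \qquad \boldsymbol{u}(\ell, f) := P D^{-1} \left( \boldsymbol{v}_{\ell}(f) - \frac{n-s}{n} \boldsymbol{\bar{v}}_{\ell}(f) \right), \qquad \text{(H.42)}$$

where $$P \in \mathbb{R}^{k \times 2k}$$ is the projection matrix that selects the first $$k$$ entries in a vector, $$\ell \in \{0,1,2,3\}$$ and $$f$$ belongs to $$\mathcal{G}$$, so that $$|\mathcal{U}| = 4 |\mathcal{G}|$$. Since $$\left\| P \right\| = 1$$, by Lemma 3.8 in $$\mathcal{E}_{D}^{c}$$

$$\begin{aligned} \|\boldsymbol{u}(\ell, f)\|_2 &\leq \left\| P \right\| \left\| D^{-1} \right\| \left\| \boldsymbol{v}_{\ell}(f) - \frac{n-s}{n} \boldsymbol{\bar{v}}_{\ell}(f) \right\|_2 && \text{(H.43)} \\ &\leq 8 \left\| \boldsymbol{v}_{\ell}(f) - \frac{n-s}{n} \boldsymbol{\bar{v}}_{\ell}(f) \right\|_2. && \text{(H.44)} \end{aligned}$$

The desired result holds if

$$\left\| \boldsymbol{v}_{\ell}(f) - \frac{n-s}{n} \boldsymbol{\bar{v}}_{\ell}(f) \right\|_2 \leq C_{\boldsymbol{v}} \left( \log \frac{n}{\epsilon} \right)^{-\frac{1}{2}}, \qquad C_{\boldsymbol{v}} := \frac{C_{\mathcal{U}}}{8}, \qquad \text{(H.45)}$$

which is the case in $$\mathcal{E}_{v}^{c}$$ by Lemma 3.7.

Bound on $$\Pr\left( \mathcal{E}_{3} \, | \, \mathcal{E}_{B}^{c} \cap \mathcal{E}_{D}^{c} \cap \mathcal{E}_{v}^{c} \right)$$

We have

$$I_3^{(\ell)}(f) = \langle \boldsymbol{u}(\ell, f), \boldsymbol{h} \rangle, \qquad \boldsymbol{u}(\ell, f) := \frac{n-s}{n} P \left( D^{-1} - \frac{n}{n-s} \bar{D}^{-1} \right) \boldsymbol{\bar{v}}_{\ell}(f), \qquad \text{(H.46)}$$

where $$\ell \in \{0,1,2,3\}$$ and $$f$$ belongs to $$\mathcal{G}$$, so that $$|\mathcal{U}| = 4 |\mathcal{G}|$$. Since $$\left\| P \right\| = 1$$, by Lemma H.8

$$\begin{aligned} \|\boldsymbol{u}(\ell, f)\|_2 &\leq \left\| P \right\| \left\| D^{-1} - \frac{n}{n-s} \bar{D}^{-1} \right\| \|\boldsymbol{\bar{v}}_{\ell}(f)\|_2 && \text{(H.47)} \\ &\leq C_{\boldsymbol{\bar{v}}} \left\| D^{-1} - \frac{n}{n-s} \bar{D}^{-1} \right\|. && \text{(H.48)} \end{aligned}$$

The desired result holds if

$$\left\| D^{-1} - \frac{n}{n-s} \bar{D}^{-1} \right\| \leq C_D \left( \log \frac{n}{\epsilon} \right)^{-\frac{1}{2}}, \qquad C_D := \frac{C_{\mathcal{U}}}{C_{\boldsymbol{\bar{v}}}}, \qquad \text{(H.49)}$$

for a fixed numerical constant $$C_{D}$$, which is the case in $$\mathcal{E}_{D}^{c}$$ by Lemma 3.8.

H.2 Proof of Lemma H.8

We use the $$\ell_1$$ norm to bound the $$\ell_2$$ norm of $$\boldsymbol{\bar{v}}_{\ell}(f)$$:

$$\begin{aligned} \|\boldsymbol{\bar{v}}_{\ell}(f)\|_2 &\leq \|\boldsymbol{\bar{v}}_{\ell}(f)\|_1 && \text{(H.50)} \\ &= \sum_{j=1}^{k} \kappa^{\ell} \left| \bar{K}^{(\ell)}(f - f_j) \right| + \sum_{j=1}^{k} \kappa^{\ell+1} \left| \bar{K}^{(\ell+1)}(f - f_j) \right|. && \text{(H.51)} \end{aligned}$$

To bound the sum on the right we leverage some results from [38].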
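Lemma H.10

$$\kappa^{\ell} \left| \bar{K}^{(\ell)}(f) \right| \leq \begin{cases} C_1 & \forall f \in \left[ -\frac{1}{2}, \frac{1}{2} \right], \\ C_2 \, m^{-3} |f|^{-3} & \text{if } \frac{80}{m} \leq |f| \leq \frac{1}{2}, \end{cases} \qquad \text{(H.52)}$$

for suitably chosen numerical constants $$C_1$$ and $$C_2$$.

Proof. The constant bound on the kernel follows from Corollary 4.5, Lemma 4.6 and Lemma C.2 in [38] (see also Figures 14 and 15 in the same paper). The bound for large $$f$$ follows from Lemma C.2 in [38]. □

By the minimum-separation condition (2.10), there are at most 127 elements of $$T$$ that are at a distance of $$80/m$$ or less from $$f$$. We use the first bound in (H.52) to control the contribution of those elements and the second bound to deal with the remaining terms,

$$\begin{aligned} \sum_{j=1}^{k} \kappa^{\ell} \left| \bar{K}^{(\ell)}(f - f_j) \right| &\leq \sum_{j : |f - f_j| < \frac{80}{m}} C_1 + \sum_{j : \frac{80}{m} \leq |f - f_j| \leq \frac{1}{2}} \frac{C_2}{m^3 |f - f_j|^3} && \text{(H.53)} \\ &\leq 127 \, C_1 + 2 \, C_2 \sum_{j=1}^{\infty} \frac{1}{m^3 (j \Delta_{\min})^3} && \text{(H.54)} \\ &\leq 127 \, C_1 + 2 \, C_2 \sum_{j=1}^{\infty} \frac{1}{j^3} && \text{(H.55)} \\ &= 127 \, C_1 + 2 \, C_2 \, \zeta(3), && \text{(H.56)} \end{aligned}$$

where $$\zeta(3)$$ is Apéry’s constant, which is bounded by 1.21. This completes the proof.

H.3. Proof of Lemma H.2

The proof follows a similar argument to the proof of Proposition 4.12 in [66]. We begin by bounding the deviations of $$Q^{(\ell)}$$ and $$\bar{Q}^{(\ell)}$$ on neighboring points.

Lemma H.11 (Proof in Section H.3.1) Under the assumptions of Theorem 2.2, for any $$f_1$$, $$f_2$$ in the unit interval

$$\begin{aligned} \left| \kappa^{\ell} Q^{(\ell)}(f_2) - \kappa^{\ell} Q^{(\ell)}(f_1) \right| &\leq n^2 |f_2 - f_1|, && \text{(H.57)} \\ \left| \kappa^{\ell} \bar{Q}^{(\ell)}(f_2) - \kappa^{\ell} \bar{Q}^{(\ell)}(f_1) \right| &\leq n^2 |f_2 - f_1|. && \text{(H.58)} \end{aligned}$$

For any $$f$$ in the unit interval, there exists a grid point $$f_{\mathcal{G}}$$ such that the distance between the two points is smaller than the step size $$(400 \, n^2)^{-1}$$. This allows us to establish the desired result by combining (H.11) with Lemma H.11 and the triangle inequality,

$$\begin{aligned} \left| \kappa^{\ell} Q^{(\ell)}(f) - \kappa^{\ell} \bar{Q}^{(\ell)}(f) \right| &\leq \left| \kappa^{\ell} Q^{(\ell)}(f) - \kappa^{\ell} Q^{(\ell)}(f_{\mathcal{G}}) \right| + \left| \kappa^{\ell} Q^{(\ell)}(f_{\mathcal{G}}) - \kappa^{\ell} \bar{Q}^{(\ell)}(f_{\mathcal{G}}) \right| && \text{(H.59)} \\ &\quad + \left| \kappa^{\ell} \bar{Q}^{(\ell)}(f_{\mathcal{G}}) - \kappa^{\ell} \bar{Q}^{(\ell)}(f) \right| && \text{(H.60)} \\ &\leq 2 \, n^2 |f - f_{\mathcal{G}}| + 5 \cdot 10^{-3} && \text{(H.61)} \\ &\leq 10^{-2}. && \text{(H.62)} \end{aligned}$$

H.3.1. Proof of Lemma H.11

We first derive a coarse uniform bound on $$Q^{(\ell)}$$ for $$\ell \in \{0,1,2,3\}$$.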
For this, we need bounds on the $$\ell_2$$ norm of $${\boldsymbol{{v_{\ell}}}} {\left( f \right)\!}$$ and the magnitude of $$R^{{\left( \ell \right)\!}} {\left( f \right)\!}$$ that hold over the whole unit interval, not only on a discrete grid. By the definitions of $$K$$ and $${\boldsymbol{b}}{\left( j \right)\!}$$ in (3.38) and (3.45), for any $$f$$  ||vℓ(f)||2 =||∑l∈Ωc(i2πκl)ℓclei2πlfb(l)||2 (H.63)   ≤πℓn||c||∞sup−m≤l≤m||b(l)||2by Lemma 3.3 (H.64)   ≤1.3π3n10kmby Lemmas 3.4 and (3.5) (H.65)   ≤256k. (H.66) Similarly, for any $$f$$  |κℓR(ℓ)(f)| =|λκℓ∑l∈Ω(−i2πl)ℓrle−i2πlf| (H.67)   ≤κℓ(2π)ℓn∑l∈Ωlℓ (H.68)   ≤κℓ(2π)ℓsmℓn (H.69)   ≤4π3snby Lemma 3.3. (H.70) We also derive a coarse bound on the operator norm $$B_{{\it \Omega}}$$  ‖BΩ‖ ≤‖H¯‖ (H.71)   ≤260π2nlog⁡kby Lemma E.1,  (H.72) which holds because $$B_{{\it \Omega}}$$ is a submatrix of a matrix $$\bar{B}$$ such that $$\bar{H}=\bar{B}\bar{B}^{\ast}$$. These bounds together with (H.4), the Cauchy–Schwarz inequality and the triangle inequality imply that in $${\mathcal{{E}}}_{D}^c$$  |κℓQ(ℓ)(f)| ≤||vℓ(f)||2‖D−1‖(||h||2+1n‖BΩ‖||r||2)+|κℓR(ℓ)(f)| (H.73)   ≤5105(k+kslog⁡k) (H.74)   ≤n7by (2.11)and (2.12) if we set Ck and Cs small enough. (H.75) Finally, if we interpret $$Q^{\left({\ell}\right)\!}\left({z}\right)\!$$ as a function of $$z \in \mathbb{C}$$, a generalization of the mean-value theorem yields   |κℓQ(ℓ)(f2)−κℓQ(ℓ)(f1)|≤κℓ|ei2πf2−ei2πf1|supz′⁡ |dQ(ℓ)(z′)dz| (H.76)   ≤2π|f2−f1|κsupf|κℓ+1Q(ℓ+1)(f)| (H.77)   ≤n2|f2−f1|by (H.75) for ℓ∈{0,1,2}. (H.78) The bound on the deviation of $$\bar{Q}^{\ell}$$ is obtained using exactly the same argument together with the bound (H.21). In the case of $$\bar{Q}$$ the bound is extremely coarse, but it suffices for our purpose. Appendix I. Proof of Proposition 3.11 Let $$l$$ be an arbitrary element of $${\it{\Omega}}^c$$. 
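We express the corresponding coefficient $$\boldsymbol{q}_{l}$$ in terms of the sign patterns $$\boldsymbol{h}$$ and $$\boldsymbol{r}$$,

$$\begin{aligned} \boldsymbol{q}_l &= c_l \left( \sum_{j=1}^{k} \alpha_j e^{i 2 \pi l f_j} + i 2 \pi l \kappa \sum_{j=1}^{k} \beta_j e^{i 2 \pi l f_j} \right) && \text{(I.1)} \\ &= c_l \, \boldsymbol{b}(l)^{\ast} \begin{bmatrix} \boldsymbol{\alpha} \\ \boldsymbol{\beta} \end{bmatrix} && \text{(I.2)} \\ &= c_l \, \boldsymbol{b}(l)^{\ast} D^{-1} \left( \begin{bmatrix} \boldsymbol{h} \\ \boldsymbol{0} \end{bmatrix} - \frac{1}{\sqrt{n}} B_{\Omega} \boldsymbol{r} \right) && \text{(I.3)} \\ &= c_l \left( \langle P D^{-1} \boldsymbol{b}(l), \boldsymbol{h} \rangle + \frac{1}{\sqrt{n}} \langle B_{\Omega}^{\ast} D^{-1} \boldsymbol{b}(l), \boldsymbol{r} \rangle \right), && \text{(I.4)} \end{aligned}$$

where $$P \in \mathbb{R}^{k \times 2k}$$ is the projection matrix that selects the first $$k$$ entries in a vector. The bounds

$$\begin{aligned} \|P D^{-1} \boldsymbol{b}(l)\|_2^2 &\leq \left\| P \right\|^2 \left\| D^{-1} \right\|^2 \|\boldsymbol{b}(l)\|_2^2 && \text{(I.5)} \\ &\leq 640 \, k \quad \text{in } \mathcal{E}_D^c \text{ by Lemmas 3.5 and 3.8} && \text{(I.6)} \\ &\leq \frac{0.18^2 \, n}{\log \frac{40}{\epsilon}} \quad \text{by (2.11) if we set } C_k \text{ small enough}, && \text{(I.7)} \end{aligned}$$

and

$$\begin{aligned} \|B_{\Omega}^{\ast} D^{-1} \boldsymbol{b}(l)\|_2^2 &\leq \left\| B_{\Omega} \right\|^2 \left\| D^{-1} \right\|^2 \|\boldsymbol{b}(l)\|_2^2 && \text{(I.8)} \\ &\leq 640 \, C_B^2 \, k \, n \quad \text{in } \mathcal{E}_B^c \cap \mathcal{E}_D^c \text{ by Lemmas 3.6 and 3.8} && \text{(I.9)} \\ &\leq \frac{0.18^2 \, n^2}{\log \frac{40}{\epsilon}} \quad \text{by (2.11) if we set } C_k \text{ small enough}, && \text{(I.10)} \end{aligned}$$

imply by Hoeffding’s inequality (Theorem H.6) that the probability of each of the events

$$\begin{aligned} \left| \langle P D^{-1} \boldsymbol{b}(l), \boldsymbol{h} \rangle \right| &> 0.18 \sqrt{n}, && \text{(I.11)} \\ \left| \langle B_{\Omega}^{\ast} D^{-1} \boldsymbol{b}(l), \boldsymbol{r} \rangle \right| &> 0.18 \, n && \text{(I.12)} \end{aligned}$$

is bounded by $$\epsilon / 10$$. By Lemma 3.4 and the union bound, this implies

$$\begin{aligned} |\boldsymbol{q}_l| &\leq \|\boldsymbol{c}\|_{\infty} \left( \left| \left\langle D^{-1} \boldsymbol{b}(l), \begin{bmatrix} \boldsymbol{h} \\ \boldsymbol{0} \end{bmatrix} \right\rangle \right| + \frac{\left| \langle B_{\Omega}^{\ast} D^{-1} \boldsymbol{b}(l), \boldsymbol{r} \rangle \right|}{\sqrt{n}} \right) && \text{(I.13)} \\ &\leq \frac{2.6}{n} \left( 0.18 \sqrt{n} + 0.18 \sqrt{n} \right) && \text{(I.14)} \\ &< \frac{1}{\sqrt{n}} && \text{(I.15)} \end{aligned}$$

with probability at least $$1-\epsilon/5$$.

Appendix J. Algorithms

J.1 Proof of Lemma 4.3

The problem is equivalent to

$$\begin{aligned} \min_{\tilde{\mu}, \boldsymbol{\tilde{z}}, \boldsymbol{u}} \; \|\tilde{\mu}\|_{\mathrm{TV}} + \lambda \|\boldsymbol{\tilde{z}}\|_1 \quad \text{subject to} \quad & \|\boldsymbol{y} - \boldsymbol{u}\|_2^2 \leq \sigma^2, && \text{(J.1)} \\ & \mathcal{F}_n \tilde{\mu} + \boldsymbol{\tilde{z}} = \boldsymbol{u}, && \text{(J.2)} \end{aligned}$$

where we have introduced an auxiliary primal variable $$\boldsymbol{u} \in \mathbb{C}^{n}$$. Let us define the dual variables $$\boldsymbol{\eta} \in \mathbb{C}^{n}$$ and $$\nu \geq 0$$. The Lagrangian is equal to

$$\begin{aligned} \mathcal{L}(\tilde{\mu}, \boldsymbol{\tilde{z}}, \boldsymbol{u}, \boldsymbol{\eta}, \nu) &= \|\tilde{\mu}\|_{\mathrm{TV}} + \lambda \|\boldsymbol{\tilde{z}}\|_1 + \langle \boldsymbol{u} - \mathcal{F}_n \tilde{\mu} - \boldsymbol{\tilde{z}}, \boldsymbol{\eta} \rangle + \nu \left( \|\boldsymbol{y} - \boldsymbol{u}\|_2^2 - \sigma^2 \right) && \text{(J.3)} \\ &= \|\tilde{\mu}\|_{\mathrm{TV}} - \langle \tilde{\mu}, \mathcal{F}_n^{\ast} \boldsymbol{\eta} \rangle + \lambda \|\boldsymbol{\tilde{z}}\|_1 - \langle \boldsymbol{\tilde{z}}, \boldsymbol{\eta} \rangle + \langle \boldsymbol{u}, \boldsymbol{\eta} \rangle + \nu \left( \|\boldsymbol{y} - \boldsymbol{u}\|_2^2 - \sigma^2 \right). && \text{(J.4)} \end{aligned}$$

To compute the Lagrange dual function, we minimize the value of the Lagrangian over the primal variables [9]. The minimum of

$$\|\tilde{\mu}\|_{\mathrm{TV}} - \langle \tilde{\mu}, \mathcal{F}_n^{\ast} \boldsymbol{\eta} \rangle \qquad \text{(J.5)}$$

over $$\tilde{\mu}$$ is $$-\infty$$ unless (4.9) holds. Moreover, if (4.9) holds then the minimum is at $$\tilde{\mu}=0$$ by Hölder’s inequality.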
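Similarly, minimizing

$$\lambda \| \tilde{\boldsymbol{z}} \|_1 - \left\langle \tilde{\boldsymbol{z}}, \boldsymbol{\eta} \right\rangle \qquad \text{(J.6)}$$

over $$\tilde{\boldsymbol{z}}$$ yields $$-\infty$$ unless (4.10) holds, whereas if (4.10) holds the minimum is attained at $$\tilde{\boldsymbol{z}} = 0$$. All that remains is to minimize

$$\left\langle \boldsymbol{u}, \boldsymbol{\eta} \right\rangle + \nu \left( \| \boldsymbol{y} - \boldsymbol{u} \|_2^2 - \sigma^2 \right) \qquad \text{(J.7)}$$

with respect to $$\boldsymbol{u}$$ (note that (4.9) and (4.10) do not involve $$\boldsymbol{u}$$). The function is convex with respect to $$\boldsymbol{u}$$, so we set the gradient to zero to deduce that the minimum is at $$\boldsymbol{u} = \boldsymbol{y} - \frac{1}{2\nu} \boldsymbol{\eta}$$. Plugging in this value yields the Lagrange dual function

$$\left\langle \boldsymbol{y}, \boldsymbol{\eta} \right\rangle - \frac{1}{4 \nu} \| \boldsymbol{\eta} \|_2^2 - \nu \sigma^2. \qquad \text{(J.8)}$$

The dual problem consists of maximizing the Lagrange dual function subject to $$\nu \geq 0$$, (4.9) and (4.10). For any fixed value of $$\boldsymbol{\eta}$$, maximizing over $$\nu$$ is easy: the expression is concave on the half line $$\nu > 0$$ and its derivative is zero at $$\nu = \| \boldsymbol{\eta} \|_2 / (2 \sigma)$$. Plugging this into (J.8) yields the dual problem (4.8). The reformulation of (4.8) as a semi-definite program is an immediate consequence of the following proposition.

Proposition J.1 (Semi-definite characterization [32, Theorem 4.24], [38, Proposition 2.4]) Let $$\boldsymbol{\eta} \in \mathbb{C}^{n}$$. Then

$$\left| (\mathcal{F}_n^{\ast} \boldsymbol{\eta})(f) \right| \leq 1 \quad \text{for all } f \in [0, 1]$$

if and only if there exists a Hermitian matrix $$\Lambda \in \mathbb{C}^{n \times n}$$ obeying

$$\begin{bmatrix} \Lambda & \boldsymbol{\eta} \\ \boldsymbol{\eta}^{\ast} & 1 \end{bmatrix} \succeq 0, \qquad \mathcal{T}^{\ast}(\Lambda) = \begin{bmatrix} 1 \\ \boldsymbol{0} \end{bmatrix}, \qquad \text{(J.9)}$$

where $$\boldsymbol{0} \in \mathbb{C}^{n-1}$$ is a vector of zeros.

J.2 Proof of Lemma 4.4

The interior of the feasible set of Problem (4.8) contains the origin and is therefore nonempty, so strong duality holds by a generalized Slater condition [54], and we have

$$\begin{aligned}
\sum_{f_j \in \hat{T}} | \hat{\boldsymbol{x}}_j | + \lambda \sum_{l \in \hat{\Omega}} | \hat{\boldsymbol{z}}_l | = \| \hat{\mu} \|_{\mathrm{TV}} + \lambda \| \hat{\boldsymbol{z}} \|_1 &= \left\langle \hat{\boldsymbol{\eta}}, \boldsymbol{y} \right\rangle - \sigma \| \hat{\boldsymbol{\eta}} \|_2 && \text{(J.10)} \\
&\leq \left\langle \hat{\boldsymbol{\eta}}, \boldsymbol{y} \right\rangle - \left\langle \hat{\boldsymbol{\eta}}, \boldsymbol{y} - \mathcal{F}_n \, \hat{\mu} - \hat{\boldsymbol{z}} \right\rangle && \text{(J.11)} \\
&= \left\langle \hat{\boldsymbol{\eta}}, \mathcal{F}_n \, \hat{\mu} + \hat{\boldsymbol{z}} \right\rangle && \text{(J.12)} \\
&= \operatorname{Re} \left[ \sum_{f_j \in \hat{T}} | \hat{\boldsymbol{x}}_j | \, \overline{(\mathcal{F}_n^{\ast} \hat{\boldsymbol{\eta}})(f_j)} \, \frac{\hat{\boldsymbol{x}}_j}{| \hat{\boldsymbol{x}}_j |} + \sum_{l \in \hat{\Omega}} | \hat{\boldsymbol{z}}_l | \, \overline{\hat{\boldsymbol{\eta}}_l} \, \frac{\hat{\boldsymbol{z}}_l}{| \hat{\boldsymbol{z}}_l |} \right]. && \text{(J.13)}
\end{aligned}$$

The inequality (J.11) follows from the Cauchy–Schwarz inequality because $$\{ \hat{\mu}, \hat{\boldsymbol{z}} \}$$ is primal feasible, and hence $$\| \boldsymbol{y} - \mathcal{F}_n \, \hat{\mu} - \hat{\boldsymbol{z}} \|_2 \leq \sigma$$. Due to the constraints (4.9) and (4.10) and Hölder's inequality, the inequality that we have established is only possible if (4.15) and (4.16) hold. The proof is complete.

J.3 Atomic-norm denoising via the alternating direction method of multipliers

We rewrite Problem (4.22) as

$$\begin{aligned}
\min_{\substack{t \in \mathbb{R}, \ \boldsymbol{u} \in \mathbb{C}^{n}, \ \tilde{\boldsymbol{g}} \in \mathbb{C}^{n}, \\ \tilde{\boldsymbol{z}} \in \mathbb{C}^{n}, \ \Psi \in \mathbb{C}^{(n+1) \times (n+1)}}} \quad & \frac{\xi}{2} \left( n \boldsymbol{u}_1 + t \right) + \lambda' \| \tilde{\boldsymbol{z}} \|_1 + \frac{1}{2} \| \boldsymbol{y} - \tilde{\boldsymbol{g}} - \tilde{\boldsymbol{z}} \|_2^2 \\
\text{subject to} \quad & \Psi = \begin{bmatrix} \mathcal{T}(\boldsymbol{u}) & \tilde{\boldsymbol{g}} \\ \tilde{\boldsymbol{g}}^{\ast} & t \end{bmatrix}, && \text{(J.14)} \\
& \Psi \succeq 0, && \text{(J.15)}
\end{aligned}$$

where $$\xi := \frac{1}{\gamma \sqrt{n}}$$ and $$\lambda' := \frac{\lambda}{\gamma}$$. The augmented Lagrangian for this problem is of the form

$$\begin{aligned}
\mathcal{L}_{\rho}(t, \boldsymbol{u}, \tilde{\boldsymbol{g}}, \tilde{\boldsymbol{z}}, \Upsilon, \Psi) := {} & \frac{\xi}{2} \left( n \boldsymbol{u}_1 + t \right) + \lambda' \| \tilde{\boldsymbol{z}} \|_1 + \frac{1}{2} \| \boldsymbol{y} - \tilde{\boldsymbol{g}} - \tilde{\boldsymbol{z}} \|_2^2 + \left\langle \Upsilon, \Psi - \begin{bmatrix} \mathcal{T}(\boldsymbol{u}) & \tilde{\boldsymbol{g}} \\ \tilde{\boldsymbol{g}}^{\ast} & t \end{bmatrix} \right\rangle && \text{(J.16)} \\
& + \frac{\rho}{2} \left\| \Psi - \begin{bmatrix} \mathcal{T}(\boldsymbol{u}) & \tilde{\boldsymbol{g}} \\ \tilde{\boldsymbol{g}}^{\ast} & t \end{bmatrix} \right\|_{F}^2, && \text{(J.17)}
\end{aligned}$$

where $$\rho > 0$$ is a parameter. The alternating direction method of multipliers (ADMM) minimizes the augmented Lagrangian by iteratively applying the updates:

$$\begin{aligned}
t^{(l+1)} &:= \arg\min_{t} \ \mathcal{L}_{\rho}\big(t, \boldsymbol{u}^{(l)}, \tilde{\boldsymbol{g}}^{(l)}, \tilde{\boldsymbol{z}}^{(l)}, \Upsilon^{(l)}, \Psi^{(l)}\big), && \text{(J.18)} \\
\boldsymbol{u}^{(l+1)} &:= \arg\min_{\boldsymbol{u}} \ \mathcal{L}_{\rho}\big(t^{(l)}, \boldsymbol{u}, \tilde{\boldsymbol{g}}^{(l)}, \tilde{\boldsymbol{z}}^{(l)}, \Upsilon^{(l)}, \Psi^{(l)}\big), && \text{(J.19)} \\
\tilde{\boldsymbol{g}}^{(l+1)} &:= \arg\min_{\tilde{\boldsymbol{g}}} \ \mathcal{L}_{\rho}\big(t^{(l)}, \boldsymbol{u}^{(l)}, \tilde{\boldsymbol{g}}, \tilde{\boldsymbol{z}}^{(l)}, \Upsilon^{(l)}, \Psi^{(l)}\big), && \text{(J.20)} \\
\tilde{\boldsymbol{z}}^{(l+1)} &:= \arg\min_{\tilde{\boldsymbol{z}}} \ \mathcal{L}_{\rho}\big(t^{(l)}, \boldsymbol{u}^{(l)}, \tilde{\boldsymbol{g}}^{(l)}, \tilde{\boldsymbol{z}}, \Upsilon^{(l)}, \Psi^{(l)}\big), && \text{(J.21)} \\
\Psi^{(l+1)} &:= \arg\min_{\Psi} \ \mathcal{L}_{\rho}\big(t^{(l)}, \boldsymbol{u}^{(l)}, \tilde{\boldsymbol{g}}^{(l)}, \tilde{\boldsymbol{z}}^{(l)}, \Upsilon^{(l)}, \Psi\big), && \text{(J.22)} \\
\Upsilon^{(l+1)} &:= \Upsilon^{(l)} + \rho \left( \Psi^{(l+1)} - \begin{bmatrix} \mathcal{T}\big(\boldsymbol{u}^{(l+1)}\big) & \tilde{\boldsymbol{g}}^{(l+1)} \\ \big(\tilde{\boldsymbol{g}}^{(l+1)}\big)^{\ast} & t^{(l+1)} \end{bmatrix} \right), && \text{(J.23)}
\end{aligned}$$

where $$l$$ indicates the iteration number. We refer the interested reader to the tutorial [7] and references therein for a justification of these steps and more information on ADMM. For the method to be practical, we need an efficient implementation of all the updates. The augmented Lagrangian is convex and differentiable with respect to $$t$$, $$\boldsymbol{u}$$ and $$\tilde{\boldsymbol{g}}$$, so for these variables we just need to compute the gradient and set it to zero.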
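As a rough illustration of the elementary operations that the updates in this section reduce to, the following NumPy sketch implements the scalar update for $$t$$ in (J.24), the entrywise soft-thresholding operator in (J.31) and the projection onto the positive semi-definite cone used in (J.32). The function names and argument names are hypothetical and chosen for this example; this is not the authors' implementation. The updates involving $$\mathcal{T}$$ and $$\mathcal{T}^{\ast}$$ are left out because they depend on conventions that are defined elsewhere in the paper.

```python
import numpy as np


def t_update(psi_corner, ups_corner, xi, rho):
    """Closed-form minimizer over the scalar t, as in (J.24).

    psi_corner and ups_corner are the (real) bottom-right entries of Psi and Upsilon.
    """
    return psi_corner + (ups_corner - xi / 2.0) / rho


def soft_threshold(z, lam):
    """Entrywise soft-thresholding, as in (J.31), for real or complex input.

    Uses sign(z) * (|z| - lam) = (1 - lam / |z|) * z when |z| > lam, and 0 otherwise.
    """
    z = np.asarray(z)
    mag = np.abs(z)
    keep = mag > lam
    safe_mag = np.where(keep, mag, 1.0)  # avoid dividing by zero where the output is 0
    return np.where(keep, (1.0 - lam / safe_mag) * z, 0.0)


def project_psd(x):
    """Frobenius-norm projection of a Hermitian matrix onto the PSD cone, as in (J.32)."""
    x = np.asarray(x)
    x = (x + x.conj().T) / 2.0  # symmetrize to guard against round-off
    eigvals, eigvecs = np.linalg.eigh(x)
    eigvals = np.clip(eigvals, 0.0, None)  # set negative eigenvalues to zero
    return (eigvecs * eigvals) @ eigvecs.conj().T
```

In an ADMM loop, `soft_threshold` would be applied to $$\boldsymbol{y} - \tilde{\boldsymbol{g}}^{(l)}$$ with threshold $$\lambda'$$, and `project_psd` to the matrix inside the norm in (J.32). The eigenvalue decomposition dominates the cost per iteration, since it scales cubically with $$n$$.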
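Setting these gradients to zero yields the closed-form updates:

$$\begin{aligned}
t^{(l+1)} &= \Psi_{n+1}^{(l)} + \frac{1}{\rho} \left( \Upsilon_{n+1}^{(l)} - \frac{\xi}{2} \right), && \text{(J.24)} \\
\boldsymbol{u}^{(l+1)} &= M \, \mathcal{T}^{\ast} \left( \Psi_{0}^{(l)} + \frac{\Upsilon_{0}^{(l)}}{\rho} \right) - \frac{\xi}{2 \rho} \, \boldsymbol{e}(1), && \text{(J.25)} \\
\tilde{\boldsymbol{g}}^{(l+1)} &= \frac{1}{2 \rho + 1} \left( \boldsymbol{y} - \tilde{\boldsymbol{z}}^{(l)} + 2 \rho \, \boldsymbol{\psi}^{(l)} + 2 \boldsymbol{\upsilon}^{(l)} \right), && \text{(J.26)}
\end{aligned}$$

where $$\boldsymbol{e}(1) := [1, 0, 0, \ldots, 0]^T$$, $$\mathcal{T}^{\ast}$$ outputs a vector whose $$j$$th element is the trace of the $$(j-1)$$th subdiagonal of the input matrix, $$M$$ is a diagonal matrix such that

$$M_{j,j} = \frac{1}{n - j + 1}, \qquad j = 1, \ldots, n, \qquad \text{(J.27)}$$

and

$$\Psi^{(l)} := \begin{bmatrix} \Psi_{0}^{(l)} & \boldsymbol{\psi}^{(l)} \\ \big(\boldsymbol{\psi}^{(l)}\big)^{\ast} & \Psi_{n+1}^{(l)} \end{bmatrix}, \qquad \Upsilon^{(l)} := \begin{bmatrix} \Upsilon_{0}^{(l)} & \boldsymbol{\upsilon}^{(l)} \\ \big(\boldsymbol{\upsilon}^{(l)}\big)^{\ast} & \Upsilon_{n+1}^{(l)} \end{bmatrix}. \qquad \text{(J.28)}$$

Here $$\Psi_{0}^{(l)}$$ and $$\Upsilon_{0}^{(l)}$$ are $$n \times n$$ matrices, $$\boldsymbol{\psi}^{(l)}$$ and $$\boldsymbol{\upsilon}^{(l)}$$ are $$n$$-dimensional vectors, and $$\Psi_{n+1}^{(l)}$$ and $$\Upsilon_{n+1}^{(l)}$$ are scalars. Updating $$\tilde{\boldsymbol{z}}$$ requires solving the problem

$$\min_{\tilde{\boldsymbol{z}}} \ \lambda' \| \tilde{\boldsymbol{z}} \|_1 + \frac{1}{2} \| \boldsymbol{y} - \tilde{\boldsymbol{g}}^{(l)} - \tilde{\boldsymbol{z}} \|_2^2, \qquad \text{(J.29)}$$

which is easily achieved by applying a proximal operator

$$\tilde{\boldsymbol{z}}^{(l+1)} := \operatorname{prox}_{\lambda'} \big( \boldsymbol{y} - \tilde{\boldsymbol{g}}^{(l)} \big), \qquad \text{(J.30)}$$

where for $$1 \leq j \leq n$$

$$\operatorname{prox}_{\lambda'}(\tilde{\boldsymbol{z}})_j := \begin{cases} \operatorname{sign}(\tilde{\boldsymbol{z}}_j) \left( | \tilde{\boldsymbol{z}}_j | - \lambda' \right) & \text{if } | \tilde{\boldsymbol{z}}_j | > \lambda', \\ 0 & \text{otherwise.} \end{cases} \qquad \text{(J.31)}$$

Finally, the update of $$\Psi^{(l)}$$ amounts to a projection onto the positive semi-definite cone

$$\Psi^{(l+1)} = \arg\min_{\Psi \succeq 0} \left\| \Psi - \begin{bmatrix} \mathcal{T}\big(\boldsymbol{u}^{(l)}\big) & \tilde{\boldsymbol{g}}^{(l)} \\ \big(\tilde{\boldsymbol{g}}^{(l)}\big)^{\ast} & t^{(l)} \end{bmatrix} + \frac{1}{\rho} \Upsilon^{(l)} \right\|_{F}^2, \qquad \text{(J.32)}$$

which can be accomplished by computing the eigenvalue decomposition of the matrix and setting all negative eigenvalues to zero.

Footnotes

1 For a concrete example of two signals with a minimum separation of $$0.9 \, \Delta^{\ast}$$ that are almost indistinguishable from data consisting of $$n = 2 \cdot 10^{3}$$ samples, see Fig. 2 of [38].

2 Total variation often also refers to the $$\ell_1$$ norm of the discontinuities of a piecewise constant function, which is a popular regularizer in image processing and other applications [55].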
3 To be precise, Theorem 2.2 assumes $$\lambda := 1/\sqrt{n}$$, but one can check that the whole proof goes through if we set $$\lambda$$ to $$c/\sqrt{n}$$ for any positive constant $$c$$. The only effect is a change in the constants $$C_s$$ and $$C_k$$ in (2.11) and (2.12).

4 To avoid this assumption, one can adapt the width of the three kernels so that the length of their convolution equals $$2m$$ and then recompute the bounds that we borrow from [38].

5 We use the Matlab function fminsearch based on the simplex search method [42].

6 The relative MSE is defined as the $$\ell_2$$ norm of the difference between the clean samples $$\boldsymbol{g}$$ and the estimate, divided by $$\| \boldsymbol{g} \|_2$$.

References

1. Azais J.-M., De Castro Y. & Gamboa F. (2015) Spike detection from inaccurate samplings. Appl. Comput. Harmon. Anal., 38, 177–195.
2. Beatty L. G., George J. D. & Robinson A. Z. (1978) Use of the complex exponential expansion as a signal representation for underwater acoustic calibration. J. Acoust. Soc. Am., 63, 1782–1794.
3. Berni A. J. (1975) Target identification by natural resonance estimation. IEEE Trans. Aerosp. Electron. Syst., 11, 147–154.
4. Bhaskar B., Tang G. & Recht B. (2013) Atomic norm denoising with applications to line spectral estimation. IEEE Trans. Sig. Proc., 61, 5987–5999.
5. Bienvenu G. (1979) Influence of the spatial coherence of the background noise on high resolution passive methods. Proceedings of the International Conference on Acoustics, Speech and Signal Processing, vol. 4. pp. 306–309.
6. Borcea L., Papanicolaou G., Tsogka C. & Berryman J. (2002) Imaging and time reversal in random media. Inverse Prob., 18, 1247.
7. Boyd S., Parikh N., Chu E., Peleato B. & Eckstein J. (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3, 1–122.
8. Boyd N., Schiebinger G. & Recht B. (2017) The alternating descent conditional gradient method for sparse inverse problems. SIAM J. Optimiz., 27, 616–639.
9. Boyd S. P. & Vandenberghe L. (2004) Convex Optimization. Cambridge University Press.
10. Boyer C., De Castro Y. & Salmon J. (2016) Adapting to unknown noise level in sparse deconvolution. arXiv preprint arXiv:1606.04760.
11. Bredies K. & Pikkarainen H. K. (2013) Inverse problems in spaces of measures. ESAIM: Control, Optimisation and Calculus of Variations, 19, 190–218.
12. Candès E. J. & Fernandez-Granda C. (2014) Towards a mathematical theory of super-resolution. Commun. Pure Appl. Math., 67, 906–956.
13. Candès E. J. & Fernandez-Granda C. (2013) Super-resolution from noisy data. J. Fourier Anal. Appl., 19, 1229–1254.
14. Candès E. J., Li X., Ma Y. & Wright J. (2011) Robust principal component analysis? J. ACM, 58, 11.
15. Candès E. J. & Plan Y. (2011) A probabilistic and RIPless theory of compressed sensing. IEEE Trans. Inf. Theory, 57, 7235–7254.
16. Candès E. J. & Romberg J. (2007) Sparsity and incoherence in compressive sampling. Inverse Probl., 23, 969–985.
17. Candès E. J., Romberg J. & Tao T. (2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory, 52, 489–509.
18. Candès E. J. & Tao T. (2005) Decoding by linear programming. IEEE Trans. Inf. Theory, 51, 4203–4215.
19. Candès E. J. & Tao T. (2006) Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inf. Theory, 52, 5406–5425.
20. Candès E. J. & Tao T. (2010) The power of convex relaxation: near-optimal matrix completion. IEEE Trans. Inf. Theory, 56, 2053–2080.
21. Carriere R. & Moses R. L. (1992) High resolution radar target modeling using a modified Prony estimator. IEEE Trans. Antennas Propag., 40, 13–18.
22. Chandrasekaran V., Recht B., Parrilo P. A. & Willsky A. S. (2012) The convex geometry of linear inverse problems. Found. Comput. Math., 12, 805–849.
23. Chandrasekaran V., Sanghavi S., Parrilo P. A. & Willsky A. S. (2011) Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim., 21, 572–596.
24. Chen Y. & Chi Y. (2014) Robust spectral compressed sensing via structured matrix completion. IEEE Trans. Inf. Theory, 60, 6576–6601.
25. Chen S. S., Donoho D. L. & Saunders M. A. (2001) Atomic decomposition by basis pursuit. SIAM Rev., 43, 129–159.
26. De Castro Y. & Gamboa F. (2012) Exact reconstruction using Beurling minimal extrapolation. J. Math. Anal. Appl., 395, 336–354.
27. De Prony B. G. R. (1795) Essai éxperimental et analytique: sur les lois de la dilatabilité de fluides élastique et sur celles de la force expansive de la vapeur de l'alkool, à différentes températures. J. de l'école Polytechnique, 1, 24–76.
28. Donoho D. L. (2006) Compressed sensing. IEEE Trans. Inf. Theory, 52, 1289–1306.
29. Donoho D. L. & Huo X. (2001) Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inf. Theory, 47, 2845–2862.
30. Donoho D. L. & Stark P. B. (1989) Uncertainty principles and signal recovery. SIAM J. Appl. Math., 49, 906–931.
31. Dragotti P. L. & Lu Y. M. (2014) On sparse representation in Fourier and local bases. IEEE Trans. Inf. Theory, 60, 7888–7899.
32. Dumitrescu B. (2007) Positive Trigonometric Polynomials and Signal Processing Applications. Springer.
33. Duval V. & Peyré G. (2014) Exact support recovery for sparse spikes deconvolution. Found. Comput. Math., 15, 1–41.
34. Eftekhari A. & Wakin M. B. (2015) Greed is super: a fast algorithm for super-resolution. arXiv preprint arXiv:1511.03385.
35. Fannjiang A. & Liao W. (2012) Coherence pattern-guided compressive sensing with unresolved grids. SIAM J. Imag. Sci., 5, 179–202.
36. Faxin Y., Yiying S. & Yongtan L. (2001) An effective method of anti-impulsive-disturbance for ship-target detection in HF radar. Radar, 2001 CIE International Conference on, Proceedings. IEEE. pp. 372–375.
37. Fernandez-Granda C. (2013) Support detection in super-resolution. Proceedings of the 10th International Conference on Sampling Theory and Applications. pp. 145–148.
38. Fernandez-Granda C. (2016) Super-resolution of point sources via convex programming. Information Inference. https://doi.org/10.1093/imaiai/iaw005.
39. Grant M., Boyd S. & Ye Y. (2008) CVX: Matlab software for disciplined convex programming.
40. Gross D. (2009) Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inf. Theory, 57, 1548–1566.
41. Harris F. (1978) On the use of windows for harmonic analysis with the discrete Fourier transform. IEEE Proc., 66, 51–83.
42. Lagarias J. C., Reeds J. A., Wright M. H. & Wright P. E. (1998) Convergence properties of the Nelder–Mead simplex method in low dimensions. SIAM J. Optim., 9, 112–147.
43. Leonowicz Z., Lobos T. & Rezmer J. (2003) Advanced spectrum estimation methods for signal analysis in power electronics. IEEE Trans. Ind. Electron., 50, 514–519.
44. Li X. (2013) Compressed sensing and matrix completion with constant proportion of corruptions. Constr. Approx., 37, 73–99.
45. Lu X., Wang J., Ponsford A. M. & Kirlin R. L. (2010) Impulsive noise excision and performance analysis. 2010 IEEE Radar Conference. Washington, DC: IEEE. pp. 1295–1300.
46. Mairal J., Bach F. & Ponce J., et al. (2014) Sparse modeling for image and vision processing. Foundations and Trends® in Computer Graphics and Vision, 8, 85–283.
47. Mallat S. G. & Zhang Z. (1993) Matching pursuits with time-frequency dictionaries. IEEE Trans. Sig. Proc., 41, 3397–3415.
48. McCoy M. B. & Tropp J. A. (2014) Sharp recovery bounds for convex demixing, with applications. Found. Comput. Math., 14, 503–567.
49. Moitra A. (2015) Super-resolution, extremal functions and the condition number of Vandermonde matrices. Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC). ACM, pp. 821–830.
50. Olshausen B. A. & Field D. (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607–609.
51. Pati Y. C., Rezaiifar R. & Krishnaprasad P. (1993) Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. Signals, Systems and Computers, 1993. 1993 Conference Record of The Twenty-Seventh Asilomar Conference on. IEEE. pp. 40–44.
52. Rao N., Shah P. & Wright S. (2014) Forward–backward greedy algorithms for signal demixing. Signals, Systems and Computers, 2014 48th Asilomar Conference on. IEEE. pp. 437–441.
53. Rao N., Shah P. & Wright S. (2015) Forward–backward greedy algorithms for atomic norm regularization. IEEE Trans. Sig. Proc., 63, 5798–5811.
54. Rockafellar R. (1974) Conjugate Duality and Optimization. Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics.
55. Rudin L. I., Osher S. & Fatemi E. (1992) Nonlinear total variation based noise removal algorithms. Physica D, 60, 259–268.
56. Schaeffer A. C. (1941) Inequalities of A. Markoff and S. Bernstein for polynomials and related functions. Bull. Amer. Math. Soc., 47, 565–579.
57. Schmidt R. (1986) Multiple emitter location and signal parameter estimation. IEEE Trans. Antennas Propag., 34, 276–280.
58. Slepian D. (1978) Prolate spheroidal wave functions, Fourier analysis, and uncertainty. V – The discrete case. Bell Syst. Tech. J., 57, 1371–1430.
59. Smith J. O. (2008) Introduction to Digital Filters: with Audio Applications, vol. 2. Julius Smith.
60. Stoica P., Babu P. & Li J. (2011) New method of sparse parameter estimation in separable models and its use for spectral analysis of irregularly sampled data. IEEE Trans. Sig. Proc., 59, 35–47.
61. Stoica P., Moses R., Friedlander B. & Soderstrom T. (1989) Maximum likelihood estimation of the parameters of multiple sinusoids from noisy measurements. IEEE Trans. Acoust. Speech Sig. Proc., 37, 378–392.
62. Stoica P. & Moses R. L. (2005) Spectral Analysis of Signals, 1 edn. Upper Saddle River, NJ: Prentice Hall.
63. Su D. (2016) Compressed sensing with corrupted Fourier measurements. arXiv preprint arXiv:1607.04926.
64. Tang G. (2015) Resolution limits for atomic decompositions via Markov-Bernstein type inequalities. Proceedings of the 10th International Conference on Sampling Theory and Applications. pp. 548–552.
65. Tang G., Bhaskar B. & Recht B. (2015) Near minimax line spectral estimation. IEEE Trans. Inf. Theory, 61, 499–512.
66. Tang G., Bhaskar B., Shah P. & Recht B. (2013) Compressed sensing off the grid. IEEE Trans. Inf. Theory, 59, 7465–7490.
67. Tang G., Bhaskar B. N. & Recht B. (2013) Sparse recovery over continuous dictionaries-just discretize. 2013 Asilomar Conference on Signals, Systems and Computers. pp. 1043–1047.
68. Tang G., Shah P., Bhaskar B. N. & Recht B. (2014) Robust line spectral estimation. Signals, Systems and Computers, 2014 48th Asilomar Conference on. IEEE. pp. 301–305.
69. Tibshirani R. (1996) Regression shrinkage and selection via the lasso. J. Royal Stat. Soc. Ser. B, 58, 267–288.
70. Tropp J. A. (2008) On the linear independence of spikes and sines. J. Fourier Anal. Appl., 14, 838–858.
71. Tropp J. A. (2011) User-friendly tail bounds for sums of random matrices. Found. Comput. Math., 12, 389–434.
72. Viti V., Petrucci C. & Barone P. (1997) Prony methods in NMR spectroscopy. Int. J. Imaging Syst. Technol., 8, 565–571.
73. Yang Z. & Xie L. On gridless sparse methods for line spectral estimation from complete and incomplete data. IEEE Trans. Sig. Proc., 63, 3139–3153.
74. Zeng W.-J., So H. & Huang L. (2013) $$\ell_p$$-MUSIC: Robust direction-of-arrival estimator for impulsive noise environments. IEEE Trans. Sig. Proc., 61, 4296–4308.
75. Zheng L. & Wang X. (2017) Improved NN-JPDAF for joint multiple target tracking and feature extraction. arXiv preprint arXiv:1703.08254.

© The authors 2017. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices). For permissions, please e-mail: journals.permissions@oup.com

### Journal

Information and Inference: A Journal of the IMA, Oxford University Press

Published: Mar 1, 2018
