# Superresolution without separation

## Abstract

This article provides a theoretical analysis of diffraction-limited superresolution, demonstrating that arbitrarily close point sources can be resolved in ideal situations. Precisely, we assume that the incoming signal is a linear combination of $$M$$ shifted copies of a known waveform with unknown shifts and amplitudes, and one only observes a finite collection of evaluations of this signal. We characterize properties of the base waveform such that the exact translations and amplitudes can be recovered from $$2M+1$$ observations. This recovery can be achieved by solving a weighted version of basis pursuit over a continuous dictionary. Our analysis shows that $$\ell_1$$-based methods enjoy the same separation-free recovery guarantees as polynomial root finding techniques, such as de Prony's method or Vetterli's method for signals of finite rate of innovation. Our proof techniques combine classical polynomial interpolation techniques with contemporary tools from compressed sensing.

## 1. Introduction

Imaging below the diffraction limit remains one of the most practically important yet theoretically challenging problems in signal processing. Recent advances in superresolution imaging techniques have made substantial progress toward overcoming these limits in practice [32,46], but theoretical analysis of these powerful methods remains elusive. Building on polynomial interpolation techniques and tools from compressed sensing, this article provides a theoretical analysis of diffraction-limited superresolution, demonstrating that arbitrarily close point sources can be resolved in ideal situations. We assume that the measured signal takes the form

$$x(s)=\sum_{i=1}^{M} c_i\,\psi(s,t_i). \tag{1.1}$$

Here, $$\psi(s,t)$$ is a differentiable function that describes the image at spatial location $$s$$ of a point source of light localized at $$t$$. The function $$\psi$$ is called the point spread function, and we assume its particular form is known beforehand.
In (1.1), $$t_1, \ldots, t_M$$ are the locations of the point sources and $$c_1,\ldots,c_M > 0$$ are their intensities. Throughout we assume that these quantities together with the number of point sources $$M$$ are fixed but unknown. The primary goal of superresolution is to recover the locations and intensities from a set of noiseless observations

$$\{x(s) \,|\, s\in{\mathcal S}\}.$$

Here, $${\mathcal S}$$ is the set of points at which we observe $$x$$; we denote the elements of $${\mathcal S}$$ by $$s_1,\ldots,s_n$$. A mock-up of such a signal $$x$$ is displayed in Fig. 1.

Fig. 1: An illustrative example of (1.1) with the Gaussian point spread function $$\psi(s,t)= \,{\rm e}^{-(s-t)^2}$$. The $$t_i$$ are denoted by red dots, and the true intensities $$c_i$$ are illustrated by vertical, dashed black lines. The superposition resulting in the signal $$x$$ is plotted in blue. The samples $${\mathcal S}$$ would be observed at the tick marks on the horizontal axis.

In this article, building on the work of Candès & Fernandez-Granda [14,15,34] and Tang et al. [9,57,58], we aim to show that we can recover the tuple $$(t_i, c_i, M)$$ by solving a convex optimization problem. We formulate the superresolution imaging problem as an infinite-dimensional optimization over measures. Precisely, note that the observed signal can be rewritten as

$$x(s)=\sum_{i=1}^{M} c_i\,\psi(s,t_i)=\int \psi(s,t)\,{\rm d}\mu_\star(t). \tag{1.2}$$

Here, $$\mu_\star$$ is the positive discrete measure $$\sum_{i=1}^M c_i\delta_{t_i}$$, where $$\delta_t$$ denotes the Dirac measure centered at $$t$$.
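To make the measurement model (1.1) concrete, the following is a minimal numerical sketch with the Gaussian point spread function of Fig. 1. The function names and the choice of sample grid are ours, for illustration only.

```python
import numpy as np

def psf(s, t):
    # Gaussian point spread function psi(s, t) = exp(-(s - t)^2)
    return np.exp(-(s - t) ** 2)

def signal(s, locations, intensities):
    # x(s) = sum_i c_i * psi(s, t_i), the superposition in (1.1)
    s = np.asarray(s, dtype=float)
    return sum(c * psf(s, t) for c, t in zip(intensities, locations))

# Two closely spaced sources with positive intensities c_i > 0
t_true = [-0.1, 0.2]
c_true = [1.0, 0.5]

# In the noiseless theory, n = 2M + 1 = 5 observations suffice
samples = np.linspace(-2.0, 2.0, 5)
x = signal(samples, t_true, c_true)
```

Note that the two Gaussian bumps overlap almost entirely, so $$x$$ looks like a single blurred bump; the point of the article is that the locations are nonetheless identifiable.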
We aim to show that we can recover $$\mu_\star$$ by solving the following:

$$\begin{array}{ll} \underset{\mu}{\text{minimize}} & \int w(t)\,{\rm d}\mu(t)\\ \text{subject to} & x(s)=\int \psi(s,t)\,{\rm d}\mu(t) \quad \text{for } s\in{\mathcal S}\\ & \operatorname{supp}\mu\subset B,\quad \mu\ge 0. \end{array} \tag{1.3}$$

Here, $$B$$ is a fixed compact set and $$w(t)$$ is a weighting function that weights the measure at different locations. The optimization problem (1.3) is over the set of all positive finite measures $$\mu$$ supported on $$B$$, and it is an analog of $$\ell_1$$ minimization over the continuous domain $$B$$. Indeed, if we know a priori that the $$t_i$$ are elements of a finite discrete set $${\it {\Omega}}$$, then optimizing over all measures subject to $$\text{supp}\, \mu \subset {\it {\Omega}}$$ is precisely equivalent to weighted $$\ell_1$$ minimization. This infinite-dimensional analog with uniform weights has proven useful for compressed sensing over continuous domains [58], resolving diffraction-limited images from low-pass signals [14,34,57], system identification [52] and many other applications [21]. We will see below that the weighting function essentially ensures that all of the candidate locations are given equal influence in the optimization problem. Our main result, Theorem 1.4, establishes that for one-dimensional signals, under rather mild conditions, we can recover $$\mu_\star$$ from the optimal solution of (1.3). Our conditions, described in full detail below, essentially require the observation of at least $$2M+1$$ samples, and that the set of translates of the point spread function forms a linearly independent set. In Theorem 1.1, we verify that these conditions are satisfied by the Gaussian point spread function for any $$M$$ source locations with no minimum separation condition. We present an analysis of an $$\ell_1$$-based method that matches the separation-free performance of polynomial root finding techniques [23,27,61]. Our motivation for such an analysis is that $$\ell_1$$-based methods generalize to higher dimensions and are empirically stable in the presence of noise.
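As noted above, restricting (1.3) to a finite grid $${\it{\Omega}}$$ turns it into a weighted $$\ell_1$$ problem, which for positive measures is an ordinary linear program. The following sketch solves that gridded surrogate, assuming SciPy is available and that the true locations happen to lie on the grid; all names and parameter choices are illustrative, not the article's code.

```python
import numpy as np
from scipy.optimize import linprog

def psf(s, t):
    # Gaussian point spread function psi(s, t) = exp(-(s - t)^2)
    return np.exp(-(s - t) ** 2)

# Ground truth: M = 2 sources, both lying on the grid below
t_true, c_true = np.array([-0.5, 0.5]), np.array([1.0, 0.7])
samples = np.linspace(-2.0, 2.0, 9)          # n = 9 > 2M observations
x = psf(samples[:, None], t_true[None, :]) @ c_true

# Discretize B = [-2, 2]; the grid plays the role of Omega
grid = np.linspace(-2.0, 2.0, 41)
Psi = psf(samples[:, None], grid[None, :])   # n x |Omega| dictionary
w = Psi.mean(axis=0)                         # w(t) = (1/n) sum_i psi(s_i, t)

# Gridded form of (1.3):  minimize w^T c  s.t.  Psi c = x,  c >= 0
res = linprog(c=w, A_eq=Psi, b_eq=x, bounds=(0, None), method="highs")
c_hat = res.x
```

On this instance the recovered mass concentrates at the two true grid locations, even though the sources sit well inside one PSF width of each other.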
Boyd et al. [12] show that the problem (1.3) can be optimized to precision $$\epsilon$$ in polynomial time using a greedy algorithm, building on earlier work by Bredies and Pikkarainen [13]. In our experiments in Section 3, we use this algorithm to demonstrate that our theory applies, and to show that even in multiple dimensions with noise, we can recover closely spaced point sources.

### 1.1 Main result

We restrict our theoretical attention to the one-dimensional case, leaving the higher-dimensional cases to future work. Let $$\psi:\mathbb{R}^2 \rightarrow \mathbb{R}$$ be our one-dimensional point spread function, with the first argument denoting the position where we are observing the image of a point source located at the second argument. We assume that $$\psi$$ is differentiable in both arguments. For convenience, we will assume that $$B = [-T,T]$$ for some large scalar $$T$$. However, our proof will trivially extend to more restricted subsets of the real line. Moreover, we will state our results for the special case where $${\mathcal S} = \{ s_1,\ldots,s_n \}$$, although our proof is written for possibly infinite measurement sets. We define the weighting function in the objective of our optimization problem via

$$w(t)=\frac{1}{n}\sum_{i=1}^{n}\psi(s_i,t).$$

Our main result establishes conditions on $$\psi$$ such that the true measure $$\mu_\star$$ is the unique optimal solution of (1.3). Importantly, we show that these conditions are satisfied by the Gaussian point spread function with no separation condition.

Theorem 1.1 Suppose $$|{\mathcal S}|> 2M$$ and $$\psi(s,t) = \,{\rm e}^{-(s-t)^2}$$. Then for any $$t_1 < \ldots < t_M$$, the true measure $$\mu_\star$$ is the unique optimal solution of (1.3).

Before we proceed to state the main result, we need to introduce a bit of notation and define the notion of a Tchebycheff system.
Let $$K(t,\tau) = \frac 1 n \sum_{i=1}^n \psi(s_i,t) \psi(s_i,\tau)$$ and define the vector-valued function $$v : \mathbb{R} \to \mathbb{R}^{2M}$$ via

$$v(s)=\Big[\psi(s,t_1)\ \ldots\ \psi(s,t_M)\ \ \tfrac{\rm d}{{\rm d}t_1}\psi(s,t_1)\ \ldots\ \tfrac{\rm d}{{\rm d}t_M}\psi(s,t_M)\Big]^{\rm T}. \tag{1.4}$$

Definition 1.2 A set of functions $$u_1,\ldots, u_n$$ is called a Tchebycheff system (or T-system) if for any points $$\tau_1 < \ldots < \tau_n$$, the matrix

$$\begin{pmatrix} u_1(\tau_1) & \ldots & u_1(\tau_n)\\ \vdots & & \vdots\\ u_n(\tau_1) & \ldots & u_n(\tau_n) \end{pmatrix}$$

is invertible.

Conditions 1.3 We impose the following three conditions on the point spread function $$\psi$$:

- Positivity: For all $$t\in B$$ we have $$w(t) > 0$$.
- Independence: The matrix $$\frac 1 n \sum_{i=1}^n v(s_i) v(s_i)^{\rm T}$$ is non-singular.
- T-system: $$\{ K(\cdot, t_1), \ldots, K(\cdot,t_M), \frac {\rm d} {{\rm d}t_1} K(\cdot, t_1), \ldots, \frac {\rm d} {{\rm d}t_M} K(\cdot,t_M), w(\cdot) \}$$ form a T-system.

Theorem 1.4 If $$\psi$$ satisfies Conditions 1.3 and $$|{\mathcal S}| > 2M$$, then the true measure $$\mu_\star$$ is the unique optimal solution of (1.3).

Note that the first two parts of Conditions 1.3 are easy to verify. Positivity eliminates the possibility that a candidate point spread function could equal zero at all locations—obviously we would not be able to recover the source in such a setting! Independence is satisfied if

$$\{\psi(\cdot,t_1),\ldots,\psi(\cdot,t_M),\tfrac{\rm d}{{\rm d}t_1}\psi(\cdot,t_1),\ldots,\tfrac{\rm d}{{\rm d}t_M}\psi(\cdot,t_M)\}$$

is a T-system. This condition allows us to recover the amplitudes uniquely, assuming we knew the true $$t_i$$ locations a priori, but it is also useful for constructing a dual certificate as we discuss below. We remark that we actually prove the theorem under a weaker condition than T-system. Define the matrix-valued function $${\it{\Lambda}} : \mathbb{R}^{2M+1} \to \mathbb{R}^{(2M+1)\times(2M+1)}$$ by

$${\it{\Lambda}}(p_1,\ldots,p_{2M+1}):=\begin{bmatrix} \kappa(p_1) & \ldots & \kappa(p_{2M+1})\\ 1 & \ldots & 1 \end{bmatrix}, \tag{1.5}$$

where $$\kappa: \mathbb{R} \to \mathbb{R}^{2M}$$ is defined as

$$\kappa(t)=\frac{1}{n}\sum_{i=1}^{n}\frac{\psi(s_i,t)}{w(t)}\,v(s_i). \tag{1.6}$$

Our proof of Theorem 1.4 replaces condition T-system with the following:

- Determinantal: There exists $$\rho > 0$$ such that for any $$t_i^-,t_i^+ \in (t_i-\rho,t_i+\rho)$$ and $$t\in [-T,T]$$, the matrix $${\it{\Lambda}}\big(t_1^-, t_1^+,\ldots,t_M^-,t_M^+,t\big)$$ is non-singular whenever $$t,t_i^-,t_i^+$$ are distinct.

This condition looks more complicated than T-system and is indeed non-trivial to verify. It is essentially a local T-system condition in the sense that the points $$\tau_i$$ in Definition 1.2 are restricted to lie in a small neighborhood about the $$t_i$$. It is clear that T-system implies Determinantal. The advantage of the more general condition is that it can hold for finitely supported $$\psi$$, whereas this is not true for T-system. In fact, it is easy to see that if T-system holds for any point spread function $$\psi$$, then Determinantal holds for the truncated version $$\psi(s,t)\mathbf{1}\{|s-t|\le 3T\}$$, where $$\mathbf{1}\{x \le y\}$$ is the indicator variable equal to $$1$$ when $$x \le y$$ and zero otherwise. We suspect that Determinantal may hold for significantly tighter truncations.

As we will see below, T-system and Independence are related to the existence of a canonical dual certificate that is used ubiquitously in sparse approximation [16,36]. In compressed sensing, this construction is due to Fuchs [36], but its origins lie in the theory of polynomial interpolation developed by Markov and Tchebycheff, and extended by Gantmacher, Krein, Karlin and others (see the survey in Section 1.2). In the continuous setting of superresolution, the dual certificate becomes a dual polynomial: a function of the form $$Q(t) = \frac 1 n \sum_{j=1}^n \psi(s_j,t) q(s_j)$$ satisfying

$$Q(t) \le w(t) \ \text{ for all } t, \qquad |Q(t_i)| = w(t_i), \quad i=1,\ldots,M. \tag{1.7}$$

To see how T-system might be useful for constructing a dual polynomial, note that as $$t_1^+ \downarrow t_1$$ and $$t_1^-\uparrow t_1$$, the first two columns of $${\it{\Lambda}}(t_1^-,t_1^+,\ldots,t)$$ converge to the same column, namely $$\kappa(t_1)$$. However, if we subtract the first column from the second, divide by the difference $$t_1^+ - t_1^-$$, and take a limit, then we obtain the derivative of that column. In particular, some calculation shows T-system implies

$$\det\begin{bmatrix} A & \kappa(t)\\ \omega^{\rm T} & w(t) \end{bmatrix}\neq 0 \quad \text{for all } t\neq t_i,$$

where $$A = \frac 1 n \sum_{j=1}^n v(s_j) v(s_j)^{\rm T}$$ is the matrix from Independence, and

$$\omega=\big[w(t_1),\ldots,w(t_M),w'(t_1),\ldots,w'(t_M)\big]^{\rm T}.$$

Taking the Schur complement in $$w(t)$$, we find

$$\det\begin{bmatrix} A & \kappa(t)\\ \omega^{\rm T} & w(t) \end{bmatrix}=\det A\,\big[w(t)-\omega^{\rm T}A^{-1}\kappa(t)\big].$$

Hence, it seems like the function $$\omega^{\rm T} A^{-1} \kappa(t)$$ might serve well as our dual polynomial. However, it remains unclear from this short calculation that this function is bounded above by $$w(t)$$. The proof of Theorem 1.4 makes this construction rigorous using the theory of T-systems. Before turning to the proofs of these theorems (cf. Sections 2.1 and 2.4), we survey the mathematical theory of superresolution imaging.

### 1.2 Foundations: Tchebycheff systems

Our proofs rely on the machinery of Tchebycheff systems. This line of work originated in the 1884 doctoral thesis of A. A. Markov on approximating the value of an integral $$\int_a^b f(x) \,{\rm d}x$$ from the moments $$\int_a^b xf(x) \,{\rm d}x$$, $$\ldots$$, $$\int_a^b x^n f(x)\,{\rm d}x$$. His work formed the basis of the proof by Tchebycheff [59] (who was Markov's doctoral advisor) of the central limit theorem in 1887. Recall that we defined a T-system in Definition 1.2. Equivalently, the functions $$u_1,\ldots,u_n$$ form a T-system if and only if every non-trivial linear combination $$U(t) = a_1u_1(t) + \cdots + a_n u_n(t)$$ has at most $$n-1$$ zeros. One natural example of a T-system is given by the functions $$1, t, \ldots, t^{n-1}$$.
Indeed, a polynomial of order $$n-1$$ can have at most $$n-1$$ zeros. Equivalently, the Vandermonde determinant does not vanish:

$$\begin{vmatrix} 1 & 1 & \ldots & 1\\ t_1 & t_2 & \ldots & t_n\\ t_1^2 & t_2^2 & \ldots & t_n^2\\ \vdots & & & \vdots\\ t_1^{n-1} & t_2^{n-1} & \ldots & t_n^{n-1} \end{vmatrix}\neq 0$$

for any $$t_1<\ldots<t_n$$. Just as Vandermonde systems are used to solve polynomial interpolation problems, T-systems allow the generalization of the tools of polynomial fitting to a broader class of nonlinear function-fitting problems. Indeed, given a T-system $$u_1,\ldots,u_n$$, a generalized polynomial is a linear combination $$U(t) = a_1u_1(t) + \cdots + a_n u_n(t)$$. The machinery of T-systems provides a basis for understanding the properties of these generalized polynomials. For a survey of T-systems and their applications in statistics and approximation theory, see [37,41,42]. In particular, many of our proofs are adapted from [42], and we call out the parallel theorems whenever this is the case.

### 1.3 Prior art and related work

Broadly, superresolution techniques enhance the resolution of a sensing system, optical or otherwise; here, resolution means the smallest distance at which distinct sources can still be told apart. The mathematical problem of localizing point sources from a blurred signal has applications in a wide array of empirical sciences: astronomers deconvolve images of stars to angular resolution beyond the Rayleigh limit [48], and biologists capture nanometer resolution images of fluorescent proteins [11,40,51,63]. Detecting neural action potentials from extracellular electrode measurements is fundamental to experimental neuroscience [31], and resolving the poles of a transfer function is fundamental to system identification [52]. To understand a radar signal, one must decompose it into reflections from different sources [38]; and to understand an NMR spectrum, one must decompose it into signatures from different chemicals [57].
The mathematical analysis of point source recovery has a long history going back to the work of de Prony [23], who pioneered techniques for estimating sinusoidal frequencies. de Prony's method is based on algebraically solving for the roots of polynomials and can recover arbitrarily closely spaced frequencies. The annihilation filter technique introduced by Vetterli et al. [61] can perfectly recover any signal of finite rate of innovation with minimal samples. In particular, the theory of signals with finite rate of innovation shows that, given a superposition of pulses of the form $$\sum_k a_k\psi(t-t_k)$$, one can reconstruct the shifts $$t_k$$ and coefficients $$a_k$$ from a minimal number of samples [27,61]. This holds without any separation condition on the $$t_k$$, as long as the base function $$\psi$$ can reproduce polynomials of a certain degree (see [27, Section A.1] for more details). The algorithm used for this reconstruction is, however, based on polynomial rooting techniques that do not easily extend to higher dimensions. Moreover, this algebraic technique is not robust to noise (see the discussion in [56, Section IV.A], for example). In contrast, we study sparse recovery techniques. This line of thought goes back at least to Carathéodory [18,19]. Our contribution is an analysis of $$\ell_1$$-based methods that matches the performance of the algebraic techniques of Vetterli in the one-dimensional and noiseless setting. Our primary motivation is that $$\ell_1$$-based methods may be more stable to noise and trivially generalize to higher dimensions (although our analysis currently does not). It is tempting to apply the theory of compressed sensing [3,16,17,25] to problem (1.3). If one assumes the point sources are located on a finite grid and are well separated, then some of the standard models for recovery are valid (e.g. incoherency, restricted isometry property or restricted eigenvalue property).
With this motivation, many authors solve the gridded form of the superresolution problem in practice [2,4,20,28,29,33,39,49,54,55,63]. However, this approach has some significant drawbacks. The theoretical requirements imposed by the classical models of compressed sensing become more stringent as the grid becomes finer. Furthermore, making the grid finer can also lead to numerical instabilities and computational bottlenecks in practice. Despite recent successes in many empirical disciplines, the theory of superresolution imaging remains limited. Candès and Fernandez-Granda [15] recently made an important contribution to the mathematical analysis of superresolution, demonstrating that semi-infinite optimization could be used to solve the classical de Prony problem. Their proof technique has formed the basis of several other analyses, including that of Bendory et al. [7] and that of Tang et al. [57]. To better compare with our approach, we briefly describe the approach of [7,15,57] here. They construct the vector $$q$$ of a dual polynomial $$Q(t) = \frac 1 n \sum_{j=1}^n \psi(s_j,t) q_j$$ as a linear combination of $$\psi(s,t_i)$$ and $$\frac{{\rm d}}{{\rm d}t_i} \psi(s,t_i)$$. In particular, they define the coefficients of this linear combination as the least squares solution to the system of equations

$$Q(t_i) = \operatorname{sign}(c_i), \quad i=1,\ldots,M, \qquad \frac{\rm d}{{\rm d}t}Q(t)\Big|_{t=t_i} = 0, \quad i=1,\ldots,M. \tag{1.8}$$

They prove that, under a minimum separation condition on the $$t_i$$, the system has a unique solution because the matrix for the system is a perturbation of the identity, hence invertible. Much of the mathematical analysis of superresolution has relied heavily on the assumption that the point sources are separated by more than some minimum amount [5,7,15,24,26,30,45]. We note that in practical situations with noisy observations, some form of minimum separation may be necessary.
One can expect, however, that the required minimum separation should go to zero as the noise level decreases: a property that is not manifest in previous results. Our approach, by contrast, does away with the minimum separation condition by observing that the matrix for the system (1.8) need not be close to the identity to be invertible. Instead, we impose Conditions 1.3 to guarantee invertibility directly. Not surprisingly, we use techniques from T-systems to construct an analog of the polynomial $$Q$$ in (1.8) for our specific problem. Another key difference is that we consider the weighted objective $$\int w(t)\,{\rm d}\mu(t)$$, whereas prior work [7,15,57] has analysed the unweighted objective $$\int\,{\rm d}\mu(t)$$. We, too, could not remove the separation condition without reweighting by $$w(t)$$. In Section 3, we provide evidence that this mathematically motivated reweighting step actually improves performance in practice. Weighting has proven to be a powerful tool in compressed sensing, and many works have shown that weighting an $$\ell_1$$-like cost function can yield improved performance over standard $$\ell_1$$ minimization [35,43,60,62]. To our knowledge, the closest analogy to our use of weights comes from Rauhut and Ward [50], who use weights to balance the influence of the dynamic range of bases in polynomial interpolation problems. In the setting of this article, weights serve to lessen the influence of sources that have low overlap with the observed samples. We are not the first to bring the theory of Tchebycheff systems to bear on the problem of recovering finitely supported measures. Building on the foundational work of Beurling [8], de Castro & Gamboa [22] prove that a finitely supported positive measure $$\mu$$ can be recovered exactly from measurements of the form

$$\Big\{\int u_0\,{\rm d}\mu,\ \ldots,\ \int u_n\,{\rm d}\mu\Big\}$$

whenever $$\{u_0,\ldots,u_n\}$$ form a T-system containing the constant function $$u_0 = 1$$.
These measurements are almost identical to ours; if we set $$u_k(t) = \psi(s_k,t)$$ for $$k = 1,\ldots,n$$, where $$\{s_1,\ldots,s_n\}={\mathcal S}$$ is our measurement set, then our measurements are of the form

$$\{x(s)\,|\,s\in{\mathcal S}\}=\Big\{\int u_1\,{\rm d}\mu,\ \ldots,\ \int u_n\,{\rm d}\mu\Big\}.$$

However, in practice, it is often impossible to directly measure the mass $$\int u_0\,{\rm d}\mu = \int\,{\rm d}\mu$$ as required by the result of [22]. Moreover, the requirement that $$\{1,\psi(s_1,t),\ldots,\psi(s_n,t)\}$$ form a T-system does not hold for the Gaussian point spread function $$\psi(s,t) = \,{\rm e}^{-(s-t)^2}$$ (see Remark 2.6). Therefore, the theory of [22] is not readily applicable to superresolution imaging. We conclude our review of the literature by discussing some prior work on $$\ell_1$$-based superresolution without a minimum separation condition. We would like to mention the work of Fuchs [36] in the case where the point spread function is band limited and the samples lie on a regularly spaced grid. This result also does not require a minimum separation condition. However, our results hold for considerably more general point spread functions and sampling patterns. Finally, in a recent article, Bendory [6] presents an analysis of $$\ell_1$$ minimization in a discrete setup by imposing a Rayleigh regularity condition which, in the absence of noise, requires no minimum separation. Our results are of a different flavor, as our setup is continuous. Furthermore, we require only linear sample complexity, whereas the theory of Bendory [6] requires infinitely many samples.

## 2. Proofs

In this section, we prove Theorem 1.4 and Theorem 1.1. We start by giving a short list of notation to be used throughout the proofs. We write our proofs for an arbitrary measurement set $${\mathcal S}$$, which need not be finite for the sake of the proof. Let $$P$$ denote a fixed positive finite measure on $${\mathcal S}$$ for which $$\psi(\cdot,t) \in L^2_P$$, and set

$$w(t)=\int \psi(s,t)\,{\rm d}P(s).$$
For concreteness, the reader might think of $$P$$ as the uniform measure over $${\mathcal S}$$: if $${\mathcal S}$$ is finite, then $$w(t) = \frac 1 n \sum_{j=1}^n \psi(s_j,t)$$. The particular choice of $$P$$ does not affect the proof.

Notation glossary. We denote the inner product of functions $$f,g\in L^2_{P}$$ by $${\left \langle {f,g} \right \rangle}_{{P}} := \int f(s) g(s) \,{\rm d}P(s)$$. For any differentiable function $$f: \mathbb{R}^2 \to \mathbb{R}$$, we denote the derivative in its first argument by $$\partial_1 f$$ and in its second argument by $$\partial_2 f$$. For $$t \in \mathbb{R}$$, let $$\psi_t(\cdot) = \psi(\cdot,t)$$.

### 2.1 Proof of Theorem 1.4

We prove Theorem 1.4 in two steps. We first reduce the proof to constructing a function $$q$$ such that $${\left \langle {q,\psi_t} \right \rangle}_{{P}}$$ possesses some specific properties.

Proposition 2.1 If the first three items of Conditions 1.3 hold, and if there exists a function $$q$$ such that $$Q(t) := {\left \langle {q,\psi_{t}} \right \rangle}_{{P}}$$ satisfies

$$Q(t_j) = w(t_j), \quad j=1,\ldots,M; \qquad Q(t) < w(t) \quad \text{for } t\in[-T,T],\ t\neq t_j, \tag{2.1}$$

then the true measure $$\mu_\star := \sum_{j=1}^M c_j \delta_{t_j}$$ is the unique optimal solution of the program (1.3).

This proof technique is somewhat standard [16,36]: the function $$Q(t)$$ is called a dual certificate of optimality. However, introducing the function $$w(t)$$ is a novel aspect of our proof; most prior arguments take $$w(t) = 1$$. Note that when $$\int \psi(s,t) \,{\rm d}P(s)$$ is independent of $$t$$, the weight $$w(t)$$ is a constant and we recover the usual method of proof. In the second step, we construct $$q(s)$$ as a linear combination of the $$t_i$$-centered point spread functions $$\psi(s, t_i)$$ and their derivatives $$\partial_2\psi(s, t_i)$$.
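The dual certificate described above can be probed numerically in the Gaussian case. The sketch below, under our own choice of sources and samples (it is not the article's code), solves $$A\,c=\omega$$ and forms $$Q(t)=c^{\rm T}\big(\frac 1 n\sum_i \psi(s_i,t)v(s_i)\big)$$, which interpolates $$w$$ at each $$t_i$$; by the dichotomy in the proof of Theorem 2.2 below, the gap $$Q-w$$ should then keep a single sign.

```python
import numpy as np

# Gaussian PSF and its derivative in the source location t
psi  = lambda s, t: np.exp(-(s - t) ** 2)
dpsi = lambda s, t: 2.0 * (s - t) * np.exp(-(s - t) ** 2)

t_src = np.array([-0.3, 0.4])           # true source locations, M = 2
samples = np.linspace(-2.0, 2.0, 9)     # n = 9 > 2M samples

def v(s):
    # v(s) from (1.4): translates and their derivatives, stacked
    return np.concatenate([psi(s, t_src), dpsi(s, t_src)])

V = np.stack([v(s) for s in samples])   # n x 2M
A = V.T @ V / len(samples)              # matrix from Independence
w  = lambda t: psi(samples[:, None], t).mean(axis=0)
dw = lambda t: dpsi(samples[:, None], t).mean(axis=0)

# omega = [w(t_1), ..., w(t_M), w'(t_1), ..., w'(t_M)]
omega = np.concatenate([w(t_src), dw(t_src)])
coef = np.linalg.solve(A, omega)        # A^{-1} omega

def Q(t):
    # Q(t) = coef^T (1/n) sum_i psi(s_i, t) v(s_i)
    return (psi(samples[:, None], t) * (V @ coef)[:, None]).mean(axis=0)

tt = np.linspace(-2.0, 2.0, 801)
gap = Q(tt) - w(tt)                     # should not change sign
```

A short calculation shows $$\frac 1 n\sum_i \psi(s_i,t_j)v(s_i)$$ is the $$j$$-th column of $$A$$, so $$Q(t_j)=\omega_j=w(t_j)$$ automatically; the sign condition is the part the proof below must establish.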
Theorem 2.2 Under Conditions 1.3, there exist $$\alpha_1,\ldots,\alpha_M,\beta_1,\ldots,\beta_M,c\in \mathbb{R}$$ such that $$Q(t) = \langle q, \psi_t\rangle_P$$ satisfies (2.1), where

$$q(s)=\sum_{i=1}^{M}\Big(\alpha_i\,\psi(s,t_i)+\beta_i\,\frac{\rm d}{{\rm d}t_i}\psi(s,t_i)\Big)+c.$$

To complete the proof of Theorem 1.4, it remains to prove Proposition 2.1 and Theorem 2.2. Their proofs can be found in Sections 2.2 and 2.3, respectively.

### 2.2 Proof of Proposition 2.1

We show that $$\mu_\star$$ is the optimal solution of problem (1.3) through strong duality. The optimization problem (1.3) may have infinitely many constraints, and by [53] its Lagrangian is

$$L(q,\mu)=\int w(t)\,{\rm d}\mu(t)+\int_{\mathcal S} q(s)\Big(x(s)-\int \psi(s,t)\,{\rm d}\mu(t)\Big)\,{\rm d}P(s),$$

where $$q \in L^2_P$$ is the dual variable. Note that the Lagrangian can be rewritten as

$$L(q,\mu) = \int w(t)\,{\rm d}\mu(t)+\int q(s)x(s)\,{\rm d}P(s)-\int\!\!\int_{\mathcal S} q(s)\psi(s,t)\,{\rm d}P(s)\,{\rm d}\mu(t) = \int\big(w(t)-\langle q,\psi_t\rangle_P\big)\,{\rm d}\mu(t)+\langle q,x\rangle_P, \tag{2.2}$$

where we have applied Fubini's theorem to obtain the first equality and rearranged terms to obtain the second. The dual of problem (1.3) is

$$\underset{q}{\text{maximize}}\ \ \underset{\mu\ge 0,\ \operatorname{supp}\mu\subset[-T,T]}{\text{minimize}}\ L(q,\mu).$$

From equation (2.2), we infer that the inner minimization above yields $$\langle q,x\rangle_P$$ if $$w(t) \ge \langle q,\psi_t\rangle_P$$ for all $$t\in [-T,T]$$, and $$-\infty$$ otherwise. Hence, the dual of problem (1.3) is

$$\underset{q}{\text{maximize}}\ \langle q,x\rangle_P \quad \text{subject to} \quad \langle q,\psi_t\rangle_P\le w(t) \ \text{ for } t\in[-T,T]. \tag{2.3}$$

Since the primal (1.3) is equality constrained, Slater's condition naturally holds, implying strong duality. See [53] for background on Slater's condition in the presence of infinitely many constraints. As a consequence, we have

$$\langle q,x\rangle_P=\int w(t)\,{\rm d}\mu(t) \iff q \text{ is dual optimal and } \mu \text{ is primal optimal}.$$

Suppose $$q$$ satisfies (2.1). Then $$q$$ is dual feasible and we have

$$\langle q,x\rangle_P = \sum_{j=1}^{M}c_j\langle q,\psi_{t_j}\rangle_P=\sum_{j=1}^{M}c_j\,Q(t_j) = \int w(t)\,{\rm d}\mu_\star(t).$$

Therefore, $$q$$ is dual optimal and $$\mu_\star$$ is primal optimal. Next we show uniqueness.
Suppose the primal (1.3) has another optimal solution

$$\hat\mu=\sum_{j=1}^{\hat M}\hat c_j\,\delta_{\hat t_j}$$

such that $$\{\hat{t}_1,\ldots,\hat{t}_{\hat M}\} \ne \{t_1,\ldots,t_M\} =: \mathcal{T}$$. Then we have

$$\langle q,x\rangle_P = \sum_{j}\hat c_j\langle q,\psi_{\hat t_j}\rangle_P = \sum_{\hat t_j\in\mathcal T}\hat c_j\,Q(\hat t_j)+\sum_{\hat t_j\notin\mathcal T}\hat c_j\,Q(\hat t_j) < \sum_{\hat t_j\in\mathcal T}\hat c_j\,w(\hat t_j)+\sum_{\hat t_j\notin\mathcal T}\hat c_j\,w(\hat t_j)=\int w(t)\,{\rm d}\hat\mu(t).$$

This contradicts optimality of $$\hat\mu$$. Therefore, all optimal solutions must be supported on $$\{t_1, \ldots, t_M \}$$. We now show that the coefficients of any optimal $$\hat \mu$$ are uniquely determined. By condition Independence, the matrix $$\int v(s) v(s)^{\rm T} \,{\rm d}P(s)$$ is invertible. As it is also positive semidefinite, it is positive definite; in particular, its upper $$M\times M$$ block is also positive definite:

$$\det\int \begin{bmatrix}\psi(s,t_1)\\ \vdots\\ \psi(s,t_M)\end{bmatrix}\begin{bmatrix}\psi(s,t_1) & \ldots & \psi(s,t_M)\end{bmatrix}\,{\rm d}P(s)\neq 0.$$

Hence, there must be $$s_1,\ldots, s_{M} \in {\mathcal S}$$ such that the matrix with entries $$\psi(s_i,t_j)$$ is non-singular. Now consider some optimal $$\hat \mu = \sum_{i=1}^M \hat c_i \delta_{t_i}$$. Since $$\hat \mu$$ is feasible, we have

$$x(s_j)=\sum_{i=1}^{M}\hat c_i\,\psi(s_j,t_i)=\sum_{i=1}^{M}c_i\,\psi(s_j,t_i) \quad \text{for } j=1,\ldots,M.$$

Since the matrix with entries $$\psi(s_i, t_j)$$ is invertible, we conclude that $$\hat c_i = c_i$$ for each $$i$$. Hence, $$\mu_\star$$ is the unique optimal solution of (1.3).

### 2.3 Proof of Theorem 2.2

We construct $$Q(t)$$ via a limiting interpolation argument due to Krein [44]. We have adapted some of our proofs (with non-trivial modifications) from the aforementioned text by Karlin and Studden [42]. We give reference to the specific places where we borrow from classical arguments. In the sequel, we make frequent use of the following elementary manipulation of determinants:

Lemma 2.3 If $$v_0,\ldots,v_{n}$$ are vectors in $$\mathbb{R}^{n}$$, and $$n$$ is even, then

$$\det\begin{bmatrix} v_1-v_0 & \ldots & v_n-v_0 \end{bmatrix} = \det\begin{bmatrix} v_1 & \ldots & v_n & v_0\\ 1 & \ldots & 1 & 1 \end{bmatrix}.$$

Proof. By elementary manipulations, both determinants in the statement of the lemma are equal to

$$\det\begin{bmatrix} v_1-v_0 & \ldots & v_n-v_0 & v_0\\ 0 & \ldots & 0 & 1 \end{bmatrix}. \qquad \square$$

In what follows, we consider $$\epsilon>0$$ such that

$$t_1-\epsilon<t_1+\epsilon<t_2-\epsilon<t_2+\epsilon<\cdots<t_M-\epsilon<t_M+\epsilon.$$
Definition 2.4 A point $$t$$ is a nodal zero of a continuous function $$f:\mathbb R \to \mathbb R$$ if $$f(t) = 0$$ and $$f$$ changes sign at $$t$$. A point $$t$$ is a non-nodal zero if $$f(t) = 0$$, but $$f$$ does not change sign at $$t$$. This distinction is illustrated in Fig. 2.

Fig. 2: The point $$a$$ is a nodal zero of $$f$$, and the point $$b$$ is a non-nodal zero of $$f$$.

Our proof of Theorem 2.2 proceeds as follows. With $$\epsilon$$ fixed, we construct a function

$${\tilde{Q}}_{\epsilon}(t)=\sum_{i=1}^{M}\alpha_{\epsilon}^{[i]}K_P(t,t_i)+\beta_{\epsilon}^{[i]}\partial_2 K_P(t,t_i)$$

such that $${\tilde{Q}}_{\epsilon}(t) = w(t)$$ only at the points $$t = t_j \pm\epsilon$$ for $$j=1, \ldots, M$$, and the points $$t_j\pm\epsilon$$ are nodal zeros of $${\tilde{Q}}_{\epsilon}(t) - w(t)$$. We then consider the limiting function $${\tilde{Q}}(t) = \lim_{\epsilon \downarrow 0} {\tilde{Q}}_\epsilon(t)$$, and prove that either $${\tilde{Q}}(t)$$ satisfies (2.1) or $$2w(t) - {\tilde{Q}}(t)$$ satisfies (2.1). An illustration of this construction is pictured in Fig. 3.

Fig. 3: The relationship between the functions $$w(t)$$, $${\tilde{Q}}_\epsilon(t)$$ and $${\tilde{Q}}(t)$$. The function $${\tilde{Q}}_\epsilon(t)$$ touches $$w(t)$$ only at $$t_i\pm \epsilon$$, and these are nodal zeros of $${\tilde{Q}}_\epsilon(t) - w(t)$$. The function $${\tilde{Q}}(t)$$ touches $$w(t)$$ only at $$t_i$$ and these are non-nodal zeros of $${\tilde{Q}}(t) - w(t)$$.

We begin with the construction of $${\tilde{Q}}_\epsilon$$. We aim to find coefficients $$\alpha_{\epsilon}, \beta_{\epsilon}$$ satisfying

$${\tilde{Q}}_\epsilon(t_i-\epsilon)=w(t_i-\epsilon) \quad\text{and}\quad {\tilde{Q}}_\epsilon(t_i+\epsilon)=w(t_i+\epsilon) \quad\text{for } i=1,\ldots,M.$$

This system of equations is equivalent to the system

$$\begin{aligned} {\tilde{Q}}_\epsilon(t_i-\epsilon) &= w(t_i-\epsilon) && i=1,\ldots,M\\ \frac{{\tilde{Q}}_\epsilon(t_i+\epsilon)-{\tilde{Q}}_\epsilon(t_i-\epsilon)}{2\epsilon} &= \frac{w(t_i+\epsilon)-w(t_i-\epsilon)}{2\epsilon} && i=1,\ldots,M. \end{aligned} \tag{2.4}$$

Note that this is a linear system of equations in $$\alpha_\epsilon,\beta_\epsilon$$ with coefficient matrix given by

$$\mathbf{K}_\epsilon:=\begin{bmatrix} K_P(t_j-\epsilon,t_i) & \partial_2 K_P(t_j-\epsilon,t_i)\\ \frac{1}{2\epsilon}\big(K_P(t_j+\epsilon,t_i)-K_P(t_j-\epsilon,t_i)\big) & \frac{1}{2\epsilon}\big(\partial_2 K_P(t_j+\epsilon,t_i)-\partial_2 K_P(t_j-\epsilon,t_i)\big) \end{bmatrix}.$$

That is, the equation (2.4) can be written as

$$\mathbf{K}_\epsilon\begin{bmatrix}\alpha_\epsilon\\ \beta_\epsilon\end{bmatrix}=\begin{bmatrix} w(t_1-\epsilon)\\ \vdots\\ w(t_M-\epsilon)\\ \frac{1}{2\epsilon}\big(w(t_1+\epsilon)-w(t_1-\epsilon)\big)\\ \vdots\\ \frac{1}{2\epsilon}\big(w(t_M+\epsilon)-w(t_M-\epsilon)\big) \end{bmatrix}.$$

We first show that the matrix $$\mathbf{K}_\epsilon$$ is invertible for all $$\epsilon$$ sufficiently small. Note that as $$\epsilon\to 0$$, the matrix $$\mathbf{K}_{\epsilon}$$ converges to

$$\mathbf{K}:=\begin{bmatrix} K_P(t_j,t_i) & \partial_2 K_P(t_j,t_i)\\ \partial_1 K_P(t_j,t_i) & \partial_1\partial_2 K_P(t_j,t_i) \end{bmatrix}=\int v(s)v(s)^{\rm T}\,{\rm d}P(s),$$

which is positive definite by Independence. Since the entries of $$\mathbf{K}_\epsilon$$ converge to the entries of $$\mathbf{K}$$, there is a $${\it{\Delta}} > 0$$ such that $$\mathbf{K}_\epsilon$$ is invertible for all $$\epsilon \in (0, {\it{\Delta}})$$. Moreover, $$\mathbf K_{\epsilon}^{-1}$$ converges to $$\mathbf K^{-1}$$ as $$\epsilon\to 0$$, and for all $$\epsilon < {\it{\Delta}}$$ the coefficients are uniquely defined as

$$\begin{bmatrix}\alpha_\epsilon\\ \beta_\epsilon\end{bmatrix}=\mathbf{K}_\epsilon^{-1}\begin{bmatrix} w(t_1-\epsilon)\\ \vdots\\ w(t_M-\epsilon)\\ \frac{1}{2\epsilon}\big(w(t_1+\epsilon)-w(t_1-\epsilon)\big)\\ \vdots\\ \frac{1}{2\epsilon}\big(w(t_M+\epsilon)-w(t_M-\epsilon)\big) \end{bmatrix}. \tag{2.5}$$

We denote the corresponding function by

$${\tilde{Q}}_\epsilon(t):=\sum_{i=1}^{M}\alpha_\epsilon^{[i]}K_P(t,t_i)+\beta_\epsilon^{[i]}\partial_2 K_P(t,t_i).$$

Before we construct $${\tilde{Q}}(t)$$, we take a moment to establish the following remarkable consequences of the Determinantal condition. For all $$\epsilon>0$$ sufficiently small, the following hold:

(a) $${\tilde{Q}}_{\epsilon}(t) = w(t)$$ only at the points $$t_1 - \epsilon,t_1+\epsilon, \ldots, t_{M}-\epsilon,t_M+\epsilon$$.
(b) These points $$t_1 - \epsilon,t_1+\epsilon, \ldots, t_{M}-\epsilon,t_M+\epsilon$$ are nodal zeros of $${\tilde{Q}}_{\epsilon}(t)-w(t)$$. We adapted the proofs of (a) and (b) (with non-trivial modification) from the proofs of Theorems 1.6.1 and 1.6.2 of [42].

Proof of $$(a)$$. Suppose for the sake of contradiction that there is a $$\tau \in [-T,T]$$ such that $${\tilde{Q}}_{\epsilon}(\tau) = w(\tau)$$ and $$\tau \notin \{t_1 - \epsilon,t_1 +\epsilon, \ldots,t_M-\epsilon, t_M + \epsilon\}$$. Then we have the system of $$2M$$ linear equations
$$\begin{aligned}\frac{\tilde{Q}_\epsilon(t_j-\epsilon)}{w(t_j-\epsilon)}-\frac{\tilde{Q}_\epsilon(\tau)}{w(\tau)} &= 0 && j=1,\ldots,M\\ \frac{\tilde{Q}_\epsilon(t_j+\epsilon)}{w(t_j+\epsilon)}-\frac{\tilde{Q}_\epsilon(\tau)}{w(\tau)} &= 0 && j=1,\ldots,M.\end{aligned}$$
Rewriting this in matrix form, the coefficient vector $$[ \alpha_\epsilon \;\; \beta_\epsilon ] = [\alpha_\epsilon^{[1]} \cdots \alpha_\epsilon^{[M]}\;\; \beta_\epsilon^{[1]} \cdots \beta_\epsilon^{[M]}]$$ of $${\tilde{Q}}_{\epsilon}$$ satisfies
$$[\alpha_\epsilon \;\; \beta_\epsilon]\begin{pmatrix}\kappa(t_1-\epsilon)-\kappa(\tau) & \kappa(t_1+\epsilon)-\kappa(\tau) & \ldots & \kappa(t_M+\epsilon)-\kappa(\tau)\end{pmatrix}=[0 \;\ldots\; 0].\tag{2.6}$$
By Lemma 2.3 applied to the $$2M+1$$ vectors $$v_1 = \kappa(t_1 - \epsilon), \ldots, v_{2M} = \kappa(t_M+\epsilon)$$ and $$v_0 = \kappa(\tau)$$, the matrix for the system of equations (2.6) is non-singular if and only if the following matrix is non-singular:
$$\begin{bmatrix}\kappa(t_1-\epsilon) & \ldots & \kappa(t_M+\epsilon) & \kappa(\tau)\\ 1 & \ldots & 1 & 1\end{bmatrix}=\Lambda(t_1-\epsilon,\ldots,t_M+\epsilon,\tau).$$
However, this is non-singular by the Determinantal condition. This contradiction completes the proof of part $$(a)$$.

Proof of $$(b)$$. Suppose for the sake of contradiction that $${\tilde{Q}}_{\epsilon}(t) - w(t)$$ has $$N_1 < 2M$$ nodal zeros and $$N_0 = 2M - N_1$$ non-nodal zeros. Denote the nodal zeros by $$\tau_1, \ldots, \tau_{N_1}$$ and the non-nodal zeros by $$z_1, \ldots, z_{N_0}$$. In what follows, we obtain a contradiction by doubling the non-nodal zeros of $${\tilde{Q}}_\epsilon(t) - w(t)$$. We do this by constructing a certain generalized polynomial $$u(t)$$ and adding a small multiple of it to $${\tilde{Q}}_\epsilon(t) - w(t)$$. See Fig. 4 for an illustration.

Fig. 4. The points $$\{\tau_1,\tau_2,\tau_3,\tau_4\}$$ are nodal zeros of $${\tilde{Q}}_\epsilon(t) - w(t)$$, and the points $$\{z_1,z_2,z_3\}$$ are non-nodal zeros. The function $$u(t)$$ has the appropriate sign so that $${\tilde{Q}}_\epsilon(t) - w(t) + \delta u(t)$$ retains nodal zeros at $$\tau_i$$ and obtains two zeros in the vicinity of each $$z_i$$.

We divide the non-nodal zeros into groups according to whether $${\tilde{Q}}_{\epsilon}(t) - w(t)$$ is positive or negative in a small neighborhood around the zero; define
$$I_-:=\{i \,|\, \tilde{Q}_\epsilon \le w \text{ near } z_i\}\quad\text{and}\quad I_+:=\{i \,|\, \tilde{Q}_\epsilon \ge w \text{ near } z_i\}.$$
We first show that there are coefficients $$a_0, \ldots, a_M$$ and $$b_1,\ldots, b_M$$ such that the polynomial
$$u(t)=\sum_{i=1}^M a_i K_P(t,t_i)+\sum_{i=1}^M b_i\,\partial_2 K_P(t,t_i)+a_0 w(t)$$
satisfies the system of equations
$$\begin{aligned}u(z_j) &= +1 && j\in I_-\\ u(z_j) &= -1 && j\in I_+\\ u(\tau_i) &= 0 && i=1,\ldots,N_1\\ u(\tau) &= 0,\end{aligned}\tag{2.7}$$
where $$\tau$$ is some arbitrary additional point. The matrix for this system is
$$\mathbf{W}\begin{pmatrix}\kappa(z_1)^{\mathsf T} & 1\\ \vdots & \vdots\\ \kappa(z_{N_0})^{\mathsf T} & 1\\ \kappa(\tau_1)^{\mathsf T} & 1\\ \vdots & \vdots\\ \kappa(\tau_{N_1})^{\mathsf T} & 1\\ \kappa(\tau)^{\mathsf T} & 1\end{pmatrix},$$
where $$\mathbf{W} = \mathrm{diag}\big(w(z_1),\ldots,w(z_{N_0}),w(\tau_1),\ldots,w(\tau_{N_1}),w(\tau)\big)$$. This matrix is invertible by the Determinantal condition, since the nodal and non-nodal zeros of $${\tilde{Q}}_\epsilon(t) - w(t)$$ are given by $$t_1-\epsilon,\ldots,t_M+\epsilon$$. Hence, there is a solution to the system (2.7). Now consider the function
$$U^\delta(t)=\tilde{Q}_\epsilon(t)+\delta u(t)=\sum_{i=1}^M\big[\alpha_\epsilon^{[i]}+\delta a_i\big]K_P(t,t_i)+\sum_{i=1}^M\big[\beta_\epsilon^{[i]}+\delta b_i\big]\partial_2 K_P(t,t_i)+\delta a_0\, w(t),$$
where $$\delta > 0$$. By construction, $$u(\tau_i) = 0$$, so $$U^\delta(t) - w(t)$$ has nodal zeros at $$\tau_1,\ldots,\tau_{N_1}$$.
We can choose $$\delta$$ small enough so that $$U^\delta(t) - w(t)$$ vanishes twice in the vicinity of each $$z_i$$. This means that $$U^{\delta} (t) - w(t)$$ has at least $$2M + N_0$$ zeros. Assuming $$N_0>0$$, select a subset of these zeros $$p_1 < \ldots < p_{2M+1}$$ such that there are two in each interval $$[t_i - \rho,t_i +\rho]$$. This is possible if $$\epsilon < \rho$$ and $$\delta$$ is sufficiently small. We have the system of $$2M+1$$ equations
$$\sum_{i=1}^M\big[\alpha_\epsilon^{[i]}+\delta a_i\big]K_P(p_k,t_i)+\sum_{i=1}^M\big[\beta_\epsilon^{[i]}+\delta b_i\big]\partial_2 K_P(p_k,t_i) = (1-\delta a_0)\,w(p_k),\qquad k=1,\ldots,2M+1.$$
Dividing the $$k$$th equation by $$w(p_k)$$ and subtracting the last equation from each of the first $$2M$$ equations, we find that
$$\big(\alpha_\epsilon^{[1]}+\delta a_1,\ldots,\beta_\epsilon^{[M]}+\delta b_M\big)\begin{pmatrix}\kappa(p_1)-\kappa(p_{2M+1}) & \ldots & \kappa(p_{2M})-\kappa(p_{2M+1})\end{pmatrix}=(0,\ldots,0).$$
This matrix is non-singular by Lemma 2.3 combined with the Determinantal condition. This contradiction implies that $$N_0 = 0$$, which completes the proof of (b).

We now complete the proof by constructing $${\tilde{Q}}(t)$$ from $${\tilde{Q}}_{\epsilon}(t)$$ by sending $$\epsilon \to 0$$. Note that the coefficients $$\alpha_{\epsilon}, \beta_{\epsilon}$$ converge as $$\epsilon\to 0$$ since the right-hand side of equation (2.5) converges to
$$\mathbf{K}^{-1}\begin{bmatrix}w(t_1)\\ \vdots\\ w(t_M)\\ w'(t_1)\\ \vdots\\ w'(t_M)\end{bmatrix}=\begin{bmatrix}\alpha\\ \beta\end{bmatrix}.$$
We denote the limiting function by
$$\tilde{Q}(t)=\sum_{i=1}^M \alpha_i K_P(t,t_i)+\sum_{i=1}^M \beta_i\,\partial_2 K_P(t,t_i).\tag{2.8}$$
We conclude that $$w(t) - {\tilde{Q}}(t)$$ does not change sign at the $$t_i$$, since $$w(t) - {\tilde{Q}}_\epsilon(t)$$ changes sign only at $$t_i \pm \epsilon$$. We now show that the limiting process does not introduce any additional zeros of $$w(t) - {\tilde{Q}}(t)$$. Suppose $${\tilde{Q}}(t)$$ does touch $$w(t)$$ at some $$\tau_1 \in [-T,T]$$ with $$\tau_1 \ne t_i$$ for all $$i=1, \ldots, M$$. Since $$w(t) - {\tilde{Q}}(t)$$ does not change sign, the points $$t_1,\ldots,t_M,\tau_1$$ are non-nodal zeros of $$w(t) - {\tilde{Q}}(t)$$.
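As a numerical sanity check of the limiting construction (an illustration only, not part of the proof), one can form $$\mathbf K$$ for a Gaussian point spread function and a finite sampling measure, solve for $$(\alpha,\beta)$$ as in the limit of (2.5), and confirm that the resulting $$\tilde Q$$ interpolates $$w$$ at the $$t_i$$. The source locations and sampling grid below are our own illustrative choices:

```python
import numpy as np

# Gaussian point spread function and its derivative in the second argument.
psi  = lambda s, t: np.exp(-(s - t) ** 2)
dpsi = lambda s, t: 2 * (s - t) * np.exp(-(s - t) ** 2)  # d/dt psi(s, t)

S = np.linspace(-3, 3, 400)            # support of the sampling measure P
T = np.array([-0.5, 0.1, 0.7])         # source locations t_1 < ... < t_M
M = len(T)

# K = \int v(s) v(s)^T dP(s) with v(s) = (psi(s, t_i), d2 psi(s, t_i)).
V = np.vstack([psi(S[:, None], T).T, dpsi(S[:, None], T).T])
K = V @ V.T / len(S)

w  = psi(S[:, None], T).mean(axis=0)   # w(t_i)  = \int psi(s, t_i) dP(s)
dw = dpsi(S[:, None], T).mean(axis=0)  # w'(t_i) = \int d2 psi(s, t_i) dP(s)

# Limiting coefficients: K [alpha; beta] = [w(t_i); w'(t_i)].
ab = np.linalg.solve(K, np.concatenate([w, dw]))

def Qtilde(t):
    # Q~(t) = sum_i alpha_i K_P(t, t_i) + beta_i d2 K_P(t, t_i)
    KPt  = np.array([np.mean(psi(S, t) * psi(S, ti)) for ti in T])
    dKPt = np.array([np.mean(psi(S, t) * dpsi(S, ti)) for ti in T])
    return ab[:M] @ KPt + ab[M:] @ dKPt

assert np.linalg.eigvalsh(K).min() > 1e-8          # Independence: K is pd
assert all(abs(Qtilde(ti) - wi) < 1e-8 for ti, wi in zip(T, w))
```

The two assertions check, respectively, that the discretized $$\mathbf K$$ is positive definite and that $$\tilde Q(t_i) = w(t_i)$$ for every source location, as the construction requires.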
We find a contradiction by constructing a polynomial with two nodal zeros in the vicinity of each of these $$M+1$$ points (but possibly only one nodal zero in the vicinity of $$\tau_1$$ if $$\tau_1=T$$ or $$\tau_1 = -T$$). For sufficiently small $$\gamma>0$$, the polynomial
$$W_\gamma(t)=\tilde{Q}(t)+\gamma\, w(t)$$
attains the value $$w(t)$$ twice in the vicinity of each $$t_i$$ and twice in the vicinity of $$\tau_1$$. In other words, there exist $$p_1 < \ldots < p_{2M+2}$$ such that $$W_{\gamma}(p_i) = w(p_i)$$. Therefore,
$$\tilde{Q}(p_i)=(1-\gamma)\,w(p_i)\qquad\text{for } i=1,\ldots,2M+2,$$
and so $$\frac{{\tilde{Q}}(p_i)}{w(p_i)} - \frac{{\tilde{Q}}(p_{2M+1})}{w(p_{2M+1})} = 0$$ for $$i=1,2,\ldots,2M$$. Thus, the coefficient vector of the polynomial $${\tilde{Q}}(t)$$ lies in the left nullspace of the matrix
$$\begin{pmatrix}\kappa(p_1)-\kappa(p_{2M+1}) & \ldots & \kappa(p_{2M})-\kappa(p_{2M+1})\end{pmatrix}.$$
However, this matrix is non-singular by Lemma 2.3 and the Determinantal condition. Collecting our results, we have proven that $${\tilde{Q}}(t) - w(t) = 0$$ if and only if $$t = t_i$$, and that $${\tilde{Q}}(t) - w(t)$$ does not change sign when $$t$$ passes through $$t_i$$. Therefore, one of the following is true:
$$w(t)\ge \tilde{Q}(t)\qquad\text{or}\qquad \tilde{Q}(t)\ge w(t),$$
with equality iff $$t = t_i$$. In the first case, $$Q(t) = {\tilde{Q}}(t)$$ fulfills the prescriptions (2.1) with
$$q(s)=\sum_{i=1}^M \big[\alpha_i\,\psi(s,t_i)+\beta_i\,\partial_2\psi(s,t_i)\big].$$
In the second case, $$Q(t) = 2w(t) - {\tilde{Q}}(t)$$ satisfies (2.1) with
$$q(s)=2-\sum_{i=1}^M \big[\alpha_i\,\psi(s,t_i)+\beta_i\,\partial_2\psi(s,t_i)\big].$$

2.4 Proof of Theorem 1.1

Integrability and Positivity naturally hold for the Gaussian point spread function $$\psi(s,t) = \,{\rm e}^{-(s-t)^2}$$. Independence holds because $$\psi(s, t_1), \ldots, \psi(s, t_M)$$, together with their derivatives $$\partial_2\psi(s, t_1), \ldots, \partial_2\psi(s, t_M)$$, form a T-system (see, for example, [42]). This means that for any $$s_1 < \ldots < s_{2M} \in \mathbb{R}$$,
$$\big|v(s_1)\;\ldots\;v(s_{2M})\big|\ne 0,$$
and the determinant always takes the same sign. Therefore, by an integral version of the Cauchy–Binet formula for the determinant (cf.
[41]),
$$\left|\int v(s)v(s)^{\mathsf T}\,{\rm d}P(s)\right|=(2M)!\int_{s_1<\ldots<s_{2M}}\big|v(s_1)\;\ldots\;v(s_{2M})\big|\begin{vmatrix}v(s_1)^{\mathsf T}\\ \vdots\\ v(s_{2M})^{\mathsf T}\end{vmatrix}\,{\rm d}P(s_1)\ldots {\rm d}P(s_{2M})\ne 0.$$
To establish the Determinantal condition, we prove the slightly stronger statement:
$$\big|\Lambda(p_1,\ldots,p_{2M+1})\big|=\left|\int\begin{bmatrix}v(s)\\ 1\end{bmatrix}\begin{bmatrix}\frac{\psi(s,p_1)}{w(p_1)} & \ldots & \frac{\psi(s,p_{2M+1})}{w(p_{2M+1})}\end{bmatrix}{\rm d}P(s)\right|\ne 0\tag{2.9}$$
for any distinct $$p_1,\ldots,p_{2M+1}$$. When the $$p_1,\ldots,p_{2M+1}$$ are restricted so that two points $$p_i,p_j$$ lie in each ball $$(t_k-\rho,t_k +\rho)$$, we recover the statement of Determinantal. We prove (2.9) with the following key lemma.

Lemma 2.5 For any $$s_1< \ldots < s_{2M+1}$$ and $$t_1 < \ldots < t_{M}$$,
$$\begin{vmatrix}
{\rm e}^{-(s_1-t_1)^2} & \cdots & {\rm e}^{-(s_{2M+1}-t_1)^2}\\
-(s_1-t_1)\,{\rm e}^{-(s_1-t_1)^2} & \cdots & -(s_{2M+1}-t_1)\,{\rm e}^{-(s_{2M+1}-t_1)^2}\\
\vdots & & \vdots\\
{\rm e}^{-(s_1-t_M)^2} & \cdots & {\rm e}^{-(s_{2M+1}-t_M)^2}\\
-(s_1-t_M)\,{\rm e}^{-(s_1-t_M)^2} & \cdots & -(s_{2M+1}-t_M)\,{\rm e}^{-(s_{2M+1}-t_M)^2}\\
1 & \cdots & 1
\end{vmatrix}\ne 0.$$
Before proving this lemma, we show how it can be used to prove (2.9). By Lemma 2.5, we know in particular that for any $$s_1<\cdots < s_{2M+1}$$,
$$\det\begin{bmatrix}v(s_1) & \cdots & v(s_{2M+1})\\ 1 & \cdots & 1\end{bmatrix}\ne 0$$
and is always of the same sign. Moreover, for any $$s_1 < \cdots < s_{2M+1}$$ and any $$p_1<\ldots<p_{2M+1}$$,
$$\det\begin{bmatrix}\psi(s_1,p_1) & \ldots & \psi(s_1,p_{2M+1})\\ \vdots & & \vdots\\ \psi(s_{2M+1},p_1) & \ldots & \psi(s_{2M+1},p_{2M+1})\end{bmatrix}>0.$$
Any function with this property is called totally positive, and it is well known that the Gaussian kernel is totally positive [42]. Now, to show that Determinantal holds for the finite sampling measure $$P$$, we use an integral version of the Cauchy–Binet formula for the determinant:
$$\left|\int\begin{bmatrix}v(s)\\ 1\end{bmatrix}\begin{bmatrix}\frac{\psi(s,p_1)}{w(p_1)} & \ldots & \frac{\psi(s,p_{2M+1})}{w(p_{2M+1})}\end{bmatrix}{\rm d}P(s)\right|=(2M+1)!\int_{s_1<\cdots<s_{2M+1}}\begin{vmatrix}v(s_1) & \cdots & v(s_{2M+1})\\ 1 & \cdots & 1\end{vmatrix}\begin{vmatrix}\frac{\psi(s_1,p_1)}{w(p_1)} & \ldots & \frac{\psi(s_1,p_{2M+1})}{w(p_{2M+1})}\\ \vdots & & \vdots\\ \frac{\psi(s_{2M+1},p_1)}{w(p_1)} & \ldots & \frac{\psi(s_{2M+1},p_{2M+1})}{w(p_{2M+1})}\end{vmatrix}\,{\rm d}P(s_1)\ldots {\rm d}P(s_{2M+1}).$$
The integral is non-zero since all integrands are non-zero and have the same sign. This proves (2.9). Proof of Lemma 2.5.
Multiplying rows $$2i-1$$ and $$2i$$ by $${\rm e}^{t_i^2}$$ for each $$i=1,\ldots,M$$, multiplying the $$j$$th column by $${\rm e}^{s_j^2}$$, and subtracting $$t_i$$ times row $$2i-1$$ from row $$2i$$, we see that it is equivalent to show that
$$\begin{vmatrix}
{\rm e}^{s_1 t_1} & {\rm e}^{s_2 t_1} & \ldots & {\rm e}^{s_{2M+1} t_1}\\
s_1 {\rm e}^{s_1 t_1} & s_2 {\rm e}^{s_2 t_1} & \ldots & s_{2M+1} {\rm e}^{s_{2M+1} t_1}\\
\vdots & & & \vdots\\
{\rm e}^{s_1 t_M} & {\rm e}^{s_2 t_M} & \ldots & {\rm e}^{s_{2M+1} t_M}\\
s_1 {\rm e}^{s_1 t_M} & s_2 {\rm e}^{s_2 t_M} & \ldots & s_{2M+1} {\rm e}^{s_{2M+1} t_M}\\
{\rm e}^{s_1^2} & {\rm e}^{s_2^2} & \ldots & {\rm e}^{s_{2M+1}^2}
\end{vmatrix}\ne 0.$$
The above matrix has a vanishing determinant if and only if there exists a non-zero vector $$(a_1,b_1,\ldots,a_M,b_M,a_{M+1})$$ in its left null space. This vector must have a non-zero last coordinate, since by Example 1.1.5 in [42] the Gaussian kernel is extended totally positive, and therefore the upper $$2M\times 2M$$ submatrix has a non-zero determinant. Therefore, we may assume that $$a_{M+1} = 1$$. Thus, the matrix above has a vanishing determinant if and only if the function
$$\sum_{i=1}^M (a_i+b_i s)\,{\rm e}^{t_i s}+{\rm e}^{s^2}\tag{2.10}$$
has at least the $$2M+1$$ zeros $$s_1 < s_2 < \ldots < s_{2M+1}$$. Lemma 2.7, applied with $$r = M$$ and $$d_1 = \cdots = d_M = 1$$, establishes that this is impossible. To complete the proof of Lemma 2.5, it remains to state and prove Lemma 2.7. □

Remark 2.6 The inclusion of the derivatives is essential for the shifted Gaussians to form a T-system together with the constant function $$1$$. In particular, following the same logic as in the proof of Lemma 2.5, we find that $$\{1,\,{\rm e}^{-(s-t_1)^2},\ldots,\,{\rm e}^{-(s-t_M)^2}\}$$ form a T-system if and only if the function
$$\sum_{i=1}^M a_i\,{\rm e}^{t_i s}+{\rm e}^{s^2}$$
has at most $$M$$ zeros. However, for $$M = 3$$, this function has four zeros if we select $$a_1 = -3$$, $$t_1 = 1$$, $$a_2 = 7$$, $$t_2 = 0$$, $$a_3 = -5$$ and $$t_3 = -1$$.

Lemma 2.7 Let $$d_1,\ldots,d_r\in\mathbb N$$. The function
$$\phi_{d_1,\ldots,d_r}(s)=\sum_{i=1}^r \big(a_{i0}+a_{i1}s+\cdots+a_{i(2d_i-1)}s^{2d_i-1}\big)\,{\rm e}^{t_i s}+{\rm e}^{s^2}$$
has at most $$2(d_1 + \cdots + d_r)$$ zeros.

Proof. We show that $$\phi_{d_1,\ldots,d_r}(s)$$ has at most $$2(d_1 + \cdots + d_r)$$ zeros as follows. Let
$$g_0(s)=\phi_{d_1,\ldots,d_r}(s).$$
For $$k=1, \ldots, d_1 + \cdots + d_r$$, let
$$g_k(s)=\begin{cases}\dfrac{{\rm d}^2}{{\rm d}s^2}\Big[g_{k-1}(s)\,{\rm e}^{(-t_j+t_1+\cdots+t_{j-1})s}\Big], & \text{if } k=d_1+\cdots+d_{j-1}+1 \text{ for some } j,\\[6pt] \dfrac{{\rm d}^2}{{\rm d}s^2}\big[g_{k-1}(s)\big], & \text{otherwise.}\end{cases}\tag{2.11}$$
If we show that $$g_{d_1 + \cdots + d_r}(s)$$ has no zeros, then $$g_{d_1 + \cdots + d_r-1}(s)$$ has at most two zeros, counting with multiplicity. By induction, it will follow that $$g_0(s)$$ has at most $$2(d_1 + \cdots + d_r)$$ zeros, counting with multiplicity. Note that if $$d_1 + \cdots + d_{j-1} \le k < d_1 + \cdots + d_{j-1} + d_j$$, then
$$g_k(s)=\big(\tilde a_{j,2(k-(d_1+\cdots+d_{j-1}))}+\cdots+\tilde a_{j,2d_j-1}\,s^{2d_j-1-2(k-(d_1+\cdots+d_{j-1}))}\big)+\sum_{i=j+1}^r\big(\tilde a_{i0}+\cdots+\tilde a_{i(2d_i-1)}s^{2d_i-1}\big)\,{\rm e}^{(t_i-(t_1+\cdots+t_{j-1}))s}+c\,f_k(r)\,{\rm e}^{r^2},$$
where $$c>0$$ is a constant and $$r := s - c_k$$ for a suitable constant $$c_k$$. We are going to show that $$f_i(s)$$ is a sum of squares of polynomials such that one of the squares is a positive constant. This would mean that $$g_{d_1+\cdots+d_r}(s)=c\,f_{d_1+\cdots+d_r}(r)\,{\rm e}^{r^2}$$ has no zeros. Denote
$$p_0(s)=1,\qquad p_1(s)=2s,\qquad\ldots,\qquad p_i(s)=2s\,p_{i-1}(s-c_i)+p_{i-1}'(s-c_i),$$
where $$c_1, \ldots, c_k$$ are constants. It follows by induction that $$\deg(p_i) = i$$ and that the leading coefficient of $$p_i(s)$$ is $$2^i$$. We will show by induction that
$$f_i(s)=p_i(s)^2+\tfrac12\,p_i'(s)^2+\cdots+\tfrac{1}{2^i i!}\,p_i^{(i)}(s)^2=\sum_{j=0}^i \tfrac{1}{2^j j!}\,p_i^{(j)}(s)^2.$$
When $$i=0$$, we have $$f_0(s) = 1$$ and $$\sum_{j=0}^0 \frac1{2^jj!}p_0^{(j)}(s)^2 = 1$$. Suppose the statement is true for $$i-1$$. By the relationship (2.11), we have
$$\begin{aligned}f_i(s)\,{\rm e}^{s^2}&=\frac{{\rm d}^2}{{\rm d}s^2}\Big[{\rm e}^{s^2} f_{i-1}(s-c_i)\Big]=\frac{{\rm d}^2}{{\rm d}s^2}\Big[{\rm e}^{s^2}\sum_{j=0}^{i-1}\tfrac{1}{2^j j!}\,p_{i-1}^{(j)}(s-c_i)^2\Big]\\&=\sum_{j=0}^{i-1}\frac{{\rm e}^{s^2}}{2^j j!}\Big\{2\,p_{i-1}^{(j+2)}(s-c_i)\,p_{i-1}^{(j)}(s-c_i)+2\,p_{i-1}^{(j+1)}(s-c_i)^2+(4s^2+2)\,p_{i-1}^{(j)}(s-c_i)^2+8s\,p_{i-1}^{(j)}(s-c_i)\,p_{i-1}^{(j+1)}(s-c_i)\Big\}.\end{aligned}\tag{2.12}$$
We need to show that this expression is equal to $${\rm e}^{s^2}\big(\sum_{j=0}^{i} \frac{p_{i}^{(j)}(s)^2}{2^jj!}\big)$$. Since $$p_i(s)=2s\,p_{i-1}(s-c_i)+p_{i-1}'(s-c_i)$$, it follows by induction that $$p_i^{(j)}(s) = 2j\,p_{i-1}^{(j-1)}(s-c_i) + 2s\,p_{i-1}^{(j)}(s-c_i) + p_{i-1}^{(j+1)}(s-c_i)$$. Therefore we obtain
$${\rm e}^{s^2}\Big(\sum_{j=0}^i \tfrac{p_i^{(j)}(s)^2}{2^j j!}\Big)$$
$$\begin{aligned}&={\rm e}^{s^2}\sum_{j=0}^i \tfrac{1}{2^j j!}\Big[2j\,p_{i-1}^{(j-1)}(s-c_i)+2s\,p_{i-1}^{(j)}(s-c_i)+p_{i-1}^{(j+1)}(s-c_i)\Big]^2\\&={\rm e}^{s^2}\sum_{j=0}^i \tfrac{1}{2^j j!}\Big[4j^2\,p_{i-1}^{(j-1)}(s-c_i)^2+4s^2\,p_{i-1}^{(j)}(s-c_i)^2+p_{i-1}^{(j+1)}(s-c_i)^2\\&\qquad\qquad\qquad+8js\,p_{i-1}^{(j-1)}(s-c_i)\,p_{i-1}^{(j)}(s-c_i)+4s\,p_{i-1}^{(j)}(s-c_i)\,p_{i-1}^{(j+1)}(s-c_i)+4j\,p_{i-1}^{(j-1)}(s-c_i)\,p_{i-1}^{(j+1)}(s-c_i)\Big].\end{aligned}\tag{2.13}$$
There are four types of terms in the sums (2.12) and (2.13):
$$p_{i-1}^{(j)}(s-c_i)^2,\qquad s^2\,p_{i-1}^{(j)}(s-c_i)^2,\qquad p_{i-1}^{(j-1)}(s-c_i)\,p_{i-1}^{(j+1)}(s-c_i)\qquad\text{and}\qquad s\,p_{i-1}^{(j)}(s-c_i)\,p_{i-1}^{(j+1)}(s-c_i).$$
For a fixed $$j\in\{0, 1, \ldots, i+1\}$$, it is straightforward to check that the coefficients in front of each of these terms in (2.12) and (2.13) are equal. Therefore,
$$f_i(s)=p_i(s)^2+\tfrac12\,p_i'(s)^2+\cdots+\tfrac{1}{2^i i!}\,p_i^{(i)}(s)^2=\sum_{j=0}^i \tfrac{1}{2^j j!}\,p_i^{(j)}(s)^2.$$
Note that since $$\deg(p_i) = i$$, the derivative $$p_i^{(i)}(s)$$ is the constant $$i!$$ times the leading coefficient of $$p_i(s)$$, which, as we discussed above, equals $$2^i$$. Therefore, the term $$\frac1{2^i i!} p_i^{(i)}(s)^2 = 2^i i!$$. Thus, one of the squares in $$f_i(s)$$ is a positive constant, so $$f_i(s) > 0$$ for all $$s$$. □

3. Numerical experiments

In this section, we present the results of several numerical experiments to complement our theoretical results. To allow for potentially noisy observations, we solve the constrained least squares problem
$$\begin{aligned}\underset{\mu\ge 0}{\text{minimize}}\quad & \sum_{i=1}^n\Big(\int \psi(s_i,t)\,{\rm d}\mu(t)-x(s_i)\Big)^2\\ \text{subject to}\quad & \int w(t)\,\mu({\rm d}t)\le\tau\end{aligned}\tag{3.1}$$
using the conditional gradient method proposed in [12].

3.1 Reweighing matters for source localization

Our first numerical experiment provides evidence that weighting by $$w(t)$$ helps recover point sources near the border of the image. This matches our intuition: near the border, the mass of an observed point source is smaller than if it were measured in the center of the image. Hence, if we did not weight the candidate locations, sources close to the edge of the image would be artificially cheap to add to the representation. We simulate two populations of images, one with point sources located away from the image boundary and one with point sources located near the image boundary.
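A discretized stand-in for (3.1) is easy to prototype. The sketch below is not the conditional gradient method of [12]: it restricts $$\mu$$ to a fine grid and solves a penalized nonnegative least squares surrogate by projected gradient descent. All parameters (grid sizes, penalty weight, source locations) are illustrative choices of ours, not the paper's:

```python
import numpy as np

sigma = 0.1
S = np.linspace(0, 1, 100)                 # measurement locations s_i
G = np.linspace(0, 1, 500)                 # candidate grid for the measure mu
Psi = np.exp(-(S[:, None] - G[None, :]) ** 2 / sigma ** 2)  # psi(s_i, g_j)
w = Psi.mean(axis=0)                       # w(t) = \int psi(s, t) dP(s)

# Two point sources in the boundary regions, each with intensity 1.
t_true = np.array([0.05, 0.95])
x = np.exp(-(S[:, None] - t_true[None, :]) ** 2 / sigma ** 2).sum(axis=1)

# Penalized surrogate of (3.1): minimize ||Psi mu - x||^2 + lam <w, mu>
# over mu >= 0, solved by projected gradient descent.
lam = 1e-3
mu = np.zeros(len(G))
step = 1.0 / np.linalg.norm(Psi, 2) ** 2   # 1 / Lipschitz constant
for _ in range(5000):
    grad = Psi.T @ (Psi @ mu - x) + lam * w / 2
    mu = np.maximum(mu - step * grad, 0.0)  # project onto mu >= 0

# Recovered source locations: grid points carrying non-negligible mass.
support = G[mu > 0.1 * mu.max()]
```

The recovered `support` clusters around the true boundary sources; swapping in $$w(t)=1$$ reproduces the unweighted variant compared in this experiment.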
For each population of images, we solve (3.1) with $$w(t)=\int \psi(s,t) \,{\rm d}P(s)$$ (weighted) and with $$w(t) = 1$$ (unweighted). We find that the solutions to (3.1) recover the true point sources more accurately with $$w(t) = \int \psi(s,t) \,{\rm d}P(s)$$. We use the same procedure for computing accuracy as in [10]: we match true point sources to estimated point sources and compute the F-score of the match. To describe this procedure in detail, we compute the F-score by solving a bipartite graph matching problem. In particular, we form the bipartite graph with an edge between $$t_i$$ and $$\hat t_j$$ for all $$i,j$$ such that $$\Vert t_i - \hat t_j \Vert < r$$, where $$r>0$$ is a tolerance parameter that upper bounds the allowed distance between any matched pair, and $$\hat t_1, \ldots, \hat t_N$$ are the estimated point sources. Then we greedily select edges from this graph under the constraint that no two selected edges can share the same vertex; that is, no $$t_i$$ can be paired with two $$\hat t_{j}, \hat t_k$$, or vice versa. Finally, the $$\hat t_i$$ successfully paired with some $$t_j$$ are categorized as true positives, and we denote their number by $$T_{\rm P}$$. The number of false negatives is $$F_{\rm N} = M - T_{\rm P}$$, and the number of false positives is $$F_{\rm P} = N - T_{\rm P}$$. The precision and recall are then $$P = \frac{T_{\rm P}}{T_{\rm P} + F_{\rm P}}$$ and $$R = \frac{T_{\rm P}}{T_{\rm P} + F_{\rm N}}$$, respectively, and the F-score is their harmonic mean:
$$F=\frac{2PR}{P+R}.$$
To emphasize the dependence on $$r$$, we sometimes write $$F(r)$$ for the F-score.
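The matching procedure above can be sketched in a few lines. The greedy closest-first pairing below is a simplified stand-in for the evaluation code of [10], with the tolerance radius $$r$$ as described:

```python
def f_score(t_true, t_est, r):
    """F-score of a greedy bipartite matching with tolerance radius r."""
    # Candidate edges: all pairs within distance r, visited closest-first.
    edges = sorted(
        (abs(ti - tj), i, j)
        for i, ti in enumerate(t_true)
        for j, tj in enumerate(t_est)
        if abs(ti - tj) < r
    )
    used_true, used_est = set(), set()
    for _, i, j in edges:           # greedy selection: no shared vertices
        if i not in used_true and j not in used_est:
            used_true.add(i)
            used_est.add(j)
    tp = len(used_true)             # matched estimates = true positives
    fn = len(t_true) - tp           # unmatched true sources
    fp = len(t_est) - tp            # unmatched estimates
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Two of three sources matched within r = 0.1: precision = recall = 2/3.
print(f_score([0.2, 0.5, 0.8], [0.21, 0.5, 0.95], r=0.1))  # → 0.666...
```

A tighter radius shrinks the edge set and therefore the F-score, which is the dependence the notation $$F(r)$$ records.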
Both populations contain $$100$$ images simulated using the Gaussian point spread function
$$\psi(s,t)={\rm e}^{-(s-t)^2/\sigma^2}$$
with $$\sigma = 0.1$$, and in both cases the measurement set $${\mathcal S}$$ is a dense uniform grid of $$n = 100$$ points covering $$[0,1]$$. The populations differ in how the point sources for each image are chosen. Each image in the first population has five point sources drawn uniformly in the interval $$(0.1,0.9)$$, while each simulated image in the second population has a total of four point sources, with two point sources in each of the two boundary regions $$(0,0.1)$$ and $$(0.9,1)$$. In both cases, we assign an intensity of $$1$$ to all point sources and solve (3.1) using an optimal value of $$\tau$$ (chosen with a preliminary simulation). The results are displayed in Fig. 5. The left subplot shows that the F-scores are essentially the same for the weighted and unweighted problems when the point sources are away from the boundary. This is not surprising: when $$t$$ is away from the border of the image, $$\int \psi(s,t)\,{\rm d}P(s)$$ is essentially a constant, independent of $$t$$. But when the point sources are near the boundary, the weighting matters and the F-scores are dramatically better, as shown in the right subplot.

Fig. 5. Reweighing matters for source localization. The two plots compare the quality of solutions to the weighted problem (with $$w(t) = \int \psi(s,t)\,{\rm d}P(s)$$) and the unweighted problem (with $$w(t) = 1$$). When point sources are away from the boundary (left plot), the performance is nearly identical. But when the point sources are near the boundary (right plot), the weighted method performs significantly better.

3.2 Sensitivity to point-source separation

Our theoretical results assert that in the absence of noise, the optimal solution of (1.3) recovers point sources with no minimum bound on the separation. In the following experiment, we explore the ability of (3.1) to recover pairs of points as a function of their separation. The setup is similar to the first numerical experiment. We use the Gaussian point spread function with $$\sigma = 0.1$$ as before, but here we observe only $$n=50$$ samples. For each separation $$d\in\{0.1\sigma, 0.2\sigma,\ldots,1.9\sigma, 2\sigma\}$$, we simulate a population of $$20$$ images containing two point sources separated by $$d$$. The point sources are chosen by picking a random point $$x$$ away from the border of the image and placing two point sources at $$x\pm \frac d 2$$. Again, each point source is assigned an intensity of $$1$$, and we attempt to recover the locations of the point sources by solving (3.1). In the left subplot of Fig. 6, we plot the F-score versus separation for the value of $$\tau$$ that produces the best F-scores. Note that we achieve near-perfect recovery for separations greater than $$\frac \sigma 4$$. The right subplot of Fig. 6 shows the observations, true point sources and estimated point sources for a separation of $$\frac d \sigma = \frac 1 2$$. Note the near-perfect recovery in spite of the small separation.

Fig. 6. Sensitivity to point-source separation. (a) The F-score at tolerance radius $$r=0.1$$ as a function of normalized separation $$\frac d \sigma$$. (b) The black trace shows an image for $$\frac d \sigma = \frac 1 2$$. The green stars show the locations (x coordinate) and weights (y coordinate) of the true point sources.
The red dots show the recovered locations and weights.

Because of numerical issues, we cannot localize point sources with arbitrarily small $$d>0$$. Indeed, the F-score for $$\frac d \sigma < \frac 1 4$$ is quite poor. This does not contradict our theory: numerical ill-conditioning in effect adds noise to the recovery problem, and we expect that a separation condition will be necessary in the presence of noise.

3.3 Sensitivity to noise

Next, we investigate the performance of (3.1) in the presence of additive noise. The setup is identical to the previous numerical experiment, except that we add Gaussian noise to the observations. In particular, our noisy observations are
$$\{x(s_i)+\eta_i \,|\, s_i\in \mathcal S\},$$
where $$\eta_i\sim {\mathcal N}(0,0.1)$$. We measure the performance of (3.1) in Fig. 7. Note that we achieve near-perfect recovery when $$d> \sigma$$. However, if $$d< \sigma$$, the F-scores are clearly worse than in the noiseless case. Unsurprisingly, we observe that sources must be separated to recover their locations to reasonable precision. We defer an investigation of the required separation as a function of the signal-to-noise ratio to future work.

Fig. 7. Sensitivity to noise. (a) The F-score at tolerance radius $$r=0.1$$ as a function of normalized separation $$\frac d \sigma$$. (b) The black trace is the $$50$$ pixel image we observe. The green stars show the locations (x-coordinate) and weights (y-coordinate) of the true point sources. The red dots show the recovered locations and weights.

3.4 Extension to two dimensions

Though our proof does not extend as is, we do expect generalizations of our recovery result to higher-dimensional settings. The optimization problem (3.1) extends immediately to arbitrary dimensions, and we have observed that it performs quite well in practice. We demonstrate in Fig. 8 the power of applying (3.1) to a high-density fluorescence image in simulation. Figure 8 shows an image simulated with parameters specified by the single molecule localization microscopy challenge [?]. In this challenge, point sources are blurred by a Gaussian point spread function and then corrupted by noise. The green stars show the true locations of a simulated collection of point sources, and the red dots show the support of the measure output by (3.1) applied to the grayscale image forming the background of Fig. 8. The overlap between the true locations and estimated locations is near perfect, with an F-score of $$0.98$$ for a tolerance radius corresponding to one third of a pixel.

Fig. 8. High density single molecule imaging. The green stars show the locations of a simulated collection of point sources, and the grayscale background shows the noisy, pixelated point spread image. The red dots show the support of the measure-valued solution of (3.1).

4.
Conclusions and future work

In this article, we have demonstrated that one can recover the centers of a non-negative sum of Gaussians from a few samples by solving a convex optimization problem. This recovery is theoretically possible no matter how close the true centers are to one another. We remark that similar results hold for recovering measures from their moments: the atoms of a positive atomic measure can be recovered no matter how close together they are, provided one observes twice as many moments as there are atoms. Our work can be seen as a generalization of this result, applying generalized polynomials and the theory of Tchebycheff systems in place of properties of Vandermonde systems. As we discussed in our numerical experiments, this work opens up several theoretical problems that would benefit from future investigation. We close with a brief discussion of some possible extensions.

Noise. Motivated by the fact that there is no separation condition in the absence of noise, it would be interesting to study how the required separation decays to zero as the noise level decreases. One of the key advantages of using convex optimization for signal processing is that dual certificates generically give stability results, in the same way that Lagrange multipliers measure sensitivity in linear programming. Previous work on estimating line spectra has shown that dual polynomials constructed for noiseless recovery extend to certify properties of estimation and localization in the presence of noise [14,34,57]. We believe that these methods should be directly applicable to our problem setup.

Higher dimensions. One logical extension is proving that the same results hold in higher dimensions. Most scientific and engineering applications of interest have point sources arising in one to four dimensions, and we expect that some version of our results should hold in higher dimensions.
Indeed, we believe a guarantee for recovery with no separation condition can be proven in higher dimensions with noiseless observations. However, it is not straightforward to extend our results to higher dimensions because the theory of Tchebycheff systems is only developed in one dimension. In particular, our approach using limits of polynomials does not directly generalize to higher dimensions.

Other point spread functions. We have shown that our Conditions 1.3 hold for the Gaussian point spread function, which is commonly used in microscopy as an approximation to an Airy function. It would be very useful to show that they also hold for other point spread functions, such as the Airy function and other common physical models. Our proof relied heavily on algebraic properties of the Gaussian, but there is a long, rich history of determinantal systems that may help generalize our result. In particular, works on properties of totally positive systems may be fruitful for such generalizations [1,47].

Model mismatch in the point spread function. Our analysis relies on perfect knowledge of the point spread function. In practice, one never has an exact analytic expression for the point spread function: aberrations in manufacturing and scattering media can lead to distortions in the image not properly captured by a forward model. It would be interesting to derive guarantees on recovery that assume only partial knowledge of the point spread function. Note that the problem of searching both for the locations of the sources and for the associated point spread function is a blind deconvolution problem, and techniques from this well-studied problem could likely be extended to the superresolution setting. If successful, such methods could have immediate practical impact when applied to denoising images in molecular, cellular and astronomical imaging.

Acknowledgements. We would like to thank Pablo Parrilo for introducing us to the theory of T-systems.
We would also like to thank Nicholas Boyd, Mahdi Soltanolkotabi, Bernd Sturmfels and Gongguo Tang for many useful conversations about this work.

Funding. ONR awards (N00014-11-1-0723 and N00014-13-1-0129 to B.R.); NSF awards (CCF-1148243 and CCF-1217058 to B.R.); AFOSR award (FA9550-13-1-0138 to B.R.); a Sloan Research Fellowship (to B.R.); NSF award (CCF-1148243 to G.S.); NSF CISE Expeditions Award (CCF-1139158, in part); LBNL Award (7076018); DARPA XData Award (FA8750-12-2-0331); gifts from Amazon Web Services, Google, SAP, The Thomas and Stacey Siebel Foundation, Adatao, Adobe, Apple, Inc., Blue Goji, Bosch, C3Energy, Cisco, Cray, Cloudera, EMC2, Ericsson, Facebook, Guavus, HP, Huawei, Informatica, Intel, Microsoft, NetApp, Pivotal, Samsung, Schlumberger, Splunk, Virdata and VMware.

Footnotes. Tchebycheff is one among many transliterations of the name originally written in Cyrillic. Others include Chebyshev, Chebychev and Cebysev.

References

1. Ando, T. (1987) Totally positive matrices. Linear Algebra Appl., 90, 165–219.
2. Bajwa, W. U., Haupt, J., Sayeed, A. M. & Nowak, R. (2010) Compressed channel sensing: a new approach to estimating sparse multipath channels. Proc. IEEE, 98, 1058–1076.
3. Baraniuk, R. (2007) Compressive sensing [lecture notes]. IEEE Sig. Process. Mag., 24, 118–121.
4. Baraniuk, R. & Steeghs, P. (2007) Compressive radar imaging. Radar Conference, 2007 IEEE. Waltham, MA: IEEE, pp. 128–133.
5. Batenkov, D. & Yomdin, Y. (2012) Algebraic Fourier reconstruction of piecewise smooth functions. Math. Comput., 81, 277–318.
6. Bendory, T. (2017) Robust recovery of positive stream of pulses. IEEE Trans. Signal Process., 65, 2114–2122.
7. Bendory, T., Dekel, S. & Feuer, A. (2014) Robust recovery of stream of pulses using convex optimization.
J. Math. Anal. Appl., 442, 511–536.
8. Beurling, A. (1938) Sur les intégrales de Fourier absolument convergentes et leur application à une transformation fonctionnelle. Ninth Scandinavian Mathematical Congress, pp. 345–366.
9. Bhaskar, B. N., Tang, G. & Recht, B. (2013) Atomic norm denoising with applications to line spectral estimation. IEEE Trans. Sig. Process., 61, 5987–5999.
10. Sage, D., Kirshner, H., Pengo, T., Stuurman, N., Min, J., Manley, S. & Unser, M. (2015) Quantitative evaluation of software packages for single-molecule localization microscopy. Nat. Methods, 12, 717–724.
11. Bonifacino, J. S., Olenych, S., Davidson, M. W., Lippincott-Schwartz, J. & Hess, H. F. (2006) Imaging intracellular fluorescent proteins at nanometer resolution. Science, 313, 1642–1645.
12. Boyd, N., Schiebinger, G. & Recht, B. (2015) The alternating descent conditional gradient method for sparse inverse problems. SIAM J. Optimiz., 27, 616–639.
13. Bredies, K. & Pikkarainen, H. K. (2013) Inverse problems in spaces of measures. ESAIM Control Optim. Calc. Var., 19, 190–218.
14. Candès, E. J. & Fernandez-Granda, C. (2013) Super-resolution from noisy data. J. Fourier Anal. Appl., 19, 1229–1254.
15. Candès, E. J. & Fernandez-Granda, C. (2013) Towards a mathematical theory of super-resolution. Comm. Pure Appl. Math., 67, 906–956.
16. Candès, E. J., Romberg, J. & Tao, T. (2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Thy., 52, 489–509.
17. Candès, E. J. & Wakin, M. (2008) An introduction to compressive sampling. IEEE Sig. Process. Mag.
, 25, 21– 30. Google Scholar CrossRef Search ADS   18. Carathéodory, C. ( 1907) Ueber den Variabilitaetsbereich der Koeffizienten von Potenzreihen, die gegebene Werte nicht annehman. Math. Ann. , 64, 95– 115. Google Scholar CrossRef Search ADS   19. Carathéodory, C. ( 1911) Ueber den Variabilitaetsbereich der Fourier’schen Konstanten von positiven harmonischen Funktionen. Rend. Circ. Mat. , 32, 193– 217. Google Scholar CrossRef Search ADS   20. Cetin, M., Malioutov, D. & Willsky, A. S. ( 2005) A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans. Sig. Process ., 53, 3010– 3022. Google Scholar CrossRef Search ADS   21. Chandrasekaran, V., Recht, B., Parrilo, P. A. & Willsky, A. ( 2012) The convex geometry of linear inverse problems. Found. Comput. Math. , 12, 805– 849. Google Scholar CrossRef Search ADS   22. De Castro, Y. & Gamboa, F. ( 2012) Exact reconstruction using beurling minimal extrapolation. J Math Anal Appl. , 395, 336– 354. Google Scholar CrossRef Search ADS   23. de Prony, B. G. R. ( 1795) Essai éxperimental et analytique: sur les lois de la dilatabilité de fluides élastique et sur celles de la force expansive de la vapeur de l’alkool, à différentes températures. Journal de l’école Polytechnique , 1, 24– 76. 24. Donoho, D. ( 1992) Superresolution via sparsity constraints. SIAM J. Math. Anal. , 23, 1309– 1331. Google Scholar CrossRef Search ADS   25. Donoho, D. ( 2006) Compressed sensing. IEEE Trans. Inf. Thy. , 52, 1289– 1306. Google Scholar CrossRef Search ADS   26. Donoho, D. & Stark, P. ( 1989) Uncertainty principles and signal recovery. SIAM J. Appl. Math , 49, 906– 931. Google Scholar CrossRef Search ADS   27. Dragotti, P., Vetterli, M. & Blu, T. ( 2007) Sampling moments and reconstructing signals of finite rate of innovation: Shannon meets strang-fix. IEEE Trans. Sig. Process. , 55, 1741– 1757. Google Scholar CrossRef Search ADS   28. Duarte, M. & Baraniuk, R. ( 2013) Spectral compressive sensing. Appl. 
Comput. Harmon. Anal. , 35, 111– 129. Google Scholar CrossRef Search ADS   29. Duval, V. & Peyré, G. ( 2015) Exact support recovery for sparse spikes deconvolution. Found. Comput. Math. , 15, 1315– 1355. Google Scholar CrossRef Search ADS   30. Eckhoff, K. S. ( 1995) Accurate reconstructions of functions of finite regularity from truncated fourier series expansions. Math. Comput ., 64, 671– 690. Google Scholar CrossRef Search ADS   31. Ekanadham, C., Tranchina, D. & Simoncelli, E. P. ( 2011) Neural Spike Identifcation with Continuous Basis Pursuit.  Salt Lake City, Utah: Computational and Systems Neuroscience (CoSyNe). 32. Evanko, D. ( 2009) Primer: fluorescence imaging under the diffraction limit. Nat. Methods , 6, 19– 20. Google Scholar CrossRef Search ADS   33. Fannjiang, A. C., Strohmer, T. & Yan, P. ( 2010) Compressed remote sensing of sparse objects. SIAM J. Imag. Sci. , 3, 595– 618. Google Scholar CrossRef Search ADS   34. Fernandez-Granda, C. ( 2013) Support detection in super-resolution. Proceedings of the 10th International Conference on Sampling Theory and Applications (SampTA 2013), 145– 148. 35. Friedlander, M. P., Mansour, H., Saab, R. & Yilmaz, O. ( 2012) Recovering compressively sampled signals using partial support information. IEEE Trans. Inf. Theory , 58, 1122– 1134. Google Scholar CrossRef Search ADS   36. Fuchs, J.-J. ( 2005) Sparsity and uniqueness for some specific under-determined linear systems. Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP ’05). IEEE International Conference on. 37. Gantmacher, F. P. & Krein, M. G. ( 2002) Oscillation Matrices and Kernels and Small Vibrations of Mechanical Systems , revised english edn. Providence, RI: AMS Chelsea Pub. Google Scholar CrossRef Search ADS   38. Heckel, R., Morgenshtern, V. & Soltanolkotabi, M. ( 2016) Super-resolution radar. Inf inference , 5, 22– 75. Google Scholar CrossRef Search ADS   39. Herman, M. A. & Strohmer, T. 
( 2009) High-resolution radar via compressed sensing. IEEE Trans. Sig. Process. , 57, 2275– 2284. Google Scholar CrossRef Search ADS   40. Hess, S. T., Giriajan, T. P. & Mason, M. D. ( 2006) Ultra-high resolution imaging by fluorescence photoactivation localization microscopy. Biophys. J. , 91, 4258– 4272. Google Scholar CrossRef Search ADS PubMed  41. Karlin, S. ( 1968) Total Positivity , vol. 1. Stanford, CA: Stanford University Press. 42. Karlin, S. & Studden, W. ( 1967) Tchebycheff Systems: with Applications in Analysis and Statistics . New York, London, Sydney: Wiley Interscience. 43. Khajehnejad, M. A., Xu, W., Avestimehr, A. S. & Hassibi, B. ( 2011) Analyzing weighted minimization for sparse recovery with nonuniform sparse models. IEEE Trans. Sig. Process. , 59, 1985– 2001. Google Scholar CrossRef Search ADS   44. Krein, M. G. ( 1959) The ideas of P.L. Tchebycheff and A.A. Markov in the theory of limiting values of integrals and their futher development. Am. Math. Soc. Transl. Ser. 2. , 12, 1– 122. 45. Morgenshtern, V. I. & Candès, E. J. ( 2015) Super-resolution of positive sources: the discrete setup. SIAM J. Imaging Sci. , 9, 412– 444. Google Scholar CrossRef Search ADS   46. Nobelprize.org ( 2014) The Nobel Prize in Chemistry . 47. Pinkus, A. ( 2010) Totally Positive Matrices , vol. 181. Cambridge University Press. 48. Puschmann, K. G. & Kneer, F. ( 2005) On super-resolution in astronomical imaging. Astron. Astrophys. , 436, 373– 378. Google Scholar CrossRef Search ADS   49. Rauhut, H. ( 2007) Random sampling of sparse trigonometric polynomials. Appl. Comput. Hamon. Anal. , 22, 16– 42. Google Scholar CrossRef Search ADS   50. Rauhut, H. & Ward, R. ( 2015) Interpolation via weighted $$l_1$$ minimization. Appl. Comput. Harmon. Anal. , To Appear. Preprint available at arxiv:1308.0759. 51. Rust, M. J., Bates, M. & Zhuang, X. ( 2006) Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (storm). Nat. Methods , 3, 793– 796. 
Google Scholar CrossRef Search ADS PubMed  52. Shah, P., Bhaskar, B., Tang, G. & Recht, B. ( 2012) Linear system identification via atomic norm regularization. In  Decision and Control (CDC), 2012 IEEE 51st Annual Conference on. 10-13 Dec. 2012. Maui, HI: IEEE. 53. Shapiro, A. ( 2009) Semi-infinite programming, duality, discretization and optimality conditions. Optim. , 58, 133– 161. Google Scholar CrossRef Search ADS   54. Stoica, P. & Babu, P. ( 2012) Spice and likes: Two hyperparameter-free methods for sparse-parameter estimation. Sig. Process , 92, 1580– 1590. Google Scholar CrossRef Search ADS   55. Stoica, P., Babu, P. & Li, J. ( 2011) New method of sparse parameter estimation in separable models and its use for spectral analysis of irregularly sampled data. IEEE Trans. Sig. Process. , 59, 35– 47. Google Scholar CrossRef Search ADS   56. Tan, V. & Goyal, V. ( 2008) Estimating signals with finite rate of innovation from noisy samples: A stochastic algorithm. IEEE Trans. Sig. Process. , 56, 5135– 5146. Google Scholar CrossRef Search ADS   57. Tang, G., Bhaskar, B. & Recht, B. ( 2015) Near minimax line spectral estimation. IEEE Trans. Inf. Theor. , 61, 499– 512. Google Scholar CrossRef Search ADS   58. Tang, G., Bhaskar, B. N., Shah, P. & Recht, B. ( 2013) Compressed sensing off the grid. IEEE Trans. Inf. Theor. , 59, 7465– 7490. Google Scholar CrossRef Search ADS   59. Tchebycheff, P. L. ( 1887) On two theorems with respect to probabilities. Zap. Akad. Nauk S.-Petersburg , 55, 156– 168. 60. Vaswani, N. & Lu, W. ( 2010) Modified-cs: Modifying compressive sensing for problems with partially known support. IEEE Trans. Sig. Process.,  58, 4595– 4607. Google Scholar CrossRef Search ADS   61. Vetterli, M., Marziliano, P. & Blu, T. ( 2002) Sampling signals with finite rate of innovation. IEEE Trans. Sig. Process. , 50, 1417– 1428. Google Scholar CrossRef Search ADS   62. Von Borries, R., Miosso, C. J. & Potes, C. ( 2007) Compressed sensing using prior information. 
In Computational Advances in Multi-Sensor Adaptive Processing, 2007. CAMPSAP 2007. 2nd IEEE International Workshop on . St. Thomas, VI: IEEE, pp. 121– 124. 63. Zhu, L., Zhang, W., Elnatanand, D. & Huang, B. ( 2012) Faster storm using compressed sensing. Nat. Methods , 9, 721– 723. Google Scholar CrossRef Search ADS PubMed  © The authors 2017. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices) For permissions, please e-mail: journals. permissions@oup.com http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Information and Inference: A Journal of the IMA Oxford University Press


Volume 7 (1), 1 March 2018, 30 pages
Publisher: Oxford University Press
ISSN: 2049-8764; eISSN: 2049-8772
DOI: 10.1093/imaiai/iax006

In (1.1), $$t_1, \ldots, t_M$$ are the locations of the point sources and $$c_1,\ldots,c_M > 0$$ are their intensities. Throughout we assume that these quantities, together with the number of point sources $$M$$, are fixed but unknown. The primary goal of superresolution is to recover the locations and intensities from a set of noiseless observations

$$\{x(s) \mid s\in{\mathcal S}\}.$$

Here, $${\mathcal S}$$ is the set of points at which we observe $$x$$; we denote the elements of $${\mathcal S}$$ by $$s_1,\ldots,s_n$$. A mock-up of such a signal $$x$$ is displayed in Fig. 1.

Fig. 1. An illustrative example of (1.1) with the Gaussian point spread function $$\psi(s,t)= \,{\rm e}^{-(s-t)^2}$$. The $$t_i$$ are denoted by red dots, and the true intensities $$c_i$$ are illustrated by vertical, dashed black lines. The superposition resulting in the signal $$x$$ is plotted in blue. The samples $${\mathcal S}$$ would be observed at the tick marks on the horizontal axis.

In this article, building on the work of Candès & Fernandez-Granda [14,15,34] and Tang et al. [9,57,58], we aim to show that we can recover the tuple $$(t_i, c_i, M)$$ by solving a convex optimization problem. We formulate the superresolution imaging problem as an infinite-dimensional optimization over measures. Precisely, note that the observed signal can be rewritten as

$$x(s)=\sum_{i=1}^M c_i\,\psi(s,t_i)=\int \psi(s,t)\,{\rm d}\mu_\star(t). \quad (1.2)$$

Here, $$\mu_\star$$ is the positive discrete measure $$\sum_{i=1}^M c_i\delta_{t_i}$$, where $$\delta_t$$ denotes the Dirac measure centered at $$t$$.
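The observation model (1.1)–(1.2) is straightforward to simulate. The sketch below uses the Gaussian point spread function from Fig. 1; the source locations, intensities and sampling grid are illustrative choices, not values from the article.

```python
import numpy as np

def psf(s, t):
    """Gaussian point spread function psi(s, t) = exp(-(s - t)^2)."""
    return np.exp(-(s - t) ** 2)

def signal(s, locations, intensities):
    """Superposition x(s) = sum_i c_i * psi(s, t_i) from (1.1)."""
    s = np.atleast_1d(s)
    return sum(c * psf(s, t) for t, c in zip(locations, intensities))

# Two closely spaced point sources with positive intensities (illustrative).
t_true = np.array([-0.2, 0.35])
c_true = np.array([1.0, 1.5])

samples = np.linspace(-3.0, 3.0, 21)   # the measurement set S
x = signal(samples, t_true, c_true)    # noiseless observations {x(s) : s in S}
```

Everything the recovery problem manipulates below (the measure $$\mu_\star$$, the weights, the constraints) can be derived from `psf`, `samples` and `x` alone.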
We aim to show that we can recover $$\mu_\star$$ by solving the following:

$$\begin{array}{ll} \underset{\mu}{\text{minimize}} & \int w(t)\,{\rm d}\mu(t)\\[2pt] \text{subject to} & x(s)=\int \psi(s,t)\,{\rm d}\mu(t), \quad s\in{\mathcal S}\\[2pt] & \text{supp}\,\mu\subset B\\[2pt] & \mu\ge 0. \end{array} \quad (1.3)$$

Here, $$B$$ is a fixed compact set and $$w(t)$$ is a weighting function that weights the measure at different locations. The optimization problem (1.3) is over the set of all positive finite measures $$\mu$$ supported on $$B$$. The optimization problem (1.3) is an analog of $$\ell_1$$ minimization over the continuous domain $$B$$. Indeed, if we know a priori that the $$t_i$$ are elements of a finite discrete set $${\it {\Omega}}$$, then optimizing over all measures subject to $$\text{supp}\,\mu \subset {\it {\Omega}}$$ is precisely equivalent to weighted $$\ell_1$$ minimization. This infinite-dimensional analog with uniform weights has proven useful for compressed sensing over continuous domains [58], resolving diffraction-limited images from low-pass signals [14,34,57], system identification [52] and many other applications [21]. We will see below that the weighting function essentially ensures that all of the candidate locations are given equal influence in the optimization problem. Our main result, Theorem 1.4, establishes that for one-dimensional signals, under rather mild conditions, we can recover $$\mu_\star$$ from the optimal solution of (1.3). Our conditions, described in full detail below, essentially require the observation of at least $$2M+1$$ samples, and that the set of translates of the point spread function forms a linearly independent set. In Theorem 1.1, we verify that these conditions are satisfied by the Gaussian point spread function for any $$M$$ source locations with no minimum separation condition. We present an analysis of an $$\ell_1$$-based method that matches the separation-free performance of polynomial root-finding techniques [23,27,61]. Our motivation for such an analysis is that $$\ell_1$$-based methods generalize to higher dimensions and are empirically stable in the presence of noise.
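To make the connection to weighted $$\ell_1$$ minimization concrete, here is a minimal discretized sketch of (1.3): restricting $$\text{supp}\,\mu$$ to a finite grid $${\it {\Omega}}$$ turns the program into a weighted nonnegative $$\ell_1$$ problem, which is a linear program. The Gaussian PSF, grid and source parameters are illustrative assumptions, and SciPy's `linprog` stands in for the continuous solvers discussed later.

```python
import numpy as np
from scipy.optimize import linprog

psf = lambda s, t: np.exp(-(s - t) ** 2)     # Gaussian point spread function

# Ground truth: two sources placed on the grid so exact recovery is possible.
t_true, c_true = np.array([-0.5, 0.6]), np.array([1.0, 2.0])

samples = np.linspace(-2.0, 2.0, 9)          # measurement set S, |S| > 2M
grid = np.linspace(-2.0, 2.0, 81)            # discretized candidate support Omega

Psi = psf(samples[:, None], grid[None, :])   # Psi[i, k] = psi(s_i, t_k)
x = psf(samples[:, None], t_true[None, :]) @ c_true   # observations x(s_i)
w = Psi.mean(axis=0)                         # weights w(t_k) = (1/n) sum_i psi(s_i, t_k)

# Weighted nonnegative l1 problem: minimize sum_k w_k c_k  s.t.  Psi @ c = x, c >= 0.
res = linprog(c=w, A_eq=Psi, b_eq=x, bounds=(0, None), method="highs")
c_hat = res.x                                # recovered (discretized) measure
```

In this noiseless, on-grid setting the linear program recovers the two spikes; off-grid sources are exactly what the continuous formulation (1.3) is for.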
Boyd et al. [12] show that the problem (1.3) can be optimized to precision $$\epsilon$$ in polynomial time using a greedy algorithm, building on earlier work by Bredies and Pikkarainen [13]. In our experiments in Section 3, we use this algorithm to demonstrate that our theory applies, and to show that even in multiple dimensions with noise, we can recover closely spaced point sources. 1.1 Main result We restrict our theoretical attention to the one-dimensional case, leaving the higher-dimensional cases to future work. Let $$\psi:\mathbb{R}^2 \rightarrow \mathbb{R}$$ be our one-dimensional point spread function, with the first argument denoting the position where we observe the image of a point source located at the second argument. We assume that $$\psi$$ is differentiable in both arguments. For convenience, we will assume that $$B = [-T,T]$$ for some large scalar $$T$$. However, our proof extends trivially to more restricted subsets of the real line. Moreover, we will state our results for the special case where $${\mathcal S} = \{ s_1,\ldots,s_n \}$$, although our proof is written for possibly infinite measurement sets. We define the weighting function in the objective of our optimization problem via

$$w(t)=\frac 1 n \sum_{i=1}^n \psi(s_i,t).$$

Our main result establishes conditions on $$\psi$$ such that the true measure $$\mu_\star$$ is the unique optimal solution of (1.3). Importantly, we show that these conditions are satisfied by the Gaussian point spread function with no separation condition. Theorem 1.1 Suppose $$|{\mathcal S}|> 2M$$ and $$\psi(s,t) = \,{\rm e}^{-(s-t)^2}$$. Then for any $$t_1 < \ldots < t_M$$, the true measure $$\mu_\star$$ is the unique optimal solution of (1.3). Before we proceed to state the main result, we need to introduce a bit of notation and define the notion of a Tchebycheff system.
Let $$K(t,\tau) = \frac 1 n \sum_{i=1}^n \psi(s_i,t) \psi(s_i,\tau)$$ and define the vector-valued function $$v : \mathbb{R} \to \mathbb{R}^{2M}$$ via

$$v(s)=\Big[\psi(s,t_1)\;\ldots\;\psi(s,t_M)\;\;\tfrac{d}{dt_1}\psi(s,t_1)\;\ldots\;\tfrac{d}{dt_M}\psi(s,t_M)\Big]^{\rm T}. \quad (1.4)$$

Definition 1.2 A set of functions $$u_1,\ldots, u_n$$ is called a Tchebycheff system (or T-system) if for any points $$\tau_1 < \ldots < \tau_n$$, the matrix

$$\begin{pmatrix} u_1(\tau_1) & \ldots & u_1(\tau_n)\\ \vdots & & \vdots \\ u_n(\tau_1) & \ldots & u_n(\tau_n) \end{pmatrix}$$

is invertible.

Conditions 1.3 We impose the following three conditions on the point spread function $$\psi$$:
- Positivity: for all $$t\in B$$ we have $$w(t) > 0$$.
- Independence: the matrix $$\frac 1 n \sum_{i=1}^n v(s_i) v(s_i)^{\rm T}$$ is non-singular.
- T-system: $$\{ K(\cdot, t_1), \ldots, K(\cdot,t_M), \frac d {\,{\rm d}t_1} K(\cdot, t_1), \ldots, \frac d {\,{\rm d}t_M} K(\cdot,t_M), w(\cdot) \}$$ forms a T-system.

Theorem 1.4 If $$\psi$$ satisfies Conditions 1.3 and $$|{\mathcal S}| > 2M$$, then the true measure $$\mu_\star$$ is the unique optimal solution of (1.3). Note that the first two parts of Conditions 1.3 are easy to verify. Positivity eliminates the possibility that a candidate point spread function could equal zero at all locations: obviously we would not be able to recover the source in such a setting! Independence is satisfied if

$$\left\{\psi(\cdot,t_1),\ldots,\psi(\cdot,t_M),\tfrac{d}{dt_1}\psi(\cdot,t_1),\ldots,\tfrac{d}{dt_M}\psi(\cdot,t_M)\right\}$$

is a T-system. This condition allows us to recover the amplitudes uniquely, assuming we knew the true $$t_i$$ locations a priori, but it is also useful for constructing a dual certificate as we discuss below. We remark that we actually prove the theorem under a weaker condition than T-system. Define the matrix-valued function $${\it {\Lambda}} : \mathbb{R}^{2M+1} \to \mathbb{R}^{(2M+1)\times(2M+1)}$$ by

$${\it {\Lambda}}(p_1,\ldots,p_{2M+1}):=\begin{bmatrix} \kappa(p_1) & \ldots & \kappa(p_{2M+1}) \\ 1 & \ldots & 1 \end{bmatrix}, \quad (1.5)$$

where $$\kappa: \mathbb{R} \to \mathbb{R}^{2M}$$ is defined as

$$\kappa(t)=\frac 1 n \sum_{i=1}^n \frac{\psi(s_i,t)}{w(t)}\, v(s_i). \quad (1.6)$$

Our proof of Theorem 1.4 replaces condition T-system with the following:
- Determinantal: there exists $$\rho > 0$$ such that for any $$t_i^-,t_i^+ \in (t_i-\rho,t_i+\rho)$$ and $$t\in [-T,T]$$, the matrix $${\it {\Lambda}}\big(t_1^-, t_1^+,\ldots,t_M^-,t_M^+,t\big)$$ is non-singular whenever $$t,t_i^-,t_i^+$$ are distinct.

This condition looks more complicated than T-system and is indeed non-trivial to verify. It is essentially a local T-system condition in the sense that the points $$\tau_i$$ in Definition 1.2 are restricted to lie in a small neighborhood about the $$t_i$$. It is clear that T-system implies Determinantal. The advantage of the more general condition is that it can hold for finitely supported $$\psi$$, whereas this is not true for T-system. In fact, it is easy to see that if T-system holds for any point spread function $$\psi$$, then Determinantal holds for the truncated version $$\psi(s,t)\mathbf{1}\{|s-t|\le 3T\}$$, where $$\mathbf{1}\{x \le y\}$$ is the indicator variable equal to $$1$$ when $$x \le y$$ and zero otherwise. We suspect that Determinantal may hold for significantly tighter truncations. As we will see below, T-system and Independence are related to the existence of a canonical dual certificate that is used ubiquitously in sparse approximation [16,36]. In compressed sensing, this construction is due to Fuchs [36], but its origins lie in the theory of polynomial interpolation developed by Markov and Tchebycheff, and extended by Gantmacher, Krein, Karlin and others (see the survey in Section 1.2). In the continuous setting of superresolution, the dual certificate becomes a dual polynomial: a function of the form $$Q(t) = \frac 1 n \sum_{j=1}^n \psi(s_j,t) q(s_j)$$ satisfying

$$Q(t) \le w(t) \;\text{ for all } t, \qquad |Q(t_i)| = w(t_i), \quad i=1,\ldots,M. \quad (1.7)$$

To see how T-system might be useful for constructing a dual polynomial, note that as $$t_1^+ \downarrow t_1$$ and $$t_1^-\uparrow t_1$$, the first two columns of $${\it {\Lambda}}(t_1^+,t_1^-,\ldots,t)$$ converge to the same column, namely $$\kappa(t_1)$$. However, if we divide by the difference $$t_1^+ - t_1^-$$ and take a limit, then we obtain the derivative of the second column. In particular, some calculation shows T-system implies

$$\det\begin{bmatrix} A & \kappa(t) \\ \omega & w(t)\end{bmatrix}\neq 0 \quad \forall\, t\neq t_i,$$

where $$A = \frac 1 n \sum_{i=1}^n v(s_i) v(s_i)^{\rm T}$$ is the matrix from Independence, and

$$\omega=[w(t_1),\ldots,w(t_M),w'(t_1),\ldots,w'(t_M)].$$

Taking the Schur complement in $$w(t)$$, we find

$$\det\begin{bmatrix} A & \kappa(t) \\ \omega & w(t)\end{bmatrix}=\det A\,\big[w(t)-\omega^{\rm T} A^{-1}\kappa(t)\big].$$

Hence, it seems like the function $$\omega^{\rm T} A^{-1} \kappa(t)$$ might serve well as our dual polynomial. However, it remains unclear from this short calculation that this function is bounded above by $$w(t)$$. The proof of Theorem 1.4 makes this construction rigorous using the theory of T-systems. Before turning to the proofs of these theorems (cf. Sections 2.1 and 2.4), we survey the mathematical theory of superresolution imaging. 1.2 Foundations: Tchebycheff systems Our proofs rely on the machinery of Tchebycheff systems. This line of work originated in the 1884 doctoral thesis of A. A. Markov on approximating the value of an integral $$\int_a^b f(x) \,{\rm d}x$$ from the moments $$\int_a^b xf(x) \,{\rm d}x$$, $$\ldots$$, $$\int_a^b x^n f(x)\,{\rm d}x$$. His work formed the basis of the proof by Tchebycheff [59] (who was Markov's doctoral advisor) of the central limit theorem in 1887. Recall that we defined a T-system in Definition 1.2. Equivalently, the functions $$u_1,\ldots,u_n$$ form a T-system if and only if every nontrivial linear combination $$U(t) = a_1u_1(t) + \cdots + a_n u_n(t)$$ has at most $$n-1$$ zeros. One natural example of a T-system is given by the functions $$1, t, \ldots, t^{n-1}$$.
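The monomial family can be verified to be a T-system numerically: for distinct nodes, the matrix of Definition 1.2 is a Vandermonde matrix, whose determinant has the closed form $$\prod_{i<j}(t_j - t_i)$$. The specific nodes below are arbitrary illustrative values.

```python
import numpy as np
from itertools import combinations
from math import prod

# For the candidate T-system {1, t, ..., t^(n-1)} and distinct nodes, the
# matrix of Definition 1.2 is the Vandermonde matrix, with determinant
# prod_{i < j} (t_j - t_i): nonzero exactly when the nodes are distinct.
t = np.array([-1.0, -0.2, 0.3, 1.1])
V = np.vander(t, increasing=True).T          # rows u_k(tau_j) = tau_j ** k

det = np.linalg.det(V)
closed_form = prod(tj - ti for ti, tj in combinations(t, 2))
```

The closed form makes the invertibility requirement of Definition 1.2 transparent in this special case; for general T-systems no such formula is available, which is why the machinery below is needed.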
Indeed, a polynomial of order $$n-1$$ can have at most $$n-1$$ zeros. Equivalently, the Vandermonde determinant does not vanish,

$$\begin{vmatrix} 1 & 1 & \ldots & 1\\ t_1 & t_2 & \ldots & t_n\\ t_1^2 & t_2^2 & \ldots & t_n^2\\ \vdots & & & \vdots\\ t_1^{n-1} & t_2^{n-1} & \ldots & t_n^{n-1} \end{vmatrix}\neq 0,$$

for any $$t_1<\ldots<t_n$$. Just as Vandermonde systems are used to solve polynomial interpolation problems, T-systems allow the generalization of tools from polynomial fitting to a broader class of nonlinear function-fitting problems. Indeed, given a T-system $$u_1,\ldots,u_n$$, a generalized polynomial is a linear combination $$U(t) = a_1u_1(t) + \cdots + a_n u_n(t)$$. The machinery of T-systems provides a basis for understanding the properties of these generalized polynomials. For a survey of T-systems and their applications in statistics and approximation theory, see [37,41,42]. In particular, many of our proofs are adapted from [42], and we call out the parallel theorems whenever this is the case. 1.3 Prior art and related work Broadly, superresolution techniques enhance the resolution of a sensing system, optical or otherwise; resolution is the smallest distance at which distinct sources can still be distinguished. The mathematical problem of localizing point sources from a blurred signal has applications in a wide array of empirical sciences: astronomers deconvolve images of stars to angular resolution beyond the Rayleigh limit [48], and biologists capture nanometer-resolution images of fluorescent proteins [11,40,51,63]. Detecting neural action potentials from extracellular electrode measurements is fundamental to experimental neuroscience [31], and resolving the poles of a transfer function is fundamental to system identification [52]. To understand a radar signal, one must decompose it into reflections from different sources [38]; and to understand an NMR spectrum, one must decompose it into signatures from different chemicals [57].
The mathematical analysis of point source recovery has a long history going back to the work of de Prony [23], who pioneered techniques for estimating sinusoidal frequencies. de Prony's method is based on algebraically solving for the roots of polynomials and can recover arbitrarily closely spaced frequencies. The annihilation filter technique introduced by Vetterli et al. [61] can perfectly recover any signal of finite rate of innovation with minimal samples. In particular, the theory of signals with finite rate of innovation shows that, given a superposition of pulses of the form $$\sum a_k\psi(t-t_k)$$, one can reconstruct the shifts $$t_k$$ and coefficients $$a_k$$ from a minimal number of samples [27,61]. This holds without any separation condition on the $$t_k$$, as long as the base function $$\psi$$ can reproduce polynomials of a certain degree (see [27, Section A.1] for more details). The algorithm used for this reconstruction is, however, based on polynomial rooting techniques that do not easily extend to higher dimensions. Moreover, this algebraic technique is not robust to noise (see the discussion in [56, Section IV.A], for example). In contrast, we study sparse recovery techniques. This line of thought goes back at least to Carathéodory [18,19]. Our contribution is an analysis of $$\ell_1$$-based methods that matches the performance of the algebraic techniques of Vetterli in the one-dimensional and noiseless setting. Our primary motivation is that $$\ell_1$$-based methods may be more stable to noise and trivially generalize to higher dimensions (although our analysis currently does not). It is tempting to apply the theory of compressed sensing [3,16,17,25] to problem (1.3). If one assumes the point sources are located on a finite grid and are well separated, then some of the standard models for recovery are valid (e.g. incoherency, the restricted isometry property or the restricted eigenvalue property).
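de Prony's root-finding idea mentioned above can be sketched in a few lines for a sum of two complex sinusoids. The frequencies below are deliberately far closer than any Rayleigh-type separation; all numerical values are illustrative.

```python
import numpy as np

# Signal y_k = sum_i c_i z_i^k with z_i = exp(2*pi*1j*f_i): two modes whose
# frequencies are far closer than any Rayleigh-type separation limit.
f_true = np.array([0.300, 0.301])
c_true = np.array([1.0, 0.8])
z_true = np.exp(2j * np.pi * f_true)

k = np.arange(5)                             # 2M + 1 = 5 samples suffice for M = 2
y = (z_true[None, :] ** k[:, None]) @ c_true

# Prony's method: find the monic polynomial z^2 + a1*z + a0 annihilating y,
# i.e. y_{k+2} + a1*y_{k+1} + a0*y_k = 0, then read the modes off its roots.
H = np.column_stack([y[0:2], y[1:3]])        # Hankel system in (a0, a1)
a0, a1 = np.linalg.solve(H, -y[2:4])
roots = np.roots([1.0, a1, a0])
f_hat = np.sort(np.angle(roots) / (2 * np.pi) % 1.0)
```

The recovery is exact in exact arithmetic, but the Hankel system becomes badly conditioned as the modes coalesce or as noise is added, which is precisely the instability discussed above.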
With this motivation, many authors solve the gridded form of the superresolution problem in practice [2,4,20,28,29,33,39,49,54,55,63]. However, this approach has some significant drawbacks. The theoretical requirements imposed by the classical models of compressed sensing become more stringent as the grid becomes finer. Furthermore, making the grid finer can also lead to numerical instabilities and computational bottlenecks in practice. Despite recent successes in many empirical disciplines, the theory of superresolution imaging remains limited. Candès and Fernandez-Granda [15] recently made an important contribution to the mathematical analysis of superresolution, demonstrating that semi-infinite optimization could be used to solve the classical de Prony problem. Their proof technique has formed the basis of several other analyses, including that of Bendory et al. [7] and that of Tang et al. [57]. To better compare with our approach, we briefly describe the approach of [7,15,57] here. They construct the vector $$q$$ of a dual polynomial $$Q(t) = \frac 1 n \sum_{j=1}^n \psi(s_j,t) q_j$$ as a linear combination of $$\psi(s,t_i)$$ and $$\frac{\,{\rm d}}{\,{\rm d}t_i} \psi(s,t_i)$$. In particular, they define the coefficients of this linear combination as the least squares solution to the system of equations

$$Q(t_i) = \operatorname{sign}(c_i), \quad i=1,\ldots,M, \qquad \frac{\rm d}{{\rm d}t}Q(t)\Big|_{t=t_i} = 0, \quad i=1,\ldots,M. \quad (1.8)$$

They prove that, under a minimum separation condition on the $$t_i$$, the system has a unique solution because the matrix for the system is a perturbation of the identity, hence invertible. Much of the mathematical analysis of superresolution has relied heavily on the assumption that the point sources are separated by more than some minimum amount [5,7,15,24,26,30,45]. We note that in practical situations with noisy observations, some form of minimum separation may be necessary.
One can expect, however, that the required minimum separation should go to zero as the noise level decreases: a property that is not manifest in previous results. Our approach, by contrast, does away with the minimum separation condition by observing that the matrix for the system (1.8) need not be close to the identity to be invertible. Instead, we impose Conditions 1.3 to guarantee invertibility directly. Not surprisingly, we use techniques from T-systems to construct an analog of the polynomial $$Q$$ in (1.8) for our specific problem. Another key difference is that we consider the weighted objective $$\int w(t)\,{\rm d}\mu(t)$$, whereas prior work [7,15,57] has analysed the unweighted objective $$\int\,{\rm d}\mu(t)$$. We, too, could not remove the separation condition without reweighting by $$w(t)$$. In Section 3, we provide evidence that this mathematically motivated reweighting step actually improves performance in practice. Weighting has proven to be a powerful tool in compressed sensing, and many works have shown that weighting an $$\ell_1$$-like cost function can yield improved performance over standard $$\ell_1$$ minimization [35,43,60,62]. To our knowledge, the closest analogy to our use of weights comes from Rauhut and Ward [50], who use weights to balance the influence of the dynamic range of bases in polynomial interpolation problems. In the setting of this article, weights serve to lessen the influence of sources that have low overlap with the observed samples. We are not the first to bring the theory of Tchebycheff systems to bear on the problem of recovering finitely supported measures. Building on the foundational work of Beurling [8], de Castro & Gamboa [22] prove that a finitely supported positive measure $$\mu$$ can be recovered exactly from measurements of the form

$$\left\{\int u_0\,{\rm d}\mu,\;\ldots,\;\int u_n\,{\rm d}\mu\right\}$$

whenever $$\{u_0,\ldots,u_n\}$$ form a T-system containing the constant function $$u_0 = 1$$.
These measurements are almost identical to ours; if we set $$u_k(t) = \psi(s_k,t)$$ for $$k = 1,\ldots,n$$, where $$\{s_1,\ldots,s_n\}={\mathcal S}$$ is our measurement set, then our measurements are of the form

$$\{x(s)\mid s\in{\mathcal S}\}=\left\{\int u_1\,{\rm d}\mu,\;\ldots,\;\int u_n\,{\rm d}\mu\right\}.$$

However, in practice, it is often impossible to directly measure the mass $$\int u_0\,{\rm d}\mu = \int\,{\rm d}\mu$$ as required by the theory of [22]. Moreover, the requirement that $$\{1,\psi(s_1,t),\ldots,\psi(s_n,t)\}$$ form a T-system does not hold for the Gaussian point spread function $$\psi(s,t) = \,{\rm e}^{-(s-t)^2}$$ (see Remark 2.6). Therefore the theory of [22] is not readily applicable to superresolution imaging. We conclude our review of the literature by discussing some prior work on $$\ell_1$$-based superresolution without a minimum separation condition. We would like to mention the work of Fuchs [36] in the case that the point spread function is band limited and the samples lie on a regularly spaced grid. This result also does not require a minimum separation condition. However, our results hold for considerably more general point spread functions and sampling patterns. Finally, in a recent article, Bendory [6] presents an analysis of $$\ell_1$$ minimization in a discrete setup by imposing a Rayleigh regularity condition which, in the absence of noise, requires no minimum separation. Our results are of a different flavor, as our setup is continuous. Furthermore, we require linear sample complexity, whereas the theory of Bendory [6] requires infinitely many samples. 2. Proofs In this section, we prove Theorem 1.4 and Theorem 1.1. We start by giving a short list of notation to be used throughout the proofs. We write our proofs for an arbitrary measurement set $${\mathcal S}$$, which need not be finite for the sake of the proof. Let $$P$$ denote a fixed positive finite measure on $${\mathcal S}$$ for which $$\psi(\cdot,t) \in L^2_P$$, and set

$$w(t)=\int \psi(s,t)\,{\rm d}P(s).$$
For concreteness, the reader might think of $$P$$ as the uniform measure over $${\mathcal S}$$, where if $${\mathcal S}$$ is finite, then $$w(t) = \frac 1 n \sum_{j=1}^n \psi(s_j,t)$$. Just note that the particular choice of $$P$$ does not affect the proof. Notation glossary We denote the inner product of functions $$f,g\in L^2_{P}$$ by $${\left \langle {f,g} \right \rangle}_{{P}} := \int f(s) g(s) \,{\rm d}P(s)$$. For any differentiable function $$f: \mathbb{R}^2 \to \mathbb{R}$$, we denote the derivative in its first argument by $$\partial_1 f$$ and in its second argument by $$\partial_2 f$$. For $$t \in \mathbb{R}$$, let $$\psi_t(\cdot) = \psi(\cdot,t)$$. 2.1 Proof of Theorem 1.4 We prove Theorem 1.4 in two steps. We first reduce the proof to constructing a function $$q$$ such that $${\left \langle {q,\psi_t} \right \rangle}_{{P}}$$ possesses some specific properties. Proposition 2.1 If Conditions 1.3 hold, and if there exists a function $$q$$ such that $$Q(t) := {\left \langle {q,\psi_{t}} \right \rangle}_{{P}}$$ satisfies

$$Q(t_j) = w(t_j), \quad j=1,\ldots,M; \qquad Q(t) < w(t) \quad \text{for } t\in[-T,T] \text{ and } t\neq t_j, \quad (2.1)$$

then the true measure $$\mu_\star := \sum_{j=1}^M c_j \delta_{t_j}$$ is the unique optimal solution of the program (1.3). This proof technique is somewhat standard [16,36]: the function $$Q(t)$$ is called a dual certificate of optimality. However, introducing the function $$w(t)$$ is a novel aspect of our proof. The majority of arguments take $$w(t) = 1$$. Note that when $$\int \psi(s,t) \,{\rm d}P(s)$$ is independent of $$t$$, then $$w(t)$$ is a constant and we recover the usual method of proof. In the second step, we construct $$q(s)$$ as a linear combination of the $$t_i$$-centered point spread functions $$\psi(s, t_i)$$ and their derivatives $$\partial_2\psi(s, t_i)$$.
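This construction can be previewed numerically. The sketch below fixes a Gaussian PSF and solves the $$2M$$ linear conditions $$Q(t_j) = w(t_j)$$, $$Q'(t_j) = w'(t_j)$$ by least squares; all parameter values are illustrative, and the least-squares solve stands in for the limiting T-system argument, which is what actually guarantees the global bound $$Q(t) \le w(t)$$.

```python
import numpy as np

psf  = lambda s, t: np.exp(-(s - t) ** 2)        # Gaussian psi(s, t)
dpsf = lambda s, t: 2.0 * (s - t) * psf(s, t)    # derivative in the second argument

t_src = np.array([-0.3, 0.4])                    # source locations, M = 2
samples = np.linspace(-2.0, 2.0, 9)              # measurement set S, |S| > 2M
n = len(samples)

def features(s):
    """Columns of q(s) = sum_i alpha_i psi(s,t_i) + beta_i d/dt_i psi(s,t_i) + c."""
    s = np.atleast_1d(s)[:, None]
    return np.hstack([psf(s, t_src[None, :]), dpsf(s, t_src[None, :]),
                      np.ones((len(s), 1))])     # shape (len(s), 2M + 1)

Phi = features(samples)                          # q(s_j) = Phi @ p for parameters p

w  = lambda t: psf(samples[:, None], t[None, :]).mean(axis=0)   # w(t)
dw = lambda t: dpsf(samples[:, None], t[None, :]).mean(axis=0)  # w'(t)

# Rows of the interpolation system: Q(t_i) = w(t_i) and Q'(t_i) = w'(t_i),
# where Q(t) = (1/n) sum_j q(s_j) psi(s_j, t).
G  = psf(samples[:, None], t_src[None, :]).T @ Phi / n
dG = dpsf(samples[:, None], t_src[None, :]).T @ Phi / n

interp = np.vstack([G, dG])                      # 2M equations, 2M + 1 unknowns
rhs = np.concatenate([w(t_src), dw(t_src)])
p, *_ = np.linalg.lstsq(interp, rhs, rcond=None)

Q = lambda t: (psf(samples[:, None], t[None, :]) * (Phi @ p)[:, None]).mean(axis=0)
```

The interpolation conditions pin $$Q$$ to touch $$w$$ tangentially at each source; whether $$Q$$ stays below $$w$$ everywhere else is exactly the content of the T-system machinery that follows.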
Theorem 2.2 Under Conditions 1.3, there exist $$\alpha_1,\ldots,\alpha_M,\beta_1,\ldots,\beta_M,c\in \mathbb{R}$$ such that $$Q(t) = \langle q, \psi_t\rangle_P$$ satisfies (2.1), where   $$q(s)=\sum_{i=1}^M\left(\alpha_i\psi(s,t_i)+\beta_i\,\frac{{\rm d}}{{\rm d}t_i}\psi(s,t_i)\right)+c.$$ To complete the proof of Theorem 1.4, it remains to prove Proposition 2.1 and Theorem 2.2. Their proofs can be found in Sections 2.2 and 2.3, respectively. 2.2 Proof of Proposition 2.1 We show that $$\mu_\star$$ is the optimal solution of problem (1.3) through strong duality. The optimization problem (1.3) may have infinitely many constraints, and by [53] its Lagrangian is   $$L(q,\mu)=\int w(t)\,{\rm d}\mu(t)+\int_{\mathcal S} q(s)\left(x(s)-\int \psi(s,t)\,{\rm d}\mu(t)\right){\rm d}P(s),$$ where $$q \in L^2_P$$ is the dual variable. Note that the Lagrangian can be rewritten as   $$L(q,\mu) =\int w(t)\,{\rm d}\mu(t)+\int q(s)x(s)\,{\rm d}P(s)-\int\!\!\int_{\mathcal S} q(s)\psi(s,t)\,{\rm d}P(s)\,{\rm d}\mu(t) =\int \left(w(t)-\langle q,\psi_t\rangle_P\right){\rm d}\mu(t)+\langle q,x\rangle_P,$$ (2.2) where we have applied Fubini’s theorem to obtain the first equality and rearranged terms to obtain the second. The dual of problem (1.3) is   $$\underset{q}{\text{maximize}}\ \underset{\mu\ge 0,\ \operatorname{supp}\mu\subset[-T,T]}{\text{minimize}}\ L(q,\mu).$$ From equation (2.2), we infer that the inner minimization above yields $$\langle q,x\rangle_P$$ if $$w(t) \ge {\left \langle {q,\psi_t} \right \rangle}_{{P}}$$ for all $$t\in [-T,T]$$, and $$-\infty$$ otherwise. Hence, the dual of problem (1.3) is   $$\underset{q}{\text{maximize}}\ \langle q,x\rangle_P\quad\text{subject to}\quad \langle q,\psi_t\rangle_P\le w(t)\ \text{for } t\in[-T,T].$$ (2.3) Since the primal (1.3) is equality constrained, Slater’s condition naturally holds, implying strong duality. See [53] for background on Slater’s condition in the presence of infinitely many constraints. As a consequence, we have   $$\langle q,x\rangle_P=\int w(t)\,{\rm d}\mu(t)\iff q \text{ is dual optimal and } \mu \text{ is primal optimal}.$$ Suppose $$q$$ satisfies (2.1). Then $$q$$ is dual feasible and we have   $$\langle q,x\rangle_P= \sum_{j=1}^M c_j\langle q,\psi_{t_j}\rangle_P=\sum_{j=1}^M c_jQ(t_j)= \int w(t)\,{\rm d}\mu_\star(t).$$ Therefore, $$q$$ is dual optimal and $$\mu_\star$$ is primal optimal. Next we show uniqueness.
Suppose the primal (1.3) has another optimal solution   $$\hat\mu=\sum_{j=1}^{\hat M}\hat c_j\delta_{\hat t_j}$$ such that $$\{\hat{t}_1,\ldots,\hat{t}_{\hat M}\} \ne \{t_1,\ldots,t_M\} := \mathcal{T}$$. Then we have   $$\langle q,x\rangle_P= \sum_{j}\hat c_j\langle q,\psi_{\hat t_j}\rangle_P= \sum_{\hat t_j\in\mathcal T}\hat c_jQ(\hat t_j)+\sum_{\hat t_j\notin\mathcal T}\hat c_jQ(\hat t_j)< \sum_{\hat t_j\in\mathcal T}\hat c_jw(\hat t_j)+\sum_{\hat t_j\notin\mathcal T}\hat c_jw(\hat t_j)=\int w(t)\,{\rm d}\hat\mu(t).$$ Therefore, all optimal solutions must be supported on $$\{t_1, \ldots, t_M \}$$. We now show that the coefficients of any optimal $$\hat \mu$$ are uniquely determined. By the condition Independence, the matrix $$\int v(s) v(s)^{\rm T} \,{\rm d}P(s)$$ is invertible. As it is also positive semidefinite, it is positive definite; in particular, its upper $$M\times M$$ block is also positive definite:   $$\det\int \begin{bmatrix}\psi(s,t_1)\\ \vdots\\ \psi(s,t_M)\end{bmatrix}\begin{bmatrix}\psi(s,t_1) & \ldots & \psi(s,t_M)\end{bmatrix}{\rm d}P(s)\ne 0.$$ Hence, there must be $$s_1,\ldots, s_{M} \in {\mathcal S}$$ such that the matrix with entries $$\psi(s_i,t_j)$$ is non-singular. Now consider some optimal $$\hat \mu = \sum_{i=1}^M \hat c_i \delta_{t_i}$$. Since $$\hat \mu$$ is feasible, we have   $$x(s_j)=\sum_{i=1}^M \hat c_i\psi(s_j,t_i)=\sum_{i=1}^M c_i\psi(s_j,t_i)\quad\text{for } j=1,\ldots,M.$$ Since the matrix with entries $$\psi(s_j, t_i)$$ is invertible, we conclude that $$\hat c_i = c_i$$ for all $$i$$; that is, the coefficients are unique. Hence, $$\mu_\star$$ is the unique optimal solution of (1.3). 2.3 Proof of Theorem 2.2 We construct $$Q(t)$$ via a limiting interpolation argument due to Krein [44]. We have adapted some of our proofs (with non-trivial modifications) from the aforementioned text by Karlin and Studden [42]. We give references to the specific places where we borrow from classical arguments. In the sequel, we make frequent use of the following elementary manipulation of determinants: Lemma 2.3 If $$v_0,\ldots,v_{n}$$ are vectors in $$\mathbb{R}^{n}$$, and $$n$$ is even, then   $$\begin{vmatrix}v_1-v_0 & \ldots & v_n-v_0\end{vmatrix}=\begin{vmatrix}v_1 & \ldots & v_n & v_0\\ 1 & \ldots & 1 & 1\end{vmatrix}.$$ Proof. By elementary manipulations, both determinants in the statement of the lemma are equal to   $$\begin{vmatrix}v_1-v_0 & \ldots & v_n-v_0 & v_0\\ 0 & \ldots & 0 & 1\end{vmatrix}.$$ □ In what follows, we consider $$\epsilon>0$$ such that   $$t_1-\epsilon<t_1+\epsilon<t_2-\epsilon<t_2+\epsilon<\cdots<t_M-\epsilon<t_M+\epsilon.$$
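The determinant identity of Lemma 2.3 is easy to sanity-check numerically. The sketch below verifies it for $$n=4$$ with random vectors (numpy is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

# Lemma 2.3 with n = 4 (even): vectors v_0, ..., v_n in R^n.
n = 4
v = rng.standard_normal((n + 1, n))  # rows are v[0], ..., v[n]

# Left-hand side: det of the n x n matrix with columns v_i - v_0.
lhs = np.linalg.det(np.column_stack([v[i] - v[0] for i in range(1, n + 1)]))

# Right-hand side: det of the (n+1) x (n+1) matrix with columns
# v_1, ..., v_n, v_0 bordered below by a row of ones.
top = np.column_stack([v[i] for i in list(range(1, n + 1)) + [0]])
rhs = np.linalg.det(np.vstack([top, np.ones(n + 1)]))

assert np.isclose(lhs, rhs)
```

The check mirrors the one-line proof: subtracting the last column from each of the others reduces the bordered determinant to the left-hand side.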
Definition 2.4 A point $$t$$ is a nodal zero of a continuous function $$f:\mathbb R \to \mathbb R$$ if $$f(t) = 0$$ and $$f$$ changes sign at $$t$$. A point $$t$$ is a non-nodal zero if $$f(t) = 0$$, but $$f$$ does not change sign at $$t$$. This distinction is illustrated in Fig. 2. Fig. 2. The point $$a$$ is a nodal zero of $$f$$, and the point $$b$$ is a non-nodal zero of $$f$$. Our proof of Theorem 2.2 proceeds as follows. With $$\epsilon$$ fixed, we construct a function   $$\tilde{Q}_\epsilon(t)=\sum_{i=1}^M \alpha_\epsilon^{[i]} K_P(t,t_i)+\beta_\epsilon^{[i]}\,\partial_2 K_P(t,t_i)$$ such that $${\tilde{Q}}_{\epsilon}(t) = w(t)$$ only at the points $$t = t_j \pm\epsilon$$ for $$j=1, 2, \ldots, M$$, and the points $$t_j\pm\epsilon$$ are nodal zeros of $${\tilde{Q}}_{\epsilon}(t) - w(t)$$ for all $$j=1,2,\dots, M$$. We then consider the limiting function $${\tilde{Q}}(t) = \lim_{\epsilon \downarrow 0} {\tilde{Q}}_\epsilon(t)$$, and prove that either $${\tilde{Q}}(t)$$ satisfies (2.1) or $$2w(t) - {\tilde{Q}}(t)$$ satisfies (2.1). An illustration of this construction is pictured in Fig. 3. Fig. 3. The relationship between the functions $$w(t)$$, $${\tilde{Q}}_\epsilon(t)$$ and $${\tilde{Q}}(t)$$. The function $${\tilde{Q}}_\epsilon(t)$$ touches $$w(t)$$ only at $$t_i\pm \epsilon$$, and these are nodal zeros of $${\tilde{Q}}_\epsilon(t) - w(t)$$. The function $${\tilde{Q}}(t)$$ touches $$w(t)$$ only at $$t_i$$ and these are non-nodal zeros of $${\tilde{Q}}(t) - w(t)$$.
We begin with the construction of $${\tilde{Q}}_\epsilon$$. We aim to find coefficients $$\alpha_{\epsilon}, \beta_{\epsilon}$$ satisfying   $$\tilde Q_\epsilon(t_i-\epsilon)=w(t_i-\epsilon)\quad\text{and}\quad \tilde Q_\epsilon(t_i+\epsilon)=w(t_i+\epsilon)\quad\text{for } i=1,\ldots,M.$$ This system of equations is equivalent to the system   $$\tilde Q_\epsilon(t_i-\epsilon)=w(t_i-\epsilon),\qquad \frac{\tilde Q_\epsilon(t_i+\epsilon)-\tilde Q_\epsilon(t_i-\epsilon)}{2\epsilon}=\frac{w(t_i+\epsilon)-w(t_i-\epsilon)}{2\epsilon},\qquad i=1,\ldots,M.$$ (2.4) Note that this is a linear system of equations in $$\alpha_\epsilon,\beta_\epsilon$$ with coefficient matrix given by   $$\mathbf{K}_\epsilon:=\begin{bmatrix}K_P(t_j-\epsilon,t_i) & \partial_2 K_P(t_j-\epsilon,t_i)\\ \frac{1}{2\epsilon}\left(K_P(t_j+\epsilon,t_i)-K_P(t_j-\epsilon,t_i)\right) & \frac{1}{2\epsilon}\left(\partial_2 K_P(t_j+\epsilon,t_i)-\partial_2 K_P(t_j-\epsilon,t_i)\right)\end{bmatrix}.$$ That is, equation (2.4) can be written as   $$\mathbf{K}_\epsilon\begin{bmatrix}\alpha_\epsilon\\ \beta_\epsilon\end{bmatrix}=\begin{bmatrix}w(t_1-\epsilon)\\ \vdots\\ w(t_M-\epsilon)\\ \frac{1}{2\epsilon}\left(w(t_1+\epsilon)-w(t_1-\epsilon)\right)\\ \vdots\\ \frac{1}{2\epsilon}\left(w(t_M+\epsilon)-w(t_M-\epsilon)\right)\end{bmatrix}.$$ We first show that the matrix $$\mathbf{K}_\epsilon$$ is invertible for all $$\epsilon$$ sufficiently small. As $$\epsilon\to 0$$, the matrix $$\mathbf{K}_{\epsilon}$$ converges to   $$\mathbf{K}:=\begin{bmatrix}K_P(t_j,t_i) & \partial_2 K_P(t_j,t_i)\\ \partial_1 K_P(t_j,t_i) & \partial_1\partial_2 K_P(t_j,t_i)\end{bmatrix}=\int v(s)v(s)^{\rm T}\,{\rm d}P(s),$$ which is positive definite by Independence. Since the entries of $$\mathbf{K}_\epsilon$$ converge to the entries of $$\mathbf{K}$$, there is a $${\Delta} > 0$$ such that $$\mathbf{K}_\epsilon$$ is invertible for all $$\epsilon \in (0, {\Delta})$$. Moreover, $$\mathbf K_{\epsilon}^{-1}$$ converges to $$\mathbf K^{-1}$$ as $$\epsilon\to 0$$, and for all $$\epsilon < {\Delta}$$ the coefficients are uniquely determined by   $$\begin{bmatrix}\alpha_\epsilon\\ \beta_\epsilon\end{bmatrix}=\mathbf{K}_\epsilon^{-1}\begin{bmatrix}w(t_1-\epsilon)\\ \vdots\\ w(t_M-\epsilon)\\ \frac{1}{2\epsilon}\left(w(t_1+\epsilon)-w(t_1-\epsilon)\right)\\ \vdots\\ \frac{1}{2\epsilon}\left(w(t_M+\epsilon)-w(t_M-\epsilon)\right)\end{bmatrix}.$$ (2.5) We denote the corresponding function by   $$\tilde{Q}_\epsilon(t):=\sum_{i=1}^M \alpha_\epsilon^{[i]} K_P(t,t_i)+\beta_\epsilon^{[i]}\,\partial_2 K_P(t,t_i).$$ Before we construct $${\tilde{Q}}(t)$$, we take a moment to establish the following remarkable consequences of the Determinantal condition. For all $$\epsilon>0$$ sufficiently small, the following hold: (a) $${\tilde{Q}}_{\epsilon}(t) = w(t)$$ only at the points $$t_1 - \epsilon,t_1+\epsilon, \ldots, t_{M}-\epsilon,t_M+\epsilon$$.
(b) These points $$t_1 - \epsilon,t_1+\epsilon, \ldots, t_{M}-\epsilon,t_M+\epsilon$$ are nodal zeros of $${\tilde{Q}}_{\epsilon}(t)-w(t)$$. We adapted the proofs of (a) and (b) (with non-trivial modification) from the proofs of Theorem 1.6.1 and Theorem 1.6.2 of [42]. Proof of $$(a).$$ Suppose for the sake of contradiction that there is a $$\tau \in [-T,T]$$ such that $${\tilde{Q}}_{\epsilon}(\tau) = w(\tau)$$ and $$\tau \notin \{t_1 - \epsilon,t_1 +\epsilon, \ldots,t_M-\epsilon, t_M + \epsilon\}$$. Then we have the system of $$2M$$ linear equations   $$\frac{\tilde Q_\epsilon(t_j-\epsilon)}{w(t_j-\epsilon)}-\frac{\tilde Q_\epsilon(\tau)}{w(\tau)}=0,\qquad \frac{\tilde Q_\epsilon(t_j+\epsilon)}{w(t_j+\epsilon)}-\frac{\tilde Q_\epsilon(\tau)}{w(\tau)}=0,\qquad j=1,\ldots,M.$$ Rewriting this in matrix form, the coefficient vector $$[ \alpha_\epsilon \quad \beta_\epsilon ] = [\alpha_\epsilon^{[1]} \ \cdots \ \alpha_\epsilon^{[M]}\ \beta_\epsilon^{[1]}\ \cdots \ \beta_\epsilon^{[M]}]$$ of $${\tilde{Q}}_{\epsilon}$$ satisfies   $$\begin{bmatrix}\alpha_\epsilon & \beta_\epsilon\end{bmatrix}\begin{pmatrix}\kappa(t_1-\epsilon)-\kappa(\tau) & \kappa(t_1+\epsilon)-\kappa(\tau) & \ldots & \kappa(t_M+\epsilon)-\kappa(\tau)\end{pmatrix}=\begin{bmatrix}0 & \ldots & 0\end{bmatrix}.$$ (2.6) By Lemma 2.3 applied to the $$2M+1$$ vectors $$v_1 = \kappa(t_1 - \epsilon), \ldots, v_{2M} = \kappa(t_M+\epsilon)$$ and $$v_0 = \kappa(\tau)$$, the matrix for the system of equations (2.6) is non-singular if and only if the following matrix is non-singular:   $$\begin{bmatrix}\kappa(t_1-\epsilon) & \ldots & \kappa(t_M+\epsilon) & \kappa(\tau)\\ 1 & \ldots & 1 & 1\end{bmatrix}=\Lambda(t_1-\epsilon,\ldots,t_M+\epsilon,\tau).$$ However, this is non-singular by the Determinantal condition. This gives us the contradiction that completes the proof of part $$(a)$$. Proof of $$(b)$$. Suppose for the sake of contradiction that $${\tilde{Q}}_{\epsilon}(t) - w(t)$$ has $$N_1 < 2M$$ nodal zeros and $$N_0 = 2M - N_1$$ non-nodal zeros. Denote the nodal zeros by $$\tau_1, \ldots, \tau_{N_1}$$ and the non-nodal zeros by $$z_1, \ldots, z_{N_0}$$. In what follows, we obtain a contradiction by doubling the non-nodal zeros of $${\tilde{Q}}_\epsilon(t) - w(t)$$. We do this by constructing a certain generalized polynomial $$u(t)$$ and adding a small multiple of it to $${\tilde{Q}}_\epsilon(t) - w(t)$$. See Fig. 4 for an illustration.
Fig. 4. The points $$\{\tau_1,\tau_2,\tau_3,\tau_4\}$$ are nodal zeros of $${\tilde{Q}}_\epsilon(t) - w(t)$$, and the points $$\{z_1,z_2,z_3\}$$ are non-nodal zeros. The function $$u(t)$$ has the appropriate sign so that $${\tilde{Q}}_\epsilon(t) - w(t) + \delta u(t)$$ retains nodal zeros at $$\tau_i$$ and obtains two zeros in the vicinity of each $$z_i$$. We divide the non-nodal zeros into groups according to whether $${\tilde{Q}}_{\epsilon}(t) - w(t)$$ is positive or negative in a small neighborhood around the zero; define   $$I_-:=\{i \mid \tilde Q_\epsilon\le w \text{ near } z_i\}\quad\text{and}\quad I_+:=\{i \mid \tilde Q_\epsilon\ge w \text{ near } z_i\}.$$ We first show that there are coefficients $$a_0, \ldots, a_M$$ and $$b_1,\ldots, b_M$$ such that the polynomial   $$u(t)=\sum_{i=1}^M a_iK_P(t,t_i)+\sum_{i=1}^M b_i\,\partial_2K_P(t,t_i)+a_0w(t)$$ satisfies the system of equations   $$u(z_j)=+1\ \ (j\in I_-),\qquad u(z_j)=-1\ \ (j\in I_+),\qquad u(\tau_i)=0\ \ (i=1,\ldots,N_1),\qquad u(\tau)=0,$$ (2.7) where $$\tau$$ is some arbitrary additional point. The matrix for this system is   $$\mathbf{W}\begin{pmatrix}\kappa(z_1)^{\rm T} & 1\\ \vdots & \vdots\\ \kappa(z_{N_0})^{\rm T} & 1\\ \kappa(\tau_1)^{\rm T} & 1\\ \vdots & \vdots\\ \kappa(\tau_{N_1})^{\rm T} & 1\\ \kappa(\tau)^{\rm T} & 1\end{pmatrix},$$ where $$\mathbf{W} = \text{diag}\big(w(z_1),\ldots,w(z_{N_0}),w(\tau_1),\ldots,w(\tau_{N_1}),w(\tau)\big)$$. This matrix is invertible by the Determinantal condition, since the nodal and non-nodal zeros of $${\tilde{Q}}_\epsilon(t) - w(t)$$ are given by $$t_1-\epsilon,\ldots,t_M+\epsilon$$. Hence, there is a solution to the system (2.7). Now consider the function   $$U^\delta(t)={\tilde{Q}}_\epsilon(t)+\delta u(t)=\sum_{i=1}^M\left[\alpha_\epsilon^{[i]}+\delta a_i\right]K_P(t,t_i)+\sum_{i=1}^M\left[\beta_\epsilon^{[i]}+\delta b_i\right]\partial_2K_P(t,t_i)+\delta a_0w(t),$$ where $${\delta} > 0$$. By construction, $$u(\tau_i) = 0$$, so $$U^\delta(t) - w(t)$$ has nodal zeros at $$\tau_1,\ldots,\tau_{N_1}$$.
We can choose $$\delta$$ small enough so that $$U^\delta(t) - w(t)$$ vanishes twice in the vicinity of each $$z_i$$. This means that $$U^{\delta} (t) - w(t)$$ has $$2M + N_0$$ zeros. Assuming $$N_0>0$$, select a subset of these zeros $$p_1 < \ldots < p_{2M+1}$$ such that there are two in each interval $$[t_i - \rho,t_i +\rho]$$. This is possible if $$\epsilon < \rho$$ and $$\delta$$ is sufficiently small. We have the system of $$2M+1$$ equations   $$\sum_{i=1}^M\left[\alpha_\epsilon^{[i]}+\delta a_i\right]K_P(p_j,t_i)+\sum_{i=1}^M\left[\beta_\epsilon^{[i]}+\delta b_i\right]\partial_2K_P(p_j,t_i)=(1-\delta a_0)\,w(p_j),\qquad j=1,\ldots,2M+1.$$ Dividing the $$j$$th equation by $$w(p_j)$$ and subtracting the last equation from each of the first $$2M$$ equations, we find that   $$\left(\alpha_\epsilon^{[1]}+\delta a_1,\ldots,\beta_\epsilon^{[M]}+\delta b_M\right)\begin{pmatrix}\kappa(p_1)-\kappa(p_{2M+1}) & \ldots & \kappa(p_{2M})-\kappa(p_{2M+1})\end{pmatrix}=(0,\ldots,0).$$ This matrix is non-singular by Lemma 2.3 combined with the Determinantal condition. This contradiction implies that $$N_0 = 0$$, which completes the proof of (b). We now complete the proof by constructing $${\tilde{Q}}(t)$$ from $${\tilde{Q}}_{\epsilon}(t)$$ by sending $$\epsilon \to 0$$. Note that the coefficients $$\alpha_{\epsilon}, \beta_{\epsilon}$$ converge as $$\epsilon\to 0$$, since the right-hand side of equation (2.5) converges to   $$\mathbf K^{-1}\begin{bmatrix}w(t_1)\\ \vdots\\ w(t_M)\\ w'(t_1)\\ \vdots\\ w'(t_M)\end{bmatrix}=\begin{bmatrix}\alpha\\ \beta\end{bmatrix}.$$ We denote the limiting function by   $$\tilde Q(t)=\sum_{i=1}^M\alpha_iK_P(t,t_i)+\sum_{i=1}^M\beta_i\,\partial_2K_P(t,t_i).$$ (2.8) We conclude that $$w(t) - {\tilde{Q}}(t)$$ does not change sign at the $$t_i$$, since $$w(t) - {\tilde{Q}}_\epsilon(t)$$ changes sign only at $$t_i \pm \epsilon$$. We now show that the limiting process does not introduce any additional zeros of $$w(t) - {\tilde{Q}}(t)$$. Suppose $${\tilde{Q}}(t)$$ does touch $$w(t)$$ at some $$\tau_1 \in [-T,T]$$ with $$\tau_1 \ne t_i$$ for any $$i=1, \ldots, M$$. Since $$w(t) - {\tilde{Q}}(t)$$ does not change sign, the points $$t_1,\ldots,t_M,\tau_1$$ are non-nodal zeros of $$w(t) - {\tilde{Q}}(t)$$.
We find a contradiction by constructing a polynomial with two nodal zeros in the vicinity of each of these $$M+1$$ points (but possibly only one nodal zero in the vicinity of $$\tau_1$$ if $$\tau_1=T$$ or $$\tau_1 = -T$$). For sufficiently small $$\gamma>0$$, the polynomial   $$W_\gamma(t)=\tilde Q(t)+\gamma w(t)$$ attains the value $$w(t)$$ twice in the vicinity of each $$t_i$$ and twice in the vicinity of $$\tau_1$$. In other words, there exist $$p_1 < \ldots < p_{2M+2}$$ such that $$W_{\gamma}(p_i) = w(p_i)$$. Therefore,   $$\tilde Q(p_i)=(1-\gamma)w(p_i)\quad\text{for } i=1,\ldots,2M+2,$$ and so $$\frac{{\tilde{Q}}(p_i)}{w(p_i)} - \frac{{\tilde{Q}}(p_{2M+1})}{w(p_{2M+1})} = 0$$ for $$i=1,2,\ldots,2M$$. Thus, the coefficient vector for the polynomial $${\tilde{Q}}(t)$$ lies in the left nullspace of the matrix   $$\begin{pmatrix}\kappa(p_1)-\kappa(p_{2M+1}) & \ldots & \kappa(p_{2M})-\kappa(p_{2M+1})\end{pmatrix}.$$ However, this matrix is non-singular by Lemma 2.3 and the Determinantal condition. Collecting our results, we have proven that $${\tilde{Q}}(t) - w(t) = 0$$ if and only if $$t = t_i$$, and that $${\tilde{Q}}(t) - w(t)$$ does not change sign when $$t$$ passes through $$t_i$$. Therefore, one of the following is true:   $$w(t)\ge\tilde Q(t)\quad\text{or}\quad \tilde Q(t)\ge w(t),$$ with equality iff $$t = t_i$$. In the first case, $$Q(t) = {\tilde{Q}}(t)$$ fulfills the prescriptions (2.1) with   $$q(s)=\sum_{i=1}^M\left(\alpha_i\psi(s,t_i)+\beta_i\,\frac{{\rm d}}{{\rm d}t_i}\psi(s,t_i)\right).$$ In the second case, $$Q(t) = 2w(t) - {\tilde{Q}}(t)$$ satisfies (2.1) with   $$q(s)=2-\sum_{i=1}^M\left(\alpha_i\psi(s,t_i)+\beta_i\,\frac{{\rm d}}{{\rm d}t_i}\psi(s,t_i)\right).$$ 2.4 Proof of Theorem 1.1 Integrability and Positivity naturally hold for the Gaussian point spread function $$\psi(s,t) = \,{\rm e}^{-(s-t)^2}$$. Independence holds because $$\psi(s, t_1), \ldots, \psi(s, t_M)$$, together with their derivatives $$\partial_2\psi(s, t_1), \ldots, \partial_2\psi(s, t_M)$$, form a T-system (see, for example, [42]). This means that for any $$s_1 < \ldots < s_{2M} \in \mathbb{R}$$,   $$\left|v(s_1)\ \ldots\ v(s_{2M})\right|\ne 0,$$ and the determinant always takes the same sign. Therefore, by an integral version of the Cauchy–Binet formula for the determinant (cf.
[41]),   $$\left|\int v(s)v(s)^{\rm T}\,{\rm d}P(s)\right|=(2M)!\int_{s_1<\ldots<s_{2M}}\left|v(s_1)\ \ldots\ v(s_{2M})\right|\begin{vmatrix}v(s_1)^{\rm T}\\ \vdots\\ v(s_{2M})^{\rm T}\end{vmatrix}{\rm d}P(s_1)\ldots{\rm d}P(s_{2M})\ne 0.$$ To establish the Determinantal condition, we prove the slightly stronger statement   $$\left|\Lambda(p_1,\ldots,p_{2M+1})\right|=\left|\int\begin{bmatrix}v(s)\\ 1\end{bmatrix}\begin{bmatrix}\frac{\psi(s,p_1)}{w(p_1)} & \ldots & \frac{\psi(s,p_{2M+1})}{w(p_{2M+1})}\end{bmatrix}{\rm d}P(s)\right|\ne 0$$ (2.9) for any distinct $$p_1,\ldots,p_{2M+1}$$. When $$p_1,\ldots,p_{2M+1}$$ are restricted so that two points $$p_i,p_j$$ lie in each ball $$(t_k-\rho,t_k +\rho)$$, we recover the statement of Determinantal. We prove (2.9) with the following key lemma. Lemma 2.5 For any $$s_1< \ldots < s_{2M+1}$$ and $$t_1 < \ldots < t_{M}$$,   $$\begin{vmatrix}{\rm e}^{-(s_1-t_1)^2} & \cdots & {\rm e}^{-(s_{2M+1}-t_1)^2}\\ -(s_1-t_1){\rm e}^{-(s_1-t_1)^2} & \cdots & -(s_{2M+1}-t_1){\rm e}^{-(s_{2M+1}-t_1)^2}\\ \vdots & & \vdots\\ {\rm e}^{-(s_1-t_M)^2} & \cdots & {\rm e}^{-(s_{2M+1}-t_M)^2}\\ -(s_1-t_M){\rm e}^{-(s_1-t_M)^2} & \cdots & -(s_{2M+1}-t_M){\rm e}^{-(s_{2M+1}-t_M)^2}\\ 1 & \cdots & 1\end{vmatrix}\ne 0.$$ Before proving this lemma, we show how it can be used to prove (2.9). By Lemma 2.5, we know in particular that for any $$s_1<\cdots < s_{2M+1}$$,   $$\det\begin{bmatrix}v(s_1) & \cdots & v(s_{2M+1})\\ 1 & \cdots & 1\end{bmatrix}\ne 0$$ and is always the same sign. Moreover, for any $$s_1 < \cdots < s_{2M+1}$$ and any $$p_1<\ldots<p_{2M+1}$$,   $$\det\begin{bmatrix}\psi(s_1,p_1) & \ldots & \psi(s_1,p_{2M+1})\\ \vdots & & \vdots\\ \psi(s_{2M+1},p_1) & \ldots & \psi(s_{2M+1},p_{2M+1})\end{bmatrix}>0.$$ Any function with this property is called totally positive, and it is well known that the Gaussian kernel is totally positive [42]. Now, to show that Determinantal holds for the finite sampling measure $$P$$, we use an integral version of the Cauchy–Binet formula for the determinant:   $$\left|\int\begin{bmatrix}v(s)\\ 1\end{bmatrix}\begin{bmatrix}\frac{\psi(s,p_1)}{w(p_1)} & \ldots & \frac{\psi(s,p_{2M+1})}{w(p_{2M+1})}\end{bmatrix}{\rm d}P(s)\right|=(2M+1)!\int_{s_1<\cdots<s_{2M+1}}\begin{vmatrix}v(s_1) & \cdots & v(s_{2M+1})\\ 1 & \cdots & 1\end{vmatrix}\begin{vmatrix}\frac{\psi(s_1,p_1)}{w(p_1)} & \ldots & \frac{\psi(s_1,p_{2M+1})}{w(p_{2M+1})}\\ \vdots & & \vdots\\ \frac{\psi(s_{2M+1},p_1)}{w(p_1)} & \ldots & \frac{\psi(s_{2M+1},p_{2M+1})}{w(p_{2M+1})}\end{vmatrix}{\rm d}P(s_1)\ldots{\rm d}P(s_{2M+1}).$$ The integral is non-zero since all integrands are non-zero and have the same sign. This proves (2.9). Proof of Lemma 2.5.
Multiplying the $$(2i-1)$$th and $$(2i)$$th rows by $$\,{\rm e}^{t_i^2}$$ and the $$j$$th column by $$\,{\rm e}^{s_j^2}$$, and subtracting $$t_i$$ times the $$(2i-1)$$th row from the $$(2i)$$th row, we see that it is equivalent to show that   $$\begin{vmatrix}{\rm e}^{s_1t_1} & {\rm e}^{s_2t_1} & \ldots & {\rm e}^{s_{2M+1}t_1}\\ s_1{\rm e}^{s_1t_1} & s_2{\rm e}^{s_2t_1} & \ldots & s_{2M+1}{\rm e}^{s_{2M+1}t_1}\\ \vdots & & & \vdots\\ {\rm e}^{s_1t_M} & {\rm e}^{s_2t_M} & \ldots & {\rm e}^{s_{2M+1}t_M}\\ s_1{\rm e}^{s_1t_M} & s_2{\rm e}^{s_2t_M} & \ldots & s_{2M+1}{\rm e}^{s_{2M+1}t_M}\\ {\rm e}^{s_1^2} & {\rm e}^{s_2^2} & \ldots & {\rm e}^{s_{2M+1}^2}\end{vmatrix}\ne 0.$$ The above matrix has a vanishing determinant if and only if there exists a non-zero vector   $$(a_1,b_1,\ldots,a_M,b_M,a_{M+1})$$ in its left null space. This vector must have a non-zero last coordinate, since by Example 1.1.5 in [42] the Gaussian kernel is extended totally positive, and therefore the upper $$2M\times 2M$$ submatrix has a non-zero determinant. Therefore, we may assume that $$a_{M+1} = 1$$. Thus, the matrix above has a vanishing determinant if and only if the function   $$\sum_{i=1}^M(a_i+b_is)\,{\rm e}^{t_is}+{\rm e}^{s^2}$$ (2.10) has at least the $$2M+1$$ zeros $$s_1 < s_2 < \ldots < s_{2M+1}$$. Lemma 2.7, applied to $$r = M$$ and $$d_1 = \cdots = d_M = 1$$, establishes that this is impossible. To complete the proof of Lemma 2.5, it remains to state and prove Lemma 2.7. □ Remark 2.6 The inclusion of the derivatives is essential for the shifted Gaussians to form a T-system together with the constant function $$1$$. In particular, following the same logic as in the proof of Lemma 2.5, we find that $$\{1,\,{\rm e}^{-(s-t_1)^2},\ldots,\,{\rm e}^{-(s-t_M)^2}\}$$ form a T-system if and only if the function   $$\sum_{i=1}^M a_i\,{\rm e}^{t_is}+{\rm e}^{s^2}$$ has at most $$M$$ zeros. However, for $$M = 3$$, the function has four zeros if we select $$a_1 = -3$$, $$t_1 = 1$$, $$a_2 = 7$$, $$t_2 = 0$$, $$a_3 = -5$$ and $$t_3 = -1$$. Lemma 2.7 Let $$d_1,\ldots,d_r\in\mathbb N$$. The function   $$\phi_{d_1,\ldots,d_r}(s)=\sum_{i=1}^r\left(a_{i0}+a_{i1}s+\cdots+a_{i(2d_i-1)}s^{2d_i-1}\right){\rm e}^{t_is}+{\rm e}^{s^2}$$ has at most $$2(d_1 + \cdots + d_r)$$ zeros. Proof. We show that $$\phi_{d_1,\ldots,d_r}(s)$$ has at most $$2(d_1 + \cdots + d_r)$$ zeros as follows. Let   $$g_0(s)=\phi_{d_1,\ldots,d_r}(s).$$
For $$k=1, \ldots, d_1 + \cdots + d_r$$, let   $$g_k(s)=\begin{cases}\dfrac{{\rm d}^2}{{\rm d}s^2}\left[g_{k-1}(s)\,{\rm e}^{(-t_j+t_1+\cdots+t_{j-1})s}\right], & \text{if } k=d_1+\cdots+d_{j-1}+1 \text{ for some } j,\\ \dfrac{{\rm d}^2}{{\rm d}s^2}\left[g_{k-1}(s)\right], & \text{otherwise.}\end{cases}$$ (2.11) If we show that $$g_{d_1 + \cdots + d_r}(s)$$ has no zeros, then $$g_{d_1 + \cdots + d_r-1}(s)$$ has at most two zeros, counting with multiplicity. By induction, it will follow that $$g_0(s)$$ has at most $$2(d_1 + \cdots + d_r)$$ zeros, counting with multiplicity. Note that if $$d_1 + \cdots + d_{j-1} \leq k < d_1 + \cdots + d_{j-1} + d_j$$, then   $$g_k(s)=\left(\tilde a_{j,2(k-d_1-\cdots-d_{j-1})}+\cdots+\tilde a_{j,(2d_j-1)}s^{2d_j-1-2(k-d_1-\cdots-d_{j-1})}\right)+\sum_{i=j+1}^r\left(\tilde a_{i0}+\cdots+\tilde a_{i(2d_i-1)}s^{2d_i-1}\right){\rm e}^{(t_i-(t_1+\cdots+t_{j-1}))s}+c\,f_i(r)\,{\rm e}^{r^2},$$ where $$c>0$$ is a constant and $$r := s - c_i$$. We are going to show that $$f_i(r)$$ is a sum of squares polynomial such that one of the squares is a positive constant. This would mean that the final function $$g_{d_1+\cdots+d_r}(s)=c\,f_i(r)\,{\rm e}^{r^2}$$ has no zeros. Denote   $$p_0(s)=1,\qquad p_1(s)=2s,\qquad\ldots,\qquad p_i(s)=2sp_{i-1}(s-c_i)+p'_{i-1}(s-c_i),$$ where $$c_1, \ldots, c_k$$ are constants. It follows by induction that the degree of $$p_i(s)$$ is $$\deg(p_i) = i$$ and the leading coefficient of $$p_i(s)$$ is $$2^i$$. We will show by induction that   $$f_i(s)=p_i(s)^2+\frac12 p_i'(s)^2+\cdots+\frac{1}{2^ii!}p_i^{(i)}(s)^2=\sum_{j=0}^i\frac{1}{2^jj!}p_i^{(j)}(s)^2.$$ When $$i=0$$, we have $$f_0(s) = 1$$ and $$\sum_{j=0}^0 \frac1{2^jj!}p_0^{(j)}(s)^2 = 1$$. Suppose the statement is true for $$i-1$$. By the relationship (2.11), we have   $$f_i(s){\rm e}^{s^2}=\frac{{\rm d}^2}{{\rm d}s^2}\left[{\rm e}^{s^2}f_{i-1}(s-c_i)\right]=\frac{{\rm d}^2}{{\rm d}s^2}\left[{\rm e}^{s^2}\sum_{j=0}^{i-1}\frac{1}{2^jj!}p_{i-1}^{(j)}(s-c_i)^2\right]=\sum_{j=0}^{i-1}\frac{{\rm e}^{s^2}}{2^jj!}\left\{2p_{i-1}^{(j+2)}(s-c_i)p_{i-1}^{(j)}(s-c_i)+2p_{i-1}^{(j+1)}(s-c_i)^2+(4s^2+2)p_{i-1}^{(j)}(s-c_i)^2+8sp_{i-1}^{(j)}(s-c_i)p_{i-1}^{(j+1)}(s-c_i)\right\}.$$ (2.12) We need to show that this expression is equal to $$\,{\rm e}^{s^2}\left(\sum_{j=0}^{i} \frac{p_{i}^{(j)}(s)^2}{2^jj!} \right)$$. Since   $$p_i(s)=2sp_{i-1}(s-c_i)+p_{i-1}'(s-c_i),$$ it follows by induction that $$p_i^{(j)}(s) = 2jp_{i-1}^{(j-1)}(s-c_i) + 2sp_{i-1}^{(j)}(s-c_i) + p_{i-1}^{(j+1)}(s-c_i)$$. Therefore we obtain   $${\rm e}^{s^2}\left(\sum_{j=0}^i\frac{p_i^{(j)}(s)^2}{2^jj!}\right)$$
$$={\rm e}^{s^2}\sum_{j=0}^i\frac{1}{2^jj!}\left[2jp_{i-1}^{(j-1)}(s-c_i)+2sp_{i-1}^{(j)}(s-c_i)+p_{i-1}^{(j+1)}(s-c_i)\right]^2={\rm e}^{s^2}\sum_{j=0}^i\frac{1}{2^jj!}\left[4j^2p_{i-1}^{(j-1)}(s-c_i)^2+4s^2p_{i-1}^{(j)}(s-c_i)^2+p_{i-1}^{(j+1)}(s-c_i)^2+8jsp_{i-1}^{(j-1)}(s-c_i)p_{i-1}^{(j)}(s-c_i)+4sp_{i-1}^{(j)}(s-c_i)p_{i-1}^{(j+1)}(s-c_i)+4jp_{i-1}^{(j-1)}(s-c_i)p_{i-1}^{(j+1)}(s-c_i)\right].$$ (2.13) There are four types of terms in the sums (2.12) and (2.13):   $$p_{i-1}^{(j)}(s-c_i)^2,\quad s^2p_{i-1}^{(j)}(s-c_i)^2,\quad p_{i-1}^{(j-1)}(s-c_i)p_{i-1}^{(j+1)}(s-c_i)\quad\text{and}\quad sp_{i-1}^{(j-1)}(s-c_i)p_{i-1}^{(j)}(s-c_i).$$ For a fixed $$j\in\{0, 1, \ldots, i+1\}$$, it is easy to check that the coefficients in front of each of these terms in (2.12) and (2.13) are equal. Therefore,   $$f_i(s)=p_i(s)^2+\frac12p_i'(s)^2+\cdots+\frac{1}{2^ii!}p_i^{(i)}(s)^2=\sum_{j=0}^i\frac{1}{2^jj!}p_i^{(j)}(s)^2.$$ Note that since $$\deg(p_i) = i$$, the derivative $$p_i^{(i)}(s)$$ equals $$i!$$ times the leading coefficient of $$p_i(s)$$, which, as we discussed above, equals $$2^i$$; hence $$p_i^{(i)}(s) = 2^ii!$$. Therefore, the term $$\frac1{2^i i!} p_i^{(i)}(s)^2 = 2^i i!$$. Thus, one of the squares in $$f_i(s)$$ is a positive number, so $$f_i(s) > 0$$ for all $$s$$. □ 3. Numerical experiments In this section, we present the results of several numerical experiments to complement our theoretical results. To allow for potentially noisy observations, we solve the constrained least squares problem   $$\underset{\mu\ge 0}{\text{minimize}}\ \sum_{i=1}^n\left(\int \psi(s_i,t)\,{\rm d}\mu(t)-x(s_i)\right)^2\quad\text{subject to}\ \int w(t)\,\mu({\rm d}t)\le\tau$$ (3.1) using the conditional gradient method proposed in [12]. 3.1 Reweighing matters for source localization Our first numerical experiment provides evidence that weighting by $$w(t)$$ helps recover point sources near the border of the image. This matches our intuition: near the border, the mass of an observed point source is smaller than if it were measured in the center of the image. Hence, if we did not weight the candidate locations, it would be advantageous to add spurious sources near the edge of the image to the representation. We simulate two populations of images, one with point sources located away from the image boundary and one with point sources located near the image boundary.
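To make the setup concrete, the sketch below simulates one noiseless image and recovers its sources with a simple discretized surrogate of (3.1): the measure is restricted to a fine candidate grid and the resulting finite-dimensional problem is solved by nonnegative least squares (scipy's `nnls`), rather than by the conditional gradient method of [12] that we use in our experiments. All numeric choices are illustrative, and the true sources are placed on the candidate grid so that an exact fit exists:

```python
import numpy as np
from scipy.optimize import nnls

# Gaussian PSF with sigma = 0.1 and n = 100 samples on [0, 1], as in Section 3.
sigma = 0.1
S = np.linspace(0.0, 1.0, 100)

def psi(s, t):
    return np.exp(-((s - t) / sigma) ** 2)

# True sources: two spikes of intensity 1, chosen to lie on the grid below.
t_true = np.array([0.30, 0.55])
x = psi(S[:, None], t_true[None, :]).sum(axis=1)

# Restrict mu to a grid of 201 candidate locations and solve the
# unconstrained nonnegative least-squares surrogate of (3.1).
grid = np.linspace(0.0, 1.0, 201)
A = psi(S[:, None], grid[None, :])   # A[i, j] = psi(s_i, grid_j)
weights, residual = nnls(A, x)

# Nonnegativity alone already yields a sparse, localized solution.
support = grid[weights > 0.1]
```

In this noiseless, on-grid setting the fit is exact and the recovered mass concentrates at the true locations; the nonnegativity constraint plays the role it does in (1.3).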
For each population of images, we solve (3.1) with $$w(t)=\int \psi(s,t) \,{\rm d}P(s)$$ (weighted) and with $$w(t) = 1$$ (unweighted). We find that the solutions to (3.1) recover the true point sources more accurately with $$w(t) = \int \psi(s,t) \,{\rm d}P(s)$$. We use the same procedure for computing accuracy as in [10]: we match true point sources to estimated point sources and compute the F-score of the match. In detail, we compute the F-score by solving a bipartite graph matching problem. We form the bipartite graph with an edge between $$t_i$$ and $$\hat t_j$$ for all $$i,j$$ such that $$\Vert t_i - \hat t_j \Vert < r$$, where $$r>0$$ is a tolerance parameter that upper bounds the allowed distance between any matched pair, and $$\hat t_1, \ldots, \hat t_N$$ are the estimated point sources. Then we greedily select edges from this graph under the constraint that no two selected edges may share a vertex; that is, no $$t_i$$ can be paired with two estimates $$\hat t_{j}, \hat t_k$$ or vice versa. The $$\hat t_i$$ successfully paired with some $$t_j$$ are categorized as true positives, and we denote their number by $$T_{\rm P}$$. The number of false negatives is $$F_{\rm N} = M - T_{\rm P}$$, and the number of false positives is $$F_{\rm P} = N - T_{\rm P}$$. The precision and recall are then $$P = \frac{T_{\rm P}}{T_{\rm P} + F_{\rm P}}$$ and $$R = \frac{T_{\rm P}}{T_{\rm P} + F_{\rm N}}$$, respectively, and the F-score is their harmonic mean:   $$F=\frac{2PR}{P+R}.$$ To emphasize the dependence on $$r$$, we sometimes write $$F(r)$$ for the F-score.
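The matching procedure just described takes only a few lines to implement. The sketch below is a minimal version; the order in which the greedy pass considers edges is not specified in the text, so scanning closest pairs first is our own (hypothetical) tie-breaking choice:

```python
def f_score(true_pts, est_pts, r):
    """F-score of a greedy matching between true and estimated sources."""
    M, N = len(true_pts), len(est_pts)
    # Candidate edges (distance, i, j) with |t_i - est_j| < r, closest first.
    edges = sorted(
        (abs(t - e), i, j)
        for i, t in enumerate(true_pts)
        for j, e in enumerate(est_pts)
        if abs(t - e) < r
    )
    used_true, used_est, tp = set(), set(), 0
    for _, i, j in edges:
        # Greedy selection: no vertex may appear in two selected edges.
        if i not in used_true and j not in used_est:
            used_true.add(i)
            used_est.add(j)
            tp += 1
    if tp == 0:
        return 0.0
    precision = tp / N   # T_P / (T_P + F_P)
    recall = tp / M      # T_P / (T_P + F_N)
    return 2 * precision * recall / (precision + recall)
```

For example, with true sources $$\{0.2, 0.5\}$$, estimates $$\{0.21, 0.8\}$$ and $$r = 0.05$$, exactly one pair is matched, so $$P = R = 1/2$$ and $$F = 1/2$$.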
Both populations contain $$100$$ images simulated using the Gaussian point spread function   $$\psi(s,t)={\rm e}^{-(s-t)^2/\sigma^2}$$ with $$\sigma = 0.1$$, and in both cases, the measurement set $${\mathcal S}$$ is a dense uniform grid of $$n = 100$$ points covering $$[0,1]$$. The populations differ in how the point sources for each image are chosen. Each image in the first population has five points drawn uniformly in the interval $$(0.1,0.9)$$, while each simulated image in the second population has a total of four point sources, with two point sources in each of the two boundary regions $$(0,0.1)$$ and $$(0.9,1)$$. In both cases, we assign an intensity of $$1$$ to all point sources and solve (3.1) using an optimal value of $$\tau$$ (chosen with a preliminary simulation). The results are displayed in Fig. 5. The left subplot shows that the F-scores are essentially the same for the weighted and unweighted problems when the point sources are away from the boundary. This is not surprising: when $$t$$ is away from the border of the image, $$\int \psi(s,t)\,{\rm d}P(s)$$ is essentially a constant, independent of $$t$$. But when the point sources are near the boundary, the weighting matters and the F-scores are dramatically better, as shown in the right subplot. Fig. 5. Reweighing matters for source localization. The two plots above compare the quality of solutions to the weighted problem (with $$w(t) = \int \psi(s,t)\,{\rm d}P(s)$$) and the unweighted problem (with $$w(t) = 1$$). When point sources are away from the boundary (left plot), the performance is nearly identical. But when the point sources are near the boundary (right plot), the weighted method performs significantly better.
3.2 Sensitivity to point-source separation Our theoretical results assert that in the absence of noise, the optimal solution of (1.3) recovers point sources with no minimum bound on the separation. In the following experiment, we explore the ability of (3.1) to recover pairs of points as a function of their separation. The setup is similar to the first numerical experiment. We use the Gaussian point spread function with $$\sigma = 0.1$$ as before, but here we observe only $$n=50$$ samples. For each separation $$d\in\{0.1\sigma, 0.2\sigma,\ldots,1.9\sigma, 2\sigma\}$$, we simulate a population of $$20$$ images containing two point sources separated by $$d$$. The point sources are chosen by picking a random point $$x$$ away from the border of the image and placing two point sources at $$x\pm \frac d 2$$. Again, each point source is assigned an intensity of $$1$$, and we attempt to recover the locations of the point sources by solving (3.1). In the left subplot of Fig. 6, we plot the F-score versus separation for the value of $$\tau$$ that produces the best F-scores. Note that we achieve near perfect recovery for separations greater than $$\frac \sigma 4$$. The right subplot of Fig. 6 shows the observations, true point sources and estimated point sources for a separation of $$\frac d \sigma = \frac 1 2$$. Note the near perfect recovery in spite of the small separation. Fig. 6. Sensitivity to point-source separation. (a) The F-score at tolerance radius $$r=0.1$$ as a function of normalized separation $$\frac d \sigma$$. (b) The black trace shows an image for $$\frac d \sigma = \frac 1 2$$. The green stars show the locations (x coordinate) and weights (y coordinate) of the true point sources.
The red dots show the recovered locations and weights. Because of numerical issues, we cannot localize point sources with arbitrarily small $$d>0$$. Indeed, the F-score for $$\frac d \sigma < \frac 1 4$$ is quite poor. This does not contradict our theory, because numerical ill-conditioning in effect adds noise to the recovery problem, and we expect that a separation condition will be necessary in the presence of noise. 3.3 Sensitivity to noise Next, we investigate the performance of (3.1) in the presence of additive noise. The setup is identical to the previous numerical experiment, except that we add Gaussian noise to the observations. In particular, our noisy observations are   $$\{x(s_i)+\eta_i \mid s_i\in{\mathcal S}\},$$ where $$\eta_i\sim {\mathcal N}(0,0.1)$$. We measure the performance of (3.1) in Fig. 7. Note that we achieve near-perfect recovery when $$d> \sigma$$. However, if $$d< \sigma$$, the F-scores are clearly worse than in the noiseless case. Unsurprisingly, we observe that sources must be separated to recover their locations to reasonable precision. We defer an investigation of the dependence of the required separation on the signal-to-noise ratio to future work. Fig. 7. Sensitivity to noise. (a) The F-score at tolerance radius $$r=0.1$$ as a function of normalized separation $$\frac d \sigma$$. (b) The black trace is the $$50$$ pixel image we observe. The green stars show the locations (x-coordinate) and weights (y-coordinate) of the true point sources. The red dots show the recovered locations and weights.
3.4 Extension to two dimensions Though our proof does not extend as is, we do expect generalizations of our recovery result to higher dimensional settings. The optimization problem (3.1) extends immediately to arbitrary dimensions, and we have observed that it performs quite well in practice. We demonstrate in Fig. 8 the power of applying (3.1) to a high-density fluorescence image in simulation. Figure 8 shows an image simulated with parameters specified by the single molecule localization microscopy challenge [?]. In this challenge, point sources are blurred by a Gaussian point-spread function and then corrupted by noise. The green stars show the true locations of a simulated collection of point sources, and the red dots show the support of the measure output by (3.1) applied to the grayscale image forming the background of Fig. 8. The overlap between the true locations and the estimated locations is near perfect, with an F-score of $$0.98$$ for a tolerance radius corresponding to one third of a pixel. Fig. 8. High density single molecule imaging. The green stars show the locations of a simulated collection of point sources, and the grayscale background shows the noisy, pixelated point spread image. The red dots show the support of the measure-valued solution of (3.1). 4.
4. Conclusions and future work

In this article, we have demonstrated that one can recover the centers of a non-negative sum of Gaussians from a few samples by solving a convex optimization problem. This recovery is theoretically possible no matter how close the true centers are to one another. We remark that similar results hold for recovering measures from their moments. Indeed, the atoms of a positive atomic measure can be recovered no matter how close together the atoms are, provided one observes twice as many moments as there are atoms. Our work can be seen as a generalization of this result, applying generalized polynomials and the theory of Tchebycheff systems in place of properties of Vandermonde systems. As we discussed in our numerical experiments, this work opens up several theoretical problems that would benefit from future investigation. We close with a very brief discussion of some of the possible extensions.

Noise. Motivated by the fact that there is no separation condition in the absence of noise, it would be interesting to study how the required separation decays to zero as the noise level decreases. One of the key advantages of using convex optimization for signal processing is that dual certificates generically give stability results, in the same way that Lagrange multipliers measure sensitivity in linear programming. Previous work on estimating line spectra has shown that dual polynomials constructed for noiseless recovery extend to certify properties of estimation and localization in the presence of noise [14,34,57]. We believe that these methods should be directly applicable to our problem setup.

Higher dimensions. One logical extension is proving that the same results hold in higher dimensions. Most scientific and engineering applications of interest have point sources arising in one to four dimensions, and we expect that some version of our results should hold in higher dimensions.
Indeed, we believe a guarantee for recovery with no separation condition can be proven in higher dimensions with noiseless observations. However, it is not straightforward to extend our results to higher dimensions, because the theory of Tchebycheff systems is only developed in one dimension. In particular, our approach using limits of polynomials does not directly generalize to higher dimensions.

Other point spread functions. We have shown that our Conditions 1.3 hold for the Gaussian point spread function, which is commonly used in microscopy as an approximation to an Airy function. It would be very useful to show that they also hold for other point spread functions, such as the Airy function and other common physical models. Our proof relied heavily on algebraic properties of the Gaussian, but there is a long, rich history of determinantal systems that may apply to generalize our result. In particular, work on properties of totally positive systems may be fruitful for such generalizations [1,47].

Model mismatch in the point spread function. Our analysis relies on perfect knowledge of the point spread function. In practice, one never has an exact analytic expression for the point spread function: aberrations in manufacturing and scattering media can lead to distortions in the image not properly captured by a forward model. It would be interesting to derive guarantees on recovery that assume only partial knowledge of the point spread function. Note that the optimization problem of searching both for the locations of the sources and for the associated wave function is a blind deconvolution problem, and techniques from this well-studied problem could likely be extended to the superresolution setting. If successful, such methods could have immediate practical impact when applied to denoising images in molecular, cellular and astronomical imaging.

Acknowledgements

We would like to thank Pablo Parrilo for introducing us to the theory of T-systems.
We would also like to thank Nicholas Boyd, Mahdi Soltanolkotabi, Bernd Sturmfels and Gongguo Tang for many useful conversations about this work.

Funding

ONR awards (N00014-11-1-0723 and N00014-13-1-0129 to B.R.); NSF awards (CCF-1148243 and CCF-1217058 to B.R.); AFOSR award (FA9550-13-1-0138 to B.R.); a Sloan Research Fellowship (to B.R.); NSF award (CCF-1148243 to G.S.); NSF CISE Expeditions Award (CCF-1139158, in part); LBNL Award (7076018); DARPA XData Award (FA8750-12-2-0331); gifts from Amazon Web Services, Google, SAP, The Thomas and Stacey Siebel Foundation, Adatao, Adobe, Apple, Inc., Blue Goji, Bosch, C3Energy, Cisco, Cray, Cloudera, EMC2, Ericsson, Facebook, Guavus, HP, Huawei, Informatica, Intel, Microsoft, NetApp, Pivotal, Samsung, Schlumberger, Splunk, Virdata and VMware.

Footnotes

Tchebycheff is one among many transliterations of the name originally written in Cyrillic. Others include Chebyshev, Chebychev and Cebysev.

References

1. Ando, T. (1987) Totally positive matrices. Linear Algebra Appl., 90, 165–219.
2. Bajwa, W. U., Haupt, J., Sayeed, A. M. & Nowak, R. (2010) Compressed channel sensing: a new approach to estimating sparse multipath channels. Proc. IEEE, 98, 1058–1076.
3. Baraniuk, R. (2007) Compressive sensing [lecture notes]. IEEE Sig. Process. Mag., 24, 118–121.
4. Baraniuk, R. & Steeghs, P. (2007) Compressive radar imaging. Radar Conference, 2007 IEEE. Waltham, MA: IEEE, pp. 128–133.
5. Batenkov, D. & Yomdin, Y. (2012) Algebraic Fourier reconstruction of piecewise smooth functions. Math. Comput., 81, 277–318.
6. Bendory, T. (2017) Robust recovery of positive stream of pulses. IEEE Trans. Sig. Process., 65, 2114–2122.
7. Bendory, T., Dekel, S. & Feuer, A. (2014) Robust recovery of stream of pulses using convex optimization. J. Math. Anal. Appl., 442, 511–536.
8. Beurling, A. (1938) Sur les intégrales de Fourier absolument convergentes et leur application à une transformation fonctionnelle. Ninth Scandinavian Mathematical Congress, pp. 345–366.
9. Bhaskar, B. N., Tang, G. & Recht, B. (2013) Atomic norm denoising with applications to line spectral estimation. IEEE Trans. Sig. Process., 61, 5987–5999.
10. Sage, D., Kirshner, H., Pengo, T., Stuurman, N., Min, J., Manley, S. & Unser, M. (2015) Quantitative evaluation of software packages for single-molecule localization microscopy. Nat. Methods, 12, 717–724.
11. Bonifacino, J. S., Olenych, S., Davidson, M. W., Lippincott-Schwartz, J. & Hess, H. F. (2006) Imaging intracellular fluorescent proteins at nanometer resolution. Science, 313, 1642–1645.
12. Boyd, N., Schiebinger, G. & Recht, B. (2015) The alternating descent conditional gradient method for sparse inverse problems. SIAM J. Optimiz., 27, 616–639.
13. Bredies, K. & Pikkarainen, H. K. (2013) Inverse problems in spaces of measures. ESAIM Control Optim. Calc. Var., 19, 190–218.
14. Candès, E. J. & Fernandez-Granda, C. (2013) Super-resolution from noisy data. J. Fourier Anal. Appl., 19, 1229–1254.
15. Candès, E. J. & Fernandez-Granda, C. (2013) Towards a mathematical theory of super resolution. Comm. Pure Appl. Math., 67, 906–956.
16. Candès, E. J., Romberg, J. & Tao, T. (2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory, 52, 489–509.
17. Candès, E. J. & Wakin, M. (2008) An introduction to compressive sampling. IEEE Sig. Process. Mag., 25, 21–30.
18. Carathéodory, C. (1907) Ueber den Variabilitaetsbereich der Koeffizienten von Potenzreihen, die gegebene Werte nicht annehmen. Math. Ann., 64, 95–115.
19. Carathéodory, C. (1911) Ueber den Variabilitaetsbereich der Fourier'schen Konstanten von positiven harmonischen Funktionen. Rend. Circ. Mat., 32, 193–217.
20. Cetin, M., Malioutov, D. & Willsky, A. S. (2005) A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans. Sig. Process., 53, 3010–3022.
21. Chandrasekaran, V., Recht, B., Parrilo, P. A. & Willsky, A. (2012) The convex geometry of linear inverse problems. Found. Comput. Math., 12, 805–849.
22. De Castro, Y. & Gamboa, F. (2012) Exact reconstruction using Beurling minimal extrapolation. J. Math. Anal. Appl., 395, 336–354.
23. de Prony, B. G. R. (1795) Essai éxperimental et analytique: sur les lois de la dilatabilité de fluides élastique et sur celles de la force expansive de la vapeur de l'alkool, à différentes températures. Journal de l'école Polytechnique, 1, 24–76.
24. Donoho, D. (1992) Superresolution via sparsity constraints. SIAM J. Math. Anal., 23, 1309–1331.
25. Donoho, D. (2006) Compressed sensing. IEEE Trans. Inf. Theory, 52, 1289–1306.
26. Donoho, D. & Stark, P. (1989) Uncertainty principles and signal recovery. SIAM J. Appl. Math., 49, 906–931.
27. Dragotti, P., Vetterli, M. & Blu, T. (2007) Sampling moments and reconstructing signals of finite rate of innovation: Shannon meets Strang–Fix. IEEE Trans. Sig. Process., 55, 1741–1757.
28. Duarte, M. & Baraniuk, R. (2013) Spectral compressive sensing. Appl. Comput. Harmon. Anal., 35, 111–129.
29. Duval, V. & Peyré, G. (2015) Exact support recovery for sparse spikes deconvolution. Found. Comput. Math., 15, 1315–1355.
30. Eckhoff, K. S. (1995) Accurate reconstructions of functions of finite regularity from truncated Fourier series expansions. Math. Comput., 64, 671–690.
31. Ekanadham, C., Tranchina, D. & Simoncelli, E. P. (2011) Neural Spike Identification with Continuous Basis Pursuit. Salt Lake City, Utah: Computational and Systems Neuroscience (CoSyNe).
32. Evanko, D. (2009) Primer: fluorescence imaging under the diffraction limit. Nat. Methods, 6, 19–20.
33. Fannjiang, A. C., Strohmer, T. & Yan, P. (2010) Compressed remote sensing of sparse objects. SIAM J. Imag. Sci., 3, 595–618.
34. Fernandez-Granda, C. (2013) Support detection in super-resolution. Proceedings of the 10th International Conference on Sampling Theory and Applications (SampTA 2013), pp. 145–148.
35. Friedlander, M. P., Mansour, H., Saab, R. & Yilmaz, O. (2012) Recovering compressively sampled signals using partial support information. IEEE Trans. Inf. Theory, 58, 1122–1134.
36. Fuchs, J.-J. (2005) Sparsity and uniqueness for some specific under-determined linear systems. Acoustics, Speech, and Signal Processing, 2005. Proceedings (ICASSP '05). IEEE International Conference on.
37. Gantmacher, F. P. & Krein, M. G. (2002) Oscillation Matrices and Kernels and Small Vibrations of Mechanical Systems, revised English edn. Providence, RI: AMS Chelsea Pub.
38. Heckel, R., Morgenshtern, V. & Soltanolkotabi, M. (2016) Super-resolution radar. Inf. Inference, 5, 22–75.
39. Herman, M. A. & Strohmer, T. (2009) High-resolution radar via compressed sensing. IEEE Trans. Sig. Process., 57, 2275–2284.
40. Hess, S. T., Giriajan, T. P. & Mason, M. D. (2006) Ultra-high resolution imaging by fluorescence photoactivation localization microscopy. Biophys. J., 91, 4258–4272.
41. Karlin, S. (1968) Total Positivity, vol. 1. Stanford, CA: Stanford University Press.
42. Karlin, S. & Studden, W. (1967) Tchebycheff Systems: with Applications in Analysis and Statistics. New York, London, Sydney: Wiley Interscience.
43. Khajehnejad, M. A., Xu, W., Avestimehr, A. S. & Hassibi, B. (2011) Analyzing weighted minimization for sparse recovery with nonuniform sparse models. IEEE Trans. Sig. Process., 59, 1985–2001.
44. Krein, M. G. (1959) The ideas of P.L. Tchebycheff and A.A. Markov in the theory of limiting values of integrals and their further development. Am. Math. Soc. Transl. Ser. 2, 12, 1–122.
45. Morgenshtern, V. I. & Candès, E. J. (2015) Super-resolution of positive sources: the discrete setup. SIAM J. Imaging Sci., 9, 412–444.
46. Nobelprize.org (2014) The Nobel Prize in Chemistry.
47. Pinkus, A. (2010) Totally Positive Matrices, vol. 181. Cambridge University Press.
48. Puschmann, K. G. & Kneer, F. (2005) On super-resolution in astronomical imaging. Astron. Astrophys., 436, 373–378.
49. Rauhut, H. (2007) Random sampling of sparse trigonometric polynomials. Appl. Comput. Harmon. Anal., 22, 16–42.
50. Rauhut, H. & Ward, R. (2015) Interpolation via weighted $$l_1$$ minimization. Appl. Comput. Harmon. Anal., to appear. Preprint available at arXiv:1308.0759.
51. Rust, M. J., Bates, M. & Zhuang, X. (2006) Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM). Nat. Methods, 3, 793–796.
52. Shah, P., Bhaskar, B., Tang, G. & Recht, B. (2012) Linear system identification via atomic norm regularization. Decision and Control (CDC), 2012 IEEE 51st Annual Conference on, 10–13 Dec. 2012. Maui, HI: IEEE.
53. Shapiro, A. (2009) Semi-infinite programming, duality, discretization and optimality conditions. Optimization, 58, 133–161.
54. Stoica, P. & Babu, P. (2012) SPICE and LIKES: two hyperparameter-free methods for sparse-parameter estimation. Sig. Process., 92, 1580–1590.
55. Stoica, P., Babu, P. & Li, J. (2011) New method of sparse parameter estimation in separable models and its use for spectral analysis of irregularly sampled data. IEEE Trans. Sig. Process., 59, 35–47.
56. Tan, V. & Goyal, V. (2008) Estimating signals with finite rate of innovation from noisy samples: a stochastic algorithm. IEEE Trans. Sig. Process., 56, 5135–5146.
57. Tang, G., Bhaskar, B. & Recht, B. (2015) Near minimax line spectral estimation. IEEE Trans. Inf. Theory, 61, 499–512.
58. Tang, G., Bhaskar, B. N., Shah, P. & Recht, B. (2013) Compressed sensing off the grid. IEEE Trans. Inf. Theory, 59, 7465–7490.
59. Tchebycheff, P. L. (1887) On two theorems with respect to probabilities. Zap. Akad. Nauk S.-Petersburg, 55, 156–168.
60. Vaswani, N. & Lu, W. (2010) Modified-CS: modifying compressive sensing for problems with partially known support. IEEE Trans. Sig. Process., 58, 4595–4607.
61. Vetterli, M., Marziliano, P. & Blu, T. (2002) Sampling signals with finite rate of innovation. IEEE Trans. Sig. Process., 50, 1417–1428.
62. Von Borries, R., Miosso, C. J. & Potes, C. (2007) Compressed sensing using prior information. Computational Advances in Multi-Sensor Adaptive Processing, 2007. CAMPSAP 2007. 2nd IEEE International Workshop on. St. Thomas, VI: IEEE, pp. 121–124.
63. Zhu, L., Zhang, W., Elnatan, D. & Huang, B. (2012) Faster STORM using compressed sensing. Nat. Methods, 9, 721–723.

© The authors 2017. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.

### Journal

Information and Inference: A Journal of the IMA, Oxford University Press

Published: Mar 1, 2018
