# One-bit compressive sensing of dictionary-sparse signals

## Abstract

One-bit compressive sensing has extended the scope of sparse recovery by showing that sparse signals can be accurately reconstructed even when their linear measurements are subject to the extreme quantization scenario of binary samples, where only the sign of each linear measurement is maintained. Existing results in one-bit compressive sensing rely on the assumption that the signals of interest are sparse in some fixed orthonormal basis. However, in most practical applications, signals are sparse with respect to an overcomplete dictionary, rather than a basis. There has already been a surge of activity to obtain recovery guarantees under such a generalized sparsity model in the classical compressive sensing setting. Here, we extend the one-bit framework to this important model, providing a unified theory of one-bit compressive sensing under dictionary sparsity. Specifically, we analyze several different algorithms, based on convex programming and on hard thresholding, and show that, under natural assumptions on the sensing matrix (satisfied by Gaussian matrices), these algorithms can efficiently recover analysis–dictionary-sparse signals in the one-bit model.

## 1. Introduction

The basic insight of compressive sensing is that a small number of linear measurements can be used to reconstruct sparse signals. In traditional compressive sensing, we wish to reconstruct an $$s$$-sparse$$^1$$ signal $${\bf x} \in \mathbb{R}^N$$ from linear measurements of the form

$${\bf y} = {\bf A}{\bf x} \in \mathbb{R}^m \qquad \text{(or its corrupted version } {\bf y} = {\bf A}{\bf x} + {\bf e}\text{)},$$ (1.1)

where $${\bf A}$$ is an $$m\times N$$ measurement matrix.
A significant body of work over the past decade has demonstrated that the $$s$$-sparse (or nearly $$s$$-sparse) signal $${\bf x}$$ can be accurately and efficiently recovered from its measurement vector $${\bf y} = {\bf A}{\bf x}$$ when $${\bf A}$$ has independent Gaussian entries, say, and when $$m \asymp s\log(N/s)$$ [1,12,15]. This basic model has been extended in several directions. Two important ones, which we focus on in this work, are (a) extending the set of signals to include the larger and important class of dictionary-sparse signals, and (b) considering highly quantized measurements as in one-bit compressive sensing. Both of these settings have important practical applications and have received much attention in the past few years. However, to the best of our knowledge, they have not been considered together before. In this work, we extend the theory of one-bit compressive sensing to dictionary-sparse signals. Below, we briefly review the background on these notions, set up notation and outline our contributions.

### 1.1 One-bit measurements

In practice, each entry $$y_i = \langle {\bf a}_i, {\bf x}\rangle$$ (where $${\bf a}_i$$ denotes the $$i$$th row of $${\bf A}$$) of the measurement vector in (1.1) needs to be quantized. That is, rather than observing $${\bf y}={\bf A}{\bf x}$$, one observes $${\bf y} = Q({\bf A}{\bf x})$$ instead, where $$Q: \mathbb{R}^m \rightarrow \mathscr{A}$$ denotes the quantizer that maps each entry of its input to a corresponding quantized value in an alphabet $$\mathscr{A}$$. The so-called one-bit compressive sensing [5] problem refers to the case when $$|\mathscr{A}| = 2$$, and one wishes to recover $${\bf x}$$ from its heavily quantized (one-bit) measurements $${\bf y} = Q({\bf A}{\bf x})$$.
The simplest quantizer in the one-bit case uses the alphabet $$\mathscr{A} = \{-1, 1\}$$ and acts by taking the sign of each component as   $$\label{eq:quantized} y_i = Q(\langle {\bf a}_i, {\bf x}\rangle) = \mathrm{sgn}(\langle {\bf a}_i, {\bf x}\rangle),$$ (1.2) which we denote in shorthand by $${\bf y} = \mathrm{sgn}({\bf A}{\bf x})$$. Since the publication of [5] in 2008, several efficient methods, both iterative and optimization based, have been developed to recover the signal $${\bf x}$$ (up to normalization) from its one-bit measurements (see e.g. [17–19,24,25,30]). In particular, it is shown [19] that the direction of any $$s$$-sparse signal $${\bf x}$$ can be estimated by some $$\hat{{\bf x}}$$ produced from $${\bf y}$$ with accuracy   $$\left\| \frac{{\bf x}}{\|{\bf x}\|_2} - \frac{\hat{{\bf x}}}{\|\hat{{\bf x}}\|_2}\right\|_2 \leq \varepsilon$$ when the number of measurements is at least   $$m = {\it {\Omega}}\left(\frac{s \ln(N/s)}{\varepsilon} \right)\!.$$ Notice that with measurements of this form, we can only hope to recover the direction of the signal, not the magnitude. However, we can recover the entire signal if we allow for thresholded measurements of the form   $$\label{eq:quantizeddither} y_i = \mathrm{sgn}(\langle {{{\bf a}_i}}, {{{\bf x}}} \rangle - \tau_i).$$ (1.3) In practice, it is often feasible to obtain quantized measurements of this form, and they have been studied before. Existing works using measurements of the form (1.3) have also allowed for adaptive thresholds; that is, the $$\tau_i$$ can be chosen adaptively based on $$y_j$$ for $$j < i$$. The goal of those works was to improve the convergence rate, i.e. the dependence on $$\varepsilon$$ in the number of measurements $$m$$. 
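As a small illustration of the two measurement models (1.2) and (1.3), the following Python sketch generates one-bit measurements of a sparse vector. The helper names `sign` and `one_bit_measurements`, the dimensions, the random seed and the convention $$\mathrm{sgn}(0) = +1$$ are assumptions made for this example only; they are not taken from the article.

```python
import random

def sign(v):
    # Assumed convention: sgn(0) = +1, so every output lies in {-1, +1}.
    return 1 if v >= 0 else -1

def one_bit_measurements(A, x, tau=None):
    """Return y_i = sgn(<a_i, x> - tau_i); tau=None means tau_i = 0, as in (1.2)."""
    if tau is None:
        tau = [0.0] * len(A)
    return [sign(sum(a * b for a, b in zip(row, x)) - t) for row, t in zip(A, tau)]

rng = random.Random(0)
m, N = 200, 20
A = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(m)]

x = [0.0] * N
x[3], x[11] = 1.5, -0.7                      # a 2-sparse signal
x_scaled = [5.0 * v for v in x]              # same direction, larger magnitude

# Model (1.2): positive rescaling leaves every sign unchanged.
same_signs = one_bit_measurements(A, x) == one_bit_measurements(A, x_scaled)

# Model (1.3): with nonzero thresholds, rescaling changes some of the signs,
# which is why these measurements carry information about the magnitude.
tau = [rng.gauss(0.0, 1.0) for _ in range(m)]
y_tau = one_bit_measurements(A, x, tau)
y_tau_scaled = one_bit_measurements(A, x_scaled, tau)
flips = sum(a != b for a, b in zip(y_tau, y_tau_scaled))

print(same_signs, flips)
```

Here `same_signs` reflects that only the direction is visible under (1.2), while a positive `flips` count shows that thresholded measurements distinguish a signal from its rescaling.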
It is known that a dependence of $${\it {\Omega}}(1/\varepsilon)$$ is necessary with non-adaptive measurements, but recent work on Sigma-Delta quantization [28] and other schemes [2,20] has shown how to break this barrier using measurements of the form (1.3) with adaptive thresholds. In this article, we neither focus on the decay rate (the dependence on $$\varepsilon$$) nor consider adaptive measurements. However, we do consider non-adaptive measurements of both forms (1.2) and (1.3). This allows us to provide results on the reconstruction of both the magnitude and the direction of signals.

### 1.2 Dictionary sparsity

Although the classical setting assumes that the signal $${\bf x}$$ itself is sparse, most signals of interest are not immediately sparse. In the straightforward case, a signal may be instead sparse after some transform; for example, images are known to be sparse in the wavelet domain, sinusoidal signals in the Fourier domain, and so on [9]. Fortunately, the classical framework extends directly to this model, since the product of a Gaussian matrix and an orthonormal basis is still Gaussian. However, in many practical applications, the situation is not so straightforward, and the signals of interest are sparse, not in an orthonormal basis, but rather in a redundant (highly overcomplete) dictionary; this is known as dictionary sparsity. Signals in radar and sonar systems, for example, are sparsely represented in Gabor frames, which are highly overcomplete and far from orthonormal [13]. Images may be sparsely represented in curvelet frames [6,7], undecimated wavelet frames [29] and other frames, which by design are highly redundant. Such redundancy allows for sparser representations and a wider class of signal representations. Even in the Fourier domain, utilizing an oversampled DFT allows for much more realistic and practical signals to be represented.
For these reasons, recent research has extended the compressive sensing framework to the setting, where the signals of interest are sparsified by overcomplete tight frames (see e.g. [8,14,16,27]). Throughout this article, we consider a dictionary $${\bf D} \in \mathbb{R}^{n \times N}$$, which is assumed to be a tight frame, in the sense that   ${\bf D} {\bf D}^* = {\bf I}_n.$ To distinguish between the signal and its sparse representation, we write $${\bf f}\in\mathbb{R}^n$$ for the signal of interest and $${\bf f}={\bf D}{\bf x}$$, where $${\bf x}\in\mathbb{R}^N$$ is a sparse coefficient vector. We then acquire the samples of the form $${\bf y} = {\bf A}{\bf f} = {\bf A}{\bf D}{\bf x}$$ and attempt to recover the signal $${\bf f}$$. Note that, due to the redundancy of $${\bf D}$$, we do not hope to be able to recover a unique coefficient vector $${\bf x}$$. In other words, even when the measurement matrix $${\bf A}$$ is well suited for sparse recovery, the product $${\bf A}{\bf D}$$ may have highly correlated columns, making recovery of $${\bf x}$$ impossible. With the introduction of a non-invertible sparsifying transform $${\bf D}$$, it becomes important to distinguish between two related but distinct notions of sparsity. Precisely, we say that $${\bf f}$$ is $$s$$-synthesis sparse if $${\bf f} = {\bf D} {\bf x}$$ for some $$s$$-sparse $${\bf x} \in \mathbb{R}^N$$; $${\bf f}$$ is $$s$$-analysis sparse if $${\bf D}^* {\bf f} \in \mathbb{R}^N$$ is $$s$$-sparse. We note that analysis sparsity is a stronger assumption, because, assuming analysis sparsity, one can always take $${\bf x} = {\bf D}^* {\bf f}$$ in the synthesis sparsity model. See [11] for an introduction to the analysis-sparse model in compressive sensing (also called the analysis cosparse model). Instead of exact sparsity, it is often more realistic to study effective sparsity. 
We call a coefficient vector $${\bf x} \in \mathbb{R}^N$$ effectively $$s$$-sparse if   $$\|{\bf x}\|_1 \le \sqrt{s} \|{\bf x}\|_2,$$ and we say that $${\bf f}$$ is effectively $$s$$-synthesis sparse if $${\bf f} = {\bf D} {\bf x}$$ for some effectively $$s$$-sparse $${\bf x} \in \mathbb{R}^N$$; $${\bf f}$$ is effectively $$s$$-analysis sparse if $${\bf D}^* {\bf f} \in \mathbb{R}^N$$ is effectively $$s$$-sparse. We use the notation   \begin{align*} {\it {\Sigma}}^N_s & \mbox{for the set of $s$-sparse coefficient vectors in $\mathbb{R}^N$, and} \\ {\it {\Sigma}}_s^{N,{\rm eff}} & \mbox{for the set of effectively $s$-sparse coefficient vectors in $\mathbb{R}^N$.} \end{align*} We also use the notation $$B_2^n$$ for the set of signals with $$\ell_2$$-norm at most $$1$$ (i.e. the unit ball in $$\ell_2^n$$) and $$S^{n-1}$$ for the set of signals with $$\ell_2$$-norm equal to $$1$$ (i.e. the unit sphere in $$\ell_2^n$$). It is now well known that, if $${\bf D}$$ is a tight frame and $${\bf A}$$ satisfies analogous conditions to those in the classical setting (e.g. has independent Gaussian entries), then a signal $${\bf f}$$ which is (effectively) analysis- or synthesis sparse can be accurately recovered from traditional compressive sensing measurements $${\bf y} = {\bf A} {\bf f} = {\bf A}{\bf D}{\bf x}$$ (see e.g. [4,8,10,14,16,22,23,27]). 1.3 One-bit measurements with dictionaries: our setup In this article, we study one-bit compressive sensing for dictionary-sparse signals. Precisely, our aim is to recover signals $${\bf f} \in \mathbb{R}^n$$ from the binary measurements   $$y_i = \mathrm{sgn} \langle {\bf a}_i, {\bf f} \rangle \qquad i=1,\ldots,m,$$ or   $$y_i = \mathrm{sgn} \left(\langle {\bf a}_i, {\bf f} \rangle - \tau_i \right) \qquad i = 1,\ldots,m,$$ when these signals are sparse with respect to a dictionary $${\bf D}$$. As in Section 1.2, there are several ways to model signals that are sparse with respect to $${\bf D}$$. 
In this work, two different signal classes are considered. For the first one, which is more general, our results are based on convex programming. For the second one, which is more restrictive, we can obtain results using a computationally simpler algorithm based on hard thresholding.

The first class consists of signals $${\bf f} \in ({\bf D}^*)^{-1} {\it {\Sigma}}_s^{N,\rm{eff}}$$ that are effectively $$s$$-analysis sparse, i.e. they satisfy

$$\|{\bf D}^* {\bf f}\|_1 \le \sqrt{s} \|{\bf D}^* {\bf f}\|_2.$$ (1.4)

This occurs, of course, when $${\bf D}^* {\bf f}$$ is genuinely sparse (analysis sparsity) and this is realistic if we are working, e.g. with piecewise-constant images, since they are sparse after application of the total variation operator. We consider effectively sparse signals since genuine analysis sparsity is unrealistic when $${\bf D}$$ has columns in general position, as it would imply that $${\bf f}$$ is orthogonal to too many columns of $${\bf D}$$.

The second class consists of signals $${\bf f} \in {\bf D}({\it {\Sigma}}_s^N) \cap ({\bf D}^*)^{-1} {\it {\Sigma}}_{\kappa s}^{N, \rm{eff}}$$ that are both $$s$$-synthesis sparse and effectively $$\kappa s$$-analysis sparse for some $$\kappa \ge 1$$. This will occur as soon as the signals are $$s$$-synthesis sparse, provided we utilize suitable dictionaries $${\bf D} \in \mathbb{R}^{n \times N}$$. One could take, for instance, the matrix of an equiangular tight frame when $$N = n + k$$, $$k = {\rm constant}$$. Other examples of suitable dictionaries found in [21] include harmonic frames again with $$N = n + k$$, $$k = {\rm constant}$$, as well as Fourier and Haar frames with constant redundancy factor $$N/n$$. Figure 1 summarizes the relationship between the various domains we deal with.

Fig. 1. The coefficient, signal and measurement domains.
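To make the notions of tight frame and effective sparsity from Sections 1.2 and 1.3 concrete, here is a minimal Python sketch. It uses the three-vector "Mercedes-Benz" frame in $$\mathbb{R}^2$$ (an equiangular tight frame with $$N = n + 1$$) as an assumed example dictionary; the function names and the choice of frame are illustrative and are not taken from the article.

```python
import math

def mercedes_benz_frame():
    """Rows of a 2 x 3 matrix D whose columns are three unit vectors
    120 degrees apart, scaled by sqrt(2/3) so that D D^* = I_2."""
    scale = math.sqrt(2.0 / 3.0)
    angles = [math.pi / 2 + 2 * math.pi * j / 3 for j in range(3)]
    return [[scale * math.cos(a) for a in angles],
            [scale * math.sin(a) for a in angles]]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def effective_sparsity(x):
    """(||x||_1 / ||x||_2)^2: x is effectively s-sparse exactly when this is <= s."""
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return (l1 / l2) ** 2

D = mercedes_benz_frame()
D_star = transpose(D)

# Tight-frame check: D D^* should be the 2 x 2 identity.
DDt = [[sum(D[i][k] * D[j][k] for k in range(3)) for j in range(2)] for i in range(2)]

# A 1-synthesis-sparse signal f = D x with x = e_0 ...
f = matvec(D, [1.0, 0.0, 0.0])
# ... whose analysis coefficients D^* f = (2/3, -1/3, -1/3) are not sparse,
# but are effectively sparse with (4/3)^2 / (2/3) = 8/3.
analysis = matvec(D_star, f)
s_eff = effective_sparsity(analysis)

print(DDt, analysis, s_eff)
```

In this toy case the signal is $$1$$-synthesis sparse and effectively $$\kappa s$$-analysis sparse with $$\kappa = 8/3$$, which is the kind of situation the second signal class describes.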
1.4 Contributions Our main results demonstrate that one-bit compressive sensing is viable even when the sparsifying transform is an overcomplete dictionary. As outlined in Section 1.1, we consider both the challenge of recovering the direction $${\bf f}/\|{\bf f}\|_2$$ of a signal $${\bf f}$$, and the challenge of recovering the entire signal (direction and magnitude). Using measurements of the form $$y_i = \mathrm{sgn}\langle {\bf a}_i, {\bf f} \rangle$$, we can recover the direction but not the magnitude; using measurements of the form $$y_i = \mathrm{sgn}\left(\langle {\bf a}_i, {\bf f} \rangle - \tau_i \right)$$, we may recover both. In (one-bit) compressive sensing, two standard families of algorithms are (a) algorithms based on convex programming, and (b) algorithms based on thresholding. In this article, we analyze algorithms from both classes. One reason to study multiple algorithms is to give a more complete landscape of this problem. Another reason is that the different algorithms come with different trade-offs (between computational complexity and the strength of assumptions required), and it is valuable to explore this space of trade-offs. 1.4.1 Recovering the direction First, we show that the direction of a dictionary-sparse signal can be estimated from one-bit measurements of the type $$\mathrm{sgn}({\bf A} {\bf f})$$. We consider two algorithms: our first approach is based on linear programming, and our second is based on hard thresholding. The linear programming approach is more computationally demanding, but applies to a broader class of signals. In Section 3, we prove that both of these approaches are effective, provided the sensing matrix $${\bf A}$$ satisfies certain properties. In Section 2, we state that these properties are in fact satisfied by a matrix $${\bf A}$$ populated with independent Gaussian entries. We combine all of these results to prove the statement below. 
As noted above, the different algorithms require different definitions of ‘dictionary sparsity’. In what follows, $$\gamma, C, c$$ refer to absolute numerical constants.

Theorem 1 (Informal statement of direction recovery) Let $$\varepsilon > 0$$, let $$m \ge C \varepsilon^{-7} s \ln(eN/s)$$ and let $${\bf A} \in \mathbb{R}^{m \times n}$$ be populated by independent standard normal random variables. Then, with failure probability at most $$\gamma \exp(-c \varepsilon^2 m)$$, any dictionary-sparse$$^2$$ signal $${\bf f} \in \mathbb{R}^n$$ observed via $${\bf y} = \mathrm{sgn}({\bf A} {\bf f})$$ can be approximated by the output $$\widehat{{\bf f}}$$ of an efficient algorithm with error

$$\left\| \frac{{\bf f}}{\|{\bf f}\|_2} - \frac{\widehat{{\bf f}}}{\|\widehat{{\bf f}}\|_2} \right\|_2 \le \varepsilon.$$

#### 1.4.2 Recovering the whole signal

By using one-bit measurements of the form $$\mathrm{sgn}({\bf A} {\bf f} - \boldsymbol{\tau})$$, where $$\tau_1,\ldots,\tau_m$$ are properly normalized Gaussian random thresholds, we are able to recover not just the direction, but also the magnitude of a dictionary-sparse signal $${\bf f}$$. We consider three algorithms: our first approach is based on linear programming, our second approach on second-order cone programming and our third approach on hard thresholding. Again, there are different trade-offs to the different algorithms. As above, the approach based on hard thresholding is more efficient, whereas the approaches based on convex programming apply to a broader signal class. There is also a trade-off between linear programming and second-order cone programming: the second-order cone program requires knowledge of $$\|{\bf f}\|_2$$, whereas the linear program does not (although it does require a loose bound), but the second-order cone programming approach applies to a slightly larger class of signals.
We show in Section 4 that all three of these algorithms are effective when the sensing matrix $${\bf A}$$ is populated with independent Gaussian entries, and when the thresholds $$\tau_i$$ are also independent Gaussian random variables. We combine the results of Section 4 in the following theorem. Theorem 2 (Informal statement of signal estimation) Let $$\varepsilon, r, \sigma > 0$$, let $$m \ge C \varepsilon^{-9} s \ln(eN/s)$$, and let $${\bf A} \in \mathbb{R}^{m \times n}$$ and $$\boldsymbol{\tau} \in \mathbb{R}^m$$ be populated by independent mean-zero normal random variables with variance $$1$$ and $$\sigma^2$$, respectively. Then, with failure probability at most $$\gamma \exp(-c \varepsilon^2 m)$$, any dictionary-sparse$$^2$$ signal $${\bf f} \in \mathbb{R}^n$$ with $$\|{\bf f}\|_2 \le r$$ observed via $${\bf y} = \mathrm{sgn}({\bf A} {\bf f} - \boldsymbol{\tau})$$ is approximated by the output $$\widehat{{\bf f}}$$ of an efficient algorithm with error   $$\left\| {\bf f} - \widehat{{\bf f}} \right\|_2 \le \varepsilon r.$$ We have not spelled out the dependence of the number of measurements and the failure probability on the parameters $$r$$ and $$\sigma$$: as long as they are roughly the same order of magnitude, the dependence is absorbed in the constants $$C$$ and $$c$$ (see Section 4 for precise statements). As outlined earlier, an estimate of $$r$$ is required to implement the second-order cone program, but the other two algorithms do not require such an estimate. 1.5 Discussion and future directions The purpose of this work is to demonstrate that techniques from one-bit compressive sensing can be effective for the recovery of dictionary-sparse signals, and we propose several algorithms to accomplish this for various notions of dictionary sparsity. Still, some interesting future directions remain. First, we do not believe that the dependence on $$\varepsilon$$ above is optimal. 
We do believe instead that a logarithmic dependence on $$\varepsilon$$ for the number of measurements (or equivalently an exponential decay in the oversampling factor $$\lambda = m / (s \ln(eN/s))$$ for the recovery error) is possible by choosing the thresholds $$\tau_1,\ldots,\tau_m$$ adaptively. This would be achieved by adjusting the method of [2], but with the strong proviso of exact sparsity. Secondly, it is worth asking to what extent the trade-offs between the different algorithms reflect reality. In particular, is it only an artifact of the proof that the simpler algorithm based on hard thresholding applies to a narrower class of signals? 1.6 Organization The remainder of the article is organized as follows. In Section 2, we outline some technical tools upon which our results rely, namely some properties of Gaussian random matrices. In Section 3, we consider recovery of the direction $${\bf f}/\|{\bf f}\|$$ only and we propose two algorithms to achieve it. In Section 4, we present three algorithms for the recovery of the entire signal $${\bf f}$$. Finally, in Section 5, we provide proofs for the results outlined in Section 2. 2. Technical ingredients In this section, we highlight the theoretical properties upon which our results rely. Their proofs are deferred to Section 5 so that the reader does not lose track of our objectives. The first property we put forward is an adaptation to the dictionary case of the so-called sign product embedding property (the term was coined in [18], but the result originally appeared in [25]). Theorem 3 ($${\bf D}$$-SPEP) Let $$\delta > 0$$, let $$m \ge C \delta^{-7} s \ln(eN/s)$$ and let $${\bf A} \in \mathbb{R}^{m \times n}$$ be populated by independent standard normal random variables. 
Then, with failure probability at most $$\gamma \exp(-c \delta^2 m)$$, the renormalized matrix $${\bf A}':= (\sqrt{\pi/2}/m) {\bf A}$$ satisfies the $$s$$th-order sign product embedding property adapted to $${\bf D} \in \mathbb{R}^{n \times N}$$ with constant $$\delta$$ ($${\bf D}$$-SPEP$$(s,\delta)$$ for short), i.e.

$$\left| \langle {\bf A}' {\bf f}, \mathrm{sgn}({\bf A}' {\bf g}) \rangle - \langle {\bf f}, {\bf g} \rangle \right| \le \delta$$ (2.1)

holds for all $${\bf f}, {\bf g} \in {\bf D}({\it {\Sigma}}^N_s) \cap S^{n-1}$$. The factor $$\sqrt{\pi/2}$$ compensates for the fact that $$\mathbb{E}|g| = \sqrt{2/\pi}$$ for a standard normal random variable $$g$$.

Remark 1 The power $$\delta^{-7}$$ is unlikely to be optimal. At least in the non-dictionary case, i.e. when $${\bf D} = {\bf I}_n$$, it can be reduced to $$\delta^{-2}$$, see [3].

As an immediate consequence of $${\bf D}$$-SPEP, setting $${\bf g} = {\bf f}$$ in (2.1) allows one to deduce a variation of the classical restricted isometry property adapted to $${\bf D}$$, where the inner norm becomes the $$\ell_1$$-norm (we mention in passing that this variation could also be deduced by other means).

Corollary 1 ($${\bf D}$$-RIP$$_1$$) Let $$\delta > 0$$, let $$m \ge C \delta^{-7} s \ln(eN/s)$$ and let $${\bf A} \in \mathbb{R}^{m \times n}$$ be populated by independent standard normal random variables. Then, with failure probability at most $$\gamma \exp(-c \delta^2 m)$$, the renormalized matrix $${\bf A}':= (\sqrt{\pi/2}/m) {\bf A}$$ satisfies the $$s$$th-order $$\ell_1$$-restricted isometry property adapted to $${\bf D} \in \mathbb{R}^{n \times N}$$ with constant $$\delta$$ ($${\bf D}$$-RIP$$_{1}(s,\delta)$$ for short), i.e.

$$(1-\delta) \| {\bf f}\|_2 \le \| {\bf A}' {\bf f} \|_1 \le (1+\delta) \|{\bf f}\|_2$$ (2.2)

holds for all $${\bf f} \in {\bf D}({\it {\Sigma}}_s^N)$$.

The next property we put forward is an adaptation of the tessellation of the ‘effectively sparse sphere’ (see [26]) to the dictionary case.
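Before moving on, the scaling behind (2.1) and (2.2) can be sanity-checked numerically for a single fixed pair of unit vectors. The Python sketch below is only a pointwise Monte Carlo check, not a verification of the uniform statements in Theorem 3 and Corollary 1. Since $$\mathbb{E}|g| = \sqrt{2/\pi}$$ for a standard normal $$g$$, the sketch rescales $${\bf A}$$ by $$\sqrt{\pi/2}/m$$ so that $$\|{\bf A}' {\bf f}\|_1$$ concentrates around $$\|{\bf f}\|_2$$. The dimensions, seed and variable names are assumptions chosen for illustration.

```python
import math
import random

rng = random.Random(0)
m, n = 4000, 10
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

def unit(v):
    nrm = math.sqrt(sum(c * c for c in v))
    return [c / nrm for c in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Two arbitrary unit vectors (no dictionary structure is used in this check).
f = unit([rng.gauss(0.0, 1.0) for _ in range(n)])
g = unit([rng.gauss(0.0, 1.0) for _ in range(n)])

scale = math.sqrt(math.pi / 2.0) / m       # assumed rescaling: A' = scale * A
Af = [dot(row, f) for row in A]
Ag = [dot(row, g) for row in A]

# ||A' f||_1, which should be close to ||f||_2 = 1 as in (2.2).
l1_norm = scale * sum(abs(v) for v in Af)

# <A' f, sgn(A' g)>, which should be close to <f, g> as in (2.1).
# Note sgn(A' g) = sgn(A g) because the scale is positive.
sign_product = scale * sum(a * (1 if b >= 0 else -1) for a, b in zip(Af, Ag))
inner = dot(f, g)

print(l1_norm, sign_product, inner)
```

Both quantities fluctuate at the order of $$1/\sqrt{m}$$ around their targets, which is consistent with the failure probability $$\gamma \exp(-c \delta^2 m)$$ for a fixed pair.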
In what follows, given a (non-invertible) matrix $${\bf M}$$ and a set $$K$$, we denote by $${\bf M}^{-1} (K)$$ the preimage of $$K$$ with respect to $${\bf M}$$.

Theorem 4 (Tessellation) Let $$\varepsilon > 0$$, let $$m \ge C \varepsilon^{-6} s \ln(eN/s)$$ and let $${\bf A} \in \mathbb{R}^{m \times n}$$ be populated by independent standard normal random variables. Then, with failure probability at most $$\gamma \exp(-c \varepsilon^2 m)$$, the rows $${\bf a}_1,\ldots,{\bf a}_m \in \mathbb{R}^n$$ of $${\bf A}$$ $$\varepsilon$$-tessellate the effectively $$s$$-analysis-sparse sphere (we write that $${\bf A}$$ satisfies $${\bf D}$$-TES$$(s,\varepsilon)$$ for short), i.e.

$$[{\bf f},{\bf g} \in ({\bf D}^*)^{-1}({\it {\Sigma}}_{s}^{N,{\rm eff}}) \cap S^{n-1} : \; \mathrm{sgn} \langle {\bf a}_i, {\bf f} \rangle = \mathrm{sgn} \langle {\bf a}_i, {\bf g} \rangle \mbox{ for all } i =1,\ldots,m] \Longrightarrow [\|{\bf f} - {\bf g}\|_2 \le \varepsilon].$$ (2.3)

## 3. Signal estimation: direction only

Throughout this section, given a measurement matrix $${\bf A} \in \mathbb{R}^{m \times n}$$ with rows $${\bf a}_1,\ldots,{\bf a}_m \in \mathbb{R}^n$$, the signals $${\bf f} \in \mathbb{R}^n$$ are acquired via $${\bf y} = \mathrm{sgn}({\bf A} {\bf f}) \in \{-1,+1\}^m$$, i.e.

$$y_i = \mathrm{sgn} \langle {\bf a}_i, {\bf f} \rangle \qquad i = 1,\ldots,m.$$

Under this model, all $$c {\bf f}$$ with $$c>0$$ produce the same one-bit measurements, so one can only hope to recover the direction of $${\bf f}$$. We present two methods to do so, one based on linear programming and the other one based on hard thresholding.
3.1 Linear programming Given a signal $${\bf f} \in \mathbb{R}^n$$ observed via $${\bf y} = \mathrm{sgn} ({\bf A} {\bf f})$$, the optimization scheme we consider here consists in outputting the signal $${\bf f}_{\rm lp}$$ solution of   $$\label{LPforDir} \underset{{{\bf h} \in \mathbb{R}^n}}{\rm minimize}\, \| {\bf D}^* {\bf h}\|_1 \qquad \mbox{subject to} \quad \mathrm{sgn}({\bf A} {\bf h}) = {\bf y} \quad \|{\bf A} {\bf h}\|_1 = 1.$$ (3.1) This is in fact a linear program (and thus may be solved efficiently), since the condition $$\mathrm{sgn}({\bf A} {\bf h}) = {\bf y}$$ reads   $$y_i ({\bf A} {\bf h})_i \ge 0 \qquad \mbox{for all} i = 1,\ldots, m,$$ and, under this constraint, the condition $$\|{\bf A} {\bf h}\|_1 = 1$$ reads   $$\sum_{i=1}^m y_i ({\bf A} {\bf h})_i = 1.$$ Theorem 5 If $${\bf A} \,{\in}\, \mathbb{R}^{m \times n}$$ satisfies both $${\bf D}$$-TES$$(36s,\varepsilon)$$ and $${\bf D}$$-RIP$$_1(25s,1/5)$$, then any effectively $$s$$-analysis-sparse signal $${\bf f} \in ({\bf D}^*)^{-1}{\it {\Sigma}}_s^{N,{\rm eff}}$$ observed via $${\bf y} = \mathrm{sgn}({\bf A} {\bf f})$$ is directionally approximated by the output $${\bf f}_{\rm lp}$$ of the linear program (3.1) with error   $$\left\| \frac{{\bf f}}{\|{\bf f}\|_2} - \frac{{\bf f}_{\rm lp}}{\|{\bf f}_{\rm lp}\|_2} \right\|_2 \le \varepsilon.$$ Proof. The main step is to show that $${\bf f}_{\rm lp}$$ is effectively $$36s$$-analysis sparse when $${\bf D}$$-RIP$$_1(t,\delta)$$ holds with $$t= 25s$$ and $$\delta=1/5$$. Then, since both $${\bf f}/\|{\bf f}\|_2$$ and $${\bf f}_{\rm lp} / \|{\bf f}_{\rm lp}\|_2$$ belong to $$({\bf D}^*)^{-1}{\it {\Sigma}}_{36 s}^{N,{\rm eff}} \cap S^{n-1}$$ and have the same sign observations, $${\bf D}$$-TES$$(36s,\varepsilon)$$ implies the desired conclusion. To prove the effective analysis sparsity of $${\bf f}_{\rm lp}$$, we first estimate $$\|{\bf A} {\bf f}\|_1$$ from below. 
For this purpose, let $$T_0$$ denote an index set of $$t$$ largest absolute entries of $${\bf D}^* {\bf f}$$, $$T_1$$ an index set of next $$t$$ largest absolute entries of $${\bf D}^* {\bf f}$$, $$T_2$$ an index set of next $$t$$ largest absolute entries of $${\bf D}^* {\bf f}$$, etc. We have   \begin{align*} \|{\bf A} {\bf f} \|_1 & = \|{\bf A} {\bf D} {\bf D}^* {\bf f}\|_1 = \left\| {\bf A} {\bf D} \left(\sum_{k \ge 0} ({\bf D}^*{\bf f})_{T_k} \right) \right\|_1 \ge \|{\bf A} {\bf D} \left(({\bf D}^* {\bf f})_{T_0} \right)\!\|_1 - \sum_{k \ge 1} \|{\bf A} {\bf D} \left(({\bf D}^* {\bf f})_{T_k} \right)\!\|_1\\ & \ge (1-\delta) \|{\bf D} \left(({\bf D}^* {\bf f})_{T_0} \right)\!\|_2 - \sum_{k \ge 1} (1+\delta) \|{\bf D} \left(({\bf D}^* {\bf f})_{T_k} \right)\!\|_2, \end{align*} where the last step used $${\bf D}$$-RIP$$_1(t,\delta)$$. We notice that, for $$k \ge 1$$,   $$\|{\bf D} \left(({\bf D}^* {\bf f})_{T_k} \right)\!\|_2 \le \| ({\bf D}^* {\bf f})_{T_k}\! \|_2 \le \frac{1}{\sqrt{t}} \| ({\bf D}^* {\bf f})_{T_{k-1}}\!\|_1,$$ from where it follows that   $$\label{LowerAf} \|{\bf A} {\bf f}\|_1 \ge (1-\delta) \|{\bf D} \left(({\bf D}^* {\bf f})_{T_0}\right)\!\|_2 - \frac{1+\delta}{\sqrt{t}} \|{\bf D}^* {\bf f} \|_1.$$ (3.2) In addition, we observe that   \begin{align*} \|{\bf D}^* {\bf f} \|_2 & = \|{\bf f}\|_2 = \|{\bf D} {\bf D}^* {\bf f}\|_2 = \left\| {\bf D} \left(\sum_{k \ge 0} ({\bf D}^* {\bf f})_{T_k} \right) \right\|_2 \le \left\| {\bf D} \left(({\bf D}^* {\bf f})_{T_0} \right) \right\|_2 + \sum_{k \ge 1} \left\| {\bf D} \left(({\bf D}^* {\bf f})_{T_k} \right) \right\|_2\\ & \le \left\| {\bf D} \left(({\bf D}^* {\bf f})_{T_0} \right) \right\|_2 + \frac{1}{\sqrt{t}} \|{\bf D}^* {\bf f} \|_1. 
\end{align*} In view of the effective sparsity of $${\bf D}^* {\bf f}$$, we obtain   $$\|{\bf D}^* {\bf f}\|_1 \le \sqrt{s} \|{\bf D}^* {\bf f}\|_2 \le \sqrt{s}\left\| {\bf D} \left(({\bf D}^* {\bf f})_{T_0} \right) \right\|_2 + \sqrt{s/t} \|{\bf D}^* {\bf f} \|_1,$$ hence   $$\label{LowerDD*T0} \left\| {\bf D} \left(({\bf D}^* {\bf f})_{T_0} \right) \right\|_2 \ge \frac{1- \sqrt{s/t}}{\sqrt{s}} \|{\bf D}^* {\bf f} \|_1.$$ (3.3) Substituting (3.3) in (3.2) yields   $$\label{LowerAf2} \|{\bf A} {\bf f}\|_1 \ge \left((1-\delta)(1-\sqrt{s/t}) - (1+\delta)(\sqrt{s/t}) \right) \frac{1}{\sqrt{s}} \|{\bf D}^* {\bf f}\|_1 = \frac{2/5}{\sqrt{s}} \|{\bf D}^* {\bf f}\|_1,$$ (3.4) where we have used the values $$t = 25s$$ and $$\delta=1/5$$. This lower estimate for $$\|{\bf A} {\bf f} \|_1$$, combined with the minimality property of $${\bf f}_{\rm lp}$$, allows us to derive that   $$\label{UpperD*fhat} \|{\bf D}^* {\bf f}_{\rm lp} \|_1 \le \|{\bf D}^*({\bf f}/ \|{\bf A} {\bf f}\|_1)\|_1 = \frac{\|{\bf D}^* {\bf f}\|_1}{\|{\bf A} {\bf f} \|_1} \le (5/2) \sqrt{s}.$$ (3.5) Next, with $$\widehat{T}_0$$ denoting an index set of $$t$$ largest absolute entries of $${\bf D}^* {\bf f}_{\rm lp}$$, $$\widehat{T}_1$$ an index set of next $$t$$ largest absolute entries of $${\bf D}^* {\bf f}_{\rm lp}$$, $$\widehat{T}_2$$ an index set of next $$t$$ largest absolute entries of $${\bf D}^* {\bf f}_{\rm lp}$$, etc., we can write   \begin{align*} 1 & = \|{\bf A} {\bf f}_{\rm lp} \|_1 = \|{\bf A} {\bf D} {\bf D}^* {\bf f}_{\rm lp} \|_1 = \left\| {\bf A} {\bf D} \left(\sum_{k \ge 0} ({\bf D}^* {\bf f}_{\rm lp})_{\widehat{T}_k} \right) \right\|_1 \le \sum_{k \ge 0} \left\| {\bf A} {\bf D} \left(({\bf D}^* {\bf f}_{\rm lp})_{\widehat{T}_k} \right) \right\|_1\\ & \le \sum_{k \ge 0} (1+\delta) \left\| {\bf D} \left(({\bf D}^* {\bf f}_{\rm lp})_{\widehat{T}_k} \right) \right\|_2 = (1+\delta) \left[\!\left\| ({\bf D}^* {\bf f}_{\rm lp})_{\widehat{T}_0} \right\|_2 + \sum_{k \ge 1} \!\left\| ({\bf D}^* 
{\bf f}_{\rm lp})_{\widehat{T}_k} \right\|_2 \right]\\ & \le (1+\delta) \left[\|{\bf D}^* {\bf f}_{\rm lp} \|_2 + \frac{1}{\sqrt{t}} \|{\bf D}^* {\bf f}_{\rm lp}\|_1 \right] \le (1+\delta) \left[\|{\bf D}^* {\bf f}_{\rm lp} \|_2 + (5/2)\sqrt{s/t} \right]. \end{align*} This chain of inequalities shows that   $$\label{LowerD*fhat} \|{\bf D}^* {\bf f}_{\rm lp} \|_2 \ge \frac{1-(5/2)\sqrt{s/t}}{1+\delta} = \frac{5}{12}.$$ (3.6) Combining (3.5) and (3.6), we obtain   $$\|{\bf D}^* {\bf f}_{\rm lp} \|_1 \le 6 \sqrt{s} \|{\bf D}^* {\bf f}_{\rm lp} \|_2.$$ In other words, $${\bf D}^* {\bf f}_{\rm lp}$$ is effectively $$36s$$-sparse, which is what was needed to conclude the proof. □ Remark 2 We point out that if $${\bf f}$$ was genuinely, instead of effectively, $$s$$-analysis sparse, then a lower bound of the type (3.4) would be immediate from the $${\bf D}$$-RIP$$_1$$. We also point out that our method of proving that the linear program outputs an effectively analysis-sparse signal is new even in the case $${\bf D} = {\bf I}_n$$. In fact, it makes it possible to remove a logarithmic factor from the number of measurements in this ‘non-dictionary’ case, too (compare with [24]). Furthermore, it allows for an analysis of the linear program (3.1) only based on deterministic conditions that the matrix $${\bf A}$$ may satisfy. 3.2 Hard thresholding Given a signal $${\bf f} \in \mathbb{R}^n$$ observed via $${\bf y} = \mathrm{sgn} ({\bf A} {\bf f})$$, the hard thresholding scheme we consider here consists in constructing a signal $${\bf f}_{\rm ht} \in \mathbb{R}^n$$ as   $$\label{HTforDir} {\bf f}_{\rm ht} = {\bf D} {\bf z}, \qquad \mbox{where} {\bf z} := H_t({\bf D}^* {\bf A}^* {\bf y}).$$ (3.7) Our recovery result holds for $$s$$-synthesis-sparse signals that are also effectively $$\kappa s$$-analysis sparse for some $$\kappa \ge 1$$ (we discussed in Section 1 some choices of dictionaries $${\bf D}$$ making this happen). 
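The hard thresholding scheme (3.7) is short enough to be sketched directly. The Python example below is a minimal illustration under several assumptions that are not taken from the article: the dictionary is the $$n \times 2n$$ tight frame $${\bf D} = [{\bf I}_n, {\bf Q}]/\sqrt{2}$$ with $${\bf Q} = {\bf I}_n - (2/n){\bf 1}{\bf 1}^*$$ (a Householder reflection), the sizes $$n = 16$$, $$m = 3000$$, $$t = 4$$ are arbitrary, and $${\bf A}^* {\bf y}$$ is rescaled by $$\sqrt{\pi/2}/m$$, which does not affect the recovered direction.

```python
import math
import random

def hard_threshold(w, t):
    """H_t: keep the t largest-magnitude entries of w and zero out the rest."""
    keep = set(sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)[:t])
    return [w[j] if j in keep else 0.0 for j in range(len(w))]

def normalize(v):
    nrm = math.sqrt(sum(c * c for c in v))
    return [c / nrm for c in v]

def householder(v):
    """Apply Q = I - (2/n) 1 1^T, a symmetric orthogonal matrix, to v."""
    shift = 2.0 * sum(v) / len(v)
    return [c - shift for c in v]

def D_apply(x):
    """D x for the assumed tight frame D = [I, Q] / sqrt(2) (so D D^* = I)."""
    n = len(x) // 2
    q = householder(x[n:])
    return [(a + b) / math.sqrt(2.0) for a, b in zip(x[:n], q)]

def D_adjoint(f):
    """D^* f = [f; Q f] / sqrt(2), using that Q is symmetric."""
    return [c / math.sqrt(2.0) for c in f + householder(f)]

rng = random.Random(1)
n, m, t = 16, 3000, 4
x = [0.0] * (2 * n)
x[2], x[9] = 1.0, -0.6                    # 2-sparse coefficient vector
f = D_apply(x)                            # 2-synthesis-sparse signal

A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = [1 if sum(a * c for a, c in zip(row, f)) >= 0 else -1 for row in A]

# A^* y, rescaled by sqrt(pi/2)/m (the scale is irrelevant for the direction).
scale = math.sqrt(math.pi / 2.0) / m
u = [scale * sum(A[i][k] * y[i] for i in range(m)) for k in range(n)]

z = hard_threshold(D_adjoint(u), t)       # z = H_t(D^* A^* y)
f_ht = D_apply(z)                         # f_ht = D z, as in (3.7)

err = math.sqrt(sum((a - b) ** 2 for a, b in zip(normalize(f), normalize(f_ht))))
print(err)
```

The quantity `err` is the directional error $$\| {\bf f}/\|{\bf f}\|_2 - {\bf f}_{\rm ht}/\|{\bf f}_{\rm ht}\|_2 \|_2$$. In this toy setting $${\bf D}^* {\bf f}$$ is not exactly sparse, so part of the error comes from discarding its small entries and part from the randomness of $${\bf A}$$.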
Theorem 6 If $${\bf A} \in \mathbb{R}^{m \times n}$$ satisfies $${\bf D}$$-SPEP$$(s+t,\varepsilon/8)$$, $$t = \lceil 16 \varepsilon^{-2} \kappa s \rceil$$, then any $$s$$-synthesis-sparse signal $${\bf f} \in {\bf D}({\it {\Sigma}}_s^N)$$ with $${\bf D}^* {\bf f} \in {\it {\Sigma}}_{\kappa s}^{N,{\rm eff}}$$ observed via $${\bf y} = \mathrm{sgn} ({\bf A} {\bf f})$$ is directionally approximated by the output $${\bf f}_{\rm ht}$$ of the hard thresholding (3.7) with error   $$\left\| \frac{{\bf f}}{\|{\bf f}\|_2} - \frac{{\bf f}_{\rm ht}}{\|{\bf f}_{\rm ht}\|_2} \right\|_2 \le \varepsilon.$$ Proof. We assume without loss of generality that $$\|{\bf f}\|_2 = 1$$. Let $$T=T_0$$ denote an index set of $$t$$ largest absolute entries of $${\bf D}^* {\bf f}$$, $$T_1$$ an index set of next $$t$$ largest absolute entries of $${\bf D}^* {\bf f}$$, $$T_2$$ an index set of next $$t$$ largest absolute entries of $${\bf D}^* {\bf f}$$, etc. We start by noticing that $${\bf z}$$ is a better $$t$$-sparse approximation to $${\bf D}^* {\bf A}^* {\bf y} = {\bf D}^* {\bf A}^* \mathrm{sgn}({\bf A} {\bf f})$$ than $$[{\bf D}^* {\bf f}]_T$$, so we can write   $$\| {\bf D}^* {\bf A}^* \mathrm{sgn}({\bf A} {\bf f}) - {\bf z} \|_2^2 \le \|{\bf D}^* {\bf A}^* \mathrm{sgn}({\bf A} {\bf f}) - [{\bf D}^* {\bf f}]_T \|_2^2,$$ i.e.   
$$\| ({\bf D}^* {\bf f} - {\bf z}) - ({\bf D}^* {\bf f} - {\bf D}^* {\bf A}^* \mathrm{sgn}({\bf A} {\bf f})) \|_2^2 \le \| ({\bf D}^* {\bf f} - {\bf D}^* {\bf A}^* \mathrm{sgn}({\bf A} {\bf f})) - [{\bf D}^* {\bf f}]_{\overline{T}} \|_2^2.$$ Expanding the squares and rearranging gives   \begin{align} \label{Term1} \|{\bf D}^* {\bf f} - {\bf z} \|_2^2 & \le 2 \langle {\bf D}^* {\bf f} - {\bf z}, {\bf D}^* {\bf f} - {\bf D}^* {\bf A}^* \mathrm{sgn}({\bf A} {\bf f}) \rangle \\ \end{align} (3.8)  \begin{align} \label{Term2} & - 2 \langle [{\bf D}^* {\bf f}]_{\overline{T}} , {\bf D}^* {\bf f} - {\bf D}^* {\bf A}^* \mathrm{sgn}({\bf A} {\bf f}) \rangle \\ \end{align} (3.9)  \begin{align} \label{Term3} & + \| [{\bf D}^* {\bf f}]_{\overline{T}} \|_2^2. \end{align} (3.10) To bound (3.10), we invoke [15, Theorem 2.5] and the effective analysis sparsity of $${\bf f}$$ to derive   $$\| [{\bf D}^* {\bf f}]_{\overline{T}} \|_2^2 \le \frac{1}{4t} \| {\bf D}^* {\bf f} \|_1^2 \le \frac{\kappa s}{4t} \| {\bf D}^* {\bf f} \|_2^2 = \frac{\kappa s}{4t} \|{\bf f} \|_2^2 = \frac{\kappa s}{4t}.$$ To bound (3.8) in absolute value, we notice that it can be written as   \begin{align*} 2 | \langle {\bf D} {\bf D}^* {\bf f} - {\bf D} {\bf z}, &{\bf f} - {\bf A}^* \mathrm{sgn}({\bf A} {\bf f}) \rangle | = 2 | \langle {\bf f} - {\bf f}_{\rm ht}, {\bf f} - {\bf A}^* \mathrm{sgn}({\bf A} {\bf f}) \rangle | \\ & = 2 | \langle {\bf f} - {\bf f}_{\rm ht}, {\bf f} \rangle - \langle {\bf A} ({\bf f} - {\bf f}_{\rm ht}), \mathrm{sgn}({\bf A} {\bf f}) \rangle | \le 2 \varepsilon' \|{\bf f} - {\bf f}_{\rm ht} \|_2, \end{align*} where the last step followed from $${\bf D}$$-SPEP$$(s+t,\varepsilon')$$, $$\varepsilon' := \varepsilon /8$$. 
Finally, (3.9) can be bounded in absolute value by   \begin{align*} 2 & \sum_{k \ge 1} | \langle [{\bf D}^* {\bf f}]_{T_k}, {\bf D}^*({\bf f} - {\bf A}^* \mathrm{sgn}({\bf A} {\bf f})) \rangle | = 2 \sum_{k \ge 1} | \langle {\bf D}([{\bf D}^* {\bf f}]_{T_k}), {\bf f} - {\bf A}^* \mathrm{sgn}({\bf A} {\bf f}) \rangle | \\ & = 2 \sum_{k \ge 1} | \langle {\bf D}([{\bf D}^* {\bf f}]_{T_k}), {\bf f} \rangle - \langle {\bf A} ({\bf D}([{\bf D}^* {\bf f}]_{T_k})), \mathrm{sgn}({\bf A} {\bf f}) \rangle | \le 2 \sum_{k \ge 1} \varepsilon' \| {\bf D}([{\bf D}^* {\bf f}]_{T_k}) \|_2\\ & \le 2 \varepsilon' \sum_{k \ge 1} \| [{\bf D}^* {\bf f}]_{T_k} \|_2 \le 2 \varepsilon' \sum_{k \ge 1} \frac{\| [{\bf D}^* {\bf f}]_{T_{k-1}} \|_1}{\sqrt{t}} \le 2 \varepsilon' \frac{\|{\bf D}^* {\bf f}\|_1}{\sqrt{t}} \le 2 \varepsilon' \frac{\sqrt{\kappa s} \|{\bf D}^* {\bf f}\|_2}{\sqrt{t}} = 2 \varepsilon' \sqrt{\frac{\kappa s}{t}}. \end{align*} Putting everything together, we obtain   $$\|{\bf D}^* {\bf f} - {\bf z} \|_2^2 \le 2 \varepsilon' \|{\bf f} - {\bf f}_{\rm ht}\|_2 + 2 \varepsilon' \sqrt{\frac{\kappa s}{t}} + \frac{\kappa s}{4t}.$$ In view of $$\|{\bf f} - {\bf f}_{\rm ht}\|_2 = \|{\bf D} ({\bf D}^* {\bf f} - {\bf z}) \|_2 \le \|{\bf D}^* {\bf f} - {\bf z}\|_2$$, it follows that   $$\|{\bf f} - {\bf f}_{\rm ht}\|_2^2 \le 2 \varepsilon' \|{\bf f} - {\bf f}_{\rm ht}\|_2 + 2 \varepsilon' \sqrt{\frac{\kappa s}{t}} + \frac{\kappa s}{4t}, \quad \mbox{i.e.} \; (\|{\bf f} - {\bf f}_{\rm ht}\|_2 - \varepsilon')^2 \le {\varepsilon'}^2 + 2 \varepsilon' \sqrt{\frac{\kappa s}{t}} + \frac{\kappa s}{4t} \le \left(\varepsilon' \hspace{-0.5mm}+\hspace{-0.5mm} \sqrt{\frac{\kappa s}{t}} \right)^2 \hspace{-1mm}.$$ This implies that   $$\|{\bf f} - {\bf f}_{\rm ht}\|_2 \le 2 \varepsilon' + \sqrt{\frac{\kappa s}{t}}.$$ Finally, since $${\bf f}_{\rm ht}/\|{\bf f}_{\rm ht}\|_2$$ is the best $$\ell_2$$-normalized approximation to $${\bf f}_{\rm ht}$$, we conclude that   $$\left\| {\bf f} - \frac{{\bf 
f}_{\rm ht}}{\|{\bf f}_{\rm ht}\|_2} \right\|_2 \le \|{\bf f} - {\bf f}_{\rm ht}\|_2 + \left\| {\bf f}_{\rm ht} - \frac{{\bf f}_{\rm ht}}{\|{\bf f}_{\rm ht}\|_2} \right\|_2 \le 2 \|{\bf f} - {\bf f}_{\rm ht}\|_2 \le 4 \varepsilon' + 2 \sqrt{\frac{\kappa s}{t}}.$$ The announced result follows from our choices of $$t$$ and $$\varepsilon'$$. □ 4. Signal estimation: direction and magnitude Since information of the type $$y_i = \mathrm{sgn} \langle {\bf a}_i,{\bf f} \rangle$$ can at best allow one to estimate the direction of a signal $${\bf f} \in \mathbb{R}^n$$, we consider in this section information of the type   $$y_i = \mathrm{sgn}(\langle {\bf a}_i, {\bf f} \rangle - \tau_i) \qquad i = 1,\ldots,m ,$$ for some thresholds $$\tau_1,\ldots,\tau_m$$ introduced before quantization. In the rest of this section, we give three methods for recovering $${\bf f}$$ in its entirety. The first one is based on linear programming, the second one on second-order cone programming and the last one on hard thresholding. We are going to show that using these algorithms, one can estimate both the direction and the magnitude of a dictionary-sparse signal $${\bf f} \in \mathbb{R}^n$$ given a prior magnitude bound such as $$\| {\bf f} \|_2 \le r$$. We simply rely on the previous results by ‘lifting’ the situation from $$\mathbb{R}^n$$ to $$\mathbb{R}^{n+1}$$, in view of the observation that $${\bf y} = \mathrm{sgn} ({\bf A} {\bf f} - \boldsymbol{\tau})$$ can be interpreted as   $${\bf y} = \mathrm{sgn}(\widetilde{{\bf A}} \widetilde{{\bf f}}), \qquad \widetilde{{\bf A}} := \begin{bmatrix} {\bf A} & -\boldsymbol{\tau}/\sigma \end{bmatrix} \in \mathbb{R}^{m \times (n+1)}, \qquad \widetilde{{\bf f}} := \begin{bmatrix} {\bf f} \\ \hline \sigma \end{bmatrix} \in \mathbb{R}^{n+1},$$ with $$\sigma > 0$$ as in (4.3). The following lemma will be equally useful when dealing with linear programming, second-order cone programming or hard thresholding schemes.
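Before the lemma, the lifting itself can be checked numerically. The sketch below is only an illustration under assumptions: it builds the lifted objects of (4.3), namely $$\widetilde{{\bf A}} = [{\bf A} \,|\, -\boldsymbol{\tau}/\sigma]$$ and $$\widetilde{{\bf f}} = [{\bf f}; \sigma]$$, for random data and compares $$\mathrm{sgn}({\bf A}{\bf f} - \boldsymbol{\tau})$$ with $$\mathrm{sgn}(\widetilde{{\bf A}}\widetilde{{\bf f}})$$. The helper names (`lift`, `check_lifting`) and the sizes used are hypothetical.

```python
import random

def sign(v):
    return 1.0 if v >= 0 else -1.0

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lift(A, tau, f, sigma):
    """Return (A_tilde, f_tilde) with A_tilde = [A | -tau/sigma], f_tilde = [f; sigma]."""
    A_tilde = [row + [-t / sigma] for row, t in zip(A, tau)]
    f_tilde = list(f) + [sigma]
    return A_tilde, f_tilde

def check_lifting(seed=0, m=200, n=10, sigma=2.0):
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    tau = [rng.gauss(0.0, sigma) for _ in range(m)]      # thresholds ~ N(0, sigma^2)
    f = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # One-bit data with thresholds, in R^n.
    y = [sign(dot(row, f) - t) for row, t in zip(A, tau)]
    # The same data read as plain sign measurements of the lifted signal in R^{n+1}.
    A_tilde, f_tilde = lift(A, tau, f, sigma)
    y_lifted = [sign(dot(row, f_tilde)) for row in A_tilde]
    return y == y_lifted
```

The point of the design is that each row of $$\widetilde{{\bf A}}$$ is again a standard normal vector when $$\tau_i \sim \mathscr{N}(0,\sigma^2)$$, so the direction-recovery results of the previous section apply to $$\widetilde{{\bf f}}$$, whose last coordinate $$\sigma$$ is known and fixes the scale.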
Lemma 1 For $$\widetilde{{\bf f}}, \widetilde{{\bf g}} \in \mathbb{R}^{n+1}$$ written as   $$\widetilde{{\bf f}} =: \begin{bmatrix} {\bf f}_{[n]} \\ \hline f_{n+1} \end{bmatrix} \qquad \mbox{and} \qquad \widetilde{{\bf g}} =: \begin{bmatrix} {\bf g}_{[n]} \\ \hline g_{n+1} \end{bmatrix}$$ with $${\bf f}_{[n]}, {\bf g}_{[n]} \in \mathbb{R}^n$$ and with $$f_{n+1} \not= 0$$, $$g_{n+1} \not= 0$$, one has   $$\left\| \frac{{\bf f}_{[n]}}{f_{n+1}} - \frac{{\bf g}_{[n]}}{g_{n+1}} \right\|_2 \le \frac{\|\widetilde{{\bf f}}\|_2 \|\widetilde{{\bf g}}\|_2}{|f_{n+1}||g_{n+1}|} \left\| \frac{\widetilde{{\bf f}}}{\|\widetilde{{\bf f}}\|_2} - \frac{\widetilde{{\bf g}}}{\|\widetilde{{\bf g}}\|_2} \right\|_2.$$ Proof. By using the triangle inequality in $$\mathbb{R}^n$$ and the Cauchy–Schwarz inequality in $$\mathbb{R}^2$$, we can write   \begin{align*} \left\| \frac{{\bf f}_{[n]}}{f_{n+1}} - \frac{{\bf g}_{[n]}}{g_{n+1}} \right\|_2 & = \|\widetilde{{\bf f}}\|_2 \left\| \frac{1/f_{n+1}}{\|\widetilde{{\bf f}}\|_2} {\bf f}_{[n]} - \frac{1/g_{n+1}}{\|\widetilde{{\bf f}}\|_2} {\bf g}_{[n]} \right\|_2\\ & \le \|\widetilde{{\bf f}}\|_2 \left(\frac{1}{|f_{n+1}|} \left\| \frac{{\bf f}_{[n]}}{\| \widetilde{{\bf f}} \|_2} - \frac{{\bf g}_{[n]}}{\|\widetilde{{\bf g}}\|_2} \right\|_2 + \left| \frac{1/g_{n+1}}{\|\widetilde{{\bf f}}\|_2} - \frac{1/f_{n+1}}{\|\widetilde{{\bf g}}\|_2} \right| \|{\bf g}_{[n]}\|_2 \right)\\ & = \|\widetilde{{\bf f}}\|_2 \left(\frac{1}{|f_{n+1}|} \left\| \frac{{\bf f}_{[n]}}{\| \widetilde{{\bf f}} \|_2} - \frac{{\bf g}_{[n]}}{\|\widetilde{{\bf g}}\|_2} \right\|_2 + \frac{\|{\bf g}_{[n]}\|_2}{|f_{n+1}| |g_{n+1}|} \left| \frac{f_{n+1}}{\|\widetilde{{\bf f}}\|_2} - \frac{g_{n+1}}{\|\widetilde{{\bf g}}\|_2} \right| \right)\\ & \le \|\widetilde{{\bf f}}\|_2 \left[\frac{1}{|f_{n+1}|^2} + \frac{\|{\bf g}_{[n]}\|_2^2}{|f_{n+1}|^2 |g_{n+1}|^2} \right]^{1/2} \left[\left\| \frac{{\bf f}_{[n]}}{\| \widetilde{{\bf f}} \|_2} - \frac{{\bf g}_{[n]}}{\|\widetilde{{\bf
g}}\|_2} \right\|_2^2 + \left| \frac{f_{n+1}}{\|\widetilde{{\bf f}}\|_2} - \frac{g_{n+1}}{\|\widetilde{{\bf g}}\|_2} \right|^2 \right]^{1/2}\\ & = \|\widetilde{{\bf f}}\|_2 \left[\frac{\|\widetilde{{\bf g}}\|_2^2}{|f_{n+1}|^2 |g_{n+1}|^2} \right]^{1/2} \left\| \frac{\widetilde{{\bf f}}}{\|\widetilde{{\bf f}}\|_2} - \frac{\widetilde{{\bf g}}}{\|\widetilde{{\bf g}}\|_2} \right\|_2, \end{align*} which is the announced result. □ 4.1 Linear programming Given a signal $${\bf f} \in \mathbb{R}^n$$ observed via $${\bf y} = \mathrm{sgn} ({\bf A} {\bf f} - \boldsymbol{\tau})$$ with $$\tau_1,\ldots,\tau_m \sim \mathscr{N}(0,\sigma^2)$$, the optimization scheme we consider here consists in outputting the signal   $$\label{Defflp} {\bf f}_{\rm LP} = \frac{\sigma}{\widehat{u}} \widehat{{\bf h}} \in \mathbb{R}^n,$$ (4.1) where $$\widehat{{\bf h}} \in \mathbb{R}^{n}$$ and $$\widehat{u} \in \mathbb{R}$$ are solutions of   $$\label{OptProg} \underset{{\bf h} \in \mathbb{R}^n, u \in \mathbb{R}}{\rm minimize \;} \; \|{\bf D}^* {\bf h} \|_1 + |u| \qquad \mbox{subject to} \quad \mathrm{sgn}({\bf A} {\bf h} - u \boldsymbol{\tau} / \sigma) = {\bf y}, \quad \|{\bf A} {\bf h} - u \boldsymbol{\tau} / \sigma \|_1 = 1.$$ (4.2) Theorem 7 Let $$\varepsilon, r, \sigma > 0$$, let $$m \ge C (r/\sigma+\sigma/r)^6 \varepsilon^{-6} s \ln(eN/s)$$ and let $${\bf A} \in \mathbb{R}^{m \times n}$$ be populated by independent standard normal random variables. Furthermore, let $$\tau_1,\ldots,\tau_m$$ be independent normal random variables with mean zero and variance $$\sigma^2$$ that are also independent from the entries of $${\bf A}$$. 
Then, with failure probability at most $$\gamma \exp(-c m \varepsilon^2 r^2 \sigma^2/(r^2+\sigma^2)^2)$$, any effectively $$s$$-analysis sparse $${\bf f} \in \mathbb{R}^n$$ satisfying $$\|{\bf f}\|_2 \le r$$ and observed via $${\bf y} = \mathrm{sgn}({\bf A} {\bf f} - \boldsymbol{\tau})$$ is approximated by $${\bf f}_{\rm LP}$$ given in (4.1) with error   $$\left\| {\bf f}- {\bf f}_{\rm LP} \right\|_2 \le \varepsilon r.$$ Proof. Let us introduce the ‘lifted’ signal $$\widetilde{{\bf f}} \in \mathbb{R}^{n+1}$$, the ‘lifted’ tight frame $$\widetilde{{\bf D}} \in \mathbb{R}^{(n+1)\times (N+1)}$$, and the ‘lifted’ measurement matrix $$\widetilde{{\bf A}} \in \mathbb{R}^{m \times (n+1)}$$ defined as   $$\widetilde{{\bf f}} := \begin{bmatrix} {\bf f} \\ \hline \sigma \end{bmatrix},\quad \widetilde{{\bf D}} := \left[\begin{array}{c|c} {\bf D} & {\bf 0}\\\hline {\bf 0} & 1 \end{array}\right],\quad \widetilde{{\bf A}} := \left[\begin{array}{c|c} & -\tau_1/\sigma\\ {\bf A} & \vdots \\ & -\tau_m/\sigma\end{array}\right].$$ (4.3) First, we observe that $$\widetilde{{\bf f}}$$ is effectively $$(s+1)$$-analysis sparse (relative to $$\widetilde{{\bf D}}$$), since $$\widetilde{{\bf D}}^* \widetilde{{\bf f}} = \begin{bmatrix} {\bf D}^* {\bf f} \\ \hline \sigma \end{bmatrix}$$, hence   $$\frac{\|\widetilde{{\bf D}}^* \widetilde{{\bf f}}\|_1}{\|\widetilde{{\bf D}}^* \widetilde{{\bf f}}\|_2} = \frac{\|{\bf D}^* {\bf f} \|_1 + \sigma}{\sqrt{\|{\bf D}^* {\bf f}\|_2^2+\sigma^2}} \le \frac{\sqrt{s} \|{\bf D}^* {\bf f}\|_2 + \sigma}{\sqrt{\|{\bf D}^* {\bf f}\|_2^2+\sigma^2}} \le \sqrt{s+1}.$$ Next, we observe that the matrix $$\widetilde{{\bf A}} \in \mathbb{R}^{m \times (n+1)}$$, populated by independent standard normal random variables, satisfies $$\widetilde{{\bf D}}$$-TES$$(36(s+1),\varepsilon')$$, $$\varepsilon' := \dfrac{r \sigma}{2(r^2 + \sigma^2)} \varepsilon$$ and $$\widetilde{{\bf D}}$$-RIP$$_1(25(s+1),1/5)$$ with failure probability at most $$\gamma \exp(-c m {\varepsilon'}^2) + \gamma' \exp(-c' m) \le \gamma'' \exp(-c'' m \varepsilon^2 r^2 \sigma^2 / (r^2 +
\sigma^2)^2)$$, since $$m \ge C {\varepsilon'}^{-6} (s+1) \ln(eN/(s+1))$$ and $$m \ge C (1/5)^{-7} (s+1) \ln(e N / (s+1))$$ are ensured by our assumption on $$m$$. Finally, we observe that $${\bf y} = \mathrm{sgn}(\widetilde{{\bf A}} \widetilde{{\bf f}})$$ and that the optimization program (4.2) reads   $$\underset{\widetilde{{\bf h}} \in \mathbb{R}^{n+1}}{\rm minimize \;} \|\widetilde{{\bf D}}^* \widetilde{{\bf h}} \|_1 \qquad \mbox{subject to} \quad \mathrm{sgn}(\widetilde{{\bf A}} \widetilde{{\bf h}}) = {\bf y}, \quad \|\widetilde{{\bf A}} \widetilde{{\bf h}} \|_1 = 1.$$ Denoting its solution as $$\widetilde{{\bf g}} = \begin{bmatrix} {\bf g}_{[n]} \\ \hline g_{n+1} \end{bmatrix}$$, Theorem 5 implies that   $$\left\| \frac{\widetilde{{\bf f}}}{\|\widetilde{{\bf f}}\|_2} - \frac{\widetilde{{\bf g}}}{\|\widetilde{{\bf g}}\|_2} \right\|_2 \le \varepsilon'.$$ In particular, looking at the last coordinate, this inequality yields   $$\left| \frac{\sigma}{\|\widetilde{{\bf f}}\|_2} - \frac{g_{n+1}}{\|\widetilde{{\bf g}}\|_2} \right| \le \varepsilon', \qquad \mbox{hence} \qquad \frac{|g_{n+1}|}{\|\widetilde{{\bf g}}\|_2} \ge \frac{\sigma}{\|\widetilde{{\bf f}}\|_2} - \varepsilon' \ge \frac{\sigma}{\sqrt{r^2+\sigma^2}} - \frac{\sigma}{2 \sqrt{r^2 + \sigma^2}} = \frac{\sigma}{2 \sqrt{r^2 + \sigma^2}}.$$ In turn, applying Lemma 1 while taking $${\bf f} = {\bf f}_{[n]}$$ and $${\bf f}_{\rm LP} = (\sigma/g_{n+1}) {\bf g}_{[n]}$$ into consideration gives   $$\left\| \frac{{\bf f}}{\sigma} - \frac{{\bf f}_{\rm LP}}{\sigma} \right\|_2 \le \frac{\| \widetilde{{\bf f}} \|_2}{\sigma} \frac{\| \widetilde{{\bf g}} \|_2}{|g_{n+1}|} \varepsilon' \le \frac{\| \widetilde{{\bf f}} \|_2}{\sigma} \frac{2\sqrt{r^2+\sigma^2}}{\sigma} \frac{r \sigma}{2(r^2 + \sigma^2)} \varepsilon = \frac{\| \widetilde{{\bf f}} \|_2}{\sigma} \frac{r}{\sqrt{r^2+\sigma^2}} \varepsilon,$$ so that   $$\| {\bf f} - {\bf f}_{\rm LP} \|_2 \le \|\widetilde{{\bf f}}\|_2 \frac{r}{\sqrt{r^2+\sigma^2}} \varepsilon \le r \varepsilon.$$ This establishes the announced result.
□ Remark 3 The recovery scheme (4.2) does not require an estimate of $$r$$ to be run. The recovery scheme presented next does require such an estimate. Moreover, it is a second-order cone program instead of a simpler linear program. However, it has one noticeable advantage, namely that it applies not only to signals satisfying $$\|{\bf D}^* {\bf f}\|_1 \le \sqrt{s}\|{\bf D}^*{\bf f}\|_2$$ and $$\|{\bf D}^*{\bf f}\|_2 \le r$$, but also more generally to signals satisfying $$\|{\bf D}^* {\bf f}\|_1 \le \sqrt{s} r$$ and $$\|{\bf D}^*{\bf f}\|_2 \le r$$. For both schemes, one needs $$\sigma$$ to be of the same order as $$r$$ for the results to become meaningful in terms of number of measurements and success probability. However, if only an upper estimate of $$r$$ is available, then one could choose $$\sigma \ge r$$ and obtain a weaker recovery error $$\|{\bf f} - \widehat{{\bf f}}\|_2 \le \varepsilon \sigma$$ with the corresponding number of measurements and success probability. 4.2 Second-order cone programming Given a signal $${\bf f} \in \mathbb{R}^n$$ observed via $${\bf y} = \mathrm{sgn} ({\bf A} {\bf f} - \boldsymbol{\tau})$$ with $$\tau_1,\ldots,\tau_m \sim \mathscr{N}(0,\sigma^2)$$, the optimization scheme we consider here consists in outputting the signal   $$\label{Deffcp} {\bf f}_{\rm CP} = \underset{{\bf h} \in \mathbb{R}^n}{\rm argmin \;} \; \|{\bf D}^* {\bf h}\|_1 \qquad \mbox{subject to} \quad \mathrm{sgn}({\bf A} {\bf h} - \boldsymbol{\tau}) = {\bf y}, \quad \|{\bf h}\|_2 \le r.$$ (4.4) Theorem 8 Let $$\varepsilon, r, \sigma > 0$$, let $$m \ge C (r/\sigma + \sigma/r)^6(r^2/\sigma^2+1) \varepsilon^{-6} s \ln(eN/s)$$ and let $${\bf A} \in \mathbb{R}^{m \times n}$$ be populated by independent standard normal random variables. Furthermore, let $$\tau_1,\ldots,\tau_m$$ be independent normal random variables with mean zero and variance $$\sigma^2$$ that are also independent from $${\bf A}$$.
Then, with failure probability at most $$\gamma \exp(- c' m \varepsilon^2 r^2 \sigma^2 / (r^2+\sigma^2)^2)$$, any signal $${\bf f} \in \mathbb{R}^n$$ with $$\|{\bf f}\|_2 \le r$$, $$\|{\bf D}^* {\bf f}\|_1 \le \sqrt{s} r$$ and observed via $${\bf y} = \mathrm{sgn}({\bf A} {\bf f} - \boldsymbol{\tau})$$ is approximated by $${\bf f}_{\rm CP}$$ given in (4.4) with error   $$\left\| {\bf f}- {\bf f}_{\rm CP} \right\|_2 \le \varepsilon r.$$ Proof. We again use the notation (4.3) introducing the ‘lifted’ objects $$\widetilde{{\bf f}}$$, $$\widetilde{{\bf D}}$$ and $$\widetilde{{\bf A}}$$. Moreover, we set $$\widetilde{{\bf g}} := \begin{bmatrix} {\bf f}_{\rm CP} \\ \hline \sigma \end{bmatrix}$$. We claim that $$\widetilde{{\bf f}}$$ and $$\widetilde{{\bf g}}$$ are effectively $$s'$$-analysis sparse, $$s' := (r^2 / \sigma^2 + 1)(s+1)$$. For $$\widetilde{{\bf g}}$$, this indeed follows from $$\| \widetilde{{\bf D}}^* \widetilde{{\bf g}} \|_2 = \|\widetilde{{\bf g}}\|_2 = \sqrt{\|{\bf f}_{\rm CP}\|_2^2 + \sigma^2} \ge \sigma$$ and   $$\|\widetilde{{\bf D}}^* \widetilde{{\bf g}}\|_1 = \left\| \begin{bmatrix} {\bf D}^* {\bf f}_{\rm CP} \\ \hline \sigma \end{bmatrix} \right\|_1 = \| {\bf D}^* {\bf f}_{\rm CP}\|_1 + \sigma \le \| {\bf D}^* {\bf f} \|_1 + \sigma \le \sqrt{s} r + \sigma \le \sqrt{r^2 + \sigma^2} \sqrt{s+1}.$$ The same argument applies to $$\widetilde{{\bf f}}$$. We also notice that $$\widetilde{{\bf A}}$$ satisfies $$\widetilde{{\bf D}}$$-TES$$(s', \varepsilon')$$, $$\varepsilon' \,{:=}\, \dfrac{r \sigma}{r^2 + \sigma^2} \varepsilon$$, with failure probability at most $$\gamma \exp(-c m {\varepsilon'}^2) \le \gamma \exp(- c' m \varepsilon^2 r^2 \sigma^2 / (r^2+\sigma^2)^2)$$, since $$m \ge C {\varepsilon'}^{-6} s' \ln(eN/s')$$ is ensured by our assumption on $$m$$.
Finally, we observe that both $$\widetilde{{\bf f}}/ \|\widetilde{{\bf f}}\|_2$$ and $$\widetilde{{\bf g}}/ \|\widetilde{{\bf g}}\|_2$$ are $$\ell_2$$-normalized effectively $$s'$$-analysis sparse and have the same sign observations $$\mathrm{sgn}(\widetilde{{\bf A}} \widetilde{{\bf f}}) = \mathrm{sgn}(\widetilde{{\bf A}} \widetilde{{\bf g}}) = {\bf y}$$. Thus,   $$\left\| \frac{\widetilde{{\bf f}}}{\|\widetilde{{\bf f}}\|_2} - \frac{\widetilde{{\bf g}}}{\|\widetilde{{\bf g}}\|_2} \right\|_2 \le \varepsilon'.$$ In view of Lemma 1, we derive   $$\left\| \frac{{\bf f}}{\sigma} - \frac{{\bf f}_{\rm CP}}{\sigma} \right\|_2 \le \frac{r^2 + \sigma^2}{\sigma^2} \varepsilon', \qquad \mbox{hence} \qquad \|{\bf f} - {\bf f}_{\rm CP}\|_2 \le \frac{r^2 + \sigma^2}{\sigma} \varepsilon' = r \varepsilon.$$ This establishes the announced result. □ 4.3 Hard thresholding Given a signal $${\bf f} \in \mathbb{R}^n$$ observed via $${\bf y} = \mathrm{sgn}({\bf A} {\bf f} - \boldsymbol{\tau})$$ with $$\tau_1,\ldots,\tau_m \sim \mathscr{N}(0,\sigma^2)$$, the hard thresholding scheme we consider here consists in outputting the signal   $$\label{fht} {\bf f}_{\rm HT} = \frac{-\sigma^2}{\langle \boldsymbol{\tau}, {\bf y} \rangle} {\bf D} {\bf z}, \qquad {\bf z} = H_{t-1}({\bf D}^* {\bf A}^* {\bf y}).$$ (4.5) Theorem 9 Let $$\varepsilon, r, \sigma > 0$$, let $$m \ge C \kappa (r/\sigma+\sigma/r)^9 \varepsilon^{-9} s \ln(eN/s)$$ and let $${\bf A} \in \mathbb{R}^{m \times n}$$ be populated by independent standard normal random variables. Furthermore, let $$\tau_1,\ldots,\tau_m$$ be independent normal random variables with mean zero and variance $$\sigma^2$$ that are also independent from the entries of $${\bf A}$$.
Then, with failure probability at most $$\gamma \exp(-c m \varepsilon^2 r^2 \sigma^2/(r^2+\sigma^2)^2)$$, any $$s$$-synthesis-sparse and effectively $$\kappa s$$-analysis-sparse signal $${\bf f} \in \mathbb{R}^n$$ satisfying $$\|{\bf f}\|_2 \le r$$ and observed via $${\bf y} = \mathrm{sgn}({\bf A} {\bf f} - \boldsymbol{\tau})$$ is approximated by $${\bf f}_{\rm HT}$$ given in (4.5) for $$t:=\lceil 16 (\varepsilon'/8)^{-2} \kappa (s+1) \rceil$$ with error   $$\left\| {\bf f}- {\bf f}_{\rm HT} \right\|_2 \le \varepsilon r.$$ Proof. We again use the notation (4.3) for the ‘lifted’ objects $$\widetilde{{\bf f}}$$, $$\widetilde{{\bf D}}$$ and $$\widetilde{{\bf A}}$$. First, we notice that $$\widetilde{{\bf f}}$$ is $$(s+1)$$-synthesis sparse (relative to $$\widetilde{{\bf D}}$$), as well as effectively $$\kappa (s+1)$$-analysis sparse, since $$\widetilde{{\bf D}}^* \widetilde{{\bf f}} = \begin{bmatrix} {\bf D}^* {\bf f} \\ \hline \sigma \end{bmatrix}$$ satisfies   $$\frac{\|\widetilde{{\bf D}}^* \widetilde{{\bf f}}\|_1}{\|\widetilde{{\bf D}}^* \widetilde{{\bf f}}\|_2} = \frac{\|{\bf D}^* {\bf f}\|_1 + \sigma}{\sqrt{\|{\bf D}^*{\bf f}\|_2^2 + \sigma^2}} \le \frac{\sqrt{\kappa s}\|{\bf D}^* {\bf f}\|_2 + \sigma}{\sqrt{\|{\bf D}^*{\bf f}\|_2^2 +\sigma^2}} \le \sqrt{\kappa s + 1} \le \sqrt{\kappa (s+1)}.$$ Next, we observe that the matrix $$\widetilde{{\bf A}}$$, populated by independent standard normal random variables, satisfies $$\widetilde{{\bf D}}$$-SPEP$$(s+1+t,\varepsilon'/8)$$, $$\varepsilon ' := \dfrac{r \sigma}{2(r^2 + \sigma^2)} \varepsilon$$, with failure probability at most $$\gamma \exp(-c m {\varepsilon'}^2 r^2)$$, since $$m \ge C (\varepsilon'/8)^{-7} (s+1+t) \ln(e(N+1)/(s+1+t))$$ is ensured by our assumption on $$m$$.
Finally, since $${\bf y} = \mathrm{sgn}(\widetilde{{\bf A}} \widetilde{{\bf f}})$$, Theorem 6 implies that   $$\left\| \frac{\widetilde{{\bf f}}}{\|\widetilde{{\bf f}}\|_2} - \frac{\widetilde{{\bf g}}}{\|\widetilde{{\bf g}}\|_2} \right\|_2 \le \varepsilon',$$ where $$\widetilde{{\bf g}} \in \mathbb{R}^{n+1}$$ is the output of the ‘lifted’ hard thresholding scheme, i.e.   $$\widetilde{{\bf g}} = \widetilde{{\bf D}} \widetilde{{\bf z}}, \qquad \widetilde{{\bf z}} = H_{t} (\widetilde{{\bf D}}^*\widetilde{{\bf A}}^* {\bf y}).$$ In particular, looking at the last coordinate, this inequality yields   $$\label{LBg} \left| \frac{\sigma}{\|\widetilde{{\bf f}}\|_2} - \frac{g_{n+1}}{\|\widetilde{{\bf g}}\|_2} \right| \le \varepsilon', \quad \mbox{hence} \quad \frac{|g_{n+1}|}{\|\widetilde{{\bf g}}\|_2} \ge \frac{\sigma}{\|\widetilde{{\bf f}}\|_2} - \varepsilon' \ge \frac{\sigma}{\sqrt{r^2+\sigma^2}} - \frac{\sigma}{2 \sqrt{r^2 + \sigma^2}} = \frac{\sigma}{2 \sqrt{r^2 + \sigma^2}}.$$ (4.6) Now let us also observe that   $$\widetilde{{\bf z}} = H_{t} \left(\begin{bmatrix} {\bf D}^* {\bf A}^* {\bf y} \\ \hline - \langle \boldsymbol{\tau},{\bf y} \rangle / \sigma \end{bmatrix} \right) = \left\{ \begin{matrix} \begin{bmatrix} H_{t}({\bf D}^* {\bf A}^* {\bf y}) \\ \hline 0 \end{bmatrix}, \\ \mbox{or}\hspace{30mm}\\ \begin{bmatrix} H_{t-1}({\bf D}^* {\bf A}^* {\bf y}) \\ \hline - \langle \boldsymbol{\tau},{\bf y} \rangle / \sigma \end{bmatrix} , \end{matrix} \right. \quad \mbox{hence} \quad \widetilde{{\bf g}} = \widetilde{{\bf D}} \widetilde{{\bf z}} = \left\{ \begin{matrix} \begin{bmatrix} {\bf D}(H_{t}({\bf D}^* {\bf A}^* {\bf y})) \\ \hline 0 \end{bmatrix}, \\ \mbox{or}\hspace{35mm}\\ \begin{bmatrix} {\bf D}(H_{t-1}({\bf D}^* {\bf A}^* {\bf y})) \\ \hline - \langle \boldsymbol{\tau},{\bf y} \rangle / \sigma \end{bmatrix}. \end{matrix} \right.$$ In view of (4.6), which forces $$g_{n+1} \neq 0$$, the latter option prevails. It is then apparent that $${\bf f}_{\rm HT} = \sigma {\bf g}_{[n]} / g_{n+1}$$.
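The identity $${\bf f}_{\rm HT} = \sigma {\bf g}_{[n]}/g_{n+1}$$ can be illustrated numerically. The following is a sketch under assumptions, not the authors' code: it uses the toy tight frame $${\bf D} = [{\bf I} \,|\, {\bf I}]/\sqrt{2}$$, an illustrative level $$t = 7$$ (not the value of $$t$$ from Theorem 9), and hypothetical helper names. It computes (4.5) directly and, separately, the output $$\widetilde{{\bf g}}$$ of the lifted scheme.

```python
import math
import random

def hard_threshold(x, t):
    """H_t: keep the t largest-magnitude entries of x, zero out the rest."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:t])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]

def synth(x):      # D x for the toy frame D = [I | I]/sqrt(2) (an assumption)
    n = len(x) // 2
    return [(x[i] + x[n + i]) / math.sqrt(2) for i in range(n)]

def analyze(f):    # D^* f for the same frame
    return [v / math.sqrt(2) for v in f] * 2

def sign(v):
    return 1.0 if v >= 0 else -1.0

def adjoint_apply(A, y):
    """A^* y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def f_ht_direct(A, tau, y, sigma, t):
    """Formula (4.5): f_HT = -sigma^2 / <tau, y> * D H_{t-1}(D^* A^* y)."""
    tau_y = sum(a * b for a, b in zip(tau, y))
    z = hard_threshold(analyze(adjoint_apply(A, y)), t - 1)
    return [-sigma ** 2 / tau_y * v for v in synth(z)]

def g_lifted(A, tau, y, sigma, t):
    """Lifted scheme: g = D~ H_t(D~^* A~^* y) with D~ = diag(D, 1)."""
    last = -sum(a * b for a, b in zip(tau, y)) / sigma   # last entry of A~^* y
    z = hard_threshold(analyze(adjoint_apply(A, y)) + [last], t)
    return synth(z[:-1]) + [z[-1]]

def demo(seed=0, m=2000, n=40, sigma=1.5, t=7):
    rng = random.Random(seed)
    u = [0.0] * (2 * n)
    u[2], u[7], u[n + 11] = 1.0, -1.0, 1.0               # 3-synthesis-sparse
    f = synth(u)
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    tau = [rng.gauss(0.0, sigma) for _ in range(m)]
    y = [sign(sum(a * v for a, v in zip(row, f)) - s) for row, s in zip(A, tau)]
    return f, f_ht_direct(A, tau, y, sigma, t), g_lifted(A, tau, y, sigma, t)
```

With `f, f_ht, g = demo()`, the vector `f_ht` agrees with `1.5 * g[i] / g[-1]` entrywise up to rounding, and, unlike the directional scheme (3.7), it estimates $${\bf f}$$ together with its magnitude.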
Lemma 1 gives   $$\left\| \frac{{\bf f}}{\sigma} - \frac{{\bf f}_{\rm HT}}{\sigma} \right\|_2 \le \frac{\| \widetilde{{\bf f}} \|_2}{\sigma} \frac{\| \widetilde{{\bf g}} \|_2}{|g_{n+1}|} \varepsilon' \le \frac{\| \widetilde{{\bf f}} \|_2}{\sigma} \frac{2\sqrt{r^2+\sigma^2}}{\sigma} \frac{r \sigma}{2(r^2 + \sigma^2)} \varepsilon = \frac{\| \widetilde{{\bf f}} \|_2}{\sigma} \frac{r}{\sqrt{r^2+\sigma^2}} \varepsilon,$$ so that   $$\| {\bf f} - {\bf f}_{\rm HT} \|_2 \le \|\widetilde{{\bf f}}\|_2 \frac{r}{\sqrt{r^2+\sigma^2}} \varepsilon \le r \varepsilon.$$ This establishes the announced result. □ 5. Postponed proofs and further remarks This final section contains the theoretical justification of the technical properties underlying our results, followed by a few points of discussion around them. 5.1 Proof of $${\mathbf D}$$-$${\rm SPEP}$$ The Gaussian width turns out to be a useful tool in our proofs. For a set $$K \subseteq \mathbb{R}^n$$, it is defined by   $$w(K) = \mathbb{E} \left[\sup_{{\bf f} \in K} \langle {\bf f}, {\bf g} \rangle \right], \qquad \mbox{where } {\bf g} \in \mathbb{R}^n \mbox{ is a standard normal random vector}.$$ We isolate the following two properties. Lemma 2 Let $$K \subseteq \mathbb{R}^n$$ be a linear space and $$K_1,\ldots,K_L \subseteq \mathbb{R}^n$$ be subsets of the unit sphere $$S^{n-1}$$. Then (i) $$k / \sqrt{k+1} \le w(K \cap S^{n-1}) \le \sqrt{k}, \qquad k:= \dim (K)$$; (ii) $$\displaystyle{w \left(K_1 \cup \ldots \cup K_L \right) \le \max \left\{w(K_1),\ldots,w(K_L) \right\} + 3 \sqrt{\ln(L)}}$$. Proof. (i) By the invariance under orthogonal transformation (see [25, Proposition 2.1]3), we can assume that $$K = \mathbb{R}^k \times \left\{(0,\ldots,0) \right\}$$. We then notice that $$\sup_{{\bf f} \in K \cap S^{n-1}} \langle {\bf f}, {\bf g} \rangle = \|(g_1,\ldots,g_k)\|_2$$ is the $$\ell_2$$-norm of a standard normal random vector of dimension $$k$$. We invoke, e.g. [15, Proposition 8.1] to derive the announced result.
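As a sanity check of part (i), the expected norm of a standard normal vector in $$\mathbb{R}^k$$ has the closed form $$\mathbb{E}\|{\bf g}\|_2 = \sqrt{2}\,\Gamma((k+1)/2)/\Gamma(k/2)$$, a standard fact that is not stated in the text above. The short sketch below evaluates it and tests the two-sided bound $$k/\sqrt{k+1} \le \mathbb{E}\|{\bf g}\|_2 \le \sqrt{k}$$ over a range of dimensions; the function names are hypothetical.

```python
import math

def expected_norm(k):
    """E||g||_2 for a standard normal g in R^k: sqrt(2) * Gamma((k+1)/2) / Gamma(k/2)."""
    # Work with log-Gamma so that large k does not overflow.
    return math.sqrt(2) * math.exp(math.lgamma((k + 1) / 2) - math.lgamma(k / 2))

def bounds_hold(k):
    """Check k / sqrt(k+1) <= E||g||_2 <= sqrt(k), i.e. Lemma 2(i) for dim(K) = k."""
    e = expected_norm(k)
    return k / math.sqrt(k + 1) <= e <= math.sqrt(k)

# True if the bound holds for every dimension tested.
checked = all(bounds_hold(k) for k in range(1, 1001))
```

For $$k = 1$$ the formula reduces to $$\mathbb{E}|g| = \sqrt{2/\pi}$$, which lies between $$1/\sqrt{2}$$ and $$1$$.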
(ii) Let us introduce the non-negative random variables   $$\xi_\ell := \sup_{{\bf f} \in K_\ell} \langle {\bf f} , {\bf g} \rangle \quad \ell = 1,\ldots, L ,$$ so that the Gaussian widths of each $$K_\ell$$ and of their union take the form   $$w(K_\ell) = \mathbb{E}(\xi_\ell) \quad \ell = 1,\ldots, L \qquad \mbox{and} \qquad w \left(K_1 \cup \cdots \cup K_L \right) = \mathbb{E} \left(\max_{\ell = 1, \ldots, L} \xi_\ell \right).$$ By the concentration of measure inequality (see e.g. [15, Theorem 8.40]) applied to the function $$F: {\bf x} \in \mathbb{R}^n \mapsto \sup_{{\bf f} \in K_\ell} \langle {\bf f}, {\bf x} \rangle$$, which is a Lipschitz function with constant $$1$$, each $$\xi_\ell$$ satisfies   $$\mathbb{P}(\xi_\ell \ge \mathbb{E}(\xi_\ell) + t) \le \exp \left(-t^2/2 \right)\!.$$ Because each $$\mathbb{E}(\xi_\ell)$$ is no larger than $$\max_\ell \mathbb{E}(\xi_\ell) = \max_\ell w(K_\ell) =: \omega$$, we also have   $$\mathbb{P} (\xi_\ell \ge \omega + t) \le \exp \left(-t^2/2 \right)\!.$$ Setting $$v:= \sqrt{2 \ln(L)}$$, we now calculate   \begin{align*} \mathbb{E} \left(\max_{\ell =1,\ldots, L} \xi_\ell \right) & = \int_0^\infty \mathbb{P} \left(\max_{\ell =1,\ldots, L} \xi_\ell \ge u \right) \,{\rm{d}}u = \left(\int_0^{\omega+v} + \int_{\omega+v}^\infty \right) \mathbb{P} \left(\max_{\ell = 1,\ldots, L} \xi_\ell \ge u \right) \,{\rm{d}}u\\ & \le \int_0^{\omega + v} 1\, {\rm{d}}u + \int_{\omega + v}^\infty \sum_{\ell=1}^L \mathbb{P} \left(\xi_\ell \ge u \right) \,{\rm{d}}u = \omega + v + \sum_{\ell=1}^L \int_v^\infty \mathbb{P} \left(\xi_\ell \ge \omega + t \right) \,{\rm{d}}t\\ & \le \omega + v + L \int_v^\infty \exp \left(-t^2/2 \right)\, {\rm{d}}t \le \omega + v + L \frac{\exp(-v^2/2)}{v} \\ & = \omega + \sqrt{2 \ln(L)} + L \frac{1/L}{\sqrt{2 \ln(L)}} \le \omega + c \sqrt{\ln(L)}, \end{align*} where $$c=\sqrt{2} + (\sqrt{2} \ln(2))^{-1} \le 3$$. 
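The last inequality of this display can be verified mechanically. The sketch below (a numerical check only, with hypothetical names) evaluates $$c = \sqrt{2} + (\sqrt{2}\ln 2)^{-1}$$ and tests $$\sqrt{2\ln L} + 1/\sqrt{2 \ln L} \le c\sqrt{\ln L}$$ for a range of $$L \ge 2$$. A small tolerance is used because equality holds at $$L = 2$$.

```python
import math

# Constant from the last line of the display: c = sqrt(2) + 1 / (sqrt(2) ln 2).
c = math.sqrt(2) + 1.0 / (math.sqrt(2) * math.log(2))

def tail_bound_holds(L, tol=1e-12):
    """Check sqrt(2 ln L) + 1/sqrt(2 ln L) <= c sqrt(ln L) for an integer L >= 2."""
    v = math.sqrt(2 * math.log(L))
    return v + 1.0 / v <= c * math.sqrt(math.log(L)) + tol

# True if the inequality holds for every L tested.
all_ok = all(tail_bound_holds(L) for L in range(2, 10001))
```

The inequality follows analytically from $$1/\sqrt{2\ln L} = \sqrt{\ln L}/(\sqrt{2}\ln L) \le \sqrt{\ln L}/(\sqrt{2}\ln 2)$$ for $$L \ge 2$$; the code only confirms the arithmetic.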
We have shown that $$w \left(K_1 \cup \cdots \cup K_L \right) \le \max_{\ell} w(K_\ell) + 3 \sqrt{\ln(L)}$$, as desired. □ We now turn our attention to proving the awaited theorem. Proof of Theorem 3. According to [25, Proposition 4.3], with $${\bf A}' := (\sqrt{2/\pi}/m) {\bf A}$$, we have   $$\left| \langle {\bf A}' {\bf f}, \mathrm{sgn}({\bf A}' {\bf g}) \rangle - \langle {\bf f}, {\bf g} \rangle \right| \le \delta,$$ for all $${\bf f},{\bf g} \in {\bf D}({\it {\Sigma}}_s^N) \cap S^{n-1}$$ provided $$m \ge C \delta^{-7} w({\bf D}({\it {\Sigma}}_s^N) \cap S^{n-1})^2$$, so it is enough to upper bound $$w({\bf D}({\it {\Sigma}}_s^N) \cap S^{n-1})$$ appropriately. To do so, with $${\it {\Sigma}}_S^N$$ denoting the space $$\{{\bf x} \in \mathbb{R}^N: {\rm supp}({\bf x}) \subseteq S \}$$ for any $$S \subseteq \{1,\ldots, N \}$$, we use Lemma 2 to write   \begin{align*} w({\bf D}({\it {\Sigma}}_s^N) \cap S^{n-1}) & = w \bigg(\bigcup_{|S|=s} \left\{{\bf D}({\it {\Sigma}}_S^N) \cap S^{n-1} \right\} \bigg) \underset{(ii)}{\le} \max_{|S|=s} w({\bf D}({\it {\Sigma}}_S^N) \cap S^{n-1}) + 3 \sqrt{\ln \left(\binom{N}{s} \right)}\\ & \underset{(i)}{\le} \sqrt{s} + 3 \sqrt{s \ln \left(eN/s \right)} \le 4 \sqrt{s \ln \left(eN/s \right)}. \end{align*} The result is now immediate. □ 5.2 Proof of TES We propose two approaches for proving Theorem 4. One uses again the notion of Gaussian width, and the other one relies on covering numbers. The necessary results are isolated in the following lemma. Lemma 3 The set of $$\ell_2$$-normalized effectively $$s$$-analysis-sparse signals satisfies (i) $$\displaystyle{w \left(({\bf D}^*)^{-1}({\it {\Sigma}}_s^{N,{\rm eff}}) \cap S^{n-1} \right)} \le C \sqrt{s \ln(eN/s)},$$ (ii) $$\displaystyle{\mathscr{N} \left(({\bf D}^*)^{-1}({\it {\Sigma}}_s^{N,{\rm eff}}) \cap S^{n-1} , \rho \right) \le \binom{N}{t}\left(1 + \frac{8}{\rho} \right)^t, \qquad t := \lceil 4 \rho^{-2}} s \rceil.$$ Proof. 
(i) By the definition of the Gaussian width for $$\mathscr{K}_s: = ({\bf D}^*)^{-1}({\it {\Sigma}}_s^{N,{\rm eff}}) \cap S^{n-1}$$, with $${\bf g} \in \mathbb{R}^n$$ denoting a standard normal random vector,   $$\label{slep} w(\mathscr{K}_s) = \mathbb{E} \left[\sup_{\substack{{\bf D}^* {\bf f} \in {\it {\Sigma}}_s^{N,{\rm eff}} \\ \|{\bf f}\|_2 = 1}} \langle {\bf f} , {\bf g} \rangle \right] = \mathbb{E} \left[\sup_{\substack{{\bf D}^* {\bf f} \in {\it {\Sigma}}_s^{N,{\rm eff}} \\ \|{\bf D}^* {\bf f}\|_2 = 1}} \langle {\bf D} {\bf D}^* {\bf f} , {\bf g} \rangle \right] \le \mathbb{E} \left[\sup_{\substack{{\bf x} \in {\it {\Sigma}}_s^{N,{\rm eff}} \\ \|{\bf x}\|_2 = 1}} \langle {\bf D} {\bf x} , {\bf g} \rangle \right].$$ (5.1) In view of $$\|{\bf D}\|_{2 \to 2} = 1$$, we have, for any $${\bf x},{\bf x}' \in {\it {\Sigma}}_s^{N,{\rm eff}}$$ with $$\|{\bf x}\|_2 = \|{\bf x}'\|_2 =1$$,   \begin{align*} \mathbb{E} \left(\langle {\bf D} {\bf x}, {\bf g} \rangle - \langle {\bf D} {\bf x}', {\bf g}' \rangle \right)^2 &= \mathbb{E} \left[\langle {\bf D} {\bf x}, {\bf g} \rangle ^2 \right] + \mathbb{E} \left[\langle {\bf D} {\bf x}', {\bf g}' \rangle ^2 \right] = \|{\bf D} {\bf x}\|_2^2 + \|{\bf D} {\bf x}'\|_2^2 \le \|{\bf x}\|_2^2 + \|{\bf x}'\|_2^2\\ &= \mathbb{E} \left(\langle {\bf x}, {\bf g} \rangle - \langle {\bf x}', {\bf g}' \rangle \right)^2. \end{align*} Applying Slepian’s lemma (see e.g. [15, Lemma 8.25]), we obtain   $$w(\mathscr{K}_s) \le \mathbb{E} \left[\sup_{\substack{{\bf x} \in {\it {\Sigma}}_s^{N,{\rm eff}} \\ \|{\bf x}\|_2 = 1}} \langle {\bf x} , {\bf g} \rangle \right] =w({\it {\Sigma}}_s^{N,{\rm eff}} \cap S^{N-1}).$$ The latter is known to be bounded by $$C \sqrt{s \ln (eN/s)}$$, see [25, Lemma 2.3]. (ii) The covering number $$\mathscr{N}(\mathscr{K}_s,\rho)$$ is bounded above by the maximal number $$\mathscr{P}(\mathscr{K}_s,\rho)$$ of elements in $$\mathscr{K}_s$$ that are separated by a distance $$\rho$$.
We claim that $$\mathscr{P} (\mathscr{K}_s, \rho) \le \mathscr{P}({\it {\Sigma}}_t^N \cap B_2^N, \rho/2)$$. To justify this claim, let us consider a maximal $$\rho$$-separated set $$\{{\bf f}^1,\ldots,{\bf f}^L\}$$ of signals in $$\mathscr{K}_s$$. For each $$i$$, let $$T_i \subseteq \{1, \ldots, N \}$$ denote an index set of $$t$$ largest absolute entries of $${\bf D}^* {\bf f}^i$$. We write   $$\rho < \|{\bf f}^i - {\bf f}^j \|_2 = \|{\bf D}^* {\bf f}^i - {\bf D}^* {\bf f}^j \|_2 \le \|({\bf D}^* {\bf f}^i)_{T_i} - ({\bf D}^* {\bf f}^j)_{T_j} \|_2 + \| ({\bf D}^* {\bf f}^i)_{\overline{T_i}} \|_2 + \| ({\bf D}^* {\bf f}^j)_{\overline{T_j}} \|_2.$$ Invoking [15, Theorem 2.5], we observe that   $$\| ({\bf D}^* {\bf f}^i)_{\overline{T_i}} \|_2 \le \frac{1}{2\sqrt{t}} \|{\bf D}^* {\bf f}^i \|_1 \le \frac{\sqrt{s}}{2 \sqrt{t}} \|{\bf D}^* {\bf f}^i \|_2 = \frac{\sqrt{s}}{2 \sqrt{t}},$$ and similarly for $$j$$ instead of $$i$$. Thus, we obtain   $$\rho < \|({\bf D}^* {\bf f}^i)_{T_i} - ({\bf D}^* {\bf f}^j)_{T_j} \|_2 + \sqrt{\frac{s}{t}} \le \|({\bf D}^* {\bf f}^i)_{T_i} - ({\bf D}^* {\bf f}^j)_{T_j} \|_2 + \frac{\rho}{2}, \quad \mbox{i.e.} \; \|({\bf D}^* {\bf f}^i)_{T_i} - ({\bf D}^* {\bf f}^j)_{T_j} \|_2 > \frac{\rho}{2}.$$ Since we have uncovered a set of $$L = \mathscr{P}(\mathscr{K}_s,\rho)$$ points in $${\it {\Sigma}}_t^N \cap B_2^N$$ that are $$(\rho/2)$$ separated, the claimed inequality is proved. We conclude by recalling that $$\mathscr{P}({\it {\Sigma}}_t^N \cap B_2^N, \rho/2)$$ is bounded above by $$\mathscr{N}({\it {\Sigma}}_t^N \cap B_2^N, \rho/4)$$, which is itself bounded above by $$\dbinom{N}{t} \left(1 + \dfrac{2}{\rho/4} \right)^t$$. □ We can now turn our attention to proving the awaited theorem. Proof of Theorem 4. 
With $$\mathscr{K}_s = ({\bf D}^*)^{-1}({\it {\Sigma}}_s^{N,{\rm eff}}) \cap S^{n-1}$$, the conclusion holds when $$m \ge C \varepsilon^{-6} w(\mathscr{K}_s)^2$$ or when $$m \ge C \varepsilon^{-1} \ln (\mathscr{N}(\mathscr{K}_s,c \varepsilon))$$, according to [26, Theorem 1.5] or to [3, Theorem 1.5], respectively. It now suffices to call upon Lemma 3. Note that the latter option yields better powers of $$\varepsilon^{-1}$$, but a less pleasant failure probability. □ 5.3 Further remarks We conclude this theoretical section by making two noteworthy comments on the sign product embedding property and the tessellation property in the dictionary case. Remark 4 $${\bf D}$$-SPEP cannot hold for an arbitrary dictionary $${\bf D}$$ if synthesis sparsity were replaced by effective synthesis sparsity. This is because the set of effectively $$s$$-synthesis-sparse signals can be the whole space $$\mathbb{R}^n$$. Indeed, any $${\bf f} \in \mathbb{R}^n$$ can be written as $${\bf f} = {\bf D} {\bf u}$$ for some $${\bf u} \in \mathbb{R}^N$$. Let us also pick a nonzero $$(s-1)$$-sparse vector $${\bf v} \in \ker {\bf D}$$ (there are tight frames for which this is possible, e.g. the concatenation of two orthogonal matrices). For $$\varepsilon > 0$$ small enough, we have   $$\frac{\|{\bf v} + \varepsilon {\bf u} \|_1}{\|{\bf v} + \varepsilon {\bf u}\|_2} \le \frac{\|{\bf v}\|_1 + \varepsilon \|{\bf u}\|_1}{\|{\bf v}\|_2 - \varepsilon \|{\bf u}\|_2} \le \frac{\sqrt{s-1} \|{\bf v}\|_2 + \varepsilon \|{\bf u}\|_1}{\|{\bf v}\|_2 - \varepsilon \|{\bf u}\|_2} \le \sqrt{s},$$ so that the coefficient vector $${\bf v} + \varepsilon {\bf u}$$ is effectively $$s$$-sparse, hence so is $$(1/\varepsilon){\bf v} + {\bf u}$$. It follows that $${\bf f} = {\bf D}((1/\varepsilon){\bf v} + {\bf u})$$ is effectively $$s$$-synthesis sparse. Remark 5 Theorem 3 easily implies a tessellation result for $${\bf D}({\it {\Sigma}}_s^N) \,\cap\, S^{n-1}$$, the ‘synthesis-sparse sphere’.
Precisely, under the assumptions of the theorem (with a change of the constant $$C$$), $${\bf D}$$-SPEP$$(2s,\delta/2)$$ holds. Then, one can derive   $$[{\bf g},{\bf h} \in {\bf D}({\it {\Sigma}}_s^N) \cap S^{n-1} : \; \mathrm{sgn}({\bf A} {\bf g}) = \mathrm{sgn}({\bf A} {\bf h})] \Longrightarrow [\|{\bf g} - {\bf h}\|_2 \le \delta].$$ To see this, with $$\boldsymbol{\varepsilon} := \mathrm{sgn}({\bf A} {\bf g}) = \mathrm{sgn}({\bf A} {\bf h})$$ and with $${\bf f} := ({\bf g}-{\bf h})/\|{\bf g}-{\bf h}\|_2 \in {\bf D}({\it {\Sigma}}_{2s}^N) \cap S^{n-1}$$, we have   $$\left| \frac{\sqrt{2/\pi}}{m} \langle {\bf A} {\bf f} , \boldsymbol{\varepsilon}\rangle - \langle {\bf f}, {\bf g} \rangle \right| \le \frac{\delta}{2}, \qquad \left| \frac{\sqrt{2/\pi}}{m} \langle {\bf A} {\bf f} , \boldsymbol{\varepsilon} \rangle - \langle {\bf f}, {\bf h} \rangle \right| \le \frac{\delta}{2},$$ so by the triangle inequality $$|\langle {\bf f}, {\bf g} - {\bf h} \rangle| \le \delta$$, i.e. $$\|{\bf g} -{\bf h}\|_2 \le \delta$$, as announced.

Acknowledgements

The authors would like to thank the AIM SQuaRE program that funded and hosted our initial collaboration.

Funding

NSF grant number [CCF-1527501], ARO grant number [W911NF-15-1-0316] and AFOSR grant number [FA9550-14-1-0088] to R.B.; Alfred P. Sloan Fellowship and NSF Career grant number [1348721] to D.N.; NSERC grant number [22R23068] to Y.P.; and NSF Postdoctoral Research Fellowship grant number [1400558] to M.W.

Footnotes

1 A signal $${\bf x} \in \mathbb{R}^N$$ is called $$s$$-sparse if $$\|{\bf x}\|_0 := |\mathrm{supp}({\bf x})| \leq s \ll N$$.

2 Here, ‘dictionary sparsity’ means effective $$s$$-analysis sparsity if $$\widehat{{\bf f}}$$ is produced by convex programming and genuine $$s$$-synthesis sparsity together with effective $$\kappa s$$-analysis sparsity if $$\widehat{{\bf f}}$$ is produced by hard thresholding.
3 In particular, [25, Proposition 2.1] applies to the slightly different notion of mean width defined as $$\mathbb{E} \left[\sup_{{\bf f} \in K - K} \langle {\bf f}, {\bf g} \rangle \right]$$.

References

1. (2016) Compressive Sensing webpage. http://dsp.rice.edu/cs (accessed 24 June 2016).
2. Baraniuk R., Foucart S., Needell D., Plan Y. & Wootters M. (2017) Exponential decay of reconstruction error from binary measurements of sparse signals. IEEE Trans. Inform. Theory, 63, 3368–3385.
3. Bilyk D. & Lacey M. T. (2015) Random tessellations, restricted isometric embeddings, and one bit sensing. arXiv preprint arXiv:1512.06697.
4. Blumensath T. (2011) Sampling and reconstructing signals from a union of linear subspaces. IEEE Trans. Inform. Theory, 57, 4660–4671.
5. Boufounos P. T. & Baraniuk R. G. (2008) 1-Bit compressive sensing. Proceedings of the 42nd Annual Conference on Information Sciences and Systems (CISS), IEEE, pp. 16–21.
6. Candès E. J., Demanet L., Donoho D. L. & Ying L. (2000) Fast discrete curvelet transforms. Multiscale Model. Simul., 5, 861–899.
7. Candès E. J. & Donoho D. L. (2004) New tight frames of curvelets and optimal representations of objects with piecewise $$C^2$$ singularities. Comm. Pure Appl. Math., 57, 219–266.
8. Candès E. J., Eldar Y. C., Needell D. & Randall P. (2010) Compressed sensing with coherent and redundant dictionaries. Appl. Comput. Harmon. Anal., 31, 59–73.
9. Daubechies I. (1992) Ten Lectures on Wavelets. Philadelphia, PA: SIAM.
10. Davenport M., Needell D. & Wakin M. B. (2012) Signal space CoSaMP for sparse recovery with redundant dictionaries. IEEE Trans. Inform. Theory, 59, 6820–6829.
11. Elad M., Milanfar P. & Rubinstein R. (2007) Analysis versus synthesis in signal priors. Inverse Probl., 23, 947.
12. Eldar Y. C. & Kutyniok G. (2012) Compressed Sensing: Theory and Applications. Cambridge, UK: Cambridge University Press.
13. Feichtinger H. & Strohmer T. (eds.) (1998) Gabor Analysis and Algorithms. Boston, MA: Birkhäuser.
14. Foucart S. (2016) Dictionary-sparse recovery via thresholding-based algorithms. J. Fourier Anal. Appl., 22, 6–19.
15. Foucart S. & Rauhut H. (2013) A Mathematical Introduction to Compressive Sensing. Basel, Switzerland: Birkhäuser.
16. Giryes R., Nam S., Elad M., Gribonval R. & Davies M. E. (2014) Greedy-like algorithms for the cosparse analysis model. Linear Algebra Appl., 441, 22–60.
17. Gopi S., Netrapalli P., Jain P. & Nori A. (2013) One-bit compressed sensing: Provable support and vector recovery. Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta GA, 2013, pp. 154–162.
18. Jacques L., Degraux K. & De Vleeschouwer C. (2013) Quantized iterative hard thresholding: bridging 1-bit and high-resolution quantized compressed sensing. Proceedings of the 10th International Conference on Sampling Theory and Applications (SampTA), Bremen, Germany, pp. 105–108.
19. Jacques L., Laska J. N., Boufounos P. T. & Baraniuk R. G. (2013) Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors. IEEE Trans. Inform. Theory, 59, 2082–2102.
20. Knudson K., Saab R. & Ward R. (2016) One-bit compressive sensing with norm estimation. IEEE Trans. Inform. Theory, 62, 2748–2758.
21. Krahmer F., Needell D. & Ward R. (2015) Compressive sensing with redundant dictionaries and structured measurements. SIAM J. Math. Anal., 47, 4606–4629.
22. Nam S., Davies M. E., Elad M. & Gribonval R. (2013) The cosparse analysis model and algorithms. Appl. Comput. Harmon. Anal., 34, 30–56.
23. Peleg T. & Elad M. (2013) Performance guarantees of the thresholding algorithm for the cosparse analysis model. IEEE Trans. Inform. Theory, 59, 1832–1845.
24. Plan Y. & Vershynin R. (2013a) One-bit compressed sensing by linear programming. Comm. Pure Appl. Math., 66, 1275–1297.
25. Plan Y. & Vershynin R. (2013b) Robust 1-bit compressed sensing and sparse logistic regression: a convex programming approach. IEEE Trans. Inform. Theory, 59, 482–494.
26. Plan Y. & Vershynin R. (2014) Dimension reduction by random hyperplane tessellations. Discrete Comput. Geom., 51, 438–461.
27. Rauhut H., Schnass K. & Vandergheynst P. (2008) Compressed sensing and redundant dictionaries. IEEE Trans. Inform. Theory, 54, 2210–2219.
28. Saab R., Wang R. & Yilmaz Ö. (2016) Quantization of compressive samples with stable and robust recovery. Applied and Computational Harmonic Analysis, to appear.
29. Starck J.-L., Elad M. & Donoho D. (2004) Redundant multiscale transforms and their application for morphological component separation. Advances in Imaging and Electron Physics, 132, 287–348.
30. Yan M., Yang Y. & Osher S. (2012) Robust 1-bit compressive sensing using adaptive outlier pursuit. IEEE Trans. Signal Process., 60, 3868–3875.

© The authors 2017. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices). For permissions, please e-mail: journals.permissions@oup.com

Information and Inference: A Journal of the IMA, Oxford University Press


Volume 7 (1), Mar 1, 2018
22 pages

Publisher: Oxford University Press
ISSN: 2049-8764
eISSN: 2049-8772
DOI: 10.1093/imaiai/iax009

### Abstract

One-bit compressive sensing has extended the scope of sparse recovery by showing that sparse signals can be accurately reconstructed even when their linear measurements are subject to the extreme quantization scenario of binary samples: only the sign of each linear measurement is maintained. Existing results in one-bit compressive sensing rely on the assumption that the signals of interest are sparse in some fixed orthonormal basis. However, in most practical applications, signals are sparse with respect to an overcomplete dictionary, rather than a basis. There has already been a surge of activity to obtain recovery guarantees under such a generalized sparsity model in the classical compressive sensing setting. Here, we extend the one-bit framework to this important model, providing a unified theory of one-bit compressive sensing under dictionary sparsity. Specifically, we analyze several different algorithms, based on convex programming and on hard thresholding, and show that, under natural assumptions on the sensing matrix (satisfied by Gaussian matrices), these algorithms can efficiently recover analysis–dictionary-sparse signals in the one-bit model.

1. Introduction

The basic insight of compressive sensing is that a small number of linear measurements can be used to reconstruct sparse signals. In traditional compressive sensing, we wish to reconstruct an $$s$$-sparse$$^1$$ signal $${\bf x} \in \mathbb{R}^N$$ from linear measurements of the form   $${\bf y} = {\bf A}{\bf x} \in \mathbb{R}^m \qquad\text{(or its corrupted version } {\bf y} = {\bf A}{\bf x} + {\bf e}\text{)},$$ (1.1) where $${\bf A}$$ is an $$m\times N$$ measurement matrix.
A significant body of work over the past decade has demonstrated that the $$s$$-sparse (or nearly $$s$$-sparse) signal $${\bf x}$$ can be accurately and efficiently recovered from its measurement vector $${\bf y} = {\bf A}{\bf x}$$ when $${\bf A}$$ has independent Gaussian entries, say, and when $$m \asymp s\log(N/s)$$ [1,12,15]. This basic model has been extended in several directions. Two important ones, which we focus on in this work, are (a) extending the set of signals to include the larger and important class of dictionary-sparse signals, and (b) considering highly quantized measurements as in one-bit compressive sensing. Both of these settings have important practical applications and have received much attention in the past few years. However, to the best of our knowledge, they have not been considered together before. In this work, we extend the theory of one-bit compressive sensing to dictionary-sparse signals. Below, we briefly review the background on these notions, set up notation and outline our contributions.

1.1 One-bit measurements

In practice, each entry $$y_i = \langle {\bf a}_i, {\bf x}\rangle$$ (where $${\bf a}_i$$ denotes the $$i$$th row of $${\bf A}$$) of the measurement vector in (1.1) needs to be quantized. That is, rather than observing $${\bf y}={\bf A}{\bf x}$$, one observes $${\bf y} = Q({\bf A}{\bf x})$$ instead, where $$Q: \mathbb{R}^m \rightarrow \mathscr{A}^m$$ denotes the quantizer that maps each entry of its input to a corresponding quantized value in an alphabet $$\mathscr{A}$$. The so-called one-bit compressive sensing [5] problem refers to the case when $$|\mathscr{A}| = 2$$, and one wishes to recover $${\bf x}$$ from its heavily quantized (one-bit) measurements $${\bf y} = Q({\bf A}{\bf x})$$.
The simplest quantizer in the one-bit case uses the alphabet $$\mathscr{A} = \{-1, 1\}$$ and acts by taking the sign of each component as   $$y_i = Q(\langle {\bf a}_i, {\bf x}\rangle) = \mathrm{sgn}(\langle {\bf a}_i, {\bf x}\rangle),$$ (1.2) which we denote in shorthand by $${\bf y} = \mathrm{sgn}({\bf A}{\bf x})$$. Since the publication of [5] in 2008, several efficient methods, both iterative and optimization based, have been developed to recover the signal $${\bf x}$$ (up to normalization) from its one-bit measurements (see e.g. [17–19,24,25,30]). In particular, it is shown in [19] that the direction of any $$s$$-sparse signal $${\bf x}$$ can be estimated by some $$\hat{{\bf x}}$$ produced from $${\bf y}$$ with accuracy   $$\left\| \frac{{\bf x}}{\|{\bf x}\|_2} - \frac{\hat{{\bf x}}}{\|\hat{{\bf x}}\|_2}\right\|_2 \leq \varepsilon$$ when the number of measurements is at least   $$m = {\it {\Omega}}\left(\frac{s \ln(N/s)}{\varepsilon} \right).$$ Notice that with measurements of this form, we can only hope to recover the direction of the signal, not the magnitude. However, we can recover the entire signal if we allow for thresholded measurements of the form   $$y_i = \mathrm{sgn}(\langle {\bf a}_i, {\bf x} \rangle - \tau_i).$$ (1.3) In practice, it is often feasible to obtain quantized measurements of this form, and they have been studied before. Existing works using measurements of the form (1.3) have also allowed for adaptive thresholds; that is, the $$\tau_i$$ can be chosen adaptively based on $$y_j$$ for $$j < i$$. The goal of those works was to improve the convergence rate, i.e. the dependence on $$\varepsilon$$ in the number of measurements $$m$$.
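The two measurement models (1.2) and (1.3) can be illustrated with a minimal sketch. The helper names `sign`, `gaussian_matrix` and `one_bit`, the dimensions, the seed and the convention sgn(0) = +1 are all assumptions made for this illustration and do not come from the article. The sketch shows that rescaling the signal leaves the sign measurements (1.2) unchanged, whereas with thresholds (1.3) the measurements do depend on the magnitude.

```python
import random

def sign(t):
    # Assumed convention: sgn(0) = +1, so outputs always lie in {-1, +1}.
    return 1 if t >= 0 else -1

def gaussian_matrix(m, n, rng):
    # m x n matrix with independent standard normal entries.
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

def one_bit(A, x, tau=None):
    """Return y_i = sgn(<a_i, x> - tau_i); tau=None means all thresholds are 0."""
    if tau is None:
        tau = [0.0] * len(A)
    return [sign(sum(a * v for a, v in zip(row, x)) - t)
            for row, t in zip(A, tau)]

rng = random.Random(0)
m, N = 200, 50
A = gaussian_matrix(m, N, rng)

# A 3-sparse signal and a rescaled copy with the same direction.
x = [0.0] * N
x[3], x[17], x[41] = 1.5, -2.0, 0.7
x_scaled = [10.0 * v for v in x]

# Form (1.2): the signs carry no information about the norm.
y = one_bit(A, x)
same_without_thresholds = (y == one_bit(A, x_scaled))

# Form (1.3): Gaussian thresholds make the measurements scale-sensitive.
tau = [rng.gauss(0.0, 1.0) for _ in range(m)]
y_tau = one_bit(A, x, tau)
y_tau_scaled = one_bit(A, x_scaled, tau)
flips = sum(a != b for a, b in zip(y_tau, y_tau_scaled))

print("identical without thresholds:", same_without_thresholds)
print("measurements changed by rescaling, with thresholds:", flips)
```

The non-adaptive Gaussian thresholds used here mirror the setting the article studies; adaptive thresholds would require choosing each `tau[i]` from earlier measurements and are not sketched.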
It is known that a dependence of $${\it {\Omega}}(1/\varepsilon)$$ is necessary with non-adaptive measurements, but recent work on Sigma-Delta quantization [28] and other schemes [2,20] has shown how to break this barrier using measurements of the form (1.3) with adaptive thresholds. In this article, we neither focus on the decay rate (the dependence on $$\varepsilon$$) nor do we consider adaptive measurements. However, we do consider non-adaptive measurements both of the form (1.2) and (1.3). This allows us to provide results on reconstruction of the magnitude of signals as well as their direction.

1.2 Dictionary sparsity

Although the classical setting assumes that the signal $${\bf x}$$ itself is sparse, most signals of interest are not immediately sparse. In the straightforward case, a signal may be instead sparse after some transform; for example, images are known to be sparse in the wavelet domain, sinusoidal signals in the Fourier domain, and so on [9]. Fortunately, the classical framework extends directly to this model, since the product of a Gaussian matrix and an orthonormal basis is still Gaussian. However, in many practical applications, the situation is not so straightforward, and the signals of interest are sparse, not in an orthonormal basis, but rather in a redundant (highly overcomplete) dictionary; this is known as dictionary sparsity. Signals in radar and sonar systems, for example, are sparsely represented in Gabor frames, which are highly overcomplete and far from orthonormal [13]. Images may be sparsely represented in curvelet frames [6,7], undecimated wavelet frames [29] and other frames, which by design are highly redundant. Such redundancy allows for sparser representations and a wider class of signal representations. Even in the Fourier domain, utilizing an oversampled DFT allows for much more realistic and practical signals to be represented.
For these reasons, recent research has extended the compressive sensing framework to the setting where the signals of interest are sparsified by overcomplete tight frames (see e.g. [8,14,16,27]). Throughout this article, we consider a dictionary $${\bf D} \in \mathbb{R}^{n \times N}$$, which is assumed to be a tight frame, in the sense that   $${\bf D} {\bf D}^* = {\bf I}_n.$$ To distinguish between the signal and its sparse representation, we write $${\bf f}\in\mathbb{R}^n$$ for the signal of interest and $${\bf f}={\bf D}{\bf x}$$, where $${\bf x}\in\mathbb{R}^N$$ is a sparse coefficient vector. We then acquire the samples of the form $${\bf y} = {\bf A}{\bf f} = {\bf A}{\bf D}{\bf x}$$ and attempt to recover the signal $${\bf f}$$. Note that, due to the redundancy of $${\bf D}$$, we do not hope to be able to recover a unique coefficient vector $${\bf x}$$. In other words, even when the measurement matrix $${\bf A}$$ is well suited for sparse recovery, the product $${\bf A}{\bf D}$$ may have highly correlated columns, making recovery of $${\bf x}$$ impossible. With the introduction of a non-invertible sparsifying transform $${\bf D}$$, it becomes important to distinguish between two related but distinct notions of sparsity. Precisely, we say that $${\bf f}$$ is $$s$$-synthesis sparse if $${\bf f} = {\bf D} {\bf x}$$ for some $$s$$-sparse $${\bf x} \in \mathbb{R}^N$$; $${\bf f}$$ is $$s$$-analysis sparse if $${\bf D}^* {\bf f} \in \mathbb{R}^N$$ is $$s$$-sparse. We note that analysis sparsity is a stronger assumption, because, assuming analysis sparsity, one can always take $${\bf x} = {\bf D}^* {\bf f}$$ in the synthesis sparsity model. See [11] for an introduction to the analysis-sparse model in compressive sensing (also called the analysis cosparse model). Instead of exact sparsity, it is often more realistic to study effective sparsity.
We call a coefficient vector $${\bf x} \in \mathbb{R}^N$$ effectively $$s$$-sparse if   $$\|{\bf x}\|_1 \le \sqrt{s} \|{\bf x}\|_2,$$ and we say that $${\bf f}$$ is effectively $$s$$-synthesis sparse if $${\bf f} = {\bf D} {\bf x}$$ for some effectively $$s$$-sparse $${\bf x} \in \mathbb{R}^N$$; $${\bf f}$$ is effectively $$s$$-analysis sparse if $${\bf D}^* {\bf f} \in \mathbb{R}^N$$ is effectively $$s$$-sparse. We use the notation   \begin{align*} {\it {\Sigma}}^N_s & \mbox{for the set of $s$-sparse coefficient vectors in $\mathbb{R}^N$, and} \\ {\it {\Sigma}}_s^{N,{\rm eff}} & \mbox{for the set of effectively $s$-sparse coefficient vectors in $\mathbb{R}^N$.} \end{align*} We also use the notation $$B_2^n$$ for the set of signals with $$\ell_2$$-norm at most $$1$$ (i.e. the unit ball in $$\ell_2^n$$) and $$S^{n-1}$$ for the set of signals with $$\ell_2$$-norm equal to $$1$$ (i.e. the unit sphere in $$\ell_2^n$$). It is now well known that, if $${\bf D}$$ is a tight frame and $${\bf A}$$ satisfies analogous conditions to those in the classical setting (e.g. has independent Gaussian entries), then a signal $${\bf f}$$ which is (effectively) analysis- or synthesis sparse can be accurately recovered from traditional compressive sensing measurements $${\bf y} = {\bf A} {\bf f} = {\bf A}{\bf D}{\bf x}$$ (see e.g. [4,8,10,14,16,22,23,27]). 1.3 One-bit measurements with dictionaries: our setup In this article, we study one-bit compressive sensing for dictionary-sparse signals. Precisely, our aim is to recover signals $${\bf f} \in \mathbb{R}^n$$ from the binary measurements   $$y_i = \mathrm{sgn} \langle {\bf a}_i, {\bf f} \rangle \qquad i=1,\ldots,m,$$ or   $$y_i = \mathrm{sgn} \left(\langle {\bf a}_i, {\bf f} \rangle - \tau_i \right) \qquad i = 1,\ldots,m,$$ when these signals are sparse with respect to a dictionary $${\bf D}$$. As in Section 1.2, there are several ways to model signals that are sparse with respect to $${\bf D}$$. 
In this work, two different signal classes are considered. For the first one, which is more general, our results are based on convex programming. For the second one, which is more restrictive, we can obtain results using a computationally simpler algorithm based on hard thresholding. The first class consists of signals $${\bf f} \in ({\bf D}^*)^{-1} {\it {\Sigma}}_s^{N,\rm{eff}}$$ that are effectively $$s$$-analysis sparse, i.e. they satisfy   $$\|{\bf D}^* {\bf f}\|_1 \le \sqrt{s} \|{\bf D}^* {\bf f}\|_2.$$ (1.4) This occurs, of course, when $${\bf D}^* {\bf f}$$ is genuinely sparse (analysis sparsity) and this is realistic if we are working, e.g. with piecewise-constant images, since they are sparse after application of the total variation operator. We consider effectively sparse signals since genuine analysis sparsity is unrealistic when $${\bf D}$$ has columns in general position, as it would imply that $${\bf f}$$ is orthogonal to too many columns of $${\bf D}$$. The second class consists of signals $${\bf f} \in {\bf D}({\it {\Sigma}}_s^N) \cap ({\bf D}^*)^{-1} {\it {\Sigma}}_{\kappa s}^{N, \rm{eff}}$$ that are both $$s$$-synthesis sparse and effectively $$\kappa s$$-analysis sparse for some $$\kappa \ge 1$$. This will occur as soon as the signals are $$s$$-synthesis sparse, provided we utilize suitable dictionaries $${\bf D} \in \mathbb{R}^{n \times N}$$. One could take, for instance, the matrix of an equiangular tight frame when $$N = n + k$$, $$k = {\rm constant}$$. Other examples of suitable dictionaries found in [21] include harmonic frames again with $$N = n + k$$, $$k = {\rm constant}$$, as well as Fourier and Haar frames with constant redundancy factor $$N/n$$. Figure 1 summarizes the relationship between the various domains we deal with.

Fig. 1 The coefficient, signal and measurement domains.
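The notions of tight frame, synthesis sparsity and effective analysis sparsity can be checked on a toy example. The sketch below is an illustration under assumptions of our own: the dictionary is the concatenation of the identity and a normalized 4 x 4 Hadamard matrix, scaled by $$1/\sqrt{2}$$, and the function names are hypothetical. It is not one of the dictionaries discussed in the article. The quantity $$(\|{\bf x}\|_1/\|{\bf x}\|_2)^2$$ is used as the smallest $$s$$ for which $${\bf x}$$ is effectively $$s$$-sparse.

```python
import math

def effective_sparsity(x):
    """Return (||x||_1 / ||x||_2)^2; x is effectively s-sparse iff this is <= s."""
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return (l1 / l2) ** 2

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

n = 4
# Normalized 4 x 4 Hadamard matrix (orthogonal); an assumed toy choice.
H = [[0.5,  0.5,  0.5,  0.5],
     [0.5, -0.5,  0.5, -0.5],
     [0.5,  0.5, -0.5, -0.5],
     [0.5, -0.5, -0.5,  0.5]]
r = 1.0 / math.sqrt(2.0)
# D = (1/sqrt(2)) [I | H] is an n x 2n tight frame: D D^* = I_n.
D = [[r * (1.0 if i == j else 0.0) for j in range(n)] +
     [r * H[i][j] for j in range(n)] for i in range(n)]
Dt = transpose(D)

# Gram check of the tight-frame identity D D^* = I_n.
DDt = [[sum(D[i][k] * D[j][k] for k in range(2 * n)) for j in range(n)]
       for i in range(n)]

# A 2-sparse coefficient vector and the signal it synthesizes.
x = [0.0] * (2 * n)
x[0], x[5] = 1.0, -2.0
f = matvec(D, x)

# Analysis coefficients D^* f are not sparse, only effectively sparse.
analysis = matvec(Dt, f)
s_synthesis = effective_sparsity(x)
s_analysis = effective_sparsity(analysis)

print("effective sparsity of x:", s_synthesis)
print("effective sparsity of D^* f:", s_analysis)
```

In this toy case the coefficient vector is 2-sparse while the analysis coefficients have effective sparsity 6, so the signal belongs to the second class with $$s = 2$$ and $$\kappa = 3$$. How $$\kappa$$ behaves for the frames named in the article is not addressed by this sketch.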
1.4 Contributions Our main results demonstrate that one-bit compressive sensing is viable even when the sparsifying transform is an overcomplete dictionary. As outlined in Section 1.1, we consider both the challenge of recovering the direction $${\bf f}/\|{\bf f}\|_2$$ of a signal $${\bf f}$$, and the challenge of recovering the entire signal (direction and magnitude). Using measurements of the form $$y_i = \mathrm{sgn}\langle {\bf a}_i, {\bf f} \rangle$$, we can recover the direction but not the magnitude; using measurements of the form $$y_i = \mathrm{sgn}\left(\langle {\bf a}_i, {\bf f} \rangle - \tau_i \right)$$, we may recover both. In (one-bit) compressive sensing, two standard families of algorithms are (a) algorithms based on convex programming, and (b) algorithms based on thresholding. In this article, we analyze algorithms from both classes. One reason to study multiple algorithms is to give a more complete landscape of this problem. Another reason is that the different algorithms come with different trade-offs (between computational complexity and the strength of assumptions required), and it is valuable to explore this space of trade-offs. 1.4.1 Recovering the direction First, we show that the direction of a dictionary-sparse signal can be estimated from one-bit measurements of the type $$\mathrm{sgn}({\bf A} {\bf f})$$. We consider two algorithms: our first approach is based on linear programming, and our second is based on hard thresholding. The linear programming approach is more computationally demanding, but applies to a broader class of signals. In Section 3, we prove that both of these approaches are effective, provided the sensing matrix $${\bf A}$$ satisfies certain properties. In Section 2, we state that these properties are in fact satisfied by a matrix $${\bf A}$$ populated with independent Gaussian entries. We combine all of these results to prove the statement below. 
As noted above, the different algorithms require different definitions of ‘dictionary sparsity’. In what follows, $$\gamma, C, c$$ refer to absolute numerical constants. Theorem 1 (Informal statement of direction recovery) Let $$\varepsilon \,{>}\, 0$$, let $$m \,{\ge}\, C \varepsilon^{-7} s \ln(eN/s)$$ and let $${\bf A} \in \mathbb{R}^{m \times n}$$ be populated by independent standard normal random variables. Then, with failure probability at most $$\gamma \exp(-c \varepsilon^2 m)$$, any dictionary-sparse2 signal $${\bf f} \in \mathbb{R}^n$$ observed via $${\bf y} = \mathrm{sgn}({\bf A} {\bf f})$$ can be approximated by the output $$\widehat{{\bf f}}$$ of an efficient algorithm with error   $$\left\| \frac{{\bf f}}{\|{\bf f}\|_2} - \frac{\widehat{{\bf f}}}{\|\widehat{{\bf f}}\|_2} \right\|_2 \le \varepsilon.$$ 1.4.2 Recovering the whole signal By using one-bit measurements of the form $$\mathrm{sgn}({\bf A} {\bf f} - \boldsymbol{\tau})$$, where $$\tau_1,\ldots,\tau_m$$ are properly normalized Gaussian random thresholds, we are able to recover not just the direction, but also the magnitude of a dictionary-sparse signal $${\bf f}$$. We consider three algorithms: our first approach is based on linear programming, our second approach on second-order cone programming and our third approach on hard thresholding. Again, there are different trade-offs to the different algorithms. As above, the approach based on hard thresholding is more efficient, whereas the approaches based on convex programming apply to a broader signal class. There is also a trade-off between linear programming and second-order cone programming: the second-order cone program requires knowledge of $$\|{\bf f}\|_2,$$ whereas the linear program does not (although it does require a loose bound), but the second-order cone programming approach applies to a slightly larger class of signals. 
We show in Section 4 that all three of these algorithms are effective when the sensing matrix $${\bf A}$$ is populated with independent Gaussian entries, and when the thresholds $$\tau_i$$ are also independent Gaussian random variables. We combine the results of Section 4 in the following theorem. Theorem 2 (Informal statement of signal estimation) Let $$\varepsilon, r, \sigma > 0$$, let $$m \ge C \varepsilon^{-9} s \ln(eN/s)$$, and let $${\bf A} \in \mathbb{R}^{m \times n}$$ and $$\boldsymbol{\tau} \in \mathbb{R}^m$$ be populated by independent mean-zero normal random variables with variance $$1$$ and $$\sigma^2$$, respectively. Then, with failure probability at most $$\gamma \exp(-c \varepsilon^2 m)$$, any dictionary-sparse$$^2$$ signal $${\bf f} \in \mathbb{R}^n$$ with $$\|{\bf f}\|_2 \le r$$ observed via $${\bf y} = \mathrm{sgn}({\bf A} {\bf f} - \boldsymbol{\tau})$$ is approximated by the output $$\widehat{{\bf f}}$$ of an efficient algorithm with error   $$\left\| {\bf f} - \widehat{{\bf f}} \right\|_2 \le \varepsilon r.$$ We have not spelled out the dependence of the number of measurements and the failure probability on the parameters $$r$$ and $$\sigma$$: as long as they are roughly the same order of magnitude, the dependence is absorbed in the constants $$C$$ and $$c$$ (see Section 4 for precise statements). As outlined earlier, an estimate of $$r$$ is required to implement the second-order cone program, but the other two algorithms do not require such an estimate. 1.5 Discussion and future directions The purpose of this work is to demonstrate that techniques from one-bit compressive sensing can be effective for the recovery of dictionary-sparse signals, and we propose several algorithms to accomplish this for various notions of dictionary sparsity. Still, some interesting future directions remain. First, we do not believe that the dependence on $$\varepsilon$$ above is optimal. 
We do believe instead that a logarithmic dependence on $$\varepsilon$$ for the number of measurements (or equivalently an exponential decay in the oversampling factor $$\lambda = m / (s \ln(eN/s))$$ for the recovery error) is possible by choosing the thresholds $$\tau_1,\ldots,\tau_m$$ adaptively. This would be achieved by adjusting the method of [2], but with the strong proviso of exact sparsity. Secondly, it is worth asking to what extent the trade-offs between the different algorithms reflect reality. In particular, is it only an artifact of the proof that the simpler algorithm based on hard thresholding applies to a narrower class of signals? 1.6 Organization The remainder of the article is organized as follows. In Section 2, we outline some technical tools upon which our results rely, namely some properties of Gaussian random matrices. In Section 3, we consider recovery of the direction $${\bf f}/\|{\bf f}\|$$ only and we propose two algorithms to achieve it. In Section 4, we present three algorithms for the recovery of the entire signal $${\bf f}$$. Finally, in Section 5, we provide proofs for the results outlined in Section 2. 2. Technical ingredients In this section, we highlight the theoretical properties upon which our results rely. Their proofs are deferred to Section 5 so that the reader does not lose track of our objectives. The first property we put forward is an adaptation to the dictionary case of the so-called sign product embedding property (the term was coined in [18], but the result originally appeared in [25]). Theorem 3 ($${\bf D}$$-SPEP) Let $$\delta > 0$$, let $$m \ge C \delta^{-7} s \ln(eN/s)$$ and let $${\bf A} \in \mathbb{R}^{m \times n}$$ be populated by independent standard normal random variables. 
Then, with failure probability at most $$\gamma \exp(-c \delta^2 m)$$, the renormalized matrix $${\bf A}':= (\sqrt{2/\pi}/m) {\bf A}$$ satisfies the $$s$$th-order sign product embedding property adapted to $${\bf D} \in \mathbb{R}^{n \times N}$$ with constant $$\delta$$ — $${\bf D}$$-SPEP$$(s,\delta)$$ for short—i.e.   $$\label{SPEP} \left| \langle {\bf A}' {\bf f}, \mathrm{sgn}({\bf A}' {\bf g}) \rangle - \langle {\bf f}, {\bf g} \rangle \right| \le \delta$$ (2.1) holds for all $${\bf f}, {\bf g} \in {\bf D}({\it {\Sigma}}^N_s) \cap S^{n-1}$$. Remark 1 The power $$\delta^{-7}$$ is unlikely to be optimal. At least in the non-dictionary case, i.e. when $${\bf D} = {\bf I}_n$$, it can be reduced to $$\delta^{-2}$$, see [3]. As an immediate consequence of $${\bf D}$$-SPEP, setting $${\bf g} = {\bf f}$$ in (2.1) allows one to deduce a variation of the classical restricted isometry property adapted to $${\bf D}$$, where the inner norm becomes the $$\ell_1$$-norm (we mention in passing that this variation could also be deduced by other means). Corollary 1 ($${\bf D}$$-RIP$$_1$$) Let $$\delta \,{>}\, 0$$, let $$m \,{\ge}\, C \delta^{-7} s \,{\ln}\,(eN/s)$$ and let $${\bf A} \in \mathbb{R}^{m \times n}$$ be populated by independent standard normal random variables. Then, with failure probability at most $$\gamma \exp(-c \delta^2 m)$$, the renormalized matrix $${\bf A}':= (\sqrt{2/\pi}/m) {\bf A}$$ satisfies the $$s$$th-order $$\ell_1$$-restricted isometry property adapted to $${\bf D} \in \mathbb{R}^{n \times N}$$ with constant $$\delta$$ — $${\bf D}$$-RIP$$_{1}(s,\delta)$$ for short—i.e.   $$(1-\delta) \| {\bf f}\|_2 \le \| {\bf A}' {\bf f} \|_1 \le (1+\delta) \|{\bf f}\|_2$$ (2.2) holds for all $${\bf f} \in {\bf D}({\it {\Sigma}}_s^N)$$. The next property we put forward is an adaptation of the tessellation of the ‘effectively sparse sphere’ (see [26]) to the dictionary case. 
In what follows, given a (non-invertible) matrix $${\bf M}$$ and a set $$K$$, we denote by $${\bf M}^{-1} (K)$$ the preimage of $$K$$ with respect to $${\bf M}$$.

Theorem 4 (Tessellation) Let $$\varepsilon > 0$$, let $$m \ge C \varepsilon^{-6} s \ln(eN/s)$$ and let $${\bf A} \in \mathbb{R}^{m \times n}$$ be populated by independent standard normal random variables. Then, with failure probability at most $$\gamma \exp(-c \varepsilon^2 m)$$, the rows $${\bf a}_1,\ldots,{\bf a}_m \in \mathbb{R}^n$$ of $${\bf A}$$ $$\varepsilon$$-tessellate the effectively $$s$$-analysis-sparse sphere (we write that $${\bf A}$$ satisfies $${\bf D}$$-TES$$(s,\varepsilon)$$ for short), i.e.   $$[{\bf f},{\bf g} \in ({\bf D}^*)^{-1}({\it {\Sigma}}_{s}^{N,{\rm eff}}) \cap S^{n-1} : \; \mathrm{sgn} \langle {\bf a}_i, {\bf f} \rangle = \mathrm{sgn} \langle {\bf a}_i, {\bf g} \rangle \mbox{ for all } i =1,\ldots,m] \Longrightarrow [\|{\bf f} - {\bf g}\|_2 \le \varepsilon].$$ (2.3)

3. Signal estimation: direction only

In this whole section, given a measurement matrix $${\bf A} \in \mathbb{R}^{m \times n}$$ with rows $${\bf a}_1,\ldots,{\bf a}_m \in \mathbb{R}^n$$, the signals $${\bf f} \in \mathbb{R}^n$$ are acquired via $${\bf y} = \mathrm{sgn}({\bf A} {\bf f}) \in \{-1,+1\}^m$$, i.e.   $$y_i = \mathrm{sgn} \langle {\bf a}_i, {\bf f} \rangle \qquad i = 1,\ldots,m.$$ Under this model, all $$c {\bf f}$$ with $$c>0$$ produce the same one-bit measurements, so one can only hope to recover the direction of $${\bf f}$$. We present two methods to do so, one based on linear programming and the other one based on hard thresholding.
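The geometric intuition behind tessellation and direction-only recovery can be illustrated numerically. The sketch below relies on a standard fact that is not stated in the article: for a standard Gaussian vector $${\bf a}$$ and unit vectors $${\bf f}, {\bf g}$$, the probability that $$\mathrm{sgn}\langle {\bf a}, {\bf f}\rangle \ne \mathrm{sgn}\langle {\bf a}, {\bf g}\rangle$$ equals the angle between $${\bf f}$$ and $${\bf g}$$ divided by $$\pi$$. It only treats one fixed pair of arbitrary unit vectors, whereas Theorem 4 is a uniform statement over a structured set, so this is a sanity check rather than a verification of the theorem. All names, dimensions and the seed are assumptions.

```python
import math
import random

def sign(t):
    # Assumed convention: sgn(0) = +1.
    return 1 if t >= 0 else -1

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def random_unit_vector(n, rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(v, v))
    return [c / norm for c in v]

rng = random.Random(1)
n, m = 20, 4000
f = random_unit_vector(n, rng)
g = random_unit_vector(n, rng)
rows = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

# One-bit measurements of f, of g, and of a positive multiple of f.
y_f = [sign(dot(a, f)) for a in rows]
y_g = [sign(dot(a, g)) for a in rows]
y_3f = [sign(dot(a, [3.0 * c for c in f])) for a in rows]

# Fraction of hyperplanes separating f from g versus the normalized angle.
hamming_fraction = sum(p != q for p, q in zip(y_f, y_g)) / m
cosine = max(-1.0, min(1.0, dot(f, g)))
angle_fraction = math.acos(cosine) / math.pi

print("scaling leaves the measurements unchanged:", y_f == y_3f)
print("fraction of differing signs:", hamming_fraction)
print("angle / pi:", angle_fraction)
```

The two printed fractions should agree up to sampling error of order $$1/\sqrt{m}$$, which is why signals whose sign patterns coincide are expected to be close in direction.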
3.1 Linear programming Given a signal $${\bf f} \in \mathbb{R}^n$$ observed via $${\bf y} = \mathrm{sgn} ({\bf A} {\bf f})$$, the optimization scheme we consider here consists in outputting the signal $${\bf f}_{\rm lp}$$ solution of   $$\label{LPforDir} \underset{{{\bf h} \in \mathbb{R}^n}}{\rm minimize}\, \| {\bf D}^* {\bf h}\|_1 \qquad \mbox{subject to} \quad \mathrm{sgn}({\bf A} {\bf h}) = {\bf y} \quad \|{\bf A} {\bf h}\|_1 = 1.$$ (3.1) This is in fact a linear program (and thus may be solved efficiently), since the condition $$\mathrm{sgn}({\bf A} {\bf h}) = {\bf y}$$ reads   $$y_i ({\bf A} {\bf h})_i \ge 0 \qquad \mbox{for all} i = 1,\ldots, m,$$ and, under this constraint, the condition $$\|{\bf A} {\bf h}\|_1 = 1$$ reads   $$\sum_{i=1}^m y_i ({\bf A} {\bf h})_i = 1.$$ Theorem 5 If $${\bf A} \,{\in}\, \mathbb{R}^{m \times n}$$ satisfies both $${\bf D}$$-TES$$(36s,\varepsilon)$$ and $${\bf D}$$-RIP$$_1(25s,1/5)$$, then any effectively $$s$$-analysis-sparse signal $${\bf f} \in ({\bf D}^*)^{-1}{\it {\Sigma}}_s^{N,{\rm eff}}$$ observed via $${\bf y} = \mathrm{sgn}({\bf A} {\bf f})$$ is directionally approximated by the output $${\bf f}_{\rm lp}$$ of the linear program (3.1) with error   $$\left\| \frac{{\bf f}}{\|{\bf f}\|_2} - \frac{{\bf f}_{\rm lp}}{\|{\bf f}_{\rm lp}\|_2} \right\|_2 \le \varepsilon.$$ Proof. The main step is to show that $${\bf f}_{\rm lp}$$ is effectively $$36s$$-analysis sparse when $${\bf D}$$-RIP$$_1(t,\delta)$$ holds with $$t= 25s$$ and $$\delta=1/5$$. Then, since both $${\bf f}/\|{\bf f}\|_2$$ and $${\bf f}_{\rm lp} / \|{\bf f}_{\rm lp}\|_2$$ belong to $$({\bf D}^*)^{-1}{\it {\Sigma}}_{36 s}^{N,{\rm eff}} \cap S^{n-1}$$ and have the same sign observations, $${\bf D}$$-TES$$(36s,\varepsilon)$$ implies the desired conclusion. To prove the effective analysis sparsity of $${\bf f}_{\rm lp}$$, we first estimate $$\|{\bf A} {\bf f}\|_1$$ from below. 
For this purpose, let $$T_0$$ denote an index set of $$t$$ largest absolute entries of $${\bf D}^* {\bf f}$$, $$T_1$$ an index set of next $$t$$ largest absolute entries of $${\bf D}^* {\bf f}$$, $$T_2$$ an index set of next $$t$$ largest absolute entries of $${\bf D}^* {\bf f}$$, etc. We have   \begin{align*} \|{\bf A} {\bf f} \|_1 & = \|{\bf A} {\bf D} {\bf D}^* {\bf f}\|_1 = \left\| {\bf A} {\bf D} \left(\sum_{k \ge 0} ({\bf D}^*{\bf f})_{T_k} \right) \right\|_1 \ge \|{\bf A} {\bf D} \left(({\bf D}^* {\bf f})_{T_0} \right)\!\|_1 - \sum_{k \ge 1} \|{\bf A} {\bf D} \left(({\bf D}^* {\bf f})_{T_k} \right)\!\|_1\\ & \ge (1-\delta) \|{\bf D} \left(({\bf D}^* {\bf f})_{T_0} \right)\!\|_2 - \sum_{k \ge 1} (1+\delta) \|{\bf D} \left(({\bf D}^* {\bf f})_{T_k} \right)\!\|_2, \end{align*} where the last step used $${\bf D}$$-RIP$$_1(t,\delta)$$. We notice that, for $$k \ge 1$$,   $$\|{\bf D} \left(({\bf D}^* {\bf f})_{T_k} \right)\!\|_2 \le \| ({\bf D}^* {\bf f})_{T_k}\! \|_2 \le \frac{1}{\sqrt{t}} \| ({\bf D}^* {\bf f})_{T_{k-1}}\!\|_1,$$ from where it follows that   $$\label{LowerAf} \|{\bf A} {\bf f}\|_1 \ge (1-\delta) \|{\bf D} \left(({\bf D}^* {\bf f})_{T_0}\right)\!\|_2 - \frac{1+\delta}{\sqrt{t}} \|{\bf D}^* {\bf f} \|_1.$$ (3.2) In addition, we observe that   \begin{align*} \|{\bf D}^* {\bf f} \|_2 & = \|{\bf f}\|_2 = \|{\bf D} {\bf D}^* {\bf f}\|_2 = \left\| {\bf D} \left(\sum_{k \ge 0} ({\bf D}^* {\bf f})_{T_k} \right) \right\|_2 \le \left\| {\bf D} \left(({\bf D}^* {\bf f})_{T_0} \right) \right\|_2 + \sum_{k \ge 1} \left\| {\bf D} \left(({\bf D}^* {\bf f})_{T_k} \right) \right\|_2\\ & \le \left\| {\bf D} \left(({\bf D}^* {\bf f})_{T_0} \right) \right\|_2 + \frac{1}{\sqrt{t}} \|{\bf D}^* {\bf f} \|_1. 
\end{align*} In view of the effective sparsity of $${\bf D}^* {\bf f}$$, we obtain   $$\|{\bf D}^* {\bf f}\|_1 \le \sqrt{s} \|{\bf D}^* {\bf f}\|_2 \le \sqrt{s}\left\| {\bf D} \left(({\bf D}^* {\bf f})_{T_0} \right) \right\|_2 + \sqrt{s/t} \|{\bf D}^* {\bf f} \|_1,$$ hence   $$\label{LowerDD*T0} \left\| {\bf D} \left(({\bf D}^* {\bf f})_{T_0} \right) \right\|_2 \ge \frac{1- \sqrt{s/t}}{\sqrt{s}} \|{\bf D}^* {\bf f} \|_1.$$ (3.3) Substituting (3.3) in (3.2) yields   $$\label{LowerAf2} \|{\bf A} {\bf f}\|_1 \ge \left((1-\delta)(1-\sqrt{s/t}) - (1+\delta)(\sqrt{s/t}) \right) \frac{1}{\sqrt{s}} \|{\bf D}^* {\bf f}\|_1 = \frac{2/5}{\sqrt{s}} \|{\bf D}^* {\bf f}\|_1,$$ (3.4) where we have used the values $$t = 25s$$ and $$\delta=1/5$$. This lower estimate for $$\|{\bf A} {\bf f} \|_1$$, combined with the minimality property of $${\bf f}_{\rm lp}$$, allows us to derive that   $$\label{UpperD*fhat} \|{\bf D}^* {\bf f}_{\rm lp} \|_1 \le \|{\bf D}^*({\bf f}/ \|{\bf A} {\bf f}\|_1)\|_1 = \frac{\|{\bf D}^* {\bf f}\|_1}{\|{\bf A} {\bf f} \|_1} \le (5/2) \sqrt{s}.$$ (3.5) Next, with $$\widehat{T}_0$$ denoting an index set of $$t$$ largest absolute entries of $${\bf D}^* {\bf f}_{\rm lp}$$, $$\widehat{T}_1$$ an index set of next $$t$$ largest absolute entries of $${\bf D}^* {\bf f}_{\rm lp}$$, $$\widehat{T}_2$$ an index set of next $$t$$ largest absolute entries of $${\bf D}^* {\bf f}_{\rm lp}$$, etc., we can write   \begin{align*} 1 & = \|{\bf A} {\bf f}_{\rm lp} \|_1 = \|{\bf A} {\bf D} {\bf D}^* {\bf f}_{\rm lp} \|_1 = \left\| {\bf A} {\bf D} \left(\sum_{k \ge 0} ({\bf D}^* {\bf f}_{\rm lp})_{\widehat{T}_k} \right) \right\|_1 \le \sum_{k \ge 0} \left\| {\bf A} {\bf D} \left(({\bf D}^* {\bf f}_{\rm lp})_{\widehat{T}_k} \right) \right\|_1\\ & \le \sum_{k \ge 0} (1+\delta) \left\| {\bf D} \left(({\bf D}^* {\bf f}_{\rm lp})_{\widehat{T}_k} \right) \right\|_2 = (1+\delta) \left[\!\left\| ({\bf D}^* {\bf f}_{\rm lp})_{\widehat{T}_0} \right\|_2 + \sum_{k \ge 1} \!\left\| ({\bf D}^* 
{\bf f}_{\rm lp})_{\widehat{T}_k} \right\|_2 \right]\\ & \le (1+\delta) \left[\|{\bf D}^* {\bf f}_{\rm lp} \|_2 + \frac{1}{\sqrt{t}} \|{\bf D}^* {\bf f}_{\rm lp}\|_1 \right] \le (1+\delta) \left[\|{\bf D}^* {\bf f}_{\rm lp} \|_2 + (5/2)\sqrt{s/t} \right]. \end{align*} This chain of inequalities shows that   $$\label{LowerD*fhat} \|{\bf D}^* {\bf f}_{\rm lp} \|_2 \ge \frac{1-(5/2)\sqrt{s/t}}{1+\delta} = \frac{5}{12}.$$ (3.6) Combining (3.5) and (3.6), we obtain   $$\|{\bf D}^* {\bf f}_{\rm lp} \|_1 \le 6 \sqrt{s} \|{\bf D}^* {\bf f}_{\rm lp} \|_2.$$ In other words, $${\bf D}^* {\bf f}_{\rm lp}$$ is effectively $$36s$$-sparse, which is what was needed to conclude the proof. □ Remark 2 We point out that if $${\bf f}$$ was genuinely, instead of effectively, $$s$$-analysis sparse, then a lower bound of the type (3.4) would be immediate from the $${\bf D}$$-RIP$$_1$$. We also point out that our method of proving that the linear program outputs an effectively analysis-sparse signal is new even in the case $${\bf D} = {\bf I}_n$$. In fact, it makes it possible to remove a logarithmic factor from the number of measurements in this ‘non-dictionary’ case, too (compare with [24]). Furthermore, it allows for an analysis of the linear program (3.1) only based on deterministic conditions that the matrix $${\bf A}$$ may satisfy. 3.2 Hard thresholding Given a signal $${\bf f} \in \mathbb{R}^n$$ observed via $${\bf y} = \mathrm{sgn} ({\bf A} {\bf f})$$, the hard thresholding scheme we consider here consists in constructing a signal $${\bf f}_{\rm ht} \in \mathbb{R}^n$$ as   $$\label{HTforDir} {\bf f}_{\rm ht} = {\bf D} {\bf z}, \qquad \mbox{where} {\bf z} := H_t({\bf D}^* {\bf A}^* {\bf y}).$$ (3.7) Our recovery result holds for $$s$$-synthesis-sparse signals that are also effectively $$\kappa s$$-analysis sparse for some $$\kappa \ge 1$$ (we discussed in Section 1 some choices of dictionaries $${\bf D}$$ making this happen). 
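The scheme (3.7) amounts to one back-projection followed by one thresholding step, which the following plain-Python sketch illustrates. It is only a toy under stated assumptions: the helper names, the identity dictionary $${\bf D} = {\bf I}_n$$, the sizes and the choice $$t = 4$$ are hypothetical and do not follow the prescription for $$t$$ in the theorem below.

```python
import math
import random

def hard_threshold(v, t):
    """H_t: keep the t largest-magnitude entries of v and zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:t])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]

def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def direction_by_hard_thresholding(A, D, y, t):
    """Return f_ht / ||f_ht||_2 with f_ht = D z and z = H_t(D^* A^* y), cf. (3.7)."""
    proxy = matvec(transpose(D), matvec(transpose(A), y))   # D^* A^* y
    z = hard_threshold(proxy, t)
    f_ht = matvec(D, z)
    norm = math.sqrt(sum(c * c for c in f_ht))
    return [c / norm for c in f_ht]

# Toy experiment (all values below are illustrative assumptions).
rng = random.Random(1)
n, m, t = 20, 2000, 4
D = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # D = I_n
f = [0.0] * n
f[3], f[11] = 0.6, 0.8                                      # unit-norm, 2-sparse
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = [1.0 if v >= 0 else -1.0 for v in matvec(A, f)]          # one-bit measurements

f_dir = direction_by_hard_thresholding(A, D, y, t)
error = math.sqrt(sum((a - b) ** 2 for a, b in zip(f, f_dir)))
print(f"directional error: {error:.3f}")
```

No normalization of $${\bf A}$$ is needed in this sketch, since the output is rescaled to unit norm before the comparison.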
Theorem 6 If $${\bf A} \in \mathbb{R}^{m \times n}$$ satisfies $${\bf D}$$-SPEP$$(s+t,\varepsilon/8)$$, $$t = \lceil 16 \varepsilon^{-2} \kappa s \rceil$$, then any $$s$$-synthesis-sparse signal $${\bf f} \in {\bf D}({\it {\Sigma}}_s^N)$$ with $${\bf D}^* {\bf f} \in {\it {\Sigma}}_{\kappa s}^{N,{\rm eff}}$$ observed via $${\bf y} = \mathrm{sgn} ({\bf A} {\bf f})$$ is directionally approximated by the output $${\bf f}_{\rm ht}$$ of the hard thresholding (3.7) with error   $$\left\| \frac{{\bf f}}{\|{\bf f}\|_2} - \frac{{\bf f}_{\rm ht}}{\|{\bf f}_{\rm ht}\|_2} \right\|_2 \le \varepsilon.$$ Proof. We assume without loss of generality that $$\|{\bf f}\|_2 = 1$$. Let $$T=T_0$$ denote an index set of $$t$$ largest absolute entries of $${\bf D}^* {\bf f}$$, $$T_1$$ an index set of next $$t$$ largest absolute entries of $${\bf D}^* {\bf f}$$, $$T_2$$ an index set of next $$t$$ largest absolute entries of $${\bf D}^* {\bf f}$$, etc. We start by noticing that $${\bf z}$$ is a better $$t$$-sparse approximation to $${\bf D}^* {\bf A}^* {\bf y} = {\bf D}^* {\bf A}^* \mathrm{sgn}({\bf A} {\bf f})$$ than $$[{\bf D}^* {\bf f}]_T$$, so we can write   $$\| {\bf D}^* {\bf A}^* \mathrm{sgn}({\bf A} {\bf f}) - {\bf z} \|_2^2 \le \|{\bf D}^* {\bf A}^* \mathrm{sgn}({\bf A} {\bf f}) - [{\bf D}^* {\bf f}]_T \|_2^2,$$ i.e.   
$$\| ({\bf D}^* {\bf f} - {\bf z}) - ({\bf D}^* {\bf f} - {\bf D}^* {\bf A}^* \mathrm{sgn}({\bf A} {\bf f})) \|_2^2 \le \| ({\bf D}^* {\bf f} - {\bf D}^* {\bf A}^* \mathrm{sgn}({\bf A} {\bf f})) - [{\bf D}^* {\bf f}]_{\overline{T}} \|_2^2.$$ Expanding the squares and rearranging gives   \begin{align} \label{Term1} \|{\bf D}^* {\bf f} - {\bf z} \|_2^2 & \le 2 \langle {\bf D}^* {\bf f} - {\bf z}, {\bf D}^* {\bf f} - {\bf D}^* {\bf A}^* \mathrm{sgn}({\bf A} {\bf f}) \rangle \\ \end{align} (3.8)  \begin{align} \label{Term2} & - 2 \langle [{\bf D}^* {\bf f}]_{\overline{T}} , {\bf D}^* {\bf f} - {\bf D}^* {\bf A}^* \mathrm{sgn}({\bf A} {\bf f}) \rangle \\ \end{align} (3.9)  \begin{align} \label{Term3} & + \| [{\bf D}^* {\bf f}]_{\overline{T}} \|_2^2. \end{align} (3.10) To bound (3.10), we invoke [15, Theorem 2.5] and the effective analysis sparsity of $${\bf f}$$ to derive   $$\| [{\bf D}^* {\bf f}]_{\overline{T}} \|_2^2 \le \frac{1}{4t} \| {\bf D}^* {\bf f} \|_1^2 \le \frac{\kappa s}{4t} \| {\bf D}^* {\bf f} \|_2^2 = \frac{\kappa s}{4t} \|{\bf f} \|_2^2 = \frac{\kappa s}{4t}.$$ To bound (3.8) in absolute value, we notice that it can be written as   \begin{align*} 2 | \langle {\bf D} {\bf D}^* {\bf f} - {\bf D} {\bf z}, &{\bf f} - {\bf A}^* \mathrm{sgn}({\bf A} {\bf f}) \rangle | = 2 | \langle {\bf f} - {\bf f}_{\rm ht}, {\bf f} - {\bf A}^* \mathrm{sgn}({\bf A} {\bf f}) \rangle | \\ & = 2 | \langle {\bf f} - {\bf f}_{\rm ht}, {\bf f} \rangle - \langle {\bf A} ({\bf f} - {\bf f}_{\rm ht}), \mathrm{sgn}({\bf A} {\bf f}) \rangle | \le 2 \varepsilon' \|{\bf f} - {\bf f}_{\rm ht} \|_2, \end{align*} where the last step followed from $${\bf D}$$-SPEP$$(s+t,\varepsilon')$$, $$\varepsilon' := \varepsilon /8$$. 
Finally, (3.9) can be bounded in absolute value by   \begin{align*} 2 & \sum_{k \ge 1} | \langle [{\bf D}^* {\bf f}]_{T_k}, {\bf D}^*({\bf f} - {\bf A}^* \mathrm{sgn}({\bf A} {\bf f})) \rangle | = 2 \sum_{k \ge 1} | \langle {\bf D}([{\bf D}^* {\bf f}]_{T_k}), {\bf f} - {\bf A}^* \mathrm{sgn}({\bf A} {\bf f}) \rangle | \\ & = 2 \sum_{k \ge 1} | \langle {\bf D}([{\bf D}^* {\bf f}]_{T_k}), {\bf f} \rangle - \langle {\bf A} ({\bf D}([{\bf D}^* {\bf f}]_{T_k})), \mathrm{sgn}({\bf A} {\bf f}) \rangle | \le 2 \sum_{k \ge 1} \varepsilon' \| {\bf D}([{\bf D}^* {\bf f}]_{T_k}) \|_2\\ & \le 2 \varepsilon' \sum_{k \ge 1} \| [{\bf D}^* {\bf f}]_{T_k} \|_2 \le 2 \varepsilon' \sum_{k \ge 1} \frac{\| [{\bf D}^* {\bf f}]_{T_{k-1}} \|_1}{\sqrt{t}} \le 2 \varepsilon' \frac{\|{\bf D}^* {\bf f}\|_1}{\sqrt{t}} \le 2 \varepsilon' \frac{\sqrt{\kappa s} \|{\bf D}^* {\bf f}\|_2}{\sqrt{t}} = 2 \varepsilon' \sqrt{\frac{\kappa s}{t}}. \end{align*} Putting everything together, we obtain   $$\|{\bf D}^* {\bf f} - {\bf z} \|_2^2 \le 2 \varepsilon' \|{\bf f} - {\bf f}_{\rm ht}\|_2 + 2 \varepsilon' \sqrt{\frac{\kappa s}{t}} + \frac{\kappa s}{4t}.$$ In view of $$\|{\bf f} - {\bf f}_{\rm ht}\|_2 = \|{\bf D} ({\bf D}^* {\bf f} - {\bf z}) \|_2 \le \|{\bf D}^* {\bf f} - {\bf z}\|_2$$, it follows that   $$\|{\bf f} - {\bf f}_{\rm ht}\|_2^2 \le 2 \varepsilon' \|{\bf f} - {\bf f}_{\rm ht}\|_2 + 2 \varepsilon' \sqrt{\frac{\kappa s}{t}} + \frac{\kappa s}{4t}, \quad \mbox{i.e.} \; (\|{\bf f} - {\bf f}_{\rm ht}\|_2 - \varepsilon')^2 \le {\varepsilon'}^2 + 2 \varepsilon' \sqrt{\frac{\kappa s}{t}} + \frac{\kappa s}{4t} \le \left(\varepsilon' \hspace{-0.5mm}+\hspace{-0.5mm} \sqrt{\frac{\kappa s}{t}} \right)^2 \hspace{-1mm}.$$ This implies that   $$\|{\bf f} - {\bf f}_{\rm ht}\|_2 \le 2 \varepsilon' + \sqrt{\frac{\kappa s}{t}}.$$ Finally, since $${\bf f}_{\rm ht}/\|{\bf f}_{\rm ht}\|_2$$ is the best $$\ell_2$$-normalized approximation to $${\bf f}_{\rm ht}$$, we conclude that   $$\left\| {\bf f} - \frac{{\bf 
f}_{\rm ht}}{\|{\bf f}_{\rm ht}\|_2} \right\|_2 \le \|{\bf f} - {\bf f}_{\rm ht}\|_2 + \left\| {\bf f}_{\rm ht} - \frac{{\bf f}_{\rm ht}}{\|{\bf f}_{\rm ht}\|_2} \right\|_2 \le 2 \|{\bf f} - {\bf f}_{\rm ht}\|_2 \le 4 \varepsilon' + 2 \sqrt{\frac{\kappa s}{t}}.$$ The announced result follows from our choices of $$t$$ and $$\varepsilon'$$. □ 4. Signal estimation: direction and magnitude Since information of the type $$y_i = \mathrm{sgn} \langle {\bf a}_i,{\bf f} \rangle$$ can at best allow one to estimate the direction of a signal $${\bf f} \in \mathbb{R}^n$$, we consider in this section information of the type   $$y_i = \mathrm{sgn}(\langle {\bf a}_i, {\bf f} \rangle - \tau_i) \qquad i = 1,\ldots,m ,$$ for some thresholds $$\tau_1,\ldots,\tau_m$$ introduced before quantization. In the rest of this section, we give three methods for recovering $${\bf f}$$ in its entirety. The first one is based on linear programming, the second one on second-order cone programming and the last one on hard thresholding. We are going to show that, using these algorithms, one can estimate both the direction and the magnitude of a dictionary-sparse signal $${\bf f} \in \mathbb{R}^n$$ given a prior magnitude bound such as $$\| {\bf f} \|_2 \le r$$. We simply rely on the previous results by ‘lifting’ the situation from $$\mathbb{R}^n$$ to $$\mathbb{R}^{n+1}$$, in view of the observation that $${\bf y} = \mathrm{sgn} ({\bf A} {\bf f} - \boldsymbol{\tau})$$ can be interpreted as   $${\bf y} = \mathrm{sgn}(\widetilde{{\bf A}} \widetilde{{\bf f}}), \qquad \widetilde{{\bf A}} := \left[\begin{array}{c|c} {\bf A} & -\boldsymbol{\tau}/\sigma \end{array}\right], \quad \widetilde{{\bf f}} := \begin{bmatrix} {\bf f} \\ \hline \sigma \end{bmatrix},$$ with the lifted objects $$\widetilde{{\bf A}}$$ and $$\widetilde{{\bf f}}$$ made explicit in (4.3). The following lemma will be equally useful when dealing with linear programming, second-order cone programming or hard thresholding schemes.
Lemma 1 For $$\widetilde{{\bf f}}, \widetilde{{\bf g}} \in \mathbb{R}^{n+1}$$ written as   $$\widetilde{{\bf f}} =: \begin{bmatrix} {\bf f}_{[n]} \\ \hline f_{n+1} \end{bmatrix} \qquad \mbox{and} \qquad \widetilde{{\bf g}} =: \begin{bmatrix} {\bf g}_{[n]} \\ \hline g_{n+1} \end{bmatrix}$$ with $${\bf f}_{[n]}, {\bf g}_{[n]} \in \mathbb{R}^n$$ and with $$f_{n+1} \not= 0$$, $$g_{n+1} \not= 0$$, one has   $$\left\| \frac{{\bf f}_{[n]}}{f_{n+1}} - \frac{{\bf g}_{[n]}}{g_{n+1}} \right\|_2 \le \frac{\|\widetilde{{\bf f}}\|_2 \|\widetilde{{\bf g}}\|_2}{|f_{n+1}||g_{n+1}|} \left\| \frac{\widetilde{{\bf f}}}{\|\widetilde{{\bf f}}\|_2} - \frac{\widetilde{{\bf g}}}{\|\widetilde{{\bf g}}\|_2} \right\|_2.$$ Proof. By using the triangle inequality in $$\mathbb{R}^n$$ and the Cauchy–Schwarz inequality in $$\mathbb{R}^2$$, we can write   \begin{align*} \left\| \frac{{\bf f}_{[n]}}{f_{n+1}} - \frac{{\bf g}_{[n]}}{g_{n+1}} \right\|_2 & = \|\widetilde{{\bf f}}\|_2 \left\| \frac{1/f_{n+1}}{\|\widetilde{{\bf f}}\|_2} {\bf f}_{[n]} - \frac{1/g_{n+1}}{\|\widetilde{{\bf f}}\|_2} {\bf g}_{[n]} \right\|_2\\ & \le \|\widetilde{{\bf f}}\|_2 \left(\frac{1}{|f_{n+1}|} \left\| \frac{{\bf f}_{[n]}}{\| \widetilde{{\bf f}} \|_2} - \frac{{\bf g}_{[n]}}{\|\widetilde{{\bf g}}\|_2} \right\|_2 + \left| \frac{1/g_{n+1}}{\|\widetilde{{\bf f}}\|_2} - \frac{1/f_{n+1}}{\|\widetilde{{\bf g}}\|_2} \right| \|{\bf g}_{[n]}\|_2 \right)\\ & = \|\widetilde{{\bf f}}\|_2 \left(\frac{1}{|f_{n+1}|} \left\| \frac{{\bf f}_{[n]}}{\| \widetilde{{\bf f}} \|_2} - \frac{{\bf g}_{[n]}}{\|\widetilde{{\bf g}}\|_2} \right\|_2 + \frac{\|{\bf g}_{[n]}\|_2}{|f_{n+1}| |g_{n+1}|} \left| \frac{f_{n+1}}{\|\widetilde{{\bf f}}\|_2} - \frac{g_{n+1}}{\|\widetilde{{\bf g}}\|_2} \right| \right)\\ & \le \|\widetilde{{\bf f}}\|_2 \left[\frac{1}{|f_{n+1}|^2} + \frac{\|{\bf g}_{[n]}\|_2^2}{|f_{n+1}|^2 |g_{n+1}|^2} \right]^{1/2} \left[\left\| \frac{{\bf f}_{[n]}}{\| \widetilde{{\bf f}} \|_2} - \frac{{\bf g}_{[n]}}{\|\widetilde{{\bf 
g}}\|_2} \right\|_2^2 + \left| \frac{f_{n+1}}{\|\widetilde{{\bf f}}\|_2} - \frac{g_{n+1}}{\|\widetilde{{\bf g}}\|_2} \right|^2 \right]^{1/2}\\ & = \|\widetilde{{\bf f}}\|_2 \left[\frac{\|\widetilde{{\bf g}}\|_2^2}{|f_{n+1}|^2 |g_{n+1}|^2} \right]^{1/2} \left\| \frac{\widetilde{{\bf f}}}{\|\widetilde{{\bf f}}\|_2} - \frac{\widetilde{{\bf g}}}{\|\widetilde{{\bf g}}\|_2} \right\|_2, \end{align*} which is the announced result. □ 4.1 Linear programming Given a signal $${\bf f} \in \mathbb{R}^n$$ observed via $${\bf y} = \mathrm{sgn} ({\bf A} {\bf f} - \boldsymbol{\tau})$$ with $$\tau_1,\ldots,\tau_m \sim \mathscr{N}(0,\sigma^2)$$, the optimization scheme we consider here consists in outputting the signal   $$\label{Defflp} {\bf f}_{\rm LP} = \frac{\sigma}{\widehat{u}} \widehat{{\bf h}} \in \mathbb{R}^n,$$ (4.1) where $$\widehat{{\bf h}} \in \mathbb{R}^{n}$$ and $$\widehat{u} \in \mathbb{R}$$ are solutions of   $$\label{OptProg} \underset{{\bf h} \in \mathbb{R}^n, u \in \mathbb{R}}{\rm minimize \;} \; \|{\bf D}^* {\bf h} \|_1 + |u| \qquad \mbox{subject to} \quad \mathrm{sgn}({\bf A} {\bf h} - u \boldsymbol{\tau} / \sigma) = {\bf y}, \quad \|{\bf A} {\bf h} - u \boldsymbol{\tau} / \sigma \|_1 = 1.$$ (4.2) Theorem 7 Let $$\varepsilon, r, \sigma > 0$$, let $$m \ge C (r/\sigma+\sigma/r)^6 \varepsilon^{-6} s \ln(eN/s)$$ and let $${\bf A} \in \mathbb{R}^{m \times n}$$ be populated by independent standard normal random variables. Furthermore, let $$\tau_1,\ldots,\tau_m$$ be independent normal random variables with mean zero and variance $$\sigma^2$$ that are also independent from the entries of $${\bf A}$$. 
Then, with failure probability at most $$\gamma \exp(-c m \varepsilon^2 r^2 \sigma^2/(r^2+\sigma^2)^2)$$, any effectively $$s$$-analysis sparse $${\bf f} \in \mathbb{R}^n$$ satisfying $$\|{\bf f}\|_2 \le r$$ and observed via $${\bf y} = \mathrm{sgn}({\bf A} {\bf f} - \boldsymbol{\tau})$$ is approximated by $${\bf f}_{\rm LP}$$ given in (4.1) with error   $$\left\| {\bf f}- {\bf f}_{\rm LP} \right\|_2 \le \varepsilon r.$$ Proof. Let us introduce the ‘lifted’ signal $$\widetilde{{\bf f}} \in \mathbb{R}^{n+1}$$, the ‘lifted’ tight frame $$\widetilde{{\bf D}} \in \mathbb{R}^{(n+1)\times (N+1)}$$, and the ‘lifted’ measurement matrix $$\widetilde{{\bf A}} \in \mathbb{R}^{m \times (n+1)}$$ defined as   $$\widetilde{{\bf f}} := \begin{bmatrix} {\bf f} \\ \hline \sigma \end{bmatrix}, \quad \widetilde{{\bf D}} := \left[\begin{array}{c|c} {\bf D} & {\bf 0}\\ \hline {\bf 0} & 1 \end{array}\right], \quad \widetilde{{\bf A}} := \left[\begin{array}{c|c} & -\tau_1/\sigma\\ {\bf A} & \vdots \\ & -\tau_m/\sigma \end{array}\right].$$ (4.3) First, we observe that $$\widetilde{{\bf f}}$$ is effectively $$(s+1)$$-analysis sparse (relative to $$\widetilde{{\bf D}}$$), since $$\widetilde{{\bf D}}^* \widetilde{{\bf f}} = \begin{bmatrix} {\bf D}^* {\bf f} \\ \hline \sigma \end{bmatrix}$$, hence   $$\frac{\|\widetilde{{\bf D}}^* \widetilde{{\bf f}}\|_1}{\|\widetilde{{\bf D}}^* \widetilde{{\bf f}}\|_2} = \frac{\|{\bf D}^* {\bf f} \|_1 + \sigma}{\sqrt{\|{\bf D}^* {\bf f}\|_2^2+\sigma^2}} \le \frac{\sqrt{s} \|{\bf D}^* {\bf f}\|_2 + \sigma}{\sqrt{\|{\bf D}^* {\bf f}\|_2^2+\sigma^2}} \le \sqrt{s+1}.$$ Next, we observe that the matrix $$\widetilde{{\bf A}} \in \mathbb{R}^{m \times (n+1)}$$, populated by independent standard normal random variables, satisfies $$\widetilde{{\bf D}}$$-TES$$(36(s+1),\varepsilon')$$, $$\varepsilon' := \dfrac{r \sigma}{2(r^2 + \sigma^2)} \varepsilon$$ and $$\widetilde{{\bf D}}$$-RIP$$_1(25(s+1),1/5)$$ with failure probability at most $$\gamma \exp(-c m {\varepsilon'}^2) + \gamma' \exp(-c' m) \le \gamma'' \exp(-c'' m \varepsilon^2 r^2 \sigma^2 / (r^2 + \sigma^2)^2)$$, since $$m \ge C {\varepsilon'}^{-6} (s+1) \ln(eN/(s+1))$$ and $$m \ge C (1/5)^{-7} (s+1) \ln(e N / (s+1))$$ are ensured by our assumption on $$m$$.
Finally, we observe that $${\bf y} = \mathrm{sgn}(\widetilde{{\bf A}} \widetilde{{\bf f}})$$ and that the optimization program (4.2) reads   $$\underset{\widetilde{{\bf h}} \in \mathbb{R}^{n+1}}{\rm minimize \;} \|\widetilde{{\bf D}}^* \widetilde{{\bf h}} \|_1 \qquad \mbox{subject to} \quad \mathrm{sgn}(\widetilde{{\bf A}} \widetilde{{\bf h}}) = {\bf y}, \quad \|\widetilde{{\bf A}} \widetilde{{\bf h}} \|_1 = 1.$$ Denoting its solution as $$\widetilde{{\bf g}} = \begin{bmatrix} {\bf g}_{[n]} \\ \hline g_{n+1} \end{bmatrix}$$, Theorem 5 implies that   $$\left\| \frac{\widetilde{{\bf f}}}{\|\widetilde{{\bf f}}\|_2} - \frac{\widetilde{{\bf g}}}{\|\widetilde{{\bf g}}\|_2} \right\|_2 \le \varepsilon'.$$ In particular, looking at the last coordinate, this inequality yields   $$\left| \frac{\sigma}{\|\widetilde{{\bf f}}\|_2} - \frac{g_{n+1}}{\|\widetilde{{\bf g}}\|_2} \right| \le \varepsilon', \qquad \mbox{hence} \qquad \frac{|g_{n+1}|}{\|\widetilde{{\bf g}}\|_2} \ge \frac{\sigma}{\|\widetilde{{\bf f}}\|_2} - \varepsilon' \ge \frac{\sigma}{\sqrt{r^2+\sigma^2}} - \frac{\sigma}{2 \sqrt{r^2 + \sigma^2}} = \frac{\sigma}{2 \sqrt{r^2 + \sigma^2}}.$$ In turn, applying Lemma 1 while taking $${\bf f} = {\bf f}_{[n]}$$ and $${\bf f}_{\rm LP} = (\sigma/g_{n+1}) {\bf g}_{[n]}$$ into consideration gives   $$\left\| \frac{{\bf f}}{\sigma} - \frac{{\bf f}_{\rm LP}}{\sigma} \right\|_2 \le \frac{\| \widetilde{{\bf f}} \|_2}{\sigma} \frac{\| \widetilde{{\bf g}} \|_2}{|g_{n+1}|} \varepsilon' \le \frac{\| \widetilde{{\bf f}} \|_2}{\sigma} \frac{2\sqrt{r^2+\sigma^2}}{\sigma} \frac{r \sigma}{2(r^2 + \sigma^2)} \varepsilon = \frac{\| \widetilde{{\bf f}} \|_2}{\sigma} \frac{r}{\sqrt{r^2+\sigma^2}} \varepsilon,$$ so that   $$\| {\bf f} - {\bf f}_{\rm LP} \|_2 \le \|\widetilde{{\bf f}}\|_2 \frac{r}{\sqrt{r^2+\sigma^2}} \varepsilon \le r \varepsilon.$$ This establishes the announced result.
□ Remark 3 The recovery scheme (4.2) does not require an estimate of $$r$$ to be run. The recovery scheme presented next does require such an estimate. Moreover, it is a second-order cone program instead of a simpler linear program. However, it has one noticeable advantage, namely that it applies not only to signals satisfying $$\|{\bf D}^* {\bf f}\|_1 \le \sqrt{s}\|{\bf D}^*{\bf f}\|_2$$ and $$\|{\bf D}^*{\bf f}\|_2 \le r$$, but also more generally to signals satisfying $$\|{\bf D}^* {\bf f}\|_1 \le \sqrt{s} r$$ and $$\|{\bf D}^*{\bf f}\|_2 \le r$$. For both schemes, one needs $$\sigma$$ to be of the same order as $$r$$ for the results to become meaningful in terms of the number of measurements and the success probability. However, if only an upper estimate of $$r$$ is available, then one could choose $$\sigma \ge r$$ and obtain a weaker recovery error $$\|{\bf f} - \widehat{{\bf f}}\|_2 \le \varepsilon \sigma$$ with a relevant number of measurements and success probability. 4.2 Second-order cone programming Given a signal $${\bf f} \in \mathbb{R}^n$$ observed via $${\bf y} = \mathrm{sgn} ({\bf A} {\bf f} - \boldsymbol{\tau})$$ with $$\tau_1,\ldots,\tau_m \sim \mathscr{N}(0,\sigma^2)$$, the optimization scheme we consider here consists in outputting the signal   $$\label{Deffcp} {\bf f}_{\rm CP} = \underset{{\bf h} \in \mathbb{R}^n}{\rm argmin \;} \; \|{\bf D}^* {\bf h}\|_1 \qquad \mbox{subject to} \quad \mathrm{sgn}({\bf A} {\bf h} - \boldsymbol{\tau}) = {\bf y}, \quad \|{\bf h}\|_2 \le r.$$ (4.4) Theorem 8 Let $$\varepsilon, r, \sigma > 0$$, let $$m \ge C (r/\sigma + \sigma/r)^6(r^2/\sigma^2+1) \varepsilon^{-6} s \ln(eN/s)$$ and let $${\bf A} \in \mathbb{R}^{m \times n}$$ be populated by independent standard normal random variables. Furthermore, let $$\tau_1,\ldots,\tau_m$$ be independent normal random variables with mean zero and variance $$\sigma^2$$ that are also independent from $${\bf A}$$.
Then, with failure probability at most $$\gamma \exp(- c' m \varepsilon^2 r^2 \sigma^2 / (r^2+\sigma^2)^2)$$, any signal $${\bf f} \in \mathbb{R}^n$$ with $$\|{\bf f}\|_2 \le r$$, $$\|{\bf D}^* {\bf f}\|_1 \le \sqrt{s} r$$ and observed via $${\bf y} = \mathrm{sgn}({\bf A} {\bf f} - \boldsymbol{\tau})$$ is approximated by $${\bf f}_{\rm CP}$$ given in (4.4) with error   $$\left\| {\bf f}- {\bf f}_{\rm CP} \right\|_2 \le \varepsilon r.$$ Proof. We again use the notation (4.3) introducing the ‘lifted’ objects $$\widetilde{{\bf f}}$$, $$\widetilde{{\bf D}}$$ and $$\widetilde{{\bf A}}$$. Moreover, we set $$\widetilde{{\bf g}} := \begin{bmatrix} {\bf f}_{\rm CP} \\ \hline \sigma \end{bmatrix}$$. We claim that $$\widetilde{{\bf f}}$$ and $$\widetilde{{\bf g}}$$ are effectively $$s'$$-analysis sparse, $$s' := (r^2 / \sigma^2 + 1)(s+1)$$. For $$\widetilde{{\bf g}}$$, this indeed follows from $$\| \widetilde{{\bf D}}^* \widetilde{{\bf g}} \|_2 = \|\widetilde{{\bf g}}\|_2 = \sqrt{\|{\bf f}_{\rm CP}\|_2^2 + \sigma^2} \ge \sigma$$ and   $$\|\widetilde{{\bf D}}^* \widetilde{{\bf g}}\|_1 = \left\| \begin{bmatrix} {\bf D}^* {\bf f}_{\rm CP} \\ \hline \sigma \end{bmatrix} \right\|_1 = \| {\bf D}^* {\bf f}_{\rm CP}\|_1 + \sigma \le \| {\bf D}^* {\bf f} \|_1 + \sigma \le \sqrt{s} r + \sigma \le \sqrt{r^2 + \sigma^2} \sqrt{s+1}.$$ We also notice that $$\widetilde{{\bf A}}$$ satisfies $$\widetilde{{\bf D}}$$-TES$$(s', \varepsilon')$$, $$\varepsilon' := \dfrac{r \sigma}{r^2 + \sigma^2} \varepsilon$$, with failure probability at most $$\gamma \exp(-c m {\varepsilon'}^2) \le \gamma \exp(- c' m \varepsilon^2 r^2 \sigma^2 / (r^2+\sigma^2)^2)$$, since $$m \ge C {\varepsilon'}^{-6} s' \ln(eN/s')$$ is ensured by our assumption on $$m$$.
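To see how the assumption on $$m$$ enters, one can expand the product $${\varepsilon'}^{-6} s'$$ using the definitions of $$\varepsilon'$$ and $$s'$$ just given (a quick check that tracks orders of magnitude only and leaves the absolute constant $$C$$ unspecified):   $${\varepsilon'}^{-6} s' = \left(\frac{r^2+\sigma^2}{r \sigma}\right)^{6} \varepsilon^{-6} \left(\frac{r^2}{\sigma^2}+1\right)(s+1) = \left(\frac{r}{\sigma}+\frac{\sigma}{r}\right)^{6} \left(\frac{r^2}{\sigma^2}+1\right) \varepsilon^{-6} (s+1).$$ Since $$s+1 \le 2s$$ and $$\ln(eN/s') \le \ln(eN/s)$$, this is of the order of the lower bound on $$m$$ assumed in Theorem 8.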
Finally, we observe that both $$\widetilde{{\bf f}}/ \|\widetilde{{\bf f}}\|_2$$ and $$\widetilde{{\bf g}}/ \|\widetilde{{\bf g}}\|_2$$ are $$\ell_2$$-normalized effectively $$s'$$-analysis sparse and have the same sign observations $$\mathrm{sgn}(\widetilde{{\bf A}} \widetilde{{\bf f}}) = \mathrm{sgn}(\widetilde{{\bf A}} \widetilde{{\bf g}}) = {\bf y}$$. Thus,   $$\left\| \frac{\widetilde{{\bf f}}}{\|\widetilde{{\bf f}}\|_2} - \frac{\widetilde{{\bf g}}}{\|\widetilde{{\bf g}}\|_2} \right\|_2 \le \varepsilon'.$$ In view of Lemma 1, we derive   $$\left\| \frac{{\bf f}}{\sigma} - \frac{{\bf f}_{\rm CP}}{\sigma} \right\|_2 \le \frac{r^2 + \sigma^2}{\sigma^2} \varepsilon', \qquad \mbox{hence} \qquad \|{\bf f} - {\bf f}_{\rm CP}\|_2 \le \frac{r^2 + \sigma^2}{\sigma} \varepsilon' = r \varepsilon.$$ This establishes the announced result. □ 4.3 Hard thresholding Given a signal $${\bf f} \in \mathbb{R}^n$$ observed via $${\bf y} = \mathrm{sgn}({\bf A} {\bf f} - \boldsymbol{\tau})$$ with $$\tau_1,\ldots,\tau_m \sim \mathscr{N}(0,\sigma^2)$$, the hard thresholding scheme we consider here consists in outputting the signal   $$\label{fht} {\bf f}_{\rm HT} = \frac{-\sigma^2}{\langle \boldsymbol{\tau}, {\bf y} \rangle} {\bf D} {\bf z}, \qquad {\bf z} = H_{t-1}({\bf D}^* {\bf A}^* {\bf y}).$$ (4.5) Theorem 9 Let $$\varepsilon, r, \sigma > 0$$, let $$m \ge C \kappa (r/\sigma+\sigma/r)^9 \varepsilon^{-9} s \ln(eN/s)$$ and let $${\bf A} \in \mathbb{R}^{m \times n}$$ be populated by independent standard normal random variables. Furthermore, let $$\tau_1,\ldots,\tau_m$$ be independent normal random variables with mean zero and variance $$\sigma^2$$ that are also independent from the entries of $${\bf A}$$.
Then, with failure probability at most $$\gamma \exp(-c m \varepsilon^2 r^2 \sigma^2/(r^2+\sigma^2)^2)$$, any $$s$$-synthesis-sparse and effectively $$\kappa s$$-analysis-sparse signal $${\bf f} \in \mathbb{R}^n$$ satisfying $$\|{\bf f}\|_2 \le r$$ and observed via $${\bf y} = \mathrm{sgn}({\bf A} {\bf f} - \boldsymbol{\tau})$$ is approximated by $${\bf f}_{\rm HT}$$ given in (4.5) for $$t:=\lceil 16 (\varepsilon'/8)^{-2} \kappa (s+1) \rceil$$ with error   $$\left\| {\bf f}- {\bf f}_{\rm HT} \right\|_2 \le \varepsilon r.$$ Proof. We again use the notation (4.3) for the ‘lifted’ objects $$\widetilde{{\bf f}}$$, $$\widetilde{{\bf D}}$$ and $$\widetilde{{\bf A}}$$. First, we notice that $$\widetilde{{\bf f}}$$ is $$(s+1)$$-synthesis sparse (relative to $$\widetilde{{\bf D}}$$), as well as effectively $$\kappa (s+1)$$-analysis sparse, since $$\widetilde{{\bf D}}^* \widetilde{{\bf f}} = \begin{bmatrix} {\bf D}^* {\bf f} \\ \hline \sigma \end{bmatrix}$$ satisfies   $$\frac{\|\widetilde{{\bf D}}^* \widetilde{{\bf f}}\|_1}{\|\widetilde{{\bf D}}^* \widetilde{{\bf f}}\|_2} = \frac{\|{\bf D}^* {\bf f}\|_1 + \sigma}{\sqrt{\|{\bf D}^*{\bf f}\|_2^2 + \sigma^2}} \le \frac{\sqrt{\kappa s}\|{\bf D}^* {\bf f}\|_2 + \sigma}{\sqrt{\|{\bf D}^*{\bf f}\|_2^2 +\sigma^2}} \le \sqrt{\kappa s + 1} \le \sqrt{\kappa (s+1)}.$$ Next, we observe that the matrix $$\widetilde{{\bf A}}$$, populated by independent standard normal random variables, satisfies $$\widetilde{{\bf D}}$$-SPEP$$(s+1+t,\varepsilon'/8)$$, $$\varepsilon ' := \dfrac{r \sigma}{2(r^2 + \sigma^2)} \varepsilon$$, with failure probability at most $$\gamma \exp(-c m {\varepsilon'}^2 r^2)$$, since $$m \ge C (\varepsilon'/8)^{-7} (s+1+t) \ln(e(N+1)/(s+1+t))$$ is ensured by our assumption on $$m$$.
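As a rough check of this last claim (a sketch that tracks orders of magnitude only, assumes $$\varepsilon' \le 1$$ so that $$t \ge s+1$$, and uses absolute constants $$C_1, C_2, C_3$$ introduced here purely for illustration): since $$1/\varepsilon' = 2(r/\sigma+\sigma/r)/\varepsilon$$ and $$s+1+t \le 2t \le C_1 \kappa s\, {\varepsilon'}^{-2}$$, one has   $$(\varepsilon'/8)^{-7} (s+1+t) \le C_2\, \kappa s\, {\varepsilon'}^{-9} = C_3\, \kappa \left(\frac{r}{\sigma}+\frac{\sigma}{r}\right)^{9} \varepsilon^{-9} s,$$ which is of the order of the lower bound on $$m$$ assumed in Theorem 9.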
Finally, since $${\bf y} = \mathrm{sgn}(\widetilde{{\bf A}} \widetilde{{\bf f}})$$, Theorem 6 implies that   $$\left\| \frac{\widetilde{{\bf f}}}{\|\widetilde{{\bf f}}\|_2} - \frac{\widetilde{{\bf g}}}{\|\widetilde{{\bf g}}\|_2} \right\|_2 \le \varepsilon',$$ where $$\widetilde{{\bf g}} \in \mathbb{R}^{n+1}$$ is the output of the ‘lifted’ hard thresholding scheme, i.e.   $$\widetilde{{\bf g}} = \widetilde{{\bf D}} \widetilde{{\bf z}}, \qquad \widetilde{{\bf z}} = H_{t} (\widetilde{{\bf D}}^*\widetilde{{\bf A}}^* {\bf y}).$$ In particular, looking at the last coordinate, this inequality yields   $$\label{LBg} \left| \frac{\sigma}{\|\widetilde{{\bf f}}\|_2} - \frac{g_{n+1}}{\|\widetilde{{\bf g}}\|_2} \right| \le \varepsilon', \quad \mbox{hence} \quad \frac{|g_{n+1}|}{\|\widetilde{{\bf g}}\|_2} \ge \frac{\sigma}{\|\widetilde{{\bf f}}\|_2} - \varepsilon' \ge \frac{\sigma}{\sqrt{r^2+\sigma^2}} - \frac{\sigma}{2 \sqrt{r^2 + \sigma^2}} = \frac{\sigma}{2 \sqrt{r^2 + \sigma^2}}.$$ (4.6) Now let us also observe that   $$\widetilde{{\bf z}} = H_{t} \left(\begin{bmatrix} {\bf D}^* {\bf A}^* {\bf y} \\ \hline - \langle \boldsymbol{\tau},{\bf y} \rangle / \sigma \end{bmatrix} \right) = \left\{ \begin{matrix} \begin{bmatrix} H_{t}({\bf D}^* {\bf A}^* {\bf y}) \\ \hline 0 \end{bmatrix}, \\ \mbox{or}\hspace{30mm}\\ \begin{bmatrix} H_{t-1}({\bf D}^* {\bf A}^* {\bf y}) \\ \hline - \langle \boldsymbol{\tau},{\bf y} \rangle / \sigma \end{bmatrix} , \end{matrix} \right. \quad \mbox{hence} \quad \widetilde{{\bf g}} = \widetilde{{\bf D}} \widetilde{{\bf z}} = \left\{ \begin{matrix} \begin{bmatrix} {\bf D}(H_{t}({\bf D}^* {\bf A}^* {\bf y})) \\ \hline 0 \end{bmatrix}, \\ \mbox{or}\hspace{35mm}\\ \begin{bmatrix} {\bf D}(H_{t-1}({\bf D}^* {\bf A}^* {\bf y})) \\ \hline - \langle \boldsymbol{\tau},{\bf y} \rangle / \sigma \end{bmatrix}. \end{matrix} \right.$$ In view of (4.6), the latter option prevails. It is then apparent that $${\bf f}_{\rm HT} = \sigma {\bf g}_{[n]} / g_{n+1}$$.
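To spell this out: (4.6) forces $$g_{n+1} \not= 0$$, which rules out the first option, and with the second option one has $${\bf g}_{[n]} = {\bf D}(H_{t-1}({\bf D}^* {\bf A}^* {\bf y}))$$ and $$g_{n+1} = - \langle \boldsymbol{\tau},{\bf y} \rangle / \sigma$$, so that   $$\sigma \frac{{\bf g}_{[n]}}{g_{n+1}} = \sigma \, \frac{{\bf D}(H_{t-1}({\bf D}^* {\bf A}^* {\bf y}))}{- \langle \boldsymbol{\tau},{\bf y} \rangle / \sigma} = \frac{-\sigma^2}{\langle \boldsymbol{\tau}, {\bf y} \rangle} {\bf D}(H_{t-1}({\bf D}^* {\bf A}^* {\bf y})),$$ which is the expression of $${\bf f}_{\rm HT}$$ in (4.5).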
Lemma 1 gives   $$\left\| \frac{{\bf f}}{\sigma} - \frac{{\bf f}_{\rm HT}}{\sigma} \right\|_2 \le \frac{\| \widetilde{{\bf f}} \|_2}{\sigma} \frac{\| \widetilde{{\bf g}} \|_2}{|g_{n+1}|} \varepsilon' \le \frac{\| \widetilde{{\bf f}} \|_2}{\sigma} \frac{2\sqrt{r^2+\sigma^2}}{\sigma} \frac{r \sigma}{2(r^2 + \sigma^2)} \varepsilon = \frac{\| \widetilde{{\bf f}} \|_2}{\sigma} \frac{r}{\sqrt{r^2+\sigma^2}} \varepsilon,$$ so that   $$\| {\bf f} - {\bf f}_{\rm HT} \|_2 \le \|\widetilde{{\bf f}}\|_2 \frac{r}{\sqrt{r^2+\sigma^2}} \varepsilon \le r \varepsilon.$$ This establishes the announced result. □ 5. Postponed proofs and further remarks This final section contains the theoretical justification of the technical properties underlying our results, followed by a few points of discussion around them. 5.1 Proof of $${\mathbf D}$$-$${\rm SPEP}$$ The Gaussian width turns out to be a useful tool in our proofs. For a set $$K \subseteq \mathbb{R}^n$$, it is defined by   $$w(K) = \mathbb{E} \left[\sup_{{\bf f} \in K} \langle {\bf f}, {\bf g} \rangle \right], \qquad {\bf g} \in \mathbb{R}^n \mbox{ is a standard normal random vector}.$$ We isolate the following two properties. Lemma 2 Let $$K \subseteq \mathbb{R}^n$$ be a linear space and $$K_1,\ldots,K_L \subseteq \mathbb{R}^n$$ be subsets of the unit sphere $$S^{n-1}$$. (i) $$k / \sqrt{k+1} \le w(K \cap S^{n-1}) \le \sqrt{k} \qquad k:= \dim (K)$$; (ii) $$\displaystyle{w \left(K_1 \cup \ldots \cup K_L \right) \le \max \left\{w(K_1),\ldots,w(K_L) \right\} + 3 \sqrt{\ln(L)}}$$. Proof. (i) By the invariance under orthogonal transformation (see [25, Proposition 2.1]), we can assume that $$K = \mathbb{R}^k \times \left\{(0,\ldots,0) \right\}$$. We then notice that $$\sup_{{\bf f} \in K \cap S^{n-1}} \langle {\bf f}, {\bf g} \rangle = \|(g_1,\ldots,g_k)\|_2$$ is the $$\ell_2$$-norm of a standard normal random vector of dimension $$k$$. We invoke, e.g. [15, Proposition 8.1] to derive the announced result.
(ii) Let us introduce the non-negative random variables   $$\xi_\ell := \sup_{{\bf f} \in K_\ell} \langle {\bf f} , {\bf g} \rangle \quad \ell = 1,\ldots, L ,$$ so that the Gaussian widths of each $$K_\ell$$ and of their union take the form   $$w(K_\ell) = \mathbb{E}(\xi_\ell) \quad \ell = 1,\ldots, L \qquad \mbox{and} \qquad w \left(K_1 \cup \cdots \cup K_L \right) = \mathbb{E} \left(\max_{\ell = 1, \ldots, L} \xi_\ell \right).$$ By the concentration of measure inequality (see e.g. [15, Theorem 8.40]) applied to the function $$F: {\bf x} \in \mathbb{R}^n \mapsto \sup_{{\bf f} \in K_\ell} \langle {\bf f}, {\bf x} \rangle$$, which is a Lipschitz function with constant $$1$$, each $$\xi_\ell$$ satisfies   $$\mathbb{P}(\xi_\ell \ge \mathbb{E}(\xi_\ell) + t) \le \exp \left(-t^2/2 \right)\!.$$ Because each $$\mathbb{E}(\xi_\ell)$$ is no larger than $$\max_\ell \mathbb{E}(\xi_\ell) = \max_\ell w(K_\ell) =: \omega$$, we also have   $$\mathbb{P} (\xi_\ell \ge \omega + t) \le \exp \left(-t^2/2 \right)\!.$$ Setting $$v:= \sqrt{2 \ln(L)}$$, we now calculate   \begin{align*} \mathbb{E} \left(\max_{\ell =1,\ldots, L} \xi_\ell \right) & = \int_0^\infty \mathbb{P} \left(\max_{\ell =1,\ldots, L} \xi_\ell \ge u \right) \,{\rm{d}}u = \left(\int_0^{\omega+v} + \int_{\omega+v}^\infty \right) \mathbb{P} \left(\max_{\ell = 1,\ldots, L} \xi_\ell \ge u \right) \,{\rm{d}}u\\ & \le \int_0^{\omega + v} 1\, {\rm{d}}u + \int_{\omega + v}^\infty \sum_{\ell=1}^L \mathbb{P} \left(\xi_\ell \ge u \right) \,{\rm{d}}u = \omega + v + \sum_{\ell=1}^L \int_v^\infty \mathbb{P} \left(\xi_\ell \ge \omega + t \right) \,{\rm{d}}t\\ & \le \omega + v + L \int_v^\infty \exp \left(-t^2/2 \right)\, {\rm{d}}t \le \omega + v + L \frac{\exp(-v^2/2)}{v} \\ & = \omega + \sqrt{2 \ln(L)} + L \frac{1/L}{\sqrt{2 \ln(L)}} \le \omega + c \sqrt{\ln(L)}, \end{align*} where $$c=\sqrt{2} + (\sqrt{2} \ln(2))^{-1} \le 3$$. 
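As a sanity check of property (ii), the bound can be compared with a Monte Carlo estimate in a simple special case. The sketch below is illustrative only: it assumes the sets $$K_\ell$$ are the unit spheres of the $$\binom{N}{s}$$ coordinate subspaces of $$\mathbb{R}^N$$ (i.e. the identity dictionary), for which the supremum of $$\langle {\bf f}, {\bf g} \rangle$$ over the union equals the $$\ell_2$$-norm of the $$s$$ largest absolute entries of $${\bf g}$$. The function names, the values of $$N$$ and $$s$$, and the number of trials are assumptions made for the example.

```python
import math
import numpy as np

def width_sparse_sphere(N, s, trials=2000, seed=0):
    """Monte Carlo estimate of the Gaussian width of the set of unit-norm s-sparse vectors in R^N."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((trials, N))
    # sup over unit-norm s-sparse f of <f, g> = l2-norm of the s largest |g_i|
    top = np.sort(np.abs(g), axis=1)[:, -s:]
    return float(np.mean(np.linalg.norm(top, axis=1)))

def lemma2_bound(N, s):
    """max_l w(K_l) + 3 sqrt(ln L), using w(K_l) <= sqrt(s) from (i) and L = binom(N, s)."""
    L = math.comb(N, s)
    return math.sqrt(s) + 3.0 * math.sqrt(math.log(L))

N, s = 50, 5
estimate = width_sparse_sphere(N, s)
bound = lemma2_bound(N, s)
# Relaxation via the standard inequality ln binom(N, s) <= s ln(eN/s).
relaxed = 4.0 * math.sqrt(s * math.log(math.e * N / s))
print(f"estimate={estimate:.3f}  bound={bound:.3f}  relaxed={relaxed:.3f}")
```

The estimate is expected to sit well below the bound, since the constant $$3$$ is not optimized; the lower bound $$s/\sqrt{s+1}$$ from (i) also applies because the union contains each $$K_\ell$$.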
We have shown that $$w \left(K_1 \cup \cdots \cup K_L \right) \le \max_{\ell} w(K_\ell) + 3 \sqrt{\ln(L)}$$, as desired. □ We now turn our attention to proving the awaited theorem. Proof of Theorem 3. According to [25, Proposition 4.3], with $${\bf A}' := (\sqrt{2/\pi}/m) {\bf A}$$, we have   $$\left| \langle {\bf A}' {\bf f}, \mathrm{sgn}({\bf A}' {\bf g}) \rangle - \langle {\bf f}, {\bf g} \rangle \right| \le \delta,$$ for all $${\bf f},{\bf g} \in {\bf D}({\it {\Sigma}}_s^N) \cap S^{n-1}$$ provided $$m \ge C \delta^{-7} w({\bf D}({\it {\Sigma}}_s^N) \cap S^{n-1})^2$$, so it is enough to upper bound $$w({\bf D}({\it {\Sigma}}_s^N) \cap S^{n-1})$$ appropriately. To do so, with $${\it {\Sigma}}_S^N$$ denoting the space $$\{{\bf x} \in \mathbb{R}^N: {\rm supp}({\bf x}) \subseteq S \}$$ for any $$S \subseteq \{1,\ldots, N \}$$, we use Lemma 2 to write   \begin{align*} w({\bf D}({\it {\Sigma}}_s^N) \cap S^{n-1}) & = w \bigg(\bigcup_{|S|=s} \left\{{\bf D}({\it {\Sigma}}_S^N) \cap S^{n-1} \right\} \bigg) \underset{(ii)}{\le} \max_{|S|=s} w({\bf D}({\it {\Sigma}}_S^N) \cap S^{n-1}) + 3 \sqrt{\ln \left(\binom{N}{s} \right)}\\ & \underset{(i)}{\le} \sqrt{s} + 3 \sqrt{s \ln \left(eN/s \right)} \le 4 \sqrt{s \ln \left(eN/s \right)}. \end{align*} The result is now immediate. □ 5.2 Proof of TES We propose two approaches for proving Theorem 4. One uses again the notion of Gaussian width, and the other one relies on covering numbers. The necessary results are isolated in the following lemma. Lemma 3 The set of $$\ell_2$$-normalized effectively $$s$$-analysis-sparse signals satisfies (i) $$\displaystyle{w \left(({\bf D}^*)^{-1}({\it {\Sigma}}_s^{N,{\rm eff}}) \cap S^{n-1} \right)} \le C \sqrt{s \ln(eN/s)},$$ (ii) $$\displaystyle{\mathscr{N} \left(({\bf D}^*)^{-1}({\it {\Sigma}}_s^{N,{\rm eff}}) \cap S^{n-1} , \rho \right) \le \binom{N}{t}\left(1 + \frac{8}{\rho} \right)^t, \qquad t := \lceil 4 \rho^{-2}} s \rceil.$$ Proof. 
(i) By the definition of the Gaussian width for $$\mathscr{K}_s: = ({\bf D}^*)^{-1}({\it {\Sigma}}_s^{N,{\rm eff}}) \cap S^{n-1}$$, with $${\bf g} \in \mathbb{R}^n$$ denoting a standard normal random vector,   $$\label{slep} w(\mathscr{K}_s) = \mathbb{E} \left[\sup_{\substack{{\bf D}^* {\bf f} \in {\it {\Sigma}}_s^{N,{\rm eff}} \\ \|{\bf f}\|_2 = 1}} \langle {\bf f} , {\bf g} \rangle \right] = \mathbb{E} \left[\sup_{\substack{{\bf D}^* {\bf f} \in {\it {\Sigma}}_s^{N,{\rm eff}} \\ \|{\bf D}^* {\bf f}\|_2 = 1}} \langle {\bf D} {\bf D}^* {\bf f} , {\bf g} \rangle \right] \le \mathbb{E} \left[\sup_{\substack{{\bf x} \in {\it {\Sigma}}_s^{N,{\rm eff}} \\ \|{\bf x}\|_2 = 1}} \langle {\bf D} {\bf x} , {\bf g} \rangle \right].$$ (5.1) In view of $$\|{\bf D}\|_{2 \to 2} = 1$$, we have, for any $${\bf x},{\bf x}' \in {\it {\Sigma}}_s^{N,{\rm eff}}$$ with $$\|{\bf x}\|_2 = \|{\bf x}'\|_2 =1$$,   \begin{align*} \mathbb{E} \left(\langle {\bf D} {\bf x}, {\bf g} \rangle - \langle {\bf D} {\bf x}', {\bf g}' \rangle \right)^2 &= \mathbb{E} \left[\langle {\bf D} {\bf x}, {\bf g} \rangle ^2 \right] + \mathbb{E} \left[\langle {\bf D} {\bf x}', {\bf g}' \rangle ^2 \right] = \|{\bf D} {\bf x}\|_2^2 + \|{\bf D} {\bf x}'\|_2^2 \le \|{\bf x}\|_2^2 + \|{\bf x}'\|_2^2\\ &= \mathbb{E} \left(\langle {\bf x}, {\bf g} \rangle - \langle {\bf x}', {\bf g}' \rangle \right)^2. \end{align*} Applying Slepian’s lemma (see e.g. [15, Lemma 8.25]), we obtain   $$w(\mathscr{K}_s) \le \mathbb{E} \left[\sup_{\substack{{\bf x} \in {\it {\Sigma}}_s^{N,{\rm eff}} \\ \|{\bf x}\|_2 = 1}} \langle {\bf x} , {\bf g} \rangle \right] =w({\it {\Sigma}}_s^{N,{\rm eff}} \cap S^{N-1}).$$ The latter is known to be bounded by $$C \sqrt{s \ln (eN/s)}$$, see [25, Lemma 2.3]. (ii) The covering number $$\mathscr{N}(\mathscr{K}_s,\rho)$$ is bounded above by the maximal number $$\mathscr{P}(\mathscr{K}_s,\rho)$$ of elements in $$\mathscr{K}_s$$ that are separated by a distance $$\rho$$. 
We claim that $$\mathscr{P} (\mathscr{K}_s, \rho) \le \mathscr{P}({\it {\Sigma}}_t^N \cap B_2^N, \rho/2)$$. To justify this claim, let us consider a maximal $$\rho$$-separated set $$\{{\bf f}^1,\ldots,{\bf f}^L\}$$ of signals in $$\mathscr{K}_s$$. For each $$i$$, let $$T_i \subseteq \{1, \ldots, N \}$$ denote an index set of $$t$$ largest absolute entries of $${\bf D}^* {\bf f}^i$$. We write   $$\rho < \|{\bf f}^i - {\bf f}^j \|_2 = \|{\bf D}^* {\bf f}^i - {\bf D}^* {\bf f}^j \|_2 \le \|({\bf D}^* {\bf f}^i)_{T_i} - ({\bf D}^* {\bf f}^j)_{T_j} \|_2 + \| ({\bf D}^* {\bf f}^i)_{\overline{T_i}} \|_2 + \| ({\bf D}^* {\bf f}^j)_{\overline{T_j}} \|_2.$$ Invoking [15, Theorem 2.5], we observe that   $$\| ({\bf D}^* {\bf f}^i)_{\overline{T_i}} \|_2 \le \frac{1}{2\sqrt{t}} \|{\bf D}^* {\bf f}^i \|_1 \le \frac{\sqrt{s}}{2 \sqrt{t}} \|{\bf D}^* {\bf f}^i \|_2 = \frac{\sqrt{s}}{2 \sqrt{t}},$$ and similarly for $$j$$ instead of $$i$$. Thus, we obtain   $$\rho < \|({\bf D}^* {\bf f}^i)_{T_i} - ({\bf D}^* {\bf f}^j)_{T_j} \|_2 + \sqrt{\frac{s}{t}} \le \|({\bf D}^* {\bf f}^i)_{T_i} - ({\bf D}^* {\bf f}^j)_{T_j} \|_2 + \frac{\rho}{2}, \quad \mbox{i.e.} \; \|({\bf D}^* {\bf f}^i)_{T_i} - ({\bf D}^* {\bf f}^j)_{T_j} \|_2 > \frac{\rho}{2}.$$ Since we have uncovered a set of $$L = \mathscr{P}(\mathscr{K}_s,\rho)$$ points in $${\it {\Sigma}}_t^N \cap B_2^N$$ that are $$(\rho/2)$$ separated, the claimed inequality is proved. We conclude by recalling that $$\mathscr{P}({\it {\Sigma}}_t^N \cap B_2^N, \rho/2)$$ is bounded above by $$\mathscr{N}({\it {\Sigma}}_t^N \cap B_2^N, \rho/4)$$, which is itself bounded above by $$\dbinom{N}{t} \left(1 + \dfrac{2}{\rho/4} \right)^t$$. □ We can now turn our attention to proving the awaited theorem. Proof of Theorem 4. 
With $$\mathscr{K}_s = ({\bf D}^*)^{-1}({\it {\Sigma}}_s^{N,{\rm eff}}) \cap S^{n-1}$$, the conclusion holds when $$m \ge C \varepsilon^{-6} w(\mathscr{K}_s)^2$$ or when $$m \ge C \varepsilon^{-1} \ln (\mathscr{N}(\mathscr{K}_s,c \varepsilon))$$, according to [26, Theorem 1.5] or to [3, Theorem 1.5], respectively. It now suffices to call upon Lemma 3. Note that the latter option yields better powers of $$\varepsilon^{-1}$$, but a less favorable failure probability. □ 5.3 Further remarks We conclude this theoretical section with two comments on the sign product embedding property and the tessellation property in the dictionary case. Remark 4 $${\bf D}$$-SPEP cannot hold for an arbitrary dictionary $${\bf D}$$ if synthesis sparsity were replaced by effective synthesis sparsity. This is because the set of effectively $$s$$-synthesis-sparse signals can be the whole space $$\mathbb{R}^n$$. Indeed, let $${\bf f} \in \mathbb{R}^n$$ be arbitrary and write $${\bf f} = {\bf D} {\bf u}$$ for some $${\bf u} \in \mathbb{R}^N$$. Let us also pick a nonzero $$(s-1)$$-sparse vector $${\bf v} \in \ker {\bf D}$$; there are tight frames for which this is possible, e.g. the concatenation of two orthogonal matrices. For $$\varepsilon > 0$$ small enough, we have   $$\frac{\|{\bf v} + \varepsilon {\bf u} \|_1}{\|{\bf v} + \varepsilon {\bf u}\|_2} \le \frac{\|{\bf v}\|_1 + \varepsilon \|{\bf u}\|_1}{\|{\bf v}\|_2 - \varepsilon \|{\bf u}\|_2} \le \frac{\sqrt{s-1} \|{\bf v}\|_2 + \varepsilon \|{\bf u}\|_1}{\|{\bf v}\|_2 - \varepsilon \|{\bf u}\|_2} \le \sqrt{s},$$ so that the coefficient vector $${\bf v} + \varepsilon {\bf u}$$ is effectively $$s$$-sparse, hence so is $$(1/\varepsilon){\bf v} + {\bf u}$$. It follows that $${\bf f} = {\bf D}((1/\varepsilon){\bf v} + {\bf u})$$ is effectively $$s$$-synthesis sparse. Remark 5 Theorem 3 easily implies a tessellation result for $${\bf D}({\it {\Sigma}}_s^N) \,\cap\, S^{n-1}$$, the ‘synthesis-sparse sphere’. 
Precisely, under the assumptions of the theorem (with a change of the constant $$C$$), $${\bf D}$$-SPEP$$(2s,\delta/2)$$ holds. Then, one can derive   $$[{\bf g},{\bf h} \in {\bf D}({\it {\Sigma}}_s) \cap S^{n-1} : \; \mathrm{sgn}({\bf A} {\bf g}) = \mathrm{sgn}({\bf A} {\bf h})] \Longrightarrow [\|{\bf g} - {\bf h}\|_2 \le \delta].$$ To see this, with $$\boldsymbol{\varepsilon} := \mathrm{sgn}({\bf A} {\bf g}) = \mathrm{sgn}({\bf A} {\bf h})$$ and with $${\bf f} := ({\bf g}-{\bf h})/\|{\bf g}-{\bf h}\|_2 \in {\bf D}({\it {\Sigma}}_{2s}) \cap S^{n-1}$$, we have   $$\left| \frac{\sqrt{2/\pi}}{m} \langle {\bf A} {\bf f} , \boldsymbol{\varepsilon}\rangle - \langle {\bf f}, {\bf g} \rangle \right| \le \frac{\delta}{2}, \qquad \left| \frac{\sqrt{2/\pi}}{m} \langle {\bf A} {\bf f} , \boldsymbol{\varepsilon} \rangle - \langle {\bf f}, {\bf h} \rangle \right| \le \frac{\delta}{2},$$ so by the triangle inequality $$|\langle {\bf f}, {\bf g} - {\bf h} \rangle| \le \delta$$, i.e. $$\|{\bf g} -{\bf h}\|_2 \le \delta$$, as announced. Acknowledgements The authors would like to thank the AIM SQuaRE program that funded and hosted our initial collaboration. Funding NSF grant number [CCF-1527501], ARO grant number [W911NF-15-1-0316] and AFOSR grant number [FA9550-14-1-0088] to R.B.; Alfred P. Sloan Fellowship and NSF Career grant number [1348721 to D.N.]; NSERC grant number [22R23068 to Y.P.]; and NSF Postdoctoral Research Fellowship grant number [1400558 to M.W.]. Footnotes 1 A signal $${\bf x} \in \mathbb{R}^N$$ is called $$s$$-sparse if $$\|{\bf x}\|_0 := |\mathrm{supp}({\bf x})| \leq s \ll N$$. 2 Here, ‘dictionary sparsity’ means effective $$s$$-analysis sparsity if $$\widehat{{\bf f}}$$ is produced by convex programming and genuine $$s$$-synthesis sparsity together with effective $$\kappa s$$-analysis sparsity if $$\widehat{{\bf f}}$$ is produced by hard thresholding. 
3 In particular, [25, Proposition 2.1] applies to the slightly different notion of mean width defined as $$\mathbb{E} \left[\sup_{{\bf f} \in K - K} \langle {\bf f}, {\bf g} \rangle \right]$$.
References
1. (2016) Compressive Sensing webpage. http://dsp.rice.edu/cs (accessed 24 June 2016).
2. Baraniuk R., Foucart S., Needell D., Plan Y. & Wootters M. (2017) Exponential decay of reconstruction error from binary measurements of sparse signals. IEEE Trans. Inform. Theory, 63, 3368–3385.
3. Bilyk D. & Lacey M. T. (2015) Random tessellations, restricted isometric embeddings, and one bit sensing. arXiv preprint arXiv:1512.06697.
4. Blumensath T. (2011) Sampling and reconstructing signals from a union of linear subspaces. IEEE Trans. Inform. Theory, 57, 4660–4671.
5. Boufounos P. T. & Baraniuk R. G. (2008) 1-Bit compressive sensing. Proceedings of the 42nd Annual Conference on Information Sciences and Systems (CISS), IEEE, pp. 16–21.
6. Candès E. J., Demanet L., Donoho D. L. & Ying L. (2006) Fast discrete curvelet transforms. Multiscale Model. Simul., 5, 861–899.
7. Candès E. J. & Donoho D. L. (2004) New tight frames of curvelets and optimal representations of objects with piecewise $$C^2$$ singularities. Comm. Pure Appl. Math., 57, 219–266.
8. Candès E. J., Eldar Y. C., Needell D. & Randall P. (2010) Compressed sensing with coherent and redundant dictionaries. Appl. Comput. Harmon. Anal., 31, 59–73.
9. Daubechies I. (1992) Ten Lectures on Wavelets. Philadelphia, PA: SIAM.
10. Davenport M., Needell D. & Wakin M. B. (2012) Signal space CoSaMP for sparse recovery with redundant dictionaries. IEEE Trans. Inform. Theory, 59, 6820–6829.
11. Elad M., Milanfar P. & Rubinstein R. (2007) Analysis versus synthesis in signal priors. Inverse Probl., 23, 947.
12. Eldar Y. C. & Kutyniok G. (2012) Compressed Sensing: Theory and Applications. Cambridge, UK: Cambridge University Press.
13. Feichtinger H. & Strohmer T. (eds.) (1998) Gabor Analysis and Algorithms. Boston, MA: Birkhäuser.
14. Foucart S. (2016) Dictionary-sparse recovery via thresholding-based algorithms. J. Fourier Anal. Appl., 22, 6–19.
15. Foucart S. & Rauhut H. (2013) A Mathematical Introduction to Compressive Sensing. Basel, Switzerland: Birkhäuser.
16. Giryes R., Nam S., Elad M., Gribonval R. & Davies M. E. (2014) Greedy-like algorithms for the cosparse analysis model. Linear Algebra Appl., 441, 22–60.
17. Gopi S., Netrapalli P., Jain P. & Nori A. (2013) One-bit compressed sensing: Provable support and vector recovery. Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta, GA, pp. 154–162.
18. Jacques L., Degraux K. & De Vleeschouwer C. (2013) Quantized iterative hard thresholding: bridging 1-bit and high-resolution quantized compressed sensing. Proceedings of the 10th International Conference on Sampling Theory and Applications (SampTA), Bremen, Germany, pp. 105–108.
19. Jacques L., Laska J. N., Boufounos P. T. & Baraniuk R. G. (2013) Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors. IEEE Trans. Inform. Theory, 59, 2082–2102.
20. Knudson K., Saab R. & Ward R. (2016) One-bit compressive sensing with norm estimation. IEEE Trans. Inform. Theory, 62, 2748–2758.
21. Krahmer F., Needell D. & Ward R. (2015) Compressive sensing with redundant dictionaries and structured measurements. SIAM J. Math. Anal., 47, 4606–4629.
22. Nam S., Davies M. E., Elad M. & Gribonval R. (2013) The cosparse analysis model and algorithms. Appl. Comput. Harmon. Anal., 34, 30–56.
23. Peleg T. & Elad M. (2013) Performance guarantees of the thresholding algorithm for the cosparse analysis model. IEEE Trans. Inform. Theory, 59, 1832–1845.
24. Plan Y. & Vershynin R. (2013a) One-bit compressed sensing by linear programming. Comm. Pure Appl. Math., 66, 1275–1297.
25. Plan Y. & Vershynin R. (2013b) Robust 1-bit compressed sensing and sparse logistic regression: a convex programming approach. IEEE Trans. Inform. Theory, 59, 482–494.
26. Plan Y. & Vershynin R. (2014) Dimension reduction by random hyperplane tessellations. Discrete Comput. Geom., 51, 438–461.
27. Rauhut H., Schnass K. & Vandergheynst P. (2008) Compressed sensing and redundant dictionaries. IEEE Trans. Inform. Theory, 54, 2210–2219.
28. Saab R., Wang R. & Yilmaz Ö. (2016) Quantization of compressive samples with stable and robust recovery. Appl. Comput. Harmon. Anal., to appear.
29. Starck J.-L., Elad M. & Donoho D. (2004) Redundant multiscale transforms and their application for morphological component separation. Advances in Imaging and Electron Physics, 132, 287–348.
30. Yan M., Yang Y. & Osher S. (2012) Robust 1-bit compressive sensing using adaptive outlier pursuit. IEEE Trans. Signal Process., 60, 3868–3875.
© The authors 2017. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved. 
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices). For permissions, please e-mail: journals.permissions@oup.com

### Journal

Information and Inference: A Journal of the IMA, Oxford University Press

Published: Mar 1, 2018
