Sparse recovery: from vectors to tensors

Abstract
Recent advances in various fields such as telecommunications, biomedicine and economics, among others, have created enormous amounts of data that are often characterized by their huge size and high dimensionality. It has become evident, from research in the past couple of decades, that sparsity is a flexible and powerful notion when dealing with these data, from both empirical and theoretical viewpoints. In this survey, we review some of the most popular techniques that exploit sparsity for analyzing high-dimensional vectors, matrices and higher-order tensors.

Keywords: high-dimensional data, sparsity, compressive sensing, low-rank matrix recovery, tensors

INTRODUCTION
The problem of sparse recovery is ubiquitous in modern science and engineering applications. In these applications, we are interested in inferring a high-dimensional object, namely a vector, a matrix or a higher-order tensor, from very few observations. Notable examples include identifying key genes driving a complex disease, and reconstructing high-quality images or videos from compressive measurements, among many others. More specifically, consider linear measurements of an n-dimensional object x of the form: \begin{equation} y_k = \langle a_k, x\rangle , \quad k=1,\cdots , m, \end{equation} (1) where 〈·, ·〉 stands for the usual inner product in $${\mathbb {R}}^n$$, and the ak are prespecified n-dimensional vectors. The number of measurements m is typically much smaller than n, so that the linear system (1) is underdetermined whenever m < n. It is thus impossible to recover x from the yk in the absence of any additional assumption. The idea behind sparse recovery is to assume that x actually resides in a subspace whose dimensionality is much smaller than the ambient dimension n.

A canonical example of sparse recovery is the so-called compressive sensing for vectors, where x is assumed to have only a small number of, albeit unknown, nonzero coordinates. More generally, we call a vector $$x\in \mathbb {R}^n$$ k-sparse if it can be represented by up to k elements from a predetermined dictionary. Another common example is the recovery of low-rank matrices, where $$x\in \mathbb {R}^{n_1\times n_2}$$ is assumed to have a rank much smaller than min {n1, n2}. In many practical situations, we are also interested in signals with higher-order multilinear structure. For example, it is natural to represent multispectral images by a third-order multilinear array, or tensor, with the third index corresponding to different spectral bands. Clearly, vectors and matrices can be viewed as first-order and second-order tensors as well. Despite the connection, moving from vectors and matrices to higher-order tensors can present significant new challenges. A common way to address these challenges is to unfold tensors to matrices; see e.g. [1–4]. However, as recently pointed out in [5], the multilinear structure is lost in such matricization and, as a result, methods based on these techniques often lead to suboptimal results.

A general approach to sparse recovery is through solving the following constrained optimization problem: \begin{equation} \min \limits _{z}\ {\mathcal {S}}(z) \quad {\rm subject\ to} \quad y_k = \langle a_k, z\rangle , \quad k = 1, \cdots , m, \end{equation} (2) where $${\mathcal {S}}(\cdot )$$ is an objective function that encourages sparse solutions.
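As a small numerical illustration of the measurement model (1), the following sketch (with hypothetical dimensions n, m and sparsity level k) generates a k-sparse vector and a set of Gaussian measurement vectors; note that the resulting linear system is underdetermined.

import numpy as np

rng = np.random.default_rng(0)
n, m, k = 200, 60, 5              # hypothetical ambient dimension, number of measurements, sparsity

# k-sparse ground truth: k nonzero coordinates at random positions
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.standard_normal(k)

# rows a_k of the sensing matrix drawn i.i.d. from a Gaussian ensemble
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x                         # measurements y_k = <a_k, x>, k = 1, ..., m

print(A.shape, y.shape)           # (60, 200) (60,): far fewer equations than unknowns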
The success of this approach hinges upon several crucial aspects including, among others: how to choose the objective function $${\mathcal {S}}(\cdot )$$; how to solve the optimization problem (2); how to design the sampling vectors ak to facilitate recovery; and what the minimum sample size m is that ensures recovery. There is, by now, an impressive literature addressing these issues when x is a vector or matrix; see e.g. [6–18], among numerous others. In this review, we aim to survey some of the key developments, with a particular focus on applications to image and video analysis.

The rest of the paper is organized as follows. In the sections entitled ‘Recovery of sparse vectors’, ‘Recovery of low-rank matrices’ and ‘Recovery of low-rank higher-order tensors’, we discuss the recovery of sparse vector, low-rank matrix and low-rank tensor signals, respectively. Finally, a couple of illustrative examples in image and video processing are given in the section entitled ‘Applications’.

RECOVERY OF SPARSE VECTORS
We first consider recovering a sparse vector signal, a problem more commonly known as compressive sensing [10–13].

Compressive sensing of sparse signals
With slight abuse of notation, let y be the m-dimensional vector whose coordinates are the measurements yk, and A the m × n matrix whose rows are given by the ak. It is then not hard to see that (1) can be written more compactly as y = Ax, where x is an n-dimensional vector. Following the jargon of compressive sensing, hereafter we shall refer to A as the sensing matrix. Obviously, when m < n, there may be infinitely many z that agree with the measurements in that y = Az. Since x is known a priori to be sparse, it is then natural to seek, among all these solutions, the one that is sparsest. As mentioned before, an obvious way to measure the sparsity of x is its ℓ0 norm: \begin{equation*} \Vert x\Vert _{\ell _0}=|\lbrace i: x_i\ne 0\rbrace |, \end{equation*} where |·| stands for the cardinality of a set, leading to the approach of recovering x by a solution to \begin{equation} \min \limits _{z\in \mathbb {R}^n}\ \Vert z\Vert _{\ell _0} \quad {\rm subject\ to} \quad y=Az. \end{equation} (3) Under mild regularity conditions on the sensing matrix A, it can be shown that the solution to (3) is indeed well defined and unique, and thus correctly recovers x [9]. However, it is also well known [19] that solving (3) is NP-hard in general and thus computationally infeasible even for moderate-sized problems. The most popular way to overcome this challenge is the ℓ1 relaxation, which minimizes the ℓ1 norm instead, leading to \begin{equation} \min \limits _{z\in \mathbb {R}^n}\ \Vert z\Vert _{\ell _1} \quad {\rm subject\ to} \quad y=Az. \end{equation} (4) Assuming that x is k-sparse and the unique solution to (3), a key question pertaining to the ℓ1 relaxation (4) is: under what conditions is it also the unique solution to (4)? The answer can be characterized by various properties of the sensing matrix including the mutual incoherence property (MIP [20]), the null space property (NSP [21]) and the restricted isometry property (RIP [9]), among others. We shall focus primarily on the RIP here. Interested readers are referred to [22,23] and references therein for further discussions on the MIP and NSP.
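Before turning to recovery guarantees, the following hedged sketch illustrates the ℓ1 relaxation (4) on synthetic data, using the cvxpy modeling package (any generic linear or second-order cone solver would do; the dimensions are again hypothetical).

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m, k = 200, 60, 5
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x

# basis pursuit: minimize ||z||_1 subject to y = Az, cf. (4)
z = cp.Variable(n)
problem = cp.Problem(cp.Minimize(cp.norm(z, 1)), [A @ z == y])
problem.solve()

print(np.max(np.abs(z.value - x)))  # typically near zero once m is of order k log(n/k)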
Definition 2.1 [9]. A sensing matrix $$A\in {\mathbb {R}}^{m\times n}$$ is said to satisfy the RIP of order k if there exists a constant δk ∈ [0, 1) such that, for every k-sparse vector $$z\in \mathbb {R}^n$$, \begin{equation} (1-\delta _{k}) \Vert {z}\Vert _{\ell _2}^2\le \Vert A{z}\Vert _{\ell _2}^2\le (1+\delta _{k})\Vert {z}\Vert _{\ell _2}^2. \end{equation} (5) Similarly, A is said to obey the restricted orthogonality property (ROP) of order (k, k′) if there exists a constant $$\theta _{k,k^{\prime }}$$ such that, for every k-sparse vector z and k′-sparse vector z′ with non-overlapping support sets, \begin{equation} |\langle Az, A{z^{\prime }}\rangle |\le \theta _{k,k^{\prime }}\Vert z\Vert _{\ell _2}\Vert z^{\prime }\Vert _{\ell _2}. \end{equation} (6) The constants δk and $$\theta _{k, k^{\prime }}$$ are called the k-restricted isometry constant (RIC) and the (k, k′)-restricted orthogonality constant (ROC), respectively.

These concepts were first introduced by Candès and Tao [9], who showed that, if $$\delta _{k} + \theta _{k,k} + \theta _{k,2k} < 1$$, then x is the unique solution to (4). This condition has since been weakened. For example, Candès et al. [13] showed that $$\delta _{2k} + 3\theta _{k,2k} < 2$$ suffices, and Cai et al. [24] required only that $$\delta _{1.25k} + \theta _{k,1.25k} < 1$$. More recently, Cai and Zhang [25] further weakened the condition to $$\delta _{k} + \theta _{k,k} < 1$$ and showed that the upper bound 1 is sharp in the sense that, for any ε > 0, the condition $$\delta _{k} + \theta _{k,k} < 1 + \epsilon$$ is not sufficient to guarantee exact recovery. Sufficient conditions for exact recovery by (4) that involve only the RIC have also been investigated in the literature. For example, [26] showed that $$\delta _{2k}<\sqrt{2}-1$$ implies that x is the unique solution to (4). This was improved to $$\delta _{2k} < 0.472$$ in [27]. More recently, [28] showed that, for any given constant t ≥ 4/3, the condition $$\delta _{tk}<\sqrt{(t-1)/t}$$ guarantees the exact recovery of all k-sparse signals by ℓ1 minimization. Moreover, for any ε > 0, $$\delta _{tk}<\sqrt{(t-1)/t}+\epsilon$$ is not sufficient to ensure exact recovery of all k-sparse signals for large k.

An immediate question following these results is how to design a sensing matrix that satisfies these conditions so that we can use (4) to recover x. It is now well understood that, for many random ensembles, that is, where each entry of A is independently sampled from a common distribution such as a Gaussian, Rademacher or other sub-Gaussian distribution, δk < ε with overwhelming probability, provided that $$m \ge C\epsilon ^{-2}k \log (n/k)$$ for some constant C > 0. There has also been some recent progress in constructing deterministic sensing matrices that satisfy these RIP conditions; see e.g. [29–31].

Compressive sensing of block-sparse signals
In many applications, the signal of interest may have a more structured sparsity pattern. The most common example is so-called block-sparsity, where sparsity occurs in a blockwise fashion rather than at the level of individual coordinates. More specifically, let $${x}\in \mathbb {R}^n$$ be the concatenation of b signal ‘blocks’: \begin{equation} {x}\!=\![\underbrace{x_1\cdots x_{n_1}}_{{x}[1]}\underbrace{x_{n_1+1}\cdots x_{n_1+n_2}}_{x[2]}\cdots \underbrace{x_{n-n_b+1}\cdots x_n}_{{x}[b]}]^\top \!, \end{equation} (7) where each signal ‘block’ x[i] is of length ni. We assume that x is block k-sparse in that there are at most k nonzero blocks among the x[i]. As before, we are interested in the most block-sparse signal that satisfies y = Az.
To circumvent the potential computational challenge, the following relaxation is often employed: \begin{equation} \min _{z\in \mathbb {R}^n}\Vert {z}\Vert _{\ell _2/\ell _1} \quad {\rm subject \ to} \ {y}=A{z}, \end{equation} (8) where the mixed ℓ2/ℓ1 norm is defined as \begin{equation*} \Vert z\Vert _{\ell _2/\ell _1}=\sum _{i=1}^b\Vert {z}[i]\Vert _{\ell _2}. \end{equation*} It is not hard to see that, when each block has size ni = 1, (8) reduces to the ℓ1 minimization given by (4). More generally, the optimization problem in (8) is convex and can be recast as a second-order cone program, and thus can be solved efficiently. One can also extend the notions of RIP and ROP to the block-sparse setting; see e.g. [32,33]. For brevity, write $$\mathcal {I}=\lbrace n_1,\ldots ,n_b\rbrace$$.

Definition 2.2. A sensing matrix A is said to satisfy the block-RIP of order k if there exists a constant $$\delta _{k|\mathcal {I}}\in [0,1)$$ such that, for every block k-sparse vector $$z\in \mathbb {R}^n$$, \begin{equation} (1-\delta _{k|\mathcal {I}})\Vert {z}\Vert _{\ell _2}^2\le \Vert A{z}\Vert _{\ell _2}^2\le (1+\delta _{k|\mathcal {I}})\Vert {z}\Vert _{\ell _2}^2. \end{equation} (9) Similarly, A is said to obey the block-ROP of order (k, k′) if there exists a constant $$\theta _{k,k^{\prime }|\mathcal {I}}$$ such that, for every block k-sparse vector z and block k′-sparse vector z′ with disjoint supports, \begin{equation} |\langle Az, A{z^{\prime }}\rangle |\le \theta _{k,k^{\prime }|\mathcal {I}}\Vert z\Vert _{\ell _2}\Vert z^{\prime }\Vert _{\ell _2}. \end{equation} (10) The constants $$\delta _{k|\mathcal {I}}$$ and $$\theta _{k, k^{\prime }|\mathcal {I}}$$ are referred to as the block k-RIC and the block (k, k′)-ROC, respectively.

Clearly, sufficient RIP conditions for standard ℓ1 minimization extend naturally to block-sparse recovery via mixed ℓ2/ℓ1 minimization, so that, for example, $$\theta _{k,k|\mathcal {I}}+\delta _{k|\mathcal {I}}<1$$ is also a sufficient condition for x to be the unique solution to (8).

Nonconvex methods
In addition to the ℓ1-minimization-based approach, there is also an extensive literature on nonconvex methods for sparse recovery, where, instead of the ℓ1 norm of z, one minimizes a nonconvex objective function in z. The most notable example is the ℓq (0 < q < 1) (quasi-)norm, leading to \begin{equation} \min _{z\in \mathbb {R}^n}\Vert {z}\Vert _{\ell _q} \quad {\rm subject \ to} \ {y}=Az. \end{equation} (11) Some recent studies, e.g. [34–36], have shown that the solution of (11) can recover a sparse signal from far fewer measurements than the ℓ1 minimization (4). In particular, the case q = 1/2 has been treated extensively in [37–39]. Other notable examples of nonconvex objective functions include the smoothly clipped absolute deviation (SCAD) penalty [40] and the minimax concave penalty [41], among others.

Compressive sensing with general dictionaries
Thus far, we have focused on sparsity with respect to the canonical basis of $$\mathbb {R}^n$$. In many applications, it might be more appropriate to consider sparsity with respect to more general dictionaries; see e.g. [42–46]. More specifically, a signal x is represented as x = Dα with respect to a dictionary $$D\in \mathbb {R}^{n\times n^{\prime }}$$, where $${\alpha }\in \mathbb {R}^{n^{\prime }}$$ is the coefficient vector in the dictionary and is known a priori to be sparse or nearly sparse.
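A natural first attempt, elaborated in the next paragraph, is to recover the sparse coefficient vector α directly by ℓ1 minimization, with AD playing the role of the sensing matrix. A hedged sketch follows, in which the overcomplete dictionary D is simply a random one chosen for illustration (in practice D might be a wavelet frame or a learned dictionary).

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, n_prime, m, k = 120, 240, 60, 5   # hypothetical sizes; the dictionary is overcomplete (n' > n)

# hypothetical dictionary D with unit-norm columns (a random stand-in for a structured dictionary)
D = rng.standard_normal((n, n_prime))
D /= np.linalg.norm(D, axis=0)

alpha = np.zeros(n_prime)            # sparse coefficient vector of the signal x = D alpha
alpha[rng.choice(n_prime, size=k, replace=False)] = rng.standard_normal(k)
x = D @ alpha

A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x

# synthesis formulation: minimize ||a||_1 subject to y = (AD) a, then set x_hat = D a
a = cp.Variable(n_prime)
cp.Problem(cp.Minimize(cp.norm(a, 1)), [(A @ D) @ a == y]).solve()
x_hat = D @ a.value
print(np.linalg.norm(x_hat - x))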
One can obviously treat A′ = AD as the new sensing matrix and apply any of the aforementioned methods to exploit the sparsity of α. One drawback, however, is that nice properties of the original sensing matrix A may not be inherited by A′. In other words, even if one carefully designs a sensing matrix A, exact recovery of x may not be guaranteed despite its sparsity with respect to D. Alternatively, some have advocated reconstructing x by the solution to the following optimization problem: \begin{equation} \min \limits _{{z}\in \mathbb {R}^{n}}\Vert {D^*z}\Vert _{\ell _1} \quad {\rm subject\ to} \quad y=Az; \end{equation} (12) see e.g. [43,44,47,48].

Compressive phase retrieval
In the previous subsections, we have mainly discussed the problem of recovering a sparse signal from a small number of linear measurements. In some practical scenarios, however, one can only observe certain nonlinear measurements of the original signal. A typical example is the so-called compressive phase retrieval problem. In such a scenario, we observe the magnitudes of the measurements (e.g. Fourier coefficients) but not their phases, which can be modeled as \begin{equation} y_k = |\langle a_k, x\rangle |, \quad k=1,\cdots , m. \end{equation} (13) To recover x, several studies [49–51] have considered the following ℓ1 minimization: \begin{equation} \min \limits _{z\in \mathbb {R}^n}\ \Vert z\Vert _{\ell _1} \quad {\rm subject\ to} \quad y_k = |\langle a_k, z\rangle |, \quad k=1,\cdots , m. \end{equation} (14) By introducing a strong notion of the RIP, Voroninski and Xu [51] established a result for compressive phase retrieval parallel to that of classical compressive sensing. Specifically, they proved that a k-sparse signal x can be recovered from m = O(k log (n/k)) random Gaussian phaseless measurements by solving (14). Unlike the standard ℓ1 minimization (4), the problem (14) is nonconvex; nevertheless, efficient algorithms have been developed to solve it; see e.g. [50,52].

RECOVERY OF LOW-RANK MATRICES
We now consider recovering a low-rank matrix, a problem often referred to as matrix completion.

Matrix completion via nuclear norm minimization
In many practical situations such as collaborative filtering, system identification and remote sensing, to name a few, the signal that we aim to recover is often a matrix rather than a vector. To signify this fact and distinguish it from the vector case treated in the last section, we shall write the underlying signal as a capital letter, $$X\in \mathbb {R}^{n_1\times n_2}$$, throughout this section. In these applications, we often observe only a small fraction of the entries of X. The task of matrix completion is then to ‘complete’ the remaining entries. Formally, let Ω be a subset of [n1] × [n2], where [n] = {1, …, n}. The goal of matrix completion is to recover X based on {Xij: (i, j) ∈ Ω}, particularly when the sample size |Ω| is much smaller than the total number n1n2 of entries. To fix ideas, we shall assume that Ω is a uniformly sampled subset of [n1] × [n2], although other sampling schemes have also been investigated in the literature, e.g. [53–55]. Obviously, we cannot complete an arbitrary matrix from a subset of its entries. It is, however, possible for low-rank matrices, as their degrees of freedom are much smaller than n1n2. Low rankness alone is still not sufficient: consider a matrix with a single nonzero entry; it is of rank one, but it is impossible to complete it unless the nonzero entry is observed.
A formal way to characterize low-rank matrices that can be completed from {Xij: (i, j) ∈ Ω} was first introduced in [14].

Definition 3.1. Let U be a subspace of $$\mathbb {R}^n$$ of dimension r and $$\boldsymbol {P}_U$$ be the orthogonal projection onto U. Then the coherence of U (with respect to the standard basis ($$\boldsymbol {e}_i$$)) is defined to be \begin{equation*} \mu (U)\equiv \frac{n}{r}\max _{1\le i\le n}\Vert \boldsymbol {P}_U\boldsymbol {e}_i\Vert ^2. \end{equation*}

It is clear that the smallest possible value for μ(U) is 1 and the largest possible value for μ(U) is n/r. Let M be an n1 × n2 matrix of rank r with column and row spaces denoted by U and V, respectively. We say that M satisfies the incoherence condition with parameter μ0 if max (μ(U), μ(V)) ≤ μ0.

Now let X be an incoherent matrix of rank r. In a similar spirit to the vector case, a natural way to reconstruct it from {Xij: (i, j) ∈ Ω} is to seek, among all matrices whose entries indexed by Ω agree with our observations, the one with the smallest rank: \begin{equation} \min _{Z\in \mathbb {R}^{n_1\times n_2}}\ {\rm rank}(Z) \quad {\rm subject \ to} \quad Z_{ij}= X_{ij}, \ \forall (i, j)\in \Omega . \end{equation} (15) Again, to overcome the computational challenge of directly minimizing the matrix rank, the following convex program is commonly suggested: \begin{equation} \min _{Z\in \mathbb {R}^{n_1\times n_2}}\ \Vert Z\Vert _* \quad {\rm subject \ to} \quad Z_{ij}= X_{ij}, \ \forall (i, j)\in \Omega , \end{equation} (16) where the nuclear norm ‖·‖* is the sum of all singular values. As before, we are interested in when the solution to (16) is unique and correctly recovers X. Candès and Recht [14] were the first to show that this is indeed the case, for almost all Ω that are large enough. These results are probabilistic in nature due to the randomness of Ω; that is, one can correctly recover X using (16) with high probability, with the probability approaching one as min {n1, n2} grows. The sample size requirement for exact recovery of X by the solution of (16) was later improved in [16] to |Ω| ≥ Cr(n1 + n2) · polylog(n1 + n2), where C is a constant that depends only on the coherence parameter. This requirement is (nearly) optimal in that there are O(r(n1 + n2)) free parameters in specifying a rank-r matrix.

Matrix completion from affine measurements
More generally, one may consider recovering a low-rank matrix based on affine measurements. More specifically, let $$\mathcal {A}: \mathbb {R}^{n_1\times n_2}\rightarrow \mathbb {R}^m$$ be a linear map such that $$\mathcal {A}(X)=y$$. We aim to recover X based on the information that $$\mathcal {A}(X)=y$$. It is clear that the canonical matrix completion problem discussed in the previous subsection corresponds to the case where $$\mathcal {A}(X)=\lbrace X_{ij}: (i,j)\in \Omega \rbrace$$. Similarly, we can proceed to reconstruct X by the solution to \begin{equation} \min _{Z\in \mathbb {R}^{n_1\times n_2}}\ \Vert Z\Vert _* \quad {\rm subject \ to }\quad y=\mathcal {A}({Z}). \end{equation} (17) It is of interest to know for which sensing operators $$\mathcal {A}$$ X can be exactly recovered in this way. An answer is given in [56], which extends the concept of the RIP to general linear operators on matrices.
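Before stating the matrix analogue of the RIP, here is a brief, hedged sketch of the nuclear norm relaxation (16) on a small synthetic completion problem, again written in the cvxpy style used earlier (the dimensions, rank and sampling rate are all hypothetical).

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
n1, n2, r = 30, 30, 2                # hypothetical matrix size and rank
X = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))   # rank-r ground truth

# observe each entry independently with probability 0.5, mimicking a uniformly sampled Omega
mask = rng.random((n1, n2)) < 0.5
omega = np.argwhere(mask)

# nuclear norm minimization (16): agree with the observed entries, minimize ||Z||_*
Z = cp.Variable((n1, n2))
constraints = [Z[i, j] == X[i, j] for i, j in omega]
cp.Problem(cp.Minimize(cp.normNuc(Z)), constraints).solve()

print(np.linalg.norm(Z.value - X) / np.linalg.norm(X))   # small relative error when |Omega| is large enough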
Definition 3.2. A linear operator $$\mathcal {A} : \mathbb {R}^{n_1\times n_2}\rightarrow \mathbb {R}^m$$ is said to satisfy the matrix RIP of order r if there exists a constant $$\delta _{r}^M$$ such that \begin{equation} \left(1-\delta _r^M\right)\Vert Z\Vert _{\rm F}\le \Vert \mathcal {A}(Z)\Vert _{\ell _2}\le \left(1+\delta _r^M\right)\Vert Z\Vert _{\rm F} \end{equation} (18) holds for all matrices $$Z\in \mathbb {R}^{n_1\times n_2}$$ of rank at most r.

Recht et al. [56] further proved that, if $$\mathcal {A}$$ satisfies the matrix RIP (18) with $$\delta _{5r}^M<1/10$$, then one can recover a rank-r matrix X from m = O(r(n1 + n2) log (n1n2)) measurements by the solution to (17). The condition $$\delta _{5r}^M<1/10$$ has since been substantially weakened; see e.g. [28,57]. Besides the aforementioned low-rank matrix completion problems, some recent studies, e.g. [58–60], have drawn attention to so-called high-rank matrix completion, in which the columns of the matrix belong to a union of subspaces and, as a result, the rank can be high or even full.

Mixed sparsity
In some applications, the matrix that we want to recover is not necessarily of low rank but differs from a low-rank matrix only in a small number of entries; see e.g. [17,18]. In other words, we may write X = S + L, where S is a sparse matrix with only a small number of nonzero entries while L is a matrix of low rank. Clearly, even if we observe X entirely, the decomposition of X into a sparse component S and a low-rank component L may not be uniquely defined. General conditions under which such a decomposition is indeed unique are provided in, for example, [18]. In light of the previous discussion, it is natural to consider reconstructing X from the observations {Xij: (i, j) ∈ Ω} by the solution to \begin{equation} \min _{Z_1,Z_2\in \mathbb {R}^{n_1\times n_2}}\ \Vert Z_1\Vert _*+\lambda \Vert {\rm vec}(Z_2)\Vert _{\ell _1} \quad {\rm subject \ to} \quad (Z_1+Z_2)_{ij}=X_{ij}, \ \forall (i,j)\in \Omega . \end{equation} (19) This strategy has been investigated extensively in the literature; see e.g. [17]. Further developments in this direction can also be found in [61–64], among others.

Nonconvex methods
Just as the ℓ1 norm is a convex relaxation of the ℓ0 norm, the nuclear norm is a convex relaxation of the matrix rank. In addition to these convex approaches, nonconvex methods have also been proposed by numerous authors. The most common example is the Schatten-q (0 < q < 1) (quasi-)norm defined by \begin{equation} \Vert X\Vert _{S_q}=\left(\sum _{i=1}^{\min \lbrace n_1,n_2\rbrace }\sigma _i^q\right)^{1/q}, \end{equation} (20) where σ1, σ2, ⋅⋅⋅ are the singular values of X. It is clear that $$\Vert X\Vert _{S_q}^q\rightarrow {\rm rank}(X)$$ as q → 0, while the nuclear norm corresponds to the case q = 1. One may now consider reconstructing X by the solution to \begin{equation} \min _{Z\in \mathbb {R}^{n_1\times n_2}}\ \Vert Z\Vert _{S_q} \quad {\rm subject \ to }\quad y=\mathcal {A}({Z}); \end{equation} (21) see e.g. [65–68], among others.

RECOVERY OF LOW-RANK HIGHER-ORDER TENSORS
In an increasing number of modern applications, the object to be estimated has a higher-order tensor structure. Typical examples include video inpainting [69], scan completion [70], multichannel EEG (electroencephalogram) compression [71], traffic data analysis [72] and hyperspectral image restoration [73], among many others.
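As the next paragraph explains, a common first step when working with tensors is to unfold (matricize) them along each mode. The following hedged numpy sketch constructs a third-order tensor of low multilinear rank (all dimensions are hypothetical) and checks that each of its three unfoldings is a low-rank matrix.

import numpy as np

rng = np.random.default_rng(3)
d1, d2, d3, r = 10, 12, 14, 3        # hypothetical dimensions and multilinear rank

# a tensor of low multilinear rank built from a small core and three factor matrices
G = rng.standard_normal((r, r, r))
U1, U2, U3 = (rng.standard_normal((d, r)) for d in (d1, d2, d3))
X = np.einsum('abc,ia,jb,kc->ijk', G, U1, U2, U3)

def unfold(t, mode):
    """Mode-j matricization: the mode-j fibers become the columns of the unfolded matrix."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

for mode in range(3):
    print(unfold(X, mode).shape, np.linalg.matrix_rank(unfold(X, mode)))
# each unfolding has rank at most r, which is what the unfolding-based approach described next exploits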
Similar to matrices, in many of these applications we are interested in recovering a low-rank tensor either from a subset of its entries or from a collection of affine measurements. Despite the apparent similarities between matrices and higher-order tensors, it is delicate to extend the idea behind nuclear norm minimization to the latter, because a matrix-style singular value decomposition does not exist for higher-order tensors. A common approach is to first unfold a higher-order tensor to a matrix and then apply a matrix-based approach to recover the tensor. Consider, for example, a third-order tensor $$\mathcal {X}\in \mathbb {R}^{d_1\times d_2\times d_3}$$. We can collapse its second and third indices, leading to a d1 × (d2d3) matrix X(1). If $$\mathcal {X}$$ is of low rank, then so is X(1). We can exploit the low rankness of X(1) by minimizing its nuclear norm. Clearly we can also collapse the first and third indices of $$\mathcal {X}$$, leading to a matrix X(2), and the first and second, leading to X(3). If we want to complete a low-rank tensor $$\mathcal {X}$$ based on its entries $$\mathcal {X}(\omega )$$ for ω ∈ Ω ⊂ [d1] × [d2] × [d3], we can then consider recovering $$\mathcal {X}$$ by the solution to the following convex program: \begin{equation} \min _{\mathcal {Z}\in \mathbb {R}^{d_1\times d_2\times d_3}}\ \sum _{j=1}^3\Vert {{Z}}_{(j)}\Vert _* \quad {\rm subject \ to} \quad \mathcal {Z}(\omega )= \mathcal {X}(\omega ), \ \forall \omega \in \Omega . \end{equation} (22) Efficient algorithms for solving the matrix nuclear norm minimization (16), such as the alternating direction method of multipliers (ADMM) and Douglas–Rachford operator splitting, can then be readily adapted to solve (22); see e.g. [3,4]. As pointed out in [5], however, such an approach fails to fully exploit the multilinear structure of a tensor and is thus suboptimal. Instead, directly minimizing the tensor nuclear norm was suggested: \begin{equation} \min _{\mathcal {Z}\in \mathbb {R}^{d_1\times d_2\times d_3}}\ \Vert \mathcal {Z}\Vert _* \quad {\rm subject \ to} \quad \mathcal {Z}(\omega )= \mathcal {X}(\omega ), \ \forall \omega \in \Omega , \end{equation} (23) where the tensor nuclear norm ‖·‖* is defined as the dual norm of the tensor spectral norm ‖·‖. For more discussion of the tensor nuclear and spectral norms, see [74]. Unfortunately, unlike the matrix nuclear norm, computing the tensor nuclear norm, and hence solving (23), is NP-hard. Various relaxations and approximate algorithms have therefore been introduced in the literature; see e.g. [75–79] and references therein. This is a research area in its infancy and many interesting issues remain to be addressed.

APPLICATIONS
In the previous sections, we have given an overview of some basic ideas and techniques for dealing with sparsity in vectors and low rankness in matrices and tensors. We now give a couple of examples to illustrate how they can be used in action.

Background subtraction with compressive imaging
Background subtraction in image and video analysis has attracted a lot of attention in the past couple of decades. It aims to simultaneously separate the video background and extract the moving objects from a video stream, and can provide important clues for various applications such as moving object detection [80] and object tracking in surveillance [81], among numerous others.
Conventional background subtraction techniques usually consist of four steps: video acquisition, encoding, decoding and separating the moving objects from the background. This scheme requires fully sampling the video frames, with large computational and storage costs, followed by carefully designed video coding and background subtraction algorithms. To alleviate the burden of computation and storage, a newly developed compressive imaging scheme [82–84] has been used for background subtraction by combining video acquisition, coding and background subtraction into a single framework, as illustrated in Fig. 1. It simultaneously achieves background subtraction and video reconstruction. In this setting, the main objective is then to maximize the reconstruction and separation accuracies using as few compressive measurements as possible.

Figure 1. The framework of background subtraction with compressive imaging.

Several studies have approached background subtraction from the perspective of compressive imaging. In the seminal work of Cevher et al. [85], the background subtraction problem is formulated as a sparse recovery problem. They showed that the moving objects can be recovered by learning a low-dimensional compressed representation of the background image. More recently, the robust principal component analysis (RPCA) approach has also been used for background subtraction with compressive imaging, in which the video is commonly modeled as a matrix whose columns are vectorized video frames, and the matrix is then decomposed into a low-rank matrix L and a sparse matrix S; see e.g. [86–89]. Although methods based on RPCA have achieved satisfactory performance, they fail to exploit the finer structures of background and foreground after vectorizing the video frames. It appears more advantageous to model the spatio-temporal information of background and foreground using a direct tensor representation of the video. To this end, a novel tensor RPCA approach has been proposed in [90] for background subtraction from compressive measurements, which decomposes the video into a static background with spatio-temporal correlation and a moving foreground with spatio-temporal continuity within a tensor representation framework.
More specifically, one can use 3D total variation (3D-TV) to characterize the spatio-temporal continuity of the video foreground, and a low-rank Tucker decomposition to model the spatio-temporal correlation of the video background, which leads to the following tensor RPCA model: \begin{equation} \min _{\mathcal {X}, \mathcal {S},\mathcal {E},\mathcal {G}, \mathbf {U}_{1},\mathbf {U}_{2},\mathbf {U}_{3}} \ \lambda \Vert \mathcal {S}\Vert _{\text{3D-TV}} + \frac{1}{2}\Vert \mathcal {E}\Vert _{\text{F}}^{2} \quad {\rm subject \ to} \quad \mathcal {X} = \mathcal {L} + \mathcal {E} + \mathcal {S}, \quad \mathcal {L} = \mathcal {G} \times _1 \mathbf {U}_1 \times _2 \mathbf {U}_2 \times _3 \mathbf {U}_3, \quad y = \mathcal {A}(\mathcal {X}), \end{equation} (24) where the factor matrices U1 and U2 are column-orthogonal and correspond to the two spatial modes, the factor matrix U3 is column-orthogonal and corresponds to the temporal mode, the core tensor $$\mathcal {G}$$ interacts with these factors, and the 3D-TV term ‖ · ‖3D-TV is defined as \begin{equation*} \Vert \mathcal {X}\Vert _{\text{3D-TV}} := \Vert \mathcal {X}_{h}\Vert _1+\Vert \mathcal {X}_{v}\Vert _1+\Vert \mathcal {X}_{t}\Vert _1, \end{equation*} with \begin{eqnarray*} \mathcal {X}_{h}(i,j,k) &:=& \mathcal {X}(i,j+1,k) - \mathcal {X}(i,j,k), \nonumber\\ \mathcal {X}_{v}(i,j,k) &:=& \mathcal {X}(i+1,j,k) - \mathcal {X}(i,j,k), \nonumber\\ \mathcal {X}_{t}(i,j,k) &:=& \mathcal {X}(i,j,k+1) - \mathcal {X}(i,j,k). \end{eqnarray*} Because a 3D patch in a video background is similar to many other 3D patches across the video frames, one can model the video background using several groups of similar video 3D patches, where each patch group corresponds to a fourth-order tensor. Integrating this patch-based modeling idea into (24), one readily obtains a patch-group-based tensor RPCA model. It should be noted that solving the nonconvex tensor RPCA model (24), as well as its patch-group-based variant, is computationally difficult. In practice, one can find a local solution using a multi-block version of ADMM. For more details, please refer to Section V of [90]. Fig. 2 gives an example based on three real videos. It is evident that the proposed tensor models enjoy superior performance over other popular matrix models, both in terms of the quality of the reconstructed videos and in terms of the separation of the moving objects. This suggests that directly modeling practical higher-order tensor data as tensors can exploit more of their useful structure.

Figure 2. Visual comparison of two tensor models (i.e. H-TenRPCA and PG-TenRPCA) proposed in [90] and three popular matrix models (i.e. SparCS [86], ReProcs [87] and SpLR [88]) under the sampling ratio 1/30. The first column shows the original video frames from different video volumes (a)–(c); the second to sixth columns correspond to the results produced by all the compared methods, respectively. Here, for each method, the reconstruction result of the original video frame (upper panels) and the detection result of moving objects in the foreground (lower panels) is shown.
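As a small illustration of the 3D-TV penalty appearing in (24), the following hedged numpy sketch evaluates ‖ · ‖3D-TV for a video volume by summing the absolute first-order differences along the two spatial modes and the temporal mode (the tensor here is random, purely for illustration).

import numpy as np

def tv_3d(x):
    """3D total variation of a (height, width, frames) array, following the definition above."""
    horizontal = np.abs(np.diff(x, axis=1)).sum()   # X_h: differences along the second (width) index
    vertical = np.abs(np.diff(x, axis=0)).sum()     # X_v: differences along the first (height) index
    temporal = np.abs(np.diff(x, axis=2)).sum()     # X_t: differences along the third (frame) index
    return horizontal + vertical + temporal

rng = np.random.default_rng(4)
video = rng.standard_normal((32, 32, 10))           # hypothetical small video volume
print(tv_3d(video))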
Hyperspectral compressive sensing
Hyperspectral imaging employs an imaging spectrometer to collect hundreds of spectral bands, ranging from ultraviolet to infrared wavelengths, for the same area on the surface of the Earth. It has a wide range of applications including environmental monitoring, military surveillance and mineral exploration, among numerous others [91,92]. Figuratively speaking, a hyperspectral image can be treated as a 3D (x, y, λ) data cube, where x and y represent the two spatial dimensions of the scene, and λ represents the spectral dimension comprising a range of wavelengths. Typically, such hyperspectral cubes are collected by an airborne sensor or a satellite and sent to a ground station on Earth for subsequent processing. Since the spectral dimension λ is usually in the hundreds, hyperspectral cubes are fairly large even for moderate spatial dimensions x and y. This makes it necessary to devise effective techniques for hyperspectral data compression, owing to the limited bandwidth of the link between the satellite or aircraft and the ground station. In the last few years, a popular hyperspectral compressive sensing (HCS) scheme, which applies the principle of compressive sensing to hyperspectral data compression as illustrated in Fig. 3, has been extensively investigated. As with other compressive sensing problems, the main objectives of HCS are to design hardware encoders that are easy to implement and to develop efficient sparse reconstruction procedures. In what follows we shall focus primarily on the latter. For hardware implementation, interested readers are referred to [93,94] and references therein.

Figure 3. The framework of hyperspectral compressive sensing.

Like other natural images, hyperspectral images can be sparsified by certain transformations. Traditional sparse recovery methods such as ℓ1 minimization and TV minimization are often used for this purpose; see e.g. [95,96]. To further exploit the inherent spectral correlation of hyperspectral images, a series of works based on low-rank modeling has been carried out in recent years. For example, Golbabaee and Vandergheynst proposed in [97] a joint nuclear norm and ℓ2/ℓ1 norm minimization method to capture the spectral correlation and the joint-sparse spatial wavelet representations of hyperspectral images. They then modeled the spectral correlation together with the spatial piecewise smoothness of hyperspectral images using a joint nuclear and TV norm minimization method [98]. As with surveillance videos, however, modeling a hyperspectral cube as a matrix cannot utilize the finer spatial-and-spectral information, leading to suboptimal reconstruction results under relatively low sampling ratios (e.g. 1% of the whole image size). To further exploit the compressibility of a hyperspectral cube, one may treat the cube as a tensor with three modes (width, height and band) and then identify the hidden spatial-and-spectral structures using direct tensor modeling techniques.
More precisely, all the bands of a hyperspectral image are strongly correlated in the spectral domain, and each band, considered as a matrix, is relatively strongly correlated in the spatial domain; such spatial-and-spectral correlation can be modeled through a low-rank Tucker decomposition. In addition, the intensity at each voxel is likely to be similar to that of its neighbors, which can be characterized as smoothness via the 3D-TV penalty. In summary, [99] considered the following joint tensor Tucker decomposition and 3D-TV minimization model: \begin{equation} \min _{\mathcal {X}, \mathcal {E},\mathcal {G},\mathbf {U}_{1},\mathbf {U}_{2},\mathbf {U}_{3}} \ \lambda \Vert \mathcal {X}\Vert _{\text{3D-TV}} + \frac{1}{2}\Vert \mathcal {E}\Vert _{\text{F}}^{2} \quad {\rm subject \ to} \quad \mathcal {X}=\mathcal {G} \times _1 \mathbf {U}_1 \times _2 \mathbf {U}_2 \times _3 \mathbf {U}_3+\mathcal {E}, \quad y = \mathcal {A}(\mathcal {X}). \end{equation} (25) The above minimization problem is clearly highly nonconvex, and one often looks for good local solutions using a multi-block ADMM algorithm. Fig. 4 shows the first band of four hyperspectral datasets reconstructed by different methods at a sampling ratio of 1%. It is evident that the tensor method can provide nearly perfect reconstruction. In addition, sparse tensor and nonlinear compressive sensing (ST-NCS) performs slightly better than Kronecker compressive sensing (KCS) and joint nuclear/TV norm minimization (JNTV) in terms of reconstruction accuracy, because it uses a direct tensor sparse representation of the hyperspectral cube. Both findings demonstrate the power of tensor modeling techniques. It is also worth noting that, compared with ST-NCS, the images reconstructed with the method (25) are clearer and sharper.

Figure 4. Visual comparison of the tensor method (25) against three other competing methods on the first band of four different hyperspectral datasets (a)–(d). Here the sampling ratio is 1%. The last column shows the original image bands. The columns from the first to the fourth correspond to the results produced by the KCS method [96], the JNTV method [98], the ST-NCS method [100] and the tensor method (25), respectively.

Acknowledgements
The authors thank the associate editor and three referees for helpful comments.

FUNDING
This work was supported in part by the National Natural Science Foundation of China (11501440 and 61273020 to Y.W.; 61373114, 61661166011 and 61721002 to D.Y.M.), the National Basic Research Program of China (973 Program) (2013CB329404 to D.Y.M.) and the National Science Foundation (DMS-1265202 to M.Y.).

REFERENCES
1. Liu J, Musialski P, Wonka P et al. Tensor completion for estimating missing values in visual data. In: Proceedings of the International Conference on Computer Vision, 2009.
2. Tomioka R, Hayashi K, Kashima H. Estimation of low-rank tensors via convex optimization. arXiv:1010.0789.
3. Gandy S, Recht B, Yamada I. Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Probl 2011; 27: 025010.
4. Liu J, Musialski P, Wonka P. Tensor completion for estimating missing values in visual data. IEEE Trans Pattern Anal Mach Intell 2013; 34: 208–20.
5. Yuan M, Zhang CH. On tensor completion via nuclear norm minimization. Found Comput Math 2016; 16: 1031–68.
6. Chen SS, Donoho DL, Saunders MA. Atomic decomposition by basis pursuit. SIAM J Sci Comput 1998; 20: 33–61.
7. Donoho DL, Huo X. Uncertainty principles and ideal atomic decomposition. IEEE Trans Inform Theor 2001; 47: 2845–62.
8. Donoho DL, Elad M. Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proc Natl Acad Sci 2003; 100: 2197–202.
9. Candès E, Tao T. Decoding by linear programming. IEEE Trans Inform Theor 2005; 51: 4203–15.
10. Candès E, Romberg J, Tao T. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inform Theor 2006; 52: 489–509.
11. Donoho D. Compressed sensing. IEEE Trans Inform Theor 2006; 52: 1289–306.
12. Candès E, Tao T. Near-optimal signal recovery from random projections: universal encoding strategies. IEEE Trans Inform Theor 2006; 52: 5406–25.
13. Candès E, Romberg J, Tao T. Stable signal recovery from incomplete and inaccurate measurements. Comm Pure Appl Math 2006; 59: 1207–23.
14. Candès E, Recht B. Exact matrix completion via convex optimization. Found Comput Math 2009; 9: 717–72.
15. Candès E, Tao T. The power of convex relaxation: near-optimal matrix completion. IEEE Trans Inform Theor 2010; 56: 2053–80.
16. Gross D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans Inform Theor 2011; 57: 1548–66.
17. Candès E, Li X, Ma Y et al. Robust principal component analysis? J ACM 2011; 58: 1–39.
18. Chandrasekaran V, Sanghavi S, Parrilo P et al. Rank-sparsity incoherence for matrix decomposition. SIAM J Optim 2011; 21: 572–96.
19. Natarajan B. Sparse approximate solutions to linear systems. SIAM J Comput 1995; 24: 227–34.
20. Donoho D, Elad M, Temlyakov VN. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans Inform Theor 2006; 52: 6–18.
21. Cohen A, Dahmen W, DeVore R. Compressed sensing and best k-term approximation. J Am Math Soc 2009; 22: 211–31.
22. Eldar Y, Kutyniok G. Compressed Sensing: Theory and Applications. Cambridge: Cambridge University Press, 2012.
23. Foucart S, Rauhut H. A Mathematical Introduction to Compressive Sensing. Berlin: Springer, 2013.
24. Cai TT, Wang L, Xu G. New bounds for restricted isometry constants. IEEE Trans Inform Theor 2010; 56: 4388–94.
25. Cai TT, Zhang A. Compressed sensing and affine rank minimization under restricted isometry. IEEE Trans Signal Process 2013; 61: 3279–90.
26. Candès E. The restricted isometry property and its implications for compressed sensing. Compt Rendus Math 2008; 346: 589–92.
27. Cai TT, Wang L, Xu G. Shifting inequality and recovery of sparse signals. IEEE Trans Signal Process 2010; 58: 1300–8.
28. Cai TT, Zhang A. Sparse representation of a polytope and recovery of sparse signals and low-rank matrices. IEEE Trans Inform Theor 2014; 60: 122–32.
29. DeVore R. Deterministic constructions of compressed sensing matrices. J Complex 2007; 23: 918–25.
30. Bourgain J, Dilworth SJ, Ford K et al. Explicit constructions of RIP matrices and related problems. Duke Math J 2011; 159: 145–85.
31. Xu Z. Deterministic sampling of sparse trigonometric polynomials. J Complex 2011; 27: 133–40.
32. Eldar Y, Mishali M. Robust recovery of signals from a structured union of subspaces. IEEE Trans Inform Theor 2009; 55: 5302–16.
33. Wang Y, Wang J, Xu Z. On recovery of block-sparse signals via mixed ℓ2/ℓq (0 < q ≤ 1) norm minimization. EURASIP J Adv Signal Process 2013; 76: 1–17.
34. Chartrand R. Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Process Lett 2007; 14: 707–10.
35. Sun Q. Recovery of sparsest signals via ℓq-minimization. Appl Comput Harmon Anal 2012; 32: 329–41.
36. Song CB, Xia ST. Sparse signal recovery by ℓq minimization under restricted isometry property. IEEE Signal Process Lett 2014; 21: 1154–8.
37. Xu Z, Zhang H, Wang Y et al. ℓ1/2 regularization. Sci China Inform Sci 2010; 53: 1159–69.
38. Xu Z, Chang X, Xu F et al. ℓ1/2 regularization: a thresholding representation theory and a fast solver. IEEE Trans Neural Network Learn Syst 2012; 23: 1013–27.
39. Zeng J, Lin S, Wang Y et al. ℓ1/2 regularization: convergence of iterative half thresholding algorithm. IEEE Trans Signal Process 2014; 62: 2317–29.
40. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 2001; 96: 1348–60.
41. Zhang CH. Nearly unbiased variable selection under minimax concave penalty. Ann Stat 2010; 38: 894–942.
42. Rauhut H, Schnass K, Vandergheynst P. Compressed sensing and redundant dictionaries. IEEE Trans Inform Theor 2013; 29: 1401–12.
43. Candès E, Eldar Y, Needell D et al. Compressed sensing with coherent and redundant dictionaries. Appl Comput Harmon Anal 2010; 31: 59–73.
44. Elad M, Milanfar P, Rubinstein R. Analysis versus synthesis in signal priors. Appl Comput Harmon Anal 2007; 23: 947–68.
45. Lin J, Li S, Shen Y. New bounds for restricted isometry constants with coherent tight frames. IEEE Trans Signal Process 2013; 61: 611–21.
46. Lin J, Li S. Sparse recovery with coherent tight frames via analysis Dantzig selector and analysis LASSO. Appl Comput Harmon Anal 2014; 37: 126–39.
47. Liu Y, Mi T, Li S. Compressed sensing with general frames via optimal-dual-based ℓ1-analysis. IEEE Trans Inform Theor 2012; 58: 4201–14.
48. Li S, Lin J. Compressed sensing with coherent tight frames via ℓq-minimization for 0 < q ≤ 1. Inverse Probl Imag 2014; 8: 761–77.
49. Moravec M, Romberg J, Baraniuk R. Compressive phase retrieval. In: Proceedings of SPIE, the International Society for Optics and Photonics, 2007.
50. Yang Z, Zhang C, Xie L. Robust compressive phase retrieval via ℓ1 minimization with application to image reconstruction. arXiv:1302.0081.
51. Voroninski V, Xu Z. A strong restricted isometry property, with an application to phaseless compressed sensing. Appl Comput Harmon Anal 2016; 40: 386–95.
52. Schniter P, Rangan S. Compressive phase retrieval via generalized approximate message passing. IEEE Trans Signal Process 2015; 63: 1043–55.
53. Foygel R, Shamir O, Srebro N et al. Learning with the weighted trace-norm under arbitrary sampling distributions. In: Proceedings of Advances in Neural Information Processing Systems 24, 2011.
54. Chen Y, Bhojanapalli S, Sanghavi S et al. Coherent matrix completion. In: Proceedings of the 31st International Conference on Machine Learning, 2014.
55. Cai TT, Zhou WX. Matrix completion via max-norm constrained optimization. Electron J Stat 2016; 10: 1493–525.
56. Recht B, Fazel M, Parrilo P. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev 2010; 52: 471–501.
57. Candès E, Plan Y. Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans Inform Theor 2011; 57: 2342–59.
58. Eriksson B, Balzano L, Nowak R. High-rank matrix completion. In: Proceedings of the 15th International Conference on Artificial Intelligence and Statistics, 2012.
59. Elhamifar E. High-rank matrix completion and clustering under self-expressive models. In: Proceedings of Advances in Neural Information Processing Systems, 2016.
60. Li CG, Vidal R. A structured sparse plus structured low-rank framework for subspace clustering and completion. IEEE Trans Signal Process 2016; 64: 6557–70.
61. Zhou ZH, Li X, Wright J et al. Stable principal component pursuit. In: Proceedings of the 2010 IEEE International Symposium on Information Theory, 2010.
62. Ganesh A, Wright J, Li X et al. Dense error correction for low-rank matrices via principal component pursuit. In: Proceedings of the 2010 IEEE International Symposium on Information Theory, 2010.
63. Zhao Q, Meng D, Xu Z et al. Robust principal component analysis with complex noise. In: Proceedings of the 31st International Conference on Machine Learning, 2014.
64. Netrapalli P, Niranjan U, Sanghavi S et al. Non-convex robust PCA. In: Proceedings of Advances in Neural Information Processing Systems 27, 2014.
65. Zhang M, Huang ZH, Zhang Y. Restricted p-isometry properties of nonconvex matrix recovery. IEEE Trans Inform Theor 2013; 59: 4316–23.
66. Wang J, Wang M, Hu X et al. Visual data denoising with a unified Schatten-p norm and ℓq norm regularized principal component pursuit. Pattern Recogn 2015; 48: 3135–44.
67. Zhao Q, Meng D, Xu Z et al. ℓ1-norm low-rank matrix factorization by variational Bayesian method. IEEE Trans Neural Network Learn Syst 2015; 26: 825–39.
68. Yue MC, So AMC. A perturbation inequality for concave functions of singular values and its applications in low-rank matrix recovery. Appl Comput Harmon Anal 2016; 40: 396–416.
69. Korah T, Rasmussen C. Spatio-temporal inpainting for recovering texture maps of occluded building facades. IEEE Trans Image Process 2007; 16: 2262–71.
70. Pauly M, Mitra N, Giesen J et al. Example-based 3D scan completion. In: Proceedings of the Symposium on Geometry Processing, 2005.
71. Acar E, Dunlavy D, Kolda T et al. Scalable tensor factorizations for incomplete data. Chemometr Intell Lab Syst 2011; 106: 41–56.
72. Xie K, Wang L, Wang X et al. Accurate recovery of internet traffic data: a tensor completion approach. In: Proceedings of the 35th Annual IEEE International Conference on Computer Communications, 2016.
73. Peng Y, Meng D, Xu Z et al. Decomposable nonlocal tensor dictionary learning for multispectral image denoising. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
74. Hillar CJ, Lim LH. Most tensor problems are NP-hard. J ACM 2013; 60: 1–39.
75. Nie J, Wang L. Semidefinite relaxations for best rank-1 tensor approximations. SIAM J Matrix Anal Appl 2014; 35: 1155–79.
76. Jiang B, Ma S, Zhang S. Tensor principal component analysis via convex optimization. Math Program 2015; 150: 423–57.
77. Yang Y, Feng Y, Suykens J. A rank-one tensor updating algorithm for tensor completion. IEEE Signal Process Lett 2015; 22: 1633–7.
78. Zhao Q, Meng D, Kong X et al. A novel sparsity measure for tensor recovery. In: Proceedings of the International Conference on Computer Vision, 2015.
79. Xie Q, Zhao Q, Meng D et al. Kronecker-basis-representation based tensor sparsity and its applications to tensor recovery. IEEE Trans Pattern Anal Mach Intell 2017; 40: 1888–902.
80. Wang T, Backhouse A, Gu I. Online subspace learning on Grassmann manifold for moving object tracking in video. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2008.
81. Beleznai C, Fruhstuck B, Bischof H. Multiple object tracking using local PCA. In: Proceedings of the 18th International Conference on Pattern Recognition, 2006.
82. Wakin M, Laska JN, Duarte MF et al. Compressive imaging for video representation and coding. In: Proceedings of the Picture Coding Symposium, 2006.
83. Takhar D, Laska JN, Wakin M et al. A new compressive imaging camera architecture using optical-domain compression. In: Proceedings of Computational Imaging IV at SPIE Electronic Imaging, 2006.
84. Duarte M, Davenport M, Takhar D et al. Single-pixel imaging via compressive sampling. IEEE Signal Process Mag 2008; 25: 83–91.
85. Cevher V, Sankaranarayanan A, Duarte M et al. Compressive sensing for background subtraction. In: Proceedings of the 10th European Conference on Computer Vision, 2008.
86. Waters A, Sankaranarayanan A, Baraniuk R. SpaRCS: recovering low-rank and sparse matrices from compressive measurements. In: Proceedings of Advances in Neural Information Processing Systems 24, 2011.
87. Guo H, Qiu CL, Vaswani N. An online algorithm for separating sparse and low-dimensional signal sequences from their sum. IEEE Trans Signal Process 2014; 62: 4284–97.
88. Jiang H, Deng W, Shen Z. Surveillance video processing using compressive sensing. Inverse Probl Imag 2014; 6: 201–14.
89. Jiang H, Zhao S, Shen Z et al. Surveillance video analysis using compressive sensing with low latency. Bell Labs Tech J 2014; 18: 63–74.
90. Cao W, Wang Y, Sun J et al. Total variation regularized tensor RPCA for background subtraction from compressive measurements. IEEE Trans Image Process 2016; 25: 4075–90.
91. Goetz AFH. Three decades of hyperspectral remote sensing of the Earth: a personal view. Rem Sens Environ 2009; 113: S5–S6.
92. Willett R, Duarte M, Davenport M et al. Sparsity and structure in hyperspectral imaging: sensing, reconstruction, and target detection. IEEE Signal Process Mag 2014; 31: 116–26.
93. Arce G, Brady D, Carin L et al. Compressive coded aperture spectral imaging: an introduction. IEEE Signal Process Mag 2014; 31: 105–15.
94. Yuan X, Tsai TH, Zhu R et al. Compressive hyperspectral imaging with side information. IEEE J Sel Top Signal Process 2015; 9: 964–76.
95. Li C, Sun T, Kelly KF et al. A compressive sensing and unmixing scheme for hyperspectral data processing. IEEE Trans Image Process 2012; 21: 1200–10.
96. Duarte M, Baraniuk R. Kronecker compressive sensing. IEEE Trans Image Process 2012; 21: 494–504.
97. Golbabaee M, Vandergheynst P. Hyperspectral image compressed sensing via low-rank and joint-sparse matrix recovery. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2012.
98. Golbabaee M, Vandergheynst P. Joint trace/TV norm minimization: a new efficient approach for spectral compressive imaging. In: Proceedings of the 19th IEEE International Conference on Image Processing, 2012.
99. Wang Y, Lin L, Zhao Q et al. Compressive sensing of hyperspectral images via joint tensor Tucker decomposition and weighted total variation regularization. IEEE Geosci Rem Sens Lett 2017; 14: 2457–61.
100. Yang S, Wang M, Li P et al. Compressive hyperspectral imaging via sparse tensor and nonlinear compressed sensing. IEEE Trans Geosci Rem Sens 2015; 53: 5943–57.

© The Author(s) 2017. Published by Oxford University Press on behalf of China Science Publishing & Media Ltd.
Definition 2.1 ([9]). A sensing matrix $$A\in {\mathbb {R}}^{m\times n}$$ is said to satisfy the RIP of order k if there exists a constant $$\delta _k \in [0, 1)$$ such that, for every k-sparse vector $$z\in \mathbb {R}^n$$, \begin{equation} (1-\delta _{k}) \Vert {z}\Vert _{\ell _2}^2\le \Vert A{z}\Vert _{\ell _2}^2\le (1+\delta _{k})\Vert {z}\Vert _{\ell _2}^2. \end{equation} (5) Similarly, A is said to obey the restricted orthogonality property (ROP) of order (k, k′) if there exists a constant $$\theta _{k,k^{\prime }}$$ such that, for every k-sparse vector z and k′-sparse vector z′ with non-overlapping support sets, \begin{equation} |\langle Az, A{z^{\prime }}\rangle |\le \theta _{k,k^{\prime }}\Vert z\Vert _{\ell _2}\Vert z^{\prime }\Vert _{\ell _2}. \end{equation} (6) The constants $$\delta _k$$ and $$\theta _{k, k^{\prime }}$$ are called the k-restricted isometry constant (RIC) and the (k, k′)-restricted orthogonality constant (ROC), respectively.

These concepts were first introduced by Candès and Tao [9], who showed that, if $$\delta _k + \theta _{k,k} + \theta _{k,2k} < 1$$, then x is the unique solution to (4). This condition has since been weakened. For example, Candès et al. [13] showed that $$\delta _{2k} + 3\theta _{k,2k} < 2$$ suffices, and Cai et al. [24] required only that $$\delta _{1.25k} + \theta _{k,1.25k} < 1$$. More recently, Cai and Zhang [25] further weakened the condition to $$\delta _k + \theta _{k,k} < 1$$ and showed that the upper bound 1 is sharp, in the sense that, for any ε > 0, the condition $$\delta _k + \theta _{k,k} < 1 + \epsilon$$ is not sufficient to guarantee exact recovery. Sufficient conditions for exact recovery by (4) that involve only the RIC have also been investigated in the literature. For example, [26] showed that $$\delta _{2k}<\sqrt{2}-1$$ implies that x is the unique solution to (4). This was improved to $$\delta _{2k} < 0.472$$ in [27]. More recently, [28] showed that, for any given constant t ≥ 4/3, the condition $$\delta _{tk}<\sqrt{(t-1)/t}$$ guarantees exact recovery of all k-sparse signals by ℓ1 minimization, whereas, for any ε > 0, the condition $$\delta _{tk}<\sqrt{(t-1)/t}+\epsilon$$ is not sufficient to ensure exact recovery of all k-sparse signals for large k.

An immediate question following these results is how to design a sensing matrix that satisfies these conditions so that (4) can be used to recover x. It is now well understood that, for many random ensembles, in which each entry of A is sampled independently from a common distribution such as a Gaussian, Rademacher or other sub-Gaussian distribution, $$\delta _k < \epsilon$$ holds with overwhelming probability provided that $$m \ge C\epsilon ^{-2}k\log (n/k)$$ for some constant C > 0. There has also been some recent progress in constructing deterministic sensing matrices that satisfy these RIP conditions; see e.g. [29–31].

Compressive sensing of block-sparse signals

In many applications, the signal of interest may have a more structured sparsity pattern. The most common example is so-called block-sparsity, where sparsity occurs in a blockwise fashion rather than at the level of individual coordinates. More specifically, let $${x}\in \mathbb {R}^n$$ be the concatenation of b signal ‘blocks’: \begin{equation} {x}\!=\![\underbrace{x_1\cdots x_{n_1}}_{{x}[1]}\underbrace{x_{n_1+1}\cdots x_{n_1+n_2}}_{x[2]}\cdots \underbrace{x_{n-n_b+1}\cdots x_n}_{{x}[b]}]^\top \!, \end{equation} (7) where each signal ‘block’ x[i] is of length ni. We assume that x is block k-sparse in that at most k of the blocks x[i] are nonzero. As before, we are interested in the most block-sparse signal that satisfies y = Az.
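Before turning to the relaxation used for block-sparse signals below, it may help to see the ℓ1 program (4) in action. The sketch below (an illustrative example with arbitrarily chosen dimensions, not code from any of the cited works) recasts (4) as a linear program by splitting z into nonnegative parts, z = u − v, and solves it with SciPy for a Gaussian sensing matrix.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, k = 200, 60, 8                       # ambient dimension, measurements, sparsity (arbitrary)

# k-sparse ground truth and a Gaussian sensing matrix
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x

# Basis pursuit: min ||z||_1 subject to Az = y, written as an LP in (u, v) with z = u - v, u, v >= 0
c = np.ones(2 * n)                         # objective: sum(u) + sum(v) = ||z||_1
A_eq = np.hstack([A, -A])                  # A u - A v = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
z_hat = res.x[:n] - res.x[n:]
print("recovery error:", np.linalg.norm(z_hat - x))
```

With m on the order of k log(n/k) Gaussian measurements, the recovered z_hat typically matches x up to numerical precision, in line with the sampling discussion above.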
To circumvent the potential computational challenge, the following relaxation is often employed: \begin{equation} \min _{z\in \mathbb {R}^n}\Vert {z}\Vert _{\ell _2/\ell _1} \quad {\rm subject \ to} \ {y}=A{z}, \end{equation} (8) where the mixed ℓ2/ℓ1 norm is defined as \begin{equation*} \Vert z\Vert _{\ell _2/\ell _1}=\sum _{i=1}^b\Vert {z}[i]\Vert _{\ell _2}. \end{equation*} It is not hard to see that, when each block has size ni = 1, (8) reduces to the ℓ1 minimization given by (4). More generally, the optimization problem in (8) is convex and can be recast as a second-order cone program, and thus can be solved efficiently. One can also extend the notions of RIP and ROP to the block-sparse setting; see e.g. [32,33]. For brevity, write $$\mathcal {I}=\lbrace n_1,\ldots ,n_b\rbrace$$.

Definition 2.2. A sensing matrix A is said to satisfy the block-RIP of order k if there exists a constant $$\delta _{k|\mathcal {I}}\in [0,1)$$ such that, for every block k-sparse vector $$z\in \mathbb {R}^n$$, \begin{equation} (1-\delta _{k|\mathcal {I}})\Vert {z}\Vert _{\ell _2}^2\le \Vert A{z}\Vert _{\ell _2}^2\le (1+\delta _{k|\mathcal {I}})\Vert {z}\Vert _{\ell _2}^2. \end{equation} (9) Similarly, A is said to obey the block-ROP of order (k, k′) if there exists a constant $$\theta _{k,k^{\prime }|\mathcal {I}}$$ such that, for every block k-sparse vector z and block k′-sparse vector z′ with disjoint supports, \begin{equation} |\langle Az, A{z^{\prime }}\rangle |\le \theta _{k,k^{\prime }|\mathcal {I}}\Vert z\Vert _{\ell _2}\Vert z^{\prime }\Vert _{\ell _2}. \end{equation} (10) The constants $$\delta _{k|\mathcal {I}}$$ and $$\theta _{k, k^{\prime }|\mathcal {I}}$$ are referred to as the block k-RIC and the block (k, k′)-ROC, respectively. Any sufficient RIP condition for standard ℓ1 minimization extends naturally to block-sparse recovery via mixed ℓ2/ℓ1 minimization; for example, $$\theta _{k,k|\mathcal {I}}+\delta _{k|\mathcal {I}}<1$$ is also a sufficient condition for x to be the unique solution to (8).

Nonconvex methods

In addition to the ℓ1-minimization-based approach, there is also an extensive literature on nonconvex methods for sparse recovery, in which one minimizes a nonconvex objective function of z instead of its ℓ1 norm. The most notable example is the ℓq (0 < q < 1) (quasi-)norm, leading to \begin{equation} \min _{z\in \mathbb {R}^n}\Vert {z}\Vert _{\ell _q} \quad {\rm subject \ to} \ {y}=Az. \end{equation} (11) Recent studies, e.g. [34–36], have shown that the solution of (11) can recover a sparse signal from considerably fewer measurements than the ℓ1 minimization (4). In particular, the case of q = 1/2 has been treated extensively in [37–39]. Other notable examples of nonconvex objective functions include the smoothly clipped absolute deviation (SCAD) penalty [40] and the minimax concave penalty (MCP) [41], among others.

Compressive sensing with general dictionaries

Thus far, we have focused on sparsity with respect to the canonical basis of $$\mathbb {R}^n$$. In many applications, it might be more appropriate to consider sparsity with respect to more general dictionaries; see e.g. [42–46]. More specifically, a signal x is represented as x = Dα with respect to a dictionary $$D\in \mathbb {R}^{n\times n^{\prime }}$$, where $${\alpha }\in \mathbb {R}^{n^{\prime }}$$ is the coefficient vector in the dictionary and is known a priori to be sparse or nearly sparse.
One can obviously treat A′ = AD as the new sensing matrix and apply any of the aforementioned methods to exploit the sparsity of α. One drawback, however, is that the nice properties of the original sensing matrix A may not be inherited by A′. In other words, even if one carefully designs a sensing matrix A, exact recovery of x may not be guaranteed despite its sparsity with respect to D. Alternatively, some have advocated reconstructing x by the solution to the following optimization problem: \begin{equation} \min \limits _{{z}\in \mathbb {R}^{n}}\Vert {D^*z}\Vert _{\ell _1} \quad {\rm subject\ to} \quad y=Az; \end{equation} (12) see e.g. [43,44,47,48].

Compressive phase retrieval

In the previous subsections, we have mainly discussed the problem of recovering a sparse signal from a small number of linear measurements. In some practical scenarios, however, one can only observe nonlinear measurements of the original signal. A typical example is the so-called compressive phase retrieval problem, in which we observe only the magnitudes of the measurements (e.g. the magnitudes of Fourier coefficients) and not their phases, so that the observations take the form \begin{equation} y_k = |\langle a_k, x\rangle |, \quad k=1,\cdots , m. \end{equation} (13) For recovering x, several studies [49–51] have considered the following ℓ1 minimization: \begin{eqnarray} \min \limits _{z\in \mathbb {R}^n}\ \Vert z\Vert _{\ell _1} \,\, {\rm subject\ to} \, y_k &=& |\langle a_k, z\rangle |,\nonumber\\ k&=&1,\cdots , m. \end{eqnarray} (14) By introducing a strong notion of RIP, Voroninski and Xu [51] established a theory for compressive phase retrieval that parallels that of classical compressive sensing. Specifically, they proved that a k-sparse signal x can be recovered from m = O(k log (n/k)) random Gaussian phaseless measurements by solving (14). Unlike the standard convex ℓ1 minimization (4), the problem (14) is nonconvex, but efficient algorithms have been developed for it; see e.g. [50,52].

RECOVERY OF LOW-RANK MATRICES

We now consider recovering a low-rank matrix, a problem often referred to as matrix completion.

Matrix completion via nuclear norm minimization

In many practical situations such as collaborative filtering, system identification and remote sensing, to name a few, the signal that we aim to recover is oftentimes a matrix rather than a vector. To signify this fact and to distinguish it from the vector case treated in the previous section, we shall write the underlying signal as a capital letter, $$X\in \mathbb {R}^{n_1\times n_2}$$, throughout this section. In these applications, we often observe only a small fraction of the entries of X. The task of matrix completion is then to ‘complete’ the remaining entries. Formally, let Ω be a subset of [n1] × [n2], where [n] = {1, …, n}. The goal of matrix completion is to recover X based on {Xij: (i, j) ∈ Ω}, particularly when the sample size |Ω| is much smaller than the total number n1n2 of entries. To fix ideas, we shall assume that Ω is a uniformly sampled subset of [n1] × [n2], although other sampling schemes have also been investigated in the literature, e.g. [53–55]. Obviously, we cannot complete an arbitrary matrix from a subset of its entries. It is, however, possible for low-rank matrices, whose degrees of freedom are much smaller than n1n2. Low rankness alone is not sufficient either: consider a matrix with a single nonzero entry; it is of rank one, but it is impossible to complete it unless the nonzero entry is observed.
A formal way to characterize low-rank matrices that can be completed from {Xij: (i, j) ∈ Ω} was first introduced in [14].

Definition 3.1. Let U be a subspace of $$\mathbb {R}^n$$ of dimension r and $$\boldsymbol {P}_U$$ be the orthogonal projection onto U. Then the coherence of U (with respect to the standard basis ($$\boldsymbol {e}_i$$)) is defined to be \begin{equation*} \mu (U)\equiv \frac{n}{r}\max _{1\le i\le n}\Vert \boldsymbol {P}_U\boldsymbol {e}_i\Vert ^2. \end{equation*} It is clear that the smallest possible value for μ(U) is 1 and the largest possible value is n/r. Let M be an n1 × n2 matrix of rank r with column and row spaces denoted by U and V, respectively. We shall say that M satisfies the incoherence condition with parameter μ0 if max (μ(U), μ(V)) ≤ μ0.

Now let X be an incoherent matrix of rank r. In a similar spirit to the vector case, a natural way to reconstruct it from {Xij: (i, j) ∈ Ω} is to seek, among all matrices whose entries indexed by Ω agree with our observations, the one with the smallest rank: \begin{eqnarray} &&\min _{Z\in \mathbb {R}^{n_1\times n_2}}\ {\rm rank}(Z) \quad {\rm subject \ to} \, Z_{ij}= X_{ij}, \nonumber\\ &&\forall (i, j)\in \Omega . \end{eqnarray} (15) Again, to overcome the computational challenge of directly minimizing the matrix rank, the following convex program is commonly suggested: \begin{equation} \min _{Z\in \mathbb {R}^{n_1\times n_2}}\ \Vert Z\Vert _* \,\, {\rm subject \ to} \, Z_{ij}= X_{ij}, \ (i, j)\in \Omega , \end{equation} (16) where the nuclear norm ‖·‖* is the sum of all singular values. As before, we are interested in when the solution to (16) is unique and correctly recovers X. Candès and Recht [14] were the first to show that this is indeed the case for almost all Ω that are large enough. These results are probabilistic in nature owing to the randomness of Ω; that is, one correctly recovers X using (16) with high probability, where the probability tends to one as min {n1, n2} grows. The sample size required to ensure exact recovery of X by the solution of (16) was later improved in [16] to |Ω| ≥ Cr(n1 + n2) · polylog(n1 + n2), where C is a constant that depends only on the coherence coefficients. This requirement is (nearly) optimal in that there are O(r(n1 + n2)) free parameters in specifying a rank-r matrix.

Matrix completion from affine measurements

More generally, one may consider recovering a low-rank matrix from affine measurements. More specifically, let $$\mathcal {A}: \mathbb {R}^{n_1\times n_2}\rightarrow \mathbb {R}^m$$ be a linear map, and suppose we aim to recover X based on the information that $$\mathcal {A}(X)=y$$. It is clear that the canonical matrix completion problem discussed in the previous subsection corresponds to the case where $$\mathcal {A}(X)=\lbrace X_{ij}: (i,j)\in \Omega \rbrace$$. Similarly, we can proceed to reconstruct X by the solution to \begin{equation} \min _{Z\in \mathbb {R}^{n_1\times n_2}}\ \Vert Z\Vert _* \quad {\rm subject \ to }\quad y=\mathcal {A}({Z}). \end{equation} (17) It is of interest to know for which sensing operators $$\mathcal {A}$$ the matrix X can be recovered exactly in this way. An answer is given in [56], which extends the concept of RIP to general linear operators on matrices.
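Before stating the matrix RIP, the entrywise completion problem itself can be made concrete. The sketch below is a simple soft-impute-style heuristic in the spirit of the nuclear norm program (16): it repeatedly fills the unobserved entries with the current estimate and soft-thresholds the singular values. It is an illustrative stand-in with arbitrary dimensions and threshold, not the exact convex program (16) nor an algorithm taken from the references.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, r = 50, 40, 3                              # arbitrary sizes and rank

# Low-rank ground truth and a uniformly sampled observation mask Omega
X = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
mask = rng.random((n1, n2)) < 0.4                  # True where an entry is observed

def svt(M, tau):
    """Soft-threshold the singular values of M at level tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

Z = np.zeros((n1, n2))
for _ in range(300):
    # Keep observed entries, fill the rest with the current estimate, then shrink.
    Z = svt(np.where(mask, X, Z), tau=0.5)

print("relative error:", np.linalg.norm(Z - X) / np.linalg.norm(X))
```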
Definition 3.2. A linear operator $$\mathcal {A} : \mathbb {R}^{n_1\times n_2}\rightarrow \mathbb {R}^m$$ is said to satisfy the matrix RIP of order r if there exists a constant $$\delta _{r}^{M}$$ such that \begin{equation} \left(1-\delta _r^{M}\right)\Vert Z\Vert _{\rm F}\le \Vert \mathcal {A}(Z)\Vert _{\ell _2}\le \left(1+\delta _r^{M}\right)\Vert Z\Vert _{\rm F} \end{equation} (18) holds for all matrices $$Z\in \mathbb {R}^{n_1\times n_2}$$ of rank at most r.

Recht et al. [56] further proved that, if $$\mathcal {A}$$ satisfies the matrix RIP (18) with $$\delta _{5r}^{M}<1/10$$, then one can recover a rank-r matrix X from m = O(r(n1 + n2) log (n1n2)) measurements by the solution to (17). The condition $$\delta _{5r}^{M}<1/10$$ has since been substantially weakened; see e.g. [28,57]. Besides the aforementioned low-rank matrix completion problems, some recent studies, e.g. [58–60], have drawn attention to so-called high-rank matrix completion, in which the columns of the matrix belong to a union of subspaces and, as a result, the rank can be high or even full.

Mixed sparsity

In some applications, the matrix that we want to recover is not necessarily of low rank but differs from a low-rank matrix only in a small number of entries; see e.g. [17,18]. In other words, we may write X = S + L, where S is a sparse matrix with only a small number of nonzero entries while L is a matrix of low rank. It is clear that, even if we observe X entirely, the decomposition of X into a sparse component S and a low-rank component L may not be uniquely defined. General conditions under which such a decomposition is indeed unique are provided in, for example, [18]. In light of the previous discussion, it is natural to consider reconstructing X from observations {Xij: (i, j) ∈ Ω} by the solution to \begin{eqnarray} &&\min _{Z_1,Z_2\in \mathbb {R}^{n_1\times n_2}}\ \Vert Z_1\Vert _*+\lambda \Vert {\rm vec}(Z_2)\Vert _{\ell _1} \nonumber\\ &&{\rm subject \ to} \, (Z_1+Z_2)_{ij}=X_{ij},\,\, \forall (i,j)\in \Omega .\nonumber\\ \end{eqnarray} (19) This strategy has been investigated extensively in the literature; see e.g. [17]. Further developments in this direction can also be found in [61–64], among others.

Nonconvex methods

Just as the ℓ1 norm is a convex relaxation of the ℓ0 norm, the nuclear norm is a convex relaxation of the matrix rank. In addition to these convex approaches, nonconvex methods have also been proposed by numerous authors. The most common example is the Schatten-q (0 < q < 1) (quasi-)norm defined by \begin{equation} \Vert X\Vert _{S_q}=\left(\sum _{i=1}^{\min \lbrace n_1,n_2\rbrace }\sigma _i^q\right)^{1/q}, \end{equation} (20) where σ1, σ2, ⋅⋅⋅ are the singular values of X. It is clear that $$\Vert X\Vert _{S_q}^q\rightarrow {\rm rank}(X)$$ as q → 0, while the nuclear norm corresponds to the case q = 1. One may now consider reconstructing X by the solution to \begin{equation} \min _{Z\in \mathbb {R}^{n_1\times n_2}}\ \Vert Z\Vert _{S_q} \quad {\rm subject \ to }\quad y=\mathcal {A}({Z}); \end{equation} (21) see e.g. [65–68], among others.

RECOVERY OF LOW-RANK HIGHER-ORDER TENSORS

In an increasing number of modern applications, the object to be estimated has a higher-order tensor structure. Typical examples include video inpainting [69], scan completion [70], multichannel EEG (electroencephalogram) compression [71], traffic data analysis [72] and hyperspectral image restoration [73], among many others.
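As a quick numerical sanity check of the limiting behavior of the Schatten-q quasi-norm described above, and before taking up these tensor problems, the following lines (an illustrative snippet with an arbitrary test matrix, not drawn from the references) compare the sum of the q-th powers of the nonzero singular values with the rank as q decreases.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 4)) @ rng.standard_normal((4, 20))   # a rank-4 matrix

s = np.linalg.svd(X, compute_uv=False)
s = s[s > 1e-10 * s[0]]          # keep only the (numerically) nonzero singular values
for q in (1.0, 0.5, 0.1, 0.01):
    print(f"q = {q:4}:  sum(sigma_i^q) = {np.sum(s ** q):.3f}")
print("rank(X):", len(s))
```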
Similar to matrices, in many of these applications we are interested in recovering a low-rank tensor either from a subset of its entries or from a collection of affine measurements. Despite the apparent similarities between matrices and higher-order tensors, it is delicate to extend the idea behind nuclear norm minimization to the latter because a matrix-style singular value decomposition does not exist for higher-order tensors. A common approach is to first unfold a higher-order tensor into a matrix and then apply a matrix-based approach. Consider, for example, a third-order tensor $$\mathcal {X}\in \mathbb {R}^{d_1\times d_2\times d_3}$$. We can collapse its second and third indices, leading to a d1 × (d2d3) matrix $$X_{(1)}$$. If $$\mathcal {X}$$ is of low rank, then so is $$X_{(1)}$$, and we can exploit the low rankness of $$X_{(1)}$$ by minimizing its nuclear norm. Clearly we can also collapse the first and third indices of $$\mathcal {X}$$, leading to a matrix $$X_{(2)}$$, and the first and second, leading to $$X_{(3)}$$. If we want to complete a low-rank tensor $$\mathcal {X}$$ based on its entries $$\mathcal {X}(\omega )$$ for ω ∈ Ω ⊂ [d1] × [d2] × [d3], we can then consider recovering $$\mathcal {X}$$ by the solution to the following convex program: \begin{eqnarray} &&\min _{\mathcal {Z}\in \mathbb {R}^{d_1\times d_2\times d_3}}\sum _{j=1}^3\Vert {{Z}}_{(j)}\Vert _* \nonumber\\ &&{\rm subject \ to}\, \mathcal {Z}(\omega )= \mathcal {X}(\omega ), \ \forall \omega \in \Omega . \end{eqnarray} (22) Efficient algorithms for solving the matrix nuclear norm minimization (16), such as the alternating direction method of multipliers (ADMM) and Douglas–Rachford operator splitting methods, can then be readily adapted to solve (22); see e.g. [3,4]. As pointed out in [5], however, such an approach fails to fully exploit the multilinear structure of a tensor and is thus suboptimal. Instead, [5] suggested directly minimizing the tensor nuclear norm: \begin{eqnarray} &&\min _{\mathcal {Z}\in \mathbb {R}^{d_1\times d_2\times d_3}}\Vert \mathcal {Z}\Vert _*\nonumber\\ &&{\rm subject \ to}\, \mathcal {Z}(\omega )= \mathcal {X}(\omega ), \ \forall \omega \in \Omega , \end{eqnarray} (23) where the tensor nuclear norm ‖·‖* is defined as the dual norm of the tensor spectral norm ‖·‖. For more discussion of the tensor nuclear and spectral norms, please see [74]. Unfortunately, unlike the matrix nuclear norm, computing the tensor nuclear norm, and thereby solving the problem (23), is NP-hard. Hence, various relaxations and approximate algorithms have been introduced in the literature; see e.g. [75–79] and references therein. This is a research area in its infancy and many interesting issues remain to be addressed.

APPLICATIONS

In the previous sections, we have given an overview of some basic ideas and techniques for dealing with sparsity in vectors and low rankness in matrices and tensors. We now give a couple of examples to illustrate how they can be used in action.

Background subtraction with compressive imaging

Background subtraction in image and video has attracted a lot of attention in the past couple of decades. It aims to simultaneously separate the video background and extract the moving objects from a video stream, and can provide important clues for various applications such as moving object detection [80] and object tracking in surveillance [81], among numerous others.
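To connect the unfolding-based program (22) with something concrete before describing this application, here is a minimal sketch (with arbitrary dimensions, not code from the references) of the mode-j matricization of a third-order tensor and of the sum-of-nuclear-norms objective appearing in (22); the test tensor is built in Tucker form from a small core and three factor matrices.

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` matricization: bring the chosen index to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def sum_nuclear_norms(T):
    """Objective of (22): the sum of nuclear norms of the three unfoldings."""
    return sum(np.linalg.norm(unfold(T, j), ord='nuc') for j in range(3))

rng = np.random.default_rng(3)
# A tensor of low multilinear rank (2, 2, 2): small core times three factor matrices
G = rng.standard_normal((2, 2, 2))
U1, U2, U3 = (rng.standard_normal((d, 2)) for d in (10, 12, 8))
X = np.einsum('abc,ia,jb,kc->ijk', G, U1, U2, U3)

print([int(np.linalg.matrix_rank(unfold(X, j))) for j in range(3)])   # each unfolding has rank <= 2
print("sum of nuclear norms:", sum_nuclear_norms(X))
```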
Conventional background subtraction techniques usually consist of four steps: video acquisition, encoding, decoding and separating the moving objects from the background. This scheme requires fully sampling the video frames, with large computational and storage costs, followed by carefully designed video coding and background subtraction algorithms. To alleviate the burden of computation and storage, a newly developed compressive imaging scheme [82–84] has been used for background subtraction by combining video acquisition, coding and background subtraction into a single framework, as illustrated in Fig. 1. It achieves background subtraction and video reconstruction simultaneously. In this setting, the main objective is to maximize the reconstruction and separation accuracies using as few compressive measurements as possible.

Figure 1. The framework of background subtraction with compressive imaging.

Several studies have approached background subtraction from the perspective of compressive imaging. In the seminal work of Cevher et al. [85], the background subtraction problem is formulated as a sparse recovery problem, and it is shown that the moving objects can be recovered by learning a low-dimensional compressed representation of the background image. More recently, the robust principal component analysis (RPCA) approach has also been used for background subtraction with compressive imaging: the video is commonly modeled as a matrix whose columns are vectorized video frames, and this matrix is then decomposed into a low-rank matrix L and a sparse matrix S; see e.g. [86–89]. Although methods based on RPCA have achieved satisfactory performance, they fail to exploit the finer structures of background and foreground after vectorizing the video frames. It appears more advantageous to model the spatio-temporal information of background and foreground using a direct tensor representation of the video. To this end, a novel tensor RPCA approach has been proposed in [90] for background subtraction from compressive measurements, decomposing the video into a static background with spatio-temporal correlation and a moving foreground with spatio-temporal continuity within a tensor representation framework.
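The matrix decomposition underlying the RPCA-based approaches just mentioned can be sketched in a few lines. The code below is a minimal ADMM-style iteration for principal component pursuit, min ||L||_* + λ||S||_1 subject to L + S = X; it is an illustration with commonly used default parameters, not the specific algorithms of [86–89], and it assumes the full data matrix X is available rather than compressive measurements (which is what the tensor formulation described next is designed to handle).

```python
import numpy as np

def soft(M, tau):
    """Entrywise soft-thresholding."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svt(M, tau):
    """Singular-value soft-thresholding."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def rpca(X, n_iter=200):
    """ADMM-style iteration for min ||L||_* + lam * ||S||_1  s.t.  L + S = X."""
    n1, n2 = X.shape
    lam = 1.0 / np.sqrt(max(n1, n2))
    mu = 0.25 * n1 * n2 / np.abs(X).sum()
    L, S, Y = np.zeros_like(X), np.zeros_like(X), np.zeros_like(X)
    for _ in range(n_iter):
        L = svt(X - S + Y / mu, 1.0 / mu)      # low-rank (background) update
        S = soft(X - L + Y / mu, lam / mu)     # sparse (foreground) update
        Y = Y + mu * (X - L - S)               # dual update
    return L, S

rng = np.random.default_rng(4)
L0 = rng.standard_normal((60, 4)) @ rng.standard_normal((4, 80))    # low-rank 'background'
S0 = np.zeros((60, 80))
idx = rng.random(S0.shape) < 0.05
S0[idx] = 10 * rng.standard_normal(idx.sum())                       # sparse 'foreground'
L, S = rpca(L0 + S0)
print("background error:", np.linalg.norm(L - L0) / np.linalg.norm(L0))
```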
More specifically, one can use 3D total variation (3D-TV) to characterize the spatio-temporal continuity of the video foreground, and a low-rank Tucker decomposition to model the spatio-temporal correlation of the video background, which leads to the following tensor RPCA model: \begin{eqnarray} {\min _{{\begin{array} {c} \mathcal {X}, \mathcal {S},\mathcal {E}, \\ \mathcal {G}, \mathbf {U}_{1},\mathbf {U}_{2},\mathbf {U}_{3} \end{array}}} \lambda \Vert \mathcal {S}\Vert _{\text{3D-TV}} + \frac{1}{2}\Vert \mathcal {E}\Vert _{\text{F}}^{2}} \nonumber\\ {\rm subject \ to}\,\mathcal {X} &=& \mathcal {L} + \mathcal {E} + \mathcal {S}, \nonumber\\ \mathcal {L} &=& \mathcal {G} \times _1 \mathbf {U}_1 \times _2 \mathbf {U}_2 \times _3 \mathbf {U}_3, \nonumber\\ y &=& \mathcal {A}(\mathcal {X}), \end{eqnarray} (24) where the column-orthogonal factor matrices U1 and U2 correspond to the two spatial modes, the column-orthogonal factor matrix U3 corresponds to the temporal mode, the core tensor $$\mathcal {G}$$ models the interactions among these factors, and the 3D-TV term ‖ · ‖3D-TV is defined as \begin{eqnarray*} \Vert \mathcal {X}\Vert _{\text{3D-TV}}: &=& \Vert \mathcal {X}_{h}(i,j,k)\Vert _1\nonumber\\ &&+\Vert \mathcal {X}_{v}(i,j,k)\Vert _1+\Vert \mathcal {X}_{t}(i,j,k)\Vert _1, \end{eqnarray*} where \begin{eqnarray*} \mathcal {X}_{h}(i,j,k) : &=& \mathcal {X}(i,j+1,k) - \mathcal {X}(i,j,k), \nonumber\\ \mathcal {X}_{v}(i,j,k) : &=& \mathcal {X}(i+1,j,k) - \mathcal {X}(i,j,k), \nonumber\\ \mathcal {X}_{t}(i,j,k) : &=& \mathcal {X}(i,j,k+1) - \mathcal {X}(i,j,k). \end{eqnarray*} Because a 3D patch in the video background is similar to many other 3D patches across the video frames, one can also model the video background using several groups of similar 3D patches, where each patch group corresponds to a fourth-order tensor. Integrating this patch-based modeling idea into (24) yields a patch-group-based tensor RPCA model. It should be noted that solving the nonconvex tensor RPCA model (24), as well as its patch-group-based form, is computationally difficult. In practice, a local solution can be found using a multi-block version of ADMM; for more details, please refer to Section V of [90].

Fig. 2 gives an example based on three real videos. It is evident that the proposed tensor models enjoy superior performance over the other popular matrix models, both in terms of the quality of the reconstructed videos and in terms of the separation of the moving objects. This suggests that direct tensor modeling of higher-order data can exploit more of its useful structure.

Figure 2. Visual comparison of two tensor models (i.e. H-TenRPCA and PG-TenRPCA) proposed in [90] and three popular matrix models (i.e. SparCS [86], ReProcs [87] and SpLR [88]) under the sampling ratio 1/30. The first column shows the original video frames from different video volumes (a)–(c); the second to sixth columns correspond to the results produced by all the compared methods, respectively. For each method, the reconstruction of the original video frame (upper panels) and the detected moving objects in the foreground (lower panels) are shown.
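The 3D-TV term defined in (24) above is straightforward to evaluate; the short sketch below (an illustration whose boundary handling is chosen for simplicity, not the exact discretization of [90]) computes the three directional forward differences and sums their ℓ1 norms, and shows that a slowly moving, piecewise-constant 'video' has a much smaller 3D-TV value than noise of the same size.

```python
import numpy as np

def tv3d(X):
    """Anisotropic 3D total variation: sum of l1 norms of forward differences
    along the two spatial modes and the temporal mode (interior differences only)."""
    dh = X[:, 1:, :] - X[:, :-1, :]    # horizontal spatial differences
    dv = X[1:, :, :] - X[:-1, :, :]    # vertical spatial differences
    dt = X[:, :, 1:] - X[:, :, :-1]    # temporal differences
    return np.abs(dh).sum() + np.abs(dv).sum() + np.abs(dt).sum()

# A piecewise-constant 'video' of a slowly moving square versus random noise.
video = np.zeros((32, 32, 10))
for t in range(10):
    video[10:18, 5 + t:13 + t, t] = 1.0
noise = np.random.default_rng(5).standard_normal(video.shape)
print("3D-TV of moving square:", tv3d(video))
print("3D-TV of random noise :", tv3d(noise))
```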
Hyperspectral compressive sensing

Hyperspectral imaging employs an imaging spectrometer to collect hundreds of spectral bands, ranging from ultraviolet to infrared wavelengths, for the same area on the surface of the Earth. It has a wide range of applications including environmental monitoring, military surveillance and mineral exploration, among numerous others [91,92]. Figuratively speaking, a hyperspectral image can be treated as a 3D (x, y, λ) data cube, where x and y represent the two spatial dimensions of the scene, and λ represents the spectral dimension comprising a range of wavelengths. Typically, such hyperspectral cubes are collected by an airborne sensor or a satellite and sent to a ground station on Earth for subsequent processing. Since the number of bands along λ is usually in the hundreds, hyperspectral cubes are fairly large even for moderate spatial dimensions x and y. This makes it necessary to devise effective techniques for hyperspectral data compression, owing to the limited bandwidth of the link between the satellite or aircraft and the ground station. In the last few years, a popular hyperspectral compressive sensing (HCS) scheme, which applies the principles of compressive sensing to hyperspectral data compression as illustrated in Fig. 3, has been extensively investigated. As in other compressive sensing problems, the main objectives of HCS are to design hardware encoders that are easy to implement and to develop efficient sparse reconstruction procedures. In what follows we shall focus primarily on the latter; for hardware implementations, interested readers are referred to [93,94] and references therein.

Figure 3. The framework of hyperspectral compressive sensing.

Like other natural images, hyperspectral images can be sparsified by certain transformations, and traditional sparse recovery methods such as ℓ1 minimization and TV minimization are often used for this purpose; see e.g. [95,96]. To further exploit the inherent spectral correlation of hyperspectral images, a series of works based on low-rank modeling has been carried out in recent years. For example, Golbabaee and Vandergheynst proposed in [97] a joint nuclear and ℓ2/ℓ1 norm minimization method to describe the spectral correlation and the joint-sparse spatial wavelet representations of hyperspectral images. They then modeled the spectral correlation together with the spatial piecewise smoothness of hyperspectral images using a joint nuclear and TV norm minimization method [98]. As with surveillance videos, however, modeling a hyperspectral cube as a matrix cannot utilize the finer spatial-and-spectral information, leading to suboptimal reconstruction results under relatively low sampling ratios (e.g. 1% of the whole image size). To further exploit the compressibility of a hyperspectral cube, one may treat the cube as a tensor with three modes (width, height and band) and then identify the hidden spatial-and-spectral structures using direct tensor modeling techniques.
More precisely, all the bands of a hyperspectral image have very strong correlation in the spectral domain, and each band, considered as a matrix, has relatively strong correlation in the spatial domain; such spatial-and-spectral correlation can be modeled through a low-rank Tucker decomposition. In addition, the intensity at each voxel is likely to be similar to that of its neighbors, which can be characterized as smoothness using the so-called 3D-TV penalty. In summary, [99] considered the following joint tensor Tucker decomposition and 3D-TV minimization model: \begin{eqnarray} {\min _{{\begin{array}{c}\mathcal {X}, \mathcal {E},\mathcal {G}, \\ \mathbf {U}_{1},\mathbf {U}_{2},\mathbf {U}_{3}\end{array}}} \lambda \Vert \mathcal {X}\Vert _{\text{3D-TV}} + \frac{1}{2}\Vert \mathcal {E}\Vert _{\text{F}}^{2}} \nonumber\\ {\rm subject \ to}\,\mathcal {X}&=&\mathcal {G} \times _1 \mathbf {U}_1 \times _2 \mathbf {U}_2 \times _3 \mathbf {U}_3+\mathcal {E}, \nonumber\\ y &=& \mathcal {A}(\mathcal {X}). \end{eqnarray} (25) The above minimization problem is highly nonconvex, and one typically looks for good local solutions using a multi-block ADMM algorithm. Fig. 4 gives an example of the first band of four hyperspectral datasets, reconstructed by different methods with the sampling ratio at 1%. It is evident that the tensor method (25) provides nearly perfect reconstruction. In addition, sparse tensor and nonlinear compressive sensing (ST-NCS) performs slightly better than Kronecker compressive sensing (KCS) and joint nuclear/TV norm minimization (JNTV) in terms of reconstruction accuracy, because it uses a direct tensor sparse representation of the hyperspectral cube. Both of these findings demonstrate the power of tensor modeling techniques. It is also worth noting that, compared with ST-NCS, the images reconstructed with the method (25) are clearer and sharper.

Figure 4. Visual comparison of the tensor method (25) with three other competing methods on the first band of four different hyperspectral datasets (a)–(d). Here the sampling ratio is 1%. The last column shows the original image bands. The first to fourth columns correspond to the results produced by the KCS method [96], the JNTV method [98], the ST-NCS method [100] and the tensor method (25), respectively.

Acknowledgements

The authors thank the associate editor and three referees for helpful comments.

FUNDING

This work was supported in part by the National Natural Science Foundation of China (11501440 and 61273020 to Y.W., 61373114, 61661166011 and 61721002 to D.Y.M.), the National Basic Research Program of China (973 Program) (2013CB329404 to D.Y.M.) and the National Science Foundation (DMS-1265202 to M.Y.).

REFERENCES

1. Liu J, Musialski P, Wonka P et al. Tensor completion for estimating missing values in visual data. In: Proceedings of International Conference on Computer Vision, 2009.
2. Tomioka R, Hayashi K, Kashima H. Estimation of low-rank tensors via convex optimization. arXiv:1010.0789.
3. Gandy S, Recht B, Yamada I. Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Probl 2011; 27: 025010.
4. Liu J, Musialski P, Wonka P. Tensor completion for estimating missing values in visual data. IEEE Trans Pattern Anal Mach Intell 2013; 34: 208–20.
5. Yuan M, Zhang CH. On tensor completion via nuclear norm minimization. Found Comput Math 2016; 16: 1031–68.
6. Chen SS, Donoho DL, Saunders MA. Atomic decomposition by basis pursuit. SIAM J Sci Comput 1998; 20: 33–61.
7. Donoho DL, Huo X. Uncertainty principles and ideal atomic decomposition. IEEE Trans Inform Theor 2001; 47: 2845–62.
8. Donoho DL, Elad M. Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proc Natl Acad Sci 2003; 100: 2197–202.
9. Candès E, Tao T. Decoding by linear programming. IEEE Trans Inform Theor 2005; 51: 4203–15.
10. Candès E, Romberg J, Tao T. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inform Theor 2006; 52: 489–509.
11. Donoho D. Compressed sensing. IEEE Trans Inform Theor 2006; 52: 1289–306.
12. Candès E, Tao T. Near-optimal signal recovery from random projections: universal encoding strategies. IEEE Trans Inform Theor 2006; 52: 5406–25.
13. Candès E, Romberg J, Tao T. Stable signal recovery from incomplete and inaccurate measurements. Comm Pure Appl Math 2006; 59: 1207–23.
14. Candès E, Recht B. Exact matrix completion via convex optimization. Found Comput Math 2009; 9: 717–72.
15. Candès E, Tao T. The power of convex relaxation: near-optimal matrix completion. IEEE Trans Inform Theor 2010; 56: 2053–80.
16. Gross D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans Inform Theor 2011; 57: 1548–66.
17. Candès E, Li X, Ma Y et al. Robust principal component analysis? J ACM 2011; 58: 1–39.
18. Chandrasekaran V, Sanghavi S, Parrilo P et al. Rank-sparsity incoherence for matrix decomposition. SIAM J Optim 2011; 21: 572–96.
19. Natarajan B. Sparse approximate solutions to linear systems. SIAM J Comput 1995; 24: 227–34.
20. Donoho D, Elad M, Temlyakov VN. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans Inform Theor 2006; 52: 6–18.
21. Cohen A, Dahmen W, DeVore R. Compressed sensing and best k-term approximation. J Am Math Soc 2009; 22: 211–31.
22. Eldar Y, Kutyniok G. Compressed Sensing: Theory and Applications. Cambridge: Cambridge University Press, 2012.
23. Foucart S, Rauhut H. A Mathematical Introduction to Compressive Sensing. Berlin: Springer, 2013.
24. Cai TT, Wang L, Xu G. New bounds for restricted isometry constants. IEEE Trans Inform Theor 2010; 56: 4388–94.
25. Cai TT, Zhang A. Compressed sensing and affine rank minimization under restricted isometry. IEEE Trans Signal Process 2013; 61: 3279–90.
26. Candès E. The restricted isometry property and its implications for compressed sensing. Compt Rendus Math 2008; 346: 589–92.
27. Cai TT, Wang L, Xu G. Shifting inequality and recovery of sparse signals. IEEE Trans Signal Process 2010; 58: 1300–8.
28. Cai TT, Zhang A. Sparse representation of a polytope and recovery of sparse signals and low-rank matrices. IEEE Trans Inform Theor 2014; 60: 122–32.
29. DeVore R. Deterministic constructions of compressed sensing matrices. J Complex 2007; 23: 918–25.
30. Bourgain J, Dilworth SJ, Ford K et al. Explicit constructions of RIP matrices and related problems. Duke Math J 2011; 159: 145–85.
31. Xu Z. Deterministic sampling of sparse trigonometric polynomials. J Complex 2011; 27: 133–40.
32. Eldar Y, Mishali M. Robust recovery of signals from a structured union of subspaces. IEEE Trans Inform Theor 2009; 55: 5302–16.
33. Wang Y, Wang J, Xu Z. On recovery of block-sparse signals via mixed ℓ2/ℓq (0 < q ≤ 1) norm minimization. EURASIP J Adv Signal Process 2013; 76: 1–17.
34. Chartrand R. Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Process Lett 2007; 14: 707–10.
35. Sun Q. Recovery of sparsest signals via ℓq-minimization. Appl Comput Harmon Anal 2012; 32: 329–41.
36. Song CB, Xia ST. Sparse signal recovery by ℓq minimization under restricted isometry property. IEEE Signal Process Lett 2014; 21: 1154–8.
37. Xu Z, Zhang H, Wang Y et al. ℓ1/2 regularization. Sci China Inform Sci 2010; 53: 1159–69.
38. Xu Z, Chang X, Xu F et al. ℓ1/2 regularization: a thresholding representation theory and a fast solver. IEEE Trans Neural Network Learn Syst 2012; 23: 1013–27.
39. Zeng J, Lin S, Wang Y et al. ℓ1/2 regularization: convergence of iterative half thresholding algorithm. IEEE Trans Signal Process 2014; 62: 2317–29.
40. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 2001; 96: 1348–60.
41. Zhang CH. Nearly unbiased variable selection under minimax concave penalty. Ann Stat 2010; 38: 894–942.
42. Rauhut H, Schnass K, Vandergheynst P. Compressed sensing and redundant dictionaries. IEEE Trans Inform Theor 2013; 29: 1401–12.
43. Candès E, Eldar Y, Needell D et al. Compressed sensing with coherent and redundant dictionaries. Appl Comput Harmon Anal 2010; 31: 59–73.
44. Elad M, Milanfar P, Rubinstein R. Analysis versus synthesis in signal priors. Appl Comput Harmon Anal 2007; 23: 947–68.
45. Lin J, Li S, Shen Y. New bounds for restricted isometry constants with coherent tight frames. IEEE Trans Signal Process 2013; 61: 611–21.
46. Lin J, Li S. Sparse recovery with coherent tight frames via analysis Dantzig selector and analysis LASSO. Appl Comput Harmon Anal 2014; 37: 126–39.
47. Liu Y, Mi T, Li S. Compressed sensing with general frames via optimal-dual-based ℓ1-analysis. IEEE Trans Inform Theor 2012; 58: 4201–14.
48. Li S, Lin J. Compressed sensing with coherent tight frames via ℓq-minimization for 0 < q ≤ 1. Inverse Probl Imag 2014; 8: 761–77.
49. Moravec M, Romberg J, Baraniuk R. Compressive phase retrieval. In: Proceedings of SPIE, the International Society for Optics and Photonics, 2007.
50. Yang Z, Zhang C, Xie L. Robust compressive phase retrieval via ℓ1 minimization with application to image reconstruction. arXiv:1302.0081.
51. Voroninski V, Xu Z. A strong restricted isometry property, with an application to phaseless compressed sensing. Appl Comput Harmon Anal 2016; 40: 386–95.
52. Schniter P, Rangan S. Compressive phase retrieval via generalized approximate message passing. IEEE Trans Signal Process 2015; 63: 1043–55.
53. Foygel R, Shamir O, Srebro N et al. Learning with the weighted trace-norm under arbitrary sampling distributions. In: Proceedings of Advances in Neural Information Processing Systems 24, 2011.
54. Chen Y, Bhojanapalli S, Sanghavi S et al. Coherent matrix completion. In: Proceedings of the 31st International Conference on Machine Learning, 2014.
55. Cai TT, Zhou WX. Matrix completion via max-norm constrained optimization. Electron J Stat 2016; 10: 1493–525.
56. Recht B, Fazel M, Parrilo P. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev 2010; 52: 471–501.
57. Candès E, Plan Y. Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans Inform Theor 2011; 57: 2342–59.
58. Eriksson B, Balzano L, Nowak R. High-rank matrix completion. In: Proceedings of the 15th International Conference on Artificial Intelligence and Statistics, 2012.
59. Elhamifar E. High-rank matrix completion and clustering under self-expressive models. In: Proceedings of Advances in Neural Information Processing Systems, 2016.
60. Li CG, Vidal R. A structured sparse plus structured low-rank framework for subspace clustering and completion. IEEE Trans Signal Process 2016; 64: 6557–70.
61. Zhou ZH, Li X, Wright J et al. Stable principal component pursuit. In: Proceedings of the 2010 IEEE International Symposium on Information Theory, 2010.
62. Ganesh A, Wright J, Li X et al. Dense error correction for low-rank matrices via principal component pursuit. In: Proceedings of the 2010 IEEE International Symposium on Information Theory, 2010.
63. Zhao Q, Meng D, Xu Z et al. Robust principal component analysis with complex noise. In: Proceedings of the 31st International Conference on Machine Learning, 2014.
64. Netrapalli P, Niranjan U, Sanghavi S et al. Non-convex robust PCA. In: Proceedings of Advances in Neural Information Processing Systems 27, 2014.
65. Zhang M, Huang ZH, Zhang Y. Restricted p-isometry properties of nonconvex matrix recovery. IEEE Trans Inform Theor 2013; 59: 4316–23.
66. Wang J, Wang M, Hu X et al. Visual data denoising with a unified Schatten-p norm and ℓq norm regularized principal component pursuit. Pattern Recogn 2015; 48: 3135–44.
67. Zhao Q, Meng D, Xu Z et al. ℓ1-norm low-rank matrix factorization by variational Bayesian method. IEEE Trans Neural Network Learn Syst 2015; 26: 825–39.
68. Yue MC, So AMC. A perturbation inequality for concave functions of singular values and its applications in low-rank matrix recovery. Appl Comput Harmon Anal 2016; 40: 396–416.
69. Korah T, Rasmussen C. Spatio-temporal inpainting for recovering texture maps of occluded building facades. IEEE Trans Image Process 2007; 16: 2262–71.
70. Pauly M, Mitra N, Giesen J et al. Example-based 3D scan completion. In: Proceedings of the Symposium on Geometry Processing, 2005.
71. Acar E, Dunlavy D, Kolda T et al. Scalable tensor factorizations for incomplete data. Chemometr Intell Lab Syst 2011; 106: 41–56.
72. Xie K, Wang L, Wang X et al. Accurate recovery of internet traffic data: a tensor completion approach. In: Proceedings of the 35th Annual IEEE International Conference on Computer Communications, 2016.
73. Peng Y, Meng D, Xu Z et al. Decomposable nonlocal tensor dictionary learning for multispectral image denoising. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2014.
74. Hillar CJ, Lim LH. Most tensor problems are NP-hard. J ACM 2013; 60: 1–39.
75. Nie J, Wang L. Semidefinite relaxations for best rank-1 tensor approximations. SIAM J Matrix Anal Appl 2014; 35: 1155–79.
76. Jiang B, Ma S, Zhang S. Tensor principal component analysis via convex optimization. Math Program 2015; 150: 423–57.
77. Yang Y, Feng Y, Suykens J. A rank-one tensor updating algorithm for tensor completion. IEEE Signal Process Lett 2015; 22: 1633–7.
78. Zhao Q, Meng D, Kong X et al. A novel sparsity measure for tensor recovery. In: Proceedings of International Conference on Computer Vision, 2015.
79. Xie Q, Zhao Q, Meng D et al. Kronecker-basis-representation based tensor sparsity and its applications to tensor recovery. IEEE Trans Pattern Anal Mach Intell 2017; 40: 1888–902.
80. Wang T, Backhouse A, Gu I. Online subspace learning on Grassmann manifold for moving object tracking in video. In: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 2008.
81. Beleznai C, Fruhstuck B, Bischof H. Multiple object tracking using local PCA. In: Proceedings of the 18th International Conference on Pattern Recognition, 2006.
82. Wakin M, Laska JN, Duarte MF et al. Compressive imaging for video representation and coding. In: Proceedings of Picture Coding Symposium, 2006.
83. Takhar D, Laska JN, Wakin M et al. A new compressive imaging camera architecture using optical-domain compression. In: Proceedings of Computational Imaging IV at SPIE Electronic Imaging, 2006.
84. Duarte M, Davenport M, Takhar D et al. Single-pixel imaging via compressive sampling. IEEE Signal Process Mag 2008; 25: 83–91.
85. Cevher V, Sankaranarayanan A, Duarte M et al. Compressive sensing for background subtraction. In: Proceedings of the 10th European Conference on Computer Vision, 2008.
86. Waters A, Sankaranarayanan A, Baraniuk R. SpaRCS: recovering low-rank and sparse matrices from compressive measurements. In: Proceedings of Conference on Neural Information Processing Systems 24, 2011.
87. Guo H, Qiu CL, Vaswani N. An online algorithm for separating sparse and low-dimensional signal sequences from their sum. IEEE Trans Signal Process 2014; 62: 4284–97.
88. Jiang H, Deng W, Shen Z. Surveillance video processing using compressive sensing. Inverse Probl Imag 2014; 6: 201–14.
89. Jiang H, Zhao S, Shen Z et al. Surveillance video analysis using compressive sensing with low latency. Bell Labs Tech J 2014; 18: 63–74.
90. Cao W, Wang Y, Sun J et al. Total variation regularized tensor RPCA for background subtraction from compressive measurements. IEEE Trans Image Process 2016; 25: 4075–90.
91. Goetz AFH. Three decades of hyperspectral remote sensing of the Earth: a personal view. Rem Sens Environ 2009; 113: S5–S6.
92. Willett R, Duarte M, Davenport M et al. Sparsity and structure in hyperspectral imaging: sensing, reconstruction, and target detection. IEEE Signal Process Mag 2014; 31: 116–26.
93. Arce G, Brady D, Carin L et al. Compressive coded aperture spectral imaging: an introduction. IEEE Signal Process Mag 2014; 31: 105–15.
94. Yuan X, Tsai TH, Zhu R et al. Compressive hyperspectral imaging with side information. IEEE J Sel Top Signal Process 2015; 9: 964–76.
95. Li C, Sun T, Kelly KF et al. A compressive sensing and unmixing scheme for hyperspectral data processing. IEEE Trans Image Process 2012; 21: 1200–10.
96. Duarte M, Baraniuk R. Kronecker compressive sensing. IEEE Trans Image Process 2012; 21: 494–504.
97. Golbabaee M, Vandergheynst P. Hyperspectral image compressed sensing via low-rank and joint-sparse matrix recovery. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2012.
98. Golbabaee M, Vandergheynst P. Joint trace/TV norm minimization: a new efficient approach for spectral compressive imaging. In: Proceedings of 19th IEEE International Conference on Image Processing, 2012.
99. Wang Y, Lin L, Zhao Q et al. Compressive sensing of hyperspectral images via joint tensor Tucker decomposition and weighted total variation regularization. IEEE Geosci Rem Sens Lett 2017; 14: 2457–61.
100. Yang S, Wang M, Li P et al. Compressive hyperspectral imaging via sparse tensor and nonlinear compressed sensing. IEEE Trans Geosci Rem Sens 2015; 53: 5943–57.

© The Author(s) 2017. Published by Oxford University Press on behalf of China Science Publishing & Media Ltd.
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
