Structured sampling and fast reconstruction of smooth graph signals

Structured sampling and fast reconstruction of smooth graph signals Abstract This work concerns sampling of smooth signals on arbitrary graphs. We first study a structured sampling strategy for such smooth graph signals that consists of a random selection of few pre-defined groups of nodes. The number of groups to sample to stably embed the set of k-bandlimited signals is driven by a quantity called the group graph cumulative coherence. For some optimized sampling distributions, we show that sampling $$O( {k}\log ( {k}))$$ groups is always sufficient to stably embed the set of k-bandlimited signals, but that this number can be smaller—down to $$O(\log ( {k}))$$—depending on the structure of the groups of nodes. Fast methods to approximate these sampling distributions are detailed. Secondly, we consider k-bandlimited signals that are nearly piecewise constant over pre-defined groups of nodes. We show that it is possible to speed up the reconstruction of such signals by reducing drastically the dimension of the vectors to reconstruct. When combined with the proposed structured sampling procedure, we prove that the method provides stable and accurate reconstruction of the original signal. Finally, we present numerical experiments that illustrate our theoretical results and, as an example, show how to combine these methods for interactive object segmentation in an image using superpixels. 1. Introduction This work was initially inspired by studying developments in edit propagation for interactive image or video manipulations, where the goal is to propagate operations made by a user in some parts of the image to the entire image, e.g., propagate foreground/background scribbles for object segmentation [10,15,17,20,40] or propagate manual color/tone modifications [15,17,24,27,40]. First, a graph $$ { {\mathscr{G}}}$$ modelling the similarities between the pixels of the image is built. Secondly, the user-edits specify values at some nodes (pixels) of $$ { {\mathscr{G}}}$$. Finally, the complete signal is estimated by assuming that it is smooth on $$\mathscr{G}$$. The quality of the propagation depends on the structure of $$\mathscr{G}$$ and on the location of annotated nodes. If a part of the image is weakly connected to the rest, the user-edits do not propagate well to this region unless this region is edited directly. Therefore, highlighting beforehand which regions or groups of nodes are important to edit to ensure a good propagation would be a useful feature to facilitate user interactions. Furthermore, designing a fast reconstruction/propagation method is also important so that the user can visualize immediately the effect of his inputs. These two problems can be addressed using graph signal processing results [35], or, more precisely, using results from the sampling theory of smooth or k-bandlimited graph signals. Indeed, the propagation of the user-edits is exactly a reconstruction of smooth graph signals from few samples. The user could thus be offered to edit few nodes from which the theory guarantees that a stable and accurate reconstruction is possible. Graph signal reconstruction algorithms developed in this field could also, probably, be used to improve and accelerate the reconstruction process. In the context of graph signal processing, smooth or k-bandlimited signal is a widely used and studied model. Several sampling methods have been designed to sample such signals. Pesenson introduced the notion of uniqueness set (of nodes) for k-bandlimited graph signals in [30,31]. Two different k-bandlimited signals are also necessarily different when restricted to a uniqueness set. Therefore, one can sample all k-bandlimited signals on a uniqueness set. Then, Anis et al. [3,5] and Chen et al. [12,13] proved that one can always find a unique set of k nodes to sample all k-bandlimited signals. Finding this set is however computationally expensive. In [4], graph spectral proxies are used to find such a set more efficiently. Yet the combinatorial problem that needs to be solved to find such a set makes the method still difficult to use for large graphs. Other authors used the idea of random sampling to be able to handle large graphs [13,14,32]. Recently, Puy et al. [32] proved that there always exists a random sampling strategy for which sampling $$O( {k}\log ( {k}))$$ nodes are sufficient to stably embed the set of bandlimited signals. 1.1. Illustrating some limits of the current theory As an example, let us take the task of interactive object segmentation with the image presented in Fig. 1. The object to segment is the tiger. Following a procedure similar to the one described in [16], we first build a graph $$\mathscr{G}$$ that models similarities between the pixels of the image and suppose that the segmentation map to estimate—the binary signal indicating the presence or absence of the object at each pixel—is k-bandlimited on $$\mathscr{G}$$ (see Section 5.2 for details). Once $$\mathscr{G}$$ is constructed, the sampling theory of smooth graph signals indicates how to determine a set about k nodes from which one can reconstruct all k-bandlimited signals, thus including the segmentation map of interest. We can thus propose to the user to label these sampled k nodes to segment the object, the labels indicating whether each sampled pixel belongs to the object or not. We present in Fig. 1 the optimal sampling distribution as computed in [32] and a random draw of about 8% of the pixels according to this sampling method (see Section 5.2 for details). Whilst labelling these pixels guarantees an accurate estimation of the segmentation map, we find this labelling task not user-friendly. We notice that a non-negligible number of pixels are located near the edges of the object. It is cumbersome to determine whether these pixels belong to the object or not. More importantly, experiments in Section 5.2 show that too large a number of sampled pixels for manual labelling are necessary to achieve satisfactory results with this method. Fig. 1. View largeDownload slide One image (top left) and its partition into superpixels with SLIC technique [1] (top right). The optimal sampling distribution as computed in [32] and one draw of pixels according to this distribution (the sampled pixels appears in white). Fig. 1. View largeDownload slide One image (top left) and its partition into superpixels with SLIC technique [1] (top right). The optimal sampling distribution as computed in [32] and one draw of pixels according to this distribution (the sampled pixels appears in white). In order to facilitate the labelling task for the user, we partition the image into superpixels, e.g., with the SLIC technique [1]. A superpixel is a small group of spatially connected pixels where the image varies only little. We see in Fig. 1 that the superpixels divide the image into homogeneous regions. The superpixels follow the edges and most of them thus belong to either the tiger or the background, but rarely both. Superpixels are sometimes used among other things to speed up computations, e.g., in [15]. The advantage of using superpixels is twofold in this application. First, it is easier to determine if a superpixel belongs to the object of interest than if a pixel belongs to this object, especially at the boundaries, as recently exploited for segmentation on touchscreens [22]. Labelling superpixels also permits to label many pixels at the same time—by attributing to all pixels within a superpixel the label of the superpixels. Proposing superpixels rather than pixels to edit to the user is thus much more effective. Experiments in Section 5.2 show that one needs to sample fewer superpixels than pixels to reach the same segmentation quality. Secondly, one can notice that the segmentation map, beyond being smooth on $$\mathscr{G}$$, will also be approximately piecewise constant on the superpixels. We exploit this property to reduce the dimension of the problem and design a faster reconstruction method. The idea is to reconstruct the mean value of the segmentation map within each superpixel instead of the value of the segmentation map at each pixel. Whilst the above modifications might seem simple to implement, they raise several theoretical questions. First, we should answer how this structured sampling affects the sampling performance, e.g., how many superpixels should be sampled to ensure an accurate reconstruction of k-bandlimited signals? What is the best sampling distribution? Secondly, we should also quantify the loss of reconstruction quality when estimating only the mean value of the segmentation map instead of the segmentation map itself. Let us remark that the impact of the proposed sampling technique is not limited to interactive object segmentation. For example, the exact same technique can be applied to image colorization, color/tone modifications or matting [16]. This technique might also be useful to speed up the compressive spectral clustering algorithm presented in [38]. In particular, one could accelerate the last step of this algorithm—the reconstruction of the indicator function of each cluster—by pre-grouping few nodes belonging to a same cluster rapidly, e.g. with few steps of simple agglomerative clustering. 1.2. Contributions In this paper, we study a random structured sampling strategy for k-bandlimited signals where we sample few groups of nodes instead of sampling individual nodes. The random sampling strategy that we propose generalizes the method proposed in [32]. Let us already acknowledge that such strategies are also studied in the field of compressed sensing [2,7,9,11] and that some of our solutions are directly inspired by these works. First, we show that the importance of sampling each group can be quantified by a parameter that we call the group graph cumulative coherence, which generalizes the concept of graph cumulative coherence introduced in [32], and characterizes how much the energy of k-bandlimited signals can stay concentrated in each group of nodes. In particular, we show that the number of groups to sample is directly linked to this quantity. Secondly, just as in [32], we optimize the sampling distribution to minimize the number of groups to sample. With this optimized sampling distribution, we prove that, in the worst case, sampling $$O( {k} \log ( {k}))$$ groups of nodes is sufficient to ensure the reconstruction of all k-bandlimited signals. As each group can contain many nodes, we might have to sample a large number of nodes. This is the potential price to pay when sampling the nodes by groups. Fortunately, a smaller number of groups—down to $$O(\log ( {k}))$$—might already be sufficient if the groups are well designed. Thirdly, we also describe a method to estimate the optimal sampling distribution without the need of computing the graph Fourier matrix and which thus scales better to large graphs. Fourthly, estimating the optimal sampling distribution may still be too slow when a large number of groups are involved. We thus also present a sufficient recovery condition that involves a relaxed version of group graph cumulative coherence. The sampling distribution that minimizes this relaxed coherence is fast to estimate for large graphs and large number of groups. With this sampling strategy, we prove that sampling $$O( {k} \log ( {k}))$$ groups is always sufficient to ensure the reconstruction of all k-bandlimited signals. This strategy is mainly interesting at small k. Finally, in order to build a fast reconstruction technique for k-bandlimited signals, we use the intuition that a smooth graph signal is a signal that varies slowly from one node to its connected nodes. If we group few (strongly) connected nodes together, we usually expect a bandlimited signal to be essentially constant on this set of nodes; as long as we do not group together too many weakly connected nodes. We propose here to use this property to drastically reduce the dimension of the reconstruction problem and accelerate this reconstruction. When combined with the proposed sampling technique, we prove that this fast method provides stable and accurate reconstructions of the signals of interest. 1.3. Notations and definitions For any vector $$ \boldsymbol{x} \in {\mathbb{R}}^{ {n}_{1}}$$, $$ {\left \| { {\boldsymbol{x}}}\right \|}_{2}$$ denotes the Euclidean norm of x. Depending on the context, xj may represent the jth entry of the vector x or the jth column-vector of the matrix X. The entry on the ith row and jth column of X is denoted by Xij. The identity matrix is denoted by I (its dimensions are determined by the context). We consider that $$\mathscr{G} = \{\mathscr{V}, \mathscr{E}, {\mathsf{W}} \}$$ is an undirected weighted graph, where $$\mathscr{V}$$ is the set of n nodes, $$\mathscr{E}$$ is the set of edges and $$ {\mathsf{W}} \in {\mathbb{R}}^{ {n} \times {n}}$$ is the weighted adjacency matrix with non-negative entries. We denote the graph Laplacian by $$ { {\mathsf{L}}} \in {\mathbb{R}}^{ {n} \times {n}}$$. We assume that L is real, symmetric and positive semi-definite, e.g., the combinatorial graph Laplacian L := D −W or the normalized one L := I −D−1/2WD−1/2. The matrix $$ {\mathsf{D}} \in {\mathbb{R}}^{ {n} \times {n}}$$ is the diagonal degree matrix and $$ {\mathsf{I}} \in {\mathbb{R}}^{ {n} \times {n}}$$ is the identity matrix [18]. The diagonal degree matrix D has entries $$D_{ii} := \sum _{i \neq j} {\mathsf{W}}_{ij}$$. We denote by $$ { {\mathsf{U}}} \in {\mathbb{R}}^{ {n} \times {n}}$$ the orthonormal eigenvectors of L and by $$0 = {\lambda }_{1} {\leqslant } \ldots {\leqslant } {\lambda }_{n}$$ the ordered real eigenvalues of L. We have $$ { {\mathsf{L}}} = { {\mathsf{U}}} { {\mathsf{\Lambda }}} { {\mathsf{U}}}^{ {{\intercal }}}$$, where $$ { {\mathsf{\Lambda }}} := {{\textrm{diag}}}( {\lambda }_{1}, \ldots , {\lambda }_{n}) \in {\mathbb{R}}^{ {n} \times {n}}$$. The matrix U is the graph Fourier basis [35]. For any signal $$ { {\boldsymbol{x}}} \in {\mathbb{R}}^{ {n}}$$ defined on the nodes of the graph $$\mathscr{G}$$, $$\hat{ { {\boldsymbol{x}}}} = { {\mathsf{U}}}^{ {{\intercal }}} { {\boldsymbol{x}}}$$ contains the Fourier coefficients of x ordered in increasing frequencies. This work deals with k-bandlimited signals $$ { {\boldsymbol{x}}} \in {\mathbb{R}}^{ {n}}$$ on $$\mathscr{G}$$, i.e., signals whose Fourier coefficients $$\hat{ { {\boldsymbol{x}}}}_{k+1}, \ldots , \hat{ { {\boldsymbol{x}}}}_{ {n}}$$ are null. Let Uk be the restriction of U to its first k vectors:   \begin{align} {{\mathsf{U}}}_{{k}} := \left( {{\boldsymbol{u}}}_{1}, \ldots, {{\boldsymbol{u}}}_{{k}} \right) \in{\mathbb{R}}^{{n} \times{k}}. \end{align} (1.1) Definition 1.1 (k-bandlimited signal on $$\mathscr{G}$$) A signal $$ { {\boldsymbol{x}}} \in {\mathbb{R}}^{ {n}}$$ defined on the nodes of the graph $$\mathscr{G}$$ is k-bandlimited with $$ {k} \in {\mathbb{N}}\setminus \{0\}$$ if x ∈ span(Uk), i.e., there exists $$ {\boldsymbol{\eta }} \in {\mathbb{R}}^{k}$$ such that   \begin{align} {{\boldsymbol{x}}} = {{\mathsf{U}}}_{{k}} {\boldsymbol{\eta}}. \end{align} (1.2) This definition was also used in [3,13,32]. We assume that $$ {\lambda}_{ {k}} \neq {\lambda }_{ {k}+1}$$ to avoid any ambiguity in the definition of k-bandlimited signals. Finally, for any matrix $$ {\mathsf{X}} \in {\mathbb{R}}^{ {n}_{1} \times {n}_{2}}$$, $$ {\left \| {\mathsf{X}}\right \|}_{2}$$ denotes its spectral norm and $$ {\left \| {\mathsf{X}}\right \|}_{F}$$ its Frobenius norm; when n1 = n2, $$\lambda _{\max }( {\mathsf{X}})$$ denotes its largest eigenvalue and $$\lambda _{\min }( {\mathsf{X}})$$ its smallest eigenvalue. We present in Fig. 2 a representation of the important variables and processes involved in this paper in order to facilitate the understanding of the different results. Fig. 2. View largeDownload slide Summary of the main variables and processes involved in the presented sampling strategies. All variables are defined when needed hereafter. Fig. 2. View largeDownload slide Summary of the main variables and processes involved in the presented sampling strategies. All variables are defined when needed hereafter. 2. Sampling using groups of nodes In this section, we define and study our structured sampling strategy. This sampling generalizes the work in [32], building on, and benefiting from, the principles introduced therein. We highlight during the course of this section which principles are shared between both techniques and what are the important changes. We also summarize these similarities and changes at the end of the section. We start the study with the definition of the groups of nodes. 2.1. Grouping the nodes We consider that the n nodes of $$\mathscr{G}$$ are divided into N different groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}} \subseteq \{1, \ldots , {n}\}$$. We assume that this division into different groups is given by an external algorithm or imposed by the application. The size of the ℓth group is denoted $$ {\left | \mathscr{N}_{\ell } \right |}$$. We suppose that these groups form a partition of {1, …, n}, so that each node belongs to exactly one group. We have   \begin{align} \cup_{\ell=1}^{N} \, \mathscr{N}_{\ell} \; = \; \{1, \ldots, {n} \}\ \textrm{and}\ \mathscr{N}_{\ell} \, \cap \, \mathscr{N}_{\ell^{\prime}} = \emptyset. \end{align} (2.1) For the object segmentation application discussed in the introduction, these groups represent the superpixels. However, we do not impose the groups to be made of neighbouring nodes in the graph; they can be made of nodes ‘far’ from each other. For each group $$\mathscr{N}_{\ell } = \left \{ {n}_{1}^{(\ell )}, \ldots , {n}_{ {\left | \mathscr{N}_{\ell } \right |}}^{(\ell )} \right \}$$, we associate a matrix $$ { {\mathsf{N}}}^{(\ell )}\in {\mathbb{R}}^{ {\left | \mathscr{N}_{\ell } \right |} \times {n}}$$ that restricts a graph signal to the nodes appearing in $$\mathscr{N}_{\ell }$$, i.e.,   \begin{align} {{\mathsf{N}}}^{(\ell)}_{ij} := \begin{cases} 1 & \textrm{for}\ j = {n}_{i}^{(\ell)},\\ 0 & \textrm{otherwise}. \end{cases} \end{align} (2.2) Note that   \begin{align} \sum_{\ell=1}^{{N}}{{{\mathsf{N}}}^{(\ell)}}^{{{\intercal}}} {{\mathsf{N}}}^{(\ell)} = {\mathsf{I}}. \end{align} (2.3) The case of overlapping groups can be handled by changing the definition of N(ℓ) to   \begin{align} {{\mathsf{N}}}^{(\ell)}_{ij} := \begin{cases} \beta_{{n}_{i}^{(\ell)}}^{-1/2} & \textrm{for}\ j = {n}_{i}^{(\ell)},\\ 0 & \textrm{otherwise}, \end{cases} \end{align} (2.4) where $$1 {\leqslant } \beta _{i} {\leqslant } {N}$$, i = 1, …, n, is the number of times node i appears in the different groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$. Equation (2.3) also holds in this case. All results presented in Section 2 are valid for overlapping groups with this definition of N(ℓ). 2.2. Sampling the groups The sampling procedure consists of selecting s groups out of the N available ones. In the application of Section 1.1, it corresponds to the selection of the superpixels to label. We select these groups at random using a sampling distribution on {1, …, N} represented by a vector $$ { {\boldsymbol{p}}} \in {\mathbb{R}}^{ {N}}$$. The probability of selecting the ℓth group is pℓ. We assume that pℓ > 0 for all ℓ = 1, …, N, so that all groups may be selected with a non-zero probability. We obviously have $$\sum _{\ell =1}^{ {N}} { {\boldsymbol{p}}}_{\ell } = 1$$. The indices Ω := {ω1, …, ωs} of the selected groups are obtained by drawing independently—thus with replacements—s indices from the set {1, …, N} according to p, i.e.,   \begin{align} {\mathbb{P}}(\omega_{j} = \ell) = {{\boldsymbol{p}}}_{\ell},\quad \forall j \in \{1, \ldots, {s} \}\ \textrm{and}\ \forall \ell \in \{1, \ldots, {N} \}. \end{align} (2.5) The selected groups are $$\mathscr{N}_{\omega _{1}}, \ldots , \mathscr{N}_{\omega _{ {s}}}$$ and the total number of selected nodes is   \begin{align} {m} := \sum_{j=1}^{{s}} {\left| \mathscr{N}_{\omega_{j}} \right|}. \end{align} (2.6) Once the groups are selected, we build the sampling matrix $$ { {\mathsf{M}}} \in {\mathbb{R}}^{ {m} \times {n}}$$ that satisfies   \begin{align} {{\mathsf{M}}} := \left( \begin{array}{c} {{\mathsf{N}}}^{(\omega_{1})}\\ \vdots\\{{\mathsf{N}}}^{(\omega_{{s}})} \end{array} \right), \end{align} (2.7) and which restricts any signal to the nodes belonging to the selected groups. For a signal $$ { {\boldsymbol{x}}} \in {\mathbb{R}}^{n}$$ defined on the nodes of $$\mathscr{G}$$, its sampled version $$ { {\boldsymbol{y}}} \in {\mathbb{R}}^{ {m}}$$ satisfies   \begin{align} {{\boldsymbol{y}}} := {{\mathsf{M}}} {{\boldsymbol{x}}}. \end{align} (2.8) Our goal now is to determine what number s is enough to ensure that all k-bandlimited signals can be reconstructed from its sampled version obtained with M. To conduct this study, we need to define few more matrices. First, we associate the matrix   \begin{align} {\mathsf{P}}^{(\ell)} := {{\boldsymbol{p}}}_{\ell}^{-1/2} \; {\mathsf{I}} \in{\mathbb{R}}^{{\left| \mathscr{N}_{\ell} \right|} \times{\left| \mathscr{N}_{\ell} \right|}}, \end{align} (2.9) to each group $$\mathscr{N}_{\ell }$$. Then, once the groups are drawn, we construct the diagonal matrix $$ {\mathsf{P}} \in {\mathbb{R}}^{ {m} \times {m}}$$  \begin{align} {\mathsf{P}} := {{\textrm{diag}}}\left( {\mathsf{P}}^{(\omega_{1})}, \ldots, {\mathsf{P}}^{(\omega_{{s}})} \right). \end{align} (2.10) This matrix takes into account the probability of sampling each group and will be used to rescale y for norm preservation. This matrix ensures that $$ {s}^{-1} \, {\mathbb{E}}_{\varOmega } {\left \| {\mathsf{P}} { {\boldsymbol{y}}}\right \|}_{2}^{2} = {s}^{-1} \, {\mathbb{E}}_{\varOmega } {\left \| {\mathsf{P}} { {\mathsf{M}}} { {\boldsymbol{x}}}\right \|}_{2}^{2} = {\left \| { {\boldsymbol{x}}}\right \|}_{2}^{2}$$.1 Both matrices P and M depend on Ω and are random. 2.3. Group graph coherence The sampling procedure in [32] is similar to the one proposed here at the difference that the nodes are sampled individually and not by groups. It was proved there that the number of nodes to sample is driven by a quantity called the graph coherence. This quantity measures how the energy of k-bandlimited signals spreads over the nodes. Similarly, we prove here that the number of groups to sample is driven by a quantity that measures the energy of k-bandlimited signals spread over the groups. We now introduce this quantity. The matrix N(ℓ)Uk is the matrix that restricts a k-bandlimited signal to the nodes belonging to $$\mathscr{N}_{\ell }$$. Therefore,   \begin{align} {\left\| {{\mathsf{N}}}^{(\ell)} {{\mathsf{U}}}_{{k}}\right\|}_{2} = \sup_{{\boldsymbol{\eta}} \in{\mathbb{R}}^{k} : {\left\| {\boldsymbol{\eta}}\right\|}_{2}=1} {\left\| {{\mathsf{N}}}^{(\ell)} {{\mathsf{U}}}_{{k}} {\boldsymbol{\eta}}\right\|}_{2} \end{align} (2.11) measures the energy on the nodes $$\mathscr{N}_{\ell }$$ of the normalized k-bandlimited signal that is most concentrated on $$\mathscr{N}_{\ell }$$. This energy varies between 0 and 1. When this energy is close to 1, there exists a k-bandlimited signal whose energy is essential concentrated on $$\mathscr{N}_{\ell }$$. This signal lives only on the nodes in $$\mathscr{N}_{\ell }$$ and does not spread elsewhere. On the contrary, when this energy is close to 0, there is no k-bandlimited signal living only on $$\mathscr{N}_{\ell }$$. The sampling distribution p is adapted to the graph and the structure of the groups if: whenever ∥N(ℓ)Uk∥2 is high, pℓ is high; whenever ∥N(ℓ)Uk∥2 is small, pℓ is small. In other words, the ratio between ∥N(ℓ)Uk∥2 and pℓ should be as constant as possible. This ensures that the groups where some k-bandlimited signals are concentrated are sampled with higher probability. Similarly to what was done in [32] with individual nodes, we define the group graph weighted coherence as the largest ratio between ∥N(ℓ)Uk∥2 and $$ { {\boldsymbol{p}}}_{\ell }^{-1/2}$$. Definition 2.1 (Group graph cumulative coherence) The group graph cumulative coherence of order k is   \begin{align} {\nu}_{{{\boldsymbol{p}}}} := \max_{1 {\leqslant} \ell{\leqslant} {N}} \left\{\, {{\boldsymbol{p}}}_{\ell}^{-1/2} {\left\| {{\mathsf{N}}}^{(\ell)} {{\mathsf{U}}}_{{k}}\right\|}_{2} \right\}. \end{align} (2.12) The quantity ∥N(ℓ)Uk∥2 is called the local group graph coherence. In the extreme case where the groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$ reduce all to singletons, we recover the definition of the graph weighted coherence introduced in [32]. It is easy to prove that νp is lower bounded by 1. Indeed, for any $$ {\boldsymbol{\eta }} \in {\mathbb{R}}^{ {k}}$$ with $$ {\left \| {\boldsymbol{\eta }}\right \|}_{2} = 1$$, we have   \begin{align} 1& = {\left\| {{\mathsf{U}}}_{k} {\boldsymbol{\eta}}\right\|}_{2}^{2} = \sum_{\ell=1}^{N} {\left\| {{\mathsf{N}}}^{(\ell)} {{\mathsf{U}}}_{k} {\boldsymbol{\eta}}\right\|}_{2}^{2} = \sum_{\ell=1}^{N} {{\boldsymbol{p}}}_{\ell} \cdot \frac{{\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k} {\boldsymbol{\eta}}\right\|}_{2}^{2}}{{{\boldsymbol{p}}}_{\ell}} {\leqslant} {\left\| {{\boldsymbol{p}}}\right\|}_{1} \cdot \max_{1 {\leqslant} \ell{\leqslant} {N}} \left\{\frac{{\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k} {\boldsymbol{\eta}}\right\|}_{2}^{2}}{{{\boldsymbol{p}}}_{\ell}} \right\} \nonumber \\ & = \max_{1 {\leqslant} \ell{\leqslant} {N}} \left\{\frac{{\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k} {\boldsymbol{\eta}}\right\|}_{2}^{2}}{{{\boldsymbol{p}}}_{\ell}} \right\} {\leqslant} \max_{1 {\leqslant} \ell{\leqslant} {N}} \left\{\frac{{\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{2}^{2}}{{{\boldsymbol{p}}}_{\ell}} \right\} = {\nu}_{{{\boldsymbol{p}}}}^{2}. \end{align} (2.13) We have νp = 1 in, for example, the degenerated case where $$\mathscr{N}_{1} = \{1, \ldots , {n}\}$$. 2.4. Stable embedding We now have all the tools to present our main theorem that shows that sampling $$O\left ( {\nu }_{ { {\boldsymbol{p}}}}^{2} \log ( {k})\right )$$ groups is sufficient to stably embed the whole set of k-bandlimited signals. Hence, it is possible to reconstruct any x ∈ span(Uk) from its measurements y = Mx. Theorem 2.2 (Restricted isometry property—RIP) Let M be a random subsampling matrix constructed as in (2.7) using the groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$ and the sampling distribution p. For any δ, ξ ∈ (0, 1), with probability at least 1 − ξ,   \begin{align} (1 - \delta) {\left\|\, {{\boldsymbol{x}}}\right\|}_{2}^{2} {\leqslant} {\frac{1}{{s}}} {\left\| {\mathsf{P}} {{\mathsf{M}}} \; {{\boldsymbol{x}}}\right\|}_{2}^{2} {\leqslant} (1 + \delta) {\left\|\, {{\boldsymbol{x}}}\right\|}_{2}^{2} \end{align} (2.14) for all x ∈ span(Uk) provided that   \begin{align} {s} {\geqslant} \frac{3}{\delta^{2}} \; {\nu}_{{{\boldsymbol{p}}}}^{2} \; \log\left( \frac{2{k}}{\xi} \right). \end{align} (2.15) Proof. See Appendix A. In the above theorem, we recall that s is the number of selected groups, each of them containing several nodes. We thus control the number of groups to sample and not directly the number of nodes. As the lower bound on νp is 1, sampling $$O(\log ( {k}))$$ groups might already be sufficient if the groups and the sampling distribution are well-designed. The number of groups to sample is driven by νp, which itself depends on the structure of the groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$ and on the sampling distribution p. To reduce the number of samples to measure, we might optimize the structure of the groups and the sampling distribution. For example, if we were able to construct n/L groups (L ≥ 1) such that ∥N(ℓ)Uk∥2 ≈ (L/n)1/2—i.e., no k-bandlimited signals have more than 100 ⋅ (L/n)1/2 percent of its energy concentrated in each group—then setting pℓ = L/n, ℓ = 1, …, n/L, would yield νp ≈ 1. In this case, sampling one group is enough to embed the set of k-bandlimited signals. However, it is not obvious how we can construct such groups in practice and we might not even have the flexibility to modify the structure of the groups. In such a case, the only possibility to reduce the number of measurements is to optimize the sampling distribution p to minimize νp. The sampling distribution minimizing the coherence νp is the distribution $$ { {\boldsymbol{p}}}^{*} \in {\mathbb{R}}^{ {N}}$$ that satisfies   \begin{align} {{\boldsymbol{p}}}^{*}_{\ell} := \frac{{\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{2}^{2}}{\sum_{\ell^{\prime}=1}^{{N}} {\left\| {{\mathsf{N}}}^{(\ell^{\prime})}{{\mathsf{U}}}_{k}\right\|}_{2}^{2}},\ \textrm{for all}\ \ell \in \{1, \ldots, {N} \}, \end{align} (2.16) and for which   \begin{align} {\nu}_{{{\boldsymbol{p}}}^{*}}^{2} = \sum_{\ell=1}^{{N}} {\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{2}^{2}. \end{align} (2.17) Indeed, let $$ {\boldsymbol{p}}^{\prime } \neq { {\boldsymbol{p}}}^{*}$$ be another sampling distribution. As the entries of p′ and p* are non-negative and sum to 1, we necessarily have $$ {\boldsymbol{p}}^{\prime }_{\ell ^{\prime }} < { {\boldsymbol{p}}}_{\ell ^{\prime }}^{*}$$ for some ℓ′ > 0. Then,   \begin{align} {\nu}_{{\boldsymbol{p}}^{\prime}}^{2} {\geqslant}{{\boldsymbol{p}}^{\prime}_{\ell^{\prime}}}^{-1} {\left\| {{\mathsf{N}}}^{(\ell^{\prime})}{{\mathsf{U}}}_{k}\right\|}_{2}^{2}> {{{\boldsymbol{p}}}_{\ell^{\prime}}^{*}}^{-1} {\left\| {{\mathsf{N}}}^{(\ell^{\prime})}{{\mathsf{U}}}_{k}\right\|}_{2}^{2} = {\nu}_{{{\boldsymbol{p}}}^{*}}^{2}, \end{align} (2.18) where the last equality is obtained by replacing $$ { {\boldsymbol{p}}}_{\ell ^{\prime }}^{*}$$ with its value. Therefore, $$\nu_{\boldsymbol{p}^{\prime}} > \nu_{\boldsymbol{p}^{\ast}}$$ for any $$ {\boldsymbol{p}}^{\prime } \neq { {\boldsymbol{p}}}^{*}$$. As similar proof can be found in, e.g., [11] where the authors derive the optimal sampling distribution for a compressive system. We notice that   \begin{align} {\nu}_{{{\boldsymbol{p}}}^{*}}^{2} = \sum_{\ell=1}^{{N}} {\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{2}^{2} {\leqslant} \sum_{\ell=1}^{{N}} {\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{F}^{2} = {k}. \end{align} (2.19) Hence, by using this distribution, (2.15) shows that sampling $$O( {k}\log ( {k}))$$ groups is always sufficient to sample all k-bandlimited signals. The exact number is proportional to $$ {\nu }_{ { {\boldsymbol{p}}}^{*}}^{2} \log ( {k})$$. This is not in contradiction with the fact that at least k measurements are required in total as each group contains at least one node. We also have   \begin{align} {\nu}_{{{\boldsymbol{p}}}^{*}}^{2} = \sum_{\ell=1}^{{N}} {\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{2}^{2} {\leqslant} {N}, \end{align} (2.20) as $$ {\left \| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{k}\right \|}_{2}^{2} {\leqslant } 1$$. Therefore, in any case, the bound never suggests to sample much more than N groups, as one would expect. We recall that the results in [32] proves that it is always sufficient to sample $$O( {k}\log ( {k}))$$ nodes to embed the set of k-bandlimited signals. When sampling the nodes by groups, it is the number of groups to sample that should be $$O( {k}\log ( {k}))$$. This can be a large number of individual nodes, but this is the potential price to pay when sampling the nodes by groups, if the groups are not optimized to achieve the best sampling rate. Variable density sampling [2,23,33] and structured sampling [7,9,11] are also important topics in compressed sensing. The method proposed here is closely inspired by these studies, especially by [7,9]. Our results thus share several similarities with these works. We however benefit from a simpler signal model and take advantage of the graph structure to refine the results, propose simpler decoders to reconstruct the signal and design efficient algorithms to estimate p*. 2.5. A more practical result at small k’s Optimizing the sampling distribution reduces to estimating the spectral norm of the matrices N(ℓ)Uk. We will present in Section 3 a method avoiding the computation of Uk. However, the method might still be too slow when a large number of groups is involved as it requires an estimation of a spectral norm for each group separately. It is thus interesting to characterize the performance of the proposed method using other quantities easier to compute. In this section, we present results involving the following quantity   \begin{align} \bar{{\nu}}_{{{\boldsymbol{p}}}} := \max_{1 {\leqslant} \ell{\leqslant} {N}} \left\{ {{\boldsymbol{p}}}_{\ell}^{-1/2} {\left\| {{\mathsf{N}}}^{(\ell)} {{\mathsf{U}}}_{{k}}\right\|}_{F} \right\}. \end{align} (2.21) The only difference between νp and $$\bar{ {\nu }}_{ { {\boldsymbol{p}}}}$$ is that we substituted the Frobenius norm for the spectral norm. As $$ {\left \| {\mathsf{X}}\right \|}_{2} {\leqslant } {\left \| {\mathsf{X}}\right \|}_{F}$$ for any matrix X, we have $$ {\nu }_{ { {\boldsymbol{p}}}} {\leqslant } \bar{ {\nu }}_{ { {\boldsymbol{p}}}}$$; hence the results involving $$\bar{ {\nu }}_{ { {\boldsymbol{p}}}}$$ will be more pessimistic than those involving νp. We have $$\bar{ {\nu }}_{ { {\boldsymbol{p}}}} {\geqslant } {k}$$. Indeed,   \begin{align} k & = {\left\| {{\mathsf{U}}}_{k}\right\|}_{F}^{2} = \sum_{\ell=1}^{N} {\left\| {{\mathsf{N}}}^{(\ell)} {{\mathsf{U}}}_{k}\right\|}_{F}^{2} = \sum_{\ell=1}^{N} {{\boldsymbol{p}}}_{\ell} \cdot \frac{{\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{F}^{2}}{{{\boldsymbol{p}}}_{\ell}} {\leqslant} {\left\| {{\boldsymbol{p}}}\right\|}_{1} \cdot \max_{1 {\leqslant} \ell{\leqslant} {s}} \left\{\frac{{\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{F}^{2}}{{{\boldsymbol{p}}}_{\ell}} \right\} = \bar{{\nu}}_{{{\boldsymbol{p}}}}^{2}. \end{align} (2.22) As with νp, the lower bound is also attained in, for example, the degenerated case where $$\mathscr{N}_{1} = \{1, \ldots , {n}\}$$. As $$ {\nu }_{ { {\boldsymbol{p}}}} {\leqslant } \bar{ {\nu }}_{ { {\boldsymbol{p}}}}$$, we have the following corollary to Theorem 2.2. Corollary 2.1 Let M be a random subsampling matrix constructed as in (2.7) using the groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$ and the sampling distribution p. For any δ, ξ ∈ (0, 1), with probability at least 1 − ξ, (2.14) holds for all x ∈ span(Uk) provided that   \begin{align} {s} {\geqslant} \frac{3}{\delta^{2}} \; \bar{{\nu}}_{{{\boldsymbol{p}}}}^{2} \; \log\left( \frac{2{k}}{\xi} \right). \end{align} (2.23) Proof. As $$\bar{ {\nu }}_{ { {\boldsymbol{p}}}} {\geqslant } {\nu }_{ { {\boldsymbol{p}}}}$$, (2.23) implies (2.15). Theorem 2.2 then proves that (2.14) holds with probability at least 1 − ξ for all x ∈ span(Uk). The sufficient condition (2.23) can be much more pessimistic than (2.15). Indeed, as we have $$\bar{ {\nu }}_{ { {\boldsymbol{p}}}}^{2} {\geqslant } {k}$$, Condition (2.23) suggests to always sample more than $$O( {k}\log ( {k}))$$ groups, whilst we know that sampling $$O(\log ( {k}))$$ can be enough. As we have done with νp, we can also optimize p to minimize $$\bar{ {\nu }}_{ { {\boldsymbol{p}}}}$$. The sampling distribution that minimizes $$\bar{ {\nu }}_{ { {\boldsymbol{p}}}}$$ is the distribution q* satisfying   \begin{align} {\boldsymbol{q}}^{*}_{\ell} := \frac{{\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{F}^{2}}{{k}},\ \textrm{for all}\ l \in \{1, \ldots, {N} \}, \end{align} (2.24) and for which   \begin{align} \bar{{\nu}}_{{\boldsymbol{q}}^{*}}^{2} = {k}. \end{align} (2.25) With this distribution, (2.23) proves that sampling $$O( {k}\log ( {k}))$$ groups is always enough to sample all k-bandlimited signals. Let us discuss some cases where this non-optimal result is interesting. First, this result can be useful at small k’s because estimating q* is easier and faster than estimating p* (see Section 3). The gain of computation time might compensate the loss of performance in terms of number of measurements. Secondly, at large k’s, (2.23) is a pessimistic bound. In reality, we may have $$ {\nu }_{ {\boldsymbol{q}}^{*}}^{2} {\leqslant } {k}$$ so that, according to (2.15), fewer samples than $$O( {k}\log ( {k}))$$ are actually sufficient when using q*. Finally, q* might actually be close to p*, in which case we would reach almost optimal results with q*. This occurs when $$\| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{k} \|_{F}^{2} \approx \| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{k} \|_{2}^{2}$$, which is the case when N(ℓ)Uk is low rank. We discuss in the next paragraph one case where this matrix is low rank. Let us consider a community graph with k communities. In the degenerated (perfect) case where no edge exists between two different communities, it is easy to prove that the indicator vectors of the community are eigenvectors of the combinatorial graph Laplacian with eigenvalue 0. The indicator vector of a community is a binary vector with non-zero entries at the nodes in this community and zero entries everywhere else. Hence, we can form Uk with these indicator vectors after ℓ2-normalization. If the groups are now formed such that no group contains nodes belonging to two different communities, then the matrices N(ℓ)Uk contain $$ {\left | \mathscr{N}_{\ell } \right |}$$ identical rows. These matrices are therefore rank 1 and $$\| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{k} \|_{F}^{2} = \| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{k} \|_{2}^{2}$$, which implies q*= p*. It is also easy to prove that νp*2 = k so that Theorem 2.2 indicates that sampling $$O(k\log (k))$$ groups is sufficient to satisfy the RIP. If one actually samples less than k groups then the RIP cannot hold as one community would necessarily not be sampled. Hence, one cannot hope to sample much less than $$O(k\log (k))$$ groups and reach the optimal sampling rate of $$O(\log (k))$$ groups. Note that, in the unstructured sampling setting, the results in [32] show that sampling $$O(k\log (k))$$nodes would have been sufficient to ensure that the RIP holds in this setting. Therefore, in the structured setting, we have to sample much more nodes than actually needed as each group contains several nodes. In a sense, optimizing the sampling distribution has a more limited impact in the structured sampling setting than in the unstructured sampling setting where this optimization allows one to reach optimal rate in terms of number of sampled nodes. Structured sampling imposes fundamental limits on the best sampling rate achievable. Even though we have no formal proof, we expect to observe a similar behaviour for more realistic community graphs (where edges between communities exist). Nevertheless, beyond this theoretical result, we recall that this structured sampling setting allows us to reduce drastically the cost of the labelling procedure for the user and the reconstruction time for interactive object segmentation (see Section 5). 2.6. Summary of the impact of structured sampling As in [32], the quantity driving the optimal sampling strategy is a measure of the energy concentration of k-bandlimited signals on the element to sample: nodes in the case of [32]; groups of nodes in this work. It is more important to sample nodes/groups where the energy of k-bandlimited signals can stay concentrated than nodes/groups where the energy always spreads to neighbouring nodes/groups. Whilst in [32] the sampling performance is characterized in terms of the number of nodes to sample, here, the sampling performance is characterized in terms of the number of groups to sample. In the degenerated case where each group contains one single node, we recover the same results. In [32], it is proved that sampling $$O( {k}\log ( {k}))$$nodes with an optimized sampling distribution is sufficient to ensure the RIP holds. In this work, we prove that sampling between $$O(\log ( {k}))$$ and $$O( {k}\log ( {k}))$$groups with an optimized distribution is sufficient. Note that optimizing the sampling distribution does not overcome completely a ‘poor choice’ of the groups as in certain situation we have to sample at least k groups to ensure that the RIP holds, as explained in subsection above. Reaching the optimal sampling rate $$O(\log ( {k}))$$ is not always feasible even when optimiing the sampling distribution. 3. Optimal sampling estimation In this section, we explain how to estimate the sampling distributions p* and q* without computing the truncated Fourier matrix Uk, as this computation is intractable for large graphs. The principle consists of transforming the problem into a series of filtering operations and using fast filtering techniques on graphs [21]. This principle is used, e.g., in [34,37,38] for fast computation of feature vectors used for clustering or in [32] for the estimation of the best sampling distributions. The method involves matrix-vector multiplications with the sparse Laplacian matrix L and are thus computationally tractable even for large n. 3.1. Estimation of p* The distribution p*, defined in (2.16), that minimizes the coherence νp is entirely defined by the values of ∥N(ℓ)Uk∥2 for ℓ = 1, …, N, which are thus the quantities we need to evaluate. We recall that, in graph signal processing, a filter in the spectral domain is represented by a function $$h: {\mathbb{R}} \rightarrow {\mathbb{R}}$$, and that the signal x filtered by h is   \begin{align} {{\boldsymbol{x}}}_{h} := {{\mathsf{U}}} \, {{\textrm{diag}}}(\hat{{\boldsymbol{h}}}) \, {{\mathsf{U}}}^{{{\intercal}}} {{\boldsymbol{x}}} \in{\mathbb{R}}^{{n}}, \end{align} (3.1) where $$\hat{ {\boldsymbol{h}}} = (h( {\lambda }_{1}), \ldots , h( {\lambda }_{ {n}}))^{ {{\intercal }}} \in {\mathbb{R}}^{ {n}}$$. To filter the signal x without actually computing the graph Fourier transform of x, we can approximate the function h by a polynomial   \begin{align} r(t) := \sum_{i=0}^{d} \alpha_{i} \, t^{i} \approx h(t) \end{align} (3.2) of degree d and compute xr instead of xh. The filtered signal xr is computed rapidly using the formula   \begin{align} {{\boldsymbol{x}}}_{r} = \sum_{i=0}^{d} \alpha_{i} \; {{\mathsf{U}}} \, {{\textrm{diag}}}\left({{\lambda}_{1}^{i}}, \ldots, {\lambda}_{{n}}^{i}\right) \, {{\mathsf{U}}}^{{{\intercal}}} {{\boldsymbol{x}}} = \sum_{i=0}^{d} \alpha_{i} \; {{\mathsf{L}}}^{i} {{\boldsymbol{x}}}, \end{align} (3.3) that involves only matrix-vector multiplications with the sparse Laplacian matrix L. We let the reader refer to [21] for more information on this fast-filtering technique. For any polynomial function r of the form above and any matrix $$ {\mathsf{A}} \in {\mathbb{R}}^{ {n} \times {n}}$$, we write   \begin{align} r({\mathsf{A}}) = \sum_{i=0}^{d} \alpha_{i} \; {\mathsf{A}}^{i} . \end{align} (3.4) Note that $$r( { {\mathsf{L}}}) = { {\mathsf{U}}} \, r( { {\mathsf{\Lambda }}}) \, { {\mathsf{U}}}^{ {{\intercal }}}$$. Let $$i_{ {\lambda }_{k}} : {\mathbb{R}} \rightarrow {\mathbb{R}}$$ be the ideal low-pass filter at cutoff frequency λk, i.e., the filter that satisfies   \begin{align} i_{{\lambda}_{k}}(t) =\begin{cases} 1 & \textrm{if}\ t {\leqslant} {\lambda}_{k},\\ 0 & \textrm{otherwise}. \end{cases} \end{align} (3.5) We have $$ { {\mathsf{U}}}_{k} { {\mathsf{U}}}_{k}^{ {{\intercal }}} = i_{ {\lambda }_{k}}\left ( { {\mathsf{L}}}\right )$$. Then, we notice that   \begin{align} {\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{2}^{2} = {\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k} {{\mathsf{U}}}_{k}^{{{\intercal}}}{{{\mathsf{N}}}^{(\ell)}}^{{{\intercal}}}\right\|}_{2} = {\left\| {{\mathsf{N}}}^{(\ell)} \; i_{{\lambda}_{k}}\left({{\mathsf{L}}}\right) \;{{{\mathsf{N}}}^{(\ell)}}^{{{\intercal}}}\right\|}_{2}. \end{align} (3.6) We recall that N(ℓ) is the matrix that restricts the signal to the nodes belonging to $$\mathscr{N}_{\ell }$$. The matrix appearing on the right-hand side of the last equality corresponds to the linear operator that (1) extends a vector on the complete graph by inserting 0 in all groups $$\ell ^{\prime } \neq \ell $$, (2) low-pass filters the extended signal and (3) restricts the result to the group $$\mathscr{N}_{\ell }$$. This process can be approximated by replacing the ideal low-pass filter iλk with a polynomial approximation $$\tilde{i}_{ {\lambda }_{k}}$$ of iλk and   \begin{align} {\left\| {{\mathsf{N}}}^{(\ell)} \; i_{{\lambda}_{k}}\left({{\mathsf{L}}}\right) \;{{{\mathsf{N}}}^{(\ell)}}^{{{\intercal}}}\right\|}_{2} \approx{\left\| {{\mathsf{N}}}^{(\ell)} \; \tilde{i}_{{\lambda}_{k}}\left({{\mathsf{L}}}\right) \;{{{\mathsf{N}}}^{(\ell)}}^{{{\intercal}}}\right\|}_{2}. \end{align} (3.7) To estimate p*, we estimate the spectral norm of the matrix appearing on the right-hand side for which matrix-vector multiplication is fast. The quality of the approximation depends obviously on the choice of the polynomial $$\tilde{i}_{ {\lambda }_{k}}$$. Estimating $$\left \| { {\mathsf{N}}}^{(\ell )} \; \tilde{i}_{ {\lambda }_{k}}\left ( { {\mathsf{L}}}\right ) \;{ { {\mathsf{N}}}^{(\ell )}}^{ {{\intercal }}} \right \|_{2}$$ amounts to computing the largest eigenvalue of $$ { {\mathsf{N}}}^{(\ell )} \; \tilde{i}_{ {\lambda }_{k}}\left ( { {\mathsf{L}}}\right ) \;{ { {\mathsf{N}}}^{(\ell )}}^{ {{\intercal }}}$$ which can be done, e.g., by using the power method. This method requires matrix-vector multiplication only with N(ℓ) and $$\tilde{i}_{ {\lambda }_{k}}\left ( { {\mathsf{L}}}\right )$$ and is thus fast. Finally, the approximation $$\bar{ {\boldsymbol{p}}} \in {\mathbb{R}}^{ {N}}$$ of p* satisfies   \begin{align} \bar{{{\boldsymbol{p}}}}_{\ell} := \frac{\lambda_{\max }\left({{\mathsf{N}}}^{(\ell)} \; \tilde{i}_{{\lambda}_{k}}\left({{\mathsf{L}}}\right) \;{{{\mathsf{N}}}^{(\ell)}}^{{{\intercal}}}\right)}{\sum_{\ell^{\prime}=1}^{{N}} \lambda_{\max }\left({{\mathsf{N}}}^{\left(\ell^{\prime}\right)} \; \tilde{i}_{{\lambda}_{k}}\left({{\mathsf{L}}}\right) \;{{{\mathsf{N}}}^{\left(\ell^{\prime}\right)}}^{{{\intercal}}}\right)}. \end{align} (3.8) Note that an estimation of λk is required beforehand to define the filter $$\tilde{i}_{ {\lambda }_{k}}$$. We estimate this value using the dichotomy method presented in [32]. Estimating each value of $$\bar{ { {\boldsymbol{p}}}}_{\ell }$$ requires several filtering with $$\tilde{i}_{ {\lambda }_{k}}\left ( { {\mathsf{L}}}\right )$$ for each iteration of the power method. These values are estimated in parallel. Even if the power method was converging in one iteration only, this would require at least N filtering. Depending on the number of groups that may already be too much computations. Hence the interest of having a sampling distribution easier to estimate, but which yields acceptable sampling results, namely q*. We explain next how to estimate q*. 3.2. Estimation of q* We start by noticing that we have   \begin{align} {\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{F}^{2} = \sum_{i \in \mathscr{N}_{\ell}} {\left\| {{\mathsf{U}}}_{{k}}^{{{\intercal}}} {\boldsymbol{\delta}}_{i}\right\|}_{2}^{2} \end{align} (3.9) for each group ℓ = 1, …, N. The vector $$ {\boldsymbol{\delta }}_{i} \in {\mathbb{R}}^{ {n}}$$ is the unit vector that is null on all nodes except at node i. Hence, we only need an estimation of $$\left \| { {\mathsf{U}}}_{ {k}}^{ {{\intercal }}} {\boldsymbol{\delta }}_{i}\right \|_{2}^{2}$$, i = 1, …, n, to estimate q*. An algorithm was already proposed in [32] to estimate these values. We let the reader refer to Algorithm 1 in [32] for the details of the method. We just recall that this estimation is obtained by filtering $$O(\log ( {n}))$$ random signals with a polynomial approximation of iλk. In real applications, it is likely that $$O(\log ( {n})) {\leqslant } {N}$$. It is hence faster to estimate q* than p*. Finally, our estimation $$\bar{ {\boldsymbol{q}}} \in {\mathbb{R}}^{ {N}}$$ of q* has entries   \begin{align} \bar{{\boldsymbol{q}}}_{\ell} := \frac{\sum_{i \in \mathscr{N}_{\ell}} {\left\| {{\mathsf{U}}}_{{k}}^{{{\intercal}}} {\boldsymbol{\delta}}_{i}\right\|}_{F}^{2}}{\sum_{\ell=1}^{{N}} \sum_{i \in \mathscr{N}_{\ell}} {\left\| {{\mathsf{U}}}_{{k}}^{{{\intercal}}} {\boldsymbol{\delta}}_{i}\right\|}_{F}^{2}}, \end{align} (3.10) where each $$\left \| { {\mathsf{U}}}_{ {k}}^{ {{\intercal }}} {\boldsymbol{\delta }}_{i}\right \|_{2}^{2}$$ is estimated by Algorithm 1 in [32]. 4. Fast reconstruction Once the signal has been sampled, we shall also be able to reconstruct it. In [32], the authors propose to estimate the original signal by solving   \begin{align} \min_{{\boldsymbol{z}} \in{\mathbb{R}}^{{n}}} {\left\| {\mathsf{P}}({{\mathsf{M}}} {\boldsymbol{z}} - {{\boldsymbol{y}}})\right\|}_{2}^{2} + {\gamma} \; {\boldsymbol{z}}^{{{\intercal}}} g({{\mathsf{L}}}) {\boldsymbol{z}}, \end{align} (4.1) where γ > 0 and g is a non-negative non-decreasing polynomial. We have   \begin{align} g({{\mathsf{L}}}) := \sum_{i=0}^{d} \alpha_{i} \, {{\mathsf{L}}}^{i} = \sum_{i=0}^{d} \alpha_{i} \, {{\mathsf{U}}} {{\mathsf{\Lambda}}}^{i} {{\mathsf{U}}}^{{{\intercal}}}, \end{align} (4.2) where $$\alpha _{0}, \ldots , \alpha _{d} \in {\mathbb{R}}$$ are the coefficients of the polynomial and $$d \in {\mathbb{N}}$$ its degree. These polynomial’s parameters can be tuned to improve the quality of the reconstruction. The role of g can be viewed as a filter on $$\mathscr{G}$$ and should ideally be a high-pass filter. Indeed, we seek to recover a signal whose energy is concentrated at low frequencies on $$\mathscr{G}$$. We should thus penalize signals with energy at high frequencies in (4.1), hence the use of a high-pass filter. The matrix P is introduced to account for the RIP. The advantage of this method is that it can be solved efficiently even for large graph, for example, by conjugate gradient or gradient descent method. Indeed, each step of this algorithm can be implemented by using only matrix-vector multiplications with PM and L, which are both sparse matrices. The matrix g(L) does not have to be computed explicitly. Note that the computational complexity of each iteration of such algorithms however increases with the polynomial order. For brevity, we do not recall the theorem proving that (4.1) provides accurate and stable recovery of all k-bandlimited signals. However, this theorem applies as soon as the restricted isometry property (2.14) holds, and thus applies here when (2.15) holds. The reconstruction quality is controlled by g(λk) and the ratio g(λk)/g(λk+1). One should seek to design a filter g such that these quantities are as close as possible to zero to improve the reconstruction quality. We propose now a method to obtain a faster estimation of the original signal when it is nearly piecewise constant. 4.1. Piecewise constant graph signals Before continuing, we want to stress that we consider non-overlapping groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$ in this rest of Section 4. If a graph signal is nearly piecewise constant over the groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$ then reconstructing the mean values of this signal for each group is enough to obtain a good approximation of the original signal. Instead of estimating n unknowns, we reduce the estimation to N unknowns. When N ≪ n, this is a large reduction of dimension yielding a significant speed up and memory reduction. The fact that a signal x is piecewise constant over the groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$ is characterized as follows. We construct the averaging row-vectors $$ { {\boldsymbol{a}}}^{(\ell )} \in {\mathbb{R}}^{1 \times {n}}$$ that satisfy   \begin{align} {{\boldsymbol{a}}}^{(\ell)} := \frac{{\boldsymbol{1}}^{{{\intercal}}} {{\mathsf{N}}}^{(\ell)}}{{\left| \mathscr{N}_{\ell} \right|}^{1/2}}, \end{align} (4.3) and the matrix   \begin{align} {{\mathsf{A}}} := \left( \begin{array}{c} {{\boldsymbol{a}}}^{(1)}\\ \vdots\\{{\boldsymbol{a}}}^{({N})} \end{array} \right) \in{\mathbb{R}}^{{N} \times{n}}. \end{align} (4.4) As the groups do not overlap, we have   \begin{align} {{\mathsf{A}}}{{\mathsf{A}}}^{{{\intercal}}} = {\mathsf{I}}, \end{align} (4.5) hence $$ {\left \| { {\mathsf{A}}}\right \|}_{2} = 1$$. Applying A to x provides N values, each one of them corresponding to the sum of the values of x within the group $$\mathscr{N}_{\ell }$$, scaled by $$ {\left | \mathscr{N}_{\ell } \right |}^{-1/2}$$. Then, applying $$ { {\mathsf{A}}}^{ {{\intercal }}}$$ to Ax gives an approximation of x where the values in the vector $$ { {\mathsf{A}}}^{ {{\intercal }}} { {\mathsf{A}}} { {\boldsymbol{x}}}$$ are constant within each group; this is a piecewise constant vector over the groups. The value of $$ { {\mathsf{A}}}^{ {{\intercal }}} { {\mathsf{A}}} { {\boldsymbol{x}}}$$ appearing within the group $$\mathscr{N}_{\ell }$$ is exactly the average of x within $$\mathscr{N}_{\ell }$$. Saying that x is nearly constant within each group corresponds to assume that   \begin{align} {\left\| {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}} - {{\boldsymbol{x}}}\right\|}_{2} {\leqslant} \varepsilon{\left\| {{\boldsymbol{x}}}\right\|}_{2}, \end{align} (4.6) where $$\varepsilon {\geqslant } 0$$ is a small value. The signal model of interest in this section is thus   \begin{align} {{\mathscr{A}}_\varepsilon} := \left\{ {{\boldsymbol{x}}} \in{{\textrm{span}}}({{\mathsf{U}}}_{{k}}) \;\; \vert \;\; {\left\| ({{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} - {\mathsf{I}}) \; {{\boldsymbol{x}}}\right\|}_{2} {\leqslant} \varepsilon{\left\| {{\boldsymbol{x}}}\right\|}_{2} \right\}. \end{align} (4.7) Note that in the degenerated case of a community graph with k communities, where no edge exists between two different communities and where the groups are formed such that no group contains nodes belonging to two different communities, then the indicator vectors of each community are exactly piecewise constant (ε = 0). Hence, all k-bandlimited signals are also piecewise constant in this case. 4.2. Reducing the dimension To build a fast algorithm exploiting the above property, we use a reconstruction method similar to (4.1), but involving vectors of smaller dimension. We define the averaged vector $$\tilde{ { {\boldsymbol{x}}}} := { {\mathsf{A}}} { {\boldsymbol{x}}} \in {\mathbb{R}}^{ {N}}$$ of dimension N. As $$ { {\boldsymbol{x}}} \in { {\mathscr{A}}_\varepsilon }$$, estimating $$\tilde{ { {\boldsymbol{x}}}}$$ is enough to get a good approximation of x—we just need to multiply it with $$ { {\mathsf{A}}}^{ {{\intercal }}}$$. Furthermore, as x is nearly piecewise constant over the groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$, by construction of the matrix M, the measurement vector y = Mx is also almost piecewise constant over the sampled groups $$\mathscr{N}_{\omega _{1}}, \ldots , \mathscr{N}_{\omega _{ {s}}}$$. We thus average y over these groups by multiplying it with the matrix $$\widetilde{ { {\mathsf{A}}}} \in {\mathbb{R}}^{ {s} \times {m}}$$ that satisfies   \begin{align} \widetilde{{{\mathsf{A}}}}_{ji} := \begin{cases} {\left| \mathscr{N}_{\omega_{j}} \right|}^{-1/2} & \textrm{for}\ \sum_{j^{\prime}=1}^{j-1} {\left| \mathscr{N}_{\omega_{j^{\prime}}} \right|} {\leqslant} \; i \;{\leqslant} \sum_{j^{\prime}=1}^{\ell} {\left| \mathscr{N}_{\omega_{j^{\prime}}} \right|}, \\ 0 & \textrm{otherwise}. \end{cases} \end{align} (4.8) We obtain   \begin{align} \tilde{{{\boldsymbol{y}}}} := \widetilde{{{\mathsf{A}}}} {{\boldsymbol{y}}} = \widetilde{{{\mathsf{A}}}} {{\mathsf{M}}} {{\boldsymbol{x}}} \in{\mathbb{R}}^{{s}}. \end{align} (4.9) We now have to link $$\tilde{ { {\boldsymbol{y}}}}$$ to $$\tilde{ { {\boldsymbol{x}}}}$$. We create the matrix $$\widetilde{ { {\mathsf{M}}}} \in {\mathbb{R}}^{ {s} \times {N}}$$ that restricts the N mean value of $$\tilde{ { {\boldsymbol{x}}}}$$ to the s mean value of the selected groups, i.e.,   \begin{align} \widetilde{{{\mathsf{M}}}}_{j i} := \begin{cases} 1 & \textrm{if}\ i = \omega_{j},\\ 0 & \textrm{otherwise}. \end{cases} \end{align} (4.10) We have therefore   \begin{align} \tilde{{{\boldsymbol{y}}}} = \widetilde{{{\mathsf{M}}}} \, \tilde{{{\boldsymbol{x}}}}. \end{align} (4.11) The goal is now to estimate $$\tilde{ { {\boldsymbol{x}}}}$$ from $$\tilde{ { {\boldsymbol{y}}}}$$. To ensure that the reconstruction method is stable to measurement noise, we do not consider the perfect scenario above, but instead the scenario where   \begin{align} \tilde{{{\boldsymbol{y}}}} = \widetilde{{{\mathsf{M}}}}\tilde{{{\boldsymbol{x}}}} + \tilde{{{\boldsymbol{n}}}}, \end{align} (4.12) and $$\tilde{ { {\boldsymbol{n}}}} \in {\mathbb{R}}^{ {s}}$$ models noise. We now need a regularization term to estimate $$\tilde{ { {\boldsymbol{x}}}}$$. We obtain this term by reducing the dimension of the regularizer involving the Laplacian L in (4.1). We compute   \begin{align} \widetilde{{{\mathsf{L}}}} := {{\mathsf{A}}} \, g({{\mathsf{L}}}) \, {{\mathsf{A}}}^{{{\intercal}}} = ({{\mathsf{A}}} {{\mathsf{U}}}) \, g({{\mathsf{\Lambda}}}) \, ({{\mathsf{A}}} {{\mathsf{U}}})^{{{\intercal}}} \in{\mathbb{R}}^{{N} \times{N}}. \end{align} (4.13) Note that $$\widetilde{ { {\mathsf{L}}}}$$ is a symmetric positive definite matrix. Like g(L), it can thus be used as a regularization. We thus propose to estimate $$\tilde{ { {\boldsymbol{x}}}}$$ by solving   \begin{align} \min_{\tilde{{\boldsymbol{z}}} \in{\mathbb{R}}^{{N}}} {\left\| \widetilde{{\mathsf{P}}} (\widetilde{{{\mathsf{M}}}} \tilde{{\boldsymbol{z}}} - \tilde{{{\boldsymbol{y}}}})\right\|}_{2}^{2} + {\gamma} \; \tilde{{\boldsymbol{z}}}^{{{\intercal}}} \, \widetilde{{{\mathsf{L}}}} \, \tilde{{\boldsymbol{z}}}, \end{align} (4.14) where γ > 0 and $$\widetilde{ {\mathsf{P}}} \in {\mathbb{R}}^{ {s} \times {s}}$$ is the diagonal matrix with entries satisfying   \begin{align} \widetilde{{\mathsf{P}}}_{jj} := {{\boldsymbol{p}}}_{\omega_{j}}^{-1/2}. \end{align} (4.15) Let $$\tilde{ { {\boldsymbol{x}}}}^{*} \in {\mathbb{R}}^{ {N}}$$ be a solution of (4.14). We finally obtain an estimation of x by computing $$ { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ { {\boldsymbol{x}}}}^{*}$$. In the particular case where g(⋅) is the identity, one can notice that   \begin{align} \widetilde{{{\mathsf{L}}}}_{\ell\ell^{\prime}} = \sum_{(i,j) \in{{\mathscr{N}}}_{\ell} \times \mathscr{N}_{\ell^{\prime}}} \frac{{{\mathsf{L}}}_{ij}}{{\left| \mathscr{N}_{\ell} \right|}^{1/2} {\left| \mathscr{N}_{\ell^{\prime}} \right|}^{1/2}} \end{align} (4.16) is non-zero only if there is at least one edge in $$ {\mathscr{E}}$$ joining the groups $$\mathscr{N}_{\ell }$$ and $$\mathscr{N}_{\ell ^{\prime }}$$. The Laplacian $$\widetilde{ { {\mathsf{L}}}}_{\ell \ell ^{\prime }}$$ preserves the connections present in the original graph represented by L. The dimension of the unknown vector in (4.14) is N, which can be much smaller than n. This leads to a large gain in memory and computation time when either $$\widetilde{ { {\mathsf{L}}}}$$ or matrix--vector multiplications with $$\widetilde{ { {\mathsf{L}}}}$$ can be computed rapidly. If g has a small degree, then $$\widetilde{ { {\mathsf{L}}}}$$ can be computed explicitly in a short amount of time. In such a case, we can solve (4.14) faster than (4.1) as it involves matrices of (much) smaller dimensions. If L encodes a regular graph such as the ring graph, and the groups are formed with neighbouring nodes of $$\mathscr{G}$$, then $$\widetilde{ { {\mathsf{L}}}}$$ is also regular and fast numerical techniques using the fast Fourier transform can be used for fast matrix--vector multiplication with $$\widetilde{ { {\mathsf{L}}}}$$. In general, it is however not always straightforward to find an efficient implementation for matrix--vector multiplications with $$\widetilde{ { {\mathsf{L}}}}$$ without temporarily going back to the signal domain of dimension n, i.e., multiplying the vector $$\tilde{ {\boldsymbol{z}}}$$ with $$ { {\mathsf{A}}}^{ {{\intercal }}}$$, filtering the high-dimensional signal $$ { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ {\boldsymbol{z}}}$$ and downsampling the result. Even though solving (4.14) might still be faster than solving (4.1) in this situation, we lose part of the efficiency by working temporarily in the high-dimensional domain. We thus have less flexibility in the choice of g with this reconstruction technique. Let us also mention that (4.14) can be used to initialize the algorithm used to solve (4.1) with a good approximate solution. As in multigrid approaches to solve linear systems of equations, see, e.g., [8, 19]. The following theorem bounds the error between the signal recovered by (4.14) and the original vector. Theorem 4.1 Let Ω = {ω1, …, ωs} be a set of s indices selected independently from {1, …, N} using a sampling distribution $$ { {\boldsymbol{p}}} \in {\mathbb{R}}^{ {N}}$$, $$ { {\mathsf{M}}}, {\mathsf{P}}, \widetilde{ { {\mathsf{M}}}}, \widetilde{ {\mathsf{P}}}$$ be the associated matrices constructed respectively in (2.7), (2.10), (4.10) and (4.15), and $$M_{\max }>0$$ be a constant such that $$ {\left \| {\mathsf{P}} { {\mathsf{M}}}\right \|}_{2} {\leqslant } M_{\max }$$. Let ξ, δ ∈ (0, 1) and suppose that s satisfies (2.15). With probability at least 1 − ξ, the following holds for all $$ { {\boldsymbol{x}}} \in { {\mathscr{A}}_\varepsilon }$$, all $$\tilde{ { {\boldsymbol{n}}}} \in {\mathbb{R}}^{ {s}}$$, all γ > 0 and all non-negative non-decreasing polynomial functions g such that g(λk+1) > 0. Let $$\tilde{ { {\boldsymbol{x}}}}^{*}$$ be the solution of (4.14) with $$\tilde{ { {\boldsymbol{y}}}} = \widetilde{ { {\mathsf{M}}}} { {\mathsf{A}}} { {\boldsymbol{x}}} + \tilde{ { {\boldsymbol{n}}}}$$. Define $$ {\boldsymbol{\alpha }}^{*} := { {\mathsf{U}}}_{ {k}} { {\mathsf{U}}}_{ {k}}^{ {{\intercal }}} \,{ { {\mathsf{A}}}}^{ {{\intercal }}} \tilde{ { {\boldsymbol{x}}}}^{*}$$ and $$ {\boldsymbol{\beta }}^{*} := ( {\mathsf{I}} - { {\mathsf{U}}}_{ {k}} { {\mathsf{U}}}_{ {k}}^{ {{\intercal }}}) \,{ { {\mathsf{A}}}}^{ {{\intercal }}} \tilde{ { {\boldsymbol{x}}}}^{*}$$. Then,   \begin{align} {\left\| {\boldsymbol{\alpha}}^{*} - {{\boldsymbol{x}}} \right\|}_{2} \; {\leqslant} \; {\frac{1}{\sqrt{{s} (1-\delta)}}} \; \cdot &\left[ \left( 2 + \frac{M_{\max }}{\sqrt{{\gamma} g({\lambda}_{{k}+1})}}\right){\left\| \widetilde{{\mathsf{P}}} \tilde{{{\boldsymbol{n}}}}\right\|}_{2} + \left( M_{\max } \sqrt{\frac{g({\lambda}_{{k}})}{g({\lambda}_{{k}+1})}} + \sqrt{{\gamma} g({\lambda}_{{k}})}\right) {\left\| {{\boldsymbol{x}}} \right\|}_{2}\right. \nonumber \\ &\quad \left. + \; \varepsilon \left( 2 M_{\max } + M_{\max } \sqrt{\frac{g({\lambda}_{{n}})}{g({\lambda}_{{k}+1})}} + \sqrt{{\gamma} g({\lambda}_{{n}})} \right) {\left\| {{\boldsymbol{x}}} \right\|}_{2} \right], \end{align} (4.17) and   \begin{align} {\left\| {\boldsymbol{\beta}}^{*}\right\|}_{2} {\leqslant} \; \frac{1}{\sqrt{{\gamma} g({\lambda}_{{k}+1})}} {\left\| \widetilde{{\mathsf{P}}} \tilde{{{\boldsymbol{n}}}} \right\|}_{2} \; + \; \sqrt{\frac{g({\lambda}_{{k}})}{g({\lambda}_{{k}+1})}} {\left\| {{\boldsymbol{x}}} \right\|}_{2} \; + \; \varepsilon \; \sqrt{\frac{g({\lambda}_{{n}})}{g({\lambda}_{{k}+1})}} {\left\| {{\boldsymbol{x}}} \right\|}_{2}. \end{align} (4.18) Proof. See Appendix B. The vector $$\boldsymbol{\alpha}^{*}$$ is the orthogonal projection of $${ { {\mathsf{A}}}}^{ {{\intercal }}} \tilde{ { {\boldsymbol{x}}}}^{*}$$ onto span(Uk). The vector $$\boldsymbol{\beta}^{*}$$ is the projection of $${ { {\mathsf{A}}}}^{ {{\intercal }}} \tilde{ { {\boldsymbol{x}}}}^{*}$$ onto the orthogonal complement of span(Uk). There are several remarks to make about the above theorem: Theorem 4.1 shows that the result obtained via (4.14) is similar to the one we would have obtained by solving (4.1)—see [32] for the error bounds—with additional errors controlled by ε. We recall that ε characterizes how far x is from a piecewise constant signal. As expected, the smaller ε, the better the reconstruction. The reconstruction quality improves when g(λk) and the ratio g(λk)/g(λk+1) go to 0 and when g(λn)/g(λk+1) tends to 1. We recall that we have $$g( {\lambda }_{ {n}}) {\geqslant } g( {\lambda }_{ {k}+1})> g( {\lambda }_{ {k}})$$ by assumption. The effect of the noise $$\tilde{ { {\boldsymbol{n}}}}$$ decreases when g(λk+1) increases, and, obviously, γ should be adapted to the signal-to-noise ratio (SNR). Let us mention that the idea of ‘coarsening’ a graph and the signals that live on it using a partition into different groups of nodes can also be found in [36], where a multiresolution analysis method of graph signals is proposed. The coarsening method is however different than the one used here. 5. Experiments In this last section, we first test our sampling strategies on two different graphs to illustrate the effect of the different sampling distributions on the minimum number of samples required to ensure that the RIP holds.2 Then, we apply our sampling strategy for user-guided object segmentation. In this application, we also test the different recovery techniques proposed in Section 4. 5.1. Sampling distributions We perform experiments on two different graphs: the Minnesota graph of size n = 2642 and the bunny graph of size n = 2503. Both graphs are presented in Fig. 3 and are available in the Graph signal processing (GSP) toolbox [29]. For each graph, we group the nodes using the spatial coordinates associated to each node. For the Minnesota graph, we divide the space into 100 cells and group the nodes that fall in the same cell. After removing empty cells, we obtain the N = 73 groups represented in Fig. 3. For the bunny graph, we obtain N = 213 groups with a similar procedure (see Fig. 3). Fig. 3. View largeDownload slide [This figure is available in color online.] First column from the left: Minnesota graph (top); bunny graph (bottom). The groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$ are indicated by different gray level (colors). Other columns: probability that $$\underline{\delta }_{k}$$ is less than 0.995 as a function of s. The dashed (black), continuous (red), dash-dotted (blue) and dotted (green) curves are obtained using the sampling distributions u, p*, $$\bar{ {\boldsymbol{q}}}$$ and $$\bar{ { {\boldsymbol{p}}}}$$, respectively. The top row shows the result for the Minnesota graph. The bottom row shows the result for the bunny graph. The bandlimit k is indicated on top of each curve. Fig. 3. View largeDownload slide [This figure is available in color online.] First column from the left: Minnesota graph (top); bunny graph (bottom). The groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$ are indicated by different gray level (colors). Other columns: probability that $$\underline{\delta }_{k}$$ is less than 0.995 as a function of s. The dashed (black), continuous (red), dash-dotted (blue) and dotted (green) curves are obtained using the sampling distributions u, p*, $$\bar{ {\boldsymbol{q}}}$$ and $$\bar{ { {\boldsymbol{p}}}}$$, respectively. The top row shows the result for the Minnesota graph. The bottom row shows the result for the bunny graph. The bandlimit k is indicated on top of each curve. For each graph, we compute the combinatorial Laplacian and Uk for different values of k. Then, we compute the lower RIP constant, i.e., the constant $$\underline{\delta }_{k}>0$$ that satisfies   \begin{align*} \underline{\delta}_{k} = 1 - {\frac{1}{{s}}} \;\; \inf_{\substack{{{\boldsymbol{x}}} \in{{\textrm{span}}}({{\mathsf{U}}}_{{k}}) \\{\left\| x\right\|}_{2} = 1}} \;{\left\| {\mathsf{P}} {{\mathsf{M}}} \; {{\boldsymbol{x}}}\right\|}_{2}^{2}. \end{align*} This constant is the smallest value that δ can take such that the left-hand side of the RIP (2.14) holds. Remark that   \begin{align} \underline{\delta}_{k} = 1 - {\frac{1}{{s}}} \; \lambda_{\min } \left( {{\mathsf{U}}}_{k}^{{{\intercal}}} {{\mathsf{M}}}^{{{\intercal}}} {\mathsf{P}}^{2} {{\mathsf{M}}} {{\mathsf{U}}}_{k}\right). \end{align} (5.1) We estimate $$\underline{\delta }_{k}$$ for 500 independent draws of the set Ω, which defines the matrices PM, and different numbers of selected groups s. All samplings are done in the conditions of Theorem 2.2 using the sampling distributions $$ {\boldsymbol{u}}, {\boldsymbol{p}}^{*}, \bar{ { {\boldsymbol{p}}}}$$ and $$\bar{ {\boldsymbol{q}}}$$. The vector u denotes the uniform distribution over {1, …, N}. When conducting this experiment with the estimated distributions $$\bar{ { {\boldsymbol{p}}}}$$ and $$\bar{ {\boldsymbol{q}}}$$, we re-estimate these distributions at each of the 500 trials. These distributions are estimated using Jackson–Chebychev polynomials of order 50. We choose Chebychev polynomials as they are well suited for the approximation of spectral norm of matrices [26]. Jackson–Chebychev polynomials further minimize the Gibbs phenomenon that appears when one approximates low-pass filters with polynomials [28]. For the Minnesota graph, we consider the bandlimits k = 5, 10, 20. For the bunny graph, we consider the bandlimits k = 10, 25, 50. We present the probability that $$\underline{\delta }_{k}$$ is less than 0.995, estimated over the 500 draws of Ω, as a function of s in Fig. 3. For the Minnesota graph, the performance is better when using the optimal distribution p* than when using the uniform distribution u for all k, which is in line with the theory. The estimated $$\bar{ { {\boldsymbol{p}}}}$$ and $$\bar{ {\boldsymbol{q}}}$$ yield performance equivalent to p*. This confirms that we can achieve similar sampling performance without having to compute the Fourier matrix Uk, which, we recall, is intractable for large graphs. This also shows that $$\bar{ {\boldsymbol{q}}}$$ can lead to nearly optimal results. For the bunny graph, all sampling distributions yield essentially the same results at all bandlimits. We notice a slight improvement at k = 50 when using $$\bar{ { {\boldsymbol{p}}}}$$, $$\bar{ {\boldsymbol{q}}}$$ or p* instead of u. For illustration, we present in Fig. 4 examples of computed sampling distributions p*, $$\bar{ { {\boldsymbol{p}}}}$$ and $$\bar{ {\boldsymbol{q}}}$$. All sampling distributions exhibit similar structures, which explains why they all yield about the same performance in our experiments. Fig. 4. View largeDownload slide Example of sampling distributions. Top panels: p* (left), $$\bar{ { {\boldsymbol{p}}}}$$ (middle) and $$\bar{ {\boldsymbol{q}}}$$ (right) for the Minnesota graph at k = 10. Bottom panels: p* (left), $$\bar{ { {\boldsymbol{p}}}}$$ (middle) and $$\bar{ {\boldsymbol{q}}}$$ (right) for the bunny graph at k = 25. Fig. 4. View largeDownload slide Example of sampling distributions. Top panels: p* (left), $$\bar{ { {\boldsymbol{p}}}}$$ (middle) and $$\bar{ {\boldsymbol{q}}}$$ (right) for the Minnesota graph at k = 10. Bottom panels: p* (left), $$\bar{ { {\boldsymbol{p}}}}$$ (middle) and $$\bar{ {\boldsymbol{q}}}$$ (right) for the bunny graph at k = 25. 5.2. Object segmentation 5.2.1. Protocol We now test our method for interactive object segmentation. We consider the image of size n = 321 × 481 presented in Fig. 1 for which our goal is to segment the tiger. The ground truth segmentation map x ∈ {0, 1}n is presented in Fig. 5. The value 1 (white) indicates the presence of the tiger and the value 0 (black) stands for the foreground. The original image and the ground truth image are part of the dataset available3 in [25]. Our objective is to recover the original map x from few user-inputs. We experiment two sorts of labelling procedure. For the first one, we choose pixels at random and ask the user to label these pixels: 1 if the superpixel belongs to the tiger; 0 otherwise. For the second one, we divide the original image into the N = 600 superpixels showed in Fig. 1 and computed with SLIC [1], choose a small number of superpixels at random and ask the user to label these superpixels. Note that, unlike in the previous subsection, the pixels are not solely grouped with respect to their spatial coordinates, but the local gradients also influence the construction of the superpixels. Fig. 5. View largeDownload slide Top row: ground truth segmentation map (left); local group coherence map $$\left \| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{ {k}_{0}}\right \|_{2}^{2}$$ evaluated with the spectral norm (middle); local group coherence map $$\left \| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{ {k}_{0}}\right \|_{F}^{2}$$ evaluated with the Frobenius norm (right). Middle row: segmentation map estimated from s = 150 sampled superpixels obtained by solving (4.14) and using a uniform sampling (left), $$\bar{ { {\boldsymbol{p}}}}$$ (middle), $$\bar{ {\boldsymbol{q}}}$$ (right). Bottom row: segmentation map estimated from s = 50 sampled superpixels obtained by solving (4.1) and using a uniform sampling (left), $$\bar{ { {\boldsymbol{p}}}}$$ (middle), $$\bar{ {\boldsymbol{q}}}$$ (right). Fig. 5. View largeDownload slide Top row: ground truth segmentation map (left); local group coherence map $$\left \| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{ {k}_{0}}\right \|_{2}^{2}$$ evaluated with the spectral norm (middle); local group coherence map $$\left \| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{ {k}_{0}}\right \|_{F}^{2}$$ evaluated with the Frobenius norm (right). Middle row: segmentation map estimated from s = 150 sampled superpixels obtained by solving (4.14) and using a uniform sampling (left), $$\bar{ { {\boldsymbol{p}}}}$$ (middle), $$\bar{ {\boldsymbol{q}}}$$ (right). Bottom row: segmentation map estimated from s = 50 sampled superpixels obtained by solving (4.1) and using a uniform sampling (left), $$\bar{ { {\boldsymbol{p}}}}$$ (middle), $$\bar{ {\boldsymbol{q}}}$$ (right). The graph $$\mathscr{G}$$ used to propagate the user-labels to the complete image is constructed as follows. We build a feature vector for each pixel by extracting a color RGB patch of size 3 × 3 around the pixel, transform this patch in vector form and augment this vector with the absolute two-dimensional coordinates of the pixels in this extracted patch. This yields n feature vectors $$ {\boldsymbol{g}}_{i} \in {\mathbb{R}}^{45}$$, i = 1, …, n. We then connect each feature vector to its nine nearest neighbours (in the Euclidean sense), which gives a set of 9n edges $$ {\mathscr{E}}$$. The adjacency matrix $$ {\mathsf{W}} \in {\mathbb{R}}^{ {n} \times {n}}$$ satisfies $$ {\mathsf{W}}_{ij} := \exp \left [-{\| {\boldsymbol{g}}_{i} - {\boldsymbol{g}}_{j}\|_{2}^{2}}/{\sigma ^{2}}\right ], $$ where σ > 0 is the 25th percentile of the set $$\{\| {\boldsymbol{g}}_{i} - {\boldsymbol{g}}_{j}\|_{2} : (i, j) \in {\mathscr{E}} \}$$. We finally symmetrize the matrix W and compute the combinatorial Laplacian $$ { {\mathsf{L}}} \in {\mathbb{R}}^{ {n} \times {n}}$$. We study three strategies to choose the superpixels to label. The first strategy consists of choosing the superpixels uniformly at random, i.e., using the sampling distribution u. The second and third strategies consist of choosing the superpixels with respectively the optimized distributions $$\bar{ {\boldsymbol{q}}}$$ and $$\bar{ { {\boldsymbol{p}}}}$$, which we evaluate at k0 = 50 using Jackson–Chebychev polynomials of order 150 [28]. For illustration, we present in Fig. 5 the estimated values $$\left \| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{ {k}_{0}}\right \|_{F}^{2}$$ and $$\left \| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{ {k}_{0}}\right \|_{2}^{2}$$, which define the optimized distributions $$\bar{ {\boldsymbol{q}}}$$ and $$\bar{ {\boldsymbol{p}}}$$, respectively. Both distributions indicate that one should label more superpixels around the tiger. The distribution $$\bar{ {\boldsymbol{q}}}$$ ‘focuses’ however more on specific regions, like the head of tiger. The distribution $$\bar{ { {\boldsymbol{p}}}}$$ spreads the measurements over the entire tiger more uniformly. We study two strategies to choose the pixels to label. The first strategy consists of choosing the pixels uniformly at random, i.e., using the sampling distribution u. The second strategy consists of choosing the pixels with the optimal sampling distribution proposed in [32], which we evaluate at k1 = 50 × 257 using Jackson–Chebychev polynomials of order 150 [28]. For illustration, we present this distribution in Fig. 1. The distribution indicates that the most important pixels to sample are near the head and tail of the tiger and, more generally, near edges in the image. We chose k1 > k0 as we obtained better results for larger values of k in the pixels labelling scenario. On the contrary, choosing larger values of k in the superpixels labelling scenario yields slightly worse performance, especially with $$\bar{ { {\boldsymbol{p}}}}$$. Note that we have k0/N ≃ k1/n ≃ 8.3%, i.e., the ratio between k and the number of superpixels or pixels is the same. We emulate user-interactions as follows. When pixels need to be labelled, we simply sampled the ground truth segmentation map x at the sampled pixels. This noise-free strategy gives us access to a measurement vector $$ { {\boldsymbol{y}}} \in {\mathbb{R}}^{ {m}}$$. When superpixels need to be labelled, we compute the mean of the ground truth map x within each superpixel. If the mean value is larger than 0.5, we label the superpixel as part of the tiger. Otherwise, we label the superpixel as part of the background. This strategy obviously introduces noise if some superpixels cover part of the background and of the tiger. Once the labelling is done, we have access to the measurement vector $$\tilde{ { {\boldsymbol{y}}}} \in {\mathbb{R}}^{ {s}}$$ from which we want to reconstruct x. For the experiments with superpixels to label, we repeat this procedure for s ∈ {50, 70, …, 250}. For each s, we also repeat the experiments 50 times with independent draws of the superpixels. For the experiments with pixels to label, we repeat this procedure for m ∈ {50 × 257, 70 × 257, …, 250 × 257}. As each superpixel contains 257 pixels on average, we compare results for about the same number of labelled pixels. Note however that the labelling procedure with the superpixels is much less costly for the user: labelling one superpixel is at least as easy as labelling a single pixel—arguably even easier in term of decision—whilst it amounts to automatically label hundreds of pixels at once. For each m, we also repeat the experiments 50 times with independent draws of the pixels. Note that we draw the pixels and the superpixels with replacements. To reconstruct the original map x, we solve (4.1) directly, starting from the null vector as initialization, in the pixel labelling scenarios. In the superpixels labelling scenarios, we first use the fast reconstruction method (4.14) and then refine the solution at the pixel level with (4.1), using the solution of the first minimization problem as initialization to solve the second minimization problem. We choose g(L) = L and solve all problems in the limit where $$ {\gamma } \rightarrow 0$$. In this limit, the problems (4.14) and (4.1) become   \begin{align} \min_{\tilde{{\boldsymbol{z}}} \in{\mathbb{R}}^{{N}}} \; \tilde{{\boldsymbol{z}}}^{{{\intercal}}} \, \widetilde{{{\mathsf{L}}}} \, \tilde{{\boldsymbol{z}}} \quad \textrm{subject to} \quad \widetilde{{{\mathsf{M}}}} \tilde{{\boldsymbol{z}}} = \tilde{{{\boldsymbol{y}}}} \end{align} (5.2) and   \begin{align} \min_{{\boldsymbol{z}} \in{\mathbb{R}}^{{n}}} \; {\boldsymbol{z}}^{{{\intercal}}} g({{\mathsf{L}}}) {\boldsymbol{z}} \quad \textrm{subject to} \quad{{\mathsf{M}}} {\boldsymbol{z}} = {{\boldsymbol{y}}}, \end{align} (5.3) respectively. Both problems are solved using Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) [6]. The same stopping criteria are used for all experiments. 5.2.2. Results We present in Fig. 6 the reconstruction signal-to-noise ratio obtained with the different methods. The reconstruction SNR is defined as $$-20\log _{10}(- {\left \| { {\boldsymbol{x}}} - { {\boldsymbol{x}}}^{*}\right \|}_{2}/ {\left \| { {\boldsymbol{x}}}\right \|}_{2})$$, where x* is the reconstructed signal. In the superpixels labelling scenarios, the SNR attained with the fast decoder (4.14) is very similar to the SNR attained with (4.1). We also remark that the optimized distributions $$\bar{ { {\boldsymbol{p}}}}$$ and $$\bar{ {\boldsymbol{q}}}$$ yield both better reconstructions than the uniform distribution u and that reconstruction SNRs with $$\bar{ { {\boldsymbol{p}}}}$$ are better than with $$\bar{ {\boldsymbol{q}}}$$. Note however that the latter can depend on the choice of k0 used to evaluate the sampling distributions. In the pixels labelling scenarios, the optimized distribution yields much better reconstructions than the uniform distribution. Finally, note that 50 × 257 labelled pixels are necessary to reach a mean SNR of 9.5 dB whilst only 250 labelled superpixels are necessary to reach a mean SNR of 9.8 dB. Fig. 6. View largeDownload slide [This figure is available in color online] Top: the curves represent the mean reconstruction SNR as a function of the number of sampled superpixels s (left) and pixels m (right). Bottom: mean computation time in seconds as a function of the number of sampled superpixels s (left) and pixels m (right). Left: superpixels labelling scenario. The curves marked with crosses (black) are obtained with the uniform sampling distribution u. The curves marked with circles (blue) are obtained with $$\bar{ {\boldsymbol{q}}}$$. The curves marked with a triangle (red) are obtained with $$\bar{ { {\boldsymbol{p}}}}$$. Right: pixels labelling scenario. The curves marked with crosses (green) are obtained with the uniform sampling distribution. The curves marked with diamonds (magenta) are obtained with the optimal sampling distribution of [32]. In all graphs, the dash-dotted curves are obtained by solving (4.1) and the solid curves are obtained by solving (4.14). Note that the reconstruction time obtained with (4.14) appears almost constant for all s and all sampling methods on this graph. The mean reconstruction times for this method are all in the range [0.30, 0.38] second. Fig. 6. View largeDownload slide [This figure is available in color online] Top: the curves represent the mean reconstruction SNR as a function of the number of sampled superpixels s (left) and pixels m (right). Bottom: mean computation time in seconds as a function of the number of sampled superpixels s (left) and pixels m (right). Left: superpixels labelling scenario. The curves marked with crosses (black) are obtained with the uniform sampling distribution u. The curves marked with circles (blue) are obtained with $$\bar{ {\boldsymbol{q}}}$$. The curves marked with a triangle (red) are obtained with $$\bar{ { {\boldsymbol{p}}}}$$. Right: pixels labelling scenario. The curves marked with crosses (green) are obtained with the uniform sampling distribution. The curves marked with diamonds (magenta) are obtained with the optimal sampling distribution of [32]. In all graphs, the dash-dotted curves are obtained by solving (4.1) and the solid curves are obtained by solving (4.14). Note that the reconstruction time obtained with (4.14) appears almost constant for all s and all sampling methods on this graph. The mean reconstruction times for this method are all in the range [0.30, 0.38] second. We also present the computation time of each method in Fig. 6. In the superpixels labelling scenarios, we notice that solving (4.14) is much faster than solving (4.1), whilst they yield almost the same quality. This highlights the interest of the fast reconstruction technique. It is also faster to solve (4.1) when the measurements are drawn with $$\bar{ { {\boldsymbol{p}}}}$$ or with $$\bar{ {\boldsymbol{q}}}$$ than with u. The reason is probably a better initialization of (4.1). The reconstruction time in the pixels labelling scenario is much longer than in the superpixels labelling scenario. This is another advantage of the method proposed in this paper. Finally, we present in Fig. 5 some examples of reconstructions from s = 150 sampled superpixels for each method. We notice that the optimized sampling distributions improve the reconstruction of x around the head and tail of the tiger, i.e., where the optimized distributions have higher values. With a uniform distribution, the structure of the graph makes it difficult to reconstruct x around the head and tail from the values of other superpixels. The optimized sampling distribution compensates this issue by favouring this area when selecting the measurements. Note that, if the segmentation map to reconstruct was exactly k-bandlimited, the results presented in [32] suggest that $$O(k\log (k))$$-labelled pixels would have been enough to ensure the reconstruction. Similarly, the results presented in this work show that $$O(k\log (k))$$-labelled superpixels would be enough. However, the experiments presented here show that one needs to sample much more pixels than superpixels to reach the same SNR. These numerical results seem to contradict the theory. However, one should not forget that these theoretical results are derived with the sole assumption that the signals are k-bandlimited. The segmentation map here is smooth on $$\mathscr{G}$$ but, also, nearly piecewise-constant on the superpixels. This last property is not exploited to derive the theoretical sampling rates in this work nor in [32]. We used this property only in the reconstruction algorithm. Exploiting the fact that a signal is k-bandlimited and piecewise-constant would probably change the sampling rate at which the RIP is satisfied. Indeed, if one knows that all labels in a superpixels are identical, it is unnecessary to sample twice a pixel in the same superpixels. Therefore, the optimal sampling distribution of [32] for k-bandlimited signals is probably not optimal any more for k-bandlimited and piecewise-constant signals. 6. Discussion and conclusion We presented a structured sampling strategy for k-bandlimited signals where the nodes are selected by groups. We proved that the local group graph cumulative coherence quantifies the importance of sampling each group to ensure a stable embedding of all k-bandlimited signals. Finally, we presented a fast reconstruction technique for k-bandlimited signals which are also nearly piecewise-constant over pre-defined groups of nodes. Among the possible applications of these methods, we believe that they can also be useful to accelerate the compressive spectral clustering method proposed in [38]. After having computed some features vectors for each nodes, this compressive method works by downsampling the set of features of vectors, performing k-means on this reduced set to find k clusters, and interpolating the clustering results on all nodes by solving (4.1). To accelerate the method, one could (1) pre-group similar nodes to form N groups such that $$ {k} {\leqslant } {N} \ll {n}$$, e.g., by running few iterations of the k-means algorithm; (2) subsample this set of N groups; (3) cluster this subset to find k clusters; and (4) solve (4.14) to cluster all nodes. If the overhead of computing the N groups is small, this method has the potential to be faster than the original compressive spectral clustering method. Finally, we would like to discuss two limitations in the proposed methods. First, the optimal sampling distribution depends on the parameter k. In some applications, the final result may change a lot depending on the value of k which was chosen to compute this distribution. Finding a range of values of k which give acceptable and stable results is thus an important step in every application. Secondly, the estimation of the optimal sampling distribution depends on the quality of the polynomial approximation of the ideal low-pass filter λk. It is sometimes necessary to use a polynomial of large degree to get a correct estimation, which limits the computational efficiency of the proposed methods. In such cases, it would be especially useful to find more efficient alternatives to estimate the distributions p* and q*. Footnotes 1  This property is a consequence of (A.7), proved in Appendix A. 2  More information to reproduce the experiments available at https://sites.google.com/site/puygilles/softwares 3  http://www.ntu.edu.sg/home/asjfcai/Benchmark_Website/benchmark_index.html References 1. Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P. & Süsstrunk, S. ( 2012) SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. , 34, 2274-- 2282. Google Scholar CrossRef Search ADS PubMed  2. Adcock, B., Hansen, A. C., Poon, C. & Roman, B. ( 2013) Breaking the coherence barrier: a new theory for compressed sensing. arXiv:1302.0561. 3. Anis, A., Gadde, A. & Ortega, A. ( 2014) Towards a sampling theorem for signals on arbitrary graphs. Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on , pp. 3864-- 3868. 4. Anis, A., Gadde, A. & Ortega, A. ( 2016) Efficient sampling set selection for bandlimited graph signals using graph spectral proxies., IEEE Trans. Signal Process.  PP, 1-- 1. 5. Anis, A., Gamal, A. E., Avestimehr, S. & Ortega, A. ( 2015) Asymptotic justification of bandlimited interpolation of graph signals for semi-supervised learning. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pp. 5461-- 5465. 6. Beck, A. & Teboulle, M. ( 2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. , 2, 183-- 202. Google Scholar CrossRef Search ADS   7. Bigot, J., Boyer, C. & Weiss, P. ( 2016) An analysis of block sampling strategies in compressed sensing. IEEE Trans. Inf. Theory , 62, 2125-- 2139. Google Scholar CrossRef Search ADS   8. Borzi, A. & Schulz, V. ( 2009) Multigrid methods for PDE optimization. SIAM Rev ., 51, 361-- 395. Google Scholar CrossRef Search ADS   9. Boyer, C., Bigot, J. & Weiss, P. ( 2015) Compressed sensing with structured sparsity and structured acquisition. arXiv:1505.01619. 10. Boykov, Y. & Jolly, M. ( 2001) Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. Proc. Int. Conf. Comput. Vis. , 1, 105-- 112. 11. Chauffert, N., Ciuciu, P. & Weiss, P. ( 2013) Variable density compressed sensing in MRI. Theoretical VS heuristic sampling strategies. Proc. IEEE Int. Symp. on Biomedical Imaging , pp. 298-- 301. 12. Chen, S., Sandryhaila, A. & Kovacevic, J. ( 2015) Sampling theory for graph signals. Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on , pp. 3392-- 3396. 13. Chen, S., Varma, R., Sandryhaila, A. & Kovacevic, J. ( 2015) Discrete signal processing on graphs: sampling theory. Signal Processing, IEEE Transactions on , PP(99), 1–1. 14. Chen, S., Varma, R., Singh, A. & Kovacević, J. ( 2015) Signal recovery on graphs: random versus experimentally designed sampling. Sampling Theory and Applications (SampTA), 2015 International Conference on , pp. 337-- 341. 15. Chen, X., Zou, D., Zhao, Q. & Tan, P. ( 2012) Manifold preserving edit propagation. ACM Transactions on Graphics , 31, 132-- 139. 16. Chen, X., Zou, D., Zhao, Q. & Tan, P. ( 2012) Manifold preserving edit propagation. ACM SIGGRAPH Asia , vol. 31. 17. Chen, X., Zou, D., Zhou, S. Z., Zhao, Q. & Tan, P. ( 2013) Image matting with local and nonlocal smooth priors. Proc. Conf. Comput. Vis. Patt. Recog , pp. 1902-- 1907. 18. Chung, F. ( 1997) Spectral Graph Theory , no. 92. Amer Math. Soc. , Providence, RI USA. 19. Dreyer, T., Maar, B. & Schulz, V. ( 2000) Multigrid optimization in applications. J. Comput. Appl. Math. , 128, 67-- 84. Google Scholar CrossRef Search ADS   20. Grady, L. ( 2006) Random walks for image segmentation. IEEE Trans. Patt. Anal. Mach. Intell. , 28, 1768-- 1783. Google Scholar CrossRef Search ADS   21. Hammond, D. K., Vandergheynst, P. & Gribonval, R. ( 2011) Wavelets on graphs via spectral graph theory. Appl. Comput. Harmon. Anal. , 30, 129-- 150. Google Scholar CrossRef Search ADS   22. Korinke, C., Stratmann, T. C., Laue, T. & Boll, S. ( 2016) SuperSelect: an interactive superpixel-based segmentation method for touch displays. Proc. ACM Int. Conf. Multimedia , pp. 717-- 719. 23. Krahmer, F. & Ward, R. ( 2013) Stable and robust sampling strategies for compressive imaging. IEEE Trans. Image Process. , 23, 612-- 622. Google Scholar CrossRef Search ADS PubMed  24. Levin, A., Lischinski, D. & Weiss, Y. ( 2004) Colorization using optimization. ACM Trans. Graph , 23, 689-- 694. Google Scholar CrossRef Search ADS   25. Li, H., Cai, J., Nguyen, T. N. A. & Zheng, J. ( 2013) A benchmark for semantic image segmentation. IEEE Int. Conf. on Multimedia and Expo , pp. 1-- 6. 26. Liesen, L. & Tichy, T. ( 2009) On best approximations of polynomials in matrices in the matrix 2-norm. SIAM J. Matrix Anal. Appl. , 31, 853-- 863. Google Scholar CrossRef Search ADS   27. Lischinski, D., Farbman, Z., Uyttendaele, M. & Szeliski, R. ( 2006) Interactive local adjustment of tonal values. ACM Trans. Graph. , 25, 646-- 653. Google Scholar CrossRef Search ADS   28. Napoli, E. D., Polizzo, E. & Saad, Y. ( 2016) Efficient estimation of eigenvalue counts in an interval. Numer. Linear Alg. Appl. , 23, 674-- 692. Google Scholar CrossRef Search ADS   29. Perraudin, N., Paratte, J., Shuman, D., Kalofolias, V., Vandergheynst, P. & Hammond, D. K. ( 2014) GSPBOX: a toolbox for signal processing on graphs. arXiv:1408.5781. 30. Pesenson, I. ( 2008) Sampling in Paley-Wiener spaces on combinatorial graphs. Trans. Am. Math. Soc. , 360, 5603-- 5627. Google Scholar CrossRef Search ADS   31. Pesenson, I. Z. & Pesenson, M. Z. ( 2010) Sampling, filtering and sparse approximations on combinatorial graphs. J. Fourier Anal. Appl. , 16, 921-- 942. Google Scholar CrossRef Search ADS   32. Puy, G., Tremblay, N., Gribonval, R. & Vandergheynst, P. ( 2016) Random sampling of bandlimited signals on graphs. Appl. Comput. Harmon. Anal.  (in press). 33. Puy, G., Vandergheynst, P. & Wiaux, Y. ( 2011) On variable density compressive sampling. IEEE Signal Process. Lett. , 18, 595-- 598. Google Scholar CrossRef Search ADS   34. Ramasamy, D. & Madhow, U. ( 2015) Compressive spectral embedding: sidestepping the SVD. Advances in Neural Information Processing Systems , 28, pp. 550-- 558. 35. Shuman, D., Narang, S., Frossard, P., Ortega, A. & Vandergheynst, P. ( 2013) The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. Signal Proc. Mag. IEEE , 30, 83-- 98. Google Scholar CrossRef Search ADS   36. Tremblay, N. & Borgnat, P. ( 2016) Subgraph-based filterbanks for graph signals. IEEE Trans. Signal Proc. , PP(99), 1–1. 37. Tremblay, N., Puy, G., Borgnat, P., Gribonval, R. & Vandergheynst, P. ( 2016) Accelerated spectral clustering using graph filtering of random signals. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) . 38. Tremblay, N., Puy, G., Gribonval, R. & Vandergheynst, P. ( 2016) Compressive spectral clustering. Machine Learning, Proceedings of the Thirty-third International Conference (ICML 2016),  June 20–22, New York City, USA. 39. Tropp, J. A. ( 2012) User-friendly tail bounds for sums of random matrices. Found. Comput. Math. , 12, 389-- 434. Google Scholar CrossRef Search ADS   40. Xu, L., Yan, Q. & Jia, J. ( 2013) A sparse control model for image and video editing. ACM Trans. Graph. , 32, 197. Appendix A. Proof of the Theorem 2.2 As done in [32], the proof is obtained by applying the following lemma obtained by Tropp in [39]. Lemma A1 (Theorem 1.1, [39]) Consider a finite sequence {Xj} of independent, random, self-adjoint, positive semi-definite matrices of dimension d × d. Assume that each random matrix satisfies $$ \lambda _{\max }( {\mathsf{X}}_{j}) {\leqslant } R $$ almost surely. Define   \begin{align} \mu_{\min } := \lambda_{\min } \left( \sum_{j} {\mathbb{E}} \, {\mathsf{X}}_{j} \right) \; \textrm{and} \; \mu_{\max } := \lambda_{\max } \left( \sum_{j} {\mathbb{E}} \, {\mathsf{X}}_{j} \right). \end{align} (A.1) Then   \begin{equation} {\mathbb{P}} \left\{ \lambda_{\min } \left( \sum_{j} {\mathsf{X}}_{j} \right) {\leqslant} (1 - \delta) \mu_{\min } \right\} {\leqslant} d \, \left[ \frac{{{\textrm{e}}}^{-\delta}}{(1-\delta)^{1-\delta}}\right]^{\frac{\mu_{\min }}{R}} \textrm{for}\ \delta \in [0, 1],\end{equation} (A.2)  \begin{equation}{\hskip-40pt}\textrm{and}\ \; {\mathbb{P}} \left\{ \lambda_{\max } \left( \sum_{j} {\mathsf{X}}_{j} \right) {\geqslant} (1 + \delta) \mu_{\max } \right\} {\leqslant} d \, \left[ \frac{{{\textrm{e}}}^{\delta}}{(1+\delta)^{1+\delta}}\right]^{\frac{\mu_{\max }}{R}} \textrm{for}\ \delta{\geqslant} 0. \end{equation} (A.3) We will also use the facts that, for all δ ∈ [0, 1],   \begin{align} \left[ \frac{{{\textrm{e}}}^{-\delta}}{(1-\delta)^{1-\delta}}\right]^{\mu_{\min }/R} \; {\leqslant} \; \exp \left( - \frac{\delta^{2} \mu_{\min }}{3 \, R}\right) \textrm{and} \left[ \frac{{{\textrm{e}}}^{\delta}}{(1+\delta)^{1+\delta}}\right]^{\mu_{\max }/R} \; {\leqslant} \; \exp \left( - \frac{\delta^{2} \mu_{\max }}{3 \, R}\right). \end{align} (A.4) Proof of Theorem 2.2 We start by noticing that   \begin{align} {\frac{1}{{s}}} \; {{\mathsf{U}}}_{{k}}^{{{\intercal}}} {{\mathsf{M}}}^{{{\intercal}}} {\mathsf{P}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{U}}}_{{k}} = {\frac{1}{{s}}} \sum_{j = 1}^{{s}} \left({{\mathsf{U}}}_{{k}}^{{{\intercal}}}{{{\mathsf{N}}}^{(\omega_{j})}}^{{{\intercal}}}{\mathsf{P}}^{(\omega_{j})} \right) \left({\mathsf{P}}^{(\omega_{j})} {{\mathsf{N}}}^{(\omega_{l})} {{\mathsf{U}}}_{{k}} \right). \end{align} (A.5) We define   \begin{align} {\mathsf{X}}_{j} := \frac{1}{{s}} \left({{\mathsf{U}}}_{{k}}^{{{\intercal}}}{{{\mathsf{N}}}^{(\omega_{j})}}^{{{\intercal}}}{\mathsf{P}}^{(\omega_{j})} \right) \left({\mathsf{P}}^{(\omega_{j})} {{\mathsf{N}}}^{(\omega_{j})} {{\mathsf{U}}}_{{k}} \right) \quad \textrm{and} \quad{\mathsf{X}} := \sum_{j=1}^{{s}} {\mathsf{X}}_{j} = {\frac{1}{{s}}} \; {{\mathsf{U}}}_{{k}}^{{{\intercal}}} {{\mathsf{M}}}^{{{\intercal}}} {\mathsf{P}}^{2} {{\mathsf{M}}} {{\mathsf{U}}}_{{k}}. \end{align} (A.6) The matrix X is thus a sum of s independent, random, self-adjoint, positive semi-definite matrices. We are in the setting of Lemma 1. We continue by computing $$ {\mathbb{E}} \, {\mathsf{X}}_{j}$$ and $$\lambda _{\max }( {\mathsf{X}}_{j})$$. The expected value of each Xj is   \begin{align} {\mathbb{E}} \, {\mathsf{X}}_{j} & = {\mathbb{E}} \left[ \frac{1}{{s}} \left({{\mathsf{U}}}_{{k}}^{{{\intercal}}}{{{\mathsf{N}}}^{(\omega_{j})}}^{{{\intercal}}}{\mathsf{P}}^{(\omega_{j})} \right) \left({\mathsf{P}}^{(\omega_{j})} {{\mathsf{N}}}^{(\omega_{j})} {{\mathsf{U}}}_{{k}} \right) \right] = {\frac{1}{{s}}} \; {{\mathsf{U}}}_{{k}}^{{{\intercal}}} \left(\sum_{\ell=1}^{{N}} {{\boldsymbol{p}}}_{\ell} \left({{{\mathsf{N}}}^{(\ell)}}^{{{\intercal}}}{\mathsf{P}}^{(\ell)} \right) \left({\mathsf{P}}^{(\ell)} {{\mathsf{N}}}^{(\ell)} \right) \right) {{\mathsf{U}}}_{{k}} \nonumber \\ & = {\frac{1}{{s}}} \; {{\mathsf{U}}}_{{k}}^{{{\intercal}}} \left(\sum_{\ell=1}^{{N}}{{{\mathsf{N}}}^{(\ell)}}^{{{\intercal}}} {{\mathsf{N}}}^{(\ell)} \right) {{\mathsf{U}}}_{{k}} = {\frac{1}{{s}}} \; {{\mathsf{U}}}_{{k}}^{{{\intercal}}} {{\mathsf{U}}}_{{k}} = {\frac{1}{{s}}} \; {\mathsf{I}}. \end{align} (A.7) Therefore, $$ \lambda _{\min } \left ( \sum _{j} {\mathbb{E}} \, {\mathsf{X}}_{j} \right ) = 1 $$ and $$ \lambda _{\max } \left ( \sum _{j} {\mathbb{E}} \, {\mathsf{X}}_{j} \right ) = 1. $$ Furthermore, for all j = 1, …, s, we have   \begin{align} \lambda_{\max } ({\mathsf{X}}_{j}) = {\left\| {\mathsf{X}}_{j}\right\|}_{2} {\leqslant} \max_{1 {\leqslant} \ell{\leqslant} {N}} {\left\| \frac{{\mathsf{P}}^{(\ell)} {{\mathsf{N}}}^{(\ell)} {{\mathsf{U}}}_{{k}}}{{s}}\right\|}_{2}^{2} = \frac{1}{{s}} \; \max_{1 {\leqslant} \ell{\leqslant} {N}} \left\{ \frac{{\left\| {{\mathsf{N}}}^{(\ell)} {{\mathsf{U}}}_{{k}}\right\|}_{2}^{2}}{{{\boldsymbol{p}}}_{\ell}} \right\} = \frac{{\nu}_{{{\boldsymbol{p}}}}^{2}}{{s}}. \end{align} (A.8) Lemma A1 yields, for any δ ∈ (0, 1),   \begin{equation} {\mathbb{P}} \left\{ \lambda_{\min } \left( {\mathsf{X}} \right) {\leqslant} (1 - \delta) \right\} \; {\leqslant} \; {k} \cdot \left[ \frac{{{\textrm{e}}}^{-\delta}}{(1-\delta)^{1-\delta}}\right]^{{s}/{\nu}_{{{\boldsymbol{p}}}}^{2}} \; {\leqslant} \; {k} \; \exp \left( - \frac{\delta^{2} {s}}{3 \, {\nu}_{{{\boldsymbol{p}}}}^{2}} \right)\end{equation} (A.9)  \begin{equation} \textrm{and } \; {\mathbb{P}} \left\{ \lambda_{\max } \left( {\mathsf{X}} \right) {\geqslant} (1 + \delta) \right\} \;\ {\leqslant} \; {k} \cdot \left[ \frac{{{\textrm{e}}}^{\delta}}{(1+\delta)^{1+\delta}}\right]^{{s}/{\nu}_{{{\boldsymbol{p}}}}^{2}} \; {\leqslant} \; {k} \; \exp \left( - \frac{\delta^{2} {s}}{3 \, {\nu}_{{{\boldsymbol{p}}}}^{2}} \right). \end{equation} (A.10) Therefore, for any δ ∈ (0, 1), we have, with probability at least 1 − ξ,   \begin{align} 1 - \delta{\leqslant} \lambda_{\min } \left( {\mathsf{X}} \right) \quad \textrm{and} \quad \lambda_{\max } \left( {\mathsf{X}} \right) {\leqslant} 1+\delta \end{align} (A.11) provided that (2.15) holds. Noticing that (A11) implies that   \begin{align} (1-\delta) {\left\| {\boldsymbol{\alpha}}\right\|}_{2}^{2} {\leqslant} {\frac{1}{{s}}} {\left\| {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{U}}}_{{k}} {\boldsymbol{\alpha}} \right\|}_{2}^{2} {\leqslant} (1+\delta) {\left\| {\boldsymbol{\alpha}}\right\|}_{2}^{2}, \end{align} (A.12) for all $$ {\boldsymbol{\alpha }} \in {\mathbb{R}}^{k}$$, which is equivalent to (2.14) for all x ∈ span(Uk), terminates the proof. Appendix B. Proof of Theorem 4.1 In order to prove Theorem 4.1, we need to establish few properties between the different matrices used in this work. The first useful property is   \begin{align} \widetilde{{\mathsf{P}}} \widetilde{{{\mathsf{M}}}} {{\mathsf{A}}} = \widetilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}}. \end{align} (B.1) Indeed, for any $$ {\boldsymbol{z}} \in {\mathbb{R}}^{ {n}}$$, the jth entry of $$\widetilde{ {\mathsf{P}}} \widetilde{ { {\mathsf{M}}}} { {\mathsf{A}}} {\boldsymbol{z}}$$ is   \begin{align} \left( \widetilde{{\mathsf{P}}} \widetilde{{{\mathsf{M}}}} {{\mathsf{A}}} {\boldsymbol{z}} \right)_{j} = \frac{{\boldsymbol{1}}^{{{\intercal}}}{{\mathsf{N}}}^{(\omega_{j})} {\boldsymbol{z}} }{\left( {{\boldsymbol{p}}}_{\omega_{j}} \, {\left| \mathscr{N}_{\omega_{j}} \right|} \right)^{1/2}}. \end{align} (B.2) Then, the jth entry of $$\widetilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} {\boldsymbol{z}}$$ is the scaled sum of the values in the jth sampled group appearing in PMz, which is $$ { {\boldsymbol{p}}}_{\omega _{j}}^{-1/2} { {\mathsf{N}}}^{(\omega _{j})} {\boldsymbol{z}}$$. From the definition of $$\widetilde{ { {\mathsf{A}}}}$$, the sum is scaled by $$ {\left | \mathscr{N}_{\omega _{j}} \right |}^{-1/2}$$. Therefore, $$( \widetilde{ {\mathsf{P}}} \widetilde{ { {\mathsf{M}}}} { {\mathsf{A}}} {\boldsymbol{z}} )_{j} = ( \widetilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} {\boldsymbol{z}} )_{j}$$ for all j ∈ {1, …, s}, which terminates the proof. The second property is   \begin{align} \widetilde{{{\mathsf{A}}}} \widetilde{{{\mathsf{A}}}}^{{{\intercal}}} = {\mathsf{I}}, \end{align} (B.3) which implies $$\| \widetilde{ { {\mathsf{A}}}} \|_{2} = 1$$. The third property is   \begin{align} \left\| \widetilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{\boldsymbol{z}}} \right\|_{2} = {\left\| {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{\boldsymbol{z}}} \right\|}_{2}, \end{align} (B.4) for all $$\tilde{ {\boldsymbol{z}}} \in {\mathbb{R}}^{ {N}}$$. To prove this property, we remark that $$ {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ {\boldsymbol{z}}} = \widetilde{ { {\mathsf{A}}}}^{ {{\intercal }}} \widetilde{ {\mathsf{P}}} \widetilde{ { {\mathsf{M}}}} \tilde{ {\boldsymbol{z}}}. $$ Indeed, the entries in $$ {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ {\boldsymbol{z}}}$$ corresponding to the first selected group $$\mathscr{N}_{\omega _{1}}$$ are all equal to $$\left ( { {\boldsymbol{p}}}_{\omega _{1}} \, {\left | \mathscr{N}_{\omega _{1}} \right |}\right )^{-1/2} \tilde{ {\boldsymbol{z}}}_{\omega _{1}}$$. There are $$ {\left | \mathscr{N}_{\omega _{1}} \right |}$$ such entries. One can also notice that the first $$ {\left | \mathscr{N}_{\omega _{1}} \right |}$$ entries of $$\widetilde{ { {\mathsf{A}}}}^{ {{\intercal }}} \widetilde{ {\mathsf{P}}} \widetilde{ { {\mathsf{M}}}} \tilde{ {\boldsymbol{z}}}$$ are also equal to $$\left ( { {\boldsymbol{p}}}_{\omega _{j}} \, {\left | \mathscr{N}_{\omega _{1}} \right |}\right )^{-1/2} \tilde{ {\boldsymbol{z}}}_{\omega _{1}}$$. Repeating this reasoning for all the sampled groups proves the equality. On the one side, we thus have $$ {\left \| {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ {\boldsymbol{z}}}\right \|}_{2} = \| \widetilde{ { {\mathsf{A}}}}^{ {{\intercal }}} \widetilde{ {\mathsf{P}}} \widetilde{ { {\mathsf{M}}}} \tilde{ {\boldsymbol{z}}} \|_{2} = \| \widetilde{ {\mathsf{P}}} \widetilde{ { {\mathsf{M}}}} \tilde{ {\boldsymbol{z}}} \|_{2}, $$ where we used (B.3). On the other side, we have $$ \| \widetilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ {\boldsymbol{z}}} \|_{2} = \| \widetilde{ {\mathsf{P}}} \widetilde{ { {\mathsf{M}}}} { {\mathsf{A}}} { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ {\boldsymbol{z}}} \|_{2} = \| \widetilde{ {\mathsf{P}}} \widetilde{ { {\mathsf{M}}}} \tilde{ {\boldsymbol{z}}} \|_{2}, $$ where we used (B.1) and (4.5). This terminates the proof. Proof of Theorem 4.1 As x* is a minimizer of (4.14), we have   \begin{align} {\left\| \widetilde{{\mathsf{P}}}(\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}}^{*} - \tilde{{{\boldsymbol{y}}}})\right\|}_{2}^{2} + {\gamma} \; (\tilde{{{\boldsymbol{x}}}}^{*})^{{{\intercal}}} \, \widetilde{{{\mathsf{L}}}} \, \tilde{{{\boldsymbol{x}}}}^{*} \; {\leqslant} \; {\left\| \widetilde{{\mathsf{P}}}(\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}} - \tilde{{{\boldsymbol{y}}}})\right\|}_{2}^{2} + {\gamma} \; \tilde{{{\boldsymbol{x}}}}^{{{\intercal}}} \, \widetilde{{{\mathsf{L}}}} \, \tilde{{{\boldsymbol{x}}}}. \end{align} (B.5) To prove the theorem, we need to lower and upper bound the left-hand and right-hand sides of (B.5), respectively. We start with the bound involving $$\widetilde{ { {\mathsf{L}}}}$$ and then with the ones involving $$\widetilde{ { {\mathsf{M}}}}$$. [Bounding the terms in (B.5) involving $$\widetilde{ { {\mathsf{L}}}}$$]. We define the matrices   \begin{equation} {\hskip-60pt}\bar{{{\mathsf{U}}}}_{{k}} := \left( {{\boldsymbol{u}}}_{{k}+1}, \ldots, {{\boldsymbol{u}}}_{{n}} \right) \in{\mathbb{R}}^{{n} \times ({n} - {k})}, \end{equation} (B.6)  \begin{equation}{\hskip-40pt}{\mathsf{G}}_{{k}} := {{\textrm{diag}}}\left( g({\lambda}_{1}), \ldots, g({\lambda}_{{k}}) \right) \in{\mathbb{R}}^{{k} \times{k}},\end{equation} (B.7)  \begin{equation}\bar{{\mathsf{G}}}_{{k}} := {{\textrm{diag}}}\left( g({\lambda}_{{k}+1}), \ldots, g({\lambda}_{{n}}) \right) \in{\mathbb{R}}^{({n} - {k}) \times ({n} - {k})}. \end{equation} (B.8) By definition of $$\boldsymbol{\alpha}^{*}$$ and $$\boldsymbol{ \beta}^{*}$$, $$ { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ { {\boldsymbol{x}}}}^{*} = {\boldsymbol{\alpha }}^{*} + {\boldsymbol{\beta }}^{*}$$ with $$\boldsymbol{\alpha}^{*}$$ ∈ span(Uk) and $$ {\boldsymbol{\beta }}^{*} \in {{\textrm{span}}}(\bar{ { {\mathsf{U}}}}_{ {k}})$$. We recall that $$\widetilde{ { {\mathsf{L}}}} = ( { {\mathsf{A}}} { {\mathsf{U}}}) \; g( { {\mathsf{\Lambda }}}) \; ( { {\mathsf{A}}} { {\mathsf{U}}})^{ {{\intercal }}}$$. Therefore, we obtain   \begin{align} (\tilde{{{\boldsymbol{x}}}}^{*})^{{{\intercal}}} \; \widetilde{{{\mathsf{L}}}} \; \tilde{{{\boldsymbol{x}}}}^{*} = \left({{\mathsf{U}}}_{{k}}^{{{\intercal}}} {\boldsymbol{\alpha}}^{*}\right)^{{{\intercal}}} \; {\mathsf{G}}_{{k}} \; \left({{\mathsf{U}}}_{{k}}^{{{\intercal}}} {\boldsymbol{\alpha}}^{*}\right) + \left(\bar{{{\mathsf{U}}}}_{{k}}^{{{\intercal}}} {\boldsymbol{\beta}}^{*}\right)^{{{\intercal}}} \; \bar{{\mathsf{G}}}_{{k}} \; \left(\bar{{{\mathsf{U}}}}_{{k}}^{{{\intercal}}} {\boldsymbol{\beta}}^{*}\right) {\geqslant} g({\lambda}_{{k}+1}) {\left\| {\boldsymbol{\beta}}^{*}\right\|}_{2}^{2}. \end{align} (B.9) In the first step, we used the facts that $$\bar{ { {\mathsf{U}}}}_{ {k}}^{ {{\intercal }}} {\boldsymbol{\alpha }}^{*} = {\boldsymbol{0}}$$ and $$ { {\mathsf{U}}}_{ {k}}^{ {{\intercal }}} {\boldsymbol{\beta }}^{*} = {\boldsymbol{0}}$$. The second step follows from the fact that $$ {\left \| \bar{ { {\mathsf{U}}}}_{ {k}}^{ {{\intercal }}} {\boldsymbol{\beta }}^{*} \right \|}_{2} = {\left \| {\boldsymbol{\beta }}^{*} \right \|}_{2}$$. We also have   \begin{align} \tilde{{{\boldsymbol{x}}}}^{{{\intercal}}} \, \widetilde{{{\mathsf{L}}}} \, \tilde{{{\boldsymbol{x}}}} & = ({{\mathsf{U}}}^{{{\intercal}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}})^{{{\intercal}}} \; g({{\mathsf{\Lambda}}}) \; ({{\mathsf{U}}}^{{{\intercal}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}) \; {\leqslant} \; g({\lambda}_{{k}}) {\left\| {{\mathsf{U}}}_{{k}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}\right\|}_{2}^{2} + g({\lambda}_{{n}}) {\left\| \bar{{{\mathsf{U}}}}_{{k}}^{{{\intercal}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}\right\|}_{2}^{2} \nonumber \\ & {\leqslant} g({\lambda}_{{k}}) {\left\| {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}}\right\|}_{2}^{2} + g({\lambda}_{{n}}) {\left\| \bar{{{\mathsf{U}}}}_{{k}}^{{{\intercal}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}}\right\|}_{2}^{2} \; {\!\leqslant\!} \; g({\lambda}_{{k}}) {\left\| {{\boldsymbol{x}}}\right\|}_{2}^{2} + g({\lambda}_{{n}}) \left[{\left\| \bar{{{\mathsf{U}}}}_{{k}}^{{{\intercal}}} {{\boldsymbol{x}}}\right\|}_{2}^{2} +\! {\left\| \bar{{{\mathsf{U}}}}_{{k}}^{{{\intercal}}} ({{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}} - {{\boldsymbol{x}}})\right\|}_{2}^{2} \right] \nonumber \\ & {\leqslant} g({\lambda}_{{k}}) {\left\| {{\boldsymbol{x}}}\right\|}_{2}^{2} + \varepsilon^{2} g({\lambda}_{{n}}) {\left\| {{\boldsymbol{x}}}\right\|}_{2}^{2}. \end{align} (B.10) The second inequality follows from the facts that $$ {\left \| { {\mathsf{U}}}_{k}\right \|}_{2} = 1$$ and $$\tilde{ { {\boldsymbol{x}}}} = { {\mathsf{A}}} { {\boldsymbol{x}}}$$. To obtain the third inequality, we used $$ {\left \| { {\mathsf{A}}}\right \|}_{2} = 1$$ and the triangle inequality. For the last step, notice that $$ {\left \| \bar{ { {\mathsf{U}}}}_{ {k}}^{ {{\intercal }}} { {\boldsymbol{x}}}\right \|}_{2} = 0$$ (as x ∈span(Uk)), $$ {\left \| \bar{ { {\mathsf{U}}}}_{k}\right \|}_{2} = 1$$ and use (4.6). [Bounding the terms in (B5) involving $$\widetilde{ { {\mathsf{M}}}}$$]. By definition of $$\tilde{ { {\boldsymbol{y}}}}$$, it is immediate that   \begin{align} {\left\| \widetilde{{\mathsf{P}}}(\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}} - \tilde{{{\boldsymbol{y}}}})\right\|}_{2}^{2} = {\left\| \widetilde{{\mathsf{P}}}\tilde{{{\boldsymbol{n}}}}\right\|}_{2}^{2}. \end{align} (B.11) For the other term involving $$\widetilde{ { {\mathsf{M}}}}$$, the triangle inequality yields   \begin{align*} {\left\| \widetilde{{\mathsf{P}}}\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}}^{*} - \widetilde{{\mathsf{P}}}\tilde{{{\boldsymbol{y}}}}\right\|}_{2} & {\geqslant} {\left\| \widetilde{{\mathsf{P}}}\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}}^{*} - \widetilde{{\mathsf{P}}}\widetilde{{{\mathsf{M}}}} {{\mathsf{A}}}{{\boldsymbol{x}}}\right\|}_{2} - {\left\| \widetilde{{\mathsf{P}}} \tilde{{{\boldsymbol{n}}}} \right\|}_{2}. \end{align*} Then, we have  \begin{equation} {\left\| \widetilde{{\mathsf{P}}}\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}}^{*} - \widetilde{{\mathsf{P}}}\widetilde{{{\mathsf{M}}}} {{\mathsf{A}}}{{\boldsymbol{x}}}\right\|}_{2} = {\left\| \widetilde{{\mathsf{P}}}\widetilde{{{\mathsf{M}}}} {{\mathsf{A}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - \widetilde{{\mathsf{P}}}\widetilde{{{\mathsf{M}}}} {{\mathsf{A}}}{{\boldsymbol{x}}}\right\|}_{2} = {\left\| \widetilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - \widetilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\boldsymbol{x}}}\right\|}_{2}\end{equation} (B.12)  \begin{equation}{\hskip128pt}{\geqslant} {\left\| \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}}\right\|}_{2} - {\left\| \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}} - \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\boldsymbol{x}}}\right\|}_{2}. \end{equation} (B.13) The first equality follows from $$ { {\mathsf{A}}} { {\mathsf{A}}}^{ {{\intercal }}} = {\mathsf{I}}$$, the second from (B.1) and the triangle inequality was used in the last step. To summarize, we are at   \begin{align} {\left\| \widetilde{{\mathsf{P}}}\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}}^{*} - \widetilde{{\mathsf{P}}}\tilde{{{\boldsymbol{y}}}}\right\|}_{2} {\geqslant} {\left\| \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}}\right\|}_{2} - {\left\| \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}} - \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\boldsymbol{x}}}\right\|}_{2} - {\left\| \widetilde{{\mathsf{P}}} \tilde{{{\boldsymbol{n}}}} \right\|}_{2}. \end{align} (B.14) We continue by lower bounding $$\| \tilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ { {\boldsymbol{x}}}}^{*} - \tilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} { {\mathsf{A}}} { {\boldsymbol{x}}} \|_{2}$$ and upper bounding $$\| \tilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} { {\mathsf{A}}} { {\boldsymbol{x}}} - \tilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} { {\boldsymbol{x}}} \|_{2}$$ separately. [Lower bound on $$\| \tilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ { {\boldsymbol{x}}}}^{*} - \tilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} { {\mathsf{A}}} { {\boldsymbol{x}}} \|_{2}$$]. Equality (B.4) yields   \begin{align} {\left\| \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}}\right\|}_{2} = {\left\| {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}}\right\|}_{2}. \end{align} (B.15) Using the triangle inequality and the fact that $$ { {\boldsymbol{x}}} \in { {\mathscr{A}}_\varepsilon }$$, we obtain   \begin{align} {\left\| {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}}\right\|}_{2} & {\geqslant} {\left\| {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - {\mathsf{P}} {{\mathsf{M}}} {{\boldsymbol{x}}}\right\|}_{2} - {\left\| {\mathsf{P}} {{\mathsf{M}}} {{\boldsymbol{x}}} - {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}}\right\|}_{2} \nonumber \\ & {\geqslant} {\left\| {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - {\mathsf{P}} {{\mathsf{M}}} {{\boldsymbol{x}}}\right\|}_{2} - {\left\| {\mathsf{P}} {{\mathsf{M}}}\right\|}_{2} {\left\| {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}} - {{\boldsymbol{x}}}\right\|}_{2} \nonumber \\ & {\geqslant} {\left\| {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - {\mathsf{P}} {{\mathsf{M}}} {{\boldsymbol{x}}}\right\|}_{2} - \varepsilon \, M_{\max } {\left\| {{\boldsymbol{x}}} \right\|}_{2}. \end{align} (B.16) The restricted isometry property (2.14) then yields   \begin{align} {\left\| {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - {\mathsf{P}} {{\mathsf{M}}} {{\boldsymbol{x}}}\right\|}_{2} & = {\left\| {\mathsf{P}} {{\mathsf{M}}} \, ({\boldsymbol{\alpha}}^{*} - {{\boldsymbol{x}}}) + {\mathsf{P}} {{\mathsf{M}}} {\boldsymbol{\beta}}^{*}\right\|}_{2} {\geqslant} {\left\| {\mathsf{P}} {{\mathsf{M}}} ({\boldsymbol{\alpha}}^{*} - {{\boldsymbol{x}}})\right\|}_{2} - {\left\| {\mathsf{P}}{{\mathsf{M}}} {\boldsymbol{\beta}}^{*}\right\|}_{2} \nonumber \\ & {\geqslant} \sqrt{{s} (1-\delta)} \; {\left\| {\boldsymbol{\alpha}}^{*} - {{\boldsymbol{x}}} \right\|}_{2} - M_{\max } {\left\| {\boldsymbol{\beta}}^{*}\right\|}_{2}. \end{align} (B.17) We used the equality $$ { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ { {\boldsymbol{x}}}}^{*} = {\boldsymbol{\alpha }}^{*} + {\boldsymbol{\beta }}^{*}$$. We thus proved that   \begin{align} \| \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}} \|_{2} {\geqslant} \sqrt{{s} (1-\delta)} \; {\left\| {\boldsymbol{\alpha}}^{*} - {{\boldsymbol{x}}} \right\|}_{2} - M_{\max } {\left\| {\boldsymbol{\beta}}^{*}\right\|}_{2} - \varepsilon M_{\max } {\left\| {{\boldsymbol{x}}} \right\|}_{2}. \end{align} (B.18) [Upper bounding $$\| \tilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} { {\mathsf{A}}} { {\boldsymbol{x}}} - \tilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} { {\boldsymbol{x}}} \|_{2}$$]. We have   \begin{align} {\left\| \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}} - \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\boldsymbol{x}}}\right\|}_{2} {\leqslant} \| \tilde{{{\mathsf{A}}}} \|_{2} {\left\| {\mathsf{P}} {{\mathsf{M}}}\right\|}_{2} {\left\| {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}} - {{\boldsymbol{x}}} \right\|}_{2} {\leqslant} \varepsilon \, M_{\max } {\left\| {{\boldsymbol{x}}} \right\|}_{2}. \end{align} (B.19) We used the facts that $$\| \tilde{ { {\mathsf{A}}}} \|_{2} = 1$$ (see (B.3)) and that $$ { {\boldsymbol{x}}} \in { {\mathscr{A}}_\varepsilon }$$ in the last inequality. Using the inequalities (B.18) and (B.19) in (B.14), we arrive at   \begin{align} {\left\| \widetilde{{\mathsf{P}}}\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}}^{*} - \widetilde{{\mathsf{P}}}\tilde{{{\boldsymbol{y}}}}\right\|}_{2} {\geqslant} \sqrt{{s} (1-\delta)} \; {\left\| {\boldsymbol{\alpha}}^{*} - {{\boldsymbol{x}}} \right\|}_{2} - M_{\max } {\left\| {\boldsymbol{\beta}}^{*}\right\|}_{2} - 2\varepsilon M_{\max } {\left\| {{\boldsymbol{x}}} \right\|}_{2} - {\left\| \widetilde{{\mathsf{P}}} \tilde{{{\boldsymbol{n}}}} \right\|}_{2}. \end{align} (B.20) [Finishing the proof]. Remark that (B.5) implies   \begin{equation} {\left\| \widetilde{{\mathsf{P}}}(\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}}^{*} - \tilde{{{\boldsymbol{y}}}})\right\|}_{2}^{2} {\leqslant} {\left\| \widetilde{{\mathsf{P}}}(\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}} - \tilde{{{\boldsymbol{y}}}})\right\|}_{2}^{2} + {\gamma} \; \tilde{{{\boldsymbol{x}}}}^{{{\intercal}}} \, \widetilde{{{\mathsf{L}}}} \, \tilde{{{\boldsymbol{x}}}},\end{equation} (B.21)  \begin{equation}{\hskip-5pt}\textrm{and} \ \ {\gamma} \; (\tilde{{{\boldsymbol{x}}}}^{*})^{{{\intercal}}} \, \widetilde{{{\mathsf{L}}}} \, \tilde{{{\boldsymbol{x}}}}^{*} {\leqslant} {\left\| \widetilde{{\mathsf{P}}}(\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}} - \tilde{{{\boldsymbol{y}}}})\right\|}_{2}^{2} + {\gamma} \; \tilde{{{\boldsymbol{x}}}}^{{{\intercal}}} \, \widetilde{{{\mathsf{L}}}} \, \tilde{{{\boldsymbol{x}}}}. \end{equation} (B.22) Using (B.9), (B.10), (B.11) in (B.22), we obtain   \begin{align} {\gamma} \; g({\lambda}_{{k}+1}) {\left\| {\boldsymbol{\beta}}^{*}\right\|}_{2}^{2} {\leqslant} \; {\left\| \widetilde{{\mathsf{P}}} {{\boldsymbol{n}}} \right\|}_{2}^{2} + \; {\gamma} \; g({\lambda}_{{k}}) {\left\| {{\boldsymbol{x}}}\right\|}_{2}^{2} + \varepsilon^{2} \; {\gamma} \; g({\lambda}_{{n}}) {\left\| {{\boldsymbol{x}}}\right\|}_{2}^{2}, \end{align} (B.23) which implies (4.18) in Theorem 4.1. It remains to prove (4.17) to finish the proof. Using (B.10), (B.11) and (B.20) in (B.21), we obtain   \begin{align} \sqrt{{s} (1-\delta)} {\left\| {\boldsymbol{\alpha}}^{*} - {{\boldsymbol{x}}} \right\|}_{2} {\leqslant} 2 {\left\| \widetilde{{\mathsf{P}}} {{\boldsymbol{n}}} \right\|}_{2} + M_{\max } {\left\| {\boldsymbol{\beta}}^{*}\right\|}_{2} + (\sqrt{{\gamma} g({\lambda}_{{k}})} + \varepsilon \sqrt{{\gamma} g({\lambda}_{{n}})} + 2 \varepsilon M_{\max }) {\left\| {{\boldsymbol{x}}} \right\|}_{2}. \end{align} (B.24) Using (4.18) to bound $$ {\left \| {\boldsymbol{\beta }}^{*}\right \|}_{2}$$ on the right-hand side, we have   \begin{align} \sqrt{{s} (1-\delta)} {\left\| {\boldsymbol{\alpha}}^{*} - {{\boldsymbol{x}}} \right\|}_{2} \; {\leqslant} &\; 2 {\left\| \widetilde{{\mathsf{P}}} {{\boldsymbol{n}}} \right\|}_{2} + \frac{M_{\max }}{\sqrt{{\gamma} g({\lambda}_{{k}+1})}} {\left\| \widetilde{{\mathsf{P}}} {{\boldsymbol{n}}}\right\|}_{2} + M_{\max } \sqrt{\frac{g({\lambda}_{{k}})}{g({\lambda}_{{k}+1})}} {\left\| {{\boldsymbol{x}}} \right\|}_{2} \nonumber \\ & + \varepsilon \; M_{\max } \sqrt{\frac{g({\lambda}_{{n}})}{g({\lambda}_{{k}+1})}} {\left\| {{\boldsymbol{x}}} \right\|}_{2} + \left(\sqrt{{\gamma} g({\lambda}_{{k}})} + \varepsilon \sqrt{{\gamma} g({\lambda}_{{n}})} + 2 \varepsilon M_{\max } \right) {\left\| {{\boldsymbol{x}}} \right\|}_{2}\end{align} (B.25)  \begin{align}{\hskip75pt}=& \left( 2 + \frac{M_{\max }}{\sqrt{{\gamma} g({\lambda}_{{k}+1})}}\right){\left\| \widetilde{{\mathsf{P}}} {{\boldsymbol{n}}}\right\|}_{2} + \left( M_{\max } \sqrt{\frac{g({\lambda}_{{k}})}{g({\lambda}_{{k}+1})}} + \sqrt{{\gamma} g({\lambda}_{{k}})}\right) {\left\| {{\boldsymbol{x}}} \right\|}_{2} \nonumber \\ & + \varepsilon \left( 2 M_{\max } + M_{\max } \sqrt{\frac{g({\lambda}_{{n}})}{g({\lambda}_{{k}+1})}} + \sqrt{{\gamma} g({\lambda}_{{n}})} \right) {\left\| {{\boldsymbol{x}}} \right\|}_{2}. \end{align} (B.26) We only rearranged the term in the last step. This proves (4.17) and terminates the proof. © The Author(s) 2018. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Information and Inference: A Journal of the IMA Oxford University Press

Structured sampling and fast reconstruction of smooth graph signals

Loading next page...
 
/lp/ou_press/structured-sampling-and-fast-reconstruction-of-smooth-graph-signals-elHoGTybuU
Publisher
Oxford University Press
Copyright
© The Author(s) 2018. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.
ISSN
2049-8764
eISSN
2049-8772
D.O.I.
10.1093/imaiai/iax021
Publisher site
See Article on Publisher Site

Abstract

Abstract This work concerns sampling of smooth signals on arbitrary graphs. We first study a structured sampling strategy for such smooth graph signals that consists of a random selection of few pre-defined groups of nodes. The number of groups to sample to stably embed the set of k-bandlimited signals is driven by a quantity called the group graph cumulative coherence. For some optimized sampling distributions, we show that sampling $$O( {k}\log ( {k}))$$ groups is always sufficient to stably embed the set of k-bandlimited signals, but that this number can be smaller—down to $$O(\log ( {k}))$$—depending on the structure of the groups of nodes. Fast methods to approximate these sampling distributions are detailed. Secondly, we consider k-bandlimited signals that are nearly piecewise constant over pre-defined groups of nodes. We show that it is possible to speed up the reconstruction of such signals by reducing drastically the dimension of the vectors to reconstruct. When combined with the proposed structured sampling procedure, we prove that the method provides stable and accurate reconstruction of the original signal. Finally, we present numerical experiments that illustrate our theoretical results and, as an example, show how to combine these methods for interactive object segmentation in an image using superpixels. 1. Introduction This work was initially inspired by studying developments in edit propagation for interactive image or video manipulations, where the goal is to propagate operations made by a user in some parts of the image to the entire image, e.g., propagate foreground/background scribbles for object segmentation [10,15,17,20,40] or propagate manual color/tone modifications [15,17,24,27,40]. First, a graph $$ { {\mathscr{G}}}$$ modelling the similarities between the pixels of the image is built. Secondly, the user-edits specify values at some nodes (pixels) of $$ { {\mathscr{G}}}$$. Finally, the complete signal is estimated by assuming that it is smooth on $$\mathscr{G}$$. The quality of the propagation depends on the structure of $$\mathscr{G}$$ and on the location of annotated nodes. If a part of the image is weakly connected to the rest, the user-edits do not propagate well to this region unless this region is edited directly. Therefore, highlighting beforehand which regions or groups of nodes are important to edit to ensure a good propagation would be a useful feature to facilitate user interactions. Furthermore, designing a fast reconstruction/propagation method is also important so that the user can visualize immediately the effect of his inputs. These two problems can be addressed using graph signal processing results [35], or, more precisely, using results from the sampling theory of smooth or k-bandlimited graph signals. Indeed, the propagation of the user-edits is exactly a reconstruction of smooth graph signals from few samples. The user could thus be offered to edit few nodes from which the theory guarantees that a stable and accurate reconstruction is possible. Graph signal reconstruction algorithms developed in this field could also, probably, be used to improve and accelerate the reconstruction process. In the context of graph signal processing, smooth or k-bandlimited signal is a widely used and studied model. Several sampling methods have been designed to sample such signals. Pesenson introduced the notion of uniqueness set (of nodes) for k-bandlimited graph signals in [30,31]. Two different k-bandlimited signals are also necessarily different when restricted to a uniqueness set. Therefore, one can sample all k-bandlimited signals on a uniqueness set. Then, Anis et al. [3,5] and Chen et al. [12,13] proved that one can always find a unique set of k nodes to sample all k-bandlimited signals. Finding this set is however computationally expensive. In [4], graph spectral proxies are used to find such a set more efficiently. Yet the combinatorial problem that needs to be solved to find such a set makes the method still difficult to use for large graphs. Other authors used the idea of random sampling to be able to handle large graphs [13,14,32]. Recently, Puy et al. [32] proved that there always exists a random sampling strategy for which sampling $$O( {k}\log ( {k}))$$ nodes are sufficient to stably embed the set of bandlimited signals. 1.1. Illustrating some limits of the current theory As an example, let us take the task of interactive object segmentation with the image presented in Fig. 1. The object to segment is the tiger. Following a procedure similar to the one described in [16], we first build a graph $$\mathscr{G}$$ that models similarities between the pixels of the image and suppose that the segmentation map to estimate—the binary signal indicating the presence or absence of the object at each pixel—is k-bandlimited on $$\mathscr{G}$$ (see Section 5.2 for details). Once $$\mathscr{G}$$ is constructed, the sampling theory of smooth graph signals indicates how to determine a set about k nodes from which one can reconstruct all k-bandlimited signals, thus including the segmentation map of interest. We can thus propose to the user to label these sampled k nodes to segment the object, the labels indicating whether each sampled pixel belongs to the object or not. We present in Fig. 1 the optimal sampling distribution as computed in [32] and a random draw of about 8% of the pixels according to this sampling method (see Section 5.2 for details). Whilst labelling these pixels guarantees an accurate estimation of the segmentation map, we find this labelling task not user-friendly. We notice that a non-negligible number of pixels are located near the edges of the object. It is cumbersome to determine whether these pixels belong to the object or not. More importantly, experiments in Section 5.2 show that too large a number of sampled pixels for manual labelling are necessary to achieve satisfactory results with this method. Fig. 1. View largeDownload slide One image (top left) and its partition into superpixels with SLIC technique [1] (top right). The optimal sampling distribution as computed in [32] and one draw of pixels according to this distribution (the sampled pixels appears in white). Fig. 1. View largeDownload slide One image (top left) and its partition into superpixels with SLIC technique [1] (top right). The optimal sampling distribution as computed in [32] and one draw of pixels according to this distribution (the sampled pixels appears in white). In order to facilitate the labelling task for the user, we partition the image into superpixels, e.g., with the SLIC technique [1]. A superpixel is a small group of spatially connected pixels where the image varies only little. We see in Fig. 1 that the superpixels divide the image into homogeneous regions. The superpixels follow the edges and most of them thus belong to either the tiger or the background, but rarely both. Superpixels are sometimes used among other things to speed up computations, e.g., in [15]. The advantage of using superpixels is twofold in this application. First, it is easier to determine if a superpixel belongs to the object of interest than if a pixel belongs to this object, especially at the boundaries, as recently exploited for segmentation on touchscreens [22]. Labelling superpixels also permits to label many pixels at the same time—by attributing to all pixels within a superpixel the label of the superpixels. Proposing superpixels rather than pixels to edit to the user is thus much more effective. Experiments in Section 5.2 show that one needs to sample fewer superpixels than pixels to reach the same segmentation quality. Secondly, one can notice that the segmentation map, beyond being smooth on $$\mathscr{G}$$, will also be approximately piecewise constant on the superpixels. We exploit this property to reduce the dimension of the problem and design a faster reconstruction method. The idea is to reconstruct the mean value of the segmentation map within each superpixel instead of the value of the segmentation map at each pixel. Whilst the above modifications might seem simple to implement, they raise several theoretical questions. First, we should answer how this structured sampling affects the sampling performance, e.g., how many superpixels should be sampled to ensure an accurate reconstruction of k-bandlimited signals? What is the best sampling distribution? Secondly, we should also quantify the loss of reconstruction quality when estimating only the mean value of the segmentation map instead of the segmentation map itself. Let us remark that the impact of the proposed sampling technique is not limited to interactive object segmentation. For example, the exact same technique can be applied to image colorization, color/tone modifications or matting [16]. This technique might also be useful to speed up the compressive spectral clustering algorithm presented in [38]. In particular, one could accelerate the last step of this algorithm—the reconstruction of the indicator function of each cluster—by pre-grouping few nodes belonging to a same cluster rapidly, e.g. with few steps of simple agglomerative clustering. 1.2. Contributions In this paper, we study a random structured sampling strategy for k-bandlimited signals where we sample few groups of nodes instead of sampling individual nodes. The random sampling strategy that we propose generalizes the method proposed in [32]. Let us already acknowledge that such strategies are also studied in the field of compressed sensing [2,7,9,11] and that some of our solutions are directly inspired by these works. First, we show that the importance of sampling each group can be quantified by a parameter that we call the group graph cumulative coherence, which generalizes the concept of graph cumulative coherence introduced in [32], and characterizes how much the energy of k-bandlimited signals can stay concentrated in each group of nodes. In particular, we show that the number of groups to sample is directly linked to this quantity. Secondly, just as in [32], we optimize the sampling distribution to minimize the number of groups to sample. With this optimized sampling distribution, we prove that, in the worst case, sampling $$O( {k} \log ( {k}))$$ groups of nodes is sufficient to ensure the reconstruction of all k-bandlimited signals. As each group can contain many nodes, we might have to sample a large number of nodes. This is the potential price to pay when sampling the nodes by groups. Fortunately, a smaller number of groups—down to $$O(\log ( {k}))$$—might already be sufficient if the groups are well designed. Thirdly, we also describe a method to estimate the optimal sampling distribution without the need of computing the graph Fourier matrix and which thus scales better to large graphs. Fourthly, estimating the optimal sampling distribution may still be too slow when a large number of groups are involved. We thus also present a sufficient recovery condition that involves a relaxed version of group graph cumulative coherence. The sampling distribution that minimizes this relaxed coherence is fast to estimate for large graphs and large number of groups. With this sampling strategy, we prove that sampling $$O( {k} \log ( {k}))$$ groups is always sufficient to ensure the reconstruction of all k-bandlimited signals. This strategy is mainly interesting at small k. Finally, in order to build a fast reconstruction technique for k-bandlimited signals, we use the intuition that a smooth graph signal is a signal that varies slowly from one node to its connected nodes. If we group few (strongly) connected nodes together, we usually expect a bandlimited signal to be essentially constant on this set of nodes; as long as we do not group together too many weakly connected nodes. We propose here to use this property to drastically reduce the dimension of the reconstruction problem and accelerate this reconstruction. When combined with the proposed sampling technique, we prove that this fast method provides stable and accurate reconstructions of the signals of interest. 1.3. Notations and definitions For any vector $$ \boldsymbol{x} \in {\mathbb{R}}^{ {n}_{1}}$$, $$ {\left \| { {\boldsymbol{x}}}\right \|}_{2}$$ denotes the Euclidean norm of x. Depending on the context, xj may represent the jth entry of the vector x or the jth column-vector of the matrix X. The entry on the ith row and jth column of X is denoted by Xij. The identity matrix is denoted by I (its dimensions are determined by the context). We consider that $$\mathscr{G} = \{\mathscr{V}, \mathscr{E}, {\mathsf{W}} \}$$ is an undirected weighted graph, where $$\mathscr{V}$$ is the set of n nodes, $$\mathscr{E}$$ is the set of edges and $$ {\mathsf{W}} \in {\mathbb{R}}^{ {n} \times {n}}$$ is the weighted adjacency matrix with non-negative entries. We denote the graph Laplacian by $$ { {\mathsf{L}}} \in {\mathbb{R}}^{ {n} \times {n}}$$. We assume that L is real, symmetric and positive semi-definite, e.g., the combinatorial graph Laplacian L := D −W or the normalized one L := I −D−1/2WD−1/2. The matrix $$ {\mathsf{D}} \in {\mathbb{R}}^{ {n} \times {n}}$$ is the diagonal degree matrix and $$ {\mathsf{I}} \in {\mathbb{R}}^{ {n} \times {n}}$$ is the identity matrix [18]. The diagonal degree matrix D has entries $$D_{ii} := \sum _{i \neq j} {\mathsf{W}}_{ij}$$. We denote by $$ { {\mathsf{U}}} \in {\mathbb{R}}^{ {n} \times {n}}$$ the orthonormal eigenvectors of L and by $$0 = {\lambda }_{1} {\leqslant } \ldots {\leqslant } {\lambda }_{n}$$ the ordered real eigenvalues of L. We have $$ { {\mathsf{L}}} = { {\mathsf{U}}} { {\mathsf{\Lambda }}} { {\mathsf{U}}}^{ {{\intercal }}}$$, where $$ { {\mathsf{\Lambda }}} := {{\textrm{diag}}}( {\lambda }_{1}, \ldots , {\lambda }_{n}) \in {\mathbb{R}}^{ {n} \times {n}}$$. The matrix U is the graph Fourier basis [35]. For any signal $$ { {\boldsymbol{x}}} \in {\mathbb{R}}^{ {n}}$$ defined on the nodes of the graph $$\mathscr{G}$$, $$\hat{ { {\boldsymbol{x}}}} = { {\mathsf{U}}}^{ {{\intercal }}} { {\boldsymbol{x}}}$$ contains the Fourier coefficients of x ordered in increasing frequencies. This work deals with k-bandlimited signals $$ { {\boldsymbol{x}}} \in {\mathbb{R}}^{ {n}}$$ on $$\mathscr{G}$$, i.e., signals whose Fourier coefficients $$\hat{ { {\boldsymbol{x}}}}_{k+1}, \ldots , \hat{ { {\boldsymbol{x}}}}_{ {n}}$$ are null. Let Uk be the restriction of U to its first k vectors:   \begin{align} {{\mathsf{U}}}_{{k}} := \left( {{\boldsymbol{u}}}_{1}, \ldots, {{\boldsymbol{u}}}_{{k}} \right) \in{\mathbb{R}}^{{n} \times{k}}. \end{align} (1.1) Definition 1.1 (k-bandlimited signal on $$\mathscr{G}$$) A signal $$ { {\boldsymbol{x}}} \in {\mathbb{R}}^{ {n}}$$ defined on the nodes of the graph $$\mathscr{G}$$ is k-bandlimited with $$ {k} \in {\mathbb{N}}\setminus \{0\}$$ if x ∈ span(Uk), i.e., there exists $$ {\boldsymbol{\eta }} \in {\mathbb{R}}^{k}$$ such that   \begin{align} {{\boldsymbol{x}}} = {{\mathsf{U}}}_{{k}} {\boldsymbol{\eta}}. \end{align} (1.2) This definition was also used in [3,13,32]. We assume that $$ {\lambda}_{ {k}} \neq {\lambda }_{ {k}+1}$$ to avoid any ambiguity in the definition of k-bandlimited signals. Finally, for any matrix $$ {\mathsf{X}} \in {\mathbb{R}}^{ {n}_{1} \times {n}_{2}}$$, $$ {\left \| {\mathsf{X}}\right \|}_{2}$$ denotes its spectral norm and $$ {\left \| {\mathsf{X}}\right \|}_{F}$$ its Frobenius norm; when n1 = n2, $$\lambda _{\max }( {\mathsf{X}})$$ denotes its largest eigenvalue and $$\lambda _{\min }( {\mathsf{X}})$$ its smallest eigenvalue. We present in Fig. 2 a representation of the important variables and processes involved in this paper in order to facilitate the understanding of the different results. Fig. 2. View largeDownload slide Summary of the main variables and processes involved in the presented sampling strategies. All variables are defined when needed hereafter. Fig. 2. View largeDownload slide Summary of the main variables and processes involved in the presented sampling strategies. All variables are defined when needed hereafter. 2. Sampling using groups of nodes In this section, we define and study our structured sampling strategy. This sampling generalizes the work in [32], building on, and benefiting from, the principles introduced therein. We highlight during the course of this section which principles are shared between both techniques and what are the important changes. We also summarize these similarities and changes at the end of the section. We start the study with the definition of the groups of nodes. 2.1. Grouping the nodes We consider that the n nodes of $$\mathscr{G}$$ are divided into N different groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}} \subseteq \{1, \ldots , {n}\}$$. We assume that this division into different groups is given by an external algorithm or imposed by the application. The size of the ℓth group is denoted $$ {\left | \mathscr{N}_{\ell } \right |}$$. We suppose that these groups form a partition of {1, …, n}, so that each node belongs to exactly one group. We have   \begin{align} \cup_{\ell=1}^{N} \, \mathscr{N}_{\ell} \; = \; \{1, \ldots, {n} \}\ \textrm{and}\ \mathscr{N}_{\ell} \, \cap \, \mathscr{N}_{\ell^{\prime}} = \emptyset. \end{align} (2.1) For the object segmentation application discussed in the introduction, these groups represent the superpixels. However, we do not impose the groups to be made of neighbouring nodes in the graph; they can be made of nodes ‘far’ from each other. For each group $$\mathscr{N}_{\ell } = \left \{ {n}_{1}^{(\ell )}, \ldots , {n}_{ {\left | \mathscr{N}_{\ell } \right |}}^{(\ell )} \right \}$$, we associate a matrix $$ { {\mathsf{N}}}^{(\ell )}\in {\mathbb{R}}^{ {\left | \mathscr{N}_{\ell } \right |} \times {n}}$$ that restricts a graph signal to the nodes appearing in $$\mathscr{N}_{\ell }$$, i.e.,   \begin{align} {{\mathsf{N}}}^{(\ell)}_{ij} := \begin{cases} 1 & \textrm{for}\ j = {n}_{i}^{(\ell)},\\ 0 & \textrm{otherwise}. \end{cases} \end{align} (2.2) Note that   \begin{align} \sum_{\ell=1}^{{N}}{{{\mathsf{N}}}^{(\ell)}}^{{{\intercal}}} {{\mathsf{N}}}^{(\ell)} = {\mathsf{I}}. \end{align} (2.3) The case of overlapping groups can be handled by changing the definition of N(ℓ) to   \begin{align} {{\mathsf{N}}}^{(\ell)}_{ij} := \begin{cases} \beta_{{n}_{i}^{(\ell)}}^{-1/2} & \textrm{for}\ j = {n}_{i}^{(\ell)},\\ 0 & \textrm{otherwise}, \end{cases} \end{align} (2.4) where $$1 {\leqslant } \beta _{i} {\leqslant } {N}$$, i = 1, …, n, is the number of times node i appears in the different groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$. Equation (2.3) also holds in this case. All results presented in Section 2 are valid for overlapping groups with this definition of N(ℓ). 2.2. Sampling the groups The sampling procedure consists of selecting s groups out of the N available ones. In the application of Section 1.1, it corresponds to the selection of the superpixels to label. We select these groups at random using a sampling distribution on {1, …, N} represented by a vector $$ { {\boldsymbol{p}}} \in {\mathbb{R}}^{ {N}}$$. The probability of selecting the ℓth group is pℓ. We assume that pℓ > 0 for all ℓ = 1, …, N, so that all groups may be selected with a non-zero probability. We obviously have $$\sum _{\ell =1}^{ {N}} { {\boldsymbol{p}}}_{\ell } = 1$$. The indices Ω := {ω1, …, ωs} of the selected groups are obtained by drawing independently—thus with replacements—s indices from the set {1, …, N} according to p, i.e.,   \begin{align} {\mathbb{P}}(\omega_{j} = \ell) = {{\boldsymbol{p}}}_{\ell},\quad \forall j \in \{1, \ldots, {s} \}\ \textrm{and}\ \forall \ell \in \{1, \ldots, {N} \}. \end{align} (2.5) The selected groups are $$\mathscr{N}_{\omega _{1}}, \ldots , \mathscr{N}_{\omega _{ {s}}}$$ and the total number of selected nodes is   \begin{align} {m} := \sum_{j=1}^{{s}} {\left| \mathscr{N}_{\omega_{j}} \right|}. \end{align} (2.6) Once the groups are selected, we build the sampling matrix $$ { {\mathsf{M}}} \in {\mathbb{R}}^{ {m} \times {n}}$$ that satisfies   \begin{align} {{\mathsf{M}}} := \left( \begin{array}{c} {{\mathsf{N}}}^{(\omega_{1})}\\ \vdots\\{{\mathsf{N}}}^{(\omega_{{s}})} \end{array} \right), \end{align} (2.7) and which restricts any signal to the nodes belonging to the selected groups. For a signal $$ { {\boldsymbol{x}}} \in {\mathbb{R}}^{n}$$ defined on the nodes of $$\mathscr{G}$$, its sampled version $$ { {\boldsymbol{y}}} \in {\mathbb{R}}^{ {m}}$$ satisfies   \begin{align} {{\boldsymbol{y}}} := {{\mathsf{M}}} {{\boldsymbol{x}}}. \end{align} (2.8) Our goal now is to determine what number s is enough to ensure that all k-bandlimited signals can be reconstructed from its sampled version obtained with M. To conduct this study, we need to define few more matrices. First, we associate the matrix   \begin{align} {\mathsf{P}}^{(\ell)} := {{\boldsymbol{p}}}_{\ell}^{-1/2} \; {\mathsf{I}} \in{\mathbb{R}}^{{\left| \mathscr{N}_{\ell} \right|} \times{\left| \mathscr{N}_{\ell} \right|}}, \end{align} (2.9) to each group $$\mathscr{N}_{\ell }$$. Then, once the groups are drawn, we construct the diagonal matrix $$ {\mathsf{P}} \in {\mathbb{R}}^{ {m} \times {m}}$$  \begin{align} {\mathsf{P}} := {{\textrm{diag}}}\left( {\mathsf{P}}^{(\omega_{1})}, \ldots, {\mathsf{P}}^{(\omega_{{s}})} \right). \end{align} (2.10) This matrix takes into account the probability of sampling each group and will be used to rescale y for norm preservation. This matrix ensures that $$ {s}^{-1} \, {\mathbb{E}}_{\varOmega } {\left \| {\mathsf{P}} { {\boldsymbol{y}}}\right \|}_{2}^{2} = {s}^{-1} \, {\mathbb{E}}_{\varOmega } {\left \| {\mathsf{P}} { {\mathsf{M}}} { {\boldsymbol{x}}}\right \|}_{2}^{2} = {\left \| { {\boldsymbol{x}}}\right \|}_{2}^{2}$$.1 Both matrices P and M depend on Ω and are random. 2.3. Group graph coherence The sampling procedure in [32] is similar to the one proposed here at the difference that the nodes are sampled individually and not by groups. It was proved there that the number of nodes to sample is driven by a quantity called the graph coherence. This quantity measures how the energy of k-bandlimited signals spreads over the nodes. Similarly, we prove here that the number of groups to sample is driven by a quantity that measures the energy of k-bandlimited signals spread over the groups. We now introduce this quantity. The matrix N(ℓ)Uk is the matrix that restricts a k-bandlimited signal to the nodes belonging to $$\mathscr{N}_{\ell }$$. Therefore,   \begin{align} {\left\| {{\mathsf{N}}}^{(\ell)} {{\mathsf{U}}}_{{k}}\right\|}_{2} = \sup_{{\boldsymbol{\eta}} \in{\mathbb{R}}^{k} : {\left\| {\boldsymbol{\eta}}\right\|}_{2}=1} {\left\| {{\mathsf{N}}}^{(\ell)} {{\mathsf{U}}}_{{k}} {\boldsymbol{\eta}}\right\|}_{2} \end{align} (2.11) measures the energy on the nodes $$\mathscr{N}_{\ell }$$ of the normalized k-bandlimited signal that is most concentrated on $$\mathscr{N}_{\ell }$$. This energy varies between 0 and 1. When this energy is close to 1, there exists a k-bandlimited signal whose energy is essential concentrated on $$\mathscr{N}_{\ell }$$. This signal lives only on the nodes in $$\mathscr{N}_{\ell }$$ and does not spread elsewhere. On the contrary, when this energy is close to 0, there is no k-bandlimited signal living only on $$\mathscr{N}_{\ell }$$. The sampling distribution p is adapted to the graph and the structure of the groups if: whenever ∥N(ℓ)Uk∥2 is high, pℓ is high; whenever ∥N(ℓ)Uk∥2 is small, pℓ is small. In other words, the ratio between ∥N(ℓ)Uk∥2 and pℓ should be as constant as possible. This ensures that the groups where some k-bandlimited signals are concentrated are sampled with higher probability. Similarly to what was done in [32] with individual nodes, we define the group graph weighted coherence as the largest ratio between ∥N(ℓ)Uk∥2 and $$ { {\boldsymbol{p}}}_{\ell }^{-1/2}$$. Definition 2.1 (Group graph cumulative coherence) The group graph cumulative coherence of order k is   \begin{align} {\nu}_{{{\boldsymbol{p}}}} := \max_{1 {\leqslant} \ell{\leqslant} {N}} \left\{\, {{\boldsymbol{p}}}_{\ell}^{-1/2} {\left\| {{\mathsf{N}}}^{(\ell)} {{\mathsf{U}}}_{{k}}\right\|}_{2} \right\}. \end{align} (2.12) The quantity ∥N(ℓ)Uk∥2 is called the local group graph coherence. In the extreme case where the groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$ reduce all to singletons, we recover the definition of the graph weighted coherence introduced in [32]. It is easy to prove that νp is lower bounded by 1. Indeed, for any $$ {\boldsymbol{\eta }} \in {\mathbb{R}}^{ {k}}$$ with $$ {\left \| {\boldsymbol{\eta }}\right \|}_{2} = 1$$, we have   \begin{align} 1& = {\left\| {{\mathsf{U}}}_{k} {\boldsymbol{\eta}}\right\|}_{2}^{2} = \sum_{\ell=1}^{N} {\left\| {{\mathsf{N}}}^{(\ell)} {{\mathsf{U}}}_{k} {\boldsymbol{\eta}}\right\|}_{2}^{2} = \sum_{\ell=1}^{N} {{\boldsymbol{p}}}_{\ell} \cdot \frac{{\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k} {\boldsymbol{\eta}}\right\|}_{2}^{2}}{{{\boldsymbol{p}}}_{\ell}} {\leqslant} {\left\| {{\boldsymbol{p}}}\right\|}_{1} \cdot \max_{1 {\leqslant} \ell{\leqslant} {N}} \left\{\frac{{\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k} {\boldsymbol{\eta}}\right\|}_{2}^{2}}{{{\boldsymbol{p}}}_{\ell}} \right\} \nonumber \\ & = \max_{1 {\leqslant} \ell{\leqslant} {N}} \left\{\frac{{\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k} {\boldsymbol{\eta}}\right\|}_{2}^{2}}{{{\boldsymbol{p}}}_{\ell}} \right\} {\leqslant} \max_{1 {\leqslant} \ell{\leqslant} {N}} \left\{\frac{{\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{2}^{2}}{{{\boldsymbol{p}}}_{\ell}} \right\} = {\nu}_{{{\boldsymbol{p}}}}^{2}. \end{align} (2.13) We have νp = 1 in, for example, the degenerated case where $$\mathscr{N}_{1} = \{1, \ldots , {n}\}$$. 2.4. Stable embedding We now have all the tools to present our main theorem that shows that sampling $$O\left ( {\nu }_{ { {\boldsymbol{p}}}}^{2} \log ( {k})\right )$$ groups is sufficient to stably embed the whole set of k-bandlimited signals. Hence, it is possible to reconstruct any x ∈ span(Uk) from its measurements y = Mx. Theorem 2.2 (Restricted isometry property—RIP) Let M be a random subsampling matrix constructed as in (2.7) using the groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$ and the sampling distribution p. For any δ, ξ ∈ (0, 1), with probability at least 1 − ξ,   \begin{align} (1 - \delta) {\left\|\, {{\boldsymbol{x}}}\right\|}_{2}^{2} {\leqslant} {\frac{1}{{s}}} {\left\| {\mathsf{P}} {{\mathsf{M}}} \; {{\boldsymbol{x}}}\right\|}_{2}^{2} {\leqslant} (1 + \delta) {\left\|\, {{\boldsymbol{x}}}\right\|}_{2}^{2} \end{align} (2.14) for all x ∈ span(Uk) provided that   \begin{align} {s} {\geqslant} \frac{3}{\delta^{2}} \; {\nu}_{{{\boldsymbol{p}}}}^{2} \; \log\left( \frac{2{k}}{\xi} \right). \end{align} (2.15) Proof. See Appendix A. In the above theorem, we recall that s is the number of selected groups, each of them containing several nodes. We thus control the number of groups to sample and not directly the number of nodes. As the lower bound on νp is 1, sampling $$O(\log ( {k}))$$ groups might already be sufficient if the groups and the sampling distribution are well-designed. The number of groups to sample is driven by νp, which itself depends on the structure of the groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$ and on the sampling distribution p. To reduce the number of samples to measure, we might optimize the structure of the groups and the sampling distribution. For example, if we were able to construct n/L groups (L ≥ 1) such that ∥N(ℓ)Uk∥2 ≈ (L/n)1/2—i.e., no k-bandlimited signals have more than 100 ⋅ (L/n)1/2 percent of its energy concentrated in each group—then setting pℓ = L/n, ℓ = 1, …, n/L, would yield νp ≈ 1. In this case, sampling one group is enough to embed the set of k-bandlimited signals. However, it is not obvious how we can construct such groups in practice and we might not even have the flexibility to modify the structure of the groups. In such a case, the only possibility to reduce the number of measurements is to optimize the sampling distribution p to minimize νp. The sampling distribution minimizing the coherence νp is the distribution $$ { {\boldsymbol{p}}}^{*} \in {\mathbb{R}}^{ {N}}$$ that satisfies   \begin{align} {{\boldsymbol{p}}}^{*}_{\ell} := \frac{{\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{2}^{2}}{\sum_{\ell^{\prime}=1}^{{N}} {\left\| {{\mathsf{N}}}^{(\ell^{\prime})}{{\mathsf{U}}}_{k}\right\|}_{2}^{2}},\ \textrm{for all}\ \ell \in \{1, \ldots, {N} \}, \end{align} (2.16) and for which   \begin{align} {\nu}_{{{\boldsymbol{p}}}^{*}}^{2} = \sum_{\ell=1}^{{N}} {\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{2}^{2}. \end{align} (2.17) Indeed, let $$ {\boldsymbol{p}}^{\prime } \neq { {\boldsymbol{p}}}^{*}$$ be another sampling distribution. As the entries of p′ and p* are non-negative and sum to 1, we necessarily have $$ {\boldsymbol{p}}^{\prime }_{\ell ^{\prime }} < { {\boldsymbol{p}}}_{\ell ^{\prime }}^{*}$$ for some ℓ′ > 0. Then,   \begin{align} {\nu}_{{\boldsymbol{p}}^{\prime}}^{2} {\geqslant}{{\boldsymbol{p}}^{\prime}_{\ell^{\prime}}}^{-1} {\left\| {{\mathsf{N}}}^{(\ell^{\prime})}{{\mathsf{U}}}_{k}\right\|}_{2}^{2}> {{{\boldsymbol{p}}}_{\ell^{\prime}}^{*}}^{-1} {\left\| {{\mathsf{N}}}^{(\ell^{\prime})}{{\mathsf{U}}}_{k}\right\|}_{2}^{2} = {\nu}_{{{\boldsymbol{p}}}^{*}}^{2}, \end{align} (2.18) where the last equality is obtained by replacing $$ { {\boldsymbol{p}}}_{\ell ^{\prime }}^{*}$$ with its value. Therefore, $$\nu_{\boldsymbol{p}^{\prime}} > \nu_{\boldsymbol{p}^{\ast}}$$ for any $$ {\boldsymbol{p}}^{\prime } \neq { {\boldsymbol{p}}}^{*}$$. As similar proof can be found in, e.g., [11] where the authors derive the optimal sampling distribution for a compressive system. We notice that   \begin{align} {\nu}_{{{\boldsymbol{p}}}^{*}}^{2} = \sum_{\ell=1}^{{N}} {\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{2}^{2} {\leqslant} \sum_{\ell=1}^{{N}} {\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{F}^{2} = {k}. \end{align} (2.19) Hence, by using this distribution, (2.15) shows that sampling $$O( {k}\log ( {k}))$$ groups is always sufficient to sample all k-bandlimited signals. The exact number is proportional to $$ {\nu }_{ { {\boldsymbol{p}}}^{*}}^{2} \log ( {k})$$. This is not in contradiction with the fact that at least k measurements are required in total as each group contains at least one node. We also have   \begin{align} {\nu}_{{{\boldsymbol{p}}}^{*}}^{2} = \sum_{\ell=1}^{{N}} {\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{2}^{2} {\leqslant} {N}, \end{align} (2.20) as $$ {\left \| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{k}\right \|}_{2}^{2} {\leqslant } 1$$. Therefore, in any case, the bound never suggests to sample much more than N groups, as one would expect. We recall that the results in [32] proves that it is always sufficient to sample $$O( {k}\log ( {k}))$$ nodes to embed the set of k-bandlimited signals. When sampling the nodes by groups, it is the number of groups to sample that should be $$O( {k}\log ( {k}))$$. This can be a large number of individual nodes, but this is the potential price to pay when sampling the nodes by groups, if the groups are not optimized to achieve the best sampling rate. Variable density sampling [2,23,33] and structured sampling [7,9,11] are also important topics in compressed sensing. The method proposed here is closely inspired by these studies, especially by [7,9]. Our results thus share several similarities with these works. We however benefit from a simpler signal model and take advantage of the graph structure to refine the results, propose simpler decoders to reconstruct the signal and design efficient algorithms to estimate p*. 2.5. A more practical result at small k’s Optimizing the sampling distribution reduces to estimating the spectral norm of the matrices N(ℓ)Uk. We will present in Section 3 a method avoiding the computation of Uk. However, the method might still be too slow when a large number of groups is involved as it requires an estimation of a spectral norm for each group separately. It is thus interesting to characterize the performance of the proposed method using other quantities easier to compute. In this section, we present results involving the following quantity   \begin{align} \bar{{\nu}}_{{{\boldsymbol{p}}}} := \max_{1 {\leqslant} \ell{\leqslant} {N}} \left\{ {{\boldsymbol{p}}}_{\ell}^{-1/2} {\left\| {{\mathsf{N}}}^{(\ell)} {{\mathsf{U}}}_{{k}}\right\|}_{F} \right\}. \end{align} (2.21) The only difference between νp and $$\bar{ {\nu }}_{ { {\boldsymbol{p}}}}$$ is that we substituted the Frobenius norm for the spectral norm. As $$ {\left \| {\mathsf{X}}\right \|}_{2} {\leqslant } {\left \| {\mathsf{X}}\right \|}_{F}$$ for any matrix X, we have $$ {\nu }_{ { {\boldsymbol{p}}}} {\leqslant } \bar{ {\nu }}_{ { {\boldsymbol{p}}}}$$; hence the results involving $$\bar{ {\nu }}_{ { {\boldsymbol{p}}}}$$ will be more pessimistic than those involving νp. We have $$\bar{ {\nu }}_{ { {\boldsymbol{p}}}} {\geqslant } {k}$$. Indeed,   \begin{align} k & = {\left\| {{\mathsf{U}}}_{k}\right\|}_{F}^{2} = \sum_{\ell=1}^{N} {\left\| {{\mathsf{N}}}^{(\ell)} {{\mathsf{U}}}_{k}\right\|}_{F}^{2} = \sum_{\ell=1}^{N} {{\boldsymbol{p}}}_{\ell} \cdot \frac{{\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{F}^{2}}{{{\boldsymbol{p}}}_{\ell}} {\leqslant} {\left\| {{\boldsymbol{p}}}\right\|}_{1} \cdot \max_{1 {\leqslant} \ell{\leqslant} {s}} \left\{\frac{{\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{F}^{2}}{{{\boldsymbol{p}}}_{\ell}} \right\} = \bar{{\nu}}_{{{\boldsymbol{p}}}}^{2}. \end{align} (2.22) As with νp, the lower bound is also attained in, for example, the degenerated case where $$\mathscr{N}_{1} = \{1, \ldots , {n}\}$$. As $$ {\nu }_{ { {\boldsymbol{p}}}} {\leqslant } \bar{ {\nu }}_{ { {\boldsymbol{p}}}}$$, we have the following corollary to Theorem 2.2. Corollary 2.1 Let M be a random subsampling matrix constructed as in (2.7) using the groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$ and the sampling distribution p. For any δ, ξ ∈ (0, 1), with probability at least 1 − ξ, (2.14) holds for all x ∈ span(Uk) provided that   \begin{align} {s} {\geqslant} \frac{3}{\delta^{2}} \; \bar{{\nu}}_{{{\boldsymbol{p}}}}^{2} \; \log\left( \frac{2{k}}{\xi} \right). \end{align} (2.23) Proof. As $$\bar{ {\nu }}_{ { {\boldsymbol{p}}}} {\geqslant } {\nu }_{ { {\boldsymbol{p}}}}$$, (2.23) implies (2.15). Theorem 2.2 then proves that (2.14) holds with probability at least 1 − ξ for all x ∈ span(Uk). The sufficient condition (2.23) can be much more pessimistic than (2.15). Indeed, as we have $$\bar{ {\nu }}_{ { {\boldsymbol{p}}}}^{2} {\geqslant } {k}$$, Condition (2.23) suggests to always sample more than $$O( {k}\log ( {k}))$$ groups, whilst we know that sampling $$O(\log ( {k}))$$ can be enough. As we have done with νp, we can also optimize p to minimize $$\bar{ {\nu }}_{ { {\boldsymbol{p}}}}$$. The sampling distribution that minimizes $$\bar{ {\nu }}_{ { {\boldsymbol{p}}}}$$ is the distribution q* satisfying   \begin{align} {\boldsymbol{q}}^{*}_{\ell} := \frac{{\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{F}^{2}}{{k}},\ \textrm{for all}\ l \in \{1, \ldots, {N} \}, \end{align} (2.24) and for which   \begin{align} \bar{{\nu}}_{{\boldsymbol{q}}^{*}}^{2} = {k}. \end{align} (2.25) With this distribution, (2.23) proves that sampling $$O( {k}\log ( {k}))$$ groups is always enough to sample all k-bandlimited signals. Let us discuss some cases where this non-optimal result is interesting. First, this result can be useful at small k’s because estimating q* is easier and faster than estimating p* (see Section 3). The gain of computation time might compensate the loss of performance in terms of number of measurements. Secondly, at large k’s, (2.23) is a pessimistic bound. In reality, we may have $$ {\nu }_{ {\boldsymbol{q}}^{*}}^{2} {\leqslant } {k}$$ so that, according to (2.15), fewer samples than $$O( {k}\log ( {k}))$$ are actually sufficient when using q*. Finally, q* might actually be close to p*, in which case we would reach almost optimal results with q*. This occurs when $$\| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{k} \|_{F}^{2} \approx \| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{k} \|_{2}^{2}$$, which is the case when N(ℓ)Uk is low rank. We discuss in the next paragraph one case where this matrix is low rank. Let us consider a community graph with k communities. In the degenerated (perfect) case where no edge exists between two different communities, it is easy to prove that the indicator vectors of the community are eigenvectors of the combinatorial graph Laplacian with eigenvalue 0. The indicator vector of a community is a binary vector with non-zero entries at the nodes in this community and zero entries everywhere else. Hence, we can form Uk with these indicator vectors after ℓ2-normalization. If the groups are now formed such that no group contains nodes belonging to two different communities, then the matrices N(ℓ)Uk contain $$ {\left | \mathscr{N}_{\ell } \right |}$$ identical rows. These matrices are therefore rank 1 and $$\| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{k} \|_{F}^{2} = \| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{k} \|_{2}^{2}$$, which implies q*= p*. It is also easy to prove that νp*2 = k so that Theorem 2.2 indicates that sampling $$O(k\log (k))$$ groups is sufficient to satisfy the RIP. If one actually samples less than k groups then the RIP cannot hold as one community would necessarily not be sampled. Hence, one cannot hope to sample much less than $$O(k\log (k))$$ groups and reach the optimal sampling rate of $$O(\log (k))$$ groups. Note that, in the unstructured sampling setting, the results in [32] show that sampling $$O(k\log (k))$$nodes would have been sufficient to ensure that the RIP holds in this setting. Therefore, in the structured setting, we have to sample much more nodes than actually needed as each group contains several nodes. In a sense, optimizing the sampling distribution has a more limited impact in the structured sampling setting than in the unstructured sampling setting where this optimization allows one to reach optimal rate in terms of number of sampled nodes. Structured sampling imposes fundamental limits on the best sampling rate achievable. Even though we have no formal proof, we expect to observe a similar behaviour for more realistic community graphs (where edges between communities exist). Nevertheless, beyond this theoretical result, we recall that this structured sampling setting allows us to reduce drastically the cost of the labelling procedure for the user and the reconstruction time for interactive object segmentation (see Section 5). 2.6. Summary of the impact of structured sampling As in [32], the quantity driving the optimal sampling strategy is a measure of the energy concentration of k-bandlimited signals on the element to sample: nodes in the case of [32]; groups of nodes in this work. It is more important to sample nodes/groups where the energy of k-bandlimited signals can stay concentrated than nodes/groups where the energy always spreads to neighbouring nodes/groups. Whilst in [32] the sampling performance is characterized in terms of the number of nodes to sample, here, the sampling performance is characterized in terms of the number of groups to sample. In the degenerated case where each group contains one single node, we recover the same results. In [32], it is proved that sampling $$O( {k}\log ( {k}))$$nodes with an optimized sampling distribution is sufficient to ensure the RIP holds. In this work, we prove that sampling between $$O(\log ( {k}))$$ and $$O( {k}\log ( {k}))$$groups with an optimized distribution is sufficient. Note that optimizing the sampling distribution does not overcome completely a ‘poor choice’ of the groups as in certain situation we have to sample at least k groups to ensure that the RIP holds, as explained in subsection above. Reaching the optimal sampling rate $$O(\log ( {k}))$$ is not always feasible even when optimiing the sampling distribution. 3. Optimal sampling estimation In this section, we explain how to estimate the sampling distributions p* and q* without computing the truncated Fourier matrix Uk, as this computation is intractable for large graphs. The principle consists of transforming the problem into a series of filtering operations and using fast filtering techniques on graphs [21]. This principle is used, e.g., in [34,37,38] for fast computation of feature vectors used for clustering or in [32] for the estimation of the best sampling distributions. The method involves matrix-vector multiplications with the sparse Laplacian matrix L and are thus computationally tractable even for large n. 3.1. Estimation of p* The distribution p*, defined in (2.16), that minimizes the coherence νp is entirely defined by the values of ∥N(ℓ)Uk∥2 for ℓ = 1, …, N, which are thus the quantities we need to evaluate. We recall that, in graph signal processing, a filter in the spectral domain is represented by a function $$h: {\mathbb{R}} \rightarrow {\mathbb{R}}$$, and that the signal x filtered by h is   \begin{align} {{\boldsymbol{x}}}_{h} := {{\mathsf{U}}} \, {{\textrm{diag}}}(\hat{{\boldsymbol{h}}}) \, {{\mathsf{U}}}^{{{\intercal}}} {{\boldsymbol{x}}} \in{\mathbb{R}}^{{n}}, \end{align} (3.1) where $$\hat{ {\boldsymbol{h}}} = (h( {\lambda }_{1}), \ldots , h( {\lambda }_{ {n}}))^{ {{\intercal }}} \in {\mathbb{R}}^{ {n}}$$. To filter the signal x without actually computing the graph Fourier transform of x, we can approximate the function h by a polynomial   \begin{align} r(t) := \sum_{i=0}^{d} \alpha_{i} \, t^{i} \approx h(t) \end{align} (3.2) of degree d and compute xr instead of xh. The filtered signal xr is computed rapidly using the formula   \begin{align} {{\boldsymbol{x}}}_{r} = \sum_{i=0}^{d} \alpha_{i} \; {{\mathsf{U}}} \, {{\textrm{diag}}}\left({{\lambda}_{1}^{i}}, \ldots, {\lambda}_{{n}}^{i}\right) \, {{\mathsf{U}}}^{{{\intercal}}} {{\boldsymbol{x}}} = \sum_{i=0}^{d} \alpha_{i} \; {{\mathsf{L}}}^{i} {{\boldsymbol{x}}}, \end{align} (3.3) that involves only matrix-vector multiplications with the sparse Laplacian matrix L. We let the reader refer to [21] for more information on this fast-filtering technique. For any polynomial function r of the form above and any matrix $$ {\mathsf{A}} \in {\mathbb{R}}^{ {n} \times {n}}$$, we write   \begin{align} r({\mathsf{A}}) = \sum_{i=0}^{d} \alpha_{i} \; {\mathsf{A}}^{i} . \end{align} (3.4) Note that $$r( { {\mathsf{L}}}) = { {\mathsf{U}}} \, r( { {\mathsf{\Lambda }}}) \, { {\mathsf{U}}}^{ {{\intercal }}}$$. Let $$i_{ {\lambda }_{k}} : {\mathbb{R}} \rightarrow {\mathbb{R}}$$ be the ideal low-pass filter at cutoff frequency λk, i.e., the filter that satisfies   \begin{align} i_{{\lambda}_{k}}(t) =\begin{cases} 1 & \textrm{if}\ t {\leqslant} {\lambda}_{k},\\ 0 & \textrm{otherwise}. \end{cases} \end{align} (3.5) We have $$ { {\mathsf{U}}}_{k} { {\mathsf{U}}}_{k}^{ {{\intercal }}} = i_{ {\lambda }_{k}}\left ( { {\mathsf{L}}}\right )$$. Then, we notice that   \begin{align} {\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{2}^{2} = {\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k} {{\mathsf{U}}}_{k}^{{{\intercal}}}{{{\mathsf{N}}}^{(\ell)}}^{{{\intercal}}}\right\|}_{2} = {\left\| {{\mathsf{N}}}^{(\ell)} \; i_{{\lambda}_{k}}\left({{\mathsf{L}}}\right) \;{{{\mathsf{N}}}^{(\ell)}}^{{{\intercal}}}\right\|}_{2}. \end{align} (3.6) We recall that N(ℓ) is the matrix that restricts the signal to the nodes belonging to $$\mathscr{N}_{\ell }$$. The matrix appearing on the right-hand side of the last equality corresponds to the linear operator that (1) extends a vector on the complete graph by inserting 0 in all groups $$\ell ^{\prime } \neq \ell $$, (2) low-pass filters the extended signal and (3) restricts the result to the group $$\mathscr{N}_{\ell }$$. This process can be approximated by replacing the ideal low-pass filter iλk with a polynomial approximation $$\tilde{i}_{ {\lambda }_{k}}$$ of iλk and   \begin{align} {\left\| {{\mathsf{N}}}^{(\ell)} \; i_{{\lambda}_{k}}\left({{\mathsf{L}}}\right) \;{{{\mathsf{N}}}^{(\ell)}}^{{{\intercal}}}\right\|}_{2} \approx{\left\| {{\mathsf{N}}}^{(\ell)} \; \tilde{i}_{{\lambda}_{k}}\left({{\mathsf{L}}}\right) \;{{{\mathsf{N}}}^{(\ell)}}^{{{\intercal}}}\right\|}_{2}. \end{align} (3.7) To estimate p*, we estimate the spectral norm of the matrix appearing on the right-hand side for which matrix-vector multiplication is fast. The quality of the approximation depends obviously on the choice of the polynomial $$\tilde{i}_{ {\lambda }_{k}}$$. Estimating $$\left \| { {\mathsf{N}}}^{(\ell )} \; \tilde{i}_{ {\lambda }_{k}}\left ( { {\mathsf{L}}}\right ) \;{ { {\mathsf{N}}}^{(\ell )}}^{ {{\intercal }}} \right \|_{2}$$ amounts to computing the largest eigenvalue of $$ { {\mathsf{N}}}^{(\ell )} \; \tilde{i}_{ {\lambda }_{k}}\left ( { {\mathsf{L}}}\right ) \;{ { {\mathsf{N}}}^{(\ell )}}^{ {{\intercal }}}$$ which can be done, e.g., by using the power method. This method requires matrix-vector multiplication only with N(ℓ) and $$\tilde{i}_{ {\lambda }_{k}}\left ( { {\mathsf{L}}}\right )$$ and is thus fast. Finally, the approximation $$\bar{ {\boldsymbol{p}}} \in {\mathbb{R}}^{ {N}}$$ of p* satisfies   \begin{align} \bar{{{\boldsymbol{p}}}}_{\ell} := \frac{\lambda_{\max }\left({{\mathsf{N}}}^{(\ell)} \; \tilde{i}_{{\lambda}_{k}}\left({{\mathsf{L}}}\right) \;{{{\mathsf{N}}}^{(\ell)}}^{{{\intercal}}}\right)}{\sum_{\ell^{\prime}=1}^{{N}} \lambda_{\max }\left({{\mathsf{N}}}^{\left(\ell^{\prime}\right)} \; \tilde{i}_{{\lambda}_{k}}\left({{\mathsf{L}}}\right) \;{{{\mathsf{N}}}^{\left(\ell^{\prime}\right)}}^{{{\intercal}}}\right)}. \end{align} (3.8) Note that an estimation of λk is required beforehand to define the filter $$\tilde{i}_{ {\lambda }_{k}}$$. We estimate this value using the dichotomy method presented in [32]. Estimating each value of $$\bar{ { {\boldsymbol{p}}}}_{\ell }$$ requires several filtering with $$\tilde{i}_{ {\lambda }_{k}}\left ( { {\mathsf{L}}}\right )$$ for each iteration of the power method. These values are estimated in parallel. Even if the power method was converging in one iteration only, this would require at least N filtering. Depending on the number of groups that may already be too much computations. Hence the interest of having a sampling distribution easier to estimate, but which yields acceptable sampling results, namely q*. We explain next how to estimate q*. 3.2. Estimation of q* We start by noticing that we have   \begin{align} {\left\| {{\mathsf{N}}}^{(\ell)}{{\mathsf{U}}}_{k}\right\|}_{F}^{2} = \sum_{i \in \mathscr{N}_{\ell}} {\left\| {{\mathsf{U}}}_{{k}}^{{{\intercal}}} {\boldsymbol{\delta}}_{i}\right\|}_{2}^{2} \end{align} (3.9) for each group ℓ = 1, …, N. The vector $$ {\boldsymbol{\delta }}_{i} \in {\mathbb{R}}^{ {n}}$$ is the unit vector that is null on all nodes except at node i. Hence, we only need an estimation of $$\left \| { {\mathsf{U}}}_{ {k}}^{ {{\intercal }}} {\boldsymbol{\delta }}_{i}\right \|_{2}^{2}$$, i = 1, …, n, to estimate q*. An algorithm was already proposed in [32] to estimate these values. We let the reader refer to Algorithm 1 in [32] for the details of the method. We just recall that this estimation is obtained by filtering $$O(\log ( {n}))$$ random signals with a polynomial approximation of iλk. In real applications, it is likely that $$O(\log ( {n})) {\leqslant } {N}$$. It is hence faster to estimate q* than p*. Finally, our estimation $$\bar{ {\boldsymbol{q}}} \in {\mathbb{R}}^{ {N}}$$ of q* has entries   \begin{align} \bar{{\boldsymbol{q}}}_{\ell} := \frac{\sum_{i \in \mathscr{N}_{\ell}} {\left\| {{\mathsf{U}}}_{{k}}^{{{\intercal}}} {\boldsymbol{\delta}}_{i}\right\|}_{F}^{2}}{\sum_{\ell=1}^{{N}} \sum_{i \in \mathscr{N}_{\ell}} {\left\| {{\mathsf{U}}}_{{k}}^{{{\intercal}}} {\boldsymbol{\delta}}_{i}\right\|}_{F}^{2}}, \end{align} (3.10) where each $$\left \| { {\mathsf{U}}}_{ {k}}^{ {{\intercal }}} {\boldsymbol{\delta }}_{i}\right \|_{2}^{2}$$ is estimated by Algorithm 1 in [32]. 4. Fast reconstruction Once the signal has been sampled, we shall also be able to reconstruct it. In [32], the authors propose to estimate the original signal by solving   \begin{align} \min_{{\boldsymbol{z}} \in{\mathbb{R}}^{{n}}} {\left\| {\mathsf{P}}({{\mathsf{M}}} {\boldsymbol{z}} - {{\boldsymbol{y}}})\right\|}_{2}^{2} + {\gamma} \; {\boldsymbol{z}}^{{{\intercal}}} g({{\mathsf{L}}}) {\boldsymbol{z}}, \end{align} (4.1) where γ > 0 and g is a non-negative non-decreasing polynomial. We have   \begin{align} g({{\mathsf{L}}}) := \sum_{i=0}^{d} \alpha_{i} \, {{\mathsf{L}}}^{i} = \sum_{i=0}^{d} \alpha_{i} \, {{\mathsf{U}}} {{\mathsf{\Lambda}}}^{i} {{\mathsf{U}}}^{{{\intercal}}}, \end{align} (4.2) where $$\alpha _{0}, \ldots , \alpha _{d} \in {\mathbb{R}}$$ are the coefficients of the polynomial and $$d \in {\mathbb{N}}$$ its degree. These polynomial’s parameters can be tuned to improve the quality of the reconstruction. The role of g can be viewed as a filter on $$\mathscr{G}$$ and should ideally be a high-pass filter. Indeed, we seek to recover a signal whose energy is concentrated at low frequencies on $$\mathscr{G}$$. We should thus penalize signals with energy at high frequencies in (4.1), hence the use of a high-pass filter. The matrix P is introduced to account for the RIP. The advantage of this method is that it can be solved efficiently even for large graph, for example, by conjugate gradient or gradient descent method. Indeed, each step of this algorithm can be implemented by using only matrix-vector multiplications with PM and L, which are both sparse matrices. The matrix g(L) does not have to be computed explicitly. Note that the computational complexity of each iteration of such algorithms however increases with the polynomial order. For brevity, we do not recall the theorem proving that (4.1) provides accurate and stable recovery of all k-bandlimited signals. However, this theorem applies as soon as the restricted isometry property (2.14) holds, and thus applies here when (2.15) holds. The reconstruction quality is controlled by g(λk) and the ratio g(λk)/g(λk+1). One should seek to design a filter g such that these quantities are as close as possible to zero to improve the reconstruction quality. We propose now a method to obtain a faster estimation of the original signal when it is nearly piecewise constant. 4.1. Piecewise constant graph signals Before continuing, we want to stress that we consider non-overlapping groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$ in this rest of Section 4. If a graph signal is nearly piecewise constant over the groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$ then reconstructing the mean values of this signal for each group is enough to obtain a good approximation of the original signal. Instead of estimating n unknowns, we reduce the estimation to N unknowns. When N ≪ n, this is a large reduction of dimension yielding a significant speed up and memory reduction. The fact that a signal x is piecewise constant over the groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$ is characterized as follows. We construct the averaging row-vectors $$ { {\boldsymbol{a}}}^{(\ell )} \in {\mathbb{R}}^{1 \times {n}}$$ that satisfy   \begin{align} {{\boldsymbol{a}}}^{(\ell)} := \frac{{\boldsymbol{1}}^{{{\intercal}}} {{\mathsf{N}}}^{(\ell)}}{{\left| \mathscr{N}_{\ell} \right|}^{1/2}}, \end{align} (4.3) and the matrix   \begin{align} {{\mathsf{A}}} := \left( \begin{array}{c} {{\boldsymbol{a}}}^{(1)}\\ \vdots\\{{\boldsymbol{a}}}^{({N})} \end{array} \right) \in{\mathbb{R}}^{{N} \times{n}}. \end{align} (4.4) As the groups do not overlap, we have   \begin{align} {{\mathsf{A}}}{{\mathsf{A}}}^{{{\intercal}}} = {\mathsf{I}}, \end{align} (4.5) hence $$ {\left \| { {\mathsf{A}}}\right \|}_{2} = 1$$. Applying A to x provides N values, each one of them corresponding to the sum of the values of x within the group $$\mathscr{N}_{\ell }$$, scaled by $$ {\left | \mathscr{N}_{\ell } \right |}^{-1/2}$$. Then, applying $$ { {\mathsf{A}}}^{ {{\intercal }}}$$ to Ax gives an approximation of x where the values in the vector $$ { {\mathsf{A}}}^{ {{\intercal }}} { {\mathsf{A}}} { {\boldsymbol{x}}}$$ are constant within each group; this is a piecewise constant vector over the groups. The value of $$ { {\mathsf{A}}}^{ {{\intercal }}} { {\mathsf{A}}} { {\boldsymbol{x}}}$$ appearing within the group $$\mathscr{N}_{\ell }$$ is exactly the average of x within $$\mathscr{N}_{\ell }$$. Saying that x is nearly constant within each group corresponds to assume that   \begin{align} {\left\| {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}} - {{\boldsymbol{x}}}\right\|}_{2} {\leqslant} \varepsilon{\left\| {{\boldsymbol{x}}}\right\|}_{2}, \end{align} (4.6) where $$\varepsilon {\geqslant } 0$$ is a small value. The signal model of interest in this section is thus   \begin{align} {{\mathscr{A}}_\varepsilon} := \left\{ {{\boldsymbol{x}}} \in{{\textrm{span}}}({{\mathsf{U}}}_{{k}}) \;\; \vert \;\; {\left\| ({{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} - {\mathsf{I}}) \; {{\boldsymbol{x}}}\right\|}_{2} {\leqslant} \varepsilon{\left\| {{\boldsymbol{x}}}\right\|}_{2} \right\}. \end{align} (4.7) Note that in the degenerated case of a community graph with k communities, where no edge exists between two different communities and where the groups are formed such that no group contains nodes belonging to two different communities, then the indicator vectors of each community are exactly piecewise constant (ε = 0). Hence, all k-bandlimited signals are also piecewise constant in this case. 4.2. Reducing the dimension To build a fast algorithm exploiting the above property, we use a reconstruction method similar to (4.1), but involving vectors of smaller dimension. We define the averaged vector $$\tilde{ { {\boldsymbol{x}}}} := { {\mathsf{A}}} { {\boldsymbol{x}}} \in {\mathbb{R}}^{ {N}}$$ of dimension N. As $$ { {\boldsymbol{x}}} \in { {\mathscr{A}}_\varepsilon }$$, estimating $$\tilde{ { {\boldsymbol{x}}}}$$ is enough to get a good approximation of x—we just need to multiply it with $$ { {\mathsf{A}}}^{ {{\intercal }}}$$. Furthermore, as x is nearly piecewise constant over the groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$, by construction of the matrix M, the measurement vector y = Mx is also almost piecewise constant over the sampled groups $$\mathscr{N}_{\omega _{1}}, \ldots , \mathscr{N}_{\omega _{ {s}}}$$. We thus average y over these groups by multiplying it with the matrix $$\widetilde{ { {\mathsf{A}}}} \in {\mathbb{R}}^{ {s} \times {m}}$$ that satisfies   \begin{align} \widetilde{{{\mathsf{A}}}}_{ji} := \begin{cases} {\left| \mathscr{N}_{\omega_{j}} \right|}^{-1/2} & \textrm{for}\ \sum_{j^{\prime}=1}^{j-1} {\left| \mathscr{N}_{\omega_{j^{\prime}}} \right|} {\leqslant} \; i \;{\leqslant} \sum_{j^{\prime}=1}^{\ell} {\left| \mathscr{N}_{\omega_{j^{\prime}}} \right|}, \\ 0 & \textrm{otherwise}. \end{cases} \end{align} (4.8) We obtain   \begin{align} \tilde{{{\boldsymbol{y}}}} := \widetilde{{{\mathsf{A}}}} {{\boldsymbol{y}}} = \widetilde{{{\mathsf{A}}}} {{\mathsf{M}}} {{\boldsymbol{x}}} \in{\mathbb{R}}^{{s}}. \end{align} (4.9) We now have to link $$\tilde{ { {\boldsymbol{y}}}}$$ to $$\tilde{ { {\boldsymbol{x}}}}$$. We create the matrix $$\widetilde{ { {\mathsf{M}}}} \in {\mathbb{R}}^{ {s} \times {N}}$$ that restricts the N mean value of $$\tilde{ { {\boldsymbol{x}}}}$$ to the s mean value of the selected groups, i.e.,   \begin{align} \widetilde{{{\mathsf{M}}}}_{j i} := \begin{cases} 1 & \textrm{if}\ i = \omega_{j},\\ 0 & \textrm{otherwise}. \end{cases} \end{align} (4.10) We have therefore   \begin{align} \tilde{{{\boldsymbol{y}}}} = \widetilde{{{\mathsf{M}}}} \, \tilde{{{\boldsymbol{x}}}}. \end{align} (4.11) The goal is now to estimate $$\tilde{ { {\boldsymbol{x}}}}$$ from $$\tilde{ { {\boldsymbol{y}}}}$$. To ensure that the reconstruction method is stable to measurement noise, we do not consider the perfect scenario above, but instead the scenario where   \begin{align} \tilde{{{\boldsymbol{y}}}} = \widetilde{{{\mathsf{M}}}}\tilde{{{\boldsymbol{x}}}} + \tilde{{{\boldsymbol{n}}}}, \end{align} (4.12) and $$\tilde{ { {\boldsymbol{n}}}} \in {\mathbb{R}}^{ {s}}$$ models noise. We now need a regularization term to estimate $$\tilde{ { {\boldsymbol{x}}}}$$. We obtain this term by reducing the dimension of the regularizer involving the Laplacian L in (4.1). We compute   \begin{align} \widetilde{{{\mathsf{L}}}} := {{\mathsf{A}}} \, g({{\mathsf{L}}}) \, {{\mathsf{A}}}^{{{\intercal}}} = ({{\mathsf{A}}} {{\mathsf{U}}}) \, g({{\mathsf{\Lambda}}}) \, ({{\mathsf{A}}} {{\mathsf{U}}})^{{{\intercal}}} \in{\mathbb{R}}^{{N} \times{N}}. \end{align} (4.13) Note that $$\widetilde{ { {\mathsf{L}}}}$$ is a symmetric positive definite matrix. Like g(L), it can thus be used as a regularization. We thus propose to estimate $$\tilde{ { {\boldsymbol{x}}}}$$ by solving   \begin{align} \min_{\tilde{{\boldsymbol{z}}} \in{\mathbb{R}}^{{N}}} {\left\| \widetilde{{\mathsf{P}}} (\widetilde{{{\mathsf{M}}}} \tilde{{\boldsymbol{z}}} - \tilde{{{\boldsymbol{y}}}})\right\|}_{2}^{2} + {\gamma} \; \tilde{{\boldsymbol{z}}}^{{{\intercal}}} \, \widetilde{{{\mathsf{L}}}} \, \tilde{{\boldsymbol{z}}}, \end{align} (4.14) where γ > 0 and $$\widetilde{ {\mathsf{P}}} \in {\mathbb{R}}^{ {s} \times {s}}$$ is the diagonal matrix with entries satisfying   \begin{align} \widetilde{{\mathsf{P}}}_{jj} := {{\boldsymbol{p}}}_{\omega_{j}}^{-1/2}. \end{align} (4.15) Let $$\tilde{ { {\boldsymbol{x}}}}^{*} \in {\mathbb{R}}^{ {N}}$$ be a solution of (4.14). We finally obtain an estimation of x by computing $$ { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ { {\boldsymbol{x}}}}^{*}$$. In the particular case where g(⋅) is the identity, one can notice that   \begin{align} \widetilde{{{\mathsf{L}}}}_{\ell\ell^{\prime}} = \sum_{(i,j) \in{{\mathscr{N}}}_{\ell} \times \mathscr{N}_{\ell^{\prime}}} \frac{{{\mathsf{L}}}_{ij}}{{\left| \mathscr{N}_{\ell} \right|}^{1/2} {\left| \mathscr{N}_{\ell^{\prime}} \right|}^{1/2}} \end{align} (4.16) is non-zero only if there is at least one edge in $$ {\mathscr{E}}$$ joining the groups $$\mathscr{N}_{\ell }$$ and $$\mathscr{N}_{\ell ^{\prime }}$$. The Laplacian $$\widetilde{ { {\mathsf{L}}}}_{\ell \ell ^{\prime }}$$ preserves the connections present in the original graph represented by L. The dimension of the unknown vector in (4.14) is N, which can be much smaller than n. This leads to a large gain in memory and computation time when either $$\widetilde{ { {\mathsf{L}}}}$$ or matrix--vector multiplications with $$\widetilde{ { {\mathsf{L}}}}$$ can be computed rapidly. If g has a small degree, then $$\widetilde{ { {\mathsf{L}}}}$$ can be computed explicitly in a short amount of time. In such a case, we can solve (4.14) faster than (4.1) as it involves matrices of (much) smaller dimensions. If L encodes a regular graph such as the ring graph, and the groups are formed with neighbouring nodes of $$\mathscr{G}$$, then $$\widetilde{ { {\mathsf{L}}}}$$ is also regular and fast numerical techniques using the fast Fourier transform can be used for fast matrix--vector multiplication with $$\widetilde{ { {\mathsf{L}}}}$$. In general, it is however not always straightforward to find an efficient implementation for matrix--vector multiplications with $$\widetilde{ { {\mathsf{L}}}}$$ without temporarily going back to the signal domain of dimension n, i.e., multiplying the vector $$\tilde{ {\boldsymbol{z}}}$$ with $$ { {\mathsf{A}}}^{ {{\intercal }}}$$, filtering the high-dimensional signal $$ { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ {\boldsymbol{z}}}$$ and downsampling the result. Even though solving (4.14) might still be faster than solving (4.1) in this situation, we lose part of the efficiency by working temporarily in the high-dimensional domain. We thus have less flexibility in the choice of g with this reconstruction technique. Let us also mention that (4.14) can be used to initialize the algorithm used to solve (4.1) with a good approximate solution. As in multigrid approaches to solve linear systems of equations, see, e.g., [8, 19]. The following theorem bounds the error between the signal recovered by (4.14) and the original vector. Theorem 4.1 Let Ω = {ω1, …, ωs} be a set of s indices selected independently from {1, …, N} using a sampling distribution $$ { {\boldsymbol{p}}} \in {\mathbb{R}}^{ {N}}$$, $$ { {\mathsf{M}}}, {\mathsf{P}}, \widetilde{ { {\mathsf{M}}}}, \widetilde{ {\mathsf{P}}}$$ be the associated matrices constructed respectively in (2.7), (2.10), (4.10) and (4.15), and $$M_{\max }>0$$ be a constant such that $$ {\left \| {\mathsf{P}} { {\mathsf{M}}}\right \|}_{2} {\leqslant } M_{\max }$$. Let ξ, δ ∈ (0, 1) and suppose that s satisfies (2.15). With probability at least 1 − ξ, the following holds for all $$ { {\boldsymbol{x}}} \in { {\mathscr{A}}_\varepsilon }$$, all $$\tilde{ { {\boldsymbol{n}}}} \in {\mathbb{R}}^{ {s}}$$, all γ > 0 and all non-negative non-decreasing polynomial functions g such that g(λk+1) > 0. Let $$\tilde{ { {\boldsymbol{x}}}}^{*}$$ be the solution of (4.14) with $$\tilde{ { {\boldsymbol{y}}}} = \widetilde{ { {\mathsf{M}}}} { {\mathsf{A}}} { {\boldsymbol{x}}} + \tilde{ { {\boldsymbol{n}}}}$$. Define $$ {\boldsymbol{\alpha }}^{*} := { {\mathsf{U}}}_{ {k}} { {\mathsf{U}}}_{ {k}}^{ {{\intercal }}} \,{ { {\mathsf{A}}}}^{ {{\intercal }}} \tilde{ { {\boldsymbol{x}}}}^{*}$$ and $$ {\boldsymbol{\beta }}^{*} := ( {\mathsf{I}} - { {\mathsf{U}}}_{ {k}} { {\mathsf{U}}}_{ {k}}^{ {{\intercal }}}) \,{ { {\mathsf{A}}}}^{ {{\intercal }}} \tilde{ { {\boldsymbol{x}}}}^{*}$$. Then,   \begin{align} {\left\| {\boldsymbol{\alpha}}^{*} - {{\boldsymbol{x}}} \right\|}_{2} \; {\leqslant} \; {\frac{1}{\sqrt{{s} (1-\delta)}}} \; \cdot &\left[ \left( 2 + \frac{M_{\max }}{\sqrt{{\gamma} g({\lambda}_{{k}+1})}}\right){\left\| \widetilde{{\mathsf{P}}} \tilde{{{\boldsymbol{n}}}}\right\|}_{2} + \left( M_{\max } \sqrt{\frac{g({\lambda}_{{k}})}{g({\lambda}_{{k}+1})}} + \sqrt{{\gamma} g({\lambda}_{{k}})}\right) {\left\| {{\boldsymbol{x}}} \right\|}_{2}\right. \nonumber \\ &\quad \left. + \; \varepsilon \left( 2 M_{\max } + M_{\max } \sqrt{\frac{g({\lambda}_{{n}})}{g({\lambda}_{{k}+1})}} + \sqrt{{\gamma} g({\lambda}_{{n}})} \right) {\left\| {{\boldsymbol{x}}} \right\|}_{2} \right], \end{align} (4.17) and   \begin{align} {\left\| {\boldsymbol{\beta}}^{*}\right\|}_{2} {\leqslant} \; \frac{1}{\sqrt{{\gamma} g({\lambda}_{{k}+1})}} {\left\| \widetilde{{\mathsf{P}}} \tilde{{{\boldsymbol{n}}}} \right\|}_{2} \; + \; \sqrt{\frac{g({\lambda}_{{k}})}{g({\lambda}_{{k}+1})}} {\left\| {{\boldsymbol{x}}} \right\|}_{2} \; + \; \varepsilon \; \sqrt{\frac{g({\lambda}_{{n}})}{g({\lambda}_{{k}+1})}} {\left\| {{\boldsymbol{x}}} \right\|}_{2}. \end{align} (4.18) Proof. See Appendix B. The vector $$\boldsymbol{\alpha}^{*}$$ is the orthogonal projection of $${ { {\mathsf{A}}}}^{ {{\intercal }}} \tilde{ { {\boldsymbol{x}}}}^{*}$$ onto span(Uk). The vector $$\boldsymbol{\beta}^{*}$$ is the projection of $${ { {\mathsf{A}}}}^{ {{\intercal }}} \tilde{ { {\boldsymbol{x}}}}^{*}$$ onto the orthogonal complement of span(Uk). There are several remarks to make about the above theorem: Theorem 4.1 shows that the result obtained via (4.14) is similar to the one we would have obtained by solving (4.1)—see [32] for the error bounds—with additional errors controlled by ε. We recall that ε characterizes how far x is from a piecewise constant signal. As expected, the smaller ε, the better the reconstruction. The reconstruction quality improves when g(λk) and the ratio g(λk)/g(λk+1) go to 0 and when g(λn)/g(λk+1) tends to 1. We recall that we have $$g( {\lambda }_{ {n}}) {\geqslant } g( {\lambda }_{ {k}+1})> g( {\lambda }_{ {k}})$$ by assumption. The effect of the noise $$\tilde{ { {\boldsymbol{n}}}}$$ decreases when g(λk+1) increases, and, obviously, γ should be adapted to the signal-to-noise ratio (SNR). Let us mention that the idea of ‘coarsening’ a graph and the signals that live on it using a partition into different groups of nodes can also be found in [36], where a multiresolution analysis method of graph signals is proposed. The coarsening method is however different than the one used here. 5. Experiments In this last section, we first test our sampling strategies on two different graphs to illustrate the effect of the different sampling distributions on the minimum number of samples required to ensure that the RIP holds.2 Then, we apply our sampling strategy for user-guided object segmentation. In this application, we also test the different recovery techniques proposed in Section 4. 5.1. Sampling distributions We perform experiments on two different graphs: the Minnesota graph of size n = 2642 and the bunny graph of size n = 2503. Both graphs are presented in Fig. 3 and are available in the Graph signal processing (GSP) toolbox [29]. For each graph, we group the nodes using the spatial coordinates associated to each node. For the Minnesota graph, we divide the space into 100 cells and group the nodes that fall in the same cell. After removing empty cells, we obtain the N = 73 groups represented in Fig. 3. For the bunny graph, we obtain N = 213 groups with a similar procedure (see Fig. 3). Fig. 3. View largeDownload slide [This figure is available in color online.] First column from the left: Minnesota graph (top); bunny graph (bottom). The groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$ are indicated by different gray level (colors). Other columns: probability that $$\underline{\delta }_{k}$$ is less than 0.995 as a function of s. The dashed (black), continuous (red), dash-dotted (blue) and dotted (green) curves are obtained using the sampling distributions u, p*, $$\bar{ {\boldsymbol{q}}}$$ and $$\bar{ { {\boldsymbol{p}}}}$$, respectively. The top row shows the result for the Minnesota graph. The bottom row shows the result for the bunny graph. The bandlimit k is indicated on top of each curve. Fig. 3. View largeDownload slide [This figure is available in color online.] First column from the left: Minnesota graph (top); bunny graph (bottom). The groups $$\mathscr{N}_{1}, \ldots , \mathscr{N}_{ {N}}$$ are indicated by different gray level (colors). Other columns: probability that $$\underline{\delta }_{k}$$ is less than 0.995 as a function of s. The dashed (black), continuous (red), dash-dotted (blue) and dotted (green) curves are obtained using the sampling distributions u, p*, $$\bar{ {\boldsymbol{q}}}$$ and $$\bar{ { {\boldsymbol{p}}}}$$, respectively. The top row shows the result for the Minnesota graph. The bottom row shows the result for the bunny graph. The bandlimit k is indicated on top of each curve. For each graph, we compute the combinatorial Laplacian and Uk for different values of k. Then, we compute the lower RIP constant, i.e., the constant $$\underline{\delta }_{k}>0$$ that satisfies   \begin{align*} \underline{\delta}_{k} = 1 - {\frac{1}{{s}}} \;\; \inf_{\substack{{{\boldsymbol{x}}} \in{{\textrm{span}}}({{\mathsf{U}}}_{{k}}) \\{\left\| x\right\|}_{2} = 1}} \;{\left\| {\mathsf{P}} {{\mathsf{M}}} \; {{\boldsymbol{x}}}\right\|}_{2}^{2}. \end{align*} This constant is the smallest value that δ can take such that the left-hand side of the RIP (2.14) holds. Remark that   \begin{align} \underline{\delta}_{k} = 1 - {\frac{1}{{s}}} \; \lambda_{\min } \left( {{\mathsf{U}}}_{k}^{{{\intercal}}} {{\mathsf{M}}}^{{{\intercal}}} {\mathsf{P}}^{2} {{\mathsf{M}}} {{\mathsf{U}}}_{k}\right). \end{align} (5.1) We estimate $$\underline{\delta }_{k}$$ for 500 independent draws of the set Ω, which defines the matrices PM, and different numbers of selected groups s. All samplings are done in the conditions of Theorem 2.2 using the sampling distributions $$ {\boldsymbol{u}}, {\boldsymbol{p}}^{*}, \bar{ { {\boldsymbol{p}}}}$$ and $$\bar{ {\boldsymbol{q}}}$$. The vector u denotes the uniform distribution over {1, …, N}. When conducting this experiment with the estimated distributions $$\bar{ { {\boldsymbol{p}}}}$$ and $$\bar{ {\boldsymbol{q}}}$$, we re-estimate these distributions at each of the 500 trials. These distributions are estimated using Jackson–Chebychev polynomials of order 50. We choose Chebychev polynomials as they are well suited for the approximation of spectral norm of matrices [26]. Jackson–Chebychev polynomials further minimize the Gibbs phenomenon that appears when one approximates low-pass filters with polynomials [28]. For the Minnesota graph, we consider the bandlimits k = 5, 10, 20. For the bunny graph, we consider the bandlimits k = 10, 25, 50. We present the probability that $$\underline{\delta }_{k}$$ is less than 0.995, estimated over the 500 draws of Ω, as a function of s in Fig. 3. For the Minnesota graph, the performance is better when using the optimal distribution p* than when using the uniform distribution u for all k, which is in line with the theory. The estimated $$\bar{ { {\boldsymbol{p}}}}$$ and $$\bar{ {\boldsymbol{q}}}$$ yield performance equivalent to p*. This confirms that we can achieve similar sampling performance without having to compute the Fourier matrix Uk, which, we recall, is intractable for large graphs. This also shows that $$\bar{ {\boldsymbol{q}}}$$ can lead to nearly optimal results. For the bunny graph, all sampling distributions yield essentially the same results at all bandlimits. We notice a slight improvement at k = 50 when using $$\bar{ { {\boldsymbol{p}}}}$$, $$\bar{ {\boldsymbol{q}}}$$ or p* instead of u. For illustration, we present in Fig. 4 examples of computed sampling distributions p*, $$\bar{ { {\boldsymbol{p}}}}$$ and $$\bar{ {\boldsymbol{q}}}$$. All sampling distributions exhibit similar structures, which explains why they all yield about the same performance in our experiments. Fig. 4. View largeDownload slide Example of sampling distributions. Top panels: p* (left), $$\bar{ { {\boldsymbol{p}}}}$$ (middle) and $$\bar{ {\boldsymbol{q}}}$$ (right) for the Minnesota graph at k = 10. Bottom panels: p* (left), $$\bar{ { {\boldsymbol{p}}}}$$ (middle) and $$\bar{ {\boldsymbol{q}}}$$ (right) for the bunny graph at k = 25. Fig. 4. View largeDownload slide Example of sampling distributions. Top panels: p* (left), $$\bar{ { {\boldsymbol{p}}}}$$ (middle) and $$\bar{ {\boldsymbol{q}}}$$ (right) for the Minnesota graph at k = 10. Bottom panels: p* (left), $$\bar{ { {\boldsymbol{p}}}}$$ (middle) and $$\bar{ {\boldsymbol{q}}}$$ (right) for the bunny graph at k = 25. 5.2. Object segmentation 5.2.1. Protocol We now test our method for interactive object segmentation. We consider the image of size n = 321 × 481 presented in Fig. 1 for which our goal is to segment the tiger. The ground truth segmentation map x ∈ {0, 1}n is presented in Fig. 5. The value 1 (white) indicates the presence of the tiger and the value 0 (black) stands for the foreground. The original image and the ground truth image are part of the dataset available3 in [25]. Our objective is to recover the original map x from few user-inputs. We experiment two sorts of labelling procedure. For the first one, we choose pixels at random and ask the user to label these pixels: 1 if the superpixel belongs to the tiger; 0 otherwise. For the second one, we divide the original image into the N = 600 superpixels showed in Fig. 1 and computed with SLIC [1], choose a small number of superpixels at random and ask the user to label these superpixels. Note that, unlike in the previous subsection, the pixels are not solely grouped with respect to their spatial coordinates, but the local gradients also influence the construction of the superpixels. Fig. 5. View largeDownload slide Top row: ground truth segmentation map (left); local group coherence map $$\left \| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{ {k}_{0}}\right \|_{2}^{2}$$ evaluated with the spectral norm (middle); local group coherence map $$\left \| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{ {k}_{0}}\right \|_{F}^{2}$$ evaluated with the Frobenius norm (right). Middle row: segmentation map estimated from s = 150 sampled superpixels obtained by solving (4.14) and using a uniform sampling (left), $$\bar{ { {\boldsymbol{p}}}}$$ (middle), $$\bar{ {\boldsymbol{q}}}$$ (right). Bottom row: segmentation map estimated from s = 50 sampled superpixels obtained by solving (4.1) and using a uniform sampling (left), $$\bar{ { {\boldsymbol{p}}}}$$ (middle), $$\bar{ {\boldsymbol{q}}}$$ (right). Fig. 5. View largeDownload slide Top row: ground truth segmentation map (left); local group coherence map $$\left \| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{ {k}_{0}}\right \|_{2}^{2}$$ evaluated with the spectral norm (middle); local group coherence map $$\left \| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{ {k}_{0}}\right \|_{F}^{2}$$ evaluated with the Frobenius norm (right). Middle row: segmentation map estimated from s = 150 sampled superpixels obtained by solving (4.14) and using a uniform sampling (left), $$\bar{ { {\boldsymbol{p}}}}$$ (middle), $$\bar{ {\boldsymbol{q}}}$$ (right). Bottom row: segmentation map estimated from s = 50 sampled superpixels obtained by solving (4.1) and using a uniform sampling (left), $$\bar{ { {\boldsymbol{p}}}}$$ (middle), $$\bar{ {\boldsymbol{q}}}$$ (right). The graph $$\mathscr{G}$$ used to propagate the user-labels to the complete image is constructed as follows. We build a feature vector for each pixel by extracting a color RGB patch of size 3 × 3 around the pixel, transform this patch in vector form and augment this vector with the absolute two-dimensional coordinates of the pixels in this extracted patch. This yields n feature vectors $$ {\boldsymbol{g}}_{i} \in {\mathbb{R}}^{45}$$, i = 1, …, n. We then connect each feature vector to its nine nearest neighbours (in the Euclidean sense), which gives a set of 9n edges $$ {\mathscr{E}}$$. The adjacency matrix $$ {\mathsf{W}} \in {\mathbb{R}}^{ {n} \times {n}}$$ satisfies $$ {\mathsf{W}}_{ij} := \exp \left [-{\| {\boldsymbol{g}}_{i} - {\boldsymbol{g}}_{j}\|_{2}^{2}}/{\sigma ^{2}}\right ], $$ where σ > 0 is the 25th percentile of the set $$\{\| {\boldsymbol{g}}_{i} - {\boldsymbol{g}}_{j}\|_{2} : (i, j) \in {\mathscr{E}} \}$$. We finally symmetrize the matrix W and compute the combinatorial Laplacian $$ { {\mathsf{L}}} \in {\mathbb{R}}^{ {n} \times {n}}$$. We study three strategies to choose the superpixels to label. The first strategy consists of choosing the superpixels uniformly at random, i.e., using the sampling distribution u. The second and third strategies consist of choosing the superpixels with respectively the optimized distributions $$\bar{ {\boldsymbol{q}}}$$ and $$\bar{ { {\boldsymbol{p}}}}$$, which we evaluate at k0 = 50 using Jackson–Chebychev polynomials of order 150 [28]. For illustration, we present in Fig. 5 the estimated values $$\left \| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{ {k}_{0}}\right \|_{F}^{2}$$ and $$\left \| { {\mathsf{N}}}^{(\ell )} { {\mathsf{U}}}_{ {k}_{0}}\right \|_{2}^{2}$$, which define the optimized distributions $$\bar{ {\boldsymbol{q}}}$$ and $$\bar{ {\boldsymbol{p}}}$$, respectively. Both distributions indicate that one should label more superpixels around the tiger. The distribution $$\bar{ {\boldsymbol{q}}}$$ ‘focuses’ however more on specific regions, like the head of tiger. The distribution $$\bar{ { {\boldsymbol{p}}}}$$ spreads the measurements over the entire tiger more uniformly. We study two strategies to choose the pixels to label. The first strategy consists of choosing the pixels uniformly at random, i.e., using the sampling distribution u. The second strategy consists of choosing the pixels with the optimal sampling distribution proposed in [32], which we evaluate at k1 = 50 × 257 using Jackson–Chebychev polynomials of order 150 [28]. For illustration, we present this distribution in Fig. 1. The distribution indicates that the most important pixels to sample are near the head and tail of the tiger and, more generally, near edges in the image. We chose k1 > k0 as we obtained better results for larger values of k in the pixels labelling scenario. On the contrary, choosing larger values of k in the superpixels labelling scenario yields slightly worse performance, especially with $$\bar{ { {\boldsymbol{p}}}}$$. Note that we have k0/N ≃ k1/n ≃ 8.3%, i.e., the ratio between k and the number of superpixels or pixels is the same. We emulate user-interactions as follows. When pixels need to be labelled, we simply sampled the ground truth segmentation map x at the sampled pixels. This noise-free strategy gives us access to a measurement vector $$ { {\boldsymbol{y}}} \in {\mathbb{R}}^{ {m}}$$. When superpixels need to be labelled, we compute the mean of the ground truth map x within each superpixel. If the mean value is larger than 0.5, we label the superpixel as part of the tiger. Otherwise, we label the superpixel as part of the background. This strategy obviously introduces noise if some superpixels cover part of the background and of the tiger. Once the labelling is done, we have access to the measurement vector $$\tilde{ { {\boldsymbol{y}}}} \in {\mathbb{R}}^{ {s}}$$ from which we want to reconstruct x. For the experiments with superpixels to label, we repeat this procedure for s ∈ {50, 70, …, 250}. For each s, we also repeat the experiments 50 times with independent draws of the superpixels. For the experiments with pixels to label, we repeat this procedure for m ∈ {50 × 257, 70 × 257, …, 250 × 257}. As each superpixel contains 257 pixels on average, we compare results for about the same number of labelled pixels. Note however that the labelling procedure with the superpixels is much less costly for the user: labelling one superpixel is at least as easy as labelling a single pixel—arguably even easier in term of decision—whilst it amounts to automatically label hundreds of pixels at once. For each m, we also repeat the experiments 50 times with independent draws of the pixels. Note that we draw the pixels and the superpixels with replacements. To reconstruct the original map x, we solve (4.1) directly, starting from the null vector as initialization, in the pixel labelling scenarios. In the superpixels labelling scenarios, we first use the fast reconstruction method (4.14) and then refine the solution at the pixel level with (4.1), using the solution of the first minimization problem as initialization to solve the second minimization problem. We choose g(L) = L and solve all problems in the limit where $$ {\gamma } \rightarrow 0$$. In this limit, the problems (4.14) and (4.1) become   \begin{align} \min_{\tilde{{\boldsymbol{z}}} \in{\mathbb{R}}^{{N}}} \; \tilde{{\boldsymbol{z}}}^{{{\intercal}}} \, \widetilde{{{\mathsf{L}}}} \, \tilde{{\boldsymbol{z}}} \quad \textrm{subject to} \quad \widetilde{{{\mathsf{M}}}} \tilde{{\boldsymbol{z}}} = \tilde{{{\boldsymbol{y}}}} \end{align} (5.2) and   \begin{align} \min_{{\boldsymbol{z}} \in{\mathbb{R}}^{{n}}} \; {\boldsymbol{z}}^{{{\intercal}}} g({{\mathsf{L}}}) {\boldsymbol{z}} \quad \textrm{subject to} \quad{{\mathsf{M}}} {\boldsymbol{z}} = {{\boldsymbol{y}}}, \end{align} (5.3) respectively. Both problems are solved using Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) [6]. The same stopping criteria are used for all experiments. 5.2.2. Results We present in Fig. 6 the reconstruction signal-to-noise ratio obtained with the different methods. The reconstruction SNR is defined as $$-20\log _{10}(- {\left \| { {\boldsymbol{x}}} - { {\boldsymbol{x}}}^{*}\right \|}_{2}/ {\left \| { {\boldsymbol{x}}}\right \|}_{2})$$, where x* is the reconstructed signal. In the superpixels labelling scenarios, the SNR attained with the fast decoder (4.14) is very similar to the SNR attained with (4.1). We also remark that the optimized distributions $$\bar{ { {\boldsymbol{p}}}}$$ and $$\bar{ {\boldsymbol{q}}}$$ yield both better reconstructions than the uniform distribution u and that reconstruction SNRs with $$\bar{ { {\boldsymbol{p}}}}$$ are better than with $$\bar{ {\boldsymbol{q}}}$$. Note however that the latter can depend on the choice of k0 used to evaluate the sampling distributions. In the pixels labelling scenarios, the optimized distribution yields much better reconstructions than the uniform distribution. Finally, note that 50 × 257 labelled pixels are necessary to reach a mean SNR of 9.5 dB whilst only 250 labelled superpixels are necessary to reach a mean SNR of 9.8 dB. Fig. 6. View largeDownload slide [This figure is available in color online] Top: the curves represent the mean reconstruction SNR as a function of the number of sampled superpixels s (left) and pixels m (right). Bottom: mean computation time in seconds as a function of the number of sampled superpixels s (left) and pixels m (right). Left: superpixels labelling scenario. The curves marked with crosses (black) are obtained with the uniform sampling distribution u. The curves marked with circles (blue) are obtained with $$\bar{ {\boldsymbol{q}}}$$. The curves marked with a triangle (red) are obtained with $$\bar{ { {\boldsymbol{p}}}}$$. Right: pixels labelling scenario. The curves marked with crosses (green) are obtained with the uniform sampling distribution. The curves marked with diamonds (magenta) are obtained with the optimal sampling distribution of [32]. In all graphs, the dash-dotted curves are obtained by solving (4.1) and the solid curves are obtained by solving (4.14). Note that the reconstruction time obtained with (4.14) appears almost constant for all s and all sampling methods on this graph. The mean reconstruction times for this method are all in the range [0.30, 0.38] second. Fig. 6. View largeDownload slide [This figure is available in color online] Top: the curves represent the mean reconstruction SNR as a function of the number of sampled superpixels s (left) and pixels m (right). Bottom: mean computation time in seconds as a function of the number of sampled superpixels s (left) and pixels m (right). Left: superpixels labelling scenario. The curves marked with crosses (black) are obtained with the uniform sampling distribution u. The curves marked with circles (blue) are obtained with $$\bar{ {\boldsymbol{q}}}$$. The curves marked with a triangle (red) are obtained with $$\bar{ { {\boldsymbol{p}}}}$$. Right: pixels labelling scenario. The curves marked with crosses (green) are obtained with the uniform sampling distribution. The curves marked with diamonds (magenta) are obtained with the optimal sampling distribution of [32]. In all graphs, the dash-dotted curves are obtained by solving (4.1) and the solid curves are obtained by solving (4.14). Note that the reconstruction time obtained with (4.14) appears almost constant for all s and all sampling methods on this graph. The mean reconstruction times for this method are all in the range [0.30, 0.38] second. We also present the computation time of each method in Fig. 6. In the superpixels labelling scenarios, we notice that solving (4.14) is much faster than solving (4.1), whilst they yield almost the same quality. This highlights the interest of the fast reconstruction technique. It is also faster to solve (4.1) when the measurements are drawn with $$\bar{ { {\boldsymbol{p}}}}$$ or with $$\bar{ {\boldsymbol{q}}}$$ than with u. The reason is probably a better initialization of (4.1). The reconstruction time in the pixels labelling scenario is much longer than in the superpixels labelling scenario. This is another advantage of the method proposed in this paper. Finally, we present in Fig. 5 some examples of reconstructions from s = 150 sampled superpixels for each method. We notice that the optimized sampling distributions improve the reconstruction of x around the head and tail of the tiger, i.e., where the optimized distributions have higher values. With a uniform distribution, the structure of the graph makes it difficult to reconstruct x around the head and tail from the values of other superpixels. The optimized sampling distribution compensates this issue by favouring this area when selecting the measurements. Note that, if the segmentation map to reconstruct was exactly k-bandlimited, the results presented in [32] suggest that $$O(k\log (k))$$-labelled pixels would have been enough to ensure the reconstruction. Similarly, the results presented in this work show that $$O(k\log (k))$$-labelled superpixels would be enough. However, the experiments presented here show that one needs to sample much more pixels than superpixels to reach the same SNR. These numerical results seem to contradict the theory. However, one should not forget that these theoretical results are derived with the sole assumption that the signals are k-bandlimited. The segmentation map here is smooth on $$\mathscr{G}$$ but, also, nearly piecewise-constant on the superpixels. This last property is not exploited to derive the theoretical sampling rates in this work nor in [32]. We used this property only in the reconstruction algorithm. Exploiting the fact that a signal is k-bandlimited and piecewise-constant would probably change the sampling rate at which the RIP is satisfied. Indeed, if one knows that all labels in a superpixels are identical, it is unnecessary to sample twice a pixel in the same superpixels. Therefore, the optimal sampling distribution of [32] for k-bandlimited signals is probably not optimal any more for k-bandlimited and piecewise-constant signals. 6. Discussion and conclusion We presented a structured sampling strategy for k-bandlimited signals where the nodes are selected by groups. We proved that the local group graph cumulative coherence quantifies the importance of sampling each group to ensure a stable embedding of all k-bandlimited signals. Finally, we presented a fast reconstruction technique for k-bandlimited signals which are also nearly piecewise-constant over pre-defined groups of nodes. Among the possible applications of these methods, we believe that they can also be useful to accelerate the compressive spectral clustering method proposed in [38]. After having computed some features vectors for each nodes, this compressive method works by downsampling the set of features of vectors, performing k-means on this reduced set to find k clusters, and interpolating the clustering results on all nodes by solving (4.1). To accelerate the method, one could (1) pre-group similar nodes to form N groups such that $$ {k} {\leqslant } {N} \ll {n}$$, e.g., by running few iterations of the k-means algorithm; (2) subsample this set of N groups; (3) cluster this subset to find k clusters; and (4) solve (4.14) to cluster all nodes. If the overhead of computing the N groups is small, this method has the potential to be faster than the original compressive spectral clustering method. Finally, we would like to discuss two limitations in the proposed methods. First, the optimal sampling distribution depends on the parameter k. In some applications, the final result may change a lot depending on the value of k which was chosen to compute this distribution. Finding a range of values of k which give acceptable and stable results is thus an important step in every application. Secondly, the estimation of the optimal sampling distribution depends on the quality of the polynomial approximation of the ideal low-pass filter λk. It is sometimes necessary to use a polynomial of large degree to get a correct estimation, which limits the computational efficiency of the proposed methods. In such cases, it would be especially useful to find more efficient alternatives to estimate the distributions p* and q*. Footnotes 1  This property is a consequence of (A.7), proved in Appendix A. 2  More information to reproduce the experiments available at https://sites.google.com/site/puygilles/softwares 3  http://www.ntu.edu.sg/home/asjfcai/Benchmark_Website/benchmark_index.html References 1. Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P. & Süsstrunk, S. ( 2012) SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. , 34, 2274-- 2282. Google Scholar CrossRef Search ADS PubMed  2. Adcock, B., Hansen, A. C., Poon, C. & Roman, B. ( 2013) Breaking the coherence barrier: a new theory for compressed sensing. arXiv:1302.0561. 3. Anis, A., Gadde, A. & Ortega, A. ( 2014) Towards a sampling theorem for signals on arbitrary graphs. Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on , pp. 3864-- 3868. 4. Anis, A., Gadde, A. & Ortega, A. ( 2016) Efficient sampling set selection for bandlimited graph signals using graph spectral proxies., IEEE Trans. Signal Process.  PP, 1-- 1. 5. Anis, A., Gamal, A. E., Avestimehr, S. & Ortega, A. ( 2015) Asymptotic justification of bandlimited interpolation of graph signals for semi-supervised learning. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pp. 5461-- 5465. 6. Beck, A. & Teboulle, M. ( 2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. , 2, 183-- 202. Google Scholar CrossRef Search ADS   7. Bigot, J., Boyer, C. & Weiss, P. ( 2016) An analysis of block sampling strategies in compressed sensing. IEEE Trans. Inf. Theory , 62, 2125-- 2139. Google Scholar CrossRef Search ADS   8. Borzi, A. & Schulz, V. ( 2009) Multigrid methods for PDE optimization. SIAM Rev ., 51, 361-- 395. Google Scholar CrossRef Search ADS   9. Boyer, C., Bigot, J. & Weiss, P. ( 2015) Compressed sensing with structured sparsity and structured acquisition. arXiv:1505.01619. 10. Boykov, Y. & Jolly, M. ( 2001) Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. Proc. Int. Conf. Comput. Vis. , 1, 105-- 112. 11. Chauffert, N., Ciuciu, P. & Weiss, P. ( 2013) Variable density compressed sensing in MRI. Theoretical VS heuristic sampling strategies. Proc. IEEE Int. Symp. on Biomedical Imaging , pp. 298-- 301. 12. Chen, S., Sandryhaila, A. & Kovacevic, J. ( 2015) Sampling theory for graph signals. Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on , pp. 3392-- 3396. 13. Chen, S., Varma, R., Sandryhaila, A. & Kovacevic, J. ( 2015) Discrete signal processing on graphs: sampling theory. Signal Processing, IEEE Transactions on , PP(99), 1–1. 14. Chen, S., Varma, R., Singh, A. & Kovacević, J. ( 2015) Signal recovery on graphs: random versus experimentally designed sampling. Sampling Theory and Applications (SampTA), 2015 International Conference on , pp. 337-- 341. 15. Chen, X., Zou, D., Zhao, Q. & Tan, P. ( 2012) Manifold preserving edit propagation. ACM Transactions on Graphics , 31, 132-- 139. 16. Chen, X., Zou, D., Zhao, Q. & Tan, P. ( 2012) Manifold preserving edit propagation. ACM SIGGRAPH Asia , vol. 31. 17. Chen, X., Zou, D., Zhou, S. Z., Zhao, Q. & Tan, P. ( 2013) Image matting with local and nonlocal smooth priors. Proc. Conf. Comput. Vis. Patt. Recog , pp. 1902-- 1907. 18. Chung, F. ( 1997) Spectral Graph Theory , no. 92. Amer Math. Soc. , Providence, RI USA. 19. Dreyer, T., Maar, B. & Schulz, V. ( 2000) Multigrid optimization in applications. J. Comput. Appl. Math. , 128, 67-- 84. Google Scholar CrossRef Search ADS   20. Grady, L. ( 2006) Random walks for image segmentation. IEEE Trans. Patt. Anal. Mach. Intell. , 28, 1768-- 1783. Google Scholar CrossRef Search ADS   21. Hammond, D. K., Vandergheynst, P. & Gribonval, R. ( 2011) Wavelets on graphs via spectral graph theory. Appl. Comput. Harmon. Anal. , 30, 129-- 150. Google Scholar CrossRef Search ADS   22. Korinke, C., Stratmann, T. C., Laue, T. & Boll, S. ( 2016) SuperSelect: an interactive superpixel-based segmentation method for touch displays. Proc. ACM Int. Conf. Multimedia , pp. 717-- 719. 23. Krahmer, F. & Ward, R. ( 2013) Stable and robust sampling strategies for compressive imaging. IEEE Trans. Image Process. , 23, 612-- 622. Google Scholar CrossRef Search ADS PubMed  24. Levin, A., Lischinski, D. & Weiss, Y. ( 2004) Colorization using optimization. ACM Trans. Graph , 23, 689-- 694. Google Scholar CrossRef Search ADS   25. Li, H., Cai, J., Nguyen, T. N. A. & Zheng, J. ( 2013) A benchmark for semantic image segmentation. IEEE Int. Conf. on Multimedia and Expo , pp. 1-- 6. 26. Liesen, L. & Tichy, T. ( 2009) On best approximations of polynomials in matrices in the matrix 2-norm. SIAM J. Matrix Anal. Appl. , 31, 853-- 863. Google Scholar CrossRef Search ADS   27. Lischinski, D., Farbman, Z., Uyttendaele, M. & Szeliski, R. ( 2006) Interactive local adjustment of tonal values. ACM Trans. Graph. , 25, 646-- 653. Google Scholar CrossRef Search ADS   28. Napoli, E. D., Polizzo, E. & Saad, Y. ( 2016) Efficient estimation of eigenvalue counts in an interval. Numer. Linear Alg. Appl. , 23, 674-- 692. Google Scholar CrossRef Search ADS   29. Perraudin, N., Paratte, J., Shuman, D., Kalofolias, V., Vandergheynst, P. & Hammond, D. K. ( 2014) GSPBOX: a toolbox for signal processing on graphs. arXiv:1408.5781. 30. Pesenson, I. ( 2008) Sampling in Paley-Wiener spaces on combinatorial graphs. Trans. Am. Math. Soc. , 360, 5603-- 5627. Google Scholar CrossRef Search ADS   31. Pesenson, I. Z. & Pesenson, M. Z. ( 2010) Sampling, filtering and sparse approximations on combinatorial graphs. J. Fourier Anal. Appl. , 16, 921-- 942. Google Scholar CrossRef Search ADS   32. Puy, G., Tremblay, N., Gribonval, R. & Vandergheynst, P. ( 2016) Random sampling of bandlimited signals on graphs. Appl. Comput. Harmon. Anal.  (in press). 33. Puy, G., Vandergheynst, P. & Wiaux, Y. ( 2011) On variable density compressive sampling. IEEE Signal Process. Lett. , 18, 595-- 598. Google Scholar CrossRef Search ADS   34. Ramasamy, D. & Madhow, U. ( 2015) Compressive spectral embedding: sidestepping the SVD. Advances in Neural Information Processing Systems , 28, pp. 550-- 558. 35. Shuman, D., Narang, S., Frossard, P., Ortega, A. & Vandergheynst, P. ( 2013) The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. Signal Proc. Mag. IEEE , 30, 83-- 98. Google Scholar CrossRef Search ADS   36. Tremblay, N. & Borgnat, P. ( 2016) Subgraph-based filterbanks for graph signals. IEEE Trans. Signal Proc. , PP(99), 1–1. 37. Tremblay, N., Puy, G., Borgnat, P., Gribonval, R. & Vandergheynst, P. ( 2016) Accelerated spectral clustering using graph filtering of random signals. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) . 38. Tremblay, N., Puy, G., Gribonval, R. & Vandergheynst, P. ( 2016) Compressive spectral clustering. Machine Learning, Proceedings of the Thirty-third International Conference (ICML 2016),  June 20–22, New York City, USA. 39. Tropp, J. A. ( 2012) User-friendly tail bounds for sums of random matrices. Found. Comput. Math. , 12, 389-- 434. Google Scholar CrossRef Search ADS   40. Xu, L., Yan, Q. & Jia, J. ( 2013) A sparse control model for image and video editing. ACM Trans. Graph. , 32, 197. Appendix A. Proof of the Theorem 2.2 As done in [32], the proof is obtained by applying the following lemma obtained by Tropp in [39]. Lemma A1 (Theorem 1.1, [39]) Consider a finite sequence {Xj} of independent, random, self-adjoint, positive semi-definite matrices of dimension d × d. Assume that each random matrix satisfies $$ \lambda _{\max }( {\mathsf{X}}_{j}) {\leqslant } R $$ almost surely. Define   \begin{align} \mu_{\min } := \lambda_{\min } \left( \sum_{j} {\mathbb{E}} \, {\mathsf{X}}_{j} \right) \; \textrm{and} \; \mu_{\max } := \lambda_{\max } \left( \sum_{j} {\mathbb{E}} \, {\mathsf{X}}_{j} \right). \end{align} (A.1) Then   \begin{equation} {\mathbb{P}} \left\{ \lambda_{\min } \left( \sum_{j} {\mathsf{X}}_{j} \right) {\leqslant} (1 - \delta) \mu_{\min } \right\} {\leqslant} d \, \left[ \frac{{{\textrm{e}}}^{-\delta}}{(1-\delta)^{1-\delta}}\right]^{\frac{\mu_{\min }}{R}} \textrm{for}\ \delta \in [0, 1],\end{equation} (A.2)  \begin{equation}{\hskip-40pt}\textrm{and}\ \; {\mathbb{P}} \left\{ \lambda_{\max } \left( \sum_{j} {\mathsf{X}}_{j} \right) {\geqslant} (1 + \delta) \mu_{\max } \right\} {\leqslant} d \, \left[ \frac{{{\textrm{e}}}^{\delta}}{(1+\delta)^{1+\delta}}\right]^{\frac{\mu_{\max }}{R}} \textrm{for}\ \delta{\geqslant} 0. \end{equation} (A.3) We will also use the facts that, for all δ ∈ [0, 1],   \begin{align} \left[ \frac{{{\textrm{e}}}^{-\delta}}{(1-\delta)^{1-\delta}}\right]^{\mu_{\min }/R} \; {\leqslant} \; \exp \left( - \frac{\delta^{2} \mu_{\min }}{3 \, R}\right) \textrm{and} \left[ \frac{{{\textrm{e}}}^{\delta}}{(1+\delta)^{1+\delta}}\right]^{\mu_{\max }/R} \; {\leqslant} \; \exp \left( - \frac{\delta^{2} \mu_{\max }}{3 \, R}\right). \end{align} (A.4) Proof of Theorem 2.2 We start by noticing that   \begin{align} {\frac{1}{{s}}} \; {{\mathsf{U}}}_{{k}}^{{{\intercal}}} {{\mathsf{M}}}^{{{\intercal}}} {\mathsf{P}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{U}}}_{{k}} = {\frac{1}{{s}}} \sum_{j = 1}^{{s}} \left({{\mathsf{U}}}_{{k}}^{{{\intercal}}}{{{\mathsf{N}}}^{(\omega_{j})}}^{{{\intercal}}}{\mathsf{P}}^{(\omega_{j})} \right) \left({\mathsf{P}}^{(\omega_{j})} {{\mathsf{N}}}^{(\omega_{l})} {{\mathsf{U}}}_{{k}} \right). \end{align} (A.5) We define   \begin{align} {\mathsf{X}}_{j} := \frac{1}{{s}} \left({{\mathsf{U}}}_{{k}}^{{{\intercal}}}{{{\mathsf{N}}}^{(\omega_{j})}}^{{{\intercal}}}{\mathsf{P}}^{(\omega_{j})} \right) \left({\mathsf{P}}^{(\omega_{j})} {{\mathsf{N}}}^{(\omega_{j})} {{\mathsf{U}}}_{{k}} \right) \quad \textrm{and} \quad{\mathsf{X}} := \sum_{j=1}^{{s}} {\mathsf{X}}_{j} = {\frac{1}{{s}}} \; {{\mathsf{U}}}_{{k}}^{{{\intercal}}} {{\mathsf{M}}}^{{{\intercal}}} {\mathsf{P}}^{2} {{\mathsf{M}}} {{\mathsf{U}}}_{{k}}. \end{align} (A.6) The matrix X is thus a sum of s independent, random, self-adjoint, positive semi-definite matrices. We are in the setting of Lemma 1. We continue by computing $$ {\mathbb{E}} \, {\mathsf{X}}_{j}$$ and $$\lambda _{\max }( {\mathsf{X}}_{j})$$. The expected value of each Xj is   \begin{align} {\mathbb{E}} \, {\mathsf{X}}_{j} & = {\mathbb{E}} \left[ \frac{1}{{s}} \left({{\mathsf{U}}}_{{k}}^{{{\intercal}}}{{{\mathsf{N}}}^{(\omega_{j})}}^{{{\intercal}}}{\mathsf{P}}^{(\omega_{j})} \right) \left({\mathsf{P}}^{(\omega_{j})} {{\mathsf{N}}}^{(\omega_{j})} {{\mathsf{U}}}_{{k}} \right) \right] = {\frac{1}{{s}}} \; {{\mathsf{U}}}_{{k}}^{{{\intercal}}} \left(\sum_{\ell=1}^{{N}} {{\boldsymbol{p}}}_{\ell} \left({{{\mathsf{N}}}^{(\ell)}}^{{{\intercal}}}{\mathsf{P}}^{(\ell)} \right) \left({\mathsf{P}}^{(\ell)} {{\mathsf{N}}}^{(\ell)} \right) \right) {{\mathsf{U}}}_{{k}} \nonumber \\ & = {\frac{1}{{s}}} \; {{\mathsf{U}}}_{{k}}^{{{\intercal}}} \left(\sum_{\ell=1}^{{N}}{{{\mathsf{N}}}^{(\ell)}}^{{{\intercal}}} {{\mathsf{N}}}^{(\ell)} \right) {{\mathsf{U}}}_{{k}} = {\frac{1}{{s}}} \; {{\mathsf{U}}}_{{k}}^{{{\intercal}}} {{\mathsf{U}}}_{{k}} = {\frac{1}{{s}}} \; {\mathsf{I}}. \end{align} (A.7) Therefore, $$ \lambda _{\min } \left ( \sum _{j} {\mathbb{E}} \, {\mathsf{X}}_{j} \right ) = 1 $$ and $$ \lambda _{\max } \left ( \sum _{j} {\mathbb{E}} \, {\mathsf{X}}_{j} \right ) = 1. $$ Furthermore, for all j = 1, …, s, we have   \begin{align} \lambda_{\max } ({\mathsf{X}}_{j}) = {\left\| {\mathsf{X}}_{j}\right\|}_{2} {\leqslant} \max_{1 {\leqslant} \ell{\leqslant} {N}} {\left\| \frac{{\mathsf{P}}^{(\ell)} {{\mathsf{N}}}^{(\ell)} {{\mathsf{U}}}_{{k}}}{{s}}\right\|}_{2}^{2} = \frac{1}{{s}} \; \max_{1 {\leqslant} \ell{\leqslant} {N}} \left\{ \frac{{\left\| {{\mathsf{N}}}^{(\ell)} {{\mathsf{U}}}_{{k}}\right\|}_{2}^{2}}{{{\boldsymbol{p}}}_{\ell}} \right\} = \frac{{\nu}_{{{\boldsymbol{p}}}}^{2}}{{s}}. \end{align} (A.8) Lemma A1 yields, for any δ ∈ (0, 1),   \begin{equation} {\mathbb{P}} \left\{ \lambda_{\min } \left( {\mathsf{X}} \right) {\leqslant} (1 - \delta) \right\} \; {\leqslant} \; {k} \cdot \left[ \frac{{{\textrm{e}}}^{-\delta}}{(1-\delta)^{1-\delta}}\right]^{{s}/{\nu}_{{{\boldsymbol{p}}}}^{2}} \; {\leqslant} \; {k} \; \exp \left( - \frac{\delta^{2} {s}}{3 \, {\nu}_{{{\boldsymbol{p}}}}^{2}} \right)\end{equation} (A.9)  \begin{equation} \textrm{and } \; {\mathbb{P}} \left\{ \lambda_{\max } \left( {\mathsf{X}} \right) {\geqslant} (1 + \delta) \right\} \;\ {\leqslant} \; {k} \cdot \left[ \frac{{{\textrm{e}}}^{\delta}}{(1+\delta)^{1+\delta}}\right]^{{s}/{\nu}_{{{\boldsymbol{p}}}}^{2}} \; {\leqslant} \; {k} \; \exp \left( - \frac{\delta^{2} {s}}{3 \, {\nu}_{{{\boldsymbol{p}}}}^{2}} \right). \end{equation} (A.10) Therefore, for any δ ∈ (0, 1), we have, with probability at least 1 − ξ,   \begin{align} 1 - \delta{\leqslant} \lambda_{\min } \left( {\mathsf{X}} \right) \quad \textrm{and} \quad \lambda_{\max } \left( {\mathsf{X}} \right) {\leqslant} 1+\delta \end{align} (A.11) provided that (2.15) holds. Noticing that (A11) implies that   \begin{align} (1-\delta) {\left\| {\boldsymbol{\alpha}}\right\|}_{2}^{2} {\leqslant} {\frac{1}{{s}}} {\left\| {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{U}}}_{{k}} {\boldsymbol{\alpha}} \right\|}_{2}^{2} {\leqslant} (1+\delta) {\left\| {\boldsymbol{\alpha}}\right\|}_{2}^{2}, \end{align} (A.12) for all $$ {\boldsymbol{\alpha }} \in {\mathbb{R}}^{k}$$, which is equivalent to (2.14) for all x ∈ span(Uk), terminates the proof. Appendix B. Proof of Theorem 4.1 In order to prove Theorem 4.1, we need to establish few properties between the different matrices used in this work. The first useful property is   \begin{align} \widetilde{{\mathsf{P}}} \widetilde{{{\mathsf{M}}}} {{\mathsf{A}}} = \widetilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}}. \end{align} (B.1) Indeed, for any $$ {\boldsymbol{z}} \in {\mathbb{R}}^{ {n}}$$, the jth entry of $$\widetilde{ {\mathsf{P}}} \widetilde{ { {\mathsf{M}}}} { {\mathsf{A}}} {\boldsymbol{z}}$$ is   \begin{align} \left( \widetilde{{\mathsf{P}}} \widetilde{{{\mathsf{M}}}} {{\mathsf{A}}} {\boldsymbol{z}} \right)_{j} = \frac{{\boldsymbol{1}}^{{{\intercal}}}{{\mathsf{N}}}^{(\omega_{j})} {\boldsymbol{z}} }{\left( {{\boldsymbol{p}}}_{\omega_{j}} \, {\left| \mathscr{N}_{\omega_{j}} \right|} \right)^{1/2}}. \end{align} (B.2) Then, the jth entry of $$\widetilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} {\boldsymbol{z}}$$ is the scaled sum of the values in the jth sampled group appearing in PMz, which is $$ { {\boldsymbol{p}}}_{\omega _{j}}^{-1/2} { {\mathsf{N}}}^{(\omega _{j})} {\boldsymbol{z}}$$. From the definition of $$\widetilde{ { {\mathsf{A}}}}$$, the sum is scaled by $$ {\left | \mathscr{N}_{\omega _{j}} \right |}^{-1/2}$$. Therefore, $$( \widetilde{ {\mathsf{P}}} \widetilde{ { {\mathsf{M}}}} { {\mathsf{A}}} {\boldsymbol{z}} )_{j} = ( \widetilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} {\boldsymbol{z}} )_{j}$$ for all j ∈ {1, …, s}, which terminates the proof. The second property is   \begin{align} \widetilde{{{\mathsf{A}}}} \widetilde{{{\mathsf{A}}}}^{{{\intercal}}} = {\mathsf{I}}, \end{align} (B.3) which implies $$\| \widetilde{ { {\mathsf{A}}}} \|_{2} = 1$$. The third property is   \begin{align} \left\| \widetilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{\boldsymbol{z}}} \right\|_{2} = {\left\| {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{\boldsymbol{z}}} \right\|}_{2}, \end{align} (B.4) for all $$\tilde{ {\boldsymbol{z}}} \in {\mathbb{R}}^{ {N}}$$. To prove this property, we remark that $$ {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ {\boldsymbol{z}}} = \widetilde{ { {\mathsf{A}}}}^{ {{\intercal }}} \widetilde{ {\mathsf{P}}} \widetilde{ { {\mathsf{M}}}} \tilde{ {\boldsymbol{z}}}. $$ Indeed, the entries in $$ {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ {\boldsymbol{z}}}$$ corresponding to the first selected group $$\mathscr{N}_{\omega _{1}}$$ are all equal to $$\left ( { {\boldsymbol{p}}}_{\omega _{1}} \, {\left | \mathscr{N}_{\omega _{1}} \right |}\right )^{-1/2} \tilde{ {\boldsymbol{z}}}_{\omega _{1}}$$. There are $$ {\left | \mathscr{N}_{\omega _{1}} \right |}$$ such entries. One can also notice that the first $$ {\left | \mathscr{N}_{\omega _{1}} \right |}$$ entries of $$\widetilde{ { {\mathsf{A}}}}^{ {{\intercal }}} \widetilde{ {\mathsf{P}}} \widetilde{ { {\mathsf{M}}}} \tilde{ {\boldsymbol{z}}}$$ are also equal to $$\left ( { {\boldsymbol{p}}}_{\omega _{j}} \, {\left | \mathscr{N}_{\omega _{1}} \right |}\right )^{-1/2} \tilde{ {\boldsymbol{z}}}_{\omega _{1}}$$. Repeating this reasoning for all the sampled groups proves the equality. On the one side, we thus have $$ {\left \| {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ {\boldsymbol{z}}}\right \|}_{2} = \| \widetilde{ { {\mathsf{A}}}}^{ {{\intercal }}} \widetilde{ {\mathsf{P}}} \widetilde{ { {\mathsf{M}}}} \tilde{ {\boldsymbol{z}}} \|_{2} = \| \widetilde{ {\mathsf{P}}} \widetilde{ { {\mathsf{M}}}} \tilde{ {\boldsymbol{z}}} \|_{2}, $$ where we used (B.3). On the other side, we have $$ \| \widetilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ {\boldsymbol{z}}} \|_{2} = \| \widetilde{ {\mathsf{P}}} \widetilde{ { {\mathsf{M}}}} { {\mathsf{A}}} { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ {\boldsymbol{z}}} \|_{2} = \| \widetilde{ {\mathsf{P}}} \widetilde{ { {\mathsf{M}}}} \tilde{ {\boldsymbol{z}}} \|_{2}, $$ where we used (B.1) and (4.5). This terminates the proof. Proof of Theorem 4.1 As x* is a minimizer of (4.14), we have   \begin{align} {\left\| \widetilde{{\mathsf{P}}}(\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}}^{*} - \tilde{{{\boldsymbol{y}}}})\right\|}_{2}^{2} + {\gamma} \; (\tilde{{{\boldsymbol{x}}}}^{*})^{{{\intercal}}} \, \widetilde{{{\mathsf{L}}}} \, \tilde{{{\boldsymbol{x}}}}^{*} \; {\leqslant} \; {\left\| \widetilde{{\mathsf{P}}}(\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}} - \tilde{{{\boldsymbol{y}}}})\right\|}_{2}^{2} + {\gamma} \; \tilde{{{\boldsymbol{x}}}}^{{{\intercal}}} \, \widetilde{{{\mathsf{L}}}} \, \tilde{{{\boldsymbol{x}}}}. \end{align} (B.5) To prove the theorem, we need to lower and upper bound the left-hand and right-hand sides of (B.5), respectively. We start with the bound involving $$\widetilde{ { {\mathsf{L}}}}$$ and then with the ones involving $$\widetilde{ { {\mathsf{M}}}}$$. [Bounding the terms in (B.5) involving $$\widetilde{ { {\mathsf{L}}}}$$]. We define the matrices   \begin{equation} {\hskip-60pt}\bar{{{\mathsf{U}}}}_{{k}} := \left( {{\boldsymbol{u}}}_{{k}+1}, \ldots, {{\boldsymbol{u}}}_{{n}} \right) \in{\mathbb{R}}^{{n} \times ({n} - {k})}, \end{equation} (B.6)  \begin{equation}{\hskip-40pt}{\mathsf{G}}_{{k}} := {{\textrm{diag}}}\left( g({\lambda}_{1}), \ldots, g({\lambda}_{{k}}) \right) \in{\mathbb{R}}^{{k} \times{k}},\end{equation} (B.7)  \begin{equation}\bar{{\mathsf{G}}}_{{k}} := {{\textrm{diag}}}\left( g({\lambda}_{{k}+1}), \ldots, g({\lambda}_{{n}}) \right) \in{\mathbb{R}}^{({n} - {k}) \times ({n} - {k})}. \end{equation} (B.8) By definition of $$\boldsymbol{\alpha}^{*}$$ and $$\boldsymbol{ \beta}^{*}$$, $$ { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ { {\boldsymbol{x}}}}^{*} = {\boldsymbol{\alpha }}^{*} + {\boldsymbol{\beta }}^{*}$$ with $$\boldsymbol{\alpha}^{*}$$ ∈ span(Uk) and $$ {\boldsymbol{\beta }}^{*} \in {{\textrm{span}}}(\bar{ { {\mathsf{U}}}}_{ {k}})$$. We recall that $$\widetilde{ { {\mathsf{L}}}} = ( { {\mathsf{A}}} { {\mathsf{U}}}) \; g( { {\mathsf{\Lambda }}}) \; ( { {\mathsf{A}}} { {\mathsf{U}}})^{ {{\intercal }}}$$. Therefore, we obtain   \begin{align} (\tilde{{{\boldsymbol{x}}}}^{*})^{{{\intercal}}} \; \widetilde{{{\mathsf{L}}}} \; \tilde{{{\boldsymbol{x}}}}^{*} = \left({{\mathsf{U}}}_{{k}}^{{{\intercal}}} {\boldsymbol{\alpha}}^{*}\right)^{{{\intercal}}} \; {\mathsf{G}}_{{k}} \; \left({{\mathsf{U}}}_{{k}}^{{{\intercal}}} {\boldsymbol{\alpha}}^{*}\right) + \left(\bar{{{\mathsf{U}}}}_{{k}}^{{{\intercal}}} {\boldsymbol{\beta}}^{*}\right)^{{{\intercal}}} \; \bar{{\mathsf{G}}}_{{k}} \; \left(\bar{{{\mathsf{U}}}}_{{k}}^{{{\intercal}}} {\boldsymbol{\beta}}^{*}\right) {\geqslant} g({\lambda}_{{k}+1}) {\left\| {\boldsymbol{\beta}}^{*}\right\|}_{2}^{2}. \end{align} (B.9) In the first step, we used the facts that $$\bar{ { {\mathsf{U}}}}_{ {k}}^{ {{\intercal }}} {\boldsymbol{\alpha }}^{*} = {\boldsymbol{0}}$$ and $$ { {\mathsf{U}}}_{ {k}}^{ {{\intercal }}} {\boldsymbol{\beta }}^{*} = {\boldsymbol{0}}$$. The second step follows from the fact that $$ {\left \| \bar{ { {\mathsf{U}}}}_{ {k}}^{ {{\intercal }}} {\boldsymbol{\beta }}^{*} \right \|}_{2} = {\left \| {\boldsymbol{\beta }}^{*} \right \|}_{2}$$. We also have   \begin{align} \tilde{{{\boldsymbol{x}}}}^{{{\intercal}}} \, \widetilde{{{\mathsf{L}}}} \, \tilde{{{\boldsymbol{x}}}} & = ({{\mathsf{U}}}^{{{\intercal}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}})^{{{\intercal}}} \; g({{\mathsf{\Lambda}}}) \; ({{\mathsf{U}}}^{{{\intercal}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}) \; {\leqslant} \; g({\lambda}_{{k}}) {\left\| {{\mathsf{U}}}_{{k}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}\right\|}_{2}^{2} + g({\lambda}_{{n}}) {\left\| \bar{{{\mathsf{U}}}}_{{k}}^{{{\intercal}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}\right\|}_{2}^{2} \nonumber \\ & {\leqslant} g({\lambda}_{{k}}) {\left\| {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}}\right\|}_{2}^{2} + g({\lambda}_{{n}}) {\left\| \bar{{{\mathsf{U}}}}_{{k}}^{{{\intercal}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}}\right\|}_{2}^{2} \; {\!\leqslant\!} \; g({\lambda}_{{k}}) {\left\| {{\boldsymbol{x}}}\right\|}_{2}^{2} + g({\lambda}_{{n}}) \left[{\left\| \bar{{{\mathsf{U}}}}_{{k}}^{{{\intercal}}} {{\boldsymbol{x}}}\right\|}_{2}^{2} +\! {\left\| \bar{{{\mathsf{U}}}}_{{k}}^{{{\intercal}}} ({{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}} - {{\boldsymbol{x}}})\right\|}_{2}^{2} \right] \nonumber \\ & {\leqslant} g({\lambda}_{{k}}) {\left\| {{\boldsymbol{x}}}\right\|}_{2}^{2} + \varepsilon^{2} g({\lambda}_{{n}}) {\left\| {{\boldsymbol{x}}}\right\|}_{2}^{2}. \end{align} (B.10) The second inequality follows from the facts that $$ {\left \| { {\mathsf{U}}}_{k}\right \|}_{2} = 1$$ and $$\tilde{ { {\boldsymbol{x}}}} = { {\mathsf{A}}} { {\boldsymbol{x}}}$$. To obtain the third inequality, we used $$ {\left \| { {\mathsf{A}}}\right \|}_{2} = 1$$ and the triangle inequality. For the last step, notice that $$ {\left \| \bar{ { {\mathsf{U}}}}_{ {k}}^{ {{\intercal }}} { {\boldsymbol{x}}}\right \|}_{2} = 0$$ (as x ∈span(Uk)), $$ {\left \| \bar{ { {\mathsf{U}}}}_{k}\right \|}_{2} = 1$$ and use (4.6). [Bounding the terms in (B5) involving $$\widetilde{ { {\mathsf{M}}}}$$]. By definition of $$\tilde{ { {\boldsymbol{y}}}}$$, it is immediate that   \begin{align} {\left\| \widetilde{{\mathsf{P}}}(\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}} - \tilde{{{\boldsymbol{y}}}})\right\|}_{2}^{2} = {\left\| \widetilde{{\mathsf{P}}}\tilde{{{\boldsymbol{n}}}}\right\|}_{2}^{2}. \end{align} (B.11) For the other term involving $$\widetilde{ { {\mathsf{M}}}}$$, the triangle inequality yields   \begin{align*} {\left\| \widetilde{{\mathsf{P}}}\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}}^{*} - \widetilde{{\mathsf{P}}}\tilde{{{\boldsymbol{y}}}}\right\|}_{2} & {\geqslant} {\left\| \widetilde{{\mathsf{P}}}\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}}^{*} - \widetilde{{\mathsf{P}}}\widetilde{{{\mathsf{M}}}} {{\mathsf{A}}}{{\boldsymbol{x}}}\right\|}_{2} - {\left\| \widetilde{{\mathsf{P}}} \tilde{{{\boldsymbol{n}}}} \right\|}_{2}. \end{align*} Then, we have  \begin{equation} {\left\| \widetilde{{\mathsf{P}}}\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}}^{*} - \widetilde{{\mathsf{P}}}\widetilde{{{\mathsf{M}}}} {{\mathsf{A}}}{{\boldsymbol{x}}}\right\|}_{2} = {\left\| \widetilde{{\mathsf{P}}}\widetilde{{{\mathsf{M}}}} {{\mathsf{A}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - \widetilde{{\mathsf{P}}}\widetilde{{{\mathsf{M}}}} {{\mathsf{A}}}{{\boldsymbol{x}}}\right\|}_{2} = {\left\| \widetilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - \widetilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\boldsymbol{x}}}\right\|}_{2}\end{equation} (B.12)  \begin{equation}{\hskip128pt}{\geqslant} {\left\| \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}}\right\|}_{2} - {\left\| \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}} - \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\boldsymbol{x}}}\right\|}_{2}. \end{equation} (B.13) The first equality follows from $$ { {\mathsf{A}}} { {\mathsf{A}}}^{ {{\intercal }}} = {\mathsf{I}}$$, the second from (B.1) and the triangle inequality was used in the last step. To summarize, we are at   \begin{align} {\left\| \widetilde{{\mathsf{P}}}\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}}^{*} - \widetilde{{\mathsf{P}}}\tilde{{{\boldsymbol{y}}}}\right\|}_{2} {\geqslant} {\left\| \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}}\right\|}_{2} - {\left\| \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}} - \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\boldsymbol{x}}}\right\|}_{2} - {\left\| \widetilde{{\mathsf{P}}} \tilde{{{\boldsymbol{n}}}} \right\|}_{2}. \end{align} (B.14) We continue by lower bounding $$\| \tilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ { {\boldsymbol{x}}}}^{*} - \tilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} { {\mathsf{A}}} { {\boldsymbol{x}}} \|_{2}$$ and upper bounding $$\| \tilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} { {\mathsf{A}}} { {\boldsymbol{x}}} - \tilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} { {\boldsymbol{x}}} \|_{2}$$ separately. [Lower bound on $$\| \tilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ { {\boldsymbol{x}}}}^{*} - \tilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} { {\mathsf{A}}} { {\boldsymbol{x}}} \|_{2}$$]. Equality (B.4) yields   \begin{align} {\left\| \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}}\right\|}_{2} = {\left\| {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}}\right\|}_{2}. \end{align} (B.15) Using the triangle inequality and the fact that $$ { {\boldsymbol{x}}} \in { {\mathscr{A}}_\varepsilon }$$, we obtain   \begin{align} {\left\| {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}}\right\|}_{2} & {\geqslant} {\left\| {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - {\mathsf{P}} {{\mathsf{M}}} {{\boldsymbol{x}}}\right\|}_{2} - {\left\| {\mathsf{P}} {{\mathsf{M}}} {{\boldsymbol{x}}} - {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}}\right\|}_{2} \nonumber \\ & {\geqslant} {\left\| {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - {\mathsf{P}} {{\mathsf{M}}} {{\boldsymbol{x}}}\right\|}_{2} - {\left\| {\mathsf{P}} {{\mathsf{M}}}\right\|}_{2} {\left\| {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}} - {{\boldsymbol{x}}}\right\|}_{2} \nonumber \\ & {\geqslant} {\left\| {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - {\mathsf{P}} {{\mathsf{M}}} {{\boldsymbol{x}}}\right\|}_{2} - \varepsilon \, M_{\max } {\left\| {{\boldsymbol{x}}} \right\|}_{2}. \end{align} (B.16) The restricted isometry property (2.14) then yields   \begin{align} {\left\| {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - {\mathsf{P}} {{\mathsf{M}}} {{\boldsymbol{x}}}\right\|}_{2} & = {\left\| {\mathsf{P}} {{\mathsf{M}}} \, ({\boldsymbol{\alpha}}^{*} - {{\boldsymbol{x}}}) + {\mathsf{P}} {{\mathsf{M}}} {\boldsymbol{\beta}}^{*}\right\|}_{2} {\geqslant} {\left\| {\mathsf{P}} {{\mathsf{M}}} ({\boldsymbol{\alpha}}^{*} - {{\boldsymbol{x}}})\right\|}_{2} - {\left\| {\mathsf{P}}{{\mathsf{M}}} {\boldsymbol{\beta}}^{*}\right\|}_{2} \nonumber \\ & {\geqslant} \sqrt{{s} (1-\delta)} \; {\left\| {\boldsymbol{\alpha}}^{*} - {{\boldsymbol{x}}} \right\|}_{2} - M_{\max } {\left\| {\boldsymbol{\beta}}^{*}\right\|}_{2}. \end{align} (B.17) We used the equality $$ { {\mathsf{A}}}^{ {{\intercal }}} \tilde{ { {\boldsymbol{x}}}}^{*} = {\boldsymbol{\alpha }}^{*} + {\boldsymbol{\beta }}^{*}$$. We thus proved that   \begin{align} \| \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} \tilde{{{\boldsymbol{x}}}}^{*} - \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}} \|_{2} {\geqslant} \sqrt{{s} (1-\delta)} \; {\left\| {\boldsymbol{\alpha}}^{*} - {{\boldsymbol{x}}} \right\|}_{2} - M_{\max } {\left\| {\boldsymbol{\beta}}^{*}\right\|}_{2} - \varepsilon M_{\max } {\left\| {{\boldsymbol{x}}} \right\|}_{2}. \end{align} (B.18) [Upper bounding $$\| \tilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} { {\mathsf{A}}}^{ {{\intercal }}} { {\mathsf{A}}} { {\boldsymbol{x}}} - \tilde{ { {\mathsf{A}}}} {\mathsf{P}} { {\mathsf{M}}} { {\boldsymbol{x}}} \|_{2}$$]. We have   \begin{align} {\left\| \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}} - \tilde{{{\mathsf{A}}}} {\mathsf{P}} {{\mathsf{M}}} {{\boldsymbol{x}}}\right\|}_{2} {\leqslant} \| \tilde{{{\mathsf{A}}}} \|_{2} {\left\| {\mathsf{P}} {{\mathsf{M}}}\right\|}_{2} {\left\| {{\mathsf{A}}}^{{{\intercal}}} {{\mathsf{A}}} {{\boldsymbol{x}}} - {{\boldsymbol{x}}} \right\|}_{2} {\leqslant} \varepsilon \, M_{\max } {\left\| {{\boldsymbol{x}}} \right\|}_{2}. \end{align} (B.19) We used the facts that $$\| \tilde{ { {\mathsf{A}}}} \|_{2} = 1$$ (see (B.3)) and that $$ { {\boldsymbol{x}}} \in { {\mathscr{A}}_\varepsilon }$$ in the last inequality. Using the inequalities (B.18) and (B.19) in (B.14), we arrive at   \begin{align} {\left\| \widetilde{{\mathsf{P}}}\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}}^{*} - \widetilde{{\mathsf{P}}}\tilde{{{\boldsymbol{y}}}}\right\|}_{2} {\geqslant} \sqrt{{s} (1-\delta)} \; {\left\| {\boldsymbol{\alpha}}^{*} - {{\boldsymbol{x}}} \right\|}_{2} - M_{\max } {\left\| {\boldsymbol{\beta}}^{*}\right\|}_{2} - 2\varepsilon M_{\max } {\left\| {{\boldsymbol{x}}} \right\|}_{2} - {\left\| \widetilde{{\mathsf{P}}} \tilde{{{\boldsymbol{n}}}} \right\|}_{2}. \end{align} (B.20) [Finishing the proof]. Remark that (B.5) implies   \begin{equation} {\left\| \widetilde{{\mathsf{P}}}(\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}}^{*} - \tilde{{{\boldsymbol{y}}}})\right\|}_{2}^{2} {\leqslant} {\left\| \widetilde{{\mathsf{P}}}(\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}} - \tilde{{{\boldsymbol{y}}}})\right\|}_{2}^{2} + {\gamma} \; \tilde{{{\boldsymbol{x}}}}^{{{\intercal}}} \, \widetilde{{{\mathsf{L}}}} \, \tilde{{{\boldsymbol{x}}}},\end{equation} (B.21)  \begin{equation}{\hskip-5pt}\textrm{and} \ \ {\gamma} \; (\tilde{{{\boldsymbol{x}}}}^{*})^{{{\intercal}}} \, \widetilde{{{\mathsf{L}}}} \, \tilde{{{\boldsymbol{x}}}}^{*} {\leqslant} {\left\| \widetilde{{\mathsf{P}}}(\widetilde{{{\mathsf{M}}}} \tilde{{{\boldsymbol{x}}}} - \tilde{{{\boldsymbol{y}}}})\right\|}_{2}^{2} + {\gamma} \; \tilde{{{\boldsymbol{x}}}}^{{{\intercal}}} \, \widetilde{{{\mathsf{L}}}} \, \tilde{{{\boldsymbol{x}}}}. \end{equation} (B.22) Using (B.9), (B.10), (B.11) in (B.22), we obtain   \begin{align} {\gamma} \; g({\lambda}_{{k}+1}) {\left\| {\boldsymbol{\beta}}^{*}\right\|}_{2}^{2} {\leqslant} \; {\left\| \widetilde{{\mathsf{P}}} {{\boldsymbol{n}}} \right\|}_{2}^{2} + \; {\gamma} \; g({\lambda}_{{k}}) {\left\| {{\boldsymbol{x}}}\right\|}_{2}^{2} + \varepsilon^{2} \; {\gamma} \; g({\lambda}_{{n}}) {\left\| {{\boldsymbol{x}}}\right\|}_{2}^{2}, \end{align} (B.23) which implies (4.18) in Theorem 4.1. It remains to prove (4.17) to finish the proof. Using (B.10), (B.11) and (B.20) in (B.21), we obtain   \begin{align} \sqrt{{s} (1-\delta)} {\left\| {\boldsymbol{\alpha}}^{*} - {{\boldsymbol{x}}} \right\|}_{2} {\leqslant} 2 {\left\| \widetilde{{\mathsf{P}}} {{\boldsymbol{n}}} \right\|}_{2} + M_{\max } {\left\| {\boldsymbol{\beta}}^{*}\right\|}_{2} + (\sqrt{{\gamma} g({\lambda}_{{k}})} + \varepsilon \sqrt{{\gamma} g({\lambda}_{{n}})} + 2 \varepsilon M_{\max }) {\left\| {{\boldsymbol{x}}} \right\|}_{2}. \end{align} (B.24) Using (4.18) to bound $$ {\left \| {\boldsymbol{\beta }}^{*}\right \|}_{2}$$ on the right-hand side, we have   \begin{align} \sqrt{{s} (1-\delta)} {\left\| {\boldsymbol{\alpha}}^{*} - {{\boldsymbol{x}}} \right\|}_{2} \; {\leqslant} &\; 2 {\left\| \widetilde{{\mathsf{P}}} {{\boldsymbol{n}}} \right\|}_{2} + \frac{M_{\max }}{\sqrt{{\gamma} g({\lambda}_{{k}+1})}} {\left\| \widetilde{{\mathsf{P}}} {{\boldsymbol{n}}}\right\|}_{2} + M_{\max } \sqrt{\frac{g({\lambda}_{{k}})}{g({\lambda}_{{k}+1})}} {\left\| {{\boldsymbol{x}}} \right\|}_{2} \nonumber \\ & + \varepsilon \; M_{\max } \sqrt{\frac{g({\lambda}_{{n}})}{g({\lambda}_{{k}+1})}} {\left\| {{\boldsymbol{x}}} \right\|}_{2} + \left(\sqrt{{\gamma} g({\lambda}_{{k}})} + \varepsilon \sqrt{{\gamma} g({\lambda}_{{n}})} + 2 \varepsilon M_{\max } \right) {\left\| {{\boldsymbol{x}}} \right\|}_{2}\end{align} (B.25)  \begin{align}{\hskip75pt}=& \left( 2 + \frac{M_{\max }}{\sqrt{{\gamma} g({\lambda}_{{k}+1})}}\right){\left\| \widetilde{{\mathsf{P}}} {{\boldsymbol{n}}}\right\|}_{2} + \left( M_{\max } \sqrt{\frac{g({\lambda}_{{k}})}{g({\lambda}_{{k}+1})}} + \sqrt{{\gamma} g({\lambda}_{{k}})}\right) {\left\| {{\boldsymbol{x}}} \right\|}_{2} \nonumber \\ & + \varepsilon \left( 2 M_{\max } + M_{\max } \sqrt{\frac{g({\lambda}_{{n}})}{g({\lambda}_{{k}+1})}} + \sqrt{{\gamma} g({\lambda}_{{n}})} \right) {\left\| {{\boldsymbol{x}}} \right\|}_{2}. \end{align} (B.26) We only rearranged the term in the last step. This proves (4.17) and terminates the proof. © The Author(s) 2018. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.

Journal

Information and Inference: A Journal of the IMAOxford University Press

Published: Feb 6, 2018

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off