A convergent relaxation of the Douglas–Rachford algorithm

A convergent relaxation of the Douglas–Rachford algorithm Comput Optim Appl (2018) 70:841–863 https://doi.org/10.1007/s10589-018-9989-y A convergent relaxation of the Douglas–Rachford algorithm 1,2 Nguyen Hieu Thao Received: 16 September 2017 / Published online: 6 March 2018 © The Author(s) 2018. This article is an open access publication Abstract This paper proposes an algorithm for solving structured optimization problems, which covers both the backward–backward and the Douglas–Rachford algo- rithms as special cases, and analyzes its convergence. The set of fixed points of the corresponding operator is characterized in several cases. Convergence criteria of the algorithm in terms of general fixed point iterations are established. When applied to nonconvex feasibility including potentially inconsistent problems, we prove local lin- ear convergence results under mild assumptions on regularity of individual sets and of the collection of sets. In this special case, we refine known linear convergence criteria for the Douglas–Rachford (DR) algorithm. As a consequence, for feasibility problem with one of the sets being affine, we establish criteria for linear and sublinear con- vergence of convex combinations of the alternating projection and the DR methods. These results seem to be new. We also demonstrate the seemingly improved numerical performance of this algorithm compared to the RAAR algorithm for both consistent and inconsistent sparse feasibility problems. This paper is dedicated to Professor Alexander Kruger on his 65th birthday. The research leading to these results has received funding from the German-Israeli Foundation Grant G-1253-304.6 and the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013)/ERC Grant Agreement No. 339681. Nguyen Hieu Thao h.t.nguyen-3@tudelft.nl; hieuthao.ctu@gmail.com Delft Center for Systems and Control, Delft University of Technology, 2628CD Delft, The Netherlands Department of Mathematics, School of Education, Can Tho University, Can Tho, Vietnam 123 842 N. H. Thao Keywords Almost averagedness · Picard iteration · Alternating projection method · Douglas–Rachford method · RAAR algorithm · Krasnoselski–Mann relaxation · Metric subregularity · Transversality · Collection of sets Mathematics Subject Classification Primary 49J53 · 65K10; Secondary 49K40 · 49M05 · 49M27 · 65K05 · 90C26 1 Introduction Convergence analysis has been one of the central and very active applications of variational analysis and mathematical optimization. Examples of recent contributions to the theory of the field that have initiated efficient programs of analysis are [1,2,7,38]. It is the common recipe emphasized in these and many other works that there are two key ingredients required in order to derive convergence of a numerical method (1) regularity of individual functions or sets such as convexity and averaging property, and (2) regularity of collections of functions or sets at their critical points such as transversality, Kurdyka-Łojasiewicz property and metric subregularity. As a result, the question about convergence of a solving method can often be reduced to checking whether certain regularity properties of the problem data are satisfied. There have been a considerable number of papers studying these two ingredients of convergence analysis in order to establish sharper convergence criteria in various circumstances, especially those applicable to algorithms for solving nonconvex problems [5,12,13, 19,26,27,31–33,38,42,45]. This paper suggests an algorithm called T , which covers both the backward- backward and the DR algorithms as special cases of choosing the parameter λ ∈[0, 1], and analyzes its convergence. When applied to feasibility problem for two sets one of which is affine, T is a convex combination of the alternating projection and the DR methods. On the other hand, T can be viewed as a relaxation of the DR algorithm. Motivation for relaxing the DR algorithm comes from the lack of stability of this algo- rithm when applied to inconsistent problems. This phenomenon has been observed for the Fourier phase retrieval problem which is essentially inconsistent due to the recip- rocal relationship between the spatial and frequency variables of the Fourier transform [35,36]. To address this issue, a relaxation of the DR algorithm, often known as the RAAR algorithm, was proposed and applied to phase retrieval problems by Luke in the aforementioned papers. In the framework of feasibility, the RAAR algorithm is described as a convex combination of the basic DR operator and one of the projectors. Our preliminary numerical experiments have revealed a promising performance of algorithm T in comparison with the RAAR method. This observation has motivated the study of convergence analysis of algorithm T in this paper. After introducing the notation and proving preliminary results in Sect. 2,weintro- duce T as a general fixed point operator, characterize the set of fixed points of T λ λ (Proposition 1), and establish abstract convergence criteria for iterations generated by T (Theorem 2) in Sect. 3. We discuss algorithm T in the framework of feasi- λ λ bility problems in Sect. 4. The set of fixed points of T is characterized for convex inconsistent feasibility (Proposition 3). For consistent feasibility we show that almost 123 A convergent relaxation of the Douglas–Rachford algorithm 843 averagedness of T (Proposition 4) and metric subregularity of T − Id (Lemma 3) λ λ can be obtained from regular properties of the individual sets and of the collection of sets, respectively. As a result, the two regularity notions are combined to yield local linear convergence of iterations generated by T (Theorem 4). Section 5 is devoted to demonstrate the improved numerical performance of algorithm T compared to the RAAR algorithm for both consistent and inconsistent feasibility problems. In this section, we study the feasibility approach for solving the sparse optimization problem. Our linear convergence result established in Sect. 4 for iterations generated by T is also illustrated in this application (Theorem 5). 2 Notation and preliminary results Our notation is standard, c.f. [11,40,46]. The setting throughout this paper is a finite dimensional Euclidean space E. The norm · denotes the Euclidean norm. The open unit ball in a Euclidean space is denoted B, and B (x ) stands for the open ball with radius δ> 0 and center x. The distance to a set A ⊂ E with respect to the bivariate function dist (·, ·) is defined by dist (·, A) : E → R : x → inf dist (x , y). y∈ A We use the convention that the distance to the empty set is +∞. The set-valued mapping { | } P : E ⇒ E : x → y ∈ A dist (x , y) = dist (x , A) is the projector on A. An element y ∈ P (x ) is called a projection. This exists for anyclosedset A ⊂ E. Note that the projector is not, in general, single-valued. Closely related to the projector is the prox mapping corresponding to a function f and a stepsize τ> 0[41] 1 2 prox (x ) := argmin f ( y) +  y − x  . τ, f y∈E 2τ When f = ι is the indicator function of A, that is ι (x ) = 0if x ∈ A and ι (x ) = A A A −1 +∞ otherwise, then prox = P for all τ> 0. The inverse of the projector, P , τ,ι A A is defined by −1 P (a) := {x ∈ E | a ∈ P (x ) } . The proximal normal cone to A at x¯ is the set, which need not be either closed or convex, prox −1 N (x¯ ) := cone P (x¯) −¯ x . (1) A A prox If x¯ ∈ / A, then N (x¯) is defined to be empty. Normal cones are central to charac- terizations both of the regularity of individual sets and of the regularity of collections of sets. For a refined numerical analysis of projection methods, one also defines the Λ-proximal normal cone to A at x¯ by prox −1 N (x¯) := cone P (x¯ ) ∩ Λ −¯ x . A|Λ A When Λ = E, it coincides with the proximal normal cone (1). 123 844 N. H. Thao For ε ≥ 0 and δ> 0, a set A is (ε, δ)-regular relative to Λ at x¯ ∈ A [13, Definition prox 2.9] if for all x ∈ B (x¯ ), a ∈ A ∩ B (x¯ ) and v ∈ N (a), δ δ A|Λ x − a,v ≤ ε x − av . When Λ = E, the quantifier “relative to” is dropped. For a set-valued operator T : E ⇒ E, its fixed point set is defined by Fix T := {x ∈ E | x ∈ Tx }. For a number λ ∈[0, 1], we denote the λ-reflector of T by R := T ,λ (1 + λ)T − λ Id. A frequently used example in this paper corresponds to T being a projector. In the context of convergence analysis of Picard iterations, the following general- ization of the Fejér monotonicity of sequences appears frequently, see, for example, the book [4] or the paper [39] for the terminology. Definition 1 (Linear monotonicity) The sequence (x ) is linearly monotone with respect to a set S ⊂ E with rate c ∈[0, 1] if dist (x , S) ≤ c dist (x , S) ∀k ∈ N. k+1 k Our analysis follows the abstract analysis program proposed in [38] which requires the two key components of the convergence: almost averagedness and metric subreg- ularity. Definition 2 (Almost nonexpansive/averaging mappings)[38]Let T : E ⇒ E and U ⊂ E. (i) T is pointwise almost nonexpansive at y on U with violation ε ≥ 0 if for all x ∈ U, + + x ∈ Tx and y ∈ Ty, + + x − y ≤ 1 + ε x − y . (ii) T is pointwise almost averaging at y on U with violation ε ≥ 0 and averaging + + constant α> 0 if for all x ∈ U, x ∈ Tx and y ∈ Ty, 1 − α 2 2 + + 2 + + x − y ≤ (1 + ε) x − y − (x − x ) − ( y − y) . (2) When a property holds at all y ∈ U on U, we simply say that the property holds on U. From Definition 2, almost nonexpansiveness is actually the almost averaging prop- erty with the same violation and averaging constant α = 1. Remark 1 (the range of quantitative constants) In the context of Definition 2,itis natural to consider violation ε ≥ 0 and averaging constant α ∈ (0, 1]. Mathematically, it also makes sense to consider ε< 0 and α> 1 provided that the required estimate (2) holds true. Simple examples for the later case are linear contraction mappings. In this paper, averaging constant α> 1 will frequently be involved implicitly in intermediate 123 A convergent relaxation of the Douglas–Rachford algorithm 845 steps of our analysis without any contradiction or confusion. This is the reason why in Definition 2 (ii) we considered α> 0 instead of α ∈ (0, 1] as in [38, Definition 2.2]. It is worth noting that if the iteration x ∈ Tx is linearly monotone with respect k+1 k to Fix T with rate c ∈ (0, 1) and T is almost averaging on some neighborhood of Fix T with averaging constant α ∈ (0, 1], then (x ) converges R-linearly to a fixed point of T [39, Proposition 3.5]. We next prove a fundamental preliminary result for our analysis regarding almost averaging mappings. Lemma 1 Let T : E ⇒ E,U ⊂ E, λ ∈[0, 1], ε ≥ 0 and α> 0. The following two statements are equivalent. (i) T is almost averaging on U with violation ε and averaging constant α. (ii) The λ-reflector of T , R = (1 + λ)T − λ Id, is almost averaging on U with T ,λ violation (1 + λ)ε and averaging constant (1 + λ)α. + + + Proof Take any x , y ∈ U, x ∈ Tx, y ∈ Ty, x˜ = (1 + λ)x − λx ∈ R x and T ,λ y ˜ = (1 + λ) y − λy ∈ R y. We have by definition of R and [4, Corollary 2.14] T ,λ T ,λ that 2 + + x˜ −˜ y = (1 + λ)(x − y ) − λ(x − y) 2 2 + + 2 + + = (1 + λ) x − y − λ x − y + λ(1 + λ) (x − x ) − ( y − y) . (3) We also note that + + (x˜ − x ) − ( y ˜ − y) = (1 + λ) (x − x ) − ( y − y) . (4) (i) ⇒ (ii). Suppose that T is almost averaging on U with violation ε and averaging constant α. Substituting (2)into(3) and using (4), we obtain that x˜ −˜ y 1 − α 2 + + ≤ (1 + (1 + λ)ε) x − y − (1 + λ) − λ (x − x ) − ( y − y) 1−α − λ 2 α 2 = (1 + (1 + λ)ε) x − y − (x˜ − x ) − ( y ˜ − y) 1 + λ 1 − (1 + λ)α 2 2 = (1 + (1 + λ)ε) x − y − (x˜ − x ) − ( y ˜ − y) , (5) (1 + λ)α which means that R is almost averaging on U with violation (1 + λ)ε and averaging T ,λ constant (1 + λ)α. (ii) ⇒ (i). Suppose that R is almost averaging on U with violation (1 + λ)ε and T ,λ averaging constant (1 + λ)α, that is, the inequality (5) is satisfied. Substituting (3)into 123 846 N. H. Thao (5) and using (4), we obtain 2 2 + + 2 + + (1 + λ) x − y − λ x − y + λ(1 + λ) (x − x ) − ( y − y) 1 − α 2 + + ≤ (1 + (1 + λ)ε) x − y − (1 + λ) − λ (x − x ) − ( y − y) . Equivalently, 1 − α 2 2 + + 2 + + x − y ≤ (1 + ε) x − y − (x − x ) − ( y − y) . Hence T is almost averaging on U with violation ε and averaging constant α and the proof is complete. Lemma 1 generalizes [13, Lemma 2.4] where the result was proved for α = 1/2 and λ = 1. The next lemma recalls facts regarding the almost averagedness of projectors and reflectors associated with regular sets. Lemma 2 Let A ⊂ E be closed and (ε, δ)-regular at x¯ ∈ A and define U := {x ∈ E | P x ⊂ B (x¯ )}. A δ (i) The projector P is pointwise almost nonexpansive on U at every point z ∈ A ∩ B (x¯ ) with violation 2ε + ε . (ii) The projector P is pointwise almost averaging on U at every point z ∈ A ∩ B (x¯ ) A δ with violation 2ε + 2ε and averaging constant 1/2. (iii) The λ-reflector R is pointwise almost averaging on U at every point z ∈ P ,λ 2 1+λ A ∩ B (x¯ ) with violation (1 + λ)(2ε + 2ε ) and averaging constant . Proof Statements (i) and (ii) can be found in [13, Theorem 2.14] or [38, Theorem 3.1 (i) & (iii)]. Statement (iii) follows from (ii) and Lemma 1 applied to T = P and α = 1/2. The following concept of metric subregularity with functional modulus has played a central role, explicitly or implicitly, in the convergence analysis of Picard iterations [1,13,38,39]. Recall that a function μ :[0, ∞) →[0, ∞) is a gauge function if μ is continuous and strictly increasing and μ(0) = 0. Definition 3 (Metric subregularity with functional modulus) A mapping F : E ⇒ E is metrically subregular with gauge μ on U ⊂ E for y relative to Λ ⊂ E if −1 μ dist x , F ( y) ∩ Λ ≤ dist ( y, F (x )) ∀x ∈ U ∩ Λ. When μ is a linear function, that is μ(t ) = κ t, ∀t ∈[0, ∞), one says “with constant κ” instead of “with gauge μ = κ Id”. When Λ = E, the quantifier “relative to” is dropped. 123 A convergent relaxation of the Douglas–Rachford algorithm 847 Metric subregularity has many important applications in variational analysis and mathematical optimization, see the monographs and papers [11,15–18,20,21,25,40, 44]. For the discussion of metric subregularity in connection with subtransversality of collections of sets, we refer the reader to [23,24,29,30]. The next theorem serves as the basic template for the quantitative convergence analysis of fixed point iterations. By the notation T : Λ ⇒ Λ where Λ is a subset of E, we mean that T : E ⇒ E and Tx ⊂ Λ for all x ∈ Λ. This simplification of notation should not lead to any confusion if one keeps in mind that there may exist fixed points of T that are not in Λ. For the importance of the use of Λ in isolating the desirable fixed point, we refer the reader to [1, Example 1.8]. In the following, ri Λ denotes the relative interior of Λ. Theorem 1 [38, Theorem 2.1] Let T : Λ ⇒ Λ for Λ ⊂ E and let S ⊂ ri Λ be closed and nonempty such that T y ⊂ Fix T ∩ S for all y ∈ S. Let O be a neighborhood of S such that O ∩ Λ ⊂ ri Λ. Suppose that (a) T is pointwise almost averaging at all points y ∈ S with violation ε and averaging constant α ∈ (0, 1) on O ∩ Λ, and (b) there exists a neighborhood V of Fix T ∩ S and a constant κ> 0 such that for all + + y ∈ S, y ∈ T y and all x ∈ T x the estimate + + κ dist (x , S) ≤ x − x − y − y (6) holds whenever x ∈ (O ∩ Λ) \ (V ∩ Λ). Then for all x ∈ Tx (1 − α)κ dist x , Fix T ∩ S ≤ 1 + ε − dist (x , S) whenever x ∈ (O ∩ Λ) \ (V ∩ Λ). εα In particular, if κ> , then for any initial point x ∈ O ∩ Λ the iteration 1−α x ∈ Tx satisfies k+1 k dist (x , Fix T ∩ S) ≤ c dist (x , S) k+1 0 (1−α)κ with c := 1 + ε − < 1 for all k such that x ∈ (O ∩ Λ) \ (V ∩ Λ) for j = 1, 2,..., k. Remark 2 [38, p. 13] In the case of S = Fix T condition (6) reduces to metric sub- regularity of the mapping F := T − Id for 0 on the annular set (O ∩ Λ) \ (V ∩ Λ), that is −1 κ dist (x , F (0)) ≤ dist (0, F (x )) ∀x ∈ (O ∩ Λ) \ (V ∩ Λ) . εα The inequality κ> then states that the constant of metric subregularity κ is 1−α sufficiently large relative to the violation of the averaging property of T to guarantee linear progression of the iterates through that annular region. 123 848 N. H. Thao For a comprehensive discussion on the roles of S and Λ in the analysis program of Theorem 1, we would like to refer the reader to the paper [38]. For the sake of simplification in terms of presentation, we have chosen to reduce the number of technical constants appearing in the analysis. It would be obviously analogous to formulate more theoretically general results by using more technical constants in appropriate places. 3 T as a fixed point operator We consider the problem of finding a fixed point of the operator T := T ((1 + λ)T − λ Id) − λ (T − Id) , (7) λ 1 2 2 where λ ∈[0, 1] and T : E ⇒ E (i = 1, 2) are assumed to be easily computed. Examples of T include the backward-backward and the DR algorithms [8,10,34, 36,43] for solving the structured optimization problem minimize f (x ) + f (x ) 1 2 x ∈E under different assumptions on the functions f (i = 1, 2). Indeed, when T are the i i prox mappings of f with parameters τ > 0, then T with λ = 0 and 1 takes the form i i λ T = prox ◦ prox , and T = prox 2prox − Id − prox + Id, λ λ τ , f τ , f τ , f τ , f τ , f 1 1 2 2 1 1 2 2 2 2 respectively. We first characterize the set of fixed points of T via those of the constituent oper- ators T (i = 1, 2). Proposition 1 Let T , T : E ⇒ E, λ ∈[0, 1] and consider T defined at (7). The 1 2 λ following statements hold true. (i) (1 + λ)T − λ Id = ((1 + λ)T − λ Id) ◦ ((1 + λ)T − λ Id). λ 1 2 As a consequence, Fix T = Fix ((1 + λ)T − λ Id) ◦ ((1 + λ)T − λ Id) . λ 1 2 (ii) Suppose that T = P is the projector on an affine set A and T is single-valued. 1 A 2 Then Fix T ={x ∈ E | P x = λT x + (1 − λ)x } λ A 2 ⊂{x ∈ E | P x = P T x }. (8) A A 2 Proof (i). We have by the construction of T that (1 + λ)T − λ Id = (1 + λ) (T ((1 + λ)T − λ Id) − λ(T − Id)) − λ Id λ 1 2 2 = (1 + λ)T ((1 + λ)T − λ Id) − λ [(1 + λ)T − λ Id] 1 2 2 = ((1 + λ)T − λ Id) ◦ ((1 + λ)T − λ Id) . 1 2 123 A convergent relaxation of the Douglas–Rachford algorithm 849 (ii). We first take an arbitrary x ∈ Fix T and prove that P x = P T x = λT x + (1 − λ)x . A A 2 2 Indeed, from x = T x, we get x = P ((1 + λ)T x − λx ) − λ(T x − x ) A 2 2 ⇔ λT x + (1 − λ)x = P ((1 + λ)T x − λx ) . (9) 2 A 2 In particular, λT x + (1 − λ)x ∈ A. Thus by equality (9) and the assumption that P 2 A is affine, we have P (λT x + (1 − λ)x ) = P ((1 + λ)T x − λx ) A 2 A 2 ⇔ λ P T x + (1 − λ) P x = (1 + λ) P T x − λ P x A 2 A A 2 A ⇔ P x = P T x . (10) A A 2 Substituting (10)into(9) also yields λT x + (1 − λ)x = (1 + λ) P T x − λ P x 2 A 2 A = (1 + λ) P x − λ P x = P x . A A A Finally, let us take an arbitrary x satisfying P x = λT x + (1 − λ)x and prove A 2 that x ∈ Fix T . Indeed, we note that λT x + (1 − λ)x ∈ A. Since P is affine, one λ 2 A can easily check (10) and then (9), which is equivalent to x ∈ Fix T . The proof is complete. The inclusion (8) in Proposition 1 can be strict as shown in the next example. 2 2 Example 1 Let us consider E = R ,the set A = (x , x ) ∈ R | x = 0 and the 1 2 1 1 2 two operators T = P and T x = x (∀x ∈ R ). Then for any point x = (x , 0) 1 A 2 1 with x = 0, we have P x = P T x = (0, 0) but P x = (0, 0) = (1 − λ/2)x = 1 A A 2 A λT x + (1 − λ)x, that is x ∈ / Fix T . 2 λ The next proposition shows that the almost averagedness of T naturally inherits from that of T and T via Krasnoselski–Mann relaxations. 1 2 Proposition 2 (Almost averagedness of T ) Let λ ∈[0, 1],T be almost averaging on λ i U ⊂ E with violation ε ≥ 0 and averaging constant α > 0 (i = 1, 2) and define i i i the set U := {x ∈ U | R x ⊂ U }. 2 T ,λ 1 Then T is almost averaging on U with violation ε = ε + ε + (1 + λ)ε ε and λ 1 2 1 2 2 max{α ,α } 1 2 averaging constant α = . 1+(1+λ) max{α ,α } 1 2 123 850 N. H. Thao Proof By the implication (i) ⇒ (ii) of Lemma 1, the operators R = (1+λ)T −λ Id T ,λ i are almost averaging on U with violation (1 + λ)ε and averaging constant (1 + λ)α i i i (i = 1, 2). Then thanks to [38, Proposition 2.4 (iii)], the operator T := R R is T ,λ T ,λ 1 2 almost averaging on U with violation (1 + λ) (ε + ε + (1 + λ)ε ε ) and averaging 1 2 1 2 2(1+λ) max{α ,α } 1 2 constant . Note that T = (1 + λ)T − λ Id by Proposition 1.We 1+(1+λ) max{α ,α } 1 2 have by the implication (ii) ⇒ (i) of Lemma 1 that T is almost averaging on U with 2 max{α ,α } 1 2 violation ε = ε + ε + (1 + λ)ε ε and averaging constant α = as 1 2 1 2 1+(1+λ) max{α ,α } 1 2 claimed. We next discuss convergence of T based on the abstract results established in [38]. Our agenda is to verify the assumptions of Theorem 1. To simplify the exposure in terms of presentation, we have chosen to state the results corresponding to S = Fix T and Λ = E in Theorem 1. In the sequel, we will denote, for a nonnegative real ρ, S := Fix T + ρB. ρ λ Theorem 2 (Convergence of algorithm T with metric subregularity) Let T be defined λ λ at (7), δ> 0 and γ ∈ (0, 1). Suppose that for each n ∈ N, the following conditions are satisfied. (i) T is almost averaging on S with violation ε ≥ 0 and averaging constant 2 γ δ 2,n n n α ∈ (0, 1), and T is almost averaging on the set S ∪ R S with 2,n 1 γ δ T ,λ γ δ violation ε ≥ 0 and averaging constant α ∈ (0, 1). 1,n 1,n (ii) The mapping T − Id is metrically subregular on D := S \ S n+1 for 0 with λ n γ δ γ δ gauge μ satisfying μ (dist (x , Fix T )) α ε n λ n n inf ≥ κ > , (11) x ∈ D dist (x , Fix T ) 1 − α n λ n 2 max{α ,α } 1,n 2,n where ε := ε + ε + (1 + λ)ε ε and α := . n 1,n 2,n 1,n 2,n n 1+(1+λ) max{α ,α } 1,n 2,n Then all iterations x ∈ T x starting in S satisfy k+1 λ k δ dist (x , Fix T ) →0(12) k λ and dist (x , Fix T ) ≤ c dist (x , Fix T ) ∀x ∈ D , (13) k+1 λ n k λ k n (1−α )κ where c := 1 + ε − < 1. n n (1−α )κ In particular, if − ε is bounded from below by some τ> 0 for all n sufficiently large, then the convergence (12) is R-linear with rate at most 1 − τ . Proof For each n ∈ N, we verify the assumptions of Theorem 1 for O = S , γ δ V = S and D = O \ V = S n \ S . Under assumption (i) of Theorem n+1 n+1 n γ δ γ δ γ δ 2, Proposition 2 ensures that T is almost averaging on S n with violation ε and λ γ δ n averaging constant α . In other words, condition (a) of Theorem 1 is satisfied with 123 A convergent relaxation of the Douglas–Rachford algorithm 851 ε = ε and α = α . Assumption (ii) of Theorem 2 also fulfills condition (b) of n n Theorem 1 with κ = κ in view of Remark 2. Theorem 1 then yields the conclusion of Theorem 2 after a straightforward care of the involving quantitative constants. The first inequality in (11) essentially says that the gauge function μ can be bounded from below by a linear function on the reference interval. Remark 3 In Theorem 2, the fundamental goal of formulating assumption (i) on the set S and assumption (ii) on the set D is that one can characterize sublinear γ δ n convergence of an iteration on S via linear progression of its iterates through each of the annular set D . This idea is based on the fact that for larger n, the almost averaging property of T on S is always improved but the metric subregularity on D may λ γ δ n get worse, however, if the corresponding quantitative constants still satisfy condition (11), then convergence is guaranteed. For an illustrative example, we refer the reader to [38, Example 2.4]. 4 Application to feasibility We consider algorithm T for solving feasibility problem involving two closed sets A, B ⊂ E, x ∈ T x = P ((1 + λ) P x − λx ) − λ ( P x − x ) λ A B B = P R (x ) − λ ( P x − x ) . (14) A P ,λ B Note that T with λ = 0 and 1 corresponds to the alternating projections P P and λ A B the DR method ( R ◦ R + Id), respectively. A B It is worth recalling that feasibility problem for m ≥ 2 sets can be reformulated as a feasibility problem for two constructed sets on the product space E with one of the later sets is a linear subspace, and the regularity properties in terms of both individual sets and collections of sets of the later sets are inherited from those of the former ones [3,32]. When A is an affine set, then the projector P is affine and T is a convex combi- A λ nation of the alternating projection and the DR methods since T x = P (1 − λ) P x + λ(2 P x − x ) − λ P x − x ( ) ( ) λ A B B B = (1 − λ) P P x + λ (x + P (2 P x − x ) − P x ) A B A B B = (1 − λ)T (x ) + λT (x ). 0 1 In this case, we establish convergence results for all convex combinations of the alternating projection and the DR methods. To our best awareness, this kind of results seems to be new. Recall that when applied to inconsistent feasibility problems the DR operator has no fixed points. We next show that the set of fixed points of T with λ ∈[0, 1) for convex inconsistent feasibility problems is nonempty. This result follows the lines of [36, Lemma 2.1] where the fixed point set of the RAAR operator is characterized. 123 852 N. H. Thao Proposition 3 (Fixed points of T for convex inconsistent feasibility) For closed con- vex sets A, B ⊂ E,let G = B − A, g = P 0,E = A ∩ ( B − g) and F = ( A + g) ∩ B. Then Fix T = E − g ∀λ ∈[0, 1). 1 − λ Proof We first show that E − g ⊂ Fix T .Pickany e ∈ E and denote f = e + g ∈ 1−λ F as definitions of E and F. We are checking that x := e − g ∈ Fix T . 1 − λ Since x = f − g and −g ∈ N ( f ), we get P x = f . B B 1−λ Analogously, since g ∈ N (e) and (1 + λ) P x − λx = (1 + λ) f − λx = e + g, 1 − λ we have P ((1 + λ) P x − λx ) = e. A B Hence, x − T x = x − P ((1 + λ) P x − λx ) + λ ( P x − x ) λ A B B = x − e + λ ( f − x ) = 0. That is x ∈ Fix T . We next show that Fix T ⊂ E − g.Pickany x ∈ Fix T .Let f = P x and λ λ B 1−λ y = x − f . Thanks to x ∈ Fix T and the definition of T , λ λ P ((1 + λ) P x − λx ) = λ( P x − x ) + x A B B =− λy + y + f = f + (1 − λ) y. (15) Now, for any a ∈ A, since A is closed and convex, we have 0 ≥ a − P ((1 + λ) P x − λx ), (1 + λ) P x − λx − P ((1 + λ) P x − λx ) A B B A B a − ( f + (1 − λ) y), (1 + λ) f − λx − ( f + (1 − λ) y) a − f − (1 − λ) y, − y = −a + f, y + (1 − λ)  y . On the other hand, for any b ∈ B, since B is closed and convex, we have b − f, y = b − f, x − f = b − P x , x − P x ≤ 0. B B Combining the last two inequalities yields b − a, y ≤−(1 − λ)  y ≤ 0 ∀a ∈ A, ∀b ∈ B. 123 A convergent relaxation of the Douglas–Rachford algorithm 853 Take a sequence (a ) in A and a sequence (b ) in B such that g := b − a → g. n n n n n Then g , y ≤−(1 − λ)  y ≤ 0 ∀n. (16) Taking the limit and using the Cauchy–Schwarz inequality yields y ≤ g . 1 − λ Conversely, by (15) with noting that f ∈ B and P ((1 + λ) P x − λx ) ∈ A, A B 1 1 y =  f − P ((1 + λ) P x − λx ) ≥ g . A B 1 − λ 1 − λ 1 1 Hence  y = g, and taking the limit in (16), which yields y =− g. Since 1−λ 1−λ f ∈ B and f − g = f + (1 − λ) y = P ((1 + λ) P x − λx ) ∈ A,wehave A B f − g ∈ A ∩ ( B − g) = E and, therefore, 1 λ λ x = f + y = f − g = f − g − g ∈ E − g. 1 − λ 1 − λ 1 − λ We next discuss the two key ingredients for convergence of algorithm T applied to feasibility problems: 1) almost averagedness of T , and 2) metric subregularity of T − Id. The two properties will be deduced from the (ε, δ)-regularity of the individual sets and the transversality of the collection of sets, respectively. The next proposition shows averagedness of T applied to feasibility problems involving (ε, δ)-regular sets. Proposition 4 Let A and B be (ε, δ)-regular at x¯ ∈ A ∩ B and define the set U := {x ∈ E | P x ⊂ B (x¯ ) and P R x ⊂ B (x¯ )}. (17) B δ A P ,λ δ Then T is pointwise almost averaging on U at every point z ∈ S := A ∩ B ∩ B (x¯ ) λ δ with averaging constant and violation 3+λ 2 2 2 ε ˜ := 2(2ε + 2ε ) + (1 + λ)(2ε + 2ε ) . (18) Proof Let us define the two sets U := { y ∈ E | P y ⊂ B (x¯)}, U := {x ∈ E | P x ⊂ B (x¯)} A A δ B B δ and note that x ∈ U if and only if x ∈ U and R x ⊂ U . Thanks to Lemma 2 (iii), B P ,λ A R and R are pointwise almost averaging at every point z ∈ S with violation P ,λ P ,λ A B 1+λ (1 + λ)(2ε + 2ε ) and averaging constant on U and U , respectively. Then A B due to [38, Proposition 2.4 (iii)], the operator T := R R is pointwise almost P ,λ P ,λ A B 123 854 N. H. Thao 2(1+λ) averaging on U at every point z ∈ S with averaging constant and violation 3+λ (1 + λ)ε ˜, where ε ˜ is given by (18). Note that T = (1 + λ)T − λ Id by Proposition 1. Thanks to Lemma 1, T is pointwise almost averaging on U at every point z ∈ S with violation ε ˜ and averaging constant as claimed. 3+λ Remark 4 It follows from Lemma 2 (i) & (iii) that the set U defined by (17) contains at least the ball B  (x¯), where δ := > 0. 2(1 + ε) 1 + (1 + λ)(2ε + 2ε ) We next integrate Proposition 4 into Theorem 2 to obtain convergence of algorithm T for solving consistent feasibility problems involving (ε, δ)-regular sets. Corollary 1 (Convergence of algorithm T for feasibility) Consider the algorithm T λ λ defined at (14) and suppose that Fix T = A ∩ B =∅. Denote S = Fix T + ρB for λ ρ λ a nonnegative real ρ. Suppose that there are δ> 0, ε ≥ 0 and γ ∈ (0, 1) such that A and B are (ε, δ )-regular at avery point z ∈ A ∩ B , where δ := 2δ(1 + ε) 1 + (1 + λ)(2ε + 2ε ), and for each n ∈ N, the mapping T − Id is metrically subregular on D := S n \ λ n γ δ S for 0 with gauge μ satisfying n+1 γ δ μ (dist (x , A ∩ B)) 2ε ˜ inf ≥ κ > , x ∈ D dist (x , A ∩ B) 1 + λ where ε ˜ is given at (18). Then all iterations x ∈ T x starting in S satisfy (12) and (13) with c := k+1 λ k δ n (1+λ)κ 1 +˜ ε − < 1. 2ε ˜ In particular, if (κ ) is bounded from below by some κ> for all n sufficiently 1+λ large, then (x ) eventually converges R-linearly to a point in A ∩ B with rate at most (1+λ)κ 1 +˜ ε − < 1. Proof Let any x ∈ D ,for some n ∈ N, x ∈ T x and x¯ ∈ P x. A combination of n λ A∩ B Proposition 4 and Remark 4 implies that T is pointwise almost averaging on B (x¯) at λ δ every point z ∈ A ∩ B ∩ B (x¯ ) with violation ε ˜ given by (18) and averaging constant . In other words, condition (a) of Theorem 1 is satisfied. Condition (b) of Theorem 3+λ 1 is also fulfilled by the same argument as the one used in Theorem 2. The desired conclusion now follows from Theorem 1. In practice, the metric subregularity assumption is often more challenging to be verified than the averaging property. In the concrete example of consistent alternating projections P P , that metric subregularity condition holds true if and only if the A B collection of sets is subtransversal. We next show that the metric subregularity of 123 A convergent relaxation of the Douglas–Rachford algorithm 855 T − Id can be deduced from the transversality of the collection of sets { A, B}.As a result, if the sets are also sufficiently regular, then local linear convergence of the iteration x ∈ T x is guaranteed. k+1 λ k We first describe the concept of relative transversality of collections of sets. In the sequel, we set Λ := aff( A ∪ B), the smallest affine set in E containing both A and B. Assumption 3 The collection { A, B} is transversal at x¯ ∈ A ∩ B relative to Λ with ¯ ¯ constant θ< 1, that is, for any θ ∈ (θ, 1), there exists δ> 0 such that u,v ≥−θ u · v prox prox holds for all a ∈ A ∩ B (x¯ ), b ∈ B ∩ B (x¯), u ∈ N (a) and v ∈ N (b). δ δ A|Λ B|Λ Thanks to [22, Theorem 1] and [28, Theorem 1], Assumption 3 also ensures 1−θ subtransversality of { A, B} at x¯ relative to Λ with constant at least on the neighborhood B (x¯ ), that is 1 − θ dist (x , A ∩ B) ≤ max{dist (x , A), dist (x , B)}∀x ∈ Λ ∩ B (x¯ ). (19) The next lemma is at the heart of our subsequent discussion. Lemma 3 Suppose that Assumption 3 is satisfied. Then for any θ ∈ (θ, 1), there exists a number δ> 0 such that for all x ∈ B (x¯) and x ∈ T x, δ λ κ dist (x , A ∩ B) ≤ x − x , (20) where κ is defined by (1 − θ) 1 + θ κ :=   > 0. (21) √ √ 2max 1,λ + 1 − θ Proof For any θ ∈ (θ, 1), there is a number δ> 0 satisfying the property described in Assumption 3. Let us set δ = δ/6 and show that condition (20) is fulfilled with δ . Indeed, let us consider any x ∈ B (x¯ ), b ∈ P x, y = (1 + λ)b − λx, a ∈ P y and B A x = a − λ(b − x ) ∈ T x. From the choice of δ , it is clear that a, b ∈ B (x¯ ). Since λ δ prox prox x − b ∈ N (b) and y − a ∈ N (a), Assumption 3 yields that B|Λ A|Λ x − b, y − a ≥−θ x − b ·  y − a . (22) 123 856 N. H. Thao By the definition of T ,wehave + 2 x − x = x − b + y − a 2 2 = x − b +  y − a + 2 x − b, y − a 2 2 ≥ x − b +  y − a − 2θ x − b ·  y − a 2 2 2 2 ≥ 1 − θ x − b = 1 − θ dist (x , B), (23) where the first inequality follows from (22). We will take care of the two possible cases regarding dist (x , A) as follows. Case 1 dist (x , A) ≤ λ + 1 − θ dist (x , B). Thanks to (23) we get 1 − θ + 2 x − x ≥ dist (x , A). (24) √ 2 λ + 1 − θ Case 2 dist (x , A)> λ + 1 − θ dist (x , B). By the triangle inequality and the construction of T , we get + + x − x ≥ x − a − a − x = x − a − λ x − b ≥ dist (x , A) − λ dist (x , B) ≥ 1 − √ dist (x , A). (25) λ + 1 − θ Since 1 − θ λ = 1 − √ , √ 2 λ + 1 − θ λ + 1 − θ we always have from (24) and (25) that 1 − θ + 2 x − x ≥ dist (x , A). (26) √ 2 λ + 1 − θ Combining (23), (26) and (19), we obtain 1 − θ + 2 2 x − x ≥ max dist (x , A), dist (x , B) √ 2 max 1, λ + 1 − θ (1 − θ )(1 − θ) ≥ dist (x , A ∩ B), √ 2 2max 1, λ + 1 − θ which yields (20)asclaimed. 123 A convergent relaxation of the Douglas–Rachford algorithm 857 In the special case that λ = 1, Lemma 3 refines [13, Lemma 3.14] and [45, Lemma 4.2] where the result was proved for the DR operator with an additional assumption on regularity of the sets. The next result is the final preparation for our linear convergence result. Lemma 4 [45, Proposition 2.11] Let T : E ⇒ E,S ⊂ E be closed and x¯ ∈ S. Suppose that there are δ> 0 and c ∈[0, 1) such that for all x ∈ B (x¯ ),x ∈ T x and z ∈ P x, x − z ≤ c x − z . (27) Then every iteration x ∈ Tx starting sufficiently close to x¯ converges R-linearly k+1 k to a point x˜ ∈ S ∩ B (x¯). In particular, x −¯ x  (1 + c) x −˜ x  ≤ c . 1 − c We are now ready to prove local linear convergence for algorithm T which gener- alizes the corresponding results established in [13,45] for the DR method. Theorem 4 (Linear convergence of algorithm T for feasibility) In addition to (1+λ)κ Assumption 3, suppose that A and B are (ε, δ)-regular at x¯ with ε< ˜ , where ε ˜ and κ are given by (18) and (21), respectively. Then every iteration x ∈ T x k+1 λ k starting sufficiently close to x¯ converges R-linearly to a point in A ∩ B. Proof Assumption 3 ensures the existence of δ > 0 such that Lemma 3 holds true. In view of Proposition 4 and Remark 4, one can find a number δ > 0 such that T is 2 λ pointwise almost averaging on B (x¯ ) at every point z ∈ A ∩ B ∩ B (x¯ ) with violation δ δ 2 2 ε ˜ given by (18) and averaging constant . Define δ = min{δ ,δ } > 0. 1 2 3+λ Now let us consider any x ∈ B  (x¯ ), x ∈ T x and z ∈ P x. It is clear that λ A∩ B δ /2 z ∈ B (x¯ ). Proposition 4 and Lemma 3 then respectively yield 1 + λ 2 2 + 2 + x − z ≤ (1 +˜ ε) x − z − x − x , (28) + 2 2 2 2 x − x ≥ κ dist (x , A ∩ B) = κ x − z , (29) where κ is given by (21). Substituting (29)into(28), we get (1 + λ)κ + 2 x − z ≤ 1 +˜ ε − x − z , which yields condition (27)ofLemma 4 and the desired conclusion now follows from this lemma. 5 Application to sparse optimization Our goal in this section is twofold: 1) to illustrate the linear convergence of algorithm T formulated in Theorem 4 via the sparse optimization problem, and 2) to demonstrate 123 858 N. H. Thao a promising performance of the algorithm T in comparison with the RAAR algorithm for this applied problem. 5.1 Sparse optimization We consider the sparse optimization problem min x  subject to Mx = b, (30) x ∈R m×n m where M ∈ R (m < n) is a full rank matrix, b is a given vector in R , and x  is the number of nonzero entries of the vector x. The sparse optimization problem with complex variable is defined analogously by replacing R by C everywhere in the above model. Many strategies for solving (30) have been proposed. We refer the reader to the famous paper by Candès and Tao [9] for solving this problem by using convex relax- ations. On the other hand, assuming to have a good guess on the sparsity of the solutions to (30), one can tackle this problem by solving the sparse feasibility problem [14]of finding x¯ ∈ A ∩ B, (31) n n where A := {x ∈ R | x  ≤ s} and B := {x ∈ R | Mx = b}. It is worth mentioning that the initial guess s of the true sparsity is not numerically sensitive with respect to various projection methods, that is, for a relatively wide range of values of s above the true sparsity, projection algorithms perform very much in the same nature. Note also that the approach via sparse feasibility does not require convex relaxations of (30) and thus can avoid the likely expensive increase of dimensionality. We run the two algorithms T and RAAR to solve (31) and compare their numerical performances. By taking s smaller than the true sparsity, we can also compare their performances for inconsistent feasibility. Since B is affine, there is the closed algebraic form for the projector P , † n P x = x − M (Mx − b) ∀x ∈ R , † T T −1 where M := M (MM ) is the Moore–Penrose inverse of M. We have denoted M the transpose matrix of M and taken into account that M is full rank. There is also a closed form for P [6]. For each x ∈ R , let us denote I (x ) the set of all s-tubles A s of indices of s largest in absolute value entries of x.The set I (x ) can contain multiple such s-tubles. The projector P can be described as x (k) if k ∈ I, P x = z ∈ R |∃ I ∈ I (x ) such that z(k) = . A s 0else For convenience, we recall the two algorithms in this specific setting 123 A convergent relaxation of the Douglas–Rachford algorithm 859 RA A R = β P (2 P − Id) + (1 − 2β) P + β Id, β A B B T = P ((1 + λ) P − λ Id) − λ( P − Id). λ A B B 5.2 Convergence analysis We analyze the convergence of algorithm T for the sparse feasibility problem (31). The next theorem establishes local linear convergence of algorithm T for solving sparse feasibility problems. Theorem 5 (Linear convergence of algorithm T for sparse feasibility) Let x¯ = (x¯ ) ∈ λ i A ∩ B and suppose that s is the sparsity of the solutions to the problem (30). Then any iteration x ∈ T x starting sufficiently close to x¯ converges R-linearly to x. ¯ k+1 λ k Proof We first show that x¯ is an isolated point of A ∩ B. Since s is the sparsity of the solutions to (30), we have that x¯  = s and the set I (x¯ ) contains a unique element, denoted I . Note that E := span{e : i ∈ I } is the unique s-dimensional space x¯ x¯ i x¯ component of A containing x¯, where {e : 1 ≤ i ≤ n} is the canonical basic of R . s i Let us denote δ := min |¯ x | > 0. i ∈I x¯ We claim that A ∩ B (x¯ ) = E ∩ B (x¯ ), (32) s δ x¯ δ E ∩ B ={x¯}. (33) x¯ Indeed, for any x = (x ) ∈ A ∩ B (x¯ ), we have by definition of δ that x = 0for i s δ i all i ∈ I . Hence x  = s and x ∈ E ∩ B (x¯ ). This proves (32). x¯ 0 x¯ δ For (33), it suffices to show the singleton of E ∩ B since we already know that x¯ x¯ ∈ E ∩ B. Suppose otherwise that there exists x = (x ) ∈ E ∩ B with x =¯ x x¯ i x¯ j j for some index j. Since both E and B are affine, the intersection E ∩ B contains x¯ x¯ the line {x + t (x¯ − x ) : t ∈ R} passing x and x¯. In particular, it contains the point z := x + (x¯ − x ). Then we have that z ∈ B and z ≤ s − 1as z = 0. This 0 j x −¯ x j j contradicts to the assumption that s is the sparsity of the solutions to (30), and hence (33) is proved. A combination of (32) and (33) then yields A ∩ B ∩ B (x¯ ) = E ∩ B ∩ B (x¯ ) ={x¯ }. (34) s δ x¯ δ This means that x¯ is an isolated point of A ∩ B as claimed. Moreover, the equalities in (34) imply that P x = P x ∀x ∈ B (x¯ ). A E δ/2 s x¯ Therefore, for any starting point x ∈ B (x¯), the iteration x ∈ T x for solving 0 δ/2 k+1 λ k (31) is identical to that for solving the feasibility problem for the two sets E and B. x¯ 123 860 N. H. Thao Since E and B are two affine subspaces intersecting at the unique point x¯ by (33), x¯ the collection of sets { E , B} is transversal at x¯ relative to the affine hull aff( E ∪ B). x¯ x¯ Theorem 4 now can be applied to conclude that the iteration x ∈ T x converges k+1 λ k R-linearly to x¯. The proof is complete. It is worth mentioning that the convergence analysis in Theorem 5 is also valid for the RAAR algorithm. 5.3 Numerical experiment We now set up a toy example as in [9,14] which involves an unknown true object x¯ ∈ R with x¯  = 328 (the sparsity rate is .005). Let b be 1/8 of the measurements of F (x¯),the Fourier transform of x¯, with the sample indices denoted J .The Poisson noise was added when calculating the measurement b. Note that since x¯ is real, F (x¯) is conjugate symmetric, we indeed have nearly a double number of measurements. In this setting, we have B ={x ∈ C | F (x )(k) = b(k), ∀k ∈ J }, and the two prox operators, respectively, take the forms Re (x (k)) if k ∈ I, P x = z ∈ R |∃ I ∈ I (x ) such that z(k) = , A s 0else b(k) if k ∈ J , −1 P x = F (xˆ), where xˆ (k) = F (x )(k) else, −1 where Re(x (k)) denotes the real part of the complex number x (k), and F is the inverse Fourier transform. The initial point was chosen randomly, and a warm-up procedure with 10 DR iterates was performed before running the two algorithms. The stopping criterion + −10 x − x < 10 was used. We have used the Matlab ProxToolbox [37] to run this numerical experiment. The parameters were chosen in such a way that the performance is seemingly optimal for both algorithms. We chose β = .65 for the RAAR algorithm and λ = .45 for algorithm T in the case of consistent feasibility problem correspond- ing to s = 340, and β = .6 for the RAAR algorithm and λ = .4 for algorithm T in the case of inconsistent feasibility problem corresponding to s = 310. The change of distances between two consecutive iterates is of interest. When linear convergence appears to be the case, it can yield useful information of the convergence rate. Under the assumption that the iterates will remain in the convergence area, one can obtain error bounds for the distance from the current iterate to a nearest solution. We also pay attention to the gaps in iterates that in a sense measure the infeasibility at the iterates. If we think feasibility problem as the problem of minimizing the sum of the squares of the distance functions to the sets, then gaps in iterates are the values of that function evaluated at the iterates. For the two algorithms under consideration, 123 A convergent relaxation of the Douglas–Rachford algorithm 861 consistent feasibility consistent feasibility 20 20 10 10 RAAR RAAR 0 0 10 10 -20 -20 10 10 0 50 100 150 0 50 100 150 iteration iteration inconsistent feasibility inconsistent feasibility 20 10 RAAR RAAR 0 0 10 10 -20 -10 10 10 0 50 100 150 0 50 100 150 iteration iteration Fig. 1 Performances of the RAAR and T algorithms for sparse feasibility problem: iterate changes in consistent case (top-left), iterate gaps in consistent case (top-right), iterate changes in inconsistent case (bottom-left) and iterate gaps in inconsistent case (bottom-right) the iterates are themselves not informative but their shadows, by which we mean the projections of the iterates on one of the sets. Hence, the gaps in iterates are calculated for the iterate shadows instead of the iterates themselves. Figure 1 summarizes the performances of the two algorithms for both consistent and inconsistent sparse feasibility problems. We first emphasize that the algorithms appear to be convergent in both cases of feasibility. For the consistent case, algorithm T appears to perform better than the RAAR algorithm in terms of both the iterate changes and gaps. Also, the CPU time of algorithm T is around 10% less than that of the RAAR algorithm. For the inconsistent case, we have a similar observation except that the iterate gaps for the RAAR algorithm are slightly better (smaller) than those for algorithm T . Extensive numerical experiments in imaging problems illustrating the empirical performance of algorithm T will be the future work. Acknowledgements The author would like to thank Prof. Dr. Russell Luke and Prof. Dr. Alexander Kruger for their encouragement and valuable suggestions during the preparation of this work. He also would like to thank the anonymous referees for their very helpful and constructive comments on the manuscript version of the paper. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 Interna- tional License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. change in iterates change in iterates log of gap in iterates log of gap in iterates 862 N. H. Thao References 1. Aspelmeier, T., Charitha, C., Luke, D.R.: Local linear convergence of the ADMM/Douglas–Rachford algorithms without strong convexity and application to statistical imaging. SIAM J. Imaging Sci. 9(2), 842–868 (2016) 2. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010) 3. Bauschke, H.H., Borwein, J.M.: On projection algorithms for solving convex feasibility problems. SIAM Rev. 38(3), 367–426 (1996) 4. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011) 5. Bauschke, H.H., Luke, D.R., Phan, H.M., Wang, X.: Restricted normal cones and the method of alternating projections: applications. Set-Valued Var. Anal. 21, 475–501 (2013) 6. Bauschke, H.H., Luke, D.R., Phan, H.M., Wang, X.: Restricted normal cones and sparsity optimization with affine constraints. Found. Comput. Math. 14, 63–83 (2014) 7. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014) 8. Borwein, J.M., Tam, M.K.: The cyclic Douglas–Rachford method for inconsistent feasibility problems. J. Nonlinear Convex Anal. 16(4), 537–584 (2015) 9. Candés, E., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51(12), 4203–4215 (2005) 10. Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Fixed-Point Algo- rithms for Inverse Problems in Science and Engineering, vol. 49. Springer, Berlin, pp. 185–212 (2011) 11. Dontchev, A.L., Rockafellar, R.T.: Implicit Functions and Solution Mapppings. Srpinger, New York (2014) 12. Drusvyatskiy, D., Ioffe, A.D., Lewis, A.S.: Transversality and alternating projections for nonconvex sets. Found. Comput. Math. 15(6), 1637–1651 (2015) 13. Hesse, R., Luke, D.R.: Nonconvex notions of regularity and convergence of fundamental algorithms for feasibility problems. SIAM J. Optim. 23(4), 2397–2419 (2013) 14. Hesse, R., Luke, D.R., Neumann, P.: Alternating projections and Douglas–Rachford for sparse affine feasibility. IEEE Trans. Signal. Process. 62(18), 4868–4881 (2014) 15. Ioffe, A.D.: Metric regularity and subdifferential calculus. Russian Math. Surv. 55(3), 501–558 (2000) 16. Ioffe, A.D.: Regularity on a fixed set. SIAM J. Optim. 21(4), 1345–1370 (2011) 17. Ioffe, A.D.: Nonlinear regularity models. Math. Program. 139(1–2), 223–242 (2013) 18. Ioffe, A.D.: Metric regularity: a survey. Part I. Theory. J. Aust. Math. Soc. 101(2), 188–243 (2016) 19. Khanh, Phan Q., Kruger, A.Y., Thao, Nguyen H.: An induction theorem and nonlinear regularity models. SIAM J. Optim. 25(4), 2561–2588 (2015) 20. Klatte, D., Kummer, B.: Nonsmooth Equations in Optimization. Kluwer, Dordrecht (2002) 21. Klatte, D., Kummer, B.: Optimization methods and stability of inclusions in Banach spaces. Math. Program. 117(1–2), 305–330 (2009) 22. Kruger, A.Y.: Stationarity and regularity of set systems. Pac. J. Optim. 1(1), 101–126 (2005) 23. Kruger, A.Y.: About regularity of collections of sets. Set-Valued Anal. 14, 187–206 (2006) 24. Kruger, A.Y.: About stationarity and regularity in variational analysis. Taiwan. J. Math. 13(6A), 1737– 1785 (2009) 25. Kruger, A.Y.: Error bounds and metric subregularity. Optimization 64(1), 49–79 (2015) 26. Kruger, A.Y., Luke, D.R., Thao, Nguyen H.: Set regularities and feasibility problems. Math. Program. B. https://doi.org/10.1007/s10107-016-1039-x 27. Kruger, A.Y., Luke, D.R., Thao, Nguyen H.: About subtransversality of collections of sets. Set-Valued Var. Anal. 25(4), 701–729 (2017) 28. Kruger, A.Y., Thao, Nguyen H.: About uniform regularity of collections of sets. Serdica Math. J. 39, 287–312 (2013) 29. Kruger, A.Y., Thao, Nguyen H.: About [q]-regularity properties of collections of sets. J. Math. Anal. Appl. 416(2), 471–496 (2014) 30. Kruger, A.Y., Thao, Nguyen H.: Quantitative characterizations of regularity properties of collections of sets. J. Optim. Theory Appl. 164, 41–67 (2015) 123 A convergent relaxation of the Douglas–Rachford algorithm 863 31. Kruger, A.Y., Thao, Nguyen H.: Regularity of collections of sets and convergence of inexact alternating projections. J. Convex Anal. 23(3), 823–847 (2016) 32. Lewis, A.S., Luke, D.R., Malick, J.: Local linear convergence of alternating and averaged projections. Found. Comput. Math. 9(4), 485–513 (2009) 33. Lewis, A.S., Malick, J.: Alternating projections on manifolds. Math. Oper. Res. 33, 216–234 (2008) 34. Li, G., Pong, T.K.: Douglas–Rachford splitting for nonconvex feasibility problems. Math. Program. 159(1), 371–401 (2016) 35. Luke, D.R.: Relaxed averaged alternating reflections for diffraction imaging. Inverse Problems 21, 37–50 (2005) 36. Luke, D.R.: Finding best approximation pairs relative to a convex and a prox-regular set in Hilbert space. SIAM J. Optim. 19(2), 714–739 (2008) 37. Luke, D.R.: ProxToolbox. http://num.math.uni-goettingen.de/proxtoolbox (2017). Accessed Aug 2017 38. Luke, D.R., Thao, Nguyen H., Tam, M.K.: Quantitative convergence analysis of iterated expansive, set-valued mappings. Math. Oper. Res. https://doi.org/10.1287/moor.2017.0898 39. Luke, D.R., Thao, Nguyen H., Teboulle, M.: Necessary conditions for linear convergence of Picard iterations and application to alternating projections. https://arxiv.org/pdf/1704.08926.pdf (2017) 40. Mordukhovich, B.S.: Variational Analysis and Generalized Differentiation. I: Basic Theory. Springer, Berlin (2006) 41. Moreau, J.-J.: Fonctions convexes duales et points proximaux dans un espace Hilbertien. Comptes Rendus de l’Académie des Sciences de Paris 255, 2897–2899 (1962) 42. Noll, D., Rondepierre, A.: On local convergence of the method of alternating projections. Found. Comput. Math. 16(2), 425–455 (2016) 43. Patrinos, P., Stella, L., Bemporad, A.: Douglas-Rachford splitting: Complexity estimates and acceler- ated variants. In: 53rd IEEE Conference on Decision and Control, pp. 4234–4239 (2014) 44. Penot, J.-P.: Calculus Without Derivatives. Springer, New York (2013) 45. Phan, H.M.: Linear convergence of the Douglas–Rachford method for two closed sets. Optimization 65, 369–385 (2016) 46. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Grundlehren Math. Wiss. Springer, Berlin (1998) http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Computational Optimization and Applications Springer Journals

A convergent relaxation of the Douglas–Rachford algorithm

Free
23 pages
Loading next page...
 
/lp/springer_journal/a-convergent-relaxation-of-the-douglas-rachford-algorithm-CEIrWD2QGd
Publisher
Springer US
Copyright
Copyright © 2018 by The Author(s)
Subject
Mathematics; Optimization; Operations Research, Management Science; Operations Research/Decision Theory; Statistics, general; Convex and Discrete Geometry
ISSN
0926-6003
eISSN
1573-2894
D.O.I.
10.1007/s10589-018-9989-y
Publisher site
See Article on Publisher Site

Abstract

Comput Optim Appl (2018) 70:841–863 https://doi.org/10.1007/s10589-018-9989-y A convergent relaxation of the Douglas–Rachford algorithm 1,2 Nguyen Hieu Thao Received: 16 September 2017 / Published online: 6 March 2018 © The Author(s) 2018. This article is an open access publication Abstract This paper proposes an algorithm for solving structured optimization problems, which covers both the backward–backward and the Douglas–Rachford algo- rithms as special cases, and analyzes its convergence. The set of fixed points of the corresponding operator is characterized in several cases. Convergence criteria of the algorithm in terms of general fixed point iterations are established. When applied to nonconvex feasibility including potentially inconsistent problems, we prove local lin- ear convergence results under mild assumptions on regularity of individual sets and of the collection of sets. In this special case, we refine known linear convergence criteria for the Douglas–Rachford (DR) algorithm. As a consequence, for feasibility problem with one of the sets being affine, we establish criteria for linear and sublinear con- vergence of convex combinations of the alternating projection and the DR methods. These results seem to be new. We also demonstrate the seemingly improved numerical performance of this algorithm compared to the RAAR algorithm for both consistent and inconsistent sparse feasibility problems. This paper is dedicated to Professor Alexander Kruger on his 65th birthday. The research leading to these results has received funding from the German-Israeli Foundation Grant G-1253-304.6 and the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013)/ERC Grant Agreement No. 339681. Nguyen Hieu Thao h.t.nguyen-3@tudelft.nl; hieuthao.ctu@gmail.com Delft Center for Systems and Control, Delft University of Technology, 2628CD Delft, The Netherlands Department of Mathematics, School of Education, Can Tho University, Can Tho, Vietnam 123 842 N. H. Thao Keywords Almost averagedness · Picard iteration · Alternating projection method · Douglas–Rachford method · RAAR algorithm · Krasnoselski–Mann relaxation · Metric subregularity · Transversality · Collection of sets Mathematics Subject Classification Primary 49J53 · 65K10; Secondary 49K40 · 49M05 · 49M27 · 65K05 · 90C26 1 Introduction Convergence analysis has been one of the central and very active applications of variational analysis and mathematical optimization. Examples of recent contributions to the theory of the field that have initiated efficient programs of analysis are [1,2,7,38]. It is the common recipe emphasized in these and many other works that there are two key ingredients required in order to derive convergence of a numerical method (1) regularity of individual functions or sets such as convexity and averaging property, and (2) regularity of collections of functions or sets at their critical points such as transversality, Kurdyka-Łojasiewicz property and metric subregularity. As a result, the question about convergence of a solving method can often be reduced to checking whether certain regularity properties of the problem data are satisfied. There have been a considerable number of papers studying these two ingredients of convergence analysis in order to establish sharper convergence criteria in various circumstances, especially those applicable to algorithms for solving nonconvex problems [5,12,13, 19,26,27,31–33,38,42,45]. This paper suggests an algorithm called T , which covers both the backward- backward and the DR algorithms as special cases of choosing the parameter λ ∈[0, 1], and analyzes its convergence. When applied to feasibility problem for two sets one of which is affine, T is a convex combination of the alternating projection and the DR methods. On the other hand, T can be viewed as a relaxation of the DR algorithm. Motivation for relaxing the DR algorithm comes from the lack of stability of this algo- rithm when applied to inconsistent problems. This phenomenon has been observed for the Fourier phase retrieval problem which is essentially inconsistent due to the recip- rocal relationship between the spatial and frequency variables of the Fourier transform [35,36]. To address this issue, a relaxation of the DR algorithm, often known as the RAAR algorithm, was proposed and applied to phase retrieval problems by Luke in the aforementioned papers. In the framework of feasibility, the RAAR algorithm is described as a convex combination of the basic DR operator and one of the projectors. Our preliminary numerical experiments have revealed a promising performance of algorithm T in comparison with the RAAR method. This observation has motivated the study of convergence analysis of algorithm T in this paper. After introducing the notation and proving preliminary results in Sect. 2,weintro- duce T as a general fixed point operator, characterize the set of fixed points of T λ λ (Proposition 1), and establish abstract convergence criteria for iterations generated by T (Theorem 2) in Sect. 3. We discuss algorithm T in the framework of feasi- λ λ bility problems in Sect. 4. The set of fixed points of T is characterized for convex inconsistent feasibility (Proposition 3). For consistent feasibility we show that almost 123 A convergent relaxation of the Douglas–Rachford algorithm 843 averagedness of T (Proposition 4) and metric subregularity of T − Id (Lemma 3) λ λ can be obtained from regular properties of the individual sets and of the collection of sets, respectively. As a result, the two regularity notions are combined to yield local linear convergence of iterations generated by T (Theorem 4). Section 5 is devoted to demonstrate the improved numerical performance of algorithm T compared to the RAAR algorithm for both consistent and inconsistent feasibility problems. In this section, we study the feasibility approach for solving the sparse optimization problem. Our linear convergence result established in Sect. 4 for iterations generated by T is also illustrated in this application (Theorem 5). 2 Notation and preliminary results Our notation is standard, c.f. [11,40,46]. The setting throughout this paper is a finite dimensional Euclidean space E. The norm · denotes the Euclidean norm. The open unit ball in a Euclidean space is denoted B, and B (x ) stands for the open ball with radius δ> 0 and center x. The distance to a set A ⊂ E with respect to the bivariate function dist (·, ·) is defined by dist (·, A) : E → R : x → inf dist (x , y). y∈ A We use the convention that the distance to the empty set is +∞. The set-valued mapping { | } P : E ⇒ E : x → y ∈ A dist (x , y) = dist (x , A) is the projector on A. An element y ∈ P (x ) is called a projection. This exists for anyclosedset A ⊂ E. Note that the projector is not, in general, single-valued. Closely related to the projector is the prox mapping corresponding to a function f and a stepsize τ> 0[41] 1 2 prox (x ) := argmin f ( y) +  y − x  . τ, f y∈E 2τ When f = ι is the indicator function of A, that is ι (x ) = 0if x ∈ A and ι (x ) = A A A −1 +∞ otherwise, then prox = P for all τ> 0. The inverse of the projector, P , τ,ι A A is defined by −1 P (a) := {x ∈ E | a ∈ P (x ) } . The proximal normal cone to A at x¯ is the set, which need not be either closed or convex, prox −1 N (x¯ ) := cone P (x¯) −¯ x . (1) A A prox If x¯ ∈ / A, then N (x¯) is defined to be empty. Normal cones are central to charac- terizations both of the regularity of individual sets and of the regularity of collections of sets. For a refined numerical analysis of projection methods, one also defines the Λ-proximal normal cone to A at x¯ by prox −1 N (x¯) := cone P (x¯ ) ∩ Λ −¯ x . A|Λ A When Λ = E, it coincides with the proximal normal cone (1). 123 844 N. H. Thao For ε ≥ 0 and δ> 0, a set A is (ε, δ)-regular relative to Λ at x¯ ∈ A [13, Definition prox 2.9] if for all x ∈ B (x¯ ), a ∈ A ∩ B (x¯ ) and v ∈ N (a), δ δ A|Λ x − a,v ≤ ε x − av . When Λ = E, the quantifier “relative to” is dropped. For a set-valued operator T : E ⇒ E, its fixed point set is defined by Fix T := {x ∈ E | x ∈ Tx }. For a number λ ∈[0, 1], we denote the λ-reflector of T by R := T ,λ (1 + λ)T − λ Id. A frequently used example in this paper corresponds to T being a projector. In the context of convergence analysis of Picard iterations, the following general- ization of the Fejér monotonicity of sequences appears frequently, see, for example, the book [4] or the paper [39] for the terminology. Definition 1 (Linear monotonicity) The sequence (x ) is linearly monotone with respect to a set S ⊂ E with rate c ∈[0, 1] if dist (x , S) ≤ c dist (x , S) ∀k ∈ N. k+1 k Our analysis follows the abstract analysis program proposed in [38] which requires the two key components of the convergence: almost averagedness and metric subreg- ularity. Definition 2 (Almost nonexpansive/averaging mappings)[38]Let T : E ⇒ E and U ⊂ E. (i) T is pointwise almost nonexpansive at y on U with violation ε ≥ 0 if for all x ∈ U, + + x ∈ Tx and y ∈ Ty, + + x − y ≤ 1 + ε x − y . (ii) T is pointwise almost averaging at y on U with violation ε ≥ 0 and averaging + + constant α> 0 if for all x ∈ U, x ∈ Tx and y ∈ Ty, 1 − α 2 2 + + 2 + + x − y ≤ (1 + ε) x − y − (x − x ) − ( y − y) . (2) When a property holds at all y ∈ U on U, we simply say that the property holds on U. From Definition 2, almost nonexpansiveness is actually the almost averaging prop- erty with the same violation and averaging constant α = 1. Remark 1 (the range of quantitative constants) In the context of Definition 2,itis natural to consider violation ε ≥ 0 and averaging constant α ∈ (0, 1]. Mathematically, it also makes sense to consider ε< 0 and α> 1 provided that the required estimate (2) holds true. Simple examples for the later case are linear contraction mappings. In this paper, averaging constant α> 1 will frequently be involved implicitly in intermediate 123 A convergent relaxation of the Douglas–Rachford algorithm 845 steps of our analysis without any contradiction or confusion. This is the reason why in Definition 2 (ii) we considered α> 0 instead of α ∈ (0, 1] as in [38, Definition 2.2]. It is worth noting that if the iteration x ∈ Tx is linearly monotone with respect k+1 k to Fix T with rate c ∈ (0, 1) and T is almost averaging on some neighborhood of Fix T with averaging constant α ∈ (0, 1], then (x ) converges R-linearly to a fixed point of T [39, Proposition 3.5]. We next prove a fundamental preliminary result for our analysis regarding almost averaging mappings. Lemma 1 Let T : E ⇒ E,U ⊂ E, λ ∈[0, 1], ε ≥ 0 and α> 0. The following two statements are equivalent. (i) T is almost averaging on U with violation ε and averaging constant α. (ii) The λ-reflector of T , R = (1 + λ)T − λ Id, is almost averaging on U with T ,λ violation (1 + λ)ε and averaging constant (1 + λ)α. + + + Proof Take any x , y ∈ U, x ∈ Tx, y ∈ Ty, x˜ = (1 + λ)x − λx ∈ R x and T ,λ y ˜ = (1 + λ) y − λy ∈ R y. We have by definition of R and [4, Corollary 2.14] T ,λ T ,λ that 2 + + x˜ −˜ y = (1 + λ)(x − y ) − λ(x − y) 2 2 + + 2 + + = (1 + λ) x − y − λ x − y + λ(1 + λ) (x − x ) − ( y − y) . (3) We also note that + + (x˜ − x ) − ( y ˜ − y) = (1 + λ) (x − x ) − ( y − y) . (4) (i) ⇒ (ii). Suppose that T is almost averaging on U with violation ε and averaging constant α. Substituting (2)into(3) and using (4), we obtain that x˜ −˜ y 1 − α 2 + + ≤ (1 + (1 + λ)ε) x − y − (1 + λ) − λ (x − x ) − ( y − y) 1−α − λ 2 α 2 = (1 + (1 + λ)ε) x − y − (x˜ − x ) − ( y ˜ − y) 1 + λ 1 − (1 + λ)α 2 2 = (1 + (1 + λ)ε) x − y − (x˜ − x ) − ( y ˜ − y) , (5) (1 + λ)α which means that R is almost averaging on U with violation (1 + λ)ε and averaging T ,λ constant (1 + λ)α. (ii) ⇒ (i). Suppose that R is almost averaging on U with violation (1 + λ)ε and T ,λ averaging constant (1 + λ)α, that is, the inequality (5) is satisfied. Substituting (3)into 123 846 N. H. Thao (5) and using (4), we obtain 2 2 + + 2 + + (1 + λ) x − y − λ x − y + λ(1 + λ) (x − x ) − ( y − y) 1 − α 2 + + ≤ (1 + (1 + λ)ε) x − y − (1 + λ) − λ (x − x ) − ( y − y) . Equivalently, 1 − α 2 2 + + 2 + + x − y ≤ (1 + ε) x − y − (x − x ) − ( y − y) . Hence T is almost averaging on U with violation ε and averaging constant α and the proof is complete. Lemma 1 generalizes [13, Lemma 2.4] where the result was proved for α = 1/2 and λ = 1. The next lemma recalls facts regarding the almost averagedness of projectors and reflectors associated with regular sets. Lemma 2 Let A ⊂ E be closed and (ε, δ)-regular at x¯ ∈ A and define U := {x ∈ E | P x ⊂ B (x¯ )}. A δ (i) The projector P is pointwise almost nonexpansive on U at every point z ∈ A ∩ B (x¯ ) with violation 2ε + ε . (ii) The projector P is pointwise almost averaging on U at every point z ∈ A ∩ B (x¯ ) A δ with violation 2ε + 2ε and averaging constant 1/2. (iii) The λ-reflector R is pointwise almost averaging on U at every point z ∈ P ,λ 2 1+λ A ∩ B (x¯ ) with violation (1 + λ)(2ε + 2ε ) and averaging constant . Proof Statements (i) and (ii) can be found in [13, Theorem 2.14] or [38, Theorem 3.1 (i) & (iii)]. Statement (iii) follows from (ii) and Lemma 1 applied to T = P and α = 1/2. The following concept of metric subregularity with functional modulus has played a central role, explicitly or implicitly, in the convergence analysis of Picard iterations [1,13,38,39]. Recall that a function μ :[0, ∞) →[0, ∞) is a gauge function if μ is continuous and strictly increasing and μ(0) = 0. Definition 3 (Metric subregularity with functional modulus) A mapping F : E ⇒ E is metrically subregular with gauge μ on U ⊂ E for y relative to Λ ⊂ E if −1 μ dist x , F ( y) ∩ Λ ≤ dist ( y, F (x )) ∀x ∈ U ∩ Λ. When μ is a linear function, that is μ(t ) = κ t, ∀t ∈[0, ∞), one says “with constant κ” instead of “with gauge μ = κ Id”. When Λ = E, the quantifier “relative to” is dropped. 123 A convergent relaxation of the Douglas–Rachford algorithm 847 Metric subregularity has many important applications in variational analysis and mathematical optimization, see the monographs and papers [11,15–18,20,21,25,40, 44]. For the discussion of metric subregularity in connection with subtransversality of collections of sets, we refer the reader to [23,24,29,30]. The next theorem serves as the basic template for the quantitative convergence analysis of fixed point iterations. By the notation T : Λ ⇒ Λ where Λ is a subset of E, we mean that T : E ⇒ E and Tx ⊂ Λ for all x ∈ Λ. This simplification of notation should not lead to any confusion if one keeps in mind that there may exist fixed points of T that are not in Λ. For the importance of the use of Λ in isolating the desirable fixed point, we refer the reader to [1, Example 1.8]. In the following, ri Λ denotes the relative interior of Λ. Theorem 1 [38, Theorem 2.1] Let T : Λ ⇒ Λ for Λ ⊂ E and let S ⊂ ri Λ be closed and nonempty such that T y ⊂ Fix T ∩ S for all y ∈ S. Let O be a neighborhood of S such that O ∩ Λ ⊂ ri Λ. Suppose that (a) T is pointwise almost averaging at all points y ∈ S with violation ε and averaging constant α ∈ (0, 1) on O ∩ Λ, and (b) there exists a neighborhood V of Fix T ∩ S and a constant κ> 0 such that for all + + y ∈ S, y ∈ T y and all x ∈ T x the estimate + + κ dist (x , S) ≤ x − x − y − y (6) holds whenever x ∈ (O ∩ Λ) \ (V ∩ Λ). Then for all x ∈ Tx (1 − α)κ dist x , Fix T ∩ S ≤ 1 + ε − dist (x , S) whenever x ∈ (O ∩ Λ) \ (V ∩ Λ). εα In particular, if κ> , then for any initial point x ∈ O ∩ Λ the iteration 1−α x ∈ Tx satisfies k+1 k dist (x , Fix T ∩ S) ≤ c dist (x , S) k+1 0 (1−α)κ with c := 1 + ε − < 1 for all k such that x ∈ (O ∩ Λ) \ (V ∩ Λ) for j = 1, 2,..., k. Remark 2 [38, p. 13] In the case of S = Fix T condition (6) reduces to metric sub- regularity of the mapping F := T − Id for 0 on the annular set (O ∩ Λ) \ (V ∩ Λ), that is −1 κ dist (x , F (0)) ≤ dist (0, F (x )) ∀x ∈ (O ∩ Λ) \ (V ∩ Λ) . εα The inequality κ> then states that the constant of metric subregularity κ is 1−α sufficiently large relative to the violation of the averaging property of T to guarantee linear progression of the iterates through that annular region. 123 848 N. H. Thao For a comprehensive discussion on the roles of S and Λ in the analysis program of Theorem 1, we would like to refer the reader to the paper [38]. For the sake of simplification in terms of presentation, we have chosen to reduce the number of technical constants appearing in the analysis. It would be obviously analogous to formulate more theoretically general results by using more technical constants in appropriate places. 3 T as a fixed point operator We consider the problem of finding a fixed point of the operator T := T ((1 + λ)T − λ Id) − λ (T − Id) , (7) λ 1 2 2 where λ ∈[0, 1] and T : E ⇒ E (i = 1, 2) are assumed to be easily computed. Examples of T include the backward-backward and the DR algorithms [8,10,34, 36,43] for solving the structured optimization problem minimize f (x ) + f (x ) 1 2 x ∈E under different assumptions on the functions f (i = 1, 2). Indeed, when T are the i i prox mappings of f with parameters τ > 0, then T with λ = 0 and 1 takes the form i i λ T = prox ◦ prox , and T = prox 2prox − Id − prox + Id, λ λ τ , f τ , f τ , f τ , f τ , f 1 1 2 2 1 1 2 2 2 2 respectively. We first characterize the set of fixed points of T via those of the constituent oper- ators T (i = 1, 2). Proposition 1 Let T , T : E ⇒ E, λ ∈[0, 1] and consider T defined at (7). The 1 2 λ following statements hold true. (i) (1 + λ)T − λ Id = ((1 + λ)T − λ Id) ◦ ((1 + λ)T − λ Id). λ 1 2 As a consequence, Fix T = Fix ((1 + λ)T − λ Id) ◦ ((1 + λ)T − λ Id) . λ 1 2 (ii) Suppose that T = P is the projector on an affine set A and T is single-valued. 1 A 2 Then Fix T ={x ∈ E | P x = λT x + (1 − λ)x } λ A 2 ⊂{x ∈ E | P x = P T x }. (8) A A 2 Proof (i). We have by the construction of T that (1 + λ)T − λ Id = (1 + λ) (T ((1 + λ)T − λ Id) − λ(T − Id)) − λ Id λ 1 2 2 = (1 + λ)T ((1 + λ)T − λ Id) − λ [(1 + λ)T − λ Id] 1 2 2 = ((1 + λ)T − λ Id) ◦ ((1 + λ)T − λ Id) . 1 2 123 A convergent relaxation of the Douglas–Rachford algorithm 849 (ii). We first take an arbitrary x ∈ Fix T and prove that P x = P T x = λT x + (1 − λ)x . A A 2 2 Indeed, from x = T x, we get x = P ((1 + λ)T x − λx ) − λ(T x − x ) A 2 2 ⇔ λT x + (1 − λ)x = P ((1 + λ)T x − λx ) . (9) 2 A 2 In particular, λT x + (1 − λ)x ∈ A. Thus by equality (9) and the assumption that P 2 A is affine, we have P (λT x + (1 − λ)x ) = P ((1 + λ)T x − λx ) A 2 A 2 ⇔ λ P T x + (1 − λ) P x = (1 + λ) P T x − λ P x A 2 A A 2 A ⇔ P x = P T x . (10) A A 2 Substituting (10)into(9) also yields λT x + (1 − λ)x = (1 + λ) P T x − λ P x 2 A 2 A = (1 + λ) P x − λ P x = P x . A A A Finally, let us take an arbitrary x satisfying P x = λT x + (1 − λ)x and prove A 2 that x ∈ Fix T . Indeed, we note that λT x + (1 − λ)x ∈ A. Since P is affine, one λ 2 A can easily check (10) and then (9), which is equivalent to x ∈ Fix T . The proof is complete. The inclusion (8) in Proposition 1 can be strict as shown in the next example. 2 2 Example 1 Let us consider E = R ,the set A = (x , x ) ∈ R | x = 0 and the 1 2 1 1 2 two operators T = P and T x = x (∀x ∈ R ). Then for any point x = (x , 0) 1 A 2 1 with x = 0, we have P x = P T x = (0, 0) but P x = (0, 0) = (1 − λ/2)x = 1 A A 2 A λT x + (1 − λ)x, that is x ∈ / Fix T . 2 λ The next proposition shows that the almost averagedness of T naturally inherits from that of T and T via Krasnoselski–Mann relaxations. 1 2 Proposition 2 (Almost averagedness of T ) Let λ ∈[0, 1],T be almost averaging on λ i U ⊂ E with violation ε ≥ 0 and averaging constant α > 0 (i = 1, 2) and define i i i the set U := {x ∈ U | R x ⊂ U }. 2 T ,λ 1 Then T is almost averaging on U with violation ε = ε + ε + (1 + λ)ε ε and λ 1 2 1 2 2 max{α ,α } 1 2 averaging constant α = . 1+(1+λ) max{α ,α } 1 2 123 850 N. H. Thao Proof By the implication (i) ⇒ (ii) of Lemma 1, the operators R = (1+λ)T −λ Id T ,λ i are almost averaging on U with violation (1 + λ)ε and averaging constant (1 + λ)α i i i (i = 1, 2). Then thanks to [38, Proposition 2.4 (iii)], the operator T := R R is T ,λ T ,λ 1 2 almost averaging on U with violation (1 + λ) (ε + ε + (1 + λ)ε ε ) and averaging 1 2 1 2 2(1+λ) max{α ,α } 1 2 constant . Note that T = (1 + λ)T − λ Id by Proposition 1.We 1+(1+λ) max{α ,α } 1 2 have by the implication (ii) ⇒ (i) of Lemma 1 that T is almost averaging on U with 2 max{α ,α } 1 2 violation ε = ε + ε + (1 + λ)ε ε and averaging constant α = as 1 2 1 2 1+(1+λ) max{α ,α } 1 2 claimed. We next discuss convergence of T based on the abstract results established in [38]. Our agenda is to verify the assumptions of Theorem 1. To simplify the exposure in terms of presentation, we have chosen to state the results corresponding to S = Fix T and Λ = E in Theorem 1. In the sequel, we will denote, for a nonnegative real ρ, S := Fix T + ρB. ρ λ Theorem 2 (Convergence of algorithm T with metric subregularity) Let T be defined λ λ at (7), δ> 0 and γ ∈ (0, 1). Suppose that for each n ∈ N, the following conditions are satisfied. (i) T is almost averaging on S with violation ε ≥ 0 and averaging constant 2 γ δ 2,n n n α ∈ (0, 1), and T is almost averaging on the set S ∪ R S with 2,n 1 γ δ T ,λ γ δ violation ε ≥ 0 and averaging constant α ∈ (0, 1). 1,n 1,n (ii) The mapping T − Id is metrically subregular on D := S \ S n+1 for 0 with λ n γ δ γ δ gauge μ satisfying μ (dist (x , Fix T )) α ε n λ n n inf ≥ κ > , (11) x ∈ D dist (x , Fix T ) 1 − α n λ n 2 max{α ,α } 1,n 2,n where ε := ε + ε + (1 + λ)ε ε and α := . n 1,n 2,n 1,n 2,n n 1+(1+λ) max{α ,α } 1,n 2,n Then all iterations x ∈ T x starting in S satisfy k+1 λ k δ dist (x , Fix T ) →0(12) k λ and dist (x , Fix T ) ≤ c dist (x , Fix T ) ∀x ∈ D , (13) k+1 λ n k λ k n (1−α )κ where c := 1 + ε − < 1. n n (1−α )κ In particular, if − ε is bounded from below by some τ> 0 for all n sufficiently large, then the convergence (12) is R-linear with rate at most 1 − τ . Proof For each n ∈ N, we verify the assumptions of Theorem 1 for O = S , γ δ V = S and D = O \ V = S n \ S . Under assumption (i) of Theorem n+1 n+1 n γ δ γ δ γ δ 2, Proposition 2 ensures that T is almost averaging on S n with violation ε and λ γ δ n averaging constant α . In other words, condition (a) of Theorem 1 is satisfied with 123 A convergent relaxation of the Douglas–Rachford algorithm 851 ε = ε and α = α . Assumption (ii) of Theorem 2 also fulfills condition (b) of n n Theorem 1 with κ = κ in view of Remark 2. Theorem 1 then yields the conclusion of Theorem 2 after a straightforward care of the involving quantitative constants. The first inequality in (11) essentially says that the gauge function μ can be bounded from below by a linear function on the reference interval. Remark 3 In Theorem 2, the fundamental goal of formulating assumption (i) on the set S and assumption (ii) on the set D is that one can characterize sublinear γ δ n convergence of an iteration on S via linear progression of its iterates through each of the annular set D . This idea is based on the fact that for larger n, the almost averaging property of T on S is always improved but the metric subregularity on D may λ γ δ n get worse, however, if the corresponding quantitative constants still satisfy condition (11), then convergence is guaranteed. For an illustrative example, we refer the reader to [38, Example 2.4]. 4 Application to feasibility We consider algorithm T for solving feasibility problem involving two closed sets A, B ⊂ E, x ∈ T x = P ((1 + λ) P x − λx ) − λ ( P x − x ) λ A B B = P R (x ) − λ ( P x − x ) . (14) A P ,λ B Note that T with λ = 0 and 1 corresponds to the alternating projections P P and λ A B the DR method ( R ◦ R + Id), respectively. A B It is worth recalling that feasibility problem for m ≥ 2 sets can be reformulated as a feasibility problem for two constructed sets on the product space E with one of the later sets is a linear subspace, and the regularity properties in terms of both individual sets and collections of sets of the later sets are inherited from those of the former ones [3,32]. When A is an affine set, then the projector P is affine and T is a convex combi- A λ nation of the alternating projection and the DR methods since T x = P (1 − λ) P x + λ(2 P x − x ) − λ P x − x ( ) ( ) λ A B B B = (1 − λ) P P x + λ (x + P (2 P x − x ) − P x ) A B A B B = (1 − λ)T (x ) + λT (x ). 0 1 In this case, we establish convergence results for all convex combinations of the alternating projection and the DR methods. To our best awareness, this kind of results seems to be new. Recall that when applied to inconsistent feasibility problems the DR operator has no fixed points. We next show that the set of fixed points of T with λ ∈[0, 1) for convex inconsistent feasibility problems is nonempty. This result follows the lines of [36, Lemma 2.1] where the fixed point set of the RAAR operator is characterized. 123 852 N. H. Thao Proposition 3 (Fixed points of T for convex inconsistent feasibility) For closed con- vex sets A, B ⊂ E,let G = B − A, g = P 0,E = A ∩ ( B − g) and F = ( A + g) ∩ B. Then Fix T = E − g ∀λ ∈[0, 1). 1 − λ Proof We first show that E − g ⊂ Fix T .Pickany e ∈ E and denote f = e + g ∈ 1−λ F as definitions of E and F. We are checking that x := e − g ∈ Fix T . 1 − λ Since x = f − g and −g ∈ N ( f ), we get P x = f . B B 1−λ Analogously, since g ∈ N (e) and (1 + λ) P x − λx = (1 + λ) f − λx = e + g, 1 − λ we have P ((1 + λ) P x − λx ) = e. A B Hence, x − T x = x − P ((1 + λ) P x − λx ) + λ ( P x − x ) λ A B B = x − e + λ ( f − x ) = 0. That is x ∈ Fix T . We next show that Fix T ⊂ E − g.Pickany x ∈ Fix T .Let f = P x and λ λ B 1−λ y = x − f . Thanks to x ∈ Fix T and the definition of T , λ λ P ((1 + λ) P x − λx ) = λ( P x − x ) + x A B B =− λy + y + f = f + (1 − λ) y. (15) Now, for any a ∈ A, since A is closed and convex, we have 0 ≥ a − P ((1 + λ) P x − λx ), (1 + λ) P x − λx − P ((1 + λ) P x − λx ) A B B A B a − ( f + (1 − λ) y), (1 + λ) f − λx − ( f + (1 − λ) y) a − f − (1 − λ) y, − y = −a + f, y + (1 − λ)  y . On the other hand, for any b ∈ B, since B is closed and convex, we have b − f, y = b − f, x − f = b − P x , x − P x ≤ 0. B B Combining the last two inequalities yields b − a, y ≤−(1 − λ)  y ≤ 0 ∀a ∈ A, ∀b ∈ B. 123 A convergent relaxation of the Douglas–Rachford algorithm 853 Take a sequence (a ) in A and a sequence (b ) in B such that g := b − a → g. n n n n n Then g , y ≤−(1 − λ)  y ≤ 0 ∀n. (16) Taking the limit and using the Cauchy–Schwarz inequality yields y ≤ g . 1 − λ Conversely, by (15) with noting that f ∈ B and P ((1 + λ) P x − λx ) ∈ A, A B 1 1 y =  f − P ((1 + λ) P x − λx ) ≥ g . A B 1 − λ 1 − λ 1 1 Hence  y = g, and taking the limit in (16), which yields y =− g. Since 1−λ 1−λ f ∈ B and f − g = f + (1 − λ) y = P ((1 + λ) P x − λx ) ∈ A,wehave A B f − g ∈ A ∩ ( B − g) = E and, therefore, 1 λ λ x = f + y = f − g = f − g − g ∈ E − g. 1 − λ 1 − λ 1 − λ We next discuss the two key ingredients for convergence of algorithm T applied to feasibility problems: 1) almost averagedness of T , and 2) metric subregularity of T − Id. The two properties will be deduced from the (ε, δ)-regularity of the individual sets and the transversality of the collection of sets, respectively. The next proposition shows averagedness of T applied to feasibility problems involving (ε, δ)-regular sets. Proposition 4 Let A and B be (ε, δ)-regular at x¯ ∈ A ∩ B and define the set U := {x ∈ E | P x ⊂ B (x¯ ) and P R x ⊂ B (x¯ )}. (17) B δ A P ,λ δ Then T is pointwise almost averaging on U at every point z ∈ S := A ∩ B ∩ B (x¯ ) λ δ with averaging constant and violation 3+λ 2 2 2 ε ˜ := 2(2ε + 2ε ) + (1 + λ)(2ε + 2ε ) . (18) Proof Let us define the two sets U := { y ∈ E | P y ⊂ B (x¯)}, U := {x ∈ E | P x ⊂ B (x¯)} A A δ B B δ and note that x ∈ U if and only if x ∈ U and R x ⊂ U . Thanks to Lemma 2 (iii), B P ,λ A R and R are pointwise almost averaging at every point z ∈ S with violation P ,λ P ,λ A B 1+λ (1 + λ)(2ε + 2ε ) and averaging constant on U and U , respectively. Then A B due to [38, Proposition 2.4 (iii)], the operator T := R R is pointwise almost P ,λ P ,λ A B 123 854 N. H. Thao 2(1+λ) averaging on U at every point z ∈ S with averaging constant and violation 3+λ (1 + λ)ε ˜, where ε ˜ is given by (18). Note that T = (1 + λ)T − λ Id by Proposition 1. Thanks to Lemma 1, T is pointwise almost averaging on U at every point z ∈ S with violation ε ˜ and averaging constant as claimed. 3+λ Remark 4 It follows from Lemma 2 (i) & (iii) that the set U defined by (17) contains at least the ball B  (x¯), where δ := > 0. 2(1 + ε) 1 + (1 + λ)(2ε + 2ε ) We next integrate Proposition 4 into Theorem 2 to obtain convergence of algorithm T for solving consistent feasibility problems involving (ε, δ)-regular sets. Corollary 1 (Convergence of algorithm T for feasibility) Consider the algorithm T λ λ defined at (14) and suppose that Fix T = A ∩ B =∅. Denote S = Fix T + ρB for λ ρ λ a nonnegative real ρ. Suppose that there are δ> 0, ε ≥ 0 and γ ∈ (0, 1) such that A and B are (ε, δ )-regular at avery point z ∈ A ∩ B , where δ := 2δ(1 + ε) 1 + (1 + λ)(2ε + 2ε ), and for each n ∈ N, the mapping T − Id is metrically subregular on D := S n \ λ n γ δ S for 0 with gauge μ satisfying n+1 γ δ μ (dist (x , A ∩ B)) 2ε ˜ inf ≥ κ > , x ∈ D dist (x , A ∩ B) 1 + λ where ε ˜ is given at (18). Then all iterations x ∈ T x starting in S satisfy (12) and (13) with c := k+1 λ k δ n (1+λ)κ 1 +˜ ε − < 1. 2ε ˜ In particular, if (κ ) is bounded from below by some κ> for all n sufficiently 1+λ large, then (x ) eventually converges R-linearly to a point in A ∩ B with rate at most (1+λ)κ 1 +˜ ε − < 1. Proof Let any x ∈ D ,for some n ∈ N, x ∈ T x and x¯ ∈ P x. A combination of n λ A∩ B Proposition 4 and Remark 4 implies that T is pointwise almost averaging on B (x¯) at λ δ every point z ∈ A ∩ B ∩ B (x¯ ) with violation ε ˜ given by (18) and averaging constant . In other words, condition (a) of Theorem 1 is satisfied. Condition (b) of Theorem 3+λ 1 is also fulfilled by the same argument as the one used in Theorem 2. The desired conclusion now follows from Theorem 1. In practice, the metric subregularity assumption is often more challenging to be verified than the averaging property. In the concrete example of consistent alternating projections P P , that metric subregularity condition holds true if and only if the A B collection of sets is subtransversal. We next show that the metric subregularity of 123 A convergent relaxation of the Douglas–Rachford algorithm 855 T − Id can be deduced from the transversality of the collection of sets { A, B}.As a result, if the sets are also sufficiently regular, then local linear convergence of the iteration x ∈ T x is guaranteed. k+1 λ k We first describe the concept of relative transversality of collections of sets. In the sequel, we set Λ := aff( A ∪ B), the smallest affine set in E containing both A and B. Assumption 3 The collection { A, B} is transversal at x¯ ∈ A ∩ B relative to Λ with ¯ ¯ constant θ< 1, that is, for any θ ∈ (θ, 1), there exists δ> 0 such that u,v ≥−θ u · v prox prox holds for all a ∈ A ∩ B (x¯ ), b ∈ B ∩ B (x¯), u ∈ N (a) and v ∈ N (b). δ δ A|Λ B|Λ Thanks to [22, Theorem 1] and [28, Theorem 1], Assumption 3 also ensures 1−θ subtransversality of { A, B} at x¯ relative to Λ with constant at least on the neighborhood B (x¯ ), that is 1 − θ dist (x , A ∩ B) ≤ max{dist (x , A), dist (x , B)}∀x ∈ Λ ∩ B (x¯ ). (19) The next lemma is at the heart of our subsequent discussion. Lemma 3 Suppose that Assumption 3 is satisfied. Then for any θ ∈ (θ, 1), there exists a number δ> 0 such that for all x ∈ B (x¯) and x ∈ T x, δ λ κ dist (x , A ∩ B) ≤ x − x , (20) where κ is defined by (1 − θ) 1 + θ κ :=   > 0. (21) √ √ 2max 1,λ + 1 − θ Proof For any θ ∈ (θ, 1), there is a number δ> 0 satisfying the property described in Assumption 3. Let us set δ = δ/6 and show that condition (20) is fulfilled with δ . Indeed, let us consider any x ∈ B (x¯ ), b ∈ P x, y = (1 + λ)b − λx, a ∈ P y and B A x = a − λ(b − x ) ∈ T x. From the choice of δ , it is clear that a, b ∈ B (x¯ ). Since λ δ prox prox x − b ∈ N (b) and y − a ∈ N (a), Assumption 3 yields that B|Λ A|Λ x − b, y − a ≥−θ x − b ·  y − a . (22) 123 856 N. H. Thao By the definition of T ,wehave + 2 x − x = x − b + y − a 2 2 = x − b +  y − a + 2 x − b, y − a 2 2 ≥ x − b +  y − a − 2θ x − b ·  y − a 2 2 2 2 ≥ 1 − θ x − b = 1 − θ dist (x , B), (23) where the first inequality follows from (22). We will take care of the two possible cases regarding dist (x , A) as follows. Case 1 dist (x , A) ≤ λ + 1 − θ dist (x , B). Thanks to (23) we get 1 − θ + 2 x − x ≥ dist (x , A). (24) √ 2 λ + 1 − θ Case 2 dist (x , A)> λ + 1 − θ dist (x , B). By the triangle inequality and the construction of T , we get + + x − x ≥ x − a − a − x = x − a − λ x − b ≥ dist (x , A) − λ dist (x , B) ≥ 1 − √ dist (x , A). (25) λ + 1 − θ Since 1 − θ λ = 1 − √ , √ 2 λ + 1 − θ λ + 1 − θ we always have from (24) and (25) that 1 − θ + 2 x − x ≥ dist (x , A). (26) √ 2 λ + 1 − θ Combining (23), (26) and (19), we obtain 1 − θ + 2 2 x − x ≥ max dist (x , A), dist (x , B) √ 2 max 1, λ + 1 − θ (1 − θ )(1 − θ) ≥ dist (x , A ∩ B), √ 2 2max 1, λ + 1 − θ which yields (20)asclaimed. 123 A convergent relaxation of the Douglas–Rachford algorithm 857 In the special case that λ = 1, Lemma 3 refines [13, Lemma 3.14] and [45, Lemma 4.2] where the result was proved for the DR operator with an additional assumption on regularity of the sets. The next result is the final preparation for our linear convergence result. Lemma 4 [45, Proposition 2.11] Let T : E ⇒ E,S ⊂ E be closed and x¯ ∈ S. Suppose that there are δ> 0 and c ∈[0, 1) such that for all x ∈ B (x¯ ),x ∈ T x and z ∈ P x, x − z ≤ c x − z . (27) Then every iteration x ∈ Tx starting sufficiently close to x¯ converges R-linearly k+1 k to a point x˜ ∈ S ∩ B (x¯). In particular, x −¯ x  (1 + c) x −˜ x  ≤ c . 1 − c We are now ready to prove local linear convergence for algorithm T which gener- alizes the corresponding results established in [13,45] for the DR method. Theorem 4 (Linear convergence of algorithm T for feasibility) In addition to (1+λ)κ Assumption 3, suppose that A and B are (ε, δ)-regular at x¯ with ε< ˜ , where ε ˜ and κ are given by (18) and (21), respectively. Then every iteration x ∈ T x k+1 λ k starting sufficiently close to x¯ converges R-linearly to a point in A ∩ B. Proof Assumption 3 ensures the existence of δ > 0 such that Lemma 3 holds true. In view of Proposition 4 and Remark 4, one can find a number δ > 0 such that T is 2 λ pointwise almost averaging on B (x¯ ) at every point z ∈ A ∩ B ∩ B (x¯ ) with violation δ δ 2 2 ε ˜ given by (18) and averaging constant . Define δ = min{δ ,δ } > 0. 1 2 3+λ Now let us consider any x ∈ B  (x¯ ), x ∈ T x and z ∈ P x. It is clear that λ A∩ B δ /2 z ∈ B (x¯ ). Proposition 4 and Lemma 3 then respectively yield 1 + λ 2 2 + 2 + x − z ≤ (1 +˜ ε) x − z − x − x , (28) + 2 2 2 2 x − x ≥ κ dist (x , A ∩ B) = κ x − z , (29) where κ is given by (21). Substituting (29)into(28), we get (1 + λ)κ + 2 x − z ≤ 1 +˜ ε − x − z , which yields condition (27)ofLemma 4 and the desired conclusion now follows from this lemma. 5 Application to sparse optimization Our goal in this section is twofold: 1) to illustrate the linear convergence of algorithm T formulated in Theorem 4 via the sparse optimization problem, and 2) to demonstrate 123 858 N. H. Thao a promising performance of the algorithm T in comparison with the RAAR algorithm for this applied problem. 5.1 Sparse optimization We consider the sparse optimization problem min x  subject to Mx = b, (30) x ∈R m×n m where M ∈ R (m < n) is a full rank matrix, b is a given vector in R , and x  is the number of nonzero entries of the vector x. The sparse optimization problem with complex variable is defined analogously by replacing R by C everywhere in the above model. Many strategies for solving (30) have been proposed. We refer the reader to the famous paper by Candès and Tao [9] for solving this problem by using convex relax- ations. On the other hand, assuming to have a good guess on the sparsity of the solutions to (30), one can tackle this problem by solving the sparse feasibility problem [14]of finding x¯ ∈ A ∩ B, (31) n n where A := {x ∈ R | x  ≤ s} and B := {x ∈ R | Mx = b}. It is worth mentioning that the initial guess s of the true sparsity is not numerically sensitive with respect to various projection methods, that is, for a relatively wide range of values of s above the true sparsity, projection algorithms perform very much in the same nature. Note also that the approach via sparse feasibility does not require convex relaxations of (30) and thus can avoid the likely expensive increase of dimensionality. We run the two algorithms T and RAAR to solve (31) and compare their numerical performances. By taking s smaller than the true sparsity, we can also compare their performances for inconsistent feasibility. Since B is affine, there is the closed algebraic form for the projector P , † n P x = x − M (Mx − b) ∀x ∈ R , † T T −1 where M := M (MM ) is the Moore–Penrose inverse of M. We have denoted M the transpose matrix of M and taken into account that M is full rank. There is also a closed form for P [6]. For each x ∈ R , let us denote I (x ) the set of all s-tubles A s of indices of s largest in absolute value entries of x.The set I (x ) can contain multiple such s-tubles. The projector P can be described as x (k) if k ∈ I, P x = z ∈ R |∃ I ∈ I (x ) such that z(k) = . A s 0else For convenience, we recall the two algorithms in this specific setting 123 A convergent relaxation of the Douglas–Rachford algorithm 859 RA A R = β P (2 P − Id) + (1 − 2β) P + β Id, β A B B T = P ((1 + λ) P − λ Id) − λ( P − Id). λ A B B 5.2 Convergence analysis We analyze the convergence of algorithm T for the sparse feasibility problem (31). The next theorem establishes local linear convergence of algorithm T for solving sparse feasibility problems. Theorem 5 (Linear convergence of algorithm T for sparse feasibility) Let x¯ = (x¯ ) ∈ λ i A ∩ B and suppose that s is the sparsity of the solutions to the problem (30). Then any iteration x ∈ T x starting sufficiently close to x¯ converges R-linearly to x. ¯ k+1 λ k Proof We first show that x¯ is an isolated point of A ∩ B. Since s is the sparsity of the solutions to (30), we have that x¯  = s and the set I (x¯ ) contains a unique element, denoted I . Note that E := span{e : i ∈ I } is the unique s-dimensional space x¯ x¯ i x¯ component of A containing x¯, where {e : 1 ≤ i ≤ n} is the canonical basic of R . s i Let us denote δ := min |¯ x | > 0. i ∈I x¯ We claim that A ∩ B (x¯ ) = E ∩ B (x¯ ), (32) s δ x¯ δ E ∩ B ={x¯}. (33) x¯ Indeed, for any x = (x ) ∈ A ∩ B (x¯ ), we have by definition of δ that x = 0for i s δ i all i ∈ I . Hence x  = s and x ∈ E ∩ B (x¯ ). This proves (32). x¯ 0 x¯ δ For (33), it suffices to show the singleton of E ∩ B since we already know that x¯ x¯ ∈ E ∩ B. Suppose otherwise that there exists x = (x ) ∈ E ∩ B with x =¯ x x¯ i x¯ j j for some index j. Since both E and B are affine, the intersection E ∩ B contains x¯ x¯ the line {x + t (x¯ − x ) : t ∈ R} passing x and x¯. In particular, it contains the point z := x + (x¯ − x ). Then we have that z ∈ B and z ≤ s − 1as z = 0. This 0 j x −¯ x j j contradicts to the assumption that s is the sparsity of the solutions to (30), and hence (33) is proved. A combination of (32) and (33) then yields A ∩ B ∩ B (x¯ ) = E ∩ B ∩ B (x¯ ) ={x¯ }. (34) s δ x¯ δ This means that x¯ is an isolated point of A ∩ B as claimed. Moreover, the equalities in (34) imply that P x = P x ∀x ∈ B (x¯ ). A E δ/2 s x¯ Therefore, for any starting point x ∈ B (x¯), the iteration x ∈ T x for solving 0 δ/2 k+1 λ k (31) is identical to that for solving the feasibility problem for the two sets E and B. x¯ 123 860 N. H. Thao Since E and B are two affine subspaces intersecting at the unique point x¯ by (33), x¯ the collection of sets { E , B} is transversal at x¯ relative to the affine hull aff( E ∪ B). x¯ x¯ Theorem 4 now can be applied to conclude that the iteration x ∈ T x converges k+1 λ k R-linearly to x¯. The proof is complete. It is worth mentioning that the convergence analysis in Theorem 5 is also valid for the RAAR algorithm. 5.3 Numerical experiment We now set up a toy example as in [9,14] which involves an unknown true object x¯ ∈ R with x¯  = 328 (the sparsity rate is .005). Let b be 1/8 of the measurements of F (x¯),the Fourier transform of x¯, with the sample indices denoted J .The Poisson noise was added when calculating the measurement b. Note that since x¯ is real, F (x¯) is conjugate symmetric, we indeed have nearly a double number of measurements. In this setting, we have B ={x ∈ C | F (x )(k) = b(k), ∀k ∈ J }, and the two prox operators, respectively, take the forms Re (x (k)) if k ∈ I, P x = z ∈ R |∃ I ∈ I (x ) such that z(k) = , A s 0else b(k) if k ∈ J , −1 P x = F (xˆ), where xˆ (k) = F (x )(k) else, −1 where Re(x (k)) denotes the real part of the complex number x (k), and F is the inverse Fourier transform. The initial point was chosen randomly, and a warm-up procedure with 10 DR iterates was performed before running the two algorithms. The stopping criterion + −10 x − x < 10 was used. We have used the Matlab ProxToolbox [37] to run this numerical experiment. The parameters were chosen in such a way that the performance is seemingly optimal for both algorithms. We chose β = .65 for the RAAR algorithm and λ = .45 for algorithm T in the case of consistent feasibility problem correspond- ing to s = 340, and β = .6 for the RAAR algorithm and λ = .4 for algorithm T in the case of inconsistent feasibility problem corresponding to s = 310. The change of distances between two consecutive iterates is of interest. When linear convergence appears to be the case, it can yield useful information of the convergence rate. Under the assumption that the iterates will remain in the convergence area, one can obtain error bounds for the distance from the current iterate to a nearest solution. We also pay attention to the gaps in iterates that in a sense measure the infeasibility at the iterates. If we think feasibility problem as the problem of minimizing the sum of the squares of the distance functions to the sets, then gaps in iterates are the values of that function evaluated at the iterates. For the two algorithms under consideration, 123 A convergent relaxation of the Douglas–Rachford algorithm 861 consistent feasibility consistent feasibility 20 20 10 10 RAAR RAAR 0 0 10 10 -20 -20 10 10 0 50 100 150 0 50 100 150 iteration iteration inconsistent feasibility inconsistent feasibility 20 10 RAAR RAAR 0 0 10 10 -20 -10 10 10 0 50 100 150 0 50 100 150 iteration iteration Fig. 1 Performances of the RAAR and T algorithms for sparse feasibility problem: iterate changes in consistent case (top-left), iterate gaps in consistent case (top-right), iterate changes in inconsistent case (bottom-left) and iterate gaps in inconsistent case (bottom-right) the iterates are themselves not informative but their shadows, by which we mean the projections of the iterates on one of the sets. Hence, the gaps in iterates are calculated for the iterate shadows instead of the iterates themselves. Figure 1 summarizes the performances of the two algorithms for both consistent and inconsistent sparse feasibility problems. We first emphasize that the algorithms appear to be convergent in both cases of feasibility. For the consistent case, algorithm T appears to perform better than the RAAR algorithm in terms of both the iterate changes and gaps. Also, the CPU time of algorithm T is around 10% less than that of the RAAR algorithm. For the inconsistent case, we have a similar observation except that the iterate gaps for the RAAR algorithm are slightly better (smaller) than those for algorithm T . Extensive numerical experiments in imaging problems illustrating the empirical performance of algorithm T will be the future work. Acknowledgements The author would like to thank Prof. Dr. Russell Luke and Prof. Dr. Alexander Kruger for their encouragement and valuable suggestions during the preparation of this work. He also would like to thank the anonymous referees for their very helpful and constructive comments on the manuscript version of the paper. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 Interna- tional License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. change in iterates change in iterates log of gap in iterates log of gap in iterates 862 N. H. Thao References 1. Aspelmeier, T., Charitha, C., Luke, D.R.: Local linear convergence of the ADMM/Douglas–Rachford algorithms without strong convexity and application to statistical imaging. SIAM J. Imaging Sci. 9(2), 842–868 (2016) 2. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010) 3. Bauschke, H.H., Borwein, J.M.: On projection algorithms for solving convex feasibility problems. SIAM Rev. 38(3), 367–426 (1996) 4. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011) 5. Bauschke, H.H., Luke, D.R., Phan, H.M., Wang, X.: Restricted normal cones and the method of alternating projections: applications. Set-Valued Var. Anal. 21, 475–501 (2013) 6. Bauschke, H.H., Luke, D.R., Phan, H.M., Wang, X.: Restricted normal cones and sparsity optimization with affine constraints. Found. Comput. Math. 14, 63–83 (2014) 7. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014) 8. Borwein, J.M., Tam, M.K.: The cyclic Douglas–Rachford method for inconsistent feasibility problems. J. Nonlinear Convex Anal. 16(4), 537–584 (2015) 9. Candés, E., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51(12), 4203–4215 (2005) 10. Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Fixed-Point Algo- rithms for Inverse Problems in Science and Engineering, vol. 49. Springer, Berlin, pp. 185–212 (2011) 11. Dontchev, A.L., Rockafellar, R.T.: Implicit Functions and Solution Mapppings. Srpinger, New York (2014) 12. Drusvyatskiy, D., Ioffe, A.D., Lewis, A.S.: Transversality and alternating projections for nonconvex sets. Found. Comput. Math. 15(6), 1637–1651 (2015) 13. Hesse, R., Luke, D.R.: Nonconvex notions of regularity and convergence of fundamental algorithms for feasibility problems. SIAM J. Optim. 23(4), 2397–2419 (2013) 14. Hesse, R., Luke, D.R., Neumann, P.: Alternating projections and Douglas–Rachford for sparse affine feasibility. IEEE Trans. Signal. Process. 62(18), 4868–4881 (2014) 15. Ioffe, A.D.: Metric regularity and subdifferential calculus. Russian Math. Surv. 55(3), 501–558 (2000) 16. Ioffe, A.D.: Regularity on a fixed set. SIAM J. Optim. 21(4), 1345–1370 (2011) 17. Ioffe, A.D.: Nonlinear regularity models. Math. Program. 139(1–2), 223–242 (2013) 18. Ioffe, A.D.: Metric regularity: a survey. Part I. Theory. J. Aust. Math. Soc. 101(2), 188–243 (2016) 19. Khanh, Phan Q., Kruger, A.Y., Thao, Nguyen H.: An induction theorem and nonlinear regularity models. SIAM J. Optim. 25(4), 2561–2588 (2015) 20. Klatte, D., Kummer, B.: Nonsmooth Equations in Optimization. Kluwer, Dordrecht (2002) 21. Klatte, D., Kummer, B.: Optimization methods and stability of inclusions in Banach spaces. Math. Program. 117(1–2), 305–330 (2009) 22. Kruger, A.Y.: Stationarity and regularity of set systems. Pac. J. Optim. 1(1), 101–126 (2005) 23. Kruger, A.Y.: About regularity of collections of sets. Set-Valued Anal. 14, 187–206 (2006) 24. Kruger, A.Y.: About stationarity and regularity in variational analysis. Taiwan. J. Math. 13(6A), 1737– 1785 (2009) 25. Kruger, A.Y.: Error bounds and metric subregularity. Optimization 64(1), 49–79 (2015) 26. Kruger, A.Y., Luke, D.R., Thao, Nguyen H.: Set regularities and feasibility problems. Math. Program. B. https://doi.org/10.1007/s10107-016-1039-x 27. Kruger, A.Y., Luke, D.R., Thao, Nguyen H.: About subtransversality of collections of sets. Set-Valued Var. Anal. 25(4), 701–729 (2017) 28. Kruger, A.Y., Thao, Nguyen H.: About uniform regularity of collections of sets. Serdica Math. J. 39, 287–312 (2013) 29. Kruger, A.Y., Thao, Nguyen H.: About [q]-regularity properties of collections of sets. J. Math. Anal. Appl. 416(2), 471–496 (2014) 30. Kruger, A.Y., Thao, Nguyen H.: Quantitative characterizations of regularity properties of collections of sets. J. Optim. Theory Appl. 164, 41–67 (2015) 123 A convergent relaxation of the Douglas–Rachford algorithm 863 31. Kruger, A.Y., Thao, Nguyen H.: Regularity of collections of sets and convergence of inexact alternating projections. J. Convex Anal. 23(3), 823–847 (2016) 32. Lewis, A.S., Luke, D.R., Malick, J.: Local linear convergence of alternating and averaged projections. Found. Comput. Math. 9(4), 485–513 (2009) 33. Lewis, A.S., Malick, J.: Alternating projections on manifolds. Math. Oper. Res. 33, 216–234 (2008) 34. Li, G., Pong, T.K.: Douglas–Rachford splitting for nonconvex feasibility problems. Math. Program. 159(1), 371–401 (2016) 35. Luke, D.R.: Relaxed averaged alternating reflections for diffraction imaging. Inverse Problems 21, 37–50 (2005) 36. Luke, D.R.: Finding best approximation pairs relative to a convex and a prox-regular set in Hilbert space. SIAM J. Optim. 19(2), 714–739 (2008) 37. Luke, D.R.: ProxToolbox. http://num.math.uni-goettingen.de/proxtoolbox (2017). Accessed Aug 2017 38. Luke, D.R., Thao, Nguyen H., Tam, M.K.: Quantitative convergence analysis of iterated expansive, set-valued mappings. Math. Oper. Res. https://doi.org/10.1287/moor.2017.0898 39. Luke, D.R., Thao, Nguyen H., Teboulle, M.: Necessary conditions for linear convergence of Picard iterations and application to alternating projections. https://arxiv.org/pdf/1704.08926.pdf (2017) 40. Mordukhovich, B.S.: Variational Analysis and Generalized Differentiation. I: Basic Theory. Springer, Berlin (2006) 41. Moreau, J.-J.: Fonctions convexes duales et points proximaux dans un espace Hilbertien. Comptes Rendus de l’Académie des Sciences de Paris 255, 2897–2899 (1962) 42. Noll, D., Rondepierre, A.: On local convergence of the method of alternating projections. Found. Comput. Math. 16(2), 425–455 (2016) 43. Patrinos, P., Stella, L., Bemporad, A.: Douglas-Rachford splitting: Complexity estimates and acceler- ated variants. In: 53rd IEEE Conference on Decision and Control, pp. 4234–4239 (2014) 44. Penot, J.-P.: Calculus Without Derivatives. Springer, New York (2013) 45. Phan, H.M.: Linear convergence of the Douglas–Rachford method for two closed sets. Optimization 65, 369–385 (2016) 46. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Grundlehren Math. Wiss. Springer, Berlin (1998)

Journal

Computational Optimization and ApplicationsSpringer Journals

Published: Mar 6, 2018

References

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off