# A semismooth Newton method with analytical path-following for the $$H^1$$ -projection onto the Gibbs simplex

A semismooth Newton method with analytical path-following for the $$H^1$$ -projection onto the... Abstract An efficient, function-space-based second-order method for the $$H^1$$-projection onto the Gibbs simplex is presented. The method makes use of the theory of semismooth Newton methods in function spaces as well as Moreau–Yosida regularization and techniques from parametric optimization. A path-following technique is considered for the regularization parameter updates. A rigorous first- and second-order sensitivity analysis of the value function for the regularized problem is provided to justify the update scheme. The viability of the algorithm is then demonstrated for two applications found in the literature: binary image inpainting and labeled data classification. In both cases, the algorithm exhibits mesh-independent behavior. 1. Introduction In many modern applications such as image inpainting (Bertozzi et al., 2007; Burger et al., 2009), topology optimization (Blank et al., 2012, 2014), multiclass data segmentation (Garcia-Cardona et al., 2014) and multiphase fluid dynamics (Van Wachem & Almstedt, 2003; Stinner et al., 2004), the associated variational problems contain multiple phase-field functions, denoted here by $$u_1,\dots , u_n$$, that fulfill the conditions $$u_i \geqslant 0$$ and $$\sum u_i =1$$ (in a pointwise almost everywhere (a.e.) sense). This typically leads to situations in which one needs to project onto the so-called Gibbs simplex in the Sobolev space $$H^1(\varOmega )$$. Mathematically speaking, this involves the solution of the following infinite dimensional optimization problem: $$\min\left\{\frac{1}{2}\|u - \varphi\|^2_{\mathbb{V}} \textrm{ over } u \in \mathbb V \;\Big|\; u \in \mathcal{G} \right\},$$ (P) where $$\mathcal{G} \subset H^1(\varOmega )$$, given by $$\mathcal{G} := \left\{ u \in \mathbb V \left| \; \mathbf{1}^\top u = 1,\; u_i \geqslant 0, i =1,\dots,n \right.\right\},$$ is the Gibbs simplex (n ⩾ 2). Here, we let $$\varOmega \subset \mathbb R^{d}$$ be nonempty, open and bounded with Lipschitz boundary $$\varGamma$$ and set $$V:= H^1(\varOmega )$$ and $$\mathbb V:= V^n$$. In addition, we fix some $$\varphi \in \mathbb V$$ and note that $$\mathbf{1}$$ is the column vector of ones in $$\mathbb R^n$$. The conditions on the functions u in $$\mathcal{G}$$ are always understood pointwise a.e. Moreover, since $$\varOmega$$ is bounded, $$\mathcal{G}$$ is in fact a nonempty, closed and convex subset in $$H^1(\varOmega )^n \cap L^{\infty }(\varOmega )^n$$, which is also bounded in $$L^\infty (\varOmega )^n$$. Although the constraint set $$\mathcal{G}$$ is formulated with pointwise a.e. constraints, it is crucial that the correct norm, in this case the $$H^1$$-norm, is used in the projection. Indeed, let $$\varGamma$$ be smooth, n = 2 and $$\varphi \in H^2(\varOmega )^2$$ such that $$\nabla [\varphi _1 + \varphi _2 - 1]\cdot n|_{\varGamma } = 0$$. Then (P) reduces to solving the (one-dimensional) obstacle problem in $$H^1(\varOmega )$$ given by $$\min\left\{\|u\|^2_{\mathbb{V}} - (f, u)_{L^2} \textrm{ over } u \in \mathbb V \;\Big|\; 0 \leqslant u \leqslant 1\right\},$$ where, setting $$\widetilde{\varphi } := \varphi _1 + \varphi _2 - 1$$, the function $$f = -\varDelta \widetilde{\varphi } +\widetilde{\varphi }\in L^2(\varOmega )$$. It is well known, cf. Brézis & Stampacchia (1968); Kinderlehrer & Stampacchia (2000), that the unique optimal solution $$\overline{u}$$ is in $$H^2(\varOmega )$$. Furthermore, the pointwise/thresholding/$$L^2$$-projection of $$\varphi$$, given by $$\varphi + \max(0,-\varphi) - \max(0,\varphi - 1),$$ is feasible, but only in $$H^1(\varOmega )$$; as the pointwise $$\max (0,\cdot )$$-operator does not preserve $$H^2$$-regularity. Hence, the pointwise projection of $$\varphi$$ onto the Gibbs simplex is feasible, yet it cannot be the solution to the original problem. Following the approach in the studies by Hintermüller & Kunisch (2006a,b), we approximate (P) by replacing $$\mathcal{G}$$ with a Moreau–Yosida-type regularization of the associated indicator function. This yields the approximation $$\theta(\gamma) := \min\left\{ \frac{1}{2}\|u - \varphi\|^2_{\mathbb{V}} + \frac{\gamma}{2}\sum_{i=1}^n\int_{\varOmega}(-u_i)^2_+\mathrm{d}x + \frac{\gamma}{2}\int_{\varOmega}\left|\mathbf{1}^\top u - 1\right|{}^2\mathrm{d}x \textrm{ over } u \in \mathbb V \right\}.$$ (Pγ) The associated first-order system can be solved by a semismooth Newton method in function space. In order to increase numerical efficiency by avoiding excess parameter updates, we derive a full differential sensitivity analysis of the optimal value function $$\theta (\gamma )$$. These results lead to an explicit parameter update scheme referred to as ‘path-following’ (PF). We note that our analysis is much more compact than in the studies by Hintermüller & Kunisch (2006a,b). Moreover, no assumptions on the active sets are needed to derive second-order properties of $$\theta (\gamma )$$. A further important factor, which motivates this work, arises from the fact that a first-discretize-then-optimize approach will typically exhibit mesh dependence under adaptive refinements (cf. the examples in Section 5). The main culprit for this aspect is due to a lack of multiplier regularity for the constraints in $$\mathcal{G}$$. Using a Moreau–Yosida-type regularization increases the regularity (smoothness) of the problem, which, in particular, leads to superlinear convergence of semismooth Newton methods in function space for the regularized problems. Moreover, by deriving an efficient approximation of the primal–dual path value functional $$\theta$$, we avoid problems connected with ill-conditioning. Before discussing the outline of the paper, which is inspired by the frequent occurrence of the Gibbs simplex in applications, we note that our approach readily extends to general polygonal constraints as in, e.g., Kunisch et al. (2010); Kunisch & Lu (2013). For n = 1 and more general convex constraints handled by Moreau–Yosida regularization, see Keuthen & Ulbrich (2015). The rest of the paper is structured as follows. In Section 2 we analyze the parameter dependent problem (Pγ), which lays the foundation for the PF. Afterwards, we suggest a PF scheme and present the algorithm in Sections 3 and 4. Finally, in Section 5, we demonstrate the performance of the algorithm on two applications taken from the literature. This is done for both PF schemes and a comparison is made to a mesh-dependent scheme based on the direct solution of the first-order system of (P). 2. Sensitivity analysis As indicated in the introduction, we need to analyze the first- and second-order differentiability properties of the parameter dependent quantities related to (Pγ) in order to derive a PF scheme. We thus begin this section with some basic facts about the optimal solution mapping and optimal value functions associated with (P) and (Pγ). Throughout the discussion, we use • to indicate the dependence on the penalty parameter $$\gamma> 0$$ and for readability let $$\beta(u) := \frac12\sum_{i=1}^n\int_{\varOmega}(-u_i)^2_+\mathrm{d}x + \frac12\int_{\varOmega}\left|\mathbf{1}^\top u - 1\right|{}^2\mathrm{d}x.$$ Theorem 2.1 The following facts hold: (P) has a unique solution $$\bar{u}$$ and, for all $$\gamma> 0$$, (Pγ) has a unique solution $$\bar{u}^{\gamma }$$. The optimal solution mapping $$\bar{u}^{\bullet }: (0,\infty ) \to \mathbb V$$ is globally Hölder continuous with exponent 1/2 and locally Lipschitz continuous. For any sequence $$\gamma _k \to +\infty$$, the sequence $$\left \{\bar{u}^k\right \}$$ with $$\bar{u}^k := \bar{u}^{\gamma _k}$$ converges strongly to $$\bar{u}$$ in $$\mathbb V$$. The optimal value function $$\theta : (0,\infty ) \to \mathbb R$$ is non-negative, monotonically increasing with $$\lim _{\gamma \downarrow 0} \theta (\gamma )= 0$$, locally Lipschitz continuous and bounded from above by $$\theta _{\infty }$$, the optimal value of (P). If there exists s > 0 such that $$\beta (\bar{u}^{\gamma }) = \mathcal{O}(\gamma ^{-1-s})$$, then $$\| \bar{u}^{\gamma }- \overline{u}\|_{\mathbb{V}} = \mathcal{O}(\gamma ^{-s/2})$$. Proof. To see that item 1 holds, consider that both (P) and (Pγ) involve the minimization of a strongly convex continuous functional over a nonempty closed set in a real Hilbert space, both problems admit unique solutions $$\bar{u}$$ and $$\bar{u}^{\gamma }$$, respectively. To prove continuity of $$\bar{u}^{\bullet }$$ on $$(0,\infty )$$, we need several results. By definition of a global optimum, we have $$\frac{1}{2}\|\bar{u}^{\gamma} - \varphi\|^2_{\mathbb{V}} + \gamma\beta(\bar{u}^{\gamma})\leqslant \frac{1}{2}\|u - \varphi\|^2_{\mathbb{V}} + \gamma\beta(u),\,\,\forall u \in \mathbb V.$$ Therefore, it follows from the feasibility of $$\bar{u}$$ and non-negativity of the objectives that $$0 \leqslant \theta(\gamma) \leqslant \frac{1}{2}\|\bar{u} - \varphi\|^2_{\mathbb{V}} = \theta_{\infty},\,\,\forall \gamma> 0.$$ (2.1) Furthermore, since $$\beta (\cdot ) \geqslant 0$$, we deduce the uniform boundedness in $$H^1(\varOmega )^n$$ of the set $$\left \{\bar{u}^{\gamma }\right \}_{\gamma> 0}$$. Let $$\gamma _1, \gamma _2 \in (0,\infty )$$ and define the associated optimal solutions of (Pγ) by $$\bar{u}^{\gamma _i}$$, i = 1, 2. Since $$\beta : \mathbb V \to \mathbb R$$ is continuous and convex, it is subdifferentiable. It follows then from the first-order optimality conditions for (Pγ) that $$0 \in \partial\left(\frac{1}{2}\|\cdot - \varphi\|^2_{\mathbb{V}} + \gamma_i \beta(\cdot)\right)(\bar{u}^{\gamma_i}) = \partial\left(\frac{1}{2}\|\cdot - \varphi\|^2_{\mathbb{V}}\right)(\bar{u}^{\gamma_i}) + \gamma_i \partial \beta(\cdot)(\bar{u}^{\gamma_i}) \Longrightarrow -v_i \in \gamma_i \partial \beta(\cdot)(\bar{u}^{\gamma_i}),$$ where $$v_i \in \partial \left (\frac{1}{2}\|\cdot - \varphi \|^2_{\mathbb{V}}\right )(\bar{u}^{\gamma _i}) = A(\bar{u}^{\gamma _i}-\varphi )$$ and A is the (uniformly elliptic) bounded linear operator associated with the inner product on $$\mathbb V$$. Note that due to convexity and continuity, the sum rule for convex subdifferentials holds, cf. Ioffe & Tihomirov (1979). By definition of the subdifferential, we now have $$\gamma_i \beta(z) \geqslant \gamma_i \beta(\bar{u}^{\gamma_i}) + \langle -v_i, z - \bar{u}^{\gamma_i}\rangle,\,\,\forall z \in \mathbb V,\,\, i =1,2.$$ (2.2) For i = 1, we set $$z = \bar{u}^{\gamma _2}$$ and for i = 2, we set $$z = \bar{u}^{\gamma _1}$$ in (2.2). Adding the resulting terms yields $$\gamma_1 \beta(\bar{u}^{\gamma_2}) + \gamma_2 \beta(\bar{u}^{\gamma_1})\geqslant \gamma_1 \beta(\bar{u}^{\gamma_1}) + \gamma_2 \beta(\bar{u}^{\gamma_2}) + \langle -v_1, \bar{u}^{\gamma_2} - \bar{u}^{\gamma_1}\rangle + \langle -v_2, \bar{u}^{\gamma_1} - \bar{u}^{\gamma_2}\rangle.$$ Hence, we have $$|\gamma_1 - \gamma_2| | \beta(\bar{u}^{\gamma_1})-\beta(\bar{u}^{\gamma_2})| \geqslant (\gamma_1 - \gamma_2)(\beta(\bar{u}^{\gamma_2})-\beta(\bar{u}^{\gamma_1})) \geqslant \langle A(\bar{u}^{\gamma_1}-\bar{u}^{\gamma_2}),\bar{u}^{\gamma_1} - \bar{u}^{\gamma_2}\rangle = \|\bar{u}^{\gamma_1} - \bar{u}^{\gamma_2}\|^2_{\mathbb{V}}.$$ Since $$\{\beta (\bar u^\gamma )\}_{\gamma>0}$$ is bounded, there exists some M > 0 such that $$\|\bar{u}^{\gamma_1} - \bar{u}^{\gamma_2}\|_{\mathbb{V}} \leqslant M^{1/2}|\gamma_1 - \gamma_2|^{1/2},\forall \gamma_1,\gamma_2> 0,$$ i.e., $$\bar{u}^{\bullet }: (0,\infty ) \to \mathbb V$$ is Hölder continuous. Since $$\beta$$ is convex and continuous, it is locally Lipschitz on any open convex set containing $$\left \{\bar{u}^{\gamma }\right \}_{\gamma> 0}$$. Therefore, if we fix some $$\gamma _0 \in (0,\infty )$$ and consider $$\bar{u}^{\bullet }: (\gamma _0-\varepsilon ,\gamma _0+\varepsilon ) \to \mathbb V$$ for sufficiently small $$\varepsilon> 0$$, then $$\bar{u}^{\bullet }: (\gamma _0-\varepsilon ,\gamma _0+\varepsilon )\to \mathbb V$$ is Lipschitz continuous. This proves 2. Next, let $$\gamma _k \to +\infty$$ and define the sequence $$\{\bar{u}^k\}$$ by $$\bar{u}^k := \bar{u}^{\gamma _k}$$. Since $$\{\bar{u}^k\} \subset \left \{\bar{u}^{\gamma }\right \}_{\gamma> 0}$$, which is bounded, there exists a weakly convergent subsequence $$\{\bar{u}^{k_l}\}$$ with limit $$\bar{u}^{\star }$$. Returning to (2.1) in this setting and dividing both sides by $$\gamma _{k_l}$$, it follows from the weak lower semicontinuity of $$\beta$$ that $$\beta (\bar{u}^{\star }) =0$$; from which it can be easily deduced that $$\bar{u}^{\star } \in \mathcal{G}$$. But then given $$\frac{1}{2}\|\bar{u}^{k_l} - \varphi\|^2_{\mathbb{V}} \leqslant \frac{1}{2}\|\bar{u}^{k_l} - \varphi\|^2_{\mathbb{V}} + \gamma_{k_l}\beta(\bar{u}^{k_l}) \leqslant \frac{1}{2}\|\bar{u}- \varphi\|^2_{\mathbb{V}},$$ (2.3) it follows from the weak lower semicontinuity of $$\|\cdot -\varphi \|^2_{\mathbb{V}}$$ and the uniqueness of the global minimizer $$\bar{u}$$ that $$\bar{u}^{\star } = \bar{u}$$. Moreover, the uniqueness of $$\bar{u}$$ also implies, by the Urysohn property, that the entire sequence $$\{\bar{u}^k\}$$ converges weakly to $$\bar{u}$$. Using the previous inequality, we have once again by weak lower semicontinuity and optimality (respectively) $$\liminf_{k \to+\infty}\frac{1}{2}\|\bar{u}^{k} - \varphi\|^2_{\mathbb{V}} \geqslant \frac{1}{2}\|\bar{u} - \varphi\|^2_{\mathbb{V}}\qquad\textrm{ and }\qquad \limsup_{k \to +\infty} \frac{1}{2}\|\bar{u}^{k} - \varphi\|^2_{\mathbb{V}} \leqslant \frac{1}{2}\|\bar{u} - \varphi\|^2_{\mathbb{V}}.$$ Thus, $$\|\bar{u}^{k} - \varphi \|^2_{\mathbb{V}} \to \|\bar{u} - \varphi \|^2_{\mathbb{V}}$$ and subsequently $$\|\bar u^k\|_{\mathbb{V}}\to \|\bar u\|_{\mathbb{V}}$$. Since $$\mathbb V$$ is a Hilbert space, it follows that $$\bar{u}^k \to \bar{u}$$ strongly in $$\mathbb V$$. This proves 3. Next, we prove 4. Since $$0\leqslant \theta (\gamma ) \leqslant \gamma \beta (\varphi ) \to 0$$ as $$\gamma \downarrow 0$$, we obtain $$\lim _{\gamma \downarrow 0}\theta (\gamma )=0$$. Moreover, local Lipschitz continuity of $$\theta :(0,\infty ) \to \mathbb R$$ follows from that of $$\bar{u}^{\bullet }:(0,\infty ) \to \mathbb V$$. To see that $$\theta$$ is monotonically increasing, fix $$\gamma , \eta> 0$$ consider that \begin{align} \theta(\gamma + \eta) - \theta(\gamma) &\geqslant \frac{1}{2}\|\bar{u}^{\gamma+\eta} - \varphi\|^2_{\mathbb{V}} + (\gamma+\eta)\beta(\bar{u}^{\gamma+\eta}) - \left(\frac{1}{2}\|\bar{u}^{\gamma+\eta} - \varphi\|^2_{\mathbb{V}} + \gamma\beta(\bar{u}^{\gamma+\eta})\right) \\ &= \eta \beta(\bar{u}^{\gamma+\eta}) \geqslant 0. \end{align} (2.4) Finally, we prove a rate of convergence result. The idea is inspired by similar results and techniques in the studies by Hintermüller & Kunisch (2006b); Keuthen & Ulbrich (2015). We start by noting that by assumption there exists some s > 0 such that $$\beta(\bar{u}^{\gamma}) = \mathcal{O}(\gamma^{-1-s}).$$ Looking ahead to the result of Theorem 2.3, which guarantees continuous differentiability of $$\theta$$, in particular $$\theta ^{\prime}(\gamma ) = \beta (\bar{u}^{\gamma })$$, it follows from the integral form of the mean value theorem that for any $$\gamma _2> \gamma _1 > 0$$ we have $$\theta(\gamma_2) - \theta(\gamma_1) = \int_{\gamma_1}^{\gamma_2} \theta^{\prime}(\gamma) \,\mathrm{d} \gamma = \int_{\gamma_1}^{\gamma_2} \beta(\bar{u}^{\gamma}) \,\mathrm{d} \gamma \leqslant C \int_{\gamma_1}^{\gamma_2} \frac{1}{\gamma^{1+s}} \,\mathrm{d} \gamma = C \left(-\gamma_{2}^{-s} + \gamma_{1}^{-s}\right)\!,$$ where C > 0 is some constant independent of $$\gamma$$. Passing to the limit in $$\gamma _2 \to +\infty$$, it follows from the above that $$\theta_{\infty} - \theta(\gamma_1) = \frac{1}{2} \|\bar{u} - \varphi \|^2_{\mathbb{V}} - \theta(\gamma_1) = \frac{1}{2} \|\bar{u} - \varphi \|^2_{\mathbb{V}} - \frac{1}{2} \|\bar{u}^{\gamma_1} - \varphi \|^2_{\mathbb{V}} - \gamma_1 \beta(\bar{u}^{\gamma_1}) \le C \gamma_{1}^{-s}.$$ (2.5) Next, consider that $$\frac{1}{2} \|\bar{u} - \varphi \|^2_{\mathbb{V}} - \frac{1}{2} \|\bar{u}^{\gamma_1} - \varphi \|^2_{\mathbb{V}} = \frac{1}{2}\|\bar{u} - \bar{u}^{\gamma_1}\|^2_{\mathbb{V}} + (\bar{u} - \bar{u}^{\gamma_1},\bar{u}^{\gamma_1} - \varphi)_{\mathbb{V}}.$$ (2.6) Furthermore, using $$\bar{u} - \bar{u}^{\gamma _1}$$ as a test function in the first-order optimality conditions for the penalized problem, we obtain (due to symmetry) $$(\bar{u} - \bar{u}^{\gamma_1},\bar{u}^{\gamma_1} - \varphi)_{\mathbb{V}} = \langle A[\bar{u}^{\gamma_1} - \varphi], \bar{u} - \bar{u}^{\gamma_1}\rangle = -\gamma \langle \partial \beta(\bar{u}^{\gamma_1}), \bar{u} - \bar{u}^{\gamma_1} \rangle.$$ Then, since $$\partial \beta$$ is a monotone operator and $$\bar{u}$$ is feasible, we have $$(\bar{u} - \bar{u}^{\gamma_1},\bar{u}^{\gamma_1} - \varphi)_{\mathbb{V}} = \gamma \langle \partial \beta(\bar{u}^{\gamma_1}) - \partial \beta(\bar{u}), \bar{u}^{\gamma_1} - \bar{u} \rangle \geqslant 0.$$ Hence, using this in (2.6) and then returning to (2.5), we obtain $$\frac{1}{2}\|\bar{u}^{\gamma_1} - \bar{u}\|^2_{\mathbb{V}} \leqslant \gamma_{1}^{-s} \left(C + \gamma^{1+s}_1\beta(\bar{u}^{\gamma_1})\right).$$ It follows that $$\|\bar{u}^{\gamma_1} - \bar{u}\|_{\mathbb{V}} = \mathcal{O}\left( \gamma_{1}^{-s/2} \right),$$ as was to be shown. This completes the proof. Remark 2.2 Since $$\frac{1}{2}\|\bar{u}^{\gamma} - \varphi\|^2_{\mathbb{V}} + \gamma \beta(\bar{u}^{\gamma}) \leqslant \frac{1}{2}\|\bar{u}- \varphi\|^2_{\mathbb{V}}$$ and $$\frac{1}{2}\|\bar{u}^{\gamma } - \varphi \|^2_{\mathbb{V}} \to \frac{1}{2}\|\bar{u}- \varphi \|^2_{\mathbb{V}}$$ as $$\gamma \to +\infty$$, it follows that $$\gamma \beta (\bar{u}^{\gamma }) \to 0$$, i.e., $$\beta (\bar{u}^{\gamma }) = o(\gamma ^{-1})$$. Therefore, it is plausible that $$\beta (\bar{u}^{\gamma })$$ satisfies the necessary rate condition. In particular, we readily see that for each $$i = 1,\dots , n$$ $$\left\| \left(-\bar{u}^{\gamma}_{i}\right)_+ \right\|_{L^2(\varOmega)} = o(\gamma^{-1/2}) \textrm{ and } \| 1 - \mathbf{1}^T\bar{u}^{\gamma} \|_{L^2(\varOmega)} = o(\gamma^{-1/2}).$$ In fact, when $$\varOmega \subset \mathbb R^1$$, which would be relevant for signal processing applications, we can even show directly that at least s = 1/2. Since $$H^1(\varOmega )$$ is continuously embedded into $$L^{\infty }(\varOmega )$$ for domains in $$\mathbb R^1$$, we can directly verify Robinson’s constraint qualification for the Gibbs simplex, cf. Bonnans & Shapiro (2000). Therefore, there exist Lagrange multipliers $$\lambda , \mu \in \mathbb V^{\ast } \times H^1(\varOmega )^{\ast }$$ such that for $$i=1,\dots ,n$$ \begin{align} -\varDelta \bar{u}_i + \bar{u}_i - \lambda_{i} - \mathbf{1} \mu &= -\varDelta \varphi_i + \varphi_i,\,\,\textrm{ in } \varOmega, \end{align} (2.7a) \begin{align} \frac{\partial (\bar{u}_i-\varphi_i)}{\partial n} &=0,\,\,\textrm{ on } \varGamma, \end{align} (2.7b) plus $$\lambda _i \geqslant 0$$ (in the sense of $$H^1(\varOmega )^{\ast }$$) and $$\langle \lambda _i, \bar{u}_i \rangle = 0$$. Thus, if we use $$\bar{u} - \bar{u}^{\gamma }$$ as a test function in (2.7) as well as in the optimality system for the penalized problem, then subtracting the resulting terms yields \begin{multline*} (\bar{u}-\bar{u}^{\gamma},\bar{u} - \bar{u}^{\gamma})_{\mathbb{V}} = \langle \lambda,\bar{u} - \bar{u}^{\gamma} \rangle_{\mathbb V^{\ast},\mathbb V} + \sum_{i=1}^n\langle \mu, \bar{u}_i - \bar{u}^{\gamma}_i\rangle_{(H^1(\varOmega))^{\ast},H^1(\varOmega)} \\ -\gamma \sum_{i=1}^n\left( \left(-\bar{u}^{\gamma}_i\right)_+, \bar{u}_i - \bar{u}^{\gamma}_i\right)_{L^2(\varOmega)} +\gamma \sum_{i=1}^n\left( (\mathbf{1}^T \bar{u}^{\gamma} - 1), \bar{u}_i - \bar{u}^{\gamma}_i\right)_{L^2(\varOmega)}. \end{multline*} Next, consider that \begin{align*} \langle \lambda,\bar{u} - \bar{u}^{\gamma} \rangle_{\mathbb V^{\ast},\mathbb V} &= \langle \lambda,-\bar{u}^{\gamma} \rangle_{\mathbb V^{\ast}, \mathbb V}\\ \sum_{i=1}^n\langle \mu, \bar{u}_i - \bar{u}^{\gamma}_i\rangle_{(H^1(\varOmega))^{\ast},H^1(\varOmega)} &= \langle \mu, 1 - \mathbf{1}^T \bar{u}^{\gamma}\rangle_{(H^1(\varOmega))^{\ast},H^1(\varOmega)}\\ \gamma \sum_{i=1}^n\left( (\mathbf{1}^T \bar{u}^{\gamma} - 1), \bar{u}_i - \bar{u}^{\gamma}_i\right)_{L^2(\varOmega)} &= -\gamma \| \mathbf{1}^T \bar{u}^{\gamma} - 1 \|^2_{L^2(\varOmega)}\\ \gamma \sum_{i=1}^n\left( \left(-\bar{u}^{\gamma}_i\right)_+, \bar{u}_i - \bar{u}^{\gamma}_i\right)_{L^2(\varOmega)} &= \gamma \sum_{i=1}^n\left( \left(-\bar{u}^{\gamma}_i\right)_+, \bar{u}_i\right)_{L^2(\varOmega)} + \gamma \sum_{i=1}^n \left\| \left(-\bar{u}^{\gamma}_i\right)_+ \right\|^2_{L^2(\varOmega)} \geqslant 0. \end{align*} It follows that there exists some constant C > 0 such that \begin{align*} \| \bar{u} - \bar{u}^{\gamma} \|^2_{\mathbb{V}} &\leqslant \langle \lambda, - \bar{u}^{\gamma} \rangle_{\mathbb V^{\ast},\mathbb V} + \langle \mu, 1 - \mathbf{1}^T \bar{u}^{\gamma} \rangle_{(H^1(\varOmega))^{\ast},H^1(\varOmega)}\\ &\leqslant C \left( \sum_{i=1}^n \left\|\left(- \bar{u}^{\gamma}_i\right)_+ \right\|_{L^2(\varOmega)} + \| 1 - \mathbf{1}^T \bar{u}^{\gamma} \|_{L^2(\varOmega)} \right). \end{align*} Hence, $$\| \bar{u} - \bar{u}^{\gamma } \|_{\mathbb{V}} = o(\gamma ^{-1/4})$$. Building on the results of Theorem 2.1, we show that $$\theta :(0,\infty ) \to \mathbb R$$ is continuously differentiable. Theorem 2.3 Under the standing assumptions the optimal value function $$\theta \!:\!(0,\infty ) \!\to\! \mathbb R$$ is continuously differentiable. The gradient of $$\theta$$ at $$\bar{\gamma }> 0$$ is given by $$\frac{\partial \theta}{\partial \gamma}(\bar{\gamma}) = \beta(\bar{u}^{\bar \gamma}).$$ Proof. Fix $$\gamma , \eta> 0$$. According to (2.4), we have $$\eta^{-1}(\theta(\gamma + \eta) - \theta(\gamma)) \geqslant \beta(\bar{u}^{\gamma+\eta}).$$ (2.8) Similarly, we have (for sufficiently small $$\eta> 0$$) $$\theta(\gamma - \eta) - \theta(\gamma) \geqslant \frac{1}{2}\|\bar{u}^{\gamma-\eta} - \varphi\|^2_{\mathbb{V}} + (\gamma-\eta)\beta(\bar{u}^{\gamma-\eta}) - \left(\frac{1}{2}\|\bar{u}^{\gamma-\eta} - \varphi\|^2_{\mathbb{V}} + \gamma\beta(\bar{u}^{\gamma-\eta})\right) = -\eta \beta(\bar{u}^{\gamma-\eta}).$$ Hence, $$\eta^{-1}(\theta(\gamma - \eta) - \theta(\gamma)) \geqslant -\beta(\bar{u}^{\gamma-\eta}).$$ (2.9) We also have the upper bounds $$\theta(\gamma + \eta) - \theta(\gamma) \leqslant \frac{1}{2}\|\bar{u}^{\gamma} - \varphi\|^2_{\mathbb{V}} + (\gamma+\eta)\beta(\bar{u}^{\gamma}) - \left(\frac{1}{2}\|\bar{u}^{\gamma} - \varphi\|^2_{\mathbb{V}} + \gamma\beta(\bar{u}^{\gamma})\right) = \eta \beta(\bar{u}^{\gamma})$$ and (for sufficiently small $$\eta> 0$$) $$\theta(\gamma - \eta) - \theta(\gamma) \leqslant \frac{1}{2}\|\bar{u}^{\gamma} - \varphi\|^2_{\mathbb{V}} + (\gamma-\eta)\beta(\bar{u}^{\gamma}) - \left(\frac{1}{2}\|\bar{u}^{\gamma} - \varphi\|^2_{\mathbb{V}} + \gamma\beta(\bar{u}^{\gamma})\right) = -\eta \beta(\bar{u}^{\gamma}).$$ This yields \begin{align} \eta^{-1}(\theta(\gamma + \eta) - \theta(\gamma)) &\leqslant \beta(\bar{u}^{\gamma}) \end{align} (2.10) \begin{align} \eta^{-1}(\theta(\gamma - \eta) - \theta(\gamma)) &\leqslant -\beta(\bar{u}^{\gamma}). \end{align} (2.11) It follows from Theorem 2.1.2, the continuity of $$\beta$$, (2.8) and (2.10) that $$\lim_{\eta \downarrow 0} \eta^{-1}(\theta(\gamma + \eta) - \theta(\gamma)) = \beta(\bar{u}^{\gamma}).$$ Similarly, we have from Theorem 2.1.2, the continuity of $$\beta$$, (2.9) and (2.11) that $$\lim_{\eta \downarrow 0} \eta^{-1}(\theta(\gamma - \eta) - \theta(\gamma)) = -\beta(\bar{u}^{\gamma}).$$ Hence, $$\theta$$ is differentiable from $$(0,\infty ) \to \mathbb R$$. Finally, as $$\bar{u}^{\bullet }:(0,\infty ) \to \mathbb V$$ and $$\beta$$ are continuous, $$\theta$$ is continuously differentiable. Remark 2.4 (Relation to Danskin’s theorem). Since $$\beta$$ was kept rather general in Theorems 2.1 and 2.3, it appears that the value function is continuously differentiable for a wide variety of constraints beyond our setting. Note that Theorem 2.3 is not a direct consequence of Danskin’s theorem. Indeed, the hypotheses of Danskin’s theorem (cf. Bonnans & Shapiro,2000, Prop. 4.12) require, in particular, that the objective functional is continuous and $$\inf$$-compact on some Hausdorff topological vector space. In our case, the level sets of the objective are $$\inf$$-compact in the weak topology on $$\mathbb V$$. However, the objective is weakly lower-semicontinuous, but not continuous with respect to the weak topology. Therefore, we cannot simply embed the problem into the setting of Danskin’s theorem by replacing $$(\mathbb V,\|\cdot \|_{\mathbb{V}})$$ with $$(\mathbb V,\sigma _{\textrm{weak}})$$. In order to approximate $$\theta$$ in the PF scheme by a simple model function, we need second-order information. This will require a differential sensitivity analysis of $$\bar{u}^{\bullet }$$. For this discussion, we will make use of the first-order necessary and sufficient optimality conditions for (Pγ), which for each $$i=1,\dots ,n$$ are given by \begin{align} -\varDelta \bar{u}^{\gamma}_i + \bar{u}^{\gamma}_i - \lambda^{\gamma}_{i} + \mathbf{1} \mu^{\gamma} &= -\varDelta \varphi_i + \varphi_i,\,\,\textrm{ in } \varOmega, \end{align} (2.12a) \begin{align} \frac{\partial \left(\bar{u}^{\gamma}_i-\varphi_i\right)}{\partial n} &=0,\,\,\textrm{ on } \varGamma, \end{align} (2.12b) $$\lambda^{\gamma}_i = \gamma\left(-\bar{u}^{\gamma}_i\right)_+,$$ (2.12c) $$\mu^{\gamma} = \gamma(\mathbf{1}^\top \bar{u}^{\gamma} - 1).$$ (2.12d) Note that $$-\lambda ^{\gamma }_i = \gamma \partial \left (\frac{1}{2}\|(-\cdot )_+\|^2_{L^2(\varOmega )}\right )\left (\bar{u}^{\gamma }_i \right )$$ and $$\mu ^{\gamma } = \gamma \partial _{u_i}\left (\frac{1}{2}\|\mathbf{1}^\top \cdot - 1\|^2_{L^2(\varOmega )}\right )(\bar{u}^{\gamma })$$. In addition, we define the difference quotients $$d^{\eta}_{i} := \frac{\bar{u}^{\gamma+\eta}_i - \bar{u}^{\gamma}_i}{\eta},\quad \chi^{\eta}_{i} := \frac{\lambda^{\gamma+\eta}_i - \lambda^{\gamma}_i}{\eta},\quad \nu^{\eta} := \frac{\mu^{\gamma+\eta} - \mu^{\gamma}}{\eta}.$$ We then have the following result: Proposition 2.5 The directional derivative of the composition $$\widehat{\beta }(\bullet ) := (\beta \circ \bar{u}^{\bullet }) :(0,\infty ) \to \mathbb R$$ at $$\gamma> 0$$ in direction 1 is given by $$\widehat{\beta}^{\prime}(\gamma;1) = -\sum_{i=1}^n \left[ \int_{\varOmega}\left(-\bar u^{\gamma}_i\right)_+\bar{d}_i\,\mathrm{d}x + \int_{\varOmega}(\mathbf{1}^ T \bar u^{\gamma} - 1)\bar{d}_i\,\mathrm{d}x\right],$$ where $$\bar{d} \in \mathbb V$$ is the unique solution of the following variational problem: \begin{multline} \min\Bigg\{ \frac{1}{2}\|d\|^2_{\mathbb{V}} - \sum_{i=1}^n\left( \left(-\bar{u}^{\gamma}_i \right)_+-(\mathbf{1}^\top \bar{u}^{\gamma} - 1),d_i\right)_{L^2(\varOmega)} + \frac{\gamma}{2}\int_{\varOmega}\left|\mathbf{1}^\top d \right|{}^2\,\mathrm{d}x + \frac{\gamma}{2}\sum_{i=1}^n\int_{\left\{\bar{u}^{\gamma}_i = 0\right\}}(-d_i)^2_+\,\mathrm{d}x \\+ \frac{\gamma}{2}\int_{\left\{\bar{u}^{\gamma}_i < 0\right\}}|d_i|^2\,\mathrm{d}x \textrm{ over } d \in \mathbb V \Bigg\}. \end{multline} (2.13) Moreover, it holds that $$\widehat{\beta}^{\prime}(\gamma;1) \leqslant 0,\,\,\forall \gamma> 0.$$ Hence, $$\theta :(0,\infty ) \to \mathbb R$$ is concave. Proof. We first show that $$\left \{d^{\eta }\right \}_{\eta> 0}$$ is uniformly bounded in $$\mathbb V$$. First note that $$d^{\eta }_i$$ solves the following elliptic partial differential equation (PDE) with homogeneous Neumann boundary conditions: \begin{align} -\varDelta d^{\eta}_i + d^{\eta}_i - \chi^{\eta}_{i} + \nu^{\eta} &= 0,\,\,\textrm{ in } \varOmega, \end{align} (2.13a) $$\frac{\partial d^{\eta}_i}{\partial n} =0,\,\,\textrm{ on } \varGamma.$$ (2.13b) Now note that $$\sum _{i=1}^n \left \|d_i^\eta \right \|_V^2 = \sum _{i=1}^n\left \langle -\varDelta d_i^\eta +d_i^\eta ,d_i^\eta \right \rangle$$ and by the monotonicity of the gradients of $$\beta$$ that \begin{align*} \sum_{i=1}^n \left\|d_i^\eta\right\|_V^2 \leqslant \sum_{i=1}^n\left\langle -\varDelta d_i^\eta+d_i^\eta,d_i^\eta\right\rangle &- \sum_{i=1}^n\gamma\eta^{-2} \left(\left(-\bar{u}_i^{\gamma+\eta}\right)_+ - \left(-\bar{u}_i^{\gamma}\right)_+, \bar{u}_i^{\gamma+\eta} - \bar{u}_i^{\gamma}\right)_{L^2(\varOmega)} \\&+ \sum_{i=1}^n\gamma\eta^{-2} \left((\mathbf{1}^\top\bar{u}^{\gamma+\eta}-1) - (\mathbf{1}^\top\bar{u}^{\gamma}-1), \bar{u}_i^{\gamma+\eta} - \bar{u}_i^{\gamma}\right)_{L^2(\varOmega)}. \end{align*} Then, testing (2.13a) with $$d^{\eta }_i$$ and summing over i we have $$\sum_{i=1}^n\left\langle -\varDelta d_i^\eta+d_i^\eta,d_i^\eta\right\rangle = \sum_{i=1}^n\left\langle \chi_i^\eta,d_i^\eta\right\rangle - \sum_{i=1}^n\left\langle \nu^\eta,d_i^\eta\right\rangle.$$ Next, given \begin{align*} \sum_{i=1}^n\left\langle \chi_i^\eta,d_i^\eta\right\rangle &- \sum_{i=1}^n\left\langle \nu^\eta,d_i^\eta\right\rangle - \sum_{i=1}^n\gamma\eta^{-2} \left(\left(-\bar{u}_i^{\gamma+\eta}\right)_+ - \left(-\bar{u}_i^{\gamma}\right)_+, \bar{u}_i^{\gamma+\eta} - \bar{u}_i^{\gamma}\right)_{L^2(\varOmega)} \\&+ \sum_{i=1}^n\gamma\eta^{-2} \left((\mathbf{1}^\top\bar{u}^{\gamma+\eta}-1) - (\mathbf{1}^\top\bar{u}^{\gamma}-1), \bar{u}_i^{\gamma+\eta} - \bar{u}_i^{\gamma}\right)_{L^2(\varOmega)} \\ =& \sum_{i=1}^n \left(\left(-\bar{u}_i^{\gamma+\eta}\right)_+, d_i^\eta\right)_{L^2(\varOmega)} -\sum_{i=1}^n \left(\mathbf{1}^\top\bar{u}^{\gamma+\eta}-1, d_i^\eta\right)_{L^2(\varOmega)}, \end{align*} we have $$\sum_{i=1}^n \left\|d_i^\eta\right\|_V^2 \leqslant \sum_{i=1}^n \left(\left(-\bar{u}_i^{\gamma+\eta}\right)_+, d_i^\eta\right)_{L^2(\varOmega)} -\sum_{i=1}^n \left(\mathbf{1}^\top\bar{u}^{\gamma+\eta}-1, d_i^\eta\right)_{L^2(\varOmega)}.$$ (2.14) It follows from Theorem 2.1.3, that $$\left \{d^{\eta }\right \}_{\eta> 0}$$ is uniformly bounded in $$\mathbb V$$. Next, we derive expansions for $$\lambda ^{\eta }_i$$ and $$\mu ^{\eta }$$. From the Lebesgue dominated convergence theorem we see that the superposition operator $$(\cdot )_+:L^2(\varOmega ) \to L^2(\varOmega )$$ is Lipschitz continuous and Hadamard directionally differentiable with $$(\cdot)_+^{\prime}\left(\bar{u}^{\gamma}_i;d^{\eta}_i\right)(x) = \begin{cases} d_i^\eta(x) & \textrm{on }\left\{\bar u_i^\eta>0\right\}\cup\left\{\bar u_i^\eta=0, d_i^\eta>0\right\},\\ 0 &\text{otherwise.}\end{cases}$$ (2.15) Moreover, due to Shapiro (1990, Proposition 3.1) it is continuous in the second variable. Since $$\left \{d^{\eta }_i\right \}_{\eta> 0}$$ is uniformly bounded in V and $$\chi^{\eta}_i = \eta^{-1}\left( (\gamma + \eta)\left(-\bar{u}^{\gamma+\eta}_i\right)_+ - \gamma\left(-\bar{u}^{\gamma}_i\right)_+\right) = \gamma \left(\frac{\left(-(\bar{u}^{\gamma}_i + \eta d^{\eta}_i)_+ - \gamma(-\bar{u}^{\gamma}_i)_+\right)}{\eta} \right) + \left(-\bar{u}^{\gamma + \eta}_i \right)_+,$$ there exists a small o-function such that $$\chi^{\eta}_i = \gamma(\cdot)_+^{\prime}\left(-\bar{u}^{\gamma}_i;-d^{\eta}_i\right) + o(1) + \left(-\bar{u}^{\gamma + \eta}_i \right)_+.$$ (2.16) Arguing similarly for $$\nu ^{\eta }$$, we have $$\nu^{\eta} = \gamma\mathbf{1}^\top d^{\eta} + \mathbf{1}^\top \bar{u}^{\gamma+\eta} - 1.$$ (2.17) Now, letting $$\eta _k \downarrow 0$$, there exists a weakly converging subsequence $$\left \{d^{k_l}_i\right \}$$ with $$d^{k_l}_i := d^{\eta _{k_l}}_i$$ and a weak limit point $$\bar{d}_i \in V$$. Since V embeds compactly into $$L^2(\varOmega )$$ and $$\bar{u}^{\bullet } :(0,\infty ) \to \mathbb V$$ is continuous, we have \begin{align*} \chi^{\eta_{k_l}}_i &\to \gamma(\cdot)_+^{\prime}\left(-\bar{u}^{\gamma}_i;-\bar{d}_i\right) + \left(-\bar{u}^{\gamma}_i \right)_+ \textrm{ strongly in } L^2(\varOmega)\\ \nu^{\eta_{k_l}} &\to \gamma\mathbf{1}^\top \bar{d}+ \mathbf{1}^\top \bar{u}^{\gamma} - 1 \textrm{ strongly in } L^2(\varOmega). \end{align*} Moreover, $$\bar{d}_i$$ solves \begin{align} -\varDelta \bar{d}_i + \bar{d}_i - \gamma(\cdot)_+^{\prime}\left(-\bar{u}^{\gamma}_i;-\bar{d}_i\right) + \gamma\mathbf{1}^\top \bar{d} &= \left(-\bar{u}^{\gamma}_i \right)_+-(\mathbf{1}^\top \bar{u}^{\gamma} - 1),\,\,\textrm{ in } \varOmega, \end{align} (2.18a) \begin{align} \frac{\partial \bar{d}_i}{\partial n} &=0,\,\,\textrm{ on } \varGamma. \end{align} (2.18b) It is then easy to see that the vector $$\bar{d} \in \mathbb V$$ solves (Dγ). Since (Dγ) has a unique solution, we can deduce by the Urysohn property that the entire sequence $$\left \{d^{k}\right \}$$ converges weakly to $$\bar{d}$$. We are now ready to derive a formula for the directional derivative of $$\theta ^{\prime}(\gamma )$$. Consider that \begin{align*} \eta^{-1}\left(\beta(\bar{u}^{\gamma + \eta}) - \beta(\bar{u}^{\gamma})\right) =&\, (2\eta)^{-1}\sum_{i=1}^n \int_{\varOmega}\bigg[\left(-\bar{u}^{\gamma + \eta}_i\right)^2_+ - \left(-\bar{u}^{\gamma}_i\right)^2_+\bigg]\, \mathrm{d}x \\ &+ (2\eta)^{-1}\int_{\varOmega}\left[\big|\mathbf{1}^ T \bar{u}^{\gamma+\eta} - 1\big|^2 - \big|\mathbf{1}^ T \bar{u}^{\gamma} - 1\big|^2\right]\mathrm{d}x. \end{align*} Using the chain rule, formula (2.15) and $$\bar{u}^{\gamma +\eta } = \bar{u}^{\gamma } + \eta d^{\gamma }$$, we have $$\eta^{-1}(\beta(\bar{u}^{\gamma + \eta}) - \beta(\bar{u}^{\gamma})) = \sum_{i=1}^n \int_{\varOmega}\left[-\left(-\bar{u}^{\gamma}_i\right)_+d^{\eta}_i\right] \mathrm{d}x + \int_{\varOmega}\left[(\mathbf{1}^ T \bar{u}^{\gamma} - 1)d^{\eta}_i\right]\mathrm{d}x + o(1).$$ Since $$\eta _k \to 0$$ implies $$d^{\eta _k} \rightharpoonup \bar{d}$$ (weakly in $$\mathbb V$$), we can use the compactness of the embedding V into $$L^2(\varOmega )$$ and pass to the limit in the previous equation. This yields $$\widehat{\beta}^{\prime}(\gamma;1) = -\sum_{i=1}^n \left[ \int_{\varOmega}\left(-\bar{u}^{\gamma}_i\right)_+\bar{d}_i\,\mathrm{d}x + \int_{\varOmega}(\mathbf{1}^ T \bar{u}^{\gamma} - 1)\bar{d}_i\,\mathrm{d}x\right].$$ Finally, it follows from (2.14) that $$\widehat{\beta }^{\prime}(\gamma ;1) \leqslant 0$$. This completes the proof. 3. Solving the first-order system 3.1 A semismooth Newton iteration The main component of the algorithm involves the direct solution of (2.12) using a semismooth Newton method. We refer the reader to the papers by Hintermüller et al. (2002); Ulbrich (2002) for the full theory of this method and only note that in the current setting, the iterates are generated by solving the following (reduced) linear system; given $$\gamma> 0$$ and $$u_{\textrm{old}}$$, solve \begin{align} A(u-\varphi) + \gamma I_{\{u_{\textrm{old}}<0\}}u + \gamma \mathbf{1}(\mathbf{1}^\top u-1) &= 0 \textrm{ in }\varOmega, \end{align} (3.1a) $$\frac{\partial(u-\varphi)}{\partial n} = 0 \textrm{ on }\partial\varOmega$$ (3.1b) for $$u_{\textrm{new}}$$. Here, A is the PDE-operator associated with the $$\mathbb V$$-inner product and $$I_{\{u_{\textrm{old}}<0\}}$$ is the characteristic function for the set $$\{u_{\textrm{old}}<0\}$$. Once the residual of (2.12) is sufficiently small, the method terminates and $$\gamma$$ is updated. The previous $$\gamma$$-dependent solution serves as the starting value for the next iteration. We summarize this procedure in Algorithm 3.1. Defining the residuals $$r^1_{k} = \| (-u_k)_+ \|_{L^2(\varOmega,\mathbb{R}^n)}, \; r^2_{k} = \left\| \mathbf{1}^\top u_k -1 \right\|_{L^2(\varOmega)}, \; r^3_{k} = (\lambda_k,u_k)_{L^2(\varOmega)},$$ the outer loop terminates whenever $$\left\|\left(r_k^1,r_k^2,r_k^3\right)\right\|_2 \leqslant \mathrm{tol}.$$ Note that $$r^1_k$$, $$r^2_k$$ are feasibility errors, whereas $$r^1_k$$, $$r^3_k$$ represent the complementarity error. The stopping criteria for the inner loop depend on the type of $$\gamma$$-update strategy. These are detailed below in Section 4. 3.2 Feasibility restoration In some applications, e.g., topology optimization, where the phase field functions $$u_1,\dots ,u_n$$ arise as parameters in a second-order PDE-operator, it is imperative that $$u \in \mathcal{G}$$. For such situations, we are forced to employ a mesh-dependent numerical method. However, the (very) warm start provided by Algorithm 3.1 provides a point that is sufficiently close to the true solution of (P) so that the jump to feasibility requires minimal effort. We demonstrate this fact in the examples. 4. Path-following scheme We now detail a strategy for efficiently updating $$\gamma _k$$ in Algorithm 3.1. As suggested in the study by Hintermüller & Kunisch (2006b), an ideal update scheme for $$\gamma$$ (at step k) would be to choose $$\gamma _{k+1}$$ such that $$|\theta_{\infty} - \theta(\gamma_{k+1})| \le \tau_k |\theta_{\infty} - \theta(\gamma_{k})|,$$ (4.1) where $$\tau _k\downarrow 0$$ is a given null sequence. However, as noted there, this is not a practical strategy as we cannot evaluate $$\theta (\gamma _{k+1})$$ and $$\theta _{\infty }$$. Nevertheless, our analysis shows that $$\theta : (0,\infty ) \to \mathbb R$$ is positive, bounded, increasing, continuously differentiable and concave. Hence, we first fit a model function, using the information available at step k and then update according to an approximation of (4.1). This serves the basis for the PF approach. Since our problem exhibits structural similarities to the problem class consider in the study by Hintermüller & Kunisch (2006b), we consider their model function $$m:(0,\infty ) \to \mathbb R$$ given by \begin{equation*} m(\delta) := C_1 - \frac{C_2}{E + \delta} \end{equation*} as an approximation of $$\theta$$, which is related to the solution of an ordinary differential equation. Given $$\gamma> 0$$ and $$\bar{u}^{\gamma }$$, we determine the ($$\gamma$$-dependent) constants in each of the model functions by setting $$m(0) = 0,\quad m(\gamma) = \theta(\gamma),\quad m^{\prime}(\gamma) = \theta^{\prime}(\gamma).$$ (4.2) This amounts to \begin{align*} E &= \gamma^2 \beta(\bar{u}^{\gamma})\left(\frac{1}{2}\|\bar{u}^{\gamma} - \varphi\|^2_{\mathbb{V}}\right)^{-1},\\ C_2 &= \gamma^{-1} E(E + \gamma)\left(\frac{1}{2}\|\bar{u}^{\gamma} - \varphi\|^2_{\mathbb{V}} + \gamma \beta(\bar{u}^{\gamma})\right),\\ C_1 &= C_2 E^{-1}. \end{align*} Clearly, $$E, C_1, C_2>0$$. Moreover, it can be argued using (2.3) that $$\gamma \beta (\bar{u}^{\gamma }) \to 0$$ as $$\gamma \to +\infty$$. Hence, m(⋅) monotonically increasing, $$m(\cdot ) \le C_1$$ and $$C_1 = \left(\frac{\gamma \beta(\bar{u}^{\gamma})}{\frac{1}{2}\|\bar{u}^{\gamma} - \varphi\|^2_{\mathbb{V}}} + 1 \right) \theta(\gamma)$$ implies that $$m(\delta ) \to C_1$$ as $$\delta \to \infty$$ and $$C_1\to \theta _{\infty }$$ as $$\gamma \to \infty$$. Now, we may use (4.1), which leads to the following explicit update formula: $$\gamma_{k+1} = \frac{C_{2,k}}{\tau_k|C_{1,k} - \theta(\gamma_k)|}-E_k.$$ (4.3) By choosing three different reference values for $$\gamma$$ in (4.2), we see in Fig. 1 that m is an excellent local approximation of $$\theta$$. In this method, the Newton iteration (inner loop) is terminated whenever the active index sets $$\{u_k<0\}$$ are the same in two subsequent iterations or $$\left\| A(u_{k,l+1}-\varphi) - \gamma_k (-u_{k,l+1})_+ + \gamma_k \mathbf{1}\left(\mathbf{1}^\top u_{k,l+1}-1\right)\right\|_{H^1(\varOmega,\mathbb{R}^n)^{\ast}} \le \mathrm{tol}.$$ (4.4) Fig. 1. View largeDownload slide Approximations m of $$\theta$$ taken at different reference points $$\gamma$$. Fig. 1. View largeDownload slide Approximations m of $$\theta$$ taken at different reference points $$\gamma$$. 5. Numerical results In this section, we demonstrate the performance and wide applicability of the proposed methods on three examples. The first example shows the mesh-independent behavior of the PF methods and compares this to the first-discrete-then-optimize approach suggested above, which utilizes the primal–dual active set (PDAS) method (adapted to our setting) from the study by Hintermüller et al. (2002). This method does not admit a function space convergence analysis and, hence, is expected to behave mesh dependently, i.e., utilizing the same stopping rule and initial point on all meshes, the number of iterations until successful termination will grow as the mesh is refined. For the latter two examples, we consider two optimization problems from mathematical image processing and labeled data classification that may be solved using a gradient flow or projected gradient method, see e.g., Bertsekas (1976). At each step of the line search in the projected gradient method, a projection onto the Gibbs simplex is required; with an average number of $$\gamma$$ updates (k in example 1) similar to the example in 5.1. In all examples, the underlying domain $$\varOmega = (0,1)^2$$ is discretized using a uniform triangular mesh with parameter h > 0. The function space $$H^1(\varOmega )$$ is discretized using the associated nodal basis comprised of the standard continuous piecewise affine finite elements, see, e.g., Ciarlet (1978); Brenner & Scott (2008). Note that the pointwise a.e. constraints are approximated in the finite dimensional problem by requiring the coefficients for each nodal basis function to be in the finite dimensional Gibbs simplex. In order to calculate the $$H^1(\varOmega ;\mathbb R^n)^{\ast }$$-norm in (4.4), we need the Riesz representation of the dual object. This is done by solving a (vector-valued) Poisson problem with homogeneous Neumann boundary conditions with the residual of interest on the right-hand side. We make use of the P1AFEM package (Funken et al., 2011). Concerning the PF parameters, we choose $$\tau _k=10^{-2k}$$, initial parameter $$\gamma _1=10^4$$, maximum parameter $$\gamma _{\max }=10^{14}$$ and the stopping criterion $$\textrm{tol}=10^{-6}$$. PDAS is stopped once the residual of the optimality conditions to (P) drops to $$10^{-10}$$. 5.1 Projection onto the Gibbs simplex For this example, we set n = 3 and construct $$\varphi _i$$ such that the optimal solution of (P) exhibits biactivity at the optimal solution, i.e., both the inequality constraint and the associated Lagrange multiplier are equal to zero. Such problems can be notoriously difficult to solve. Let \begin{align} \hat v_1(x,y) &= \begin{cases} 0 &\textrm{if } x\in (0,1),\ y\in \left(0,\frac 12\right)\\ \sin\left(y-\frac12\right)\cos(xy) &\textrm{if } x\in (0,1),\ y\in \big[\frac 12,1\big) \end{cases}\\ \hat v_2(x,y) &= 2+\cos(10xy),\,\, (x,y) \in (0,1)^2 \\ \hat v_3(x,y) &= 0,\,\, (x,y) \in (0,1)^2 \\ \end{align} and then set $$v_i = \frac{\hat v_i}{\hat v_1+\hat v_2+\hat v_3}$$ for i = 1, 2, 3. Note that $$v = (v_1,v_2,v_3)$$ satisfies the constraints in $${\mathcal{G}}$$. Similarly, we define the functions \begin{align} \lambda_1(x,y) &= \begin{cases} 1 &\textrm{if } x\in (0,1),\ y\in \left(0,\frac 12\right)\\ 0 &\textrm{if } x\in (0,1),\ y\in\big[\frac 12,1\big) \end{cases}\\ \lambda_2(x,y) &= 0,\,\, (x,y) \in (0,1)^2 \\ \lambda_3(x,y) &= 0,\,\, (x,y) \in (0,1)^2 \\ \mu(x,y) &= 1,\,\, (x,y) \in (0,1)^2 \end{align} and construct $$\varphi$$ by solving the problem \begin{align} -\varDelta \varphi + \varphi &= -\varDelta v+v+\lambda-\mathbf{1}\mu,\,\,\textrm{ in } \varOmega,\\ \frac{\partial (\varphi-v)}{\partial n} &=0,\,\,\textrm{ on } \varGamma. \end{align} Note that $$\lambda = (\lambda _1,\lambda _2,\lambda _3) \in L^2(\varOmega )^3$$, that $$(\lambda ,v)_{L^2}=0$$ and that the above system forms the Karush–Kuhn–Tucker conditions for (P). Thus, v is its optimal solution, $$\lambda$$ is the Lagrange multiplier associated with the inequality constraint and biactivity is present on the whole third component. The starting point was chosen as $$\varphi$$. The results of Algorithm 3.1 are presented in Table 1. We compare the behavior of the PDAS and the PF method on six different meshes with decreasing h. For PDAS, ‘total iter’ refers to the number of linear solves. For PF, ‘inner steps l’ refers to the total number of linear solves used in the Newton iterations, ‘outer steps k’ is the number of $$\gamma$$ updates and ‘feas. step’ counts the number of linear solves to optimality needed when using the solution of PF as a starting point in the PDAS method. This qualitatively demonstrates how close the solution of the approximating problems gets to solving (P). Note that no adaptive mesh refinement or nested grid strategies, where one would consider subsequently refined meshes and use the prolongation of the solution on the coarser mesh as the starting point on the finer mesh, are considered. Clearly, the PF strategies behave much better over mesh refinements than PDAS. For example, with $$h = 2^{-8}$$, the true solution to (P) is obtained in 99 iterations with PDAS and 15 with PF. Table 1 Results for projection of $$\varphi$$ onto the Gibbs simplex h $$2^{-4}$$ $$2^{-5}$$ $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ PDAS: total iter 10 16 33 58 99 196 PF: inner steps l 12 12 14 14 15 14 outer steps k 3 3 3 3 3 3 feas. step 1 1 1 1 1 1 h $$2^{-4}$$ $$2^{-5}$$ $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ PDAS: total iter 10 16 33 58 99 196 PF: inner steps l 12 12 14 14 15 14 outer steps k 3 3 3 3 3 3 feas. step 1 1 1 1 1 1 Table 1 Results for projection of $$\varphi$$ onto the Gibbs simplex h $$2^{-4}$$ $$2^{-5}$$ $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ PDAS: total iter 10 16 33 58 99 196 PF: inner steps l 12 12 14 14 15 14 outer steps k 3 3 3 3 3 3 feas. step 1 1 1 1 1 1 h $$2^{-4}$$ $$2^{-5}$$ $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ PDAS: total iter 10 16 33 58 99 196 PF: inner steps l 12 12 14 14 15 14 outer steps k 3 3 3 3 3 3 feas. step 1 1 1 1 1 1 5.2 Applications in image processing and labeled data classification In this subsection, we consider two examples inspired by the work in the studies by Bertozzi et al. (2007); Garcia-Cardona et al. (2014). Having n phases, the common goal is to minimize a criterion J such that the phases are (almost) pure. This amounts to solving the problem $$\min_{u \in H^1(\varOmega;\mathbb R^n)}\left\{J(u) + \zeta\int_\varOmega \left(\varepsilon| \nabla u|^2 + \frac1\varepsilon u(1-u)\right)\mathrm{d}x \;\Big|\; u \in \mathcal{G}\right\},$$ (5.1) where $$\zeta>0$$ is the scaling parameter and parameter $$\varepsilon>0$$ is proportional to the interfacial width. The second part of the objective is the multiphase Ginzburg–Landau energy, which along with the indicator function for $$\mathcal{G}$$, serves as a penalty functional that avoids pathological free boundaries between the phases. Together, the sum of these functionals $$\varGamma$$-converges to the perimeter of the characteristic functions associated with the true phases in a sharp interface regime (Modica, 1987a,b). As such, straight edges are more favorable. We use the standard projected gradient method with line search to solve (5.1). Labeled data classification In the first example we separate n = 3 different classes of observations with known labels. First, we ‘train’ three segmentation phase field functions u on M noisy data and then test the accuracy of these solutions on further noisy realizations. This amounts to solving (5.1) with $$J(u) = \sum_{m=1}^M\left(1-\frac{1}{| B(x_m,\delta)|}\int_{B(x_m,\delta)}u_{j(m)}\,\mathrm{d}x\right),$$ (5.2) where $$m \in \{1,\dots ,M\}$$ is the index of a noisy data point $$x_m \in \varOmega$$ and $$B(x_m,\delta )$$ is a small ball of radius $$\delta> 0$$ around this point; $$j(m)\in \{1,2,3\}$$ refers to the known label (class) of this point. Due to the structure of the Gibbs simplex, J is minimized when $$u_{j(m)}=1$$ and $$u_i=0$$ for $$i\neq j(m)$$ on $$B(x_m,\delta )$$. Thus, J expressed the number of incorrectly classified points. In (5.1) we consider $$\varepsilon =2^{-6}$$ and $$\zeta =2\cdot 10^{-2}$$, and in (5.2) we set $$\delta =3\cdot 2^{-7}$$. The algorithm was started with a uniform mixture of phases. We randomly generated M = 1500 points (500 for each category $$+$$/left half circle, ∘/bottom half circle, ×/right half circle), see the left-hand side of Fig. 2. We then solved the optimization problem in order to train three phase field functions, see the right-hand side of Fig. 2. In order to test the accuracy of these fields, we again generated the same number of points and checked how many of these points lie in the correct region. In this instance, the solution was correct for 99.6% of the data points. The numerical comparison of both methods is shown in Table 2. Even though PF needs more iteration than PDAS, the iteration growth is much slower upon mesh refinement. The difference diminishes when we use warm start from the previous mesh. As we will see it the next application, on sufficiently fine meshes PF will eventually need less iterations than PDAS. Table 2 Classification results: number of iterations on uniform meshes with element size h and the increases with respect to previous iterations. The first part depicts the start with uniform phases, the other one the warm start from the previous mesh. The projected gradients needed four iterations with seven projections in both cases Uniform phases start Warm start h $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ PDAS: total iter 33 39 58 89 33 29 39 48 increase 18.18% 48.72% 53.45% -12.12% 34.48% 23.08% PF: inner steps l 59 68 85 113 59 51 68 81 increase 15.25% 25.00% 32.94% -13.56% 33.33% 19.12% feas. step 8 13 14 16 8 9 15 17 Uniform phases start Warm start h $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ PDAS: total iter 33 39 58 89 33 29 39 48 increase 18.18% 48.72% 53.45% -12.12% 34.48% 23.08% PF: inner steps l 59 68 85 113 59 51 68 81 increase 15.25% 25.00% 32.94% -13.56% 33.33% 19.12% feas. step 8 13 14 16 8 9 15 17 Table 2 Classification results: number of iterations on uniform meshes with element size h and the increases with respect to previous iterations. The first part depicts the start with uniform phases, the other one the warm start from the previous mesh. The projected gradients needed four iterations with seven projections in both cases Uniform phases start Warm start h $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ PDAS: total iter 33 39 58 89 33 29 39 48 increase 18.18% 48.72% 53.45% -12.12% 34.48% 23.08% PF: inner steps l 59 68 85 113 59 51 68 81 increase 15.25% 25.00% 32.94% -13.56% 33.33% 19.12% feas. step 8 13 14 16 8 9 15 17 Uniform phases start Warm start h $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ PDAS: total iter 33 39 58 89 33 29 39 48 increase 18.18% 48.72% 53.45% -12.12% 34.48% 23.08% PF: inner steps l 59 68 85 113 59 51 68 81 increase 15.25% 25.00% 32.94% -13.56% 33.33% 19.12% feas. step 8 13 14 16 8 9 15 17 Inpainting For the inpainting application, we assume that an original image (top left of Fig. 3) is degraded by adding a text on top of it to produce a known image $$\hat u$$ (top right of Fig. 3).1 The goal is then to recover the original image as faithfully as possible. We follow the suggestion in the study by Bertozzi et al. (2007) to use a modified Allen–Cahn equation for image inpainting, which leads to (5.1) with $$J(u) = \int_{\varOmega}(u-\hat u)^2\,\mathrm{d}x.$$ (5.3) In (5.1) we consider $$\varepsilon =2^{-7}$$ and $$\zeta =7\cdot 10^{-3}$$. The algorithm was started with the degraded image. On the contrary to the previous cases, we use nonuniform mesh, where the phase interface is much more refined, see the bottom right part of Fig. 3 that depicts the mesh for the right panda eye. For simplicity, we consider a locally adapted mesh which is given. A similar mesh may be obtained from developing an a posteriori error estimates for adaptive mesh refinement, see e.g., Baňas & Nürnberg (2009) or Hintermüller et al. (2011). As the latter technique, however, is not the focus of this work, we rely on an a priori mesh. For simplicity, we consider locally adapted mesh that is given a priori, for results. The recovered image is shown in the bottom left part of Fig. 3. The numerical evidence is summarized in Table 3, where h is the smallest element size. On the coarsest mesh PDAS outperforms PF, but as the mesh is getting finer, PF needs less iterations to converge to the solution. In all cases, the projected gradients needed 12 iterations with 26 projections. Table 3 Inpainting results: nonuniform mesh with refined mesh interface; h is the size of the smallest element. The projected gradients needed 12 iterations with 26 projections h $$2^{-8}$$ $$2^{-9}$$ $$2^{-10}$$ $$2^{-11}$$ $$2^{-11}$$ $$2^{-12}$$ Node number 66049 80500 110065 169733 289685 529971 PDAS: total iter 96 108 125 159 194 245 PF: inner steps 155 164 176 190 200 200 feas. step 35 33 45 48 51 74 h $$2^{-8}$$ $$2^{-9}$$ $$2^{-10}$$ $$2^{-11}$$ $$2^{-11}$$ $$2^{-12}$$ Node number 66049 80500 110065 169733 289685 529971 PDAS: total iter 96 108 125 159 194 245 PF: inner steps 155 164 176 190 200 200 feas. step 35 33 45 48 51 74 Table 3 Inpainting results: nonuniform mesh with refined mesh interface; h is the size of the smallest element. The projected gradients needed 12 iterations with 26 projections h $$2^{-8}$$ $$2^{-9}$$ $$2^{-10}$$ $$2^{-11}$$ $$2^{-11}$$ $$2^{-12}$$ Node number 66049 80500 110065 169733 289685 529971 PDAS: total iter 96 108 125 159 194 245 PF: inner steps 155 164 176 190 200 200 feas. step 35 33 45 48 51 74 h $$2^{-8}$$ $$2^{-9}$$ $$2^{-10}$$ $$2^{-11}$$ $$2^{-11}$$ $$2^{-12}$$ Node number 66049 80500 110065 169733 289685 529971 PDAS: total iter 96 108 125 159 194 245 PF: inner steps 155 164 176 190 200 200 feas. step 35 33 45 48 51 74 Fig. 2. View largeDownload slide Classification results: generated random points in the shape of three half-moons (left) and the obtained three regions used for categorization of points (right). Fig. 2. View largeDownload slide Classification results: generated random points in the shape of three half-moons (left) and the obtained three regions used for categorization of points (right). Fig. 3. View largeDownload slide Inpainting results: initial image (top left), devalued image (top right), recovered image (bottom left) and mesh detail of the right panda eye (bottom right). Fig. 3. View largeDownload slide Inpainting results: initial image (top left), devalued image (top right), recovered image (bottom left) and mesh detail of the right panda eye (bottom right). 6. Summary In this paper, we motivate the need for a function-space-based algorithm for the $$H^1$$-projection onto the multiphase Gibbs simplex. In particular, the wide array of applications alone involving optimization or control problems with the Gibbs simplex as a constraint warrants such a study. In order to make the best use of fast second-order methods for semismooth equations, an analytical PF study is carried out. Our analytical approach drastically reduces the length of the arguments and necessary assumptions required in the studies by Hintermüller & Kunisch (2006a,b); the two works that inspired our approach. Due to the generality of our arguments, they can clearly be extended to similar constraint sets. As in the studies by Hintermüller & Kunisch (2006a,b), we suggest a PF strategy for the update of the Moreau–Yosida regularization parameter, which is based on a concave model function. In our numerical experiments, we first demonstrate the clear advantage of our methods over a first-discretize-then-optimize approach. Finally, we consider two small examples inspired by recent work in the studies by Bertozzi et al. (2007); Garcia-Cardona et al. (2014) that require the $$H^1$$-projection onto the Gibbs simplex. Further applications, for example in topology optimization will be considered in future work. Funding This work was carried out in the framework of the Deutsche Forschungsgemeinschaft (DFG) under grant no. HI 1466/7-1 ‘Free Boundary Problems and Level Set Methods’ as well as the Research Center Matheon supported by the Einstein Foundation Berlin within projects OT1, SE5 and SE15. Footnotes 1  The picture can be downloaded from many wallpaper sites such as http://dairennsa.net/. References Baňas , L. & Nürnberg , R. ( 2009 ) A posteriori estimates for the Cahn–Hilliard equation with obstacle free energy . Esaim Math. Model. Numer. Anal. , 43 , 1003 – 1026 . Google Scholar CrossRef Search ADS Bertozzi , A. L. , Esedoglu , S. & Gillette , A. ( 2007 ) Inpainting of binary images using the Cahn–Hilliard equation . IEEE Trans. Image Process. , 16 , 285 – 291 . Google Scholar CrossRef Search ADS PubMed Bertsekas , D. P. ( 1976 ) On the Goldstein–Levitin–Polyak gradient projection method . IEEE Trans. Automat. Contr., AC-21 , 174 – 184 . Google Scholar CrossRef Search ADS Blank , L. , Farshbaf-Shaker , M. H. , Garcke , H. , Rupprecht , C. & Styles , V. ( 2014 ) Multi-material phase field approach to structural topology optimization . Trends in PDE Constrained Optimization . International Series of Numerical Mathematics , vol. 165. Cham : Birkhäuser/Springer , pp. 231 – 246 . Blank , L. , Garcke , H. , Sarbu , L. , Srisupattarawanit , T. , Styles , V. & Voigt , A. ( 2012 ) Phase-field approaches to Structural Topology Optimization . In: Leugering G. et al. (eds) Constrained Optimization and Optimal Control for Partial Differential Equations . International Series of Numerical Mathematics, vol. 160. Springer , Basel . Google Scholar CrossRef Search ADS Bonnans , J. F. & Shapiro , A. ( 2000 ) Perturbation Analysis of Optimization Problems , New York : Springer . Google Scholar CrossRef Search ADS Brenner , S. C. & Scott , L. R. ( 2008 ) The Mathematical Theory of Finite Element Methods , 3rd edn . Texts in Applied Mathematics , vol. 15. New York : Springer . Brézis , H. & Stampacchia , G. ( 1968 ) Sur la régularité de la solution d’inéquations elliptiques . Bull. Soc. Math. France , 96 , 153 – 180 . Google Scholar CrossRef Search ADS Burger , M. , He , L. & Schönlieb , C.-B. ( 2009 ) Cahn–Hilliard inpainting and a generalization for grayvalue images . SIAM Journal on Imaging Sciences , 2 , 1129 – 1167 . Ciarlet , P. G. ( 1978 ) The Finite Element Method for Elliptic Problems . Studies in Mathematics and its Applications , vol. 4. Amsterdam-New York-Oxford : North-Holland Publishing Co . Funken , S. , Praetorius , D. & Wissgott , P. ( 2011 ) Efficient implementation of adaptive P1-FEM in Matlab . Comput. Methods Appl. Math. , 11 , 460 – 490 . Google Scholar CrossRef Search ADS Garcia-Cardona , C. , Merkurjev , E. , Bertozzi , A. L. , Flenner , A. & Percus , A. G. ( 2014 ) Multiclass data segmentation using diffuse interface methods on graphs . IEEE Trans. Pattern Anal. Mach. Intell. , 36 , 1600 – 1613 . Google Scholar CrossRef Search ADS PubMed Hintermüller , M. , Hinze , M. & Tber , M. H. ( 2011 ) An adaptive finite-element Moreau–Yosida-based solver for a non-smooth Cahn–Hilliard problem . Optim. Methods Softw. , 26 , 777 – 811 . Google Scholar CrossRef Search ADS Hintermüller , M. , Ito , K. & Kunisch , K. ( 2002 ) The primal-dual active set strategy as a semismooth Newton method . SIAM J. Optim. 13 , 865 – 888 . Hintermüller , M. & Kunisch , K. ( 2006a ) Feasible and noninterior path-following in constrained minimization with low multiplier regularity . SIAM J. Control Optim. , 45 , 1198 – 1221 . Google Scholar CrossRef Search ADS Hintermüller , M. & Kunisch , K. ( 2006b ) Path-following methods for a class of constrained minimization problems in function space . SIAM J. Optim. , 17 , 159 – 187 . Google Scholar CrossRef Search ADS Ioffe , A. D. & Tihomirov , V. M. ( 1979 ) Theory of Extremal Problems . Studies in Mathematics and its Applications , vol. 6. Amsterdam-New York : North-Holland Publishing Co. ( Translated from Russian by Karol Makowski .) Keuthen , M. & Ulbrich , M. ( 2015 ) Moreau–Yosida regularization in shape optimization with geometric constraints . Comput. Optim. Appl. , 62 , 181 – 216 . Google Scholar CrossRef Search ADS Kinderlehrer , D. & Stampacchia , G. ( 2000 ) An Introduction to Variational Inequalities and Their Applications . Classics in Applied Mathematics , vol. 31. Philadelphia, PA : Society for Industrial and Applied Mathematics (SIAM) . ( Reprint of the 1980 original .) Kunisch , K. , Liang , K. & Lu , X. ( 2010 ) Optimal control for an elliptic system with polygonal state constraints . SIAM J. Control Optim. , 48 , 5053 – 5072 . Google Scholar CrossRef Search ADS Kunisch , K. & Lu , X. ( 2013 ) Optimal control for an elliptic system with convex polygonal control constraints . IMA J. Numer. Anal. , 33 , 875 – 897 . Google Scholar CrossRef Search ADS Modica , L. ( 1987a ) Gradient theory of phase transitions with boundary contact energy . Ann. Inst. H. Poincaré Anal. Non Linéaire , 4 , 487 – 512 . Google Scholar CrossRef Search ADS Modica , L. ( 1987b ) The gradient theory of phase transitions and the minimal interface criterion . Arch. Ration. Mech. Anal. , 98 , 123 – 142 . Google Scholar CrossRef Search ADS Shapiro , A. ( 1990 ) On concepts of directional differentiability . J. Optim. Theory Appl. , 66 , 477 – 487 . Google Scholar CrossRef Search ADS Stinner , B. , Nestler , B. & Garcke , H. ( 2004 ) A diffuse interface model for alloys with multiple components and phases . SIAM J. Appl. Math. , 64 , 775 – 799 . Google Scholar CrossRef Search ADS Ulbrich , M . ( 2002 ) Semismooth Newton methods for operator equations in function spaces . SIAM J. Optim. , 13 , 805 – 841 . Google Scholar CrossRef Search ADS Van Wachem , B. & Almstedt , A.-E. ( 2003 ) Methods for multiphase computational fluid dynamics . Chem. Eng. J. , 96 , 81 – 98 . Google Scholar CrossRef Search ADS This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png IMA Journal of Numerical Analysis Oxford University Press

# A semismooth Newton method with analytical path-following for the $$H^1$$ -projection onto the Gibbs simplex

, Volume Advance Article – Jun 7, 2018
20 pages

/lp/ou_press/a-semismooth-newton-method-with-analytical-path-following-for-the-h-1-Xxi2ZfY3y5
Publisher
Oxford University Press
ISSN
0272-4979
eISSN
1464-3642
D.O.I.
10.1093/imanum/dry034
Publisher site
See Article on Publisher Site

### Abstract

Abstract An efficient, function-space-based second-order method for the $$H^1$$-projection onto the Gibbs simplex is presented. The method makes use of the theory of semismooth Newton methods in function spaces as well as Moreau–Yosida regularization and techniques from parametric optimization. A path-following technique is considered for the regularization parameter updates. A rigorous first- and second-order sensitivity analysis of the value function for the regularized problem is provided to justify the update scheme. The viability of the algorithm is then demonstrated for two applications found in the literature: binary image inpainting and labeled data classification. In both cases, the algorithm exhibits mesh-independent behavior. 1. Introduction In many modern applications such as image inpainting (Bertozzi et al., 2007; Burger et al., 2009), topology optimization (Blank et al., 2012, 2014), multiclass data segmentation (Garcia-Cardona et al., 2014) and multiphase fluid dynamics (Van Wachem & Almstedt, 2003; Stinner et al., 2004), the associated variational problems contain multiple phase-field functions, denoted here by $$u_1,\dots , u_n$$, that fulfill the conditions $$u_i \geqslant 0$$ and $$\sum u_i =1$$ (in a pointwise almost everywhere (a.e.) sense). This typically leads to situations in which one needs to project onto the so-called Gibbs simplex in the Sobolev space $$H^1(\varOmega )$$. Mathematically speaking, this involves the solution of the following infinite dimensional optimization problem: $$\min\left\{\frac{1}{2}\|u - \varphi\|^2_{\mathbb{V}} \textrm{ over } u \in \mathbb V \;\Big|\; u \in \mathcal{G} \right\},$$ (P) where $$\mathcal{G} \subset H^1(\varOmega )$$, given by $$\mathcal{G} := \left\{ u \in \mathbb V \left| \; \mathbf{1}^\top u = 1,\; u_i \geqslant 0, i =1,\dots,n \right.\right\},$$ is the Gibbs simplex (n ⩾ 2). Here, we let $$\varOmega \subset \mathbb R^{d}$$ be nonempty, open and bounded with Lipschitz boundary $$\varGamma$$ and set $$V:= H^1(\varOmega )$$ and $$\mathbb V:= V^n$$. In addition, we fix some $$\varphi \in \mathbb V$$ and note that $$\mathbf{1}$$ is the column vector of ones in $$\mathbb R^n$$. The conditions on the functions u in $$\mathcal{G}$$ are always understood pointwise a.e. Moreover, since $$\varOmega$$ is bounded, $$\mathcal{G}$$ is in fact a nonempty, closed and convex subset in $$H^1(\varOmega )^n \cap L^{\infty }(\varOmega )^n$$, which is also bounded in $$L^\infty (\varOmega )^n$$. Although the constraint set $$\mathcal{G}$$ is formulated with pointwise a.e. constraints, it is crucial that the correct norm, in this case the $$H^1$$-norm, is used in the projection. Indeed, let $$\varGamma$$ be smooth, n = 2 and $$\varphi \in H^2(\varOmega )^2$$ such that $$\nabla [\varphi _1 + \varphi _2 - 1]\cdot n|_{\varGamma } = 0$$. Then (P) reduces to solving the (one-dimensional) obstacle problem in $$H^1(\varOmega )$$ given by $$\min\left\{\|u\|^2_{\mathbb{V}} - (f, u)_{L^2} \textrm{ over } u \in \mathbb V \;\Big|\; 0 \leqslant u \leqslant 1\right\},$$ where, setting $$\widetilde{\varphi } := \varphi _1 + \varphi _2 - 1$$, the function $$f = -\varDelta \widetilde{\varphi } +\widetilde{\varphi }\in L^2(\varOmega )$$. It is well known, cf. Brézis & Stampacchia (1968); Kinderlehrer & Stampacchia (2000), that the unique optimal solution $$\overline{u}$$ is in $$H^2(\varOmega )$$. Furthermore, the pointwise/thresholding/$$L^2$$-projection of $$\varphi$$, given by $$\varphi + \max(0,-\varphi) - \max(0,\varphi - 1),$$ is feasible, but only in $$H^1(\varOmega )$$; as the pointwise $$\max (0,\cdot )$$-operator does not preserve $$H^2$$-regularity. Hence, the pointwise projection of $$\varphi$$ onto the Gibbs simplex is feasible, yet it cannot be the solution to the original problem. Following the approach in the studies by Hintermüller & Kunisch (2006a,b), we approximate (P) by replacing $$\mathcal{G}$$ with a Moreau–Yosida-type regularization of the associated indicator function. This yields the approximation $$\theta(\gamma) := \min\left\{ \frac{1}{2}\|u - \varphi\|^2_{\mathbb{V}} + \frac{\gamma}{2}\sum_{i=1}^n\int_{\varOmega}(-u_i)^2_+\mathrm{d}x + \frac{\gamma}{2}\int_{\varOmega}\left|\mathbf{1}^\top u - 1\right|{}^2\mathrm{d}x \textrm{ over } u \in \mathbb V \right\}.$$ (Pγ) The associated first-order system can be solved by a semismooth Newton method in function space. In order to increase numerical efficiency by avoiding excess parameter updates, we derive a full differential sensitivity analysis of the optimal value function $$\theta (\gamma )$$. These results lead to an explicit parameter update scheme referred to as ‘path-following’ (PF). We note that our analysis is much more compact than in the studies by Hintermüller & Kunisch (2006a,b). Moreover, no assumptions on the active sets are needed to derive second-order properties of $$\theta (\gamma )$$. A further important factor, which motivates this work, arises from the fact that a first-discretize-then-optimize approach will typically exhibit mesh dependence under adaptive refinements (cf. the examples in Section 5). The main culprit for this aspect is due to a lack of multiplier regularity for the constraints in $$\mathcal{G}$$. Using a Moreau–Yosida-type regularization increases the regularity (smoothness) of the problem, which, in particular, leads to superlinear convergence of semismooth Newton methods in function space for the regularized problems. Moreover, by deriving an efficient approximation of the primal–dual path value functional $$\theta$$, we avoid problems connected with ill-conditioning. Before discussing the outline of the paper, which is inspired by the frequent occurrence of the Gibbs simplex in applications, we note that our approach readily extends to general polygonal constraints as in, e.g., Kunisch et al. (2010); Kunisch & Lu (2013). For n = 1 and more general convex constraints handled by Moreau–Yosida regularization, see Keuthen & Ulbrich (2015). The rest of the paper is structured as follows. In Section 2 we analyze the parameter dependent problem (Pγ), which lays the foundation for the PF. Afterwards, we suggest a PF scheme and present the algorithm in Sections 3 and 4. Finally, in Section 5, we demonstrate the performance of the algorithm on two applications taken from the literature. This is done for both PF schemes and a comparison is made to a mesh-dependent scheme based on the direct solution of the first-order system of (P). 2. Sensitivity analysis As indicated in the introduction, we need to analyze the first- and second-order differentiability properties of the parameter dependent quantities related to (Pγ) in order to derive a PF scheme. We thus begin this section with some basic facts about the optimal solution mapping and optimal value functions associated with (P) and (Pγ). Throughout the discussion, we use • to indicate the dependence on the penalty parameter $$\gamma> 0$$ and for readability let $$\beta(u) := \frac12\sum_{i=1}^n\int_{\varOmega}(-u_i)^2_+\mathrm{d}x + \frac12\int_{\varOmega}\left|\mathbf{1}^\top u - 1\right|{}^2\mathrm{d}x.$$ Theorem 2.1 The following facts hold: (P) has a unique solution $$\bar{u}$$ and, for all $$\gamma> 0$$, (Pγ) has a unique solution $$\bar{u}^{\gamma }$$. The optimal solution mapping $$\bar{u}^{\bullet }: (0,\infty ) \to \mathbb V$$ is globally Hölder continuous with exponent 1/2 and locally Lipschitz continuous. For any sequence $$\gamma _k \to +\infty$$, the sequence $$\left \{\bar{u}^k\right \}$$ with $$\bar{u}^k := \bar{u}^{\gamma _k}$$ converges strongly to $$\bar{u}$$ in $$\mathbb V$$. The optimal value function $$\theta : (0,\infty ) \to \mathbb R$$ is non-negative, monotonically increasing with $$\lim _{\gamma \downarrow 0} \theta (\gamma )= 0$$, locally Lipschitz continuous and bounded from above by $$\theta _{\infty }$$, the optimal value of (P). If there exists s > 0 such that $$\beta (\bar{u}^{\gamma }) = \mathcal{O}(\gamma ^{-1-s})$$, then $$\| \bar{u}^{\gamma }- \overline{u}\|_{\mathbb{V}} = \mathcal{O}(\gamma ^{-s/2})$$. Proof. To see that item 1 holds, consider that both (P) and (Pγ) involve the minimization of a strongly convex continuous functional over a nonempty closed set in a real Hilbert space, both problems admit unique solutions $$\bar{u}$$ and $$\bar{u}^{\gamma }$$, respectively. To prove continuity of $$\bar{u}^{\bullet }$$ on $$(0,\infty )$$, we need several results. By definition of a global optimum, we have $$\frac{1}{2}\|\bar{u}^{\gamma} - \varphi\|^2_{\mathbb{V}} + \gamma\beta(\bar{u}^{\gamma})\leqslant \frac{1}{2}\|u - \varphi\|^2_{\mathbb{V}} + \gamma\beta(u),\,\,\forall u \in \mathbb V.$$ Therefore, it follows from the feasibility of $$\bar{u}$$ and non-negativity of the objectives that $$0 \leqslant \theta(\gamma) \leqslant \frac{1}{2}\|\bar{u} - \varphi\|^2_{\mathbb{V}} = \theta_{\infty},\,\,\forall \gamma> 0.$$ (2.1) Furthermore, since $$\beta (\cdot ) \geqslant 0$$, we deduce the uniform boundedness in $$H^1(\varOmega )^n$$ of the set $$\left \{\bar{u}^{\gamma }\right \}_{\gamma> 0}$$. Let $$\gamma _1, \gamma _2 \in (0,\infty )$$ and define the associated optimal solutions of (Pγ) by $$\bar{u}^{\gamma _i}$$, i = 1, 2. Since $$\beta : \mathbb V \to \mathbb R$$ is continuous and convex, it is subdifferentiable. It follows then from the first-order optimality conditions for (Pγ) that $$0 \in \partial\left(\frac{1}{2}\|\cdot - \varphi\|^2_{\mathbb{V}} + \gamma_i \beta(\cdot)\right)(\bar{u}^{\gamma_i}) = \partial\left(\frac{1}{2}\|\cdot - \varphi\|^2_{\mathbb{V}}\right)(\bar{u}^{\gamma_i}) + \gamma_i \partial \beta(\cdot)(\bar{u}^{\gamma_i}) \Longrightarrow -v_i \in \gamma_i \partial \beta(\cdot)(\bar{u}^{\gamma_i}),$$ where $$v_i \in \partial \left (\frac{1}{2}\|\cdot - \varphi \|^2_{\mathbb{V}}\right )(\bar{u}^{\gamma _i}) = A(\bar{u}^{\gamma _i}-\varphi )$$ and A is the (uniformly elliptic) bounded linear operator associated with the inner product on $$\mathbb V$$. Note that due to convexity and continuity, the sum rule for convex subdifferentials holds, cf. Ioffe & Tihomirov (1979). By definition of the subdifferential, we now have $$\gamma_i \beta(z) \geqslant \gamma_i \beta(\bar{u}^{\gamma_i}) + \langle -v_i, z - \bar{u}^{\gamma_i}\rangle,\,\,\forall z \in \mathbb V,\,\, i =1,2.$$ (2.2) For i = 1, we set $$z = \bar{u}^{\gamma _2}$$ and for i = 2, we set $$z = \bar{u}^{\gamma _1}$$ in (2.2). Adding the resulting terms yields $$\gamma_1 \beta(\bar{u}^{\gamma_2}) + \gamma_2 \beta(\bar{u}^{\gamma_1})\geqslant \gamma_1 \beta(\bar{u}^{\gamma_1}) + \gamma_2 \beta(\bar{u}^{\gamma_2}) + \langle -v_1, \bar{u}^{\gamma_2} - \bar{u}^{\gamma_1}\rangle + \langle -v_2, \bar{u}^{\gamma_1} - \bar{u}^{\gamma_2}\rangle.$$ Hence, we have $$|\gamma_1 - \gamma_2| | \beta(\bar{u}^{\gamma_1})-\beta(\bar{u}^{\gamma_2})| \geqslant (\gamma_1 - \gamma_2)(\beta(\bar{u}^{\gamma_2})-\beta(\bar{u}^{\gamma_1})) \geqslant \langle A(\bar{u}^{\gamma_1}-\bar{u}^{\gamma_2}),\bar{u}^{\gamma_1} - \bar{u}^{\gamma_2}\rangle = \|\bar{u}^{\gamma_1} - \bar{u}^{\gamma_2}\|^2_{\mathbb{V}}.$$ Since $$\{\beta (\bar u^\gamma )\}_{\gamma>0}$$ is bounded, there exists some M > 0 such that $$\|\bar{u}^{\gamma_1} - \bar{u}^{\gamma_2}\|_{\mathbb{V}} \leqslant M^{1/2}|\gamma_1 - \gamma_2|^{1/2},\forall \gamma_1,\gamma_2> 0,$$ i.e., $$\bar{u}^{\bullet }: (0,\infty ) \to \mathbb V$$ is Hölder continuous. Since $$\beta$$ is convex and continuous, it is locally Lipschitz on any open convex set containing $$\left \{\bar{u}^{\gamma }\right \}_{\gamma> 0}$$. Therefore, if we fix some $$\gamma _0 \in (0,\infty )$$ and consider $$\bar{u}^{\bullet }: (\gamma _0-\varepsilon ,\gamma _0+\varepsilon ) \to \mathbb V$$ for sufficiently small $$\varepsilon> 0$$, then $$\bar{u}^{\bullet }: (\gamma _0-\varepsilon ,\gamma _0+\varepsilon )\to \mathbb V$$ is Lipschitz continuous. This proves 2. Next, let $$\gamma _k \to +\infty$$ and define the sequence $$\{\bar{u}^k\}$$ by $$\bar{u}^k := \bar{u}^{\gamma _k}$$. Since $$\{\bar{u}^k\} \subset \left \{\bar{u}^{\gamma }\right \}_{\gamma> 0}$$, which is bounded, there exists a weakly convergent subsequence $$\{\bar{u}^{k_l}\}$$ with limit $$\bar{u}^{\star }$$. Returning to (2.1) in this setting and dividing both sides by $$\gamma _{k_l}$$, it follows from the weak lower semicontinuity of $$\beta$$ that $$\beta (\bar{u}^{\star }) =0$$; from which it can be easily deduced that $$\bar{u}^{\star } \in \mathcal{G}$$. But then given $$\frac{1}{2}\|\bar{u}^{k_l} - \varphi\|^2_{\mathbb{V}} \leqslant \frac{1}{2}\|\bar{u}^{k_l} - \varphi\|^2_{\mathbb{V}} + \gamma_{k_l}\beta(\bar{u}^{k_l}) \leqslant \frac{1}{2}\|\bar{u}- \varphi\|^2_{\mathbb{V}},$$ (2.3) it follows from the weak lower semicontinuity of $$\|\cdot -\varphi \|^2_{\mathbb{V}}$$ and the uniqueness of the global minimizer $$\bar{u}$$ that $$\bar{u}^{\star } = \bar{u}$$. Moreover, the uniqueness of $$\bar{u}$$ also implies, by the Urysohn property, that the entire sequence $$\{\bar{u}^k\}$$ converges weakly to $$\bar{u}$$. Using the previous inequality, we have once again by weak lower semicontinuity and optimality (respectively) $$\liminf_{k \to+\infty}\frac{1}{2}\|\bar{u}^{k} - \varphi\|^2_{\mathbb{V}} \geqslant \frac{1}{2}\|\bar{u} - \varphi\|^2_{\mathbb{V}}\qquad\textrm{ and }\qquad \limsup_{k \to +\infty} \frac{1}{2}\|\bar{u}^{k} - \varphi\|^2_{\mathbb{V}} \leqslant \frac{1}{2}\|\bar{u} - \varphi\|^2_{\mathbb{V}}.$$ Thus, $$\|\bar{u}^{k} - \varphi \|^2_{\mathbb{V}} \to \|\bar{u} - \varphi \|^2_{\mathbb{V}}$$ and subsequently $$\|\bar u^k\|_{\mathbb{V}}\to \|\bar u\|_{\mathbb{V}}$$. Since $$\mathbb V$$ is a Hilbert space, it follows that $$\bar{u}^k \to \bar{u}$$ strongly in $$\mathbb V$$. This proves 3. Next, we prove 4. Since $$0\leqslant \theta (\gamma ) \leqslant \gamma \beta (\varphi ) \to 0$$ as $$\gamma \downarrow 0$$, we obtain $$\lim _{\gamma \downarrow 0}\theta (\gamma )=0$$. Moreover, local Lipschitz continuity of $$\theta :(0,\infty ) \to \mathbb R$$ follows from that of $$\bar{u}^{\bullet }:(0,\infty ) \to \mathbb V$$. To see that $$\theta$$ is monotonically increasing, fix $$\gamma , \eta> 0$$ consider that \begin{align} \theta(\gamma + \eta) - \theta(\gamma) &\geqslant \frac{1}{2}\|\bar{u}^{\gamma+\eta} - \varphi\|^2_{\mathbb{V}} + (\gamma+\eta)\beta(\bar{u}^{\gamma+\eta}) - \left(\frac{1}{2}\|\bar{u}^{\gamma+\eta} - \varphi\|^2_{\mathbb{V}} + \gamma\beta(\bar{u}^{\gamma+\eta})\right) \\ &= \eta \beta(\bar{u}^{\gamma+\eta}) \geqslant 0. \end{align} (2.4) Finally, we prove a rate of convergence result. The idea is inspired by similar results and techniques in the studies by Hintermüller & Kunisch (2006b); Keuthen & Ulbrich (2015). We start by noting that by assumption there exists some s > 0 such that $$\beta(\bar{u}^{\gamma}) = \mathcal{O}(\gamma^{-1-s}).$$ Looking ahead to the result of Theorem 2.3, which guarantees continuous differentiability of $$\theta$$, in particular $$\theta ^{\prime}(\gamma ) = \beta (\bar{u}^{\gamma })$$, it follows from the integral form of the mean value theorem that for any $$\gamma _2> \gamma _1 > 0$$ we have $$\theta(\gamma_2) - \theta(\gamma_1) = \int_{\gamma_1}^{\gamma_2} \theta^{\prime}(\gamma) \,\mathrm{d} \gamma = \int_{\gamma_1}^{\gamma_2} \beta(\bar{u}^{\gamma}) \,\mathrm{d} \gamma \leqslant C \int_{\gamma_1}^{\gamma_2} \frac{1}{\gamma^{1+s}} \,\mathrm{d} \gamma = C \left(-\gamma_{2}^{-s} + \gamma_{1}^{-s}\right)\!,$$ where C > 0 is some constant independent of $$\gamma$$. Passing to the limit in $$\gamma _2 \to +\infty$$, it follows from the above that $$\theta_{\infty} - \theta(\gamma_1) = \frac{1}{2} \|\bar{u} - \varphi \|^2_{\mathbb{V}} - \theta(\gamma_1) = \frac{1}{2} \|\bar{u} - \varphi \|^2_{\mathbb{V}} - \frac{1}{2} \|\bar{u}^{\gamma_1} - \varphi \|^2_{\mathbb{V}} - \gamma_1 \beta(\bar{u}^{\gamma_1}) \le C \gamma_{1}^{-s}.$$ (2.5) Next, consider that $$\frac{1}{2} \|\bar{u} - \varphi \|^2_{\mathbb{V}} - \frac{1}{2} \|\bar{u}^{\gamma_1} - \varphi \|^2_{\mathbb{V}} = \frac{1}{2}\|\bar{u} - \bar{u}^{\gamma_1}\|^2_{\mathbb{V}} + (\bar{u} - \bar{u}^{\gamma_1},\bar{u}^{\gamma_1} - \varphi)_{\mathbb{V}}.$$ (2.6) Furthermore, using $$\bar{u} - \bar{u}^{\gamma _1}$$ as a test function in the first-order optimality conditions for the penalized problem, we obtain (due to symmetry) $$(\bar{u} - \bar{u}^{\gamma_1},\bar{u}^{\gamma_1} - \varphi)_{\mathbb{V}} = \langle A[\bar{u}^{\gamma_1} - \varphi], \bar{u} - \bar{u}^{\gamma_1}\rangle = -\gamma \langle \partial \beta(\bar{u}^{\gamma_1}), \bar{u} - \bar{u}^{\gamma_1} \rangle.$$ Then, since $$\partial \beta$$ is a monotone operator and $$\bar{u}$$ is feasible, we have $$(\bar{u} - \bar{u}^{\gamma_1},\bar{u}^{\gamma_1} - \varphi)_{\mathbb{V}} = \gamma \langle \partial \beta(\bar{u}^{\gamma_1}) - \partial \beta(\bar{u}), \bar{u}^{\gamma_1} - \bar{u} \rangle \geqslant 0.$$ Hence, using this in (2.6) and then returning to (2.5), we obtain $$\frac{1}{2}\|\bar{u}^{\gamma_1} - \bar{u}\|^2_{\mathbb{V}} \leqslant \gamma_{1}^{-s} \left(C + \gamma^{1+s}_1\beta(\bar{u}^{\gamma_1})\right).$$ It follows that $$\|\bar{u}^{\gamma_1} - \bar{u}\|_{\mathbb{V}} = \mathcal{O}\left( \gamma_{1}^{-s/2} \right),$$ as was to be shown. This completes the proof. Remark 2.2 Since $$\frac{1}{2}\|\bar{u}^{\gamma} - \varphi\|^2_{\mathbb{V}} + \gamma \beta(\bar{u}^{\gamma}) \leqslant \frac{1}{2}\|\bar{u}- \varphi\|^2_{\mathbb{V}}$$ and $$\frac{1}{2}\|\bar{u}^{\gamma } - \varphi \|^2_{\mathbb{V}} \to \frac{1}{2}\|\bar{u}- \varphi \|^2_{\mathbb{V}}$$ as $$\gamma \to +\infty$$, it follows that $$\gamma \beta (\bar{u}^{\gamma }) \to 0$$, i.e., $$\beta (\bar{u}^{\gamma }) = o(\gamma ^{-1})$$. Therefore, it is plausible that $$\beta (\bar{u}^{\gamma })$$ satisfies the necessary rate condition. In particular, we readily see that for each $$i = 1,\dots , n$$ $$\left\| \left(-\bar{u}^{\gamma}_{i}\right)_+ \right\|_{L^2(\varOmega)} = o(\gamma^{-1/2}) \textrm{ and } \| 1 - \mathbf{1}^T\bar{u}^{\gamma} \|_{L^2(\varOmega)} = o(\gamma^{-1/2}).$$ In fact, when $$\varOmega \subset \mathbb R^1$$, which would be relevant for signal processing applications, we can even show directly that at least s = 1/2. Since $$H^1(\varOmega )$$ is continuously embedded into $$L^{\infty }(\varOmega )$$ for domains in $$\mathbb R^1$$, we can directly verify Robinson’s constraint qualification for the Gibbs simplex, cf. Bonnans & Shapiro (2000). Therefore, there exist Lagrange multipliers $$\lambda , \mu \in \mathbb V^{\ast } \times H^1(\varOmega )^{\ast }$$ such that for $$i=1,\dots ,n$$ \begin{align} -\varDelta \bar{u}_i + \bar{u}_i - \lambda_{i} - \mathbf{1} \mu &= -\varDelta \varphi_i + \varphi_i,\,\,\textrm{ in } \varOmega, \end{align} (2.7a) \begin{align} \frac{\partial (\bar{u}_i-\varphi_i)}{\partial n} &=0,\,\,\textrm{ on } \varGamma, \end{align} (2.7b) plus $$\lambda _i \geqslant 0$$ (in the sense of $$H^1(\varOmega )^{\ast }$$) and $$\langle \lambda _i, \bar{u}_i \rangle = 0$$. Thus, if we use $$\bar{u} - \bar{u}^{\gamma }$$ as a test function in (2.7) as well as in the optimality system for the penalized problem, then subtracting the resulting terms yields \begin{multline*} (\bar{u}-\bar{u}^{\gamma},\bar{u} - \bar{u}^{\gamma})_{\mathbb{V}} = \langle \lambda,\bar{u} - \bar{u}^{\gamma} \rangle_{\mathbb V^{\ast},\mathbb V} + \sum_{i=1}^n\langle \mu, \bar{u}_i - \bar{u}^{\gamma}_i\rangle_{(H^1(\varOmega))^{\ast},H^1(\varOmega)} \\ -\gamma \sum_{i=1}^n\left( \left(-\bar{u}^{\gamma}_i\right)_+, \bar{u}_i - \bar{u}^{\gamma}_i\right)_{L^2(\varOmega)} +\gamma \sum_{i=1}^n\left( (\mathbf{1}^T \bar{u}^{\gamma} - 1), \bar{u}_i - \bar{u}^{\gamma}_i\right)_{L^2(\varOmega)}. \end{multline*} Next, consider that \begin{align*} \langle \lambda,\bar{u} - \bar{u}^{\gamma} \rangle_{\mathbb V^{\ast},\mathbb V} &= \langle \lambda,-\bar{u}^{\gamma} \rangle_{\mathbb V^{\ast}, \mathbb V}\\ \sum_{i=1}^n\langle \mu, \bar{u}_i - \bar{u}^{\gamma}_i\rangle_{(H^1(\varOmega))^{\ast},H^1(\varOmega)} &= \langle \mu, 1 - \mathbf{1}^T \bar{u}^{\gamma}\rangle_{(H^1(\varOmega))^{\ast},H^1(\varOmega)}\\ \gamma \sum_{i=1}^n\left( (\mathbf{1}^T \bar{u}^{\gamma} - 1), \bar{u}_i - \bar{u}^{\gamma}_i\right)_{L^2(\varOmega)} &= -\gamma \| \mathbf{1}^T \bar{u}^{\gamma} - 1 \|^2_{L^2(\varOmega)}\\ \gamma \sum_{i=1}^n\left( \left(-\bar{u}^{\gamma}_i\right)_+, \bar{u}_i - \bar{u}^{\gamma}_i\right)_{L^2(\varOmega)} &= \gamma \sum_{i=1}^n\left( \left(-\bar{u}^{\gamma}_i\right)_+, \bar{u}_i\right)_{L^2(\varOmega)} + \gamma \sum_{i=1}^n \left\| \left(-\bar{u}^{\gamma}_i\right)_+ \right\|^2_{L^2(\varOmega)} \geqslant 0. \end{align*} It follows that there exists some constant C > 0 such that \begin{align*} \| \bar{u} - \bar{u}^{\gamma} \|^2_{\mathbb{V}} &\leqslant \langle \lambda, - \bar{u}^{\gamma} \rangle_{\mathbb V^{\ast},\mathbb V} + \langle \mu, 1 - \mathbf{1}^T \bar{u}^{\gamma} \rangle_{(H^1(\varOmega))^{\ast},H^1(\varOmega)}\\ &\leqslant C \left( \sum_{i=1}^n \left\|\left(- \bar{u}^{\gamma}_i\right)_+ \right\|_{L^2(\varOmega)} + \| 1 - \mathbf{1}^T \bar{u}^{\gamma} \|_{L^2(\varOmega)} \right). \end{align*} Hence, $$\| \bar{u} - \bar{u}^{\gamma } \|_{\mathbb{V}} = o(\gamma ^{-1/4})$$. Building on the results of Theorem 2.1, we show that $$\theta :(0,\infty ) \to \mathbb R$$ is continuously differentiable. Theorem 2.3 Under the standing assumptions the optimal value function $$\theta \!:\!(0,\infty ) \!\to\! \mathbb R$$ is continuously differentiable. The gradient of $$\theta$$ at $$\bar{\gamma }> 0$$ is given by $$\frac{\partial \theta}{\partial \gamma}(\bar{\gamma}) = \beta(\bar{u}^{\bar \gamma}).$$ Proof. Fix $$\gamma , \eta> 0$$. According to (2.4), we have $$\eta^{-1}(\theta(\gamma + \eta) - \theta(\gamma)) \geqslant \beta(\bar{u}^{\gamma+\eta}).$$ (2.8) Similarly, we have (for sufficiently small $$\eta> 0$$) $$\theta(\gamma - \eta) - \theta(\gamma) \geqslant \frac{1}{2}\|\bar{u}^{\gamma-\eta} - \varphi\|^2_{\mathbb{V}} + (\gamma-\eta)\beta(\bar{u}^{\gamma-\eta}) - \left(\frac{1}{2}\|\bar{u}^{\gamma-\eta} - \varphi\|^2_{\mathbb{V}} + \gamma\beta(\bar{u}^{\gamma-\eta})\right) = -\eta \beta(\bar{u}^{\gamma-\eta}).$$ Hence, $$\eta^{-1}(\theta(\gamma - \eta) - \theta(\gamma)) \geqslant -\beta(\bar{u}^{\gamma-\eta}).$$ (2.9) We also have the upper bounds $$\theta(\gamma + \eta) - \theta(\gamma) \leqslant \frac{1}{2}\|\bar{u}^{\gamma} - \varphi\|^2_{\mathbb{V}} + (\gamma+\eta)\beta(\bar{u}^{\gamma}) - \left(\frac{1}{2}\|\bar{u}^{\gamma} - \varphi\|^2_{\mathbb{V}} + \gamma\beta(\bar{u}^{\gamma})\right) = \eta \beta(\bar{u}^{\gamma})$$ and (for sufficiently small $$\eta> 0$$) $$\theta(\gamma - \eta) - \theta(\gamma) \leqslant \frac{1}{2}\|\bar{u}^{\gamma} - \varphi\|^2_{\mathbb{V}} + (\gamma-\eta)\beta(\bar{u}^{\gamma}) - \left(\frac{1}{2}\|\bar{u}^{\gamma} - \varphi\|^2_{\mathbb{V}} + \gamma\beta(\bar{u}^{\gamma})\right) = -\eta \beta(\bar{u}^{\gamma}).$$ This yields \begin{align} \eta^{-1}(\theta(\gamma + \eta) - \theta(\gamma)) &\leqslant \beta(\bar{u}^{\gamma}) \end{align} (2.10) \begin{align} \eta^{-1}(\theta(\gamma - \eta) - \theta(\gamma)) &\leqslant -\beta(\bar{u}^{\gamma}). \end{align} (2.11) It follows from Theorem 2.1.2, the continuity of $$\beta$$, (2.8) and (2.10) that $$\lim_{\eta \downarrow 0} \eta^{-1}(\theta(\gamma + \eta) - \theta(\gamma)) = \beta(\bar{u}^{\gamma}).$$ Similarly, we have from Theorem 2.1.2, the continuity of $$\beta$$, (2.9) and (2.11) that $$\lim_{\eta \downarrow 0} \eta^{-1}(\theta(\gamma - \eta) - \theta(\gamma)) = -\beta(\bar{u}^{\gamma}).$$ Hence, $$\theta$$ is differentiable from $$(0,\infty ) \to \mathbb R$$. Finally, as $$\bar{u}^{\bullet }:(0,\infty ) \to \mathbb V$$ and $$\beta$$ are continuous, $$\theta$$ is continuously differentiable. Remark 2.4 (Relation to Danskin’s theorem). Since $$\beta$$ was kept rather general in Theorems 2.1 and 2.3, it appears that the value function is continuously differentiable for a wide variety of constraints beyond our setting. Note that Theorem 2.3 is not a direct consequence of Danskin’s theorem. Indeed, the hypotheses of Danskin’s theorem (cf. Bonnans & Shapiro,2000, Prop. 4.12) require, in particular, that the objective functional is continuous and $$\inf$$-compact on some Hausdorff topological vector space. In our case, the level sets of the objective are $$\inf$$-compact in the weak topology on $$\mathbb V$$. However, the objective is weakly lower-semicontinuous, but not continuous with respect to the weak topology. Therefore, we cannot simply embed the problem into the setting of Danskin’s theorem by replacing $$(\mathbb V,\|\cdot \|_{\mathbb{V}})$$ with $$(\mathbb V,\sigma _{\textrm{weak}})$$. In order to approximate $$\theta$$ in the PF scheme by a simple model function, we need second-order information. This will require a differential sensitivity analysis of $$\bar{u}^{\bullet }$$. For this discussion, we will make use of the first-order necessary and sufficient optimality conditions for (Pγ), which for each $$i=1,\dots ,n$$ are given by \begin{align} -\varDelta \bar{u}^{\gamma}_i + \bar{u}^{\gamma}_i - \lambda^{\gamma}_{i} + \mathbf{1} \mu^{\gamma} &= -\varDelta \varphi_i + \varphi_i,\,\,\textrm{ in } \varOmega, \end{align} (2.12a) \begin{align} \frac{\partial \left(\bar{u}^{\gamma}_i-\varphi_i\right)}{\partial n} &=0,\,\,\textrm{ on } \varGamma, \end{align} (2.12b) $$\lambda^{\gamma}_i = \gamma\left(-\bar{u}^{\gamma}_i\right)_+,$$ (2.12c) $$\mu^{\gamma} = \gamma(\mathbf{1}^\top \bar{u}^{\gamma} - 1).$$ (2.12d) Note that $$-\lambda ^{\gamma }_i = \gamma \partial \left (\frac{1}{2}\|(-\cdot )_+\|^2_{L^2(\varOmega )}\right )\left (\bar{u}^{\gamma }_i \right )$$ and $$\mu ^{\gamma } = \gamma \partial _{u_i}\left (\frac{1}{2}\|\mathbf{1}^\top \cdot - 1\|^2_{L^2(\varOmega )}\right )(\bar{u}^{\gamma })$$. In addition, we define the difference quotients $$d^{\eta}_{i} := \frac{\bar{u}^{\gamma+\eta}_i - \bar{u}^{\gamma}_i}{\eta},\quad \chi^{\eta}_{i} := \frac{\lambda^{\gamma+\eta}_i - \lambda^{\gamma}_i}{\eta},\quad \nu^{\eta} := \frac{\mu^{\gamma+\eta} - \mu^{\gamma}}{\eta}.$$ We then have the following result: Proposition 2.5 The directional derivative of the composition $$\widehat{\beta }(\bullet ) := (\beta \circ \bar{u}^{\bullet }) :(0,\infty ) \to \mathbb R$$ at $$\gamma> 0$$ in direction 1 is given by $$\widehat{\beta}^{\prime}(\gamma;1) = -\sum_{i=1}^n \left[ \int_{\varOmega}\left(-\bar u^{\gamma}_i\right)_+\bar{d}_i\,\mathrm{d}x + \int_{\varOmega}(\mathbf{1}^ T \bar u^{\gamma} - 1)\bar{d}_i\,\mathrm{d}x\right],$$ where $$\bar{d} \in \mathbb V$$ is the unique solution of the following variational problem: \begin{multline} \min\Bigg\{ \frac{1}{2}\|d\|^2_{\mathbb{V}} - \sum_{i=1}^n\left( \left(-\bar{u}^{\gamma}_i \right)_+-(\mathbf{1}^\top \bar{u}^{\gamma} - 1),d_i\right)_{L^2(\varOmega)} + \frac{\gamma}{2}\int_{\varOmega}\left|\mathbf{1}^\top d \right|{}^2\,\mathrm{d}x + \frac{\gamma}{2}\sum_{i=1}^n\int_{\left\{\bar{u}^{\gamma}_i = 0\right\}}(-d_i)^2_+\,\mathrm{d}x \\+ \frac{\gamma}{2}\int_{\left\{\bar{u}^{\gamma}_i < 0\right\}}|d_i|^2\,\mathrm{d}x \textrm{ over } d \in \mathbb V \Bigg\}. \end{multline} (2.13) Moreover, it holds that $$\widehat{\beta}^{\prime}(\gamma;1) \leqslant 0,\,\,\forall \gamma> 0.$$ Hence, $$\theta :(0,\infty ) \to \mathbb R$$ is concave. Proof. We first show that $$\left \{d^{\eta }\right \}_{\eta> 0}$$ is uniformly bounded in $$\mathbb V$$. First note that $$d^{\eta }_i$$ solves the following elliptic partial differential equation (PDE) with homogeneous Neumann boundary conditions: \begin{align} -\varDelta d^{\eta}_i + d^{\eta}_i - \chi^{\eta}_{i} + \nu^{\eta} &= 0,\,\,\textrm{ in } \varOmega, \end{align} (2.13a) $$\frac{\partial d^{\eta}_i}{\partial n} =0,\,\,\textrm{ on } \varGamma.$$ (2.13b) Now note that $$\sum _{i=1}^n \left \|d_i^\eta \right \|_V^2 = \sum _{i=1}^n\left \langle -\varDelta d_i^\eta +d_i^\eta ,d_i^\eta \right \rangle$$ and by the monotonicity of the gradients of $$\beta$$ that \begin{align*} \sum_{i=1}^n \left\|d_i^\eta\right\|_V^2 \leqslant \sum_{i=1}^n\left\langle -\varDelta d_i^\eta+d_i^\eta,d_i^\eta\right\rangle &- \sum_{i=1}^n\gamma\eta^{-2} \left(\left(-\bar{u}_i^{\gamma+\eta}\right)_+ - \left(-\bar{u}_i^{\gamma}\right)_+, \bar{u}_i^{\gamma+\eta} - \bar{u}_i^{\gamma}\right)_{L^2(\varOmega)} \\&+ \sum_{i=1}^n\gamma\eta^{-2} \left((\mathbf{1}^\top\bar{u}^{\gamma+\eta}-1) - (\mathbf{1}^\top\bar{u}^{\gamma}-1), \bar{u}_i^{\gamma+\eta} - \bar{u}_i^{\gamma}\right)_{L^2(\varOmega)}. \end{align*} Then, testing (2.13a) with $$d^{\eta }_i$$ and summing over i we have $$\sum_{i=1}^n\left\langle -\varDelta d_i^\eta+d_i^\eta,d_i^\eta\right\rangle = \sum_{i=1}^n\left\langle \chi_i^\eta,d_i^\eta\right\rangle - \sum_{i=1}^n\left\langle \nu^\eta,d_i^\eta\right\rangle.$$ Next, given \begin{align*} \sum_{i=1}^n\left\langle \chi_i^\eta,d_i^\eta\right\rangle &- \sum_{i=1}^n\left\langle \nu^\eta,d_i^\eta\right\rangle - \sum_{i=1}^n\gamma\eta^{-2} \left(\left(-\bar{u}_i^{\gamma+\eta}\right)_+ - \left(-\bar{u}_i^{\gamma}\right)_+, \bar{u}_i^{\gamma+\eta} - \bar{u}_i^{\gamma}\right)_{L^2(\varOmega)} \\&+ \sum_{i=1}^n\gamma\eta^{-2} \left((\mathbf{1}^\top\bar{u}^{\gamma+\eta}-1) - (\mathbf{1}^\top\bar{u}^{\gamma}-1), \bar{u}_i^{\gamma+\eta} - \bar{u}_i^{\gamma}\right)_{L^2(\varOmega)} \\ =& \sum_{i=1}^n \left(\left(-\bar{u}_i^{\gamma+\eta}\right)_+, d_i^\eta\right)_{L^2(\varOmega)} -\sum_{i=1}^n \left(\mathbf{1}^\top\bar{u}^{\gamma+\eta}-1, d_i^\eta\right)_{L^2(\varOmega)}, \end{align*} we have $$\sum_{i=1}^n \left\|d_i^\eta\right\|_V^2 \leqslant \sum_{i=1}^n \left(\left(-\bar{u}_i^{\gamma+\eta}\right)_+, d_i^\eta\right)_{L^2(\varOmega)} -\sum_{i=1}^n \left(\mathbf{1}^\top\bar{u}^{\gamma+\eta}-1, d_i^\eta\right)_{L^2(\varOmega)}.$$ (2.14) It follows from Theorem 2.1.3, that $$\left \{d^{\eta }\right \}_{\eta> 0}$$ is uniformly bounded in $$\mathbb V$$. Next, we derive expansions for $$\lambda ^{\eta }_i$$ and $$\mu ^{\eta }$$. From the Lebesgue dominated convergence theorem we see that the superposition operator $$(\cdot )_+:L^2(\varOmega ) \to L^2(\varOmega )$$ is Lipschitz continuous and Hadamard directionally differentiable with $$(\cdot)_+^{\prime}\left(\bar{u}^{\gamma}_i;d^{\eta}_i\right)(x) = \begin{cases} d_i^\eta(x) & \textrm{on }\left\{\bar u_i^\eta>0\right\}\cup\left\{\bar u_i^\eta=0, d_i^\eta>0\right\},\\ 0 &\text{otherwise.}\end{cases}$$ (2.15) Moreover, due to Shapiro (1990, Proposition 3.1) it is continuous in the second variable. Since $$\left \{d^{\eta }_i\right \}_{\eta> 0}$$ is uniformly bounded in V and $$\chi^{\eta}_i = \eta^{-1}\left( (\gamma + \eta)\left(-\bar{u}^{\gamma+\eta}_i\right)_+ - \gamma\left(-\bar{u}^{\gamma}_i\right)_+\right) = \gamma \left(\frac{\left(-(\bar{u}^{\gamma}_i + \eta d^{\eta}_i)_+ - \gamma(-\bar{u}^{\gamma}_i)_+\right)}{\eta} \right) + \left(-\bar{u}^{\gamma + \eta}_i \right)_+,$$ there exists a small o-function such that $$\chi^{\eta}_i = \gamma(\cdot)_+^{\prime}\left(-\bar{u}^{\gamma}_i;-d^{\eta}_i\right) + o(1) + \left(-\bar{u}^{\gamma + \eta}_i \right)_+.$$ (2.16) Arguing similarly for $$\nu ^{\eta }$$, we have $$\nu^{\eta} = \gamma\mathbf{1}^\top d^{\eta} + \mathbf{1}^\top \bar{u}^{\gamma+\eta} - 1.$$ (2.17) Now, letting $$\eta _k \downarrow 0$$, there exists a weakly converging subsequence $$\left \{d^{k_l}_i\right \}$$ with $$d^{k_l}_i := d^{\eta _{k_l}}_i$$ and a weak limit point $$\bar{d}_i \in V$$. Since V embeds compactly into $$L^2(\varOmega )$$ and $$\bar{u}^{\bullet } :(0,\infty ) \to \mathbb V$$ is continuous, we have \begin{align*} \chi^{\eta_{k_l}}_i &\to \gamma(\cdot)_+^{\prime}\left(-\bar{u}^{\gamma}_i;-\bar{d}_i\right) + \left(-\bar{u}^{\gamma}_i \right)_+ \textrm{ strongly in } L^2(\varOmega)\\ \nu^{\eta_{k_l}} &\to \gamma\mathbf{1}^\top \bar{d}+ \mathbf{1}^\top \bar{u}^{\gamma} - 1 \textrm{ strongly in } L^2(\varOmega). \end{align*} Moreover, $$\bar{d}_i$$ solves \begin{align} -\varDelta \bar{d}_i + \bar{d}_i - \gamma(\cdot)_+^{\prime}\left(-\bar{u}^{\gamma}_i;-\bar{d}_i\right) + \gamma\mathbf{1}^\top \bar{d} &= \left(-\bar{u}^{\gamma}_i \right)_+-(\mathbf{1}^\top \bar{u}^{\gamma} - 1),\,\,\textrm{ in } \varOmega, \end{align} (2.18a) \begin{align} \frac{\partial \bar{d}_i}{\partial n} &=0,\,\,\textrm{ on } \varGamma. \end{align} (2.18b) It is then easy to see that the vector $$\bar{d} \in \mathbb V$$ solves (Dγ). Since (Dγ) has a unique solution, we can deduce by the Urysohn property that the entire sequence $$\left \{d^{k}\right \}$$ converges weakly to $$\bar{d}$$. We are now ready to derive a formula for the directional derivative of $$\theta ^{\prime}(\gamma )$$. Consider that \begin{align*} \eta^{-1}\left(\beta(\bar{u}^{\gamma + \eta}) - \beta(\bar{u}^{\gamma})\right) =&\, (2\eta)^{-1}\sum_{i=1}^n \int_{\varOmega}\bigg[\left(-\bar{u}^{\gamma + \eta}_i\right)^2_+ - \left(-\bar{u}^{\gamma}_i\right)^2_+\bigg]\, \mathrm{d}x \\ &+ (2\eta)^{-1}\int_{\varOmega}\left[\big|\mathbf{1}^ T \bar{u}^{\gamma+\eta} - 1\big|^2 - \big|\mathbf{1}^ T \bar{u}^{\gamma} - 1\big|^2\right]\mathrm{d}x. \end{align*} Using the chain rule, formula (2.15) and $$\bar{u}^{\gamma +\eta } = \bar{u}^{\gamma } + \eta d^{\gamma }$$, we have $$\eta^{-1}(\beta(\bar{u}^{\gamma + \eta}) - \beta(\bar{u}^{\gamma})) = \sum_{i=1}^n \int_{\varOmega}\left[-\left(-\bar{u}^{\gamma}_i\right)_+d^{\eta}_i\right] \mathrm{d}x + \int_{\varOmega}\left[(\mathbf{1}^ T \bar{u}^{\gamma} - 1)d^{\eta}_i\right]\mathrm{d}x + o(1).$$ Since $$\eta _k \to 0$$ implies $$d^{\eta _k} \rightharpoonup \bar{d}$$ (weakly in $$\mathbb V$$), we can use the compactness of the embedding V into $$L^2(\varOmega )$$ and pass to the limit in the previous equation. This yields $$\widehat{\beta}^{\prime}(\gamma;1) = -\sum_{i=1}^n \left[ \int_{\varOmega}\left(-\bar{u}^{\gamma}_i\right)_+\bar{d}_i\,\mathrm{d}x + \int_{\varOmega}(\mathbf{1}^ T \bar{u}^{\gamma} - 1)\bar{d}_i\,\mathrm{d}x\right].$$ Finally, it follows from (2.14) that $$\widehat{\beta }^{\prime}(\gamma ;1) \leqslant 0$$. This completes the proof. 3. Solving the first-order system 3.1 A semismooth Newton iteration The main component of the algorithm involves the direct solution of (2.12) using a semismooth Newton method. We refer the reader to the papers by Hintermüller et al. (2002); Ulbrich (2002) for the full theory of this method and only note that in the current setting, the iterates are generated by solving the following (reduced) linear system; given $$\gamma> 0$$ and $$u_{\textrm{old}}$$, solve \begin{align} A(u-\varphi) + \gamma I_{\{u_{\textrm{old}}<0\}}u + \gamma \mathbf{1}(\mathbf{1}^\top u-1) &= 0 \textrm{ in }\varOmega, \end{align} (3.1a) $$\frac{\partial(u-\varphi)}{\partial n} = 0 \textrm{ on }\partial\varOmega$$ (3.1b) for $$u_{\textrm{new}}$$. Here, A is the PDE-operator associated with the $$\mathbb V$$-inner product and $$I_{\{u_{\textrm{old}}<0\}}$$ is the characteristic function for the set $$\{u_{\textrm{old}}<0\}$$. Once the residual of (2.12) is sufficiently small, the method terminates and $$\gamma$$ is updated. The previous $$\gamma$$-dependent solution serves as the starting value for the next iteration. We summarize this procedure in Algorithm 3.1. Defining the residuals $$r^1_{k} = \| (-u_k)_+ \|_{L^2(\varOmega,\mathbb{R}^n)}, \; r^2_{k} = \left\| \mathbf{1}^\top u_k -1 \right\|_{L^2(\varOmega)}, \; r^3_{k} = (\lambda_k,u_k)_{L^2(\varOmega)},$$ the outer loop terminates whenever $$\left\|\left(r_k^1,r_k^2,r_k^3\right)\right\|_2 \leqslant \mathrm{tol}.$$ Note that $$r^1_k$$, $$r^2_k$$ are feasibility errors, whereas $$r^1_k$$, $$r^3_k$$ represent the complementarity error. The stopping criteria for the inner loop depend on the type of $$\gamma$$-update strategy. These are detailed below in Section 4. 3.2 Feasibility restoration In some applications, e.g., topology optimization, where the phase field functions $$u_1,\dots ,u_n$$ arise as parameters in a second-order PDE-operator, it is imperative that $$u \in \mathcal{G}$$. For such situations, we are forced to employ a mesh-dependent numerical method. However, the (very) warm start provided by Algorithm 3.1 provides a point that is sufficiently close to the true solution of (P) so that the jump to feasibility requires minimal effort. We demonstrate this fact in the examples. 4. Path-following scheme We now detail a strategy for efficiently updating $$\gamma _k$$ in Algorithm 3.1. As suggested in the study by Hintermüller & Kunisch (2006b), an ideal update scheme for $$\gamma$$ (at step k) would be to choose $$\gamma _{k+1}$$ such that $$|\theta_{\infty} - \theta(\gamma_{k+1})| \le \tau_k |\theta_{\infty} - \theta(\gamma_{k})|,$$ (4.1) where $$\tau _k\downarrow 0$$ is a given null sequence. However, as noted there, this is not a practical strategy as we cannot evaluate $$\theta (\gamma _{k+1})$$ and $$\theta _{\infty }$$. Nevertheless, our analysis shows that $$\theta : (0,\infty ) \to \mathbb R$$ is positive, bounded, increasing, continuously differentiable and concave. Hence, we first fit a model function, using the information available at step k and then update according to an approximation of (4.1). This serves the basis for the PF approach. Since our problem exhibits structural similarities to the problem class consider in the study by Hintermüller & Kunisch (2006b), we consider their model function $$m:(0,\infty ) \to \mathbb R$$ given by \begin{equation*} m(\delta) := C_1 - \frac{C_2}{E + \delta} \end{equation*} as an approximation of $$\theta$$, which is related to the solution of an ordinary differential equation. Given $$\gamma> 0$$ and $$\bar{u}^{\gamma }$$, we determine the ($$\gamma$$-dependent) constants in each of the model functions by setting $$m(0) = 0,\quad m(\gamma) = \theta(\gamma),\quad m^{\prime}(\gamma) = \theta^{\prime}(\gamma).$$ (4.2) This amounts to \begin{align*} E &= \gamma^2 \beta(\bar{u}^{\gamma})\left(\frac{1}{2}\|\bar{u}^{\gamma} - \varphi\|^2_{\mathbb{V}}\right)^{-1},\\ C_2 &= \gamma^{-1} E(E + \gamma)\left(\frac{1}{2}\|\bar{u}^{\gamma} - \varphi\|^2_{\mathbb{V}} + \gamma \beta(\bar{u}^{\gamma})\right),\\ C_1 &= C_2 E^{-1}. \end{align*} Clearly, $$E, C_1, C_2>0$$. Moreover, it can be argued using (2.3) that $$\gamma \beta (\bar{u}^{\gamma }) \to 0$$ as $$\gamma \to +\infty$$. Hence, m(⋅) monotonically increasing, $$m(\cdot ) \le C_1$$ and $$C_1 = \left(\frac{\gamma \beta(\bar{u}^{\gamma})}{\frac{1}{2}\|\bar{u}^{\gamma} - \varphi\|^2_{\mathbb{V}}} + 1 \right) \theta(\gamma)$$ implies that $$m(\delta ) \to C_1$$ as $$\delta \to \infty$$ and $$C_1\to \theta _{\infty }$$ as $$\gamma \to \infty$$. Now, we may use (4.1), which leads to the following explicit update formula: $$\gamma_{k+1} = \frac{C_{2,k}}{\tau_k|C_{1,k} - \theta(\gamma_k)|}-E_k.$$ (4.3) By choosing three different reference values for $$\gamma$$ in (4.2), we see in Fig. 1 that m is an excellent local approximation of $$\theta$$. In this method, the Newton iteration (inner loop) is terminated whenever the active index sets $$\{u_k<0\}$$ are the same in two subsequent iterations or $$\left\| A(u_{k,l+1}-\varphi) - \gamma_k (-u_{k,l+1})_+ + \gamma_k \mathbf{1}\left(\mathbf{1}^\top u_{k,l+1}-1\right)\right\|_{H^1(\varOmega,\mathbb{R}^n)^{\ast}} \le \mathrm{tol}.$$ (4.4) Fig. 1. View largeDownload slide Approximations m of $$\theta$$ taken at different reference points $$\gamma$$. Fig. 1. View largeDownload slide Approximations m of $$\theta$$ taken at different reference points $$\gamma$$. 5. Numerical results In this section, we demonstrate the performance and wide applicability of the proposed methods on three examples. The first example shows the mesh-independent behavior of the PF methods and compares this to the first-discrete-then-optimize approach suggested above, which utilizes the primal–dual active set (PDAS) method (adapted to our setting) from the study by Hintermüller et al. (2002). This method does not admit a function space convergence analysis and, hence, is expected to behave mesh dependently, i.e., utilizing the same stopping rule and initial point on all meshes, the number of iterations until successful termination will grow as the mesh is refined. For the latter two examples, we consider two optimization problems from mathematical image processing and labeled data classification that may be solved using a gradient flow or projected gradient method, see e.g., Bertsekas (1976). At each step of the line search in the projected gradient method, a projection onto the Gibbs simplex is required; with an average number of $$\gamma$$ updates (k in example 1) similar to the example in 5.1. In all examples, the underlying domain $$\varOmega = (0,1)^2$$ is discretized using a uniform triangular mesh with parameter h > 0. The function space $$H^1(\varOmega )$$ is discretized using the associated nodal basis comprised of the standard continuous piecewise affine finite elements, see, e.g., Ciarlet (1978); Brenner & Scott (2008). Note that the pointwise a.e. constraints are approximated in the finite dimensional problem by requiring the coefficients for each nodal basis function to be in the finite dimensional Gibbs simplex. In order to calculate the $$H^1(\varOmega ;\mathbb R^n)^{\ast }$$-norm in (4.4), we need the Riesz representation of the dual object. This is done by solving a (vector-valued) Poisson problem with homogeneous Neumann boundary conditions with the residual of interest on the right-hand side. We make use of the P1AFEM package (Funken et al., 2011). Concerning the PF parameters, we choose $$\tau _k=10^{-2k}$$, initial parameter $$\gamma _1=10^4$$, maximum parameter $$\gamma _{\max }=10^{14}$$ and the stopping criterion $$\textrm{tol}=10^{-6}$$. PDAS is stopped once the residual of the optimality conditions to (P) drops to $$10^{-10}$$. 5.1 Projection onto the Gibbs simplex For this example, we set n = 3 and construct $$\varphi _i$$ such that the optimal solution of (P) exhibits biactivity at the optimal solution, i.e., both the inequality constraint and the associated Lagrange multiplier are equal to zero. Such problems can be notoriously difficult to solve. Let \begin{align} \hat v_1(x,y) &= \begin{cases} 0 &\textrm{if } x\in (0,1),\ y\in \left(0,\frac 12\right)\\ \sin\left(y-\frac12\right)\cos(xy) &\textrm{if } x\in (0,1),\ y\in \big[\frac 12,1\big) \end{cases}\\ \hat v_2(x,y) &= 2+\cos(10xy),\,\, (x,y) \in (0,1)^2 \\ \hat v_3(x,y) &= 0,\,\, (x,y) \in (0,1)^2 \\ \end{align} and then set $$v_i = \frac{\hat v_i}{\hat v_1+\hat v_2+\hat v_3}$$ for i = 1, 2, 3. Note that $$v = (v_1,v_2,v_3)$$ satisfies the constraints in $${\mathcal{G}}$$. Similarly, we define the functions \begin{align} \lambda_1(x,y) &= \begin{cases} 1 &\textrm{if } x\in (0,1),\ y\in \left(0,\frac 12\right)\\ 0 &\textrm{if } x\in (0,1),\ y\in\big[\frac 12,1\big) \end{cases}\\ \lambda_2(x,y) &= 0,\,\, (x,y) \in (0,1)^2 \\ \lambda_3(x,y) &= 0,\,\, (x,y) \in (0,1)^2 \\ \mu(x,y) &= 1,\,\, (x,y) \in (0,1)^2 \end{align} and construct $$\varphi$$ by solving the problem \begin{align} -\varDelta \varphi + \varphi &= -\varDelta v+v+\lambda-\mathbf{1}\mu,\,\,\textrm{ in } \varOmega,\\ \frac{\partial (\varphi-v)}{\partial n} &=0,\,\,\textrm{ on } \varGamma. \end{align} Note that $$\lambda = (\lambda _1,\lambda _2,\lambda _3) \in L^2(\varOmega )^3$$, that $$(\lambda ,v)_{L^2}=0$$ and that the above system forms the Karush–Kuhn–Tucker conditions for (P). Thus, v is its optimal solution, $$\lambda$$ is the Lagrange multiplier associated with the inequality constraint and biactivity is present on the whole third component. The starting point was chosen as $$\varphi$$. The results of Algorithm 3.1 are presented in Table 1. We compare the behavior of the PDAS and the PF method on six different meshes with decreasing h. For PDAS, ‘total iter’ refers to the number of linear solves. For PF, ‘inner steps l’ refers to the total number of linear solves used in the Newton iterations, ‘outer steps k’ is the number of $$\gamma$$ updates and ‘feas. step’ counts the number of linear solves to optimality needed when using the solution of PF as a starting point in the PDAS method. This qualitatively demonstrates how close the solution of the approximating problems gets to solving (P). Note that no adaptive mesh refinement or nested grid strategies, where one would consider subsequently refined meshes and use the prolongation of the solution on the coarser mesh as the starting point on the finer mesh, are considered. Clearly, the PF strategies behave much better over mesh refinements than PDAS. For example, with $$h = 2^{-8}$$, the true solution to (P) is obtained in 99 iterations with PDAS and 15 with PF. Table 1 Results for projection of $$\varphi$$ onto the Gibbs simplex h $$2^{-4}$$ $$2^{-5}$$ $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ PDAS: total iter 10 16 33 58 99 196 PF: inner steps l 12 12 14 14 15 14 outer steps k 3 3 3 3 3 3 feas. step 1 1 1 1 1 1 h $$2^{-4}$$ $$2^{-5}$$ $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ PDAS: total iter 10 16 33 58 99 196 PF: inner steps l 12 12 14 14 15 14 outer steps k 3 3 3 3 3 3 feas. step 1 1 1 1 1 1 Table 1 Results for projection of $$\varphi$$ onto the Gibbs simplex h $$2^{-4}$$ $$2^{-5}$$ $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ PDAS: total iter 10 16 33 58 99 196 PF: inner steps l 12 12 14 14 15 14 outer steps k 3 3 3 3 3 3 feas. step 1 1 1 1 1 1 h $$2^{-4}$$ $$2^{-5}$$ $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ PDAS: total iter 10 16 33 58 99 196 PF: inner steps l 12 12 14 14 15 14 outer steps k 3 3 3 3 3 3 feas. step 1 1 1 1 1 1 5.2 Applications in image processing and labeled data classification In this subsection, we consider two examples inspired by the work in the studies by Bertozzi et al. (2007); Garcia-Cardona et al. (2014). Having n phases, the common goal is to minimize a criterion J such that the phases are (almost) pure. This amounts to solving the problem $$\min_{u \in H^1(\varOmega;\mathbb R^n)}\left\{J(u) + \zeta\int_\varOmega \left(\varepsilon| \nabla u|^2 + \frac1\varepsilon u(1-u)\right)\mathrm{d}x \;\Big|\; u \in \mathcal{G}\right\},$$ (5.1) where $$\zeta>0$$ is the scaling parameter and parameter $$\varepsilon>0$$ is proportional to the interfacial width. The second part of the objective is the multiphase Ginzburg–Landau energy, which along with the indicator function for $$\mathcal{G}$$, serves as a penalty functional that avoids pathological free boundaries between the phases. Together, the sum of these functionals $$\varGamma$$-converges to the perimeter of the characteristic functions associated with the true phases in a sharp interface regime (Modica, 1987a,b). As such, straight edges are more favorable. We use the standard projected gradient method with line search to solve (5.1). Labeled data classification In the first example we separate n = 3 different classes of observations with known labels. First, we ‘train’ three segmentation phase field functions u on M noisy data and then test the accuracy of these solutions on further noisy realizations. This amounts to solving (5.1) with $$J(u) = \sum_{m=1}^M\left(1-\frac{1}{| B(x_m,\delta)|}\int_{B(x_m,\delta)}u_{j(m)}\,\mathrm{d}x\right),$$ (5.2) where $$m \in \{1,\dots ,M\}$$ is the index of a noisy data point $$x_m \in \varOmega$$ and $$B(x_m,\delta )$$ is a small ball of radius $$\delta> 0$$ around this point; $$j(m)\in \{1,2,3\}$$ refers to the known label (class) of this point. Due to the structure of the Gibbs simplex, J is minimized when $$u_{j(m)}=1$$ and $$u_i=0$$ for $$i\neq j(m)$$ on $$B(x_m,\delta )$$. Thus, J expressed the number of incorrectly classified points. In (5.1) we consider $$\varepsilon =2^{-6}$$ and $$\zeta =2\cdot 10^{-2}$$, and in (5.2) we set $$\delta =3\cdot 2^{-7}$$. The algorithm was started with a uniform mixture of phases. We randomly generated M = 1500 points (500 for each category $$+$$/left half circle, ∘/bottom half circle, ×/right half circle), see the left-hand side of Fig. 2. We then solved the optimization problem in order to train three phase field functions, see the right-hand side of Fig. 2. In order to test the accuracy of these fields, we again generated the same number of points and checked how many of these points lie in the correct region. In this instance, the solution was correct for 99.6% of the data points. The numerical comparison of both methods is shown in Table 2. Even though PF needs more iteration than PDAS, the iteration growth is much slower upon mesh refinement. The difference diminishes when we use warm start from the previous mesh. As we will see it the next application, on sufficiently fine meshes PF will eventually need less iterations than PDAS. Table 2 Classification results: number of iterations on uniform meshes with element size h and the increases with respect to previous iterations. The first part depicts the start with uniform phases, the other one the warm start from the previous mesh. The projected gradients needed four iterations with seven projections in both cases Uniform phases start Warm start h $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ PDAS: total iter 33 39 58 89 33 29 39 48 increase 18.18% 48.72% 53.45% -12.12% 34.48% 23.08% PF: inner steps l 59 68 85 113 59 51 68 81 increase 15.25% 25.00% 32.94% -13.56% 33.33% 19.12% feas. step 8 13 14 16 8 9 15 17 Uniform phases start Warm start h $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ PDAS: total iter 33 39 58 89 33 29 39 48 increase 18.18% 48.72% 53.45% -12.12% 34.48% 23.08% PF: inner steps l 59 68 85 113 59 51 68 81 increase 15.25% 25.00% 32.94% -13.56% 33.33% 19.12% feas. step 8 13 14 16 8 9 15 17 Table 2 Classification results: number of iterations on uniform meshes with element size h and the increases with respect to previous iterations. The first part depicts the start with uniform phases, the other one the warm start from the previous mesh. The projected gradients needed four iterations with seven projections in both cases Uniform phases start Warm start h $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ PDAS: total iter 33 39 58 89 33 29 39 48 increase 18.18% 48.72% 53.45% -12.12% 34.48% 23.08% PF: inner steps l 59 68 85 113 59 51 68 81 increase 15.25% 25.00% 32.94% -13.56% 33.33% 19.12% feas. step 8 13 14 16 8 9 15 17 Uniform phases start Warm start h $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ $$2^{-6}$$ $$2^{-7}$$ $$2^{-8}$$ $$2^{-9}$$ PDAS: total iter 33 39 58 89 33 29 39 48 increase 18.18% 48.72% 53.45% -12.12% 34.48% 23.08% PF: inner steps l 59 68 85 113 59 51 68 81 increase 15.25% 25.00% 32.94% -13.56% 33.33% 19.12% feas. step 8 13 14 16 8 9 15 17 Inpainting For the inpainting application, we assume that an original image (top left of Fig. 3) is degraded by adding a text on top of it to produce a known image $$\hat u$$ (top right of Fig. 3).1 The goal is then to recover the original image as faithfully as possible. We follow the suggestion in the study by Bertozzi et al. (2007) to use a modified Allen–Cahn equation for image inpainting, which leads to (5.1) with $$J(u) = \int_{\varOmega}(u-\hat u)^2\,\mathrm{d}x.$$ (5.3) In (5.1) we consider $$\varepsilon =2^{-7}$$ and $$\zeta =7\cdot 10^{-3}$$. The algorithm was started with the degraded image. On the contrary to the previous cases, we use nonuniform mesh, where the phase interface is much more refined, see the bottom right part of Fig. 3 that depicts the mesh for the right panda eye. For simplicity, we consider a locally adapted mesh which is given. A similar mesh may be obtained from developing an a posteriori error estimates for adaptive mesh refinement, see e.g., Baňas & Nürnberg (2009) or Hintermüller et al. (2011). As the latter technique, however, is not the focus of this work, we rely on an a priori mesh. For simplicity, we consider locally adapted mesh that is given a priori, for results. The recovered image is shown in the bottom left part of Fig. 3. The numerical evidence is summarized in Table 3, where h is the smallest element size. On the coarsest mesh PDAS outperforms PF, but as the mesh is getting finer, PF needs less iterations to converge to the solution. In all cases, the projected gradients needed 12 iterations with 26 projections. Table 3 Inpainting results: nonuniform mesh with refined mesh interface; h is the size of the smallest element. The projected gradients needed 12 iterations with 26 projections h $$2^{-8}$$ $$2^{-9}$$ $$2^{-10}$$ $$2^{-11}$$ $$2^{-11}$$ $$2^{-12}$$ Node number 66049 80500 110065 169733 289685 529971 PDAS: total iter 96 108 125 159 194 245 PF: inner steps 155 164 176 190 200 200 feas. step 35 33 45 48 51 74 h $$2^{-8}$$ $$2^{-9}$$ $$2^{-10}$$ $$2^{-11}$$ $$2^{-11}$$ $$2^{-12}$$ Node number 66049 80500 110065 169733 289685 529971 PDAS: total iter 96 108 125 159 194 245 PF: inner steps 155 164 176 190 200 200 feas. step 35 33 45 48 51 74 Table 3 Inpainting results: nonuniform mesh with refined mesh interface; h is the size of the smallest element. The projected gradients needed 12 iterations with 26 projections h $$2^{-8}$$ $$2^{-9}$$ $$2^{-10}$$ $$2^{-11}$$ $$2^{-11}$$ $$2^{-12}$$ Node number 66049 80500 110065 169733 289685 529971 PDAS: total iter 96 108 125 159 194 245 PF: inner steps 155 164 176 190 200 200 feas. step 35 33 45 48 51 74 h $$2^{-8}$$ $$2^{-9}$$ $$2^{-10}$$ $$2^{-11}$$ $$2^{-11}$$ $$2^{-12}$$ Node number 66049 80500 110065 169733 289685 529971 PDAS: total iter 96 108 125 159 194 245 PF: inner steps 155 164 176 190 200 200 feas. step 35 33 45 48 51 74 Fig. 2. View largeDownload slide Classification results: generated random points in the shape of three half-moons (left) and the obtained three regions used for categorization of points (right). Fig. 2. View largeDownload slide Classification results: generated random points in the shape of three half-moons (left) and the obtained three regions used for categorization of points (right). Fig. 3. View largeDownload slide Inpainting results: initial image (top left), devalued image (top right), recovered image (bottom left) and mesh detail of the right panda eye (bottom right). Fig. 3. View largeDownload slide Inpainting results: initial image (top left), devalued image (top right), recovered image (bottom left) and mesh detail of the right panda eye (bottom right). 6. Summary In this paper, we motivate the need for a function-space-based algorithm for the $$H^1$$-projection onto the multiphase Gibbs simplex. In particular, the wide array of applications alone involving optimization or control problems with the Gibbs simplex as a constraint warrants such a study. In order to make the best use of fast second-order methods for semismooth equations, an analytical PF study is carried out. Our analytical approach drastically reduces the length of the arguments and necessary assumptions required in the studies by Hintermüller & Kunisch (2006a,b); the two works that inspired our approach. Due to the generality of our arguments, they can clearly be extended to similar constraint sets. As in the studies by Hintermüller & Kunisch (2006a,b), we suggest a PF strategy for the update of the Moreau–Yosida regularization parameter, which is based on a concave model function. In our numerical experiments, we first demonstrate the clear advantage of our methods over a first-discretize-then-optimize approach. Finally, we consider two small examples inspired by recent work in the studies by Bertozzi et al. (2007); Garcia-Cardona et al. (2014) that require the $$H^1$$-projection onto the Gibbs simplex. Further applications, for example in topology optimization will be considered in future work. Funding This work was carried out in the framework of the Deutsche Forschungsgemeinschaft (DFG) under grant no. HI 1466/7-1 ‘Free Boundary Problems and Level Set Methods’ as well as the Research Center Matheon supported by the Einstein Foundation Berlin within projects OT1, SE5 and SE15. Footnotes 1  The picture can be downloaded from many wallpaper sites such as http://dairennsa.net/. References Baňas , L. & Nürnberg , R. ( 2009 ) A posteriori estimates for the Cahn–Hilliard equation with obstacle free energy . Esaim Math. Model. Numer. Anal. , 43 , 1003 – 1026 . Google Scholar CrossRef Search ADS Bertozzi , A. L. , Esedoglu , S. & Gillette , A. ( 2007 ) Inpainting of binary images using the Cahn–Hilliard equation . IEEE Trans. Image Process. , 16 , 285 – 291 . Google Scholar CrossRef Search ADS PubMed Bertsekas , D. P. ( 1976 ) On the Goldstein–Levitin–Polyak gradient projection method . IEEE Trans. Automat. Contr., AC-21 , 174 – 184 . Google Scholar CrossRef Search ADS Blank , L. , Farshbaf-Shaker , M. H. , Garcke , H. , Rupprecht , C. & Styles , V. ( 2014 ) Multi-material phase field approach to structural topology optimization . Trends in PDE Constrained Optimization . International Series of Numerical Mathematics , vol. 165. Cham : Birkhäuser/Springer , pp. 231 – 246 . Blank , L. , Garcke , H. , Sarbu , L. , Srisupattarawanit , T. , Styles , V. & Voigt , A. ( 2012 ) Phase-field approaches to Structural Topology Optimization . In: Leugering G. et al. (eds) Constrained Optimization and Optimal Control for Partial Differential Equations . International Series of Numerical Mathematics, vol. 160. Springer , Basel . Google Scholar CrossRef Search ADS Bonnans , J. F. & Shapiro , A. ( 2000 ) Perturbation Analysis of Optimization Problems , New York : Springer . Google Scholar CrossRef Search ADS Brenner , S. C. & Scott , L. R. ( 2008 ) The Mathematical Theory of Finite Element Methods , 3rd edn . Texts in Applied Mathematics , vol. 15. New York : Springer . Brézis , H. & Stampacchia , G. ( 1968 ) Sur la régularité de la solution d’inéquations elliptiques . Bull. Soc. Math. France , 96 , 153 – 180 . Google Scholar CrossRef Search ADS Burger , M. , He , L. & Schönlieb , C.-B. ( 2009 ) Cahn–Hilliard inpainting and a generalization for grayvalue images . SIAM Journal on Imaging Sciences , 2 , 1129 – 1167 . Ciarlet , P. G. ( 1978 ) The Finite Element Method for Elliptic Problems . Studies in Mathematics and its Applications , vol. 4. Amsterdam-New York-Oxford : North-Holland Publishing Co . Funken , S. , Praetorius , D. & Wissgott , P. ( 2011 ) Efficient implementation of adaptive P1-FEM in Matlab . Comput. Methods Appl. Math. , 11 , 460 – 490 . Google Scholar CrossRef Search ADS Garcia-Cardona , C. , Merkurjev , E. , Bertozzi , A. L. , Flenner , A. & Percus , A. G. ( 2014 ) Multiclass data segmentation using diffuse interface methods on graphs . IEEE Trans. Pattern Anal. Mach. Intell. , 36 , 1600 – 1613 . Google Scholar CrossRef Search ADS PubMed Hintermüller , M. , Hinze , M. & Tber , M. H. ( 2011 ) An adaptive finite-element Moreau–Yosida-based solver for a non-smooth Cahn–Hilliard problem . Optim. Methods Softw. , 26 , 777 – 811 . Google Scholar CrossRef Search ADS Hintermüller , M. , Ito , K. & Kunisch , K. ( 2002 ) The primal-dual active set strategy as a semismooth Newton method . SIAM J. Optim. 13 , 865 – 888 . Hintermüller , M. & Kunisch , K. ( 2006a ) Feasible and noninterior path-following in constrained minimization with low multiplier regularity . SIAM J. Control Optim. , 45 , 1198 – 1221 . Google Scholar CrossRef Search ADS Hintermüller , M. & Kunisch , K. ( 2006b ) Path-following methods for a class of constrained minimization problems in function space . SIAM J. Optim. , 17 , 159 – 187 . Google Scholar CrossRef Search ADS Ioffe , A. D. & Tihomirov , V. M. ( 1979 ) Theory of Extremal Problems . Studies in Mathematics and its Applications , vol. 6. Amsterdam-New York : North-Holland Publishing Co. ( Translated from Russian by Karol Makowski .) Keuthen , M. & Ulbrich , M. ( 2015 ) Moreau–Yosida regularization in shape optimization with geometric constraints . Comput. Optim. Appl. , 62 , 181 – 216 . Google Scholar CrossRef Search ADS Kinderlehrer , D. & Stampacchia , G. ( 2000 ) An Introduction to Variational Inequalities and Their Applications . Classics in Applied Mathematics , vol. 31. Philadelphia, PA : Society for Industrial and Applied Mathematics (SIAM) . ( Reprint of the 1980 original .) Kunisch , K. , Liang , K. & Lu , X. ( 2010 ) Optimal control for an elliptic system with polygonal state constraints . SIAM J. Control Optim. , 48 , 5053 – 5072 . Google Scholar CrossRef Search ADS Kunisch , K. & Lu , X. ( 2013 ) Optimal control for an elliptic system with convex polygonal control constraints . IMA J. Numer. Anal. , 33 , 875 – 897 . Google Scholar CrossRef Search ADS Modica , L. ( 1987a ) Gradient theory of phase transitions with boundary contact energy . Ann. Inst. H. Poincaré Anal. Non Linéaire , 4 , 487 – 512 . Google Scholar CrossRef Search ADS Modica , L. ( 1987b ) The gradient theory of phase transitions and the minimal interface criterion . Arch. Ration. Mech. Anal. , 98 , 123 – 142 . Google Scholar CrossRef Search ADS Shapiro , A. ( 1990 ) On concepts of directional differentiability . J. Optim. Theory Appl. , 66 , 477 – 487 . Google Scholar CrossRef Search ADS Stinner , B. , Nestler , B. & Garcke , H. ( 2004 ) A diffuse interface model for alloys with multiple components and phases . SIAM J. Appl. Math. , 64 , 775 – 799 . Google Scholar CrossRef Search ADS Ulbrich , M . ( 2002 ) Semismooth Newton methods for operator equations in function spaces . SIAM J. Optim. , 13 , 805 – 841 . Google Scholar CrossRef Search ADS Van Wachem , B. & Almstedt , A.-E. ( 2003 ) Methods for multiphase computational fluid dynamics . Chem. Eng. J. , 96 , 81 – 98 . Google Scholar CrossRef Search ADS This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

### Journal

IMA Journal of Numerical AnalysisOxford University Press

Published: Jun 7, 2018

## You’re reading a free preview. Subscribe to read the entire article.

### DeepDyve is your personal research library

It’s your single place to instantly
that matters to you.

over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month ### Explore the DeepDyve Library ### Search Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly ### Organize Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place. ### Access Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals. ### Your journals are on DeepDyve Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more. All the latest content is available, no embargo periods. DeepDyve ### Freelancer DeepDyve ### Pro Price FREE$49/month
\$360/year

Save searches from
PubMed

Create lists to

Export lists, citations