# Schur complement preconditioners for multiple saddle point problems of block tridiagonal form with application to optimization problems

## Abstract

The importance of Schur-complement-based preconditioners is well established for classical saddle point problems in $$\mathbb{R}^N \times \mathbb{R}^M$$. In this paper we extend these results to multiple saddle point problems in Hilbert spaces $$X_1\times X_2 \times \cdots \times X_n$$. For such problems with a block tridiagonal Hessian and a well-defined sequence of associated Schur complements, sharp bounds for the condition number of the problem are derived, which do not depend on the involved operators. These bounds can be expressed in terms of the roots of the difference of two Chebyshev polynomials of the second kind. If applied to specific classes of optimal control problems, the abstract analysis leads to new existence results as well as to the construction of efficient preconditioners for the associated discretized optimality systems.

## 1. Introduction

In this paper we discuss the well-posedness of a particular class of saddle point problems in function spaces and the related topic of efficient preconditioning of such problems after discretization. Problems of this class arise as the optimality systems of optimization problems in function spaces with a quadratic objective functional, constrained by linear partial differential equations. Another source of such problems is mixed formulations of elliptic boundary value problems. For numerous applications of saddle point problems we refer to the seminal survey article by Benzi et al. (2005).
To be more specific we consider saddle point problems of the following form: for a given functional $$\mathcal{L}(x_1,x_2,\ldots ,x_n)$$ defined on a product space $$X_1 \times X_2 \times \cdots \times X_n$$ of Hilbert spaces $$X_i$$ with n ≥ 2 find an n-tuple $$(x_1^{\ast },x_2^{\ast },\ldots ,x_n^{\ast })$$ from this space such that its component $$x_i^{\ast }$$ minimizes $$\mathcal{L}^{[i]}(x_i)$$ for all odd indices i and maximizes $$\mathcal{L}^{[i]}(x_i)$$ for all even indices i, where $$\mathcal{L}^{[i]}(x_i) = \mathcal{L}(x_1^{\ast },\ldots ,x_{i-1}^{\ast },x_i,x_{i+1}^{\ast },\ldots ,x_n^{\ast })$$. Very often the discussion of saddle point problems is restricted to the case n = 2. We will refer to these problems as classical saddle point problems. In this paper we are interested in the general case n ≥ 2. We call such problems multiple saddle point problems. Saddle-point problems with n = 3 and n = 4 are typically addressed in the literature as double (or twofold) and triple (or threefold) saddle point problems, respectively. For notational convenience n-tuples $$(x_1,x_2,\ldots ,x_n) \in X_1 \times X_2 \times \cdots \times X_n$$ are identified with the corresponding column vectors $$\textbf{x}=(x_1,x_2,\ldots ,x_n)^\top$$ from the corresponding space X. We consider only linear problems; that is, we assume that \begin{equation*} \mathcal{L}(\textbf{x}) = \tfrac12 \left\langle \mathcal{A} \textbf{x}, \textbf{x}\right\rangle - \left\langle \textbf{b}, \textbf{x}\right\rangle, \end{equation*} where $$\mathcal{A}$$ is a bounded and self-adjoint linear operator which maps from X to its dual space X′, b ∈ X′, and $$\left \langle ., .\right \rangle$$ denotes the duality product. Observe that $$\mathcal{A}$$ is the (constant) Hessian of $$\mathcal{L}(\textbf{x})$$ and has a natural n-by-n block structure consisting of elements $$\mathcal{A}_{ij}$$ which map from $$X_j$$ to $$X_i^{\prime}$$. 
Since $$x_i^{\ast }$$ minimizes the quadratic functional $$\mathcal{L}^{[i]}(x_i)$$ for odd indices i and maximizes $$\mathcal{L}^{[i]}(x_i)$$ for even indices i, the Hessian of $$\mathcal{L}^{[i]}(x_i)$$, which is constant and coincides with the diagonal block $$\mathcal{A}_{ii}$$ of $$\mathcal{A}$$, must be positive semidefinite for odd indices i and negative semidefinite for even indices i, according to the necessary second-order optimality conditions. Under this assumption on the diagonal blocks of $$\mathcal{A}$$ the problem of finding a saddle point of $$\mathcal{L}$$ is equivalent to finding a solution $$\textbf{x}^{\ast } \in \textbf{X}$$ of the linear operator equation $$\mathcal{A} \textbf{x} = \textbf{b}.$$ (1.1) Typical examples for the case n = 2 are optimality systems of constrained quadratic optimization problems, where $$\mathcal{L}$$ is the associated Lagrangian, $$x_1$$ is the primal variable and $$x_2$$ is the Lagrangian multiplier associated with the constraint. Optimal control problems viewed as constrained optimization problems also fall into this category with n = 2. However, since in this case the primal variable itself consists of two components, the state variable and the control variable, we can view such problems also as double saddle point problems (after some reordering of the variables); see the study by Mardal et al. (2017) and Section 3. Other examples of double and triple saddle point problems result from dual–dual formulations, for example, of elasticity problems; see the studies by Gatica & Heuer (2000), Gatica & Heuer (2002) and Gatica et al. (2007). Double and triple saddle point problems also arise in boundary element tearing and interconnecting methods; see, e.g., the study by Langer et al. (2007). For double saddle point problems from potential fluid flow modeling and liquid crystal modeling, see the report by Beik & Benzi (2017) and the references therein, where further applications can be found.
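The equivalence between the saddle point property and the linear equation (1.1) is easy to observe in finite dimensions. The following minimal NumPy sketch uses hypothetical small matrices for the case n = 2 (blocks $$A_1$$, $$A_2$$ positive definite, coupling block $$B$$ arbitrary): solving $$\mathcal{A}\textbf{x} = \textbf{b}$$ produces a point that minimizes $$\mathcal{L}$$ in $$x_1$$ and maximizes it in $$x_2$$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data for n = 2: A1 and A2 symmetric positive definite, B arbitrary.
A1 = np.array([[3.0, 1.0], [1.0, 2.0]])
A2 = np.array([[2.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, -1.0], [0.5, 2.0]])
b = rng.standard_normal(4)

# Block operator of the Lagrangian L(x) = 1/2 <Ax, x> - <b, x>.
A = np.block([[A1, B.T], [B, -A2]])

def L(x1, x2):
    x = np.concatenate([x1, x2])
    return 0.5 * x @ A @ x - b @ x

# The saddle point is the solution of the linear system A x = b.
x = np.linalg.solve(A, b)
x1s, x2s = x[:2], x[2:]

# x1s minimizes L(., x2s) while x2s maximizes L(x1s, .):
d = np.array([0.3, -0.7])
print(L(x1s + d, x2s) >= L(x1s, x2s) >= L(x1s, x2s + d))
```

Perturbing the first component increases $$\mathcal{L}$$ by $$\tfrac12 d^\top A_1 d \geq 0$$ and perturbing the second decreases it by $$\tfrac12 d^\top A_2 d$$, which is exactly the saddle point property.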
The goal of this paper is to extend well-established results on block diagonal preconditioners for classical saddle point problems in $$\mathbb{R}^N \times \mathbb{R}^M$$ as presented in the studies by Kuznetsov (1995) and Murphy et al. (2000) to multiple saddle point problems in Hilbert spaces. This goal is achieved for operators $$\mathcal{A}$$ of block tridiagonal form, which possess an associated sequence of positive definite Schur complements. We will show for a particular norm built from these Schur complements that the condition number of the operator $$\mathcal{A}$$ is bounded by a constant independent of $$\mathcal{A}$$. So, if $$\mathcal{A}$$ contains any sensitive model parameters (like a regularization parameter) or $$\mathcal{A}$$ depends on some discretization parameters (like the mesh size) the bound of the condition number is independent of these quantities. This, for example, prevents the performance of iterative methods from deteriorating for small regularization parameters or small mesh sizes. Moreover, we will show that the bounds are solely given in terms of the roots of the difference of two Chebyshev polynomials of the second kind and that the bounds are sharp for the discussed class of block tridiagonal operators. The abstract analysis allows us to recover known existence results under less restrictive assumptions. This was the main motivation for extending the analysis to Hilbert spaces. We will exemplify this for optimal control problems with a second-order elliptic state equation, distributed observation and boundary control, as discussed, e.g., in the study by May et al. (2013), and for boundary observation and distributed control, as discussed, e.g., in the study by Mardal et al. (2017). Another outcome of the abstract analysis is the construction of preconditioners for discretized optimality systems which perform well in combination with Krylov subspace methods for solving the linear system. 
Here we were able to recover known results from Mardal et al. (2017) and extend them to other problems. The article by Mardal et al. (2017) has been very influential for the study presented here. As already noticed in Mardal et al. (2017), there is a price to pay for the construction of these efficient preconditioners: for second-order elliptic state equations, discretization spaces of continuously differentiable functions are required, for which we use here technology from Isogeometric Analysis (IgA); see the monograph by Cottrell et al. (2009). Observe that the analysis presented here is valid for any number n ≥ 2 of blocks. There are numerous contributions on preconditioning classical saddle point problems; see the study by Benzi et al. (2005) and the references cited there. See, in particular, among many other contributions, Pearson & Wathen (2012) for Schur-complement-based approaches. For other results on the analysis and the construction of preconditioners for double/twofold and triple/threefold saddle point problems see, e.g., the studies by Gatica & Heuer (2000, 2002), Gatica et al. (2007), Langer et al. (2007), Pestana & Rees (2016) and Beik & Benzi (2017). The paper is organized as follows. The abstract analysis of a class of multiple saddle point problems of block tridiagonal form is given in Section 2. Section 3 deals with the application to particular optimization problems in function spaces. Discretization and efficient realization of the preconditioner are discussed in Section 4. A few numerical experiments are shown in Section 5 to illustrate the abstract results. The paper ends with some conclusions in Section 6 and an appendix, which contains some technical details related to Chebyshev polynomials of the second kind used in the proofs of the abstract results in Section 2.

## 2. Schur complement preconditioners

The following notation is used throughout the paper. Let X and Y be Hilbert spaces with dual spaces X′ and Y′.
For a bounded linear operator $$B: X \rightarrow Y^{\prime}$$ its adjoint $$B^{\prime}: Y \rightarrow X^{\prime}$$ is given by \begin{equation*} \langle B^{\prime}y,x\rangle = \langle Bx,y\rangle \quad \forall\, x\in X, \ y \in Y, \end{equation*} where $$\left \langle \cdot , \cdot \right \rangle$$ denotes the duality product. For a bounded linear operator $$L: X \rightarrow Y$$ its Hilbert space adjoint $$L^{*}: Y \rightarrow X$$ is given by \begin{equation*} \left(L^{*}y, x\right)_{X} = \left(Lx, y\right)_{Y} \quad \forall\,x\in X, \ y \in Y, \end{equation*} where $$\left (\cdot , \cdot \right )_X$$ and $$\left (\cdot , \cdot \right )_Y$$ are the inner products of X and Y with associated norms $$\|\cdot \|_X$$ and $$\|\cdot \|_Y$$, respectively. Let X = U × V with Hilbert spaces U and V. Then its dual X′ can be identified with U′× V′. For a linear operator $$T : U \times V \rightarrow U^{\prime} \times V^{\prime}$$ of a 2-by-2 block form \begin{equation*} T = \begin{pmatrix} A & C \\ B & D \end{pmatrix}, \end{equation*} with an invertible operator $$A : U \rightarrow U^{\prime}$$, its Schur complement $$\operatorname{Schur} T : V \rightarrow V^{\prime}$$ is given by \begin{equation*} \operatorname{Schur} T = D - B A^{-1} C . \end{equation*} With this notation we will now precisely formulate the assumptions on problem (1.1) as already indicated in the introduction.
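In the finite-dimensional case the Schur complement can be computed directly. A minimal NumPy sketch with hypothetical 2-by-2 blocks, which also confirms the classical determinant identity $$\det T = \det A \cdot \det(\operatorname{Schur} T)$$ for invertible $$A$$:

```python
import numpy as np

# Hypothetical 2-by-2 block matrix T = [[A, C], [B, D]] with invertible A.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
B = np.array([[1.0, 0.0], [2.0, 1.0]])
C = np.array([[0.0, 1.0], [1.0, 1.0]])
D = np.array([[2.0, 1.0], [1.0, 2.0]])

# Schur T = D - B A^{-1} C, formed with a solve instead of an explicit inverse.
schur_T = D - B @ np.linalg.solve(A, C)

# Sanity check via the block factorization: det T = det A * det(Schur T).
T = np.block([[A, C], [B, D]])
print(np.isclose(np.linalg.det(T), np.linalg.det(A) * np.linalg.det(schur_T)))
```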
Let $$\textbf{X} = X_1 \times X_2\times \cdots \times X_n$$ with Hilbert spaces $$X_i$$ for i = 1, 2, … , n, and let the linear operator $$\mathcal{A}: \textbf{X} \rightarrow \textbf{X}^{\prime}$$ be of n-by-n block tridiagonal form $$\mathcal{A} = \left(\begin{array}{cccc} A_1 & B_1^{\prime} & & \\ B_1 & -A_2 & \ddots & \\ &\ddots & \ddots & B_{n-1}^{\prime} \\[1ex] & & B_{n-1} & (-1)^{n-1}A_n \end{array}\right),$$ (2.1) where $$A_i:X_i \rightarrow X_i^{\prime}$$, $$B_i: X_i \rightarrow X_{i+1}^{\prime}$$ are bounded operators, and, additionally, $$A_i$$ are self-adjoint and positive semidefinite, that is, $$\langle A_i y_i,x_i \rangle = \langle A_i x_i,y_i \rangle \quad \textrm{and} \quad \langle A_i x_i, x_i \rangle \geq 0 \quad \forall\, x_i,\ y_i \in X_i.$$ (2.2) The basic assumption of the whole paper is now that the operators $$\mathcal{A}_i$$ consisting of the first i rows and columns of $$\mathcal{A}$$ are invertible operators from $$X_1 \times \cdots \times X_i$$ to $$X_1^{\prime} \times \cdots \times X_i^{\prime}$$. That allows one to introduce the linear operators \begin{equation*} S_{i+1} = (-1)^i \, \operatorname{Schur} \mathcal{A}_{i+1} \quad \textrm{for}\enspace i=1,\ldots,n-1, \end{equation*} where, for the definition of the Schur complement, $$\mathcal{A}_{i+1}$$ is interpreted as the 2-by-2 block operator \begin{equation*} \mathcal{A}_{i+1} = \begin{pmatrix} \mathcal{A}_i & \textbf{B}_i^{\prime} \\ \textbf{B}_i & (-1)^i \, A_{i+1} \end{pmatrix}, \quad \textrm{where} \quad \textbf{B}_i = \left(\begin{array}{cccc} \!\!0 & \ldots & 0 & B_i\!\! \end{array}\right) . \end{equation*} It is easy to see that $$S_{i+1} = A_{i+1} + B_i S^{-1}_i B_i^{\prime}, \quad \textrm{for}\enspace i = 1, 2,\ldots,n-1,$$ (2.3) with initial setting $$S_1 = A_1$$. The following basic result holds. 
Lemma 2.1 Assume that $$A_i:X_i \rightarrow X_i^{\prime}$$, i = 1, 2, … , n are bounded operators satisfying (2.2), $$B_i: X_i \rightarrow X_{i+1}^{\prime}$$, i = 1, … , n − 1 are bounded operators and $$S_i$$, i = 1, 2, … , n, given by (2.3), are well defined and positive definite, that is, \begin{equation*} \langle S_i x_i, x_i \rangle \geq \sigma_i \, \|x_i\|_{X_i}^2 \quad \forall\, x_i \in X_i, \end{equation*} for some positive constants $$\sigma _i$$. Then $$\mathcal{A}$$ is an isomorphism from X to X′. Proof. From the Lax–Milgram lemma it follows that $$S_i$$, i = 1, 2, … , n are invertible, which allows us to derive a block LU-decomposition of $$\mathcal{A}$$ into invertible factors: \begin{equation*} \mathcal{A} = \left(\begin{array}{cccc} I& & & \\ B_1 S_1^{-1} & I && \\ & \ddots & \ddots \\ && (-1)^{n-2} B_{n-1} S_{n-1}^{-1} & I \end{array}\right) \left(\begin{array}{cccc} S_1 & B_1^{\prime} & & \\ & -S_2 & \ddots& \\ & & \ddots & B_{n-1}^{\prime}\\ & & & (-1)^{n-1} \, S_n \end{array}\right) . \end{equation*} So $$\mathcal{A}$$ is a bounded linear operator, which is invertible. Therefore, $$\mathcal{A}$$ is an isomorphism by the open mapping theorem. With a slight abuse of notation we call $$S_i$$ Schur complements, although, in the literal sense, they are Schur complements only up to sign. Under the assumptions made so far we define the Schur complement preconditioner as the block diagonal linear operator $$\mathcal{S}(\mathcal{A}) \colon \textbf{X} \rightarrow \textbf{X}^{\prime}$$ given by $$\mathcal{S}(\mathcal{A}) = \left(\begin{array}{cccc} S_1 & & & \\ & S_2 & & \\ & & \ddots & \\ & & & S_n \end{array}\right).$$ (2.4) If it is clear from the context which operator $$\mathcal{A}$$ is meant, we will omit the argument $$\mathcal{A}$$ and simply use $$\mathcal{S}$$ for the Schur complement preconditioner.
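Both the recursion (2.3) and the block LU-decomposition from the proof of Lemma 2.1 can be checked numerically. The following sketch, with hypothetical random blocks for n = 3 (the $$A_i$$ chosen symmetric positive definite so the $$S_i$$ are automatically positive definite), assembles $$\mathcal{A}$$ as in (2.1), computes $$S_1, S_2, S_3$$ via (2.3) and verifies the factorization:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 2, 3
Z, I = np.zeros((m, m)), np.eye(m)

def random_spd(size):
    # hypothetical symmetric positive definite diagonal block
    M = rng.standard_normal((size, size))
    return M @ M.T + size * np.eye(size)

A = [random_spd(m) for _ in range(n)]                     # A_1, A_2, A_3
B = [rng.standard_normal((m, m)) for _ in range(n - 1)]   # B_1, B_2

# Recursion (2.3): S_1 = A_1, S_{i+1} = A_{i+1} + B_i S_i^{-1} B_i'
S = [A[0]]
for i in range(n - 1):
    S.append(A[i + 1] + B[i] @ np.linalg.solve(S[i], B[i].T))

# Block tridiagonal operator (2.1) for n = 3 (diagonal signs (-1)^{i-1})
Amat = np.block([[A[0], B[0].T, Z],
                 [B[0], -A[1], B[1].T],
                 [Z, B[1], A[2]]])

# Factors of the block LU-decomposition from the proof of Lemma 2.1
L = np.block([[I, Z, Z],
              [B[0] @ np.linalg.inv(S[0]), I, Z],
              [Z, -B[1] @ np.linalg.inv(S[1]), I]])
U = np.block([[S[0], B[0].T, Z],
              [Z, -S[1], B[1].T],
              [Z, Z, S[2]]])
print(np.allclose(L @ U, Amat))
```

Note the alternating sign $$(-1)^{i-1}$$ on the subdiagonal entries $$B_i S_i^{-1}$$ of the lower factor, matching the sign pattern of the diagonal blocks $$(-1)^{i-1} S_i$$ in the upper factor.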
Since $$\mathcal{S}$$ is bounded, self-adjoint and positive definite it induces an inner product on X by \begin{equation*} \left(\textbf{x},\textbf{y}\right)_{\mathcal{S}} = \left\langle \mathcal{S}\textbf{x},\textbf{y}\right\rangle \quad \textrm{for }\, \textbf{x},\textbf{y}\in \textbf{X}, \end{equation*} whose associated norm $$\|\textbf{x}\|_{\mathcal{S}} = \left (\textbf{x},\textbf{x}\right )_{\mathcal{S}}^{1/2}$$ is equivalent to the canonical product norm in X. Note that \begin{equation*} \left(\textbf{x},\textbf{y}\right)_{\mathcal{S}} = \sum_{i=1}^n (x_i,y_i)_{S_i} \quad \textrm{with} \enspace (x_i,y_i)_{S_i} = \langle S_i x_i,y_i\rangle. \end{equation*} Here $$x_i \in X_i$$ and $$y_i \in X_i$$ denote the ith component of x ∈ X and y ∈ X, respectively. From now on we assume that the spaces X and X′ are equipped with the norm $$\Vert .\Vert _{\mathcal{S}}$$ and the associated dual norm, respectively. The question of whether (1.1) is well posed translates to the question of whether $$\mathcal{A} \colon \textbf{X} \rightarrow \textbf{X}^{\prime}$$ is an isomorphism. The condition number $$\kappa (\mathcal{A})$$, given by \begin{equation*} \kappa\left(\mathcal{A}\right) = \Vert \mathcal{A}\Vert_{\mathcal{L}\left(\textbf{X},\textbf{X}^{\prime}\right)} \|\mathcal{A}^{-1}\|_{\mathcal{L}\left(\textbf{X}^{\prime},\textbf{X}\right)}, \end{equation*} measures the sensitivity of the solution of (1.1) with respect to data perturbations. Here $$\mathcal{L}(X,Y)$$ denotes the space of bounded linear operators from X to Y, equipped with the standard operator norm. By the Riesz representation theorem the linear operator $$\mathcal{S} \colon \textbf{X} \rightarrow \textbf{X}^{\prime}$$ is an isomorphism from X to X′. Therefore, $$\mathcal{A}$$ is an isomorphism if and only if $$\mathcal{M}\colon \textbf{X} \rightarrow \textbf{X}$$, given by \begin{equation*} \mathcal{M} = \mathcal{S}^{-1}\mathcal{A}, \end{equation*} is an isomorphism. 
In this context the operator $$\mathcal{S}$$ can be seen as a preconditioner for $$\mathcal{A}$$ and $$\mathcal{M}$$ is the associated preconditioned operator. Moreover, it is easy to see that \begin{equation*} \|\mathcal{A}\|_{\mathcal{L}\left(\textbf{X},\textbf{X}^{\prime}\right)} = \|\mathcal{M}\|_{\mathcal{L}\left(\textbf{X},\textbf{X}\right)} \end{equation*} and, in the case of well-posedness, \begin{equation*} \kappa\left(\mathcal{A}\right) = \kappa\left(\mathcal{M}\right) \quad \textrm{with} \quad \kappa\left(\mathcal{M}\right) = \Vert \mathcal{M}\Vert_{\mathcal{L}\left(\textbf{X},\textbf{X}\right)} \|\mathcal{M}^{-1}\|_{\mathcal{L}\left(\textbf{X},\textbf{X}\right)}. \end{equation*} The condition number $$\kappa (\mathcal{M})$$ has a significant influence on the convergence rate of preconditioned Krylov subspace methods for solving (1.1). We will now derive bounds for $$\kappa (\mathcal{M})$$, from which we will simultaneously learn about both the efficiency of the preconditioner $$\mathcal{S}$$ and the well-posedness of (1.1) with respect to the norm $$\Vert .\Vert _{\mathcal{S}}$$. See the study by Mardal & Winther (2011) for more on the topic of operator preconditioning. We start the discussion by observing that $$\mathcal{M} = \left(\begin{array}{cccc} I & C^{*}_1 & & \\ C_1 & -\left(I-C_1C^{*}_1\right) & \ddots & \\ & \ddots& \ddots& C^{*}_{n-1}\\ & & C_{n-1} & (-1)^{n-1}\left(I-C_{n-1}C^{*}_{n-1}\right) \end{array}\right),$$ (2.5) where \begin{equation*} C_{i}=S^{-1}_{i+1}B_i \quad\textrm{for } \, i = 1, 2,\ldots,n-1. \end{equation*} For its Hilbert space adjoint $$C^{*}_{i}$$ with respect to the inner products $$\left (x_i,y_i\right )_{S_i}$$ and $$\left (x_{i+1},y_{i+1}\right )_{S_{i+1}}$$ we have the following representation: \begin{equation*} C^{*}_{i}=S^{-1}_iB_i^{\prime}\quad\textrm{for } \, i = 1, 2,\ldots,n-1. \end{equation*} In the next two theorems we will derive bounds for the norm of $$\mathcal{M}$$ and its inverse.
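Since $$\mathcal{M} = \mathcal{S}^{-1}\mathcal{A}$$ is self-adjoint with respect to the $$\mathcal{S}$$-inner product, its spectrum is real and $$\kappa (\mathcal{M})$$ is the ratio of the extreme eigenvalue magnitudes; this is easy to inspect numerically. The following sketch (hypothetical random blocks, n = 3) assembles $$\mathcal{A}$$, the preconditioner $$\mathcal{S}$$ and the spectrum of $$\mathcal{M}$$; the eigenvalue bounds it illustrates are the content of Theorems 2.2 and 2.3 below.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 2, 3
Z = np.zeros((m, m))

def random_spd(size):
    # hypothetical symmetric positive definite diagonal block
    M = rng.standard_normal((size, size))
    return M @ M.T + size * np.eye(size)

A = [random_spd(m) for _ in range(n)]
B = [rng.standard_normal((m, m)) for _ in range(n - 1)]

# Schur complements via the recursion (2.3)
S = [A[0]]
for i in range(n - 1):
    S.append(A[i + 1] + B[i] @ np.linalg.solve(S[i], B[i].T))

# Block tridiagonal A as in (2.1) and the preconditioner S as in (2.4)
Amat = np.block([[A[0], B[0].T, Z],
                 [B[0], -A[1], B[1].T],
                 [Z, B[1], A[2]]])
Smat = np.block([[S[0], Z, Z],
                 [Z, S[1], Z],
                 [Z, Z, S[2]]])

# M = S^{-1} A is self-adjoint in the S-inner product, hence has real spectrum;
# kappa(M) is the ratio of the extreme eigenvalue magnitudes.
lam = np.linalg.eigvals(np.linalg.solve(Smat, Amat)).real
kappa = np.abs(lam).max() / np.abs(lam).min()
print(kappa)
```

Rerunning with different random data, regularization-like scalings of the $$A_i$$, or larger block sizes leaves the computed $$\kappa$$ within the same fixed interval, which is precisely the parameter-robustness the abstract analysis is after.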
Theorem 2.2 For the operator $$\mathcal{M}$$ the following estimate holds: \begin{equation*} \Vert \mathcal{M}\Vert_{\mathcal{L}\left(\textbf{X},\textbf{X}\right)} \leq 2\cos\left(\frac{\pi}{2n+1}\right). \end{equation*} Proof. First we note that the norm can be written in the following way: $$\Vert \mathcal{M}\Vert_{\mathcal{L}\left(\textbf{X},\textbf{X}\right)} = \sup_{0 \neq \textbf{x}\in \textbf{X}} \frac{\vert\left( \mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}}\vert}{\left(\textbf{x},\textbf{x}\right)_{\mathcal{S}}}.$$ (2.6) We will now estimate the numerator $$\left (\mathcal{M}\textbf{x},\textbf{x}\right )_{\mathcal{S}}$$. Let x ∈ X and let $$x_i \in X_i$$ be the ith component of x. Then it follows from (2.5) that \begin{equation*} \left(\mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}} = \Vert x_1\Vert_{S_1}^2 + 2\sum^{n-1}_{i=1} \left(C^{*}_{i} x_{i+1}, x_i\right)_{S_i} + \sum^{n-1}_{i=1} (-1)^{i} \left(\left(I-C_{i}C^{*}_{i}\right)x_{i+1}, x_{i+1}\right)_{S_{i+1}} . \end{equation*} By applying Cauchy’s inequality and Young’s inequality we obtain for parameters $$\epsilon _i> 0$$, \begin{align*} 2\left(C^{*}_{i} x_{i+1}, x_i\right)_{S_i} \leq 2\left\Vert C^{*}_{i} x_{i+1}\right\Vert_{S_i} \Vert x_i\Vert_{S_i} &\leq \epsilon_i\left(C^{*}_{i}x_{i+1}, C^{*}_{i}x_{i+1}\right)_{S_i}+ \frac{1}{\epsilon_i} \Vert x_i\Vert_{S_i}^2\\ &= \epsilon_i\left(C_{i}C^{*}_{i}x_{i+1}, x_{i+1}\right)_{S_{i+1}} + \frac{1}{\epsilon_i} \Vert x_i\Vert_{S_i}^2. \end{align*} Therefore, \begin{align*} \left(\mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}} \leq \Vert x_1\Vert_{S_1}^2 + \sum^{n-1}_{i=1} \frac{1}{\epsilon_i}\Vert x_{i}\Vert_{S_{i}}^2 +\sum^{n-1}_{i=1} \left( \epsilon_i - (-1)^i\right) \, \left(C_{i}C^{*}_{i}x_{i+1}, x_{i+1}\right)_{S_{i+1}}+ \sum^{n-1}_{i=1} (-1)^{i}\Vert x_{i+1}\Vert_{S_{i+1}}^2. 
\end{align*} Since $$A_{i+1}$$ is positive semidefinite it follows that \begin{align} \left(C_iC^{*}_ix_{i+1}, x_{i+1}\right)_{S_{i+1}} =&\, \left\langle B_{i}S^{-1}_{i}B_{i}^{\prime}x_{i+1}, x_{i+1}\right\rangle \nonumber\\ \leq&\, \left\langle A_{i+1}x_{i+1}, x_{i+1}\right\rangle +\left\langle B_{i}S^{-1}_{i}B_{i}^{\prime}x_{i+1}, x_{i+1}\right\rangle = \left\langle S_{i+1}x_{i+1}, x_{i+1}\right\rangle = \Vert x_{i+1}\Vert_{S_{i+1}}^2. \end{align} (2.7) Now we make an essential assumption on the choice of the parameters $$\epsilon_i$$: $$\epsilon_i \geq 1 \quad \textrm{for} \ i = 1,\ldots,n-1.$$ (2.8) By using (2.8) and (2.7) the estimate for $$\left (\mathcal{M}\textbf{x},\textbf{x}\right )_{\mathcal{S}}$$ from above simplifies to \begin{equation*} \left(\mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}} \leq \Vert x_1\Vert_{S_1}^2 + \sum^{n-1}_{i=1}\frac{1}{\epsilon_i} \Vert x_{i}\Vert_{S_{i}}^2 +\sum^{n-1}_{i=1} \epsilon_i \Vert x_{i+1}\Vert_{S_{i+1}}^2 = \textbf{y}^\top D_u \, \textbf{y}, \end{equation*} where \begin{equation*} D_u = \left(\begin{array}{ccccc} 1+\frac{1}{\epsilon_1} & & & &\\ & \epsilon_1 +\frac{1}{\epsilon_2}& & &\\ & & \ddots & & \\ & & & \epsilon_{n-2} +\frac{1}{\epsilon_{n-1}} & \\ & & & & \epsilon_{n-1} \end{array}\right) \quad \textrm{and} \quad \textbf{y} = \left(\begin{array}{c} \|x_1\|_{S_1} \\ \|x_2\|_{S_2} \\ \vdots \\ \|x_n\|_{S_n} \end{array}\right). \end{equation*} Next we choose $$\epsilon_1, \ldots , \epsilon_{n-1}$$ such that the diagonal elements in $$D_u$$ are all equal, that is, \begin{equation*} 1 +\frac{1}{\epsilon_1} = \epsilon_1 +\frac{1}{\epsilon_2} = \cdots = \epsilon_{n-2} +\frac{1}{\epsilon_{n-1}} = \epsilon_{n-1}.
\end{equation*} We can successively eliminate $$\epsilon_1, \ldots , \epsilon_{n-2}$$ from these equations and obtain \begin{equation*} 1 = \epsilon_{n-1} -\cfrac{1}{\epsilon_1} = \epsilon_{n-1} -\cfrac{1}{\epsilon_{n-1} - \cfrac{1}{\epsilon_2} } = \epsilon_{n-1} -\cfrac{1}{\epsilon_{n-1} - \cfrac{1}{\epsilon_{n-1} - \cfrac{1}{\epsilon_3}} } = \cdots , \end{equation*} which eventually leads to the following equation for $$\epsilon_{n-1}$$: $$1 = \epsilon_{n-1} -\cfrac{1}{\epsilon_{n-1} - \cfrac{1}{\epsilon_{n-1} - \cfrac{1}{\epsilon_{n-1} - \ddots \genfrac{}{}{0pt}{}{}{\cfrac{1}{\epsilon_{n-1}}} }}} \ .$$ (2.9) The right-hand side of (2.9) is a continued fraction of depth n−1. It can easily be shown that this continued fraction is a rational function in $$\epsilon_{n-1}$$ of the form $$P_n(\epsilon_{n-1})/P_{n-1}(\epsilon_{n-1})$$, where $$P_j(\epsilon )$$ are polynomials of degree j, recursively given by \begin{equation*} P_0 (\epsilon) = 1, \quad P_1(\epsilon) = \epsilon \quad \textrm{and} \quad P_{i+1}(\epsilon) = \epsilon \, P_i(\epsilon)-P_{i-1}(\epsilon) \quad \textrm{for } i \geq 1. \end{equation*} Therefore, (2.9) becomes $$1 = P_n(\epsilon_{n-1})/P_{n-1}(\epsilon_{n-1})$$ or, equivalently, \begin{equation*} \bar{P}_n(\epsilon_{n-1}) = 0 \quad \textrm{with} \quad \bar{P}_j(\epsilon) = P_j(\epsilon) - P_{j-1}(\epsilon). \end{equation*} For the other parameters $$\epsilon_1,\ldots ,\epsilon_{n-2}$$ it follows that \begin{equation*} \epsilon_{n-i} = \frac{P_{i}(\epsilon_{n-1})}{P_{i-1}(\epsilon_{n-1}) } \quad \textrm{for}\ \ i = 2,\ldots,n-1. \end{equation*} With this setting of the parameters the basic assumption (2.8) is equivalent to the following conditions: $$\bar{P}_i(\epsilon_{n-1}) \geq 0 \quad \textrm{and} \quad \epsilon_{n-1} \geq 1.$$ (2.10) To summarize, the parameter $$\epsilon_{n-1}$$ must be a root of $$\bar{P}_n$$ satisfying (2.10). 
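The recurrence $$P_{i+1}(\epsilon ) = \epsilon \, P_i(\epsilon ) - P_{i-1}(\epsilon )$$ is that of the Chebyshev polynomials of the second kind, $$P_j(\epsilon ) = U_j(\epsilon /2)$$, so the roots of $$\bar{P}_n$$ are available in closed form. A quick numerical sketch checking the closed-form root $$2\cos (\pi /(2n+1))$$ used below, together with the sign conditions (2.10), here for the hypothetical choice n = 5:

```python
import numpy as np

def P(j, eps):
    # P_0 = 1, P_1 = eps, P_{i+1} = eps * P_i - P_{i-1}; these are the
    # Chebyshev polynomials of the second kind via P_j(eps) = U_j(eps / 2).
    p_prev, p = 1.0, eps
    if j == 0:
        return p_prev
    for _ in range(j - 1):
        p_prev, p = p, eps * p - p_prev
    return p

n = 5
eps = 2.0 * np.cos(np.pi / (2 * n + 1))   # candidate root of P_n - P_{n-1}
print(P(n, eps) - P(n - 1, eps))          # residual of bar{P}_n, zero up to rounding
```

For this value one also has $$\epsilon _{n-1} \geq 1$$ and $$\bar{P}_i(\epsilon _{n-1}) \geq 0$$ for the lower indices, so the constraints (2.10) are met.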
In the appendix it will be shown that \begin{equation*} \epsilon_{n-1} = 2\cos\left(\frac{\pi}{2n+1}\right), \end{equation*} which is the largest root of $$\bar{P}_n$$, is an appropriate choice. Hence, \begin{equation*} \left(\mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}} \leq \textbf{y}^\top D_u \, \textbf{y} = 2\cos\left(\frac{\pi}{2n+1}\right)\left(\textbf{x}, \textbf{x}\right)_{\mathcal{S}}, \end{equation*} and therefore \begin{equation*} \frac{\left( \mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}}}{\left(\textbf{x},\textbf{x}\right)_{\mathcal{S}}} \leq 2\cos\left(\frac{\pi}{2n+1}\right) . \end{equation*} Following the same line of arguments a lower bound of $$\left (\mathcal{M}\textbf{x},\textbf{x}\right )_{\mathcal{S}}$$ can be derived: \begin{equation*} \left(\mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}} \geq \textbf{y}^\top D_l \, \textbf{y}, \end{equation*} where \begin{equation*} D_l = \left(\begin{array}{ccccc} 1-\frac{1}{\epsilon_1} & & & &\\ & -\epsilon_1 -\frac{1}{\epsilon_2}& & &\\ & & \ddots & & \\ & & & -\epsilon_{n-2} -\frac{1}{\epsilon_{n-1}} & \\ & & & & -\epsilon_{n-1} \end{array}\right), \end{equation*} with the same values for $$\epsilon_i$$ as before. From comparing $$D_u$$ and $$D_l$$ it follows that the diagonal elements of $$D_l$$ are equal to $$-2\cos (\pi /(2n+1))$$, except for the first element, which has the larger value $$2-2\cos \left (\pi /(2n+1)\right )$$. This directly implies \begin{equation*} \left(\mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}} \geq \textbf{y}^\top D_l \, \textbf{y} \geq -2\cos\left(\frac{\pi}{2n+1}\right)\left(\textbf{x}, \textbf{x}\right)_{\mathcal{S}}, \end{equation*} and therefore \begin{equation*} - 2\cos\left(\frac{\pi}{2n+1}\right) \le \frac{\left( \mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}}}{\left(\textbf{x},\textbf{x}\right)_{\mathcal{S}}}.
\end{equation*} To summarize, we have shown that \begin{equation*} \frac{\vert\left( \mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}}\vert}{\left(\textbf{x},\textbf{x}\right)_{\mathcal{S}}} \leq 2\cos\left(\frac{\pi}{2n+1}\right), \end{equation*} which completes the proof using (2.6). For investigating the inverse operator $$\mathcal{M}^{-1}$$ notice first that $$\mathcal{M} = \mathcal{M}_n$$, where $$\mathcal{M}_j$$, j = 1, 2, … , n are recursively given by \begin{equation*} \mathcal{M}_1 = I, \quad \mathcal{M}_{i+1} = \left(\begin{array}{ccc} \mathcal{M}_i & \textbf{C}^{*}_i \\ \textbf{C}_i & (-1)^i \, \big(I - C_i C_i^{\ast}\big) \end{array}\right) \quad \textrm{with} \ \textbf{C}_i = \left(0\quad \ldots\quad 0\quad C_i \right) \quad \textrm{for}\ \ i \geq 1. \end{equation*} Under the assumptions of Lemma 2.1 one can show by elementary calculations that $$\mathcal{M}_j^{-1}$$, j = 1, 2, … , n exist and satisfy the following recurrence relation: $$\mathcal{M}^{-1}_1 = I, \quad \mathcal{M}^{-1}_{i+1} = \begin{pmatrix} \mathcal{M}^{-1}_i & 0 \\ 0 & 0 \end{pmatrix} + (-1)^i \begin{pmatrix} \mathcal{M}^{-1}_i\textbf{C}^{*}_i\textbf{C}_i \mathcal{M}^{-1}_i & - \mathcal{M}^{-1}_i\textbf{C}^{*}_i \\ - \textbf{C}_i\mathcal{M}^{-1}_i & I \end{pmatrix} \quad \textrm{for}\ \ i \geq 1.$$ (2.11) Theorem 2.3 The operator $$\mathcal{M}$$ is invertible and we have \begin{equation*} \big\Vert \mathcal{M}^{-1}\big\Vert_{\mathcal{L}\left(\textbf{X},\textbf{X}\right)} \leq \frac{1}{2\sin\left(\frac{\pi}{2\left(2n+1\right)}\right)} . \end{equation*} Proof. Let $$\textbf{x} \in \textbf{X} = X_1\times \cdots \times X_n$$ with components $$x_i \in X_i$$. The restriction of x to its first j components is denoted by $$\textbf{x}_j \in X_1\times \cdots \times X_j$$. The corresponding restriction of $$\mathcal{S}$$ to its first j components is denoted by $$\mathcal{S}_j$$. 
From (2.11) we obtain \begin{equation*} \left(\mathcal{M}_{i+1}^{-1} \textbf{x}_{i+1},\textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}} = \left(\mathcal{M}_i^{-1} \textbf{x}_i,\textbf{x}_i\right)_{\mathcal{S}_i} + (-1)^i \, \left\|\textbf{C}_i\mathcal{M}^{-1}_i \textbf{x}_i - x_{i+1}\right\|_{S_{i+1}}^2, \end{equation*} which implies that $$\left(\mathcal{M}_{i+1}^{-1} \textbf{x}_{i+1},\textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}} \begin{cases} \le \left(\mathcal{M}_i^{-1} \textbf{x}_i,\textbf{x}_i\right)_{\mathcal{S}_i} & \textrm{for odd}\ \ i, \\[4pt] \ge \left(\mathcal{M}_i^{-1} \textbf{x}_i,\textbf{x}_i\right)_{\mathcal{S}_i} & \textrm{for even}\ \ i . \end{cases}$$ (2.12) For estimates in the opposite direction observe that (2.11) also implies that $$\begin{aligned} \left(\mathcal{M}_{i+1}^{-1} \textbf{x}_{i+1},\textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}} =&\, \left(\left[\mathcal{M}_i^{-1} + (-1)^i \, \mathcal{M}^{-1}_i\textbf{C}^{*}_i\textbf{C}_i \mathcal{M}^{-1}_i\right] \, \textbf{x}_i,\textbf{x}_i\right)_{\mathcal{S}_i} \\ &+ (-1)^i \, \left[ \Vert x_{i+1}\Vert_{S_{i+1}}^2 - 2 \, \left(\textbf{C}_i\mathcal{M}^{-1}_i \textbf{x}_i,x_{i+1}\right)_{S_{i+1}} \right]. \end{aligned}$$ (2.13) For the first term on the right-hand side of (2.13) observe that \begin{equation*} \mathcal{M}^{-1}_i + (-1)^i \, \mathcal{M}^{-1}_i \textbf{C}^{*}_i \textbf{C}_i \mathcal{M}^{-1}_i = \begin{pmatrix} \mathcal{M}_{i-1}^{-1} & 0 \\[2pt] 0 & 0 \end{pmatrix} + (-1)^{i-1} \, \mathcal{P}_{i} \quad \textrm{for} \ i> 1 \end{equation*} with \begin{equation*} \mathcal{P}_i = \begin{pmatrix} \mathcal{M}_{i-1}^{-1} \textbf{C}_{i-1}^{\ast} \big(I -C_i^{\ast} \, C_i\big) \textbf{C}_{i-1} \mathcal{M}_{i-1}^{-1} & -\mathcal{M}_{i-1}^{-1} \textbf{C}_{i-1}^{\ast} \big(I -C_i^{\ast} \, C_i\big)\\[4pt] - \big(I -C_i^{\ast} \, C_i\big) \textbf{C}_{i-1} \mathcal{M}_{i-1}^{-1} & \big(I -C_i^{\ast} \, C_i\big) \end{pmatrix}, \end{equation*} which easily follows by using (2.11) with i replaced by i − 1.
The operator $$\mathcal{P}_i$$ is positive semidefinite, since \begin{equation*} ( \mathcal{P}_i \textbf{x}_i, \textbf{x}_i )_{\mathcal{S}_i} = \left(\left[I -C_i^{\ast} \, C_i\right]\left(\textbf{C}_{i-1} \mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1} - x_i\right), \textbf{C}_{i-1} \mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1} - x_i\right)_{S_i} \end{equation*} and $$I -C_i^{\ast } \, C_i$$ is positive semidefinite. The positive semidefiniteness of $$I -C_i^{\ast } \, C_i$$ is a consequence of (2.7), which can also be written as \begin{equation*} \left\|C_i^{\ast} x_{i+1}\right\|_{S_i} \le \|x_{i+1}\|_{S_{i+1}} \quad \forall \, x_{i+1} \in X_{i+1}, \end{equation*} that is $$\|C_i^{\ast }\|_{L(X_{i+1},X_i)} \le 1$$. Since $$\|C_i\|_{L(X_i,X_{i+1})} = \|C_i^{\ast }\|_{L(X_{i+1},X_i)}$$ we have $$\|C_i\|_{L(X_i,X_{i+1})} \le 1$$, that is $$\|C_i x_i\|_{S_{i+1}} \le \|x_i\|_{S_i} \quad \forall \, x_i \in X_i,$$ (2.14) or, equivalently, $$I -C_i^{\ast } \, C_i$$ is positive semidefinite. Therefore, \begin{equation*} \left(\left[\mathcal{M}^{-1}_i + (-1)^i \, \mathcal{M}^{-1}_i \textbf{C}^{*}_i \textbf{C}_i \mathcal{M}^{-1}_i\right] \textbf{x}_i,\textbf{x}_i\right)_{\mathcal{S}_i} \begin{cases} \ge \left(\mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1},\textbf{x}_{i-1}\right)_{\mathcal{S}_{i-1}} & \textrm{for odd} \ i {}> 1, \\[6pt] \le \left(\mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1},\textbf{x}_{i-1}\right)_{\mathcal{S}_{i-1}} & \textrm{for even} \ i. 
\end{cases} \end{equation*} Then it follows from (2.13) that for odd i > 1, \begin{equation*} \left(\mathcal{M}_{i+1}^{-1} \textbf{x}_{i+1}, \textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}} \ge \left(\mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1},\textbf{x}_{i-1}\right)_{\mathcal{S}_{i-1}} + 2 \left(\textbf{C}_i \mathcal{M}_i^{-1} \textbf{x}_i,x_{i+1}\right)_{S_{i+1}} - \|x_{i+1}\|_{S_{i+1}}^2 \end{equation*} and for even i, \begin{equation*} \left(\mathcal{M}_{i+1}^{-1} \textbf{x}_{i+1}, \textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}} \le \left(\mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1},\textbf{x}_{i-1}\right)_{\mathcal{S}_{i-1}} - 2 \left(\textbf{C}_i \mathcal{M}_i^{-1} \textbf{x}_i,x_{i+1}\right)_{S_{i+1}} + \|x_{i+1}\|_{S_{i+1}}^2 . \end{equation*} In order to estimate $$(\textbf{C}_i \mathcal{M}_i^{-1} \textbf{x}_i,x_{i+1})_{S_{i+1}}$$ observe that \begin{equation*} \textbf{C}_i \mathcal{M}_i^{-1} = -(-1)^i \, C_i \, \begin{pmatrix} -\textbf{C}_{i-1} \mathcal{M}_{i-1}^{-1} & I \end{pmatrix} \quad \forall \, i> 1, \end{equation*} which is obtained from (2.11) with i replaced by i − 1 by multiplying with $$\textbf{C}_i$$ from the left. By using (2.14) it follows that \begin{align*} \big\|\textbf{C}_i \mathcal{M}_i^{-1} \textbf{x}_i\big\|_{S_{i+1}} & \le \left\| C_i \textbf{C}_{i-1} \mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1}\right\|_{S_{i+1}} + \|C_i x_i\|_{S_{i+1}} \\ & \le \left\|\textbf{C}_{i-1} \mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1}\right\|_{S_i} + \|x_i\|_{S_i} \quad \forall \, i> 1, \end{align*} which recursively applied eventually leads to \begin{align*} \left\|\textbf{C}_i \mathcal{M}_i^{-1} \textbf{x}_i\right\|_{S_{i+1}} &\le \left\|\textbf{C}_1 \mathcal{M}_1^{-1} \textbf{x}_1\right\|_{S_{2}}+ \sum_{j=2}^i \|x_j\|_{S_j}\\ &= \|C_1 x_1\|_{S_{2}}+ \sum_{j=2}^i \|x_j\|_{S_j} \le \sum_{j=1}^i \|x_j\|_{S_j} \quad \forall \, i \ge 1, \end{align*} using (2.14). 
Hence, \begin{equation*} \left|\left(\textbf{C}_i \mathcal{M}_i^{-1} \textbf{x}_i,x_{i+1}\right)_{S_{i+1}}\right| \le \sum_{j=1}^i \|x_j\|_{S_j} \, \|x_{i+1}\|_{S_{i+1}} \quad \textrm{for} \ i \ge 1. \end{equation*} Using this estimate we obtain for odd i > 1, \begin{align*} \left(\mathcal{M}_{i+1}^{-1} \textbf{x}_{i+1}, \textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}} & \ge \left(\mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1},\textbf{x}_{i-1}\right)_{\mathcal{S}_{i-1}} - 2 \sum_{j=1}^i \|x_j\|_{S_j} \, \|x_{i+1}\|_{S_{i+1}} - \|x_{i+1}\|_{S_{i+1}}^2 \\ & = \left(\mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1},\textbf{x}_{i-1}\right)_{\mathcal{S}_{i-1}} + y_{i+1}^\top L_{i+1} \, y_{i+1}, \end{align*} where $$y_j= (\Vert x_1\Vert _{S_1},\Vert x_2\Vert _{S_2},\ldots ,\Vert x_j\Vert _{S_j})^\top$$ and $$L_{i+1}$$ is the $$(i+1)\times (i+1)$$ matrix whose only nonzero entries are −1 in the last row and last column. For the remaining case i = 1 we have \begin{equation*} \left(\left[\mathcal{M}^{-1}_i + (-1)^i \, \mathcal{M}^{-1}_i \textbf{C}^{*}_i \textbf{C}_i \mathcal{M}^{-1}_i\right] \textbf{x}_i,\textbf{x}_i\right)_{\mathcal{S}_i} = \left(\left[I - C_1^{\ast} C_1\right] x_1,x_1\right)_{S_1} \ge 0. \end{equation*} Then it follows directly from (2.13) that \begin{align*} \left(\mathcal{M}_2^{-1} \textbf{x}_2, \textbf{x}_2\right)_{\mathcal{S}_2} & \ge 2 \, \left(\textbf{C}_1\mathcal{M}^{-1}_1 \textbf{x}_1,x_2\right)_{S_2} - \Vert x_2\Vert_{S_2}^2 \\ & \ge - 2 \, \|x_1\|_{S_1} \, \|x_2\|_{S_2} - \Vert x_2\Vert_{S_2}^2 = y_2^\top L_2 \, y_2 . 
\end{align*} Applying these estimates recursively eventually leads to \begin{equation*} \left(\mathcal{M}_j^{-1} \textbf{x}_j, \textbf{x}_j\right)_{\mathcal{S}_j} \ge y_j^\top Q_j \, y_j, \end{equation*} where $$Q_j$$, j = 2, 4, 6, … are given by the recurrence relation \begin{equation*} Q_2 = \begin{pmatrix} 0 & -1 \\ -1 & -1\end{pmatrix}, \quad Q_{i+2} = \left(\begin{array}{@{}c|cc@{}} Q_{i} & \begin{matrix} 0 \\ \vdots \end{matrix} & \begin{matrix} -1 \\ \vdots \end{matrix} \\ \hline \begin{matrix} 0 & \cdots \end{matrix} & 0 & -1\\ \begin{matrix} -1 & \cdots \end{matrix} & -1 & -1 \end{array}\right) \quad \textrm{for} \ i = 2,4,\ldots . \end{equation*} Therefore, \begin{equation*} \left(\mathcal{M}_j^{-1} \textbf{x}_j, \textbf{x}_j\right)_{\mathcal{S}_j} \ge - \|Q_j\| \, \left(\textbf{x}_j, \textbf{x}_j\right)_{\mathcal{S}_j} \quad \textrm{for even} \ j, \end{equation*} where $$\|Q_j\|$$ denotes the spectral norm of $$Q_j$$. It follows analogously that \begin{equation*} \left(\mathcal{M}_j^{-1} \textbf{x}_j, \textbf{x}_j\right)_{\mathcal{S}_j} \le \|Q_j\| \, \left(\textbf{x}_j, \textbf{x}_j\right)_{\mathcal{S}_j} \quad \textrm{for odd} \ j, \end{equation*} where $$Q_j$$, j = 1, 3, 5, … are given by the recurrence relation \begin{equation*} Q_1 = 1, \quad Q_{i+2} = \left(\begin{array}{@{}c|cc@{}} Q_{i} & \begin{matrix} 0 \\ \vdots \end{matrix} & \begin{matrix} 1 \\ \vdots \end{matrix} \\ \hline \begin{matrix} 0 & \cdots \end{matrix} & 0 & 1\\ \begin{matrix} 1 & \cdots \end{matrix} & 1 & 1 \end{array}\right) \quad \textrm{for} \ i = 1,3,\ldots \end{equation*} Together with (2.12) it follows for odd i that \begin{equation*} - \|Q_{i+1}\| \, \left(\textbf{x}_{i+1}, \textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}} \le \left(\mathcal{M}_{i+1}^{-1} \textbf{x}_{i+1},\textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}} \le \left(\mathcal{M}_i^{-1} \textbf{x}_i,\textbf{x}_i\right)_{\mathcal{S}_i} \le \|Q_i\| \, \left(\textbf{x}_i, \textbf{x}_i\right)_{\mathcal{S}_i}, \end{equation*} 
and for even i that \begin{equation*} - \|Q_i\| \, \left(\textbf{x}_i, \textbf{x}_i\right)_{\mathcal{S}_i} \le \left(\mathcal{M}_i^{-1} \textbf{x}_i,\textbf{x}_i\right)_{\mathcal{S}_i} \le \left(\mathcal{M}_{i+1}^{-1} \textbf{x}_{i+1},\textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}} \le \|Q_{i+1}\| \, \left(\textbf{x}_{i+1}, \textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}}. \end{equation*} So in both cases we obtain \begin{equation*} \frac{\left|\left(\mathcal{M}_{i+1}^{-1} \textbf{x}_{i+1},\textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}}\right|} {\left(\textbf{x}_{i+1}, \textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}}} \le \max\left(\|Q_i\|,\|Q_{i+1}\|\right) \quad \forall\, \textbf{x}_{i+1} \neq 0. \end{equation*} Since \begin{equation*} \|Q_j\| = \frac{1}{2\sin\left(\frac{\pi}{2\left(2 j+1\right)}\right)} \end{equation*} (see the appendix), the proof is completed. A direct consequence of Theorems 2.2 and 2.3 is the following corollary. Corollary 2.4 Under the assumptions of Lemma 2.1 the block operators $$\mathcal{A}$$ and $$\mathcal{M}$$, given by (2.1) and (2.5), are isomorphisms from X to X′ and from X to X, respectively. Moreover, the following condition number estimate holds: \begin{equation*} \kappa\left(\mathcal{A}\right)= \kappa\left(\mathcal{M}\right) \leq \frac{\cos\left(\frac{\pi}{2n+1}\right)}{\sin\left(\frac{\pi}{2\left(2n+1\right)}\right)} . \end{equation*} Remark 2.5 For n = 2 we have $$2 \sin \left( \frac{\pi}{10} \right) = \frac{1}{2}\big(\sqrt{5}-1\big) \quad \textrm{and} \quad 2 \cos \left( \frac{\pi}{5} \right) = \frac{1}{2}\big(\sqrt{5}+1\big),$$ and therefore \begin{equation*} \kappa\left(\mathcal{M}\right) \le \frac{\sqrt{5}+1}{\sqrt{5}-1} = \frac{3+\sqrt{5}}{2}. \end{equation*} This result is well known for finite-dimensional spaces; see the studies by Kuznetsov (1995) and Murphy et al. (2000). In Kuznetsov (1995) and Murphy et al. 
(2000) it was also shown for the case n = 2 and $$A_2 = 0$$ that $$\mathcal{M}$$ has only three eigenvalues: \begin{equation*} \left\{-\frac{1}{2}\big(\sqrt{5}-1\big), 1, \frac{1}{2}\big(\sqrt{5}+1\big) \right\}. \end{equation*} This result can also be extended to n ≥ 2 and to general Hilbert spaces. Theorem 2.6 If the assumptions of Lemma 2.1 hold and if, additionally, $$A_i=0$$ for i = 2, … , n, then the set $$\sigma _p(\mathcal{M})$$ of all eigenvalues of $$\mathcal{M}$$ is given by \begin{equation*} \sigma_p(\mathcal{M}) = \left\lbrace 2\cos\left(\frac{2i-1}{2j+1}\pi\right) \colon j = 1,\ldots,n, \ i = 1,\ldots, j \right\rbrace. \end{equation*} Moreover, \begin{equation*} \|\mathcal{M}\|_{\mathcal{L}(\textbf{X},\textbf{X})} = \max \left\{|\lambda| : \lambda \in \sigma_p(\mathcal{M}) \right\} = 2\cos\left(\frac{\pi}{2n+1}\right) \end{equation*} and \begin{equation*} \big\|\mathcal{M}^{-1}\big\|_{\mathcal{L}(\textbf{X},\textbf{X})} = \max \left\{\frac{1}{|\lambda|} : \lambda \in \sigma_p(\mathcal{M}) \right\} = \frac{1}{2\sin\left(\frac{\pi}{2\left(2n+1\right)}\right)}. \end{equation*} So equality is attained in the estimates of Theorems 2.2 and 2.3 and of Corollary 2.4; that is, all these estimates are sharp. Proof. Since $$A_i=0$$ for i = 2, … , n it follows that $$C_iC^{*}_i = I$$ and the block operator $$\mathcal{M}$$ simplifies to \begin{equation*} \mathcal{M} = \left(\begin{array}{cccc} I & C^{*}_1 & &\\ C_1 & 0 & \ddots &\\ & \ddots& \ddots& C^{*}_{n-1}\\ & & C_{n-1} & 0 \end{array}\right). \end{equation*} The eigenvalue problem $$\mathcal{M} \textbf{x} = \lambda \, \textbf{x}$$ reads, in detail, \begin{align*} x_1 +C^{*}_{1}x_2 &= \lambda x_1,\\ C_1x_1 +C^{*}_{2}x_3 &= \lambda x_2,\\ &\vdots\\ C_{n-2}x_{n-2} +C^{*}_{n-1}x_{n} &= \lambda x_{n-1},\\ C_{n-1}x_{n-1} &= \lambda x_{n}. 
\end{align*} From the first equation \begin{equation*} C^{*}_{1} x_2 = \bar{P}_1(\lambda) \, x_1, \quad \textrm{where} \quad \bar{P}_1(\lambda) = \lambda - 1, \end{equation*} we conclude that the root $$\lambda _{11} = 1$$ of $$\bar{P}_1(\lambda )$$ is an eigenvalue by setting $$x_2 = x_3 = \cdots = x_n = 0$$. If $$\lambda \neq \lambda _{11}$$ then we obtain from the second equation, by eliminating $$x_1$$, \begin{equation*} C^{*}_{2}x_3 = \lambda \, x_2 - C_1 x_1 = \lambda \, x_2 - \frac{1}{\bar{P}_1(\lambda)} \, C_1 C_1^{\ast} x_2 = \lambda \, x_2 - \frac{1}{\bar{P}_1(\lambda)} \, x_2 = R_2(\lambda) \, x_2, \end{equation*} where \begin{equation*} R_2(\lambda) = \lambda -\frac{1}{\bar{P}_1(\lambda)} = \frac{\bar{P}_2(\lambda)}{\bar{P}_1(\lambda)} \quad \textrm{with} \quad \bar{P}_2(\lambda) = \lambda \, \bar{P}_1(\lambda) -1. \end{equation*} We conclude that the two roots $$\lambda _{21}$$ and $$\lambda _{22}$$ of the polynomial $$\bar{P}_2(\lambda )$$ of degree 2 are eigenvalues by setting $$x_3 = \cdots = x_n = 0$$. Repeating this procedure gives \begin{equation*} C^{*}_{j}x_{j+1} = R_j(\lambda) \, x_j, \textrm{ for } j=2,\ldots,n-1, \textrm{ and } 0 = R_n(\lambda) \, x_n \textrm{ with } R_j(\lambda) = \frac{\bar{P}_j(\lambda)}{\bar{P}_{j-1}(\lambda)}, \end{equation*} where the polynomials $$\bar{P}_j(\lambda )$$ are recursively given by \begin{equation*} \bar{P}_0(\lambda) = 1,\quad \bar{P}_1(\lambda) = \lambda - 1, \quad \bar{P}_{i+1}(\lambda) = \lambda \, \bar{P}_i(\lambda) - \bar{P}_{i-1}(\lambda) \quad \textrm{for}\ \, i \ge 1. \end{equation*} So the eigenvalues of $$\mathcal{M}$$ are the roots of the polynomials $$\bar{P}_1\left (\lambda \right ),\bar{P}_2(\lambda ),\ldots ,\bar{P}_n\left (\lambda \right )$$. For the roots of $$\bar{P}_j\left (\lambda \right )$$ we obtain \begin{equation*} \lambda = 2 \,\cos\left(\frac{2i-1}{2j+1}\pi\right) \quad \textrm{for } i= 1,\ldots,j; \end{equation*} see Lemma A1. 
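Both closed-form facts borrowed from the appendix — the root formula for $$\bar{P}_j$$ above and the spectral norms $$\|Q_j\|$$ used in the proof of Theorem 2.3 — can be checked numerically. The following Python sketch is ours, not from the paper; the helper `Qmat` encodes the entry pattern of $$|Q_j|$$ read off from the nested recurrences (an entry is nonzero exactly when the larger of its 1-based indices has the parity of j), and the truncation at j ≤ 8 is arbitrary:

```python
import numpy as np

# Roots of the recursively defined polynomials
#   P0 = 1, P1(x) = x - 1, P_{i+1}(x) = x * P_i(x) - P_{i-1}(x)
# are predicted to be 2 cos((2i-1) pi / (2j+1)), i = 1, ..., j.
P = [np.array([1.0]), np.array([-1.0, 1.0])]        # coefficients, low-to-high
for i in range(1, 8):
    P.append(np.concatenate([[0.0], P[i]])          # multiply P_i by x ...
             - np.concatenate([P[i - 1], [0.0, 0.0]]))  # ... and subtract P_{i-1}
for j in range(1, 9):
    roots = 2 * np.cos((2 * np.arange(1, j + 1) - 1) * np.pi / (2 * j + 1))
    assert np.allclose(np.polyval(P[j][::-1], roots), 0.0, atol=1e-9)

def Qmat(j):
    # |Q_j| has entry 1 exactly where max(a, b) has the parity of j (1-based),
    # which reproduces Q_1 = (1), Q_2 = -[[0,1],[1,1]], Q_3, Q_4, ... up to sign.
    a = np.arange(1, j + 1)
    return (np.maximum.outer(a, a) % 2 == j % 2).astype(float)

for j in range(1, 9):   # claimed closed form: ||Q_j|| = 1 / (2 sin(pi / (2(2j+1))))
    assert np.isclose(np.linalg.norm(Qmat(j), 2),
                      1 / (2 * np.sin(np.pi / (2 * (2 * j + 1)))))
```

For j = 2 this recovers the golden-ratio value $$\|Q_2\| = \tfrac12(\sqrt{5}+1)$$ from Remark 2.5.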
It is easy to see that \begin{equation*} \max \{|\lambda| : \lambda \in \sigma_p(\mathcal{M}) \} = 2\cos\left(\frac{\pi}{2n+1}\right). \end{equation*} Therefore, with Theorem 2.2 it follows that \begin{equation*} 2\cos\left(\frac{\pi}{2n+1}\right) \ge \|\mathcal{M}\|_{\mathcal{L}(\textbf{X},\textbf{X})} \ge \max \left\{|\lambda| : \lambda \in \sigma_p(\mathcal{M}) \right\} = 2\cos\left(\frac{\pi}{2n+1}\right), \end{equation*} which implies equality. An analogous argument applies to $$\mathcal{M}^{-1}$$. 3. Application: optimization problems in function space In this section we apply the theory from Section 2 to optimization problems in function spaces with an elliptic partial differential equation as constraint. First, we present a standard model problem. Then we look at two more challenging variations of the model problem in Sections 3.2 and 3.3. 3.1 Distributed observation and distributed control We start with the following model problem: find u ∈ U and $$f \in F = L^2(\varOmega )$$ that minimize the objective functional \begin{equation*} \frac12 \Vert u-d\Vert^2_{L^2\left(\varOmega\right)} + \frac{\alpha}{2}\Vert f\Vert^2_{L^2\left(\varOmega\right)} \end{equation*} subject to the constraints \begin{align*} -\varDelta u + f &= 0\quad \textrm{in} \quad \varOmega,\\ u &= 0 \quad \textrm{on} \quad \partial\varOmega, \end{align*} where $$\varOmega$$ is a bounded open subset of $$\mathbb{R}^d$$ with Lipschitz boundary $$\partial \varOmega$$, $$\varDelta$$ is the Laplacian operator, $$d\in L^2\left (\varOmega \right )$$, $$\alpha> 0$$ are given data and $$U \subset L^2\left (\varOmega \right )$$ is a Hilbert space. Here $$L^2(\varOmega )$$ denotes the standard Lebesgue space of square-integrable functions on $$\varOmega$$ with inner product $$\left (\cdot , \cdot \right )_{L^2(\varOmega )}$$ and norm $$\Vert \cdot \Vert _{L^2(\varOmega )}$$. 
This problem can be seen either as an inverse problem for identifying f from the data d or as an optimal control problem with state u, control f and the desired state d. In the first case the parameter $$\alpha$$ is a regularization parameter, and in the second case, a cost parameter. Throughout this paper we adopt the terminology of an optimal control problem and call U the state space and F the control space. We now discuss the construction of preconditioners for the associated optimality system such that the condition number of the preconditioned system is bounded independently of $$\alpha$$. We will call such preconditioners $$\alpha$$-robust. This is of particular interest in the context of inverse problems, where $$\alpha$$ is typically small, in which case the unpreconditioned problem becomes severely ill-conditioned. The problem is not yet fully specified. We need a variational formulation for the constraint, which will eventually lead to the definition of the state space U. The most natural way is to use the standard weak formulation with $$U =H_0^1(\varOmega )$$: \begin{equation*} \big(\nabla u, \nabla w\big)_{L^2\left(\varOmega\right)} +\left(f, w\right)_{L^2\left(\varOmega\right)} = 0\quad \forall\, w\in W = H_0^1(\varOmega), \end{equation*} where ∇ denotes the gradient of a scalar function. Here we use $$H^m(\varOmega )$$ to denote the standard Sobolev spaces of functions on $$\varOmega$$ with associated norm $$\Vert \cdot \Vert _{H^m(\varOmega )}$$. The subspace of functions $$u\in H^1(\varOmega )$$ with vanishing trace is denoted by $$H_0^1(\varOmega )$$. This problem is well studied; see, for example, the book by Tröltzsch (2010). 
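As a concrete illustration, the discretized optimality (KKT) system of this control problem can be solved directly. The following 1D finite-difference sketch is our own construction, not from the paper: the interval (0, 1), the grid, the lumped mass matrix, the data d and the value of α are all assumptions. It solves the saddle point system for the weak form of the state equation and cross-checks the computed control against the reduced problem obtained by eliminating the state:

```python
import numpy as np

# 1D analogue: minimize 1/2||u - d||^2 + alpha/2||f||^2
# subject to -u'' + f = 0 on (0,1), u(0) = u(1) = 0.
m = 50
h = 1.0 / (m + 1)
x = np.linspace(h, 1 - h, m)
M = h * np.eye(m)                                        # lumped mass matrix
K = (2*np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h  # Dirichlet stiffness
d, alpha = np.sin(np.pi * x), 1e-2
Z = np.zeros((m, m))
# KKT system: [[M, 0, K'], [0, aM, M], [K, M, 0]] (u, f, w) = (Md, 0, 0)
A = np.block([[M, Z, K.T], [Z, alpha * M, M], [K, M, Z]])
rhs = np.concatenate([M @ d, np.zeros(2 * m)])
u, f, w = np.split(np.linalg.solve(A, rhs), 3)
# reduced problem: u = -K^{-1} M f, minimize over f alone
G = -np.linalg.solve(K, M)
f_red = np.linalg.solve(G.T @ M @ G + alpha * M, G.T @ M @ d)
assert np.allclose(f, f_red, rtol=1e-5, atol=1e-8)       # both routes agree
```

The agreement of the two routes is just the statement that the first-order conditions of the Lagrangian characterize the minimizer of the reduced objective.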
Other options for the state equation are the very weak form in the following sense: $$U = L^2(\varOmega )$$ and \begin{equation*} - \left(u, \varDelta w\right)_{L^2\left(\varOmega\right)} +\left(\,f, w\right)_{L^2\left(\varOmega\right)} = 0\quad \forall\, w\in W = H^2(\varOmega) \cap H_0^1(\varOmega), \end{equation*} and the strong form with $$U = H^2(\varOmega ) \cap H_0^1(\varOmega )$$ and \begin{equation*} -\left(\varDelta u, w\right)_{L^2\left(\varOmega\right)} +\left(\,f, w\right)_{L^2\left(\varOmega\right)} = 0\quad \forall\, w\in W = L^2(\varOmega). \end{equation*} In each of these different formulations of the optimization problem, only two bilinear forms are involved: the $$L^2$$-inner product (twice in the objective functional and once in the state equation as the second term) and the bilinear form representing the negative Laplacian. In operator notation we use $$M \colon L^2(\varOmega ) \rightarrow (L^2(\varOmega ))^{\prime}$$ for representing the $$L^2$$-inner product, \begin{equation*} \langle M y,z \rangle = \left(y, z\right)_{L^2(\varOmega)} \end{equation*} and $$K \colon U \rightarrow W^{\prime}$$ for representing the bilinear form associated to the negative Laplacian \begin{equation*} \langle K u,w \rangle = \begin{cases} \left(\nabla u, \nabla w\right)_{L^2\left(\varOmega\right)}& \textrm{with }U=W = H_0^1(\varOmega), \\ -\left(u, \varDelta w\right)_{L^2\left(\varOmega\right)}& \textrm{with }U=L^2(\varOmega),\ W = H^2(\varOmega) \cap H_0^1(\varOmega), \\ -\left(\varDelta u, w\right)_{L^2\left(\varOmega\right)}& \textrm{with }U=H^2(\varOmega) \cap H_0^1(\varOmega),\ W=L^2(\varOmega), \end{cases} \end{equation*} depending on the choice of U. With this notation the state equation reads \begin{equation*} \left\langle K u, w\right\rangle + \left\langle M f, w\right\rangle = 0 \quad \forall\, w\in W . \end{equation*} For discretizing the problem we use the optimize-then-discretize approach. 
Therefore, we start by introducing the associated Lagrangian functional, which reads \begin{equation*} \mathcal{L}\left(u,f,w\right) = \frac12 \Vert u-d\Vert^2_{L^2\left(\varOmega\right)} + \frac{\alpha}{2}\Vert f\Vert^2_{L^2\left(\varOmega\right)} +\left\langle Ku, w\right\rangle +\left\langle Mf, w\right\rangle, \end{equation*} with u ∈ U, f ∈ F and the Lagrange multiplier w ∈ W. From the first-order necessary optimality conditions \begin{equation*} \frac{\partial \mathcal{L}}{\partial u}\left(u,f,w\right)=0, \quad \frac{\partial \mathcal{L}}{\partial f}\left(u,f,w\right)=0, \quad \frac{\partial \mathcal{L}}{\partial w}\left(u,f,w\right)=0, \end{equation*} which are also sufficient here, we obtain the optimality system, which leads to the following problem. Problem 3.1 Find $$\left (u,f,w\right )\in U\times F \times W$$ such that \begin{equation*} \mathcal{A}_\alpha \begin{pmatrix} u \\ f \\ w \end{pmatrix} = \begin{pmatrix} M d\\ 0 \\ 0 \end{pmatrix} \quad \textrm{with} \quad \mathcal{A}_\alpha = \begin{pmatrix} M & 0 & K^{\prime} \\ 0 & \alpha M & M \\ K & M & 0 \end{pmatrix} . \end{equation*} Strictly speaking, the four operators M appearing in $$\mathcal{A}_\alpha$$ are restrictions of the original operator M introduced above to the corresponding spaces U, F and W. The block operator in Problem 3.1 is of the form (2.1) for n = 2 with $$A_1 = \begin{pmatrix} M & 0 \\ 0 & \alpha M \end{pmatrix},\quad A_2 = 0 \quad\textrm{and}\quad B_1 = \begin{pmatrix} K & M \end{pmatrix}.$$ (3.1) We now analyse the three possible choices of U, which were considered above: 1. We start with the weak formulation of the state equation, where $$U =H_0^1(\varOmega )$$, $$W=H_0^1(\varOmega )$$ and $$\left \langle Ku, w\right \rangle = \left (\nabla u, \nabla w\right )_{L^2\left (\varOmega \right )}$$. In this case it is obvious that $$A_1$$ is not positive definite on $$X_1 = U \times F = H_0^1(\varOmega ) \times L^2(\varOmega )$$. So the results of Section 2 do not apply. 
However, there exist other preconditioners that are $$\alpha$$-robust for this choice of $$U =H_0^1(\varOmega )$$; see the study by Schöberl & Zulehner (2007). 2. Next we examine the very weak form of the state equation, where $$U = L^2(\varOmega )$$, $$W = H^2(\varOmega ) \cap H_0^1(\varOmega )$$ and $$\left \langle Ku, w\right \rangle = -\left (u, \varDelta w\right )_{L^2\left (\varOmega \right )}$$. For this choice it is easy to see that $$S_1: U \times F \rightarrow U^{\prime} \times F^{\prime}$$ is positive definite and that $$S_2$$ is well defined, with $$S_1 = \begin{pmatrix} M & 0 \\ 0 & \alpha M \end{pmatrix}\quad \textrm{and} \quad S_2 = \frac{1}{\alpha} \, M+K M^{-1}K^{\prime}.$$ (3.2) In order to apply the results of Section 2 we are left with showing that $$S_2: W\rightarrow W^{\prime}$$ is positive definite. First, observe that we have the following alternative representation of $$S_2$$. Lemma 3.2 \begin{equation*} S_2 = \frac{1}{\alpha} \, M + B, \end{equation*} where the (biharmonic) operator B is given by $$\left\langle B y, z\right\rangle = \left(\varDelta y, \varDelta z\right)_{L^2\left(\varOmega\right)} \quad \forall\, y, z \in H^2(\varOmega) \cap H_0^1(\varOmega).$$ (3.3) Proof. For $$w \in W = H^2(\varOmega ) \cap H_0^1(\varOmega )$$ we have \begin{equation*} \left\langle K M^{-1} K^{\prime} w, w\right\rangle = \sup_{0 \neq v \in U}\frac{\left\langle K^{\prime} w, v\right\rangle^{2}}{\left\langle M v, v\right\rangle} = \sup_{0 \neq v \in U}\frac{\left(v, -\varDelta w\right)^{2}_{L^2(\varOmega)}}{\left(v, v\right)_{L^2\left(\varOmega\right)}} = \Vert \varDelta w\Vert_{L^2\left(\varOmega\right)}^2, \end{equation*} from which it follows that $$K M^{-1} K^{\prime} = B$$, since both operators are self-adjoint. The second ingredient for showing the positive definiteness of $$S_2$$ is the following result from the books by Grisvard (1992, 2011). 
Lemma 3.3 Assume that $$\varOmega$$ is a bounded open subset of $$\mathbb{R}^2$$$$(\mathbb{R}^3)$$ with a polygonal (polyhedral) Lipschitz boundary $$\partial \varOmega$$. Then \begin{equation*} \Vert v\Vert_{H^2\left(\varOmega\right)} \leq c \, \Vert \varDelta v\Vert_{L^2\left(\varOmega\right)} \quad \forall\, v \in H^2\left(\varOmega\right)\cap H^1_0\left(\varOmega\right), \end{equation*} for some constant c. Proof. According to Grisvard (2011, Theorem 3.1.1.2) we have \begin{equation*} \int_\varOmega \left|\operatorname{div} q(x)\right|^2 \, \mathrm{d} x = \int_\varOmega \nabla q(x) : \left(\nabla q(x)\right)^\top \, \mathrm{d}x \end{equation*} for all $$q \in H^2(\varOmega )^d$$ with vanishing tangential component $$q_T := q - (q\cdot n) \, n = 0$$ on $$\partial\varOmega$$. Applying this identity to q = ∇v with $$v \in H^3(\varOmega )\,\cap\, H_0^1(\varOmega )$$ we obtain \begin{equation*} \int_\varOmega \left|\varDelta v(x)\right|^2 \, \mathrm{d} x = \int_\varOmega \big\|\nabla^2 v(x)\big\|_F^2 \, \mathrm{d} x = \big\|\nabla^2 v\big\|^2_{L^2\left(\varOmega\right)}. \end{equation*} Since $$H^3(\varOmega )\cap H_0^1(\varOmega )$$ is dense in $$H^2(\varOmega )\cap H_0^1(\varOmega )$$ (see Grisvard, 1992, Theorem 1.6.2), it follows that the identity holds for all $$v \in H^2(\varOmega )\cap H_0^1(\varOmega )$$. Friedrichs’s inequality gives \begin{equation*} \Vert v\Vert_{L^2\left(\varOmega\right)} \leq c_F \,\|\nabla v\|_{L^2\left(\varOmega\right)} \quad \forall \, v\in H_0^1(\varOmega). \end{equation*} From Poincaré’s inequality applied to ∇v it follows that \begin{equation*} \|\nabla v\|_{L^2\left(\varOmega\right)} \leq c_P \, \big\|\nabla^2 v\big\|_{L^2\left(\varOmega\right)} \quad \forall \, v\in H^2(\varOmega) \cap H_0^1(\varOmega), \end{equation*} since $$\int _\varOmega \nabla v \, \mathrm{d} x = \int _{\partial \varOmega } v \, n \, \mathrm{d} s = 0$$. Combining these inequalities and the equality completes the proof with the constant $$c^2 = 1 + c_P^2 \big(1+c_F^2\big)$$. 
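The chain of inequalities in this proof can be illustrated by a 1D analogue (our own construction, not from the paper): on (0, 1) every $$v \in H^2 \cap H_0^1$$ can be expanded as $$v = \sum_k a_k \sin(k\pi x)$$, for which all three norms are explicit and Friedrichs’s and Poincaré’s inequalities hold with $$c_F = c_P = 1/\pi$$:

```python
import numpy as np

# Sine-series check of the 1D analogue of Lemma 3.3: for
# v = sum_k a_k sin(k pi x) on (0,1),
#   ||v||^2   = sum a_k^2 / 2,
#   ||v'||^2  = sum (k pi)^2 a_k^2 / 2,
#   ||v''||^2 = sum (k pi)^4 a_k^2 / 2,
# and chaining Friedrichs and Poincaré gives
# ||v||_{H^2}^2 <= c^2 ||v''||^2 with c^2 = 1 + c_P^2 (1 + c_F^2).
rng = np.random.default_rng(0)
k = np.arange(1, 21)
cF = cP = 1 / np.pi
c2 = 1 + cP**2 * (1 + cF**2)
for _ in range(100):
    a = rng.standard_normal(k.size)
    v2   = 0.5 * np.sum(a**2)                     # ||v||^2
    dv2  = 0.5 * np.sum((k * np.pi)**2 * a**2)    # ||v'||^2
    ddv2 = 0.5 * np.sum((k * np.pi)**4 * a**2)    # ||v''||^2
    assert v2 <= cF**2 * dv2 + 1e-12              # Friedrichs
    assert dv2 <= cP**2 * ddv2 + 1e-12            # Poincaré applied to v'
    assert v2 + dv2 + ddv2 <= c2 * ddv2 + 1e-9    # combined H^2 estimate
```

For the lowest mode v = sin(πx) the combined estimate is attained with equality, so the constant cannot be improved in this 1D setting.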
From this a priori estimate the required property of $$S_2$$ follows. Lemma 3.4 Under the assumptions of Lemma 3.3 the operator $$S_2: W \rightarrow W^{\prime}$$ given by (3.2) is bounded, self-adjoint and positive definite, where $$W = H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )$$. Proof. It is obvious that $$S_2$$ is bounded and self-adjoint. Moreover, it follows from Lemma 3.3 that B is positive definite. Since $$S_2 \ge B$$, $$S_2$$ is positive definite too. As a direct consequence from Corollary 2.4, Lemma 3.4 and the results of Section 2 we have the following result. Corollary 3.5 Under the assumptions of Lemma 3.3 the operator $$\mathcal{A}_\alpha$$ in Problem 3.1 is an isomorphism from $$\textbf{X} = L^2\left (\varOmega \right )\times L^2\left (\varOmega \right ) \times \left (H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )\right )$$ to its dual space with respect to the norm in X given by \begin{equation*} \left\Vert \left(u,f,w\right)\right\Vert^2 = \Vert u\Vert^2_U + \Vert f\Vert^2_F + \Vert w\Vert^2_W \end{equation*} with \begin{equation*} \Vert u\Vert^2_{U} = \Vert u\Vert^2_{L^2\left(\varOmega\right)},\quad \Vert f\Vert^2_{F} = \alpha \Vert f\Vert^2_{L^2\left(\varOmega\right)},\quad \Vert w\Vert^2_{W} = \frac{1}{\alpha}\Vert w\Vert^2_{L^2\left(\varOmega\right)} + \Vert \varDelta w\Vert^2_{L^2\left(\varOmega\right)}. \end{equation*} Furthermore, the following condition number estimate holds: \begin{equation*} \kappa\left(\mathcal{S}^{-1}_\alpha\mathcal{A}_\alpha\right) \leq \frac{\cos\left(\frac{\pi}{5}\right)}{\sin\left(\frac{\pi}{10}\right)}\approx 2.62 \quad \textrm{with} \quad \mathcal{S}_\alpha = \begin{pmatrix} M & & \\ & \alpha \, M & \\ & & \frac{1}{\alpha} M + B \end{pmatrix}. \end{equation*} 3. 
Finally, we examine the strong form of the state equation, where $$U = H^2(\varOmega ) \cap H_0^1(\varOmega )$$, $$W = L^2(\varOmega )$$ and $$\left \langle Ku, v\right \rangle = -\left (\varDelta u, v\right )_{L^2\left (\varOmega \right )}$$. With the original setting (3.1) we cannot apply the results of Section 2, since $$A_1$$ is not positive definite. To overcome this we change the ordering of the variables and equations and obtain the following equivalent form of the optimality conditions: $$\widetilde{\mathcal{A}}_\alpha \begin{pmatrix} f \\ w \\ u \end{pmatrix} = \begin{pmatrix} 0\\ 0 \\ M d \end{pmatrix} \quad \textrm{with} \quad \widetilde{\mathcal{A}}_\alpha = \begin{pmatrix} \alpha M & M & 0 \\ M & 0 & K \\ 0 & K^{\prime} & M \end{pmatrix} .$$ (3.4) Here we view $$\widetilde{\mathcal{A}}_\alpha$$ as a block operator of the form (2.1) for n = 3 with \begin{equation*} A_1 = \alpha M, \quad A_2 = 0, \quad A_3 = M, \quad B_1 =M \quad \textrm{and} \quad B_2=K^{\prime}. \end{equation*} The corresponding Schur complements are given by \begin{equation*} S_1 = \alpha M, \quad S_2=\frac{1}{\alpha}M \quad \textrm{and} \quad S_3 = M+\alpha K^{\prime} M^{-1}K. \end{equation*} As before we have the following alternative representation of $$S_3$$: $$S_3 = M+\alpha \, B,$$ with the biharmonic operator, given by (3.3). It is obvious that $$S_1$$ and $$S_2$$ are positive definite. We are left with showing that $$S_3$$ is positive definite, which follows from Lemma 3.4, since K and $$S_3$$ in this case are identical to K′ and $$\alpha \, S_2$$ from the previous case. So, finally we obtain the following result analogously to Corollary 3.5. 
Corollary 3.6 Under the assumptions of Lemma 3.3 the operator $$\widetilde{\mathcal{A}}_\alpha$$ in equation (3.4) is an isomorphism from $$\textbf{X} = L^2\left (\varOmega \right )\times L^2\left (\varOmega \right ) \times \left (H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )\right )$$ to its dual space with respect to the norm in X given by \begin{equation*} \left\Vert \left(f,w,u\right)\right\Vert^2 = \Vert f\Vert^2_F + \Vert w\Vert^2_W + \Vert u\Vert^2_U \end{equation*} with \begin{equation*} \Vert f\Vert^2_{F} = \alpha \Vert f\Vert^2_{L^2\left(\varOmega\right)},\quad \Vert w\Vert^2_{W} = \frac1\alpha \Vert w\Vert^2_{L^2\left(\varOmega\right)},\quad \Vert u\Vert^2_{U} = \alpha \Vert u\Vert^2_{L^2\left(\varOmega\right)}+\Vert \varDelta u\Vert^2_{L^2\left(\varOmega\right)} . \end{equation*} Furthermore, the following condition number estimate holds: \begin{equation*} \kappa\left(\mathcal{S}^{-1}_\alpha \widetilde{\mathcal{A}}_\alpha\right) \leq \frac{\cos\left({\pi}/{7}\right)}{\sin\left({\pi}/{14}\right)}\approx4.05 \quad \textrm{with} \quad \mathcal{S}_\alpha = \begin{pmatrix} \alpha M & & \\ & \frac{1}{\alpha} \, M & \\ & & M+\alpha \, B \end{pmatrix}. \end{equation*} The characteristic properties of the model problem of this subsection are: distributed observation: this refers to the first term in the objective functional, where the state u is compared to the given data on the whole domain $$\varOmega$$; and distributed control: the state u is controlled by f, which is allowed to live on the whole domain $$\varOmega$$. Alternatively, the comparison with given data might be done on a set $$\varOmega _o$$ different from $$\varOmega,$$ which is called limited observation. Similarly, the control might live on a set $$\varOmega _c$$ different from $$\varOmega$$, which is called limited control. 
In the next two subsections we will see that the results based on the very weak form of the state equation and on the strong form of the state equation can be extended to problems with distributed observation and limited control and to problems with distributed control and limited observation, respectively. For simplicity we will focus on model problems with $$\varOmega _o = \partial \varOmega$$ or $$\varOmega _c = \partial \varOmega$$. 3.2 Distributed observation and limited control We consider the following variation of the model problem from Section 3.1 as a model problem for distributed observation and limited control: find u ∈ U and $$f \in F = L^2(\partial \varOmega )$$ that minimize the objective functional \begin{equation*} \frac12 \Vert u-d\Vert^2_{L^2\left(\varOmega\right)} + \frac{\alpha}{2}\Vert f\Vert^2_{L^2(\partial\varOmega)} \end{equation*} subject to the constraints \begin{align*} -\varDelta u &= 0\quad \textrm{in} \quad \varOmega,\\ u &= f \quad \textrm{on} \quad \partial\varOmega, \end{align*} where $$d\in U = L^2\left (\varOmega \right )$$ and $$\alpha>0$$ are given data. This model problem and error estimates for a finite element discretization are analysed in the study by May et al. (2013) for convex domains $$\varOmega$$. As in May et al. (2013) we consider the very weak form of the state equation: \begin{equation*} \left( u, -\varDelta w\right)_{L^2\left(\varOmega\right)} +\left(f, \partial_{n} w \right)_{L^2(\partial\varOmega)} = 0\quad \forall\, w\in W, \end{equation*} with $$u \in U = L^2(\varOmega )$$ and $$W = H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )$$. Here $$\partial _n w$$ denotes the normal derivative of w on $$\partial \varOmega$$. Contrary to May et al. (2013) we do not assume that $$\varOmega$$ is convex. See also the study by Berggren (2004) for another version of a very weak formulation, which coincides with the formulation from above for convex domains. 
Analogously to Section 3.1 the optimality system can be derived, and it reads as follows. Problem 3.7 Find $$\left (u,f,w\right )\in U\times F \times W$$ such that $$\mathcal{A}_\alpha \begin{pmatrix} u \\ f \\ w \end{pmatrix} = \begin{pmatrix} M d \\ 0 \\ 0 \end{pmatrix} \quad \textrm{with}\quad \mathcal{A}_\alpha= \begin{pmatrix} M & 0 & K^{\prime} \\ 0 & \alpha M_{\partial} & N \\ K & N^{\prime} & 0 \end{pmatrix},$$ (3.5) where \begin{align*} \left\langle M y, z\right\rangle &= \left(y, z\right)_{L^2\left(\varOmega\right)}, & \left\langle K u, w\right\rangle &= -\left(u, \varDelta w\right)_{L^2\left(\varOmega\right)}, \\ \left\langle M_\partial f, g\right\rangle &= \left(f, g\right)_{L^2(\partial\varOmega)}, & \left\langle N w, f\right\rangle &= \left(\partial_n w, f\right)_{L^2(\partial\varOmega)}, \end{align*} and $$U = L^2\left (\varOmega \right )$$, $$F = L^2(\partial \varOmega )$$ and $$W=H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )$$. Using similar arguments as for Problem 3.1 with the very weak formulation of the state equation we obtain the following result. Corollary 3.8 Under the same assumptions as in Lemma 3.3, the operator $$\mathcal{A}_\alpha$$ in Problem 3.7 is an isomorphism between $$\textbf{X}=L^2\left (\varOmega \right ) \times L^2(\partial \varOmega ) \times \left(H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )\right)$$ and its dual space with respect to the norm in X given by \begin{equation*} \Vert \left(u,f,w\right)\Vert^2 = \Vert u\Vert^2_U + \Vert f\Vert^2_F + \Vert w\Vert^2_W \end{equation*} with \begin{equation*} \Vert u\Vert^2_{U} = \Vert u\Vert^2_{L^2\left(\varOmega\right)},\quad \Vert f\Vert^2_{F} = \alpha \Vert f\Vert^2_{L^2(\partial\varOmega)},\quad \Vert w\Vert^2_{W} = \Vert \varDelta w\Vert^2_{L^2\left(\varOmega\right)}+ \frac{1}{\alpha}\Vert \partial_{n}w\Vert^2_{L^2(\partial\varOmega)} . 
\end{equation*} Furthermore, the following condition number estimate holds: \begin{equation*} \kappa\left(\mathcal{S}^{-1}_\alpha\mathcal{A}_\alpha\right) \leq \frac{\cos\left({\pi}/{5}\right)}{\sin\left({\pi}/{10}\right)} \approx 2.62 \quad \textrm{with} \quad \mathcal{S}_\alpha = \begin{pmatrix} M & & \\ & \alpha \, M_\partial & \\ & & \frac{1}{\alpha}\, K_{\partial} + B \end{pmatrix} \end{equation*} where \begin{equation*} \left\langle K_\partial y, z\right\rangle = \left(\partial_n y, \partial_n z\right)_{L^2(\partial\varOmega)}. \end{equation*} Remark 3.9 So far we have considered only a model problem with Dirichlet boundary control u = f on $$\partial \varOmega$$. A similar result holds for Neumann/Robin boundary control, that is, $$\partial _n u + a u = f$$ on $$\partial \varOmega$$, where a ≥ 0, and distributed observation with the spaces \begin{equation*} U = L^2\left(\varOmega\right), \quad F = L^2(\partial\varOmega), \quad W = \big\lbrace w\in H^2\left(\varOmega\right) \,|\, \partial_n w + a w = 0 \ \textrm{on} \ \partial\varOmega \big\rbrace, \end{equation*} equipped with the norms \begin{equation*} \Vert u\Vert^2_{U} = \Vert u\Vert^2_{L^2\left(\varOmega\right)},\quad \Vert f\Vert^2_{F} = \alpha \Vert f\Vert^2_{L^2(\partial\varOmega)},\quad \Vert w\Vert^2_{W} = \Vert \varDelta w\Vert^2_{L^2\left(\varOmega\right)}+ \frac{1}{\alpha}\Vert w\Vert^2_{L^2(\partial\varOmega)} . \end{equation*} 3.3 Distributed control and limited observation Finally, we consider a model problem with distributed control and limited observation: find u ∈ U and $$f \in F = L^2(\varOmega )$$ that minimize the objective functional \begin{equation*} \frac12 \Vert \partial_n u-d\Vert^2_{L^2(\partial\varOmega)} + \frac{\alpha}{2}\Vert f\Vert^2_{L^2\left(\varOmega\right)} \end{equation*} subject to the constraints \begin{align*} -\varDelta u + f &= 0\quad \textrm{in} \quad \varOmega,\\ u &= 0 \quad \textrm{on} \quad \partial\varOmega, \end{align*} where $$d\in L^2(\partial \varOmega )$$ and $$\alpha> 0$$ are given data. 
Robust preconditioners for this problem were first analysed in the study by Mardal et al. (2017). As in Mardal et al. (2017) the strong form of the state equation is used: $$u \in U = H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )$$ and \begin{equation*} - \left(\varDelta u, w\right)_{L^2\left(\varOmega\right)} + \left(f, w\right)_{L^2\left(\varOmega\right)} = 0 \quad \forall\, w \in W = L^2\left(\varOmega\right). \end{equation*} Following the same procedure as for Problem 3.1 with $$U = H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )$$ we obtain the (reordered) optimality system. Problem 3.10 Find $$\left (f,w,u\right )\in F\times W \times U$$ such that \begin{equation*} \widetilde{\mathcal{A}}_\alpha \begin{pmatrix} f \\ w \\ u \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ N^{\prime} d \end{pmatrix}\quad \textrm{with} \quad \widetilde{\mathcal{A}}_\alpha= \begin{pmatrix} \alpha M & M & 0 \\ M& 0 & K \\ 0 & K^{\prime} & K_\partial \end{pmatrix}, \end{equation*} where \begin{equation*} \left\langle K u, w\right\rangle = -\left(\varDelta u, w\right)_{L^2\left(\varOmega\right)},\qquad \left\langle N^{\prime} d, v\right\rangle = \left\langle N v, d\right\rangle = \left(\partial_n v, d\right)_{L^2(\partial\varOmega)}, \end{equation*} and $$F = W = L^2\left (\varOmega \right )$$ and $$U=H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )$$. Using similar arguments as for Problem 3.1 with $$U = H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )$$ we obtain the following result. 
Corollary 3.11 Under the same assumptions as Lemma 3.3 the operator $$\widetilde{\mathcal{A}}_\alpha$$ in Problem 3.10 is an isomorphism between $$\textbf{X}= L^2\left (\varOmega \right )\times L^2\left (\varOmega \right ) \times \left (H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )\right )$$ and its dual space, with respect to the norm in X given by \begin{equation*} \Vert \left(f,w,u\right)\Vert^2 = \Vert f\Vert^2_F + \Vert w\Vert^2_W + \Vert u\Vert^2_U \end{equation*} with \begin{equation*} \Vert f\Vert^2_{F} = \alpha \Vert f\Vert^2_{L^2\left(\varOmega\right)}, \quad \Vert w\Vert^2_{W} = \frac1\alpha \Vert w\Vert^2_{L^2\left(\varOmega\right)},\quad \Vert u\Vert^2_{U} = \Vert \partial_{n}u\Vert^2_{L^2(\partial\varOmega)} +\alpha \Vert \varDelta u\Vert^2_{L^2\left(\varOmega\right)}. \end{equation*} Furthermore, the following condition number estimate holds: \begin{equation*} \kappa\left(\mathcal{S}^{-1}_\alpha\widetilde{\mathcal{A}}_\alpha\right) \leq \frac{\cos\left({\pi}/{7}\right)}{\sin\left({\pi}/{14}\right)}\approx4.05 \quad \textrm{with} \quad \mathcal{S}_\alpha = \begin{pmatrix} \alpha \,M & & \\ & \frac{1}{\alpha} \, M & \\ & & K_{\partial}+\alpha \, B \end{pmatrix}. \end{equation*} Corollary 3.11 with the preconditioner $$\mathcal{S}_\alpha$$ was originally proven in the study by Mardal et al. (2017), which was the main motivation for this article. In Mardal et al. (2017) convexity of $$\varOmega$$ was required. Remark 3.12 Throughout this section we have discussed optimal control problems with the operator K being the Laplace operator. In principle the discussion can be extended to other state operators as long as the corresponding Schur complements are well defined and positive definite. In order to ensure that the first Schur complement is positive definite we need that either the observation or the control is distributed. The challenging part for extending the results is to ensure that the last Schur complement is positive definite. 
This relies on an a priori estimate of the form \begin{equation*} \|w\|_W \le c\, \|K^{\prime}w\|_{L^2(\varOmega)} \quad \forall \, w \in W, \quad \text{resp.} \quad \|v\|_U \le c\, \|K v\|_{L^2(\varOmega)} \quad \forall \, v \in U, \end{equation*} for problems with distributed observation, resp. distributed control, as provided in Lemma 3.3 for K being the Laplace operator. A discussion of such a priori estimates for other state operators is beyond the scope of this paper. 4. Preconditioners for discretized optimality systems So far we have addressed optimality systems only on the continuous level. In this section we discuss the discretization of optimality systems and efficient preconditioners for the discretized problems. We will focus on Problem 3.10. The same approach also applies to Problems 3.1 and 3.7. Let $$U_h$$ and $$W_h$$ be conforming finite-dimensional approximation spaces for Problem 3.10, that is, \begin{equation*} U_h \subset H^2(\varOmega)\cap H_0^1(\varOmega), \quad W_h \subset L^2(\varOmega). \end{equation*} Applying Galerkin’s principle to Problem 3.10 leads to the following problem. Problem 4.1 Find $$(f_h,w_h,u_h) \in W_h \times W_h \times U_h$$ such that $$\widetilde{\mathcal{A}}_{\alpha,h} \begin{pmatrix} \underline{f}_h \\ \underline{w}_h \\ \underline{u}_h \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \underline{d}_h \end{pmatrix} \quad \textrm{with} \quad \widetilde{\mathcal{A}}_{\alpha,h} = \begin{pmatrix} \alpha M_h & M_h & 0 \\ M_h & 0 & K_h \\ 0 & K_h^T & K_{\partial,h} \end{pmatrix},$$ (4.1) where $$M_h$$, $$K_h$$, $$K_{\partial ,h}$$ are the matrix representations of the linear operators M, K, $$K_\partial$$ on $$W_h$$, $$U_h$$, $$U_h$$ relative to chosen bases in these spaces, respectively, and $$\underline{f}_h$$, $$\underline{w}_h$$, $$\underline{u}_h$$, $$\underline{d}_h$$ are the corresponding vector representations of $$f_h$$, $$w_h$$, $$u_h$$, N′d. 
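Before turning to practical preconditioners, the condition number estimate of Section 2 can be checked numerically on a small random instance with the block pattern of (4.1). The sketch below (NumPy/SciPy; the block size, the random blocks and the tested values of α are our own toy choices, not an IgA discretization) assembles such a matrix, builds the exact Schur complement preconditioner of (2.4) and observes that the condition number stays below $$\cos(\pi/7)/\sin(\pi/14) \approx 4.05$$:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 12
R = rng.standard_normal((n, n))
M = R @ R.T + n * np.eye(n)          # SPD stand-in for the mass matrix M_h
K = rng.standard_normal((n, n))      # stand-in for K_h
Rb = rng.standard_normal((n, n))
Kb = Rb @ Rb.T + n * np.eye(n)       # SPD stand-in for K_{partial,h}
Z = np.zeros((n, n))
bound = np.cos(np.pi / 7) / np.sin(np.pi / 14)   # ~ 4.05

kappas = []
for alpha in (1.0, 1e-2, 1e-4):
    # block pattern of (4.1)
    A = np.block([[alpha * M, M, Z], [M, Z, K], [Z, K.T, Kb]])
    # Schur complements: S1 = alpha M, S2 = M/alpha, S3 = Kb + alpha K^T M^{-1} K
    S3 = Kb + alpha * K.T @ np.linalg.solve(M, K)
    S = np.block([[alpha * M, Z, Z], [Z, M / alpha, Z], [Z, Z, S3]])
    lam = eigh(A, S, eigvals_only=True)          # spectrum of S^{-1} A
    kappas.append(np.abs(lam).max() / np.abs(lam).min())

print(max(kappas), bound)   # the condition numbers stay below the bound
```

The observed robustness in α is exactly what the operator-independent bound predicts.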
Motivated by Corollary 3.11 we propose the following preconditioner for (4.1): $$\mathcal{S}_{\alpha,h} = \begin{pmatrix} \alpha M_h & 0 & 0 \\ 0 & \frac{1}{\alpha} M_h & 0\\ 0 & 0 & K_{\partial,h} + \alpha \, B_h \end{pmatrix},$$ (4.2) where $$B_h$$ is given by $$\left\langle B_h \underline{u}_h, \underline{v}_h\right\rangle = \left(\varDelta u_h, \varDelta v_h\right)_{L^2\left(\varOmega\right)} \quad \forall\, u_h, v_h \in U_h.$$ (4.3) The operator $$\mathcal{S}_{\alpha }$$ is self-adjoint and positive definite. Therefore, the preconditioner $$\mathcal{S}_{\alpha ,h}$$ is symmetric and positive definite, since it is obtained by Galerkin’s principle. Moreover, the preconditioner $$\mathcal{S}_{\alpha ,h}$$ is a sparse matrix, provided the underlying bases of $$U_h$$ and $$W_h$$ consist of functions with local support, which we assume from now on. The application of the preconditioner within a preconditioned Krylov subspace method requires solving linear systems of the form $$\mathcal{S}_{\alpha,h} \, \underline{\textbf{w}}_h = \underline{\textbf{r}}_h$$ (4.4) for given vectors $$\underline{\textbf{r}}_h$$. We use sparse direct solvers on the Schur complements to solve the system. Observe that, in general, the preconditioner $$\mathcal{S}_{\alpha ,h}$$ introduced above is different from the Schur complement preconditioner $$\mathcal{S}(\widetilde{\mathcal{A}}_{\alpha,h}) = \begin{pmatrix} \alpha M_h & 0 & 0 \\ 0 & \frac{1}{\alpha} M_h & 0\\ 0 & 0 & K_{\partial,h} + \alpha \, K_h^{\top}M^{-1}_h K_h \end{pmatrix},$$ (4.5) as introduced in (2.4). Therefore, in general, the condition number estimates derived in Section 2 do not hold for $$\mathcal{S}_{\alpha ,h}$$. There is one exception to this rule, provided by the next lemma, which is due to the study by Mardal et al. (2017). We include the short proof for completeness. 
Lemma 4.2 Let $$U_h$$ and $$W_h$$ be conforming discretization spaces to Problem 4.1 with $$\varDelta U_h \subset W_h.$$ (4.6) Then we have \begin{equation*} K_h^{\top}M^{-1}_h K_h = B_h. \end{equation*} Proof. We have \begin{align*} \left\langle K_h^{\top}M^{-1}_h K_h \underline{u}_h, \underline{u}_h\right\rangle & = \sup_{0 \neq w_h\in W_h}\frac{\left\langle K_h \underline{u}_h, \underline{w}_h\right\rangle^{2}}{\left\langle M_h\underline{w}_h, \underline{w}_h\right\rangle} = \sup_{0 \neq w_h\in W_h}\frac{\big(-\varDelta u_h, w_h\big)^{2}_{L^2\left(\varOmega\right)}}{\left(w_h, w_h\right)_{L^2\left(\varOmega\right)}} \\ & = \Vert \varDelta u_h\Vert_{L^2\left(\varOmega\right)}^2 = \left\langle B_h \underline{u}_h, \underline{u}_h\right\rangle, \end{align*} where the supremum is attained at $$w_h = -\varDelta u_h$$, which belongs to $$W_h$$ by condition (4.6). Since both $$K_h^{\top}M^{-1}_h K_h$$ and $$B_h$$ are symmetric matrices, equality follows. So, under the assumptions of Lemma 4.2 we have $$\mathcal{S}_{\alpha ,h} = \mathcal{S}(\widetilde{\mathcal{A}}_{\alpha ,h})$$, and, therefore, it follows that $$\kappa\big(\mathcal{S}_{\alpha,h}^{-1} \, \widetilde{\mathcal{A}}_{\alpha,h}\big) \le \frac{\cos(\pi/7)}{\sin(\pi/14)} \approx 4.05,$$ (4.7) showing that $$\mathcal{S}_{\alpha ,h}$$ is a robust preconditioner in $$\alpha$$ and in h. Remark 4.3 If condition (4.6) does not hold, the matrix $$K_h^{\top}M^{-1}_h K_h$$ must be expected to be dense. This makes the application of the Schur complement preconditioner $$\mathcal{S}(\widetilde{\mathcal{A}}_{\alpha ,h})$$ computationally too expensive, while $$\mathcal{S}_{\alpha ,h}$$ is always sparse. While $$\mathcal{S}_{\alpha ,h}$$ is always positive definite the Schur complement preconditioner $$\mathcal{S}(\widetilde{\mathcal{A}}_{\alpha ,h})$$ is symmetric but, in general, only positive semidefinite. 
However, a simple and mild condition guarantees the definiteness: Lemma 4.4 Let $$U_h$$ and $$W_h$$ be conforming discretization spaces to Problem 4.1 with $$U_h \subset W_h.$$ (4.8) Then the matrix $$K_{\partial ,h} + \alpha K_h^{\top}M^{-1}_h K_h$$ is symmetric and positive definite. Proof. If (4.8) holds then it follows that \begin{equation*} \left\langle K_h^{\top}M^{-1}_h K_h \underline{u}_h, \underline{u}_h\right\rangle = \sup_{0 \neq w_h\in W_h}\frac{\left(-\varDelta u_h, w_h\right)^{2}_{L^2\left(\varOmega\right)}}{\left(w_h, w_h\right)_{L^2\left(\varOmega\right)}} \ge \frac{\left(-\varDelta u_h, u_h\right)^{2}_{L^2\left(\varOmega\right)}}{\left(u_h, u_h\right)_{L^2\left(\varOmega\right)}}, \end{equation*} by choosing $$w_h = u_h \in W_h$$. Therefore, if $$\underline{u}_h$$ is in the kernel of $$K_h^{\top}M^{-1}_h K_h$$ then $$u_h \in U_h \subset H^2(\varOmega ) \cap H_0^1(\varOmega )$$ and $$\left (\nabla u_h, \nabla u_h\right )_{L^2\left (\varOmega \right )} = \left (-\varDelta u_h, u_h\right )_{L^2\left (\varOmega \right )} = 0$$, which implies $$u_h = 0$$. This lemma shows the importance of condition (4.8), which we will adopt as a condition for our choice of $$U_h$$ and $$W_h$$; see below. Additionally, it allows us to compare the practical preconditioner $$\mathcal{S}_{\alpha ,h}$$ with the theoretical Schur complement preconditioner $$\mathcal{S}(\widetilde{\mathcal{A}}_{\alpha ,h})$$, which would guarantee the derived uniform bound of the condition number but is computationally too costly. Observe that it is required that $$U_h \subset H^2(\varOmega )\cap H_0^1(\varOmega )$$. In order to meet this condition, $$C^1$$ finite element spaces were proposed for $$U_h$$ in the study by Mardal et al. (2017). In particular, the Bogner–Fox–Schmit element on a rectangular mesh was used. Here we advocate instead for spline spaces of sufficiently smooth functions as provided in isogeometric analysis (IgA). 
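The required smoothness is easy to obtain with B-splines. A minimal one-dimensional sketch with SciPy's `BSpline` (degree, refinement level and the tested knot are our own toy choices) illustrates that degree-p splines with simple interior knots are $$C^{p-1}$$, so already p = 2 yields a continuously differentiable, and hence $$H^2$$-conforming, piecewise polynomial space:

```python
import numpy as np
from scipy.interpolate import BSpline

p, ell = 2, 3                          # degree p, 2^ell = 8 subintervals
breaks = np.linspace(0.0, 1.0, 2**ell + 1)
# clamped knot vector with simple interior knots -> smoothness k = p - 1
knots = np.concatenate(([0.0] * p, breaks, [1.0] * p))
n = len(knots) - p - 1                 # dimension: 2^ell + p for k = p - 1
c = np.zeros(n)
c[n // 2] = 1.0                        # a single interior B-spline
s = BSpline(knots, c, p)
ds = s.derivative()

x0 = 0.5                               # an interior knot inside the support
eps = 1e-7
jump = abs(float(ds(x0 - eps)) - float(ds(x0 + eps)))
print(n, jump)                          # the first derivative has no jump
```

The dimension count printed here (2^ℓ + p per direction for maximal smoothness) is the standard B-spline dimension formula and is the reason the equal-order spaces below stay small.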
For the purpose of this paper we restrict ourselves to a simple version of such approximation spaces, which are briefly described now. Let $$\hat{S}^{p}_{k, \ell }$$ be the space of spline functions on the unit interval (0, 1) which are k-times continuously differentiable and piecewise polynomials of degree p on a mesh of mesh size $$2^{-\ell }$$, which is obtained by ℓ uniform refinements of (0, 1). The value k = −1 is used for discontinuous spline functions. On $$(0,1)^d$$ we use the corresponding tensor-product spline space, which, for simplicity, is again denoted by $$\hat{S}^{p}_{k,\ell }$$. It will always be clear from the context what the actual space dimension d is. It is assumed that the physical domain $$\varOmega$$ can be parametrized by a mapping $$\textbf{F}: (0,1)^d\rightarrow \varOmega$$ with components $$\textbf{F}_i \in \hat{S}^{p}_{k,\ell }$$. The discretization space $$S^{p}_{k,\ell }$$ on the domain $$\varOmega$$ is defined by \begin{equation*} S^{p}_{k,\ell} := \left\lbrace f \circ \textbf{F}^{-1} : f \in \hat{S}^{p}_{k,\ell}\right\rbrace . \end{equation*} All information on this discretization space is summarized by the triple h = (p, k, ℓ). See the monograph by Cottrell et al. (2009) for more details and more sophisticated discretization spaces in IgA. We propose the following approximation spaces of equal order: \begin{equation*} U_h = \big\{ v_h \in S^{p}_{k,\ell} \colon M_{\partial,h} \underline{v}_h = 0 \big\}, \end{equation*} where $$M_{\partial ,h}$$ is the matrix representation of $$M_\partial$$ on $$S^{p}_{k,\ell }$$, and \begin{equation*} W_h = S^{p}_{k,\ell}. \end{equation*} For this setting condition (4.6) is not satisfied and the analysis of the proposed preconditioner is not covered by the results of Section 2. Condition (4.8) is obviously satisfied and we will report on promising numerical results in Section 5. Remark 4.5 Condition (4.6) is rather restrictive. 
Even if the geometry mapping F is the identity, the smallest tensor-product spline space for $$W_h$$ for which condition (4.6) holds is the space $$S^{p}_{k-2,\ell }$$ if d ≥ 2. This space has a much higher dimension than the choice $$S^{p}_{k,\ell }$$ from above without significantly improved approximation properties. Remark 4.6 A completely analogous discussion applies to the model problems in Sections 3.1 and 3.2. For example, a sparse preconditioner for the discretized version of Problem 3.7 is given by \begin{equation*} {\mathcal{S}}_{\alpha,h} = \begin{pmatrix} M_h & & \\ & \alpha \, M_{\partial,h} & \\ & & \frac{1}{\alpha}\, K_{\partial,h} + B_h \end{pmatrix}, \end{equation*} motivated by Corollary 3.8. 5. Numerical results In this section we present numerical results for two examples of Problem 3.10. First we consider a two-dimensional example, where the physical domain $$\varOmega$$ is given by its parametrization $$\textbf{F}\colon (0,1)^2 \rightarrow \mathbb{R}^2$$ with \begin{equation*} \textbf{F}(\xi) = \begin{pmatrix} 1+\xi_1 - \xi_1 \xi_2 - \xi_2^2 \\ 2 \xi_1 \xi_2 - \xi_1 \xi_2^2 +2 \xi_2 - \xi_2^2 \end{pmatrix}, \end{equation*} and the prescribed data d are given by $$d(\textbf{x}) = \partial _n (\sin{\left (2\pi x_1\right )} \sin{\left (4\pi x_2\right )})$$ on the boundary $$\partial \varOmega$$. The domain $$\varOmega = \textbf{F}((0,1)^2)$$ is a close approximation of a quarter of an annulus; see Fig. 1. Fig. 1. The two-dimensional domain $$\varOmega$$ is an approximation of a quarter of an annulus. For a fixed polynomial degree p we choose the following discretization spaces of maximal smoothness k = p − 1: \begin{equation*} U_h = \big\{ v_h \in S^{p}_{p-1,\ell} \colon M_{\partial,h} \underline{v}_h = 0 \big\} \quad \textrm{and} \quad W_h = S^{p}_{p-1,\ell}. 
\end{equation*} The resulting linear system of equations \begin{equation*} \widetilde{\mathcal{A}}_{\alpha,h} \, \underline{\textbf{x}}_h = \underline{\textbf{b}}_h \end{equation*} is solved by the preconditioned minimal residual method (MINRES). We will present results for the preconditioner $$\mathcal{S}_{\alpha ,h}$$ (see (4.2)) and, for comparison only, for the Schur complement preconditioner $$\mathcal{S}(\widetilde{\mathcal{A}}_{\alpha ,h})$$ (see (4.5)). The iteration starts with the initial guess 0 and stops when $$\Vert \underline{\textbf{r}}_k\Vert \le \epsilon \, \Vert \underline{\textbf{r}}_0\Vert \quad \textrm{with} \quad \epsilon = 10^{-8},$$ (5.1) where $$\underline{\textbf{r}}_k:=\underline{\textbf{b}}_h-\widetilde{\mathcal{A}}_{\alpha ,h}\underline{\textbf{x}}_k$$ denotes the residual of the problem at $$\underline{\textbf{x}}_k$$ and ∥⋅∥ is the Euclidean norm. All computations are done with the C++ library G+Smo (Hofreither et al., 2014). For polynomial degree p = 2, Table 1 shows the total number of degrees of freedom (dof) and the number of iterations for different values of the refinement level ℓ and the parameter $$\alpha$$, when using the (more costly) Schur complement preconditioner $$\mathcal{S}(\widetilde{\mathcal{A}}_{\alpha ,h})$$. 
Table 1 Two-dimensional example; numbers of iterations for the preconditioner $$\mathcal{S}(\widetilde{\mathcal{A}}_{\alpha ,h})$$

| ℓ | dof | α = 1 | α = 0.1 | α = 0.01 | α = 1e−3 | α = 1e−5 | α = 1e−7 |
|---|--------|----|----|----|----|----|---|
| 3 | 264    | 21 | 36 | 33 | 22 | 9  | 5 |
| 4 | 904    | 21 | 35 | 38 | 26 | 9  | 5 |
| 5 | 3 336  | 21 | 35 | 35 | 29 | 10 | 5 |
| 6 | 12 808 | 19 | 34 | 34 | 27 | 9  | 4 |

As predicted by the analysis, the numbers of iterations are bounded uniformly with respect to $$\alpha$$ and ℓ. Observe that the iteration numbers are lower for $$\alpha = 1$$ and for small $$\alpha$$. Further numerical investigations, partly supported by analytic studies, showed that the eigenvalues of the preconditioned system matrix cluster around (some of) the values given in Theorem 2.6 for large as well as for small values of $$\alpha$$. This might explain why the iteration numbers are lower for $$\alpha = 1$$ (a value which is already large in this context) and for small values of $$\alpha$$. Table 2 shows the number of iterations when using the sparse preconditioner $$\mathcal{S}_{\alpha , h}$$. 
Table 2 Two-dimensional example; numbers of iterations for the preconditioner $$\mathcal{S}_{\alpha , h}$$

| ℓ | dof | α = 1 | α = 0.1 | α = 0.01 | α = 1e−3 | α = 1e−5 | α = 1e−7 |
|---|--------|----|----|----|----|----|----|
| 3 | 264    | 24 | 38 | 39 | 34 | 23 | 19 |
| 4 | 904    | 25 | 38 | 41 | 36 | 22 | 18 |
| 5 | 3 336  | 25 | 38 | 40 | 34 | 22 | 17 |
| 6 | 12 808 | 25 | 38 | 39 | 31 | 19 | 13 |

The numbers in Table 2 are only slightly larger than the numbers in Table 1 for large $$\alpha$$. For small $$\alpha$$ some additional iterations are required; nevertheless, the method appears to be robust with respect to $$\alpha$$ and the refinement level ℓ. As a second example we consider a three-dimensional variant of the two-dimensional example. The physical domain $$\varOmega$$ is obtained by twisting a cylindrical extension of the two-dimensional domain from the first example. The parametrization is given by the geometry map $$\textbf{F}\colon (0,1)^3 \rightarrow \mathbb{R}^3$$ with \begin{equation*} \textbf{F}(\xi) = \begin{pmatrix} \frac32 \xi_1 \xi_2^3 \xi_3 - \xi_1 \xi_2^3 -\frac32 \xi_1 \xi_2^2 \xi_3 + \xi_1 +\frac12 \xi_2^3 \xi_3 +\frac12 \xi_2^3 + \frac32 \xi_2^2 \xi_3 -\frac32 \xi_2^2 + 1\\[6pt] \xi_2\left(\xi_1 \xi_2^2 - 3 \xi_1 \xi_2 + 3\xi_1 -\frac12 \xi_2^2 +\frac32 \right)\\[6pt] -\xi_2^3 \xi_3 +\frac12 \xi_2^3 + \frac32 \xi_2^2 + \xi_3 \end{pmatrix} \end{equation*} and the prescribed data d are given by $$d(\textbf{x}) = \partial _n (\sin{\left (2\pi x_1\right )} \sin{\left (4\pi x_2\right )} \sin{\left (6\pi x_3\right )})$$ on the boundary $$\partial \varOmega$$. 
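As a quick plausibility check of this parametrization, one can sample the Jacobian determinant of F by finite differences (a sketch; the sampling grid and the difference step are arbitrary choices of ours, not part of the experiments reported below):

```python
import numpy as np

def F(x1, x2, x3):
    # the geometry map of the three-dimensional example, transcribed from above
    return np.array([
        1.5*x1*x2**3*x3 - x1*x2**3 - 1.5*x1*x2**2*x3 + x1
        + 0.5*x2**3*x3 + 0.5*x2**3 + 1.5*x2**2*x3 - 1.5*x2**2 + 1.0,
        x2*(x1*x2**2 - 3.0*x1*x2 + 3.0*x1 - 0.5*x2**2 + 1.5),
        -x2**3*x3 + 0.5*x2**3 + 1.5*x2**2 + x3,
    ])

h = 1e-6
dets = []
for a in np.linspace(0.05, 0.95, 7):
    for b in np.linspace(0.05, 0.95, 7):
        for c in np.linspace(0.05, 0.95, 7):
            # central-difference approximation of the Jacobian of F
            J = np.column_stack([
                (F(a + h, b, c) - F(a - h, b, c)) / (2 * h),
                (F(a, b + h, c) - F(a, b - h, c)) / (2 * h),
                (F(a, b, c + h) - F(a, b, c - h)) / (2 * h),
            ])
            dets.append(np.linalg.det(J))

print(min(dets), max(dets))   # range of sampled Jacobian determinants of F
```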
For polynomial degree p = 3, Table 3 shows the numbers of iterations for the three-dimensional example (see Fig. 2), using the preconditioner $$\mathcal{S}_{\alpha , h}$$. Table 3 Three-dimensional example; numbers of iterations for the preconditioner $$\mathcal{S}_{\alpha , h}$$

| ℓ | dof | α = 1 | α = 0.1 | α = 0.01 | α = 1e−3 | α = 1e−5 | α = 1e−7 |
|---|---------|----|----|----|----|----|----|
| 2 | 811     | 20 | 35 | 41 | 32 | 19 | 16 |
| 3 | 3 391   | 23 | 35 | 43 | 40 | 22 | 18 |
| 4 | 18 631  | 23 | 35 | 43 | 37 | 22 | 17 |
| 5 | 121 687 | 19 | 33 | 38 | 34 | 20 | 13 |

Fig. 2. The three-dimensional domain viewed from two different angles. The numbers of iterations for the three-dimensional example are similar to their two-dimensional counterparts. Remark 5.1 The solution strategy proposed in this paper for solving discretized linear optimality systems is a combination of an iterative solver and direct solvers. For the overall system we use a Krylov subspace method. For preconditioning we use sparse direct solvers on the Schur complements $$S_1$$, $$S_2$$ and $$S_3$$ to solve (4.4). One might wonder whether it would be better to use a sparse direct solver directly on the overall system. Note that the matrices $$S_1$$, $$S_2$$ and $$S_3$$ are symmetric and positive definite, whereas the matrix $$\widetilde{\mathcal{A}}_{\alpha ,h}$$ is symmetric but indefinite, which makes a significant difference for sparse direct solvers. 
Indeed, our numerical experiments showed that our strategy outperforms direct solvers applied to the overall indefinite optimality system for mid-sized problems. For large-sized problems the direct sparse solvers eventually failed due to memory limits. Therefore, we believe our hybrid strategy exploits the advantages of both (iterative and direct) methods very well. However, an attractive alternative is to consider iterative methods, like multigrid methods (see, e.g., Trottenberg et al., 2001), also for solving (4.4). There is ongoing work on multigrid methods for biharmonic problems as they occur in (4.4), which we will address in a forthcoming paper for IgA-based discretization methods. 6. Conclusions Two main results have been shown: new existence results for optimality systems in Hilbert spaces and sharp condition number estimates. Typical applications for the new existence results are model problems from optimal control problems with second-order elliptic state equations. For boundary observation and distributed control the existence of the optimal state in $$H^2(\varOmega )$$ follows for polygonal/polyhedral domains without additional convexity assumptions, although the state equation alone does not guarantee the existence of a solution in $$H^2(\varOmega )$$ if the right-hand side lies in $$L^2(\varOmega )$$. For this class of problems, which initially are seen as classical saddle point problems, it turned out that the reformulation as multiple saddle point problems is beneficial. Similarly, for distributed observation and boundary control the existence of the Lagrangian multiplier w in $$H^2(\varOmega )$$ follows for polygonal/polyhedral domains without convexity assumptions. These new existence results were obtained by replacing the standard weak formulation of the second-order problem by a strong or a very weak formulation depending on the type of optimal control problems. 
The new sharp condition number estimates for multiple saddle point problems are to be seen as extensions of well-known sharp bounds for standard saddle point problems. The analysis of saddle point problems in function spaces motivates the construction of sparse preconditioners for discretized optimality systems. The interpretation of standard saddle point problems with primal and dual variables as multiple saddle point problems with possibly more than two types of variables allows the construction of preconditioners based on Schur complements for a wider class of problems. Finally, the required discretization spaces of higher smoothness can be handled with techniques from IgA, which open the door to possible extensions to optimal control problems with other classes of state equations like biharmonic equations. Acknowledgements The authors would like to thank the anonymous referees for their valuable comments and suggestions which helped to improve this manuscript. Funding Austrian Science Fund (S11702-N23). References
Beik, F. P. A. & Benzi, M. (2017) Iterative methods for double saddle point systems. Technical Report. Atlanta, GA: Department of Mathematics and Computer Science, Emory University.
Benzi, M., Golub, G. H. & Liesen, J. (2005) Numerical solution of saddle point problems. Acta Numer., 14, 1–137.
Berggren, M. (2004) Approximations of very weak solutions to boundary-value problems. SIAM J. Numer. Anal., 42, 860–877.
Cottrell, J. A., Hughes, T. J. R. & Bazilevs, Y. (2009) Isogeometric Analysis: Toward Integration of CAD and FEA. New Jersey: John Wiley.
Gatica, G. N., Gatica, L. F. & Stephan, E. P. (2007) A dual-mixed finite element method for nonlinear incompressible elasticity with mixed boundary conditions. Comput. Methods Appl. Mech. Eng., 196, 3348–3369.
Gatica, G. N. & Heuer, N. (2000) A dual-dual formulation for the coupling of mixed-FEM and BEM in hyperelasticity. SIAM J. Numer. Anal., 38, 380–400.
Gatica, G. & Heuer, N. (2002) Conjugate gradient method for dual-dual mixed formulations. Math. Comput., 71, 1455–1472.
Grisvard, P. (1992) Singularities in Boundary Value Problems. Berlin: Springer.
Grisvard, P. (2011) Elliptic Problems in Nonsmooth Domains. Philadelphia: SIAM. Reprint of the 1985 hardback edn.
Hofreither, C., Mantzaflaris, A. & Sogn, J. (2014) G+Smo v0.8. Available at http://gs.jku.at/gismo.
Kuznetsov, Y. A. (1995) Efficient iterative solvers for elliptic finite element problems on nonmatching grids. Russian J. Numer. Anal. Math. Modelling, 10, 187–212.
Langer, U., Of, G., Steinbach, O. & Zulehner, W. (2007) Inexact data-sparse boundary element tearing and interconnecting methods. SIAM J. Sci. Comput., 29, 290–314.
Mardal, K.-A., Nielsen, B. F. & Nordaas, M. (2017) Robust preconditioners for PDE-constrained optimization with limited observations. BIT Numer. Math., 57, 405–431.
Mardal, K.-A. & Winther, R. (2011) Preconditioning discretizations of systems of partial differential equations. Numer. Linear Algebra Appl., 18, 1–40.
May, S., Rannacher, R. & Vexler, B. (2013) Error analysis for a finite element approximation of elliptic Dirichlet boundary control problems. SIAM J. Control Optim., 51, 2585–2611.
Murphy, M. F., Golub, G. H. & Wathen, A. J. (2000) A note on preconditioning for indefinite linear systems. SIAM J. Sci. Comput., 21, 1969–1972.
Pearson, J. W. & Wathen, A. J.
(2012) A new approximation of the Schur complement in preconditioners for PDE-constrained optimization. Numer. Linear Algebra Appl., 19, 816–829.
Pestana, J. & Rees, T. (2016) Null-space preconditioners for saddle point systems. SIAM J. Matrix Anal. Appl., 37, 1103–1128.
Rivlin, T. J. (1990) Chebyshev Polynomials: From Approximation Theory to Algebra and Number Theory, 2nd edn. New Jersey: John Wiley.
Schöberl, J. & Zulehner, W. (2007) Symmetric indefinite preconditioners for saddle point problems with applications to PDE-constrained optimization problems. SIAM J. Matrix Anal. Appl., 29, 752–773.
Tröltzsch, F. (2010) Optimal Control of Partial Differential Equations. Theory, Methods and Applications. Providence, RI: American Mathematical Society.
Trottenberg, U., Oosterlee, C. W. & Schüller, A. (2001) Multigrid. With guest contributions by Brandt, A., Oswald, P. & Stüben, K. Orlando: Academic Press.
Appendix A. Chebyshev polynomials of the second kind are defined by the recurrence relation \begin{equation*} U_{0}(x) = 1, \quad U_{1}(x) = 2x, \quad U_{i+1}(x) = 2xU_{i}(x)-U_{i-1}(x) \quad \textrm{for} \ i \ge 1. \end{equation*} Their closed-form representation is given by $$U_j(\cos \theta) = \frac{\sin\left(\left(j+1\right)\theta\right)}{\sin\left(\theta\right)};$$ (A.1) see the book by Rivlin (1990). It immediately follows that the polynomials $$P_j(x) := U_j(x/2)$$ satisfy the related recurrence relation \begin{equation*} P_{0}(x) = 1, \quad P_{1}(x) = x, \quad P_{i+1}(x) = xP_{i}(x)-P_{i-1}(x) \quad \textrm{for} \ i \ge 1, \end{equation*} which shows that the polynomials $$P_j(x)$$ coincide with the polynomials used in the proof of Theorem 2.2. 
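These identities are easy to validate numerically; a small NumPy sketch (the degrees and the sample points are arbitrary choices) builds $$U_j$$ and $$P_j$$ from their recurrences and checks them against the closed form (A.1) and the relation $$P_j(x) = U_j(x/2)$$:

```python
import numpy as np
from numpy.polynomial import Polynomial

x = Polynomial([0.0, 1.0])
U = [Polynomial([1.0]), 2.0 * x]            # U_0, U_1
P = [Polynomial([1.0]), x]                  # P_0, P_1
for i in range(1, 7):
    U.append(2.0 * x * U[i] - U[i - 1])     # U_{i+1} = 2x U_i - U_{i-1}
    P.append(x * P[i] - P[i - 1])           # P_{i+1} = x P_i - P_{i-1}

theta = 0.7
for j in range(8):
    closed = np.sin((j + 1) * theta) / np.sin(theta)     # closed form (A.1)
    assert abs(U[j](np.cos(theta)) - closed) < 1e-12
    assert abs(P[j](1.3) - U[j](1.3 / 2)) < 1e-12        # P_j(x) = U_j(x/2)
print("recurrences agree with the closed form")
```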
Analogously, it follows that the polynomials $$\bar{P}_j(x) := P_j(x) - P_{j-1}(x) = U_j(x/2) - U_{j-1}(x/2)$$ satisfy the related recurrence relation \begin{equation*} \bar{P}_{0}(x) = 1, \quad \bar{P}_{1}(x) = x-1, \quad \bar{P}_{i+1}(x) = x\bar{P}_{i}(x)-\bar{P}_{i-1}(x) \quad \textrm{for} \ i \ge 1, \end{equation*} which shows that the polynomials $$\bar{P}_j(x)$$ coincide with the polynomials used in the proofs of Theorems 2.2 and 2.6. In the next lemma properties of the roots of the polynomials $$\bar{P}_j(x)$$ are collected, which were used in these theorems. Lemma A1 1. The roots of the polynomial $$\bar{P}_j$$ are given by \begin{equation*} x^{i}_{j} = 2 \, \cos\left(\frac{2i-1}{2j+1}\pi\right), \quad \textrm{for } i= 1,\ldots,j. \end{equation*} 2. For fixed j the root of largest modulus is $$x_j^1$$. Moreover, \begin{equation*} x_j^1> 1 \quad \textrm{and} \quad \bar{P}_i\big(x_j^1\big) > 0 \quad \forall \, i = 0,1,\ldots,j-1. \end{equation*} 3. For fixed j the root of smallest modulus $$x_j^{\ast }$$ is given by $$x_j^{\ast } = x_j^{i^{\ast }}$$ with $$i^{\ast } = [j/2]+1$$, where [y] denotes the largest integer less than or equal to y. Moreover, \begin{equation*} \big|x_j^{\ast}\big| = 2 \sin \left( \frac{1}{2(2j+1)} \pi \right). \end{equation*} Proof. From (A.1) we obtain \begin{equation*} \bar{P}_j(2\cos \theta) = \frac{1}{\sin \theta} \left(\sin\left(\left(j+1\right)\theta\right)-\sin\left(j\theta\right)\right) = \frac{2\sin(\theta/2)}{\sin \theta} \, \cos\left(\frac{2j+1}{2}\theta\right). \end{equation*} Then the roots of $$\bar{P}_j$$ directly follow from the known zeros $$\frac{2i-1}{2} \pi$$ of $$\cos (x)$$. For fixed j, $$x_j^i$$ is a decreasing sequence in i, for which the rest of the lemma can be deduced by elementary calculations. In the proof of Theorem 2.3 a sequence of matrices $$Q_j$$ is introduced whose spectral norms are needed. 
It is easy to verify that $$Q^{-1}_{j} = \left(\begin{array}{ccccc} 1 & -1 & & &\\ -1 & 0 & 1 & &\\ & 1 & 0& \ddots&\\ & &\ddots & \ddots & (-1)^{j-1} \\ & & & (-1)^{j-1} & 0 \end{array}\right).$$ (A.2) By Laplace’s formula one sees that the polynomials $$\det (\lambda \, I - Q_j^{-1})$$ satisfy the same recurrence relation as the polynomials $$\bar{P}_j(\lambda )$$, and therefore we have \begin{equation*} \det \big(\lambda \, I - Q_j^{-1}\big) = \bar{P}_j(\lambda). \end{equation*} Hence, with the notation from above it follows that \begin{equation*} \|Q_j\| = \frac{1}{\big|x_j^{\ast}\big|} = \frac{1}{2 \sin \left( \frac{1}{2(2j+1)} \pi \right)}, \end{equation*} which was used for the calculation of $$\|\mathcal{M}^{-1}\|_{\mathcal{L}(\textbf{X},\textbf{X})}$$ in Theorem 2.3. © The Author(s) 2018. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.

IMA Journal of Numerical Analysis, Advance Article, May 21, 2018, 32 pages
Publisher: Oxford University Press
ISSN: 0272-4979
eISSN: 1464-3642
DOI: 10.1093/imanum/dry027

Since $$x_i^{\ast }$$ minimizes the quadratic functional $$\mathcal{L}^{[i]}(x_i)$$ for odd indices i and maximizes $$\mathcal{L}^{[i]}(x_i)$$ for even indices i, the Hessian of $$\mathcal{L}^{[i]}(x_i)$$, which is constant and coincides with the diagonal block $$\mathcal{A}_{ii}$$ of $$\mathcal{A}$$, must be positive semidefinite for odd indices i and negative semidefinite for even indices i, according to the necessary second-order optimality conditions. Under this assumption on the diagonal blocks of $$\mathcal{A}$$ the problem of finding a saddle point of $$\mathcal{L}$$ is equivalent to finding a solution $$\textbf{x}^{\ast } \in \textbf{X}$$ of the linear operator equation $$\mathcal{A} \textbf{x} = \textbf{b}.$$ (1.1) Typical examples for the case n = 2 are optimality systems of constrained quadratic optimization problems, where $$\mathcal{L}$$ is the associated Lagrangian, $$x_1$$ is the primal variable and $$x_2$$ is the Lagrangian multiplier associated with the constraint. Optimal control problems viewed as constrained optimization problems also fall into this category with n = 2. However, since in this case the primal variable itself consists of two components, the state variable and the control variable, we can also view such problems as double saddle point problems (after some reordering of the variables); see the study by Mardal et al. (2017) and Section 3. Other examples of double and triple saddle point problems result from dual–dual formulations, for example, of elasticity problems; see the studies by Gatica & Heuer (2000), Gatica & Heuer (2002) and Gatica et al. (2007). Double and triple saddle point problems also arise in boundary element tearing and interconnecting methods; see, e.g., the study by Langer et al. (2007). For double saddle point problems from potential fluid flow modeling and liquid crystal modeling, see the report by Beik & Benzi (2017) and the references therein, where further applications can be found. 
The goal of this paper is to extend well-established results on block diagonal preconditioners for classical saddle point problems in $$\mathbb{R}^N \times \mathbb{R}^M$$ as presented in the studies by Kuznetsov (1995) and Murphy et al. (2000) to multiple saddle point problems in Hilbert spaces. This goal is achieved for operators $$\mathcal{A}$$ of block tridiagonal form, which possess an associated sequence of positive definite Schur complements. We will show for a particular norm built from these Schur complements that the condition number of the operator $$\mathcal{A}$$ is bounded by a constant independent of $$\mathcal{A}$$. So, if $$\mathcal{A}$$ contains any sensitive model parameters (like a regularization parameter) or $$\mathcal{A}$$ depends on some discretization parameters (like the mesh size) the bound of the condition number is independent of these quantities. This, for example, prevents the performance of iterative methods from deteriorating for small regularization parameters or small mesh sizes. Moreover, we will show that the bounds are solely given in terms of the roots of the difference of two Chebyshev polynomials of the second kind and that the bounds are sharp for the discussed class of block tridiagonal operators. The abstract analysis allows us to recover known existence results under less restrictive assumptions. This was the main motivation for extending the analysis to Hilbert spaces. We will exemplify this for optimal control problems with a second-order elliptic state equation, distributed observation and boundary control, as discussed, e.g., in the study by May et al. (2013), and for boundary observation and distributed control, as discussed, e.g., in the study by Mardal et al. (2017). Another outcome of the abstract analysis is the construction of preconditioners for discretized optimality systems which perform well in combination with Krylov subspace methods for solving the linear system. 
Here we were able to recover known results from Mardal et al. (2017) and extend them to other problems. The article Mardal et al. (2017) has been very influential for the study presented here. As already noticed in Mardal et al. (2017), there is a price to pay for the construction of these efficient preconditioners: for second-order elliptic state equations, discretization spaces of continuously differentiable functions are required, for which we use here technology from Isogeometric Analysis (IgA); see the monograph by Cottrell et al. (2009). Observe that the analysis presented here is valid for any number n ≥ 2 of blocks. There are numerous contributions on preconditioning classical saddle point problems; see the study by Benzi et al. (2005) and the references cited there. See in particular Pearson & Wathen (2012) for Schur-complement-based approaches. For other results on the analysis and the construction of preconditioners for double/twofold and triple/threefold saddle point problems see, e.g., the studies by Gatica & Heuer (2000, 2002), Gatica et al. (2007), Langer et al. (2007), Pestana & Rees (2016) and Beik & Benzi (2017). The paper is organized as follows. The abstract analysis of a class of multiple saddle point problems of block tridiagonal form is given in Section 2. Section 3 deals with the application to particular optimization problems in function spaces. Discretization and efficient realization of the preconditioner are discussed in Section 4. A few numerical experiments are shown in Section 5 to illustrate the abstract results. The paper ends with some conclusions in Section 6 and an appendix, which contains some technical details related to Chebyshev polynomials of the second kind used in the proofs of the abstract results in Section 2. 2. Schur complement preconditioners The following notation is used throughout the paper. Let X and Y be Hilbert spaces with dual spaces X′ and Y′. 
For a bounded linear operator $$B: X \rightarrow Y^{\prime}$$ its adjoint $$B^{\prime}: Y \rightarrow X^{\prime}$$ is given by \begin{equation*} \langle B^{\prime}y,x\rangle = \langle Bx,y\rangle \quad \forall\, x\in X, \ y \in Y, \end{equation*} where $$\left \langle \cdot , \cdot \right \rangle$$ denotes the duality product. For a bounded linear operator $$L: X \rightarrow Y$$ its Hilbert space adjoint $$L^{*}: Y \rightarrow X$$ is given by \begin{equation*} \left(L^{*}y, x\right)_{X} = \left(Lx, y\right)_{Y} \quad \forall\,x\in X, \ y \in Y, \end{equation*} where $$\left (\cdot , \cdot \right )_X$$ and $$\left (\cdot , \cdot \right )_Y$$ are the inner products of X and Y with associated norms $$\|\cdot \|_X$$ and $$\|\cdot \|_Y$$, respectively. Let X = U × V with Hilbert spaces U and V. Then its dual X′ can be identified with U′× V′. For a linear operator $$T : U \times V \rightarrow U^{\prime} \times V^{\prime}$$ of a 2-by-2 block form \begin{equation*} T = \begin{pmatrix} A & C \\ B & D \end{pmatrix}, \end{equation*} with an invertible operator $$A : U \rightarrow U^{\prime}$$, its Schur complement $$\operatorname{Schur} T : V \rightarrow V^{\prime}$$ is given by \begin{equation*} \operatorname{Schur} T = D - B A^{-1} C . \end{equation*} With this notation we will now precisely formulate the assumptions on problem (1.1) as already indicated in the introduction. 
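As a concrete finite-dimensional illustration of this definition, the following sketch (Python/NumPy, with random matrices standing in for the operators) forms the Schur complement of a small 2-by-2 block matrix and checks the classical determinant identity $$\det T = \det A \, \det(\operatorname{Schur} T)$$ obtained from block elimination.

```python
import numpy as np

rng = np.random.default_rng(0)

# T = [[A, C], [B, D]] with invertible A, as in the definition above
A = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)   # shifted to stay well conditioned
B = rng.standard_normal((2, 3))
C = rng.standard_normal((3, 2))
D = rng.standard_normal((2, 2))

schur = D - B @ np.linalg.solve(A, C)   # Schur T = D - B A^{-1} C

# block elimination gives det T = det A * det(Schur T)
T = np.block([[A, C], [B, D]])
assert np.isclose(np.linalg.det(T), np.linalg.det(A) * np.linalg.det(schur))
```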
Let $$\textbf{X} = X_1 \times X_2\times \cdots \times X_n$$ with Hilbert spaces $$X_i$$ for i = 1, 2, … , n, and let the linear operator $$\mathcal{A}: \textbf{X} \rightarrow \textbf{X}^{\prime}$$ be of n-by-n block tridiagonal form $$\mathcal{A} = \left(\begin{array}{cccc} A_1 & B_1^{\prime} & & \\ B_1 & -A_2 & \ddots & \\ &\ddots & \ddots & B_{n-1}^{\prime} \\[1ex] & & B_{n-1} & (-1)^{n-1}A_n \end{array}\right),$$ (2.1) where $$A_i:X_i \rightarrow X_i^{\prime}$$, $$B_i: X_i \rightarrow X_{i+1}^{\prime}$$ are bounded operators, and, additionally, $$A_i$$ are self-adjoint and positive semidefinite, that is, $$\langle A_i y_i,x_i \rangle = \langle A_i x_i,y_i \rangle \quad \textrm{and} \quad \langle A_i x_i, x_i \rangle \geq 0 \quad \forall\, x_i,\ y_i \in X_i.$$ (2.2) The basic assumption of the whole paper is now that the operators $$\mathcal{A}_i$$ consisting of the first i rows and columns of $$\mathcal{A}$$ are invertible operators from $$X_1 \times \cdots \times X_i$$ to $$X_1^{\prime} \times \cdots \times X_i^{\prime}$$. That allows one to introduce the linear operators \begin{equation*} S_{i+1} = (-1)^i \, \operatorname{Schur} \mathcal{A}_{i+1} \quad \textrm{for}\enspace i=1,\ldots,n-1, \end{equation*} where, for the definition of the Schur complement, $$\mathcal{A}_{i+1}$$ is interpreted as the 2-by-2 block operator \begin{equation*} \mathcal{A}_{i+1} = \begin{pmatrix} \mathcal{A}_i & \textbf{B}_i^{\prime} \\ \textbf{B}_i & (-1)^i \, A_{i+1} \end{pmatrix}, \quad \textrm{where} \quad \textbf{B}_i = \left(\begin{array}{cccc} \!\!0 & \ldots & 0 & B_i\!\! \end{array}\right) . \end{equation*} It is easy to see that $$S_{i+1} = A_{i+1} + B_i S^{-1}_i B_i^{\prime}, \quad \textrm{for}\enspace i = 1, 2,\ldots,n-1,$$ (2.3) with initial setting $$S_1 = A_1$$. The following basic result holds. 
Lemma 2.1 Assume that $$A_i:X_i \rightarrow X_i^{\prime}$$, i = 1, 2, … , n, are bounded operators satisfying (2.2), $$B_i: X_i \rightarrow X_{i+1}^{\prime}$$, i = 1, … , n − 1, are bounded operators and $$S_i$$, i = 1, 2, … , n, given by (2.3), are well defined and positive definite, that is, \begin{equation*} \langle S_i x_i, x_i \rangle \geq \sigma_i \, \|x_i\|_{X_i}^2 \quad \forall\, x_i \in X_i, \end{equation*} for some positive constants $$\sigma _i$$. Then $$\mathcal{A}$$ is an isomorphism from X to X′. Proof. By the Lax–Milgram lemma, $$S_i$$, i = 1, 2, … , n, are invertible, which allows us to derive a block LU-decomposition of $$\mathcal{A}$$ into invertible factors: \begin{equation*} \mathcal{A} = \left(\begin{array}{cccc} I& & & \\ B_1 S_1^{-1} & I && \\ & \ddots & \ddots \\ && (-1)^{n-2} B_{n-1} S_{n-1}^{-1} & I \end{array}\right) \left(\begin{array}{cccc} S_1 & B_1^{\prime} & & \\ & -S_2 & \ddots& \\ & & \ddots & B_{n-1}^{\prime}\\ & & & (-1)^{n-1} \, S_n \end{array}\right) . \end{equation*} So $$\mathcal{A}$$ is a bounded linear operator, which is invertible. Therefore, $$\mathcal{A}$$ is an isomorphism by the open mapping theorem. With a slight abuse of notation we call $$S_i$$ Schur complements, although they are actually positive or negative Schur complements in the literal sense. Under the assumptions made so far we define the Schur complement preconditioner as the block diagonal linear operator $$\mathcal{S}(\mathcal{A}) \colon \textbf{X} \rightarrow \textbf{X}^{\prime}$$ given by $$\mathcal{S}(\mathcal{A}) = \left(\begin{array}{cccc} S_1 & & & \\ & S_2 & & \\ & & \ddots & \\ & & & S_n \end{array}\right).$$ (2.4) If it is clear from the context which operator $$\mathcal{A}$$ is meant, we will omit the argument $$\mathcal{A}$$ and simply use $$\mathcal{S}$$ for the Schur complement preconditioner. 
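To make the recurrence (2.3) and the factorization in the proof of Lemma 2.1 concrete, the following sketch (Python/NumPy; small random SPD blocks chosen purely for illustration) assembles a matrix of the form (2.1), computes the Schur complements $$S_i$$, and verifies the block LU decomposition numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2     # three blocks of size two; sizes are purely illustrative

def spd(k):
    # random symmetric positive definite block
    X = rng.standard_normal((k, k))
    return X @ X.T + k * np.eye(k)

As = [spd(m) for _ in range(n)]                           # diagonal blocks A_i
Bs = [rng.standard_normal((m, m)) for _ in range(n - 1)]  # coupling blocks B_i

# assemble the block tridiagonal operator (2.1) with alternating signs
A = np.zeros((n * m, n * m))
for i in range(n):
    A[i*m:(i+1)*m, i*m:(i+1)*m] = (-1) ** i * As[i]
for i in range(n - 1):
    A[(i+1)*m:(i+2)*m, i*m:(i+1)*m] = Bs[i]        # B_i
    A[i*m:(i+1)*m, (i+1)*m:(i+2)*m] = Bs[i].T      # B_i'

# Schur complement recurrence (2.3): S_1 = A_1, S_{i+1} = A_{i+1} + B_i S_i^{-1} B_i'
S = [As[0]]
for i in range(n - 1):
    S.append(As[i + 1] + Bs[i] @ np.linalg.solve(S[i], Bs[i].T))

# block LU factors from the proof of Lemma 2.1
Lf = np.eye(n * m)
Uf = np.zeros_like(A)
for i in range(n):
    Uf[i*m:(i+1)*m, i*m:(i+1)*m] = (-1) ** i * S[i]
for i in range(n - 1):
    Lf[(i+1)*m:(i+2)*m, i*m:(i+1)*m] = (-1) ** i * Bs[i] @ np.linalg.inv(S[i])
    Uf[i*m:(i+1)*m, (i+1)*m:(i+2)*m] = Bs[i].T

assert np.allclose(Lf @ Uf, A)   # the factorization reproduces A
```

Since the $$A_i$$ here are SPD, the recurrence produces SPD Schur complements $$S_i$$, so the hypotheses of Lemma 2.1 hold for this example.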
Since $$\mathcal{S}$$ is bounded, self-adjoint and positive definite it induces an inner product on X by \begin{equation*} \left(\textbf{x},\textbf{y}\right)_{\mathcal{S}} = \left\langle \mathcal{S}\textbf{x},\textbf{y}\right\rangle \quad \textrm{for }\, \textbf{x},\textbf{y}\in \textbf{X}, \end{equation*} whose associated norm $$\|\textbf{x}\|_{\mathcal{S}} = \left (\textbf{x},\textbf{x}\right )_{\mathcal{S}}^{1/2}$$ is equivalent to the canonical product norm in X. Note that \begin{equation*} \left(\textbf{x},\textbf{y}\right)_{\mathcal{S}} = \sum_{i=1}^n (x_i,y_i)_{S_i} \quad \textrm{with} \enspace (x_i,y_i)_{S_i} = \langle S_i x_i,y_i\rangle. \end{equation*} Here $$x_i \in X_i$$ and $$y_i \in X_i$$ denote the ith component of x ∈ X and y ∈ X, respectively. From now on we assume that the spaces X and X′ are equipped with the norm $$\Vert .\Vert _{\mathcal{S}}$$ and the associated dual norm, respectively. The question of whether (1.1) is well posed translates to the question of whether $$\mathcal{A} \colon \textbf{X} \rightarrow \textbf{X}^{\prime}$$ is an isomorphism. The condition number $$\kappa (\mathcal{A})$$, given by \begin{equation*} \kappa\left(\mathcal{A}\right) = \Vert \mathcal{A}\Vert_{\mathcal{L}\left(\textbf{X},\textbf{X}^{\prime}\right)} \|\mathcal{A}^{-1}\|_{\mathcal{L}\left(\textbf{X}^{\prime},\textbf{X}\right)}, \end{equation*} measures the sensitivity of the solution of (1.1) with respect to data perturbations. Here $$\mathcal{L}(X,Y)$$ denotes the space of bounded linear operators from X to Y, equipped with the standard operator norm. By the Riesz representation theorem the linear operator $$\mathcal{S} \colon \textbf{X} \rightarrow \textbf{X}^{\prime}$$ is an isomorphism from X to X′. Therefore, $$\mathcal{A}$$ is an isomorphism if and only if $$\mathcal{M}\colon \textbf{X} \rightarrow \textbf{X}$$, given by \begin{equation*} \mathcal{M} = \mathcal{S}^{-1}\mathcal{A}, \end{equation*} is an isomorphism. 
In this context the operator $$\mathcal{S}$$ can be seen as a preconditioner for $$\mathcal{A}$$ and $$\mathcal{M}$$ is the associated preconditioned operator. Moreover, it is easy to see that \begin{equation*} \|\mathcal{A}\|_{\mathcal{L}\left(\textbf{X},\textbf{X}^{\prime}\right)} = \|\mathcal{M}\|_{\mathcal{L}\left(\textbf{X},\textbf{X}\right)} \end{equation*} and, in the case of well-posedness, \begin{equation*} \kappa\left(\mathcal{A}\right) = \kappa\left(\mathcal{M}\right) \quad \textrm{with} \quad \kappa\left(\mathcal{M}\right) = \Vert \mathcal{M}\Vert_{\mathcal{L}\left(\textbf{X},\textbf{X}\right)} \|\mathcal{M}^{-1}\|_{\mathcal{L}\left(\textbf{X},\textbf{X}\right)}. \end{equation*} The condition number $$\kappa (\mathcal{M})$$ has a significant influence on the convergence rate of preconditioned Krylov subspace methods for solving (1.1). We will now derive bounds for $$\kappa (\mathcal{M})$$, from which we will simultaneously learn about both the efficiency of the preconditioner $$\mathcal{S}$$ and the well-posedness of (1.1) with respect to the norm $$\Vert .\Vert _{\mathcal{S}}$$. See the study by Mardal & Winther (2011) for more on this topic of operator preconditioning. We start the discussion by observing that $$\mathcal{M} = \left(\begin{array}{cccc} I & C^{*}_1 & & \\ C_1 & -\left(I-C_1C^{*}_1\right) & \ddots & \\ & \ddots& \ddots& C^{*}_{n-1}\\ & & C_{n-1} & (-1)^{n-1}\left(I-C_{n-1}C^{*}_{n-1}\right) \end{array}\right),$$ (2.5) where \begin{equation*} C_{i}=S^{-1}_{i+1}B_i \quad\textrm{for } \, i = 1, 2,\ldots,n-1. \end{equation*} For its Hilbert space adjoint $$C^{*}_{i}$$ with respect to the inner products $$\left (x_i,y_i\right )_{S_i}$$ and $$\left (x_{i+1},y_{i+1}\right )_{S_{i+1}}$$ we have the following representation: \begin{equation*} C^{*}_{i}=S^{-1}_iB_i^{\prime}\quad\textrm{for } \, i = 1, 2,\ldots,n-1. \end{equation*} In the next two theorems we will derive bounds for the norm of $$\mathcal{M}$$ and its inverse. 
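These bounds can already be probed numerically. The sketch below (Python/NumPy, random SPD blocks chosen only for demonstration) computes the spectrum of $$\mathcal{M} = \mathcal{S}^{-1}\mathcal{A}$$ for a matrix of the form (2.1) and observes that all eigenvalues have modulus between $$2\sin(\pi/(2(2n+1)))$$ and $$2\cos(\pi/(2n+1))$$, the values established in Theorems 2.2 and 2.3 below.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3     # four blocks of size three; sizes are purely illustrative

def spd(k):
    X = rng.standard_normal((k, k))
    return X @ X.T + k * np.eye(k)

As = [spd(m) for _ in range(n)]
Bs = [rng.standard_normal((m, m)) for _ in range(n - 1)]

# block tridiagonal operator (2.1) with alternating signs on the diagonal
A = np.zeros((n * m, n * m))
for i in range(n):
    A[i*m:(i+1)*m, i*m:(i+1)*m] = (-1) ** i * As[i]
for i in range(n - 1):
    A[(i+1)*m:(i+2)*m, i*m:(i+1)*m] = Bs[i]
    A[i*m:(i+1)*m, (i+1)*m:(i+2)*m] = Bs[i].T

# Schur complements (2.3) and the block diagonal preconditioner (2.4)
S = [As[0]]
for i in range(n - 1):
    S.append(As[i + 1] + Bs[i] @ np.linalg.solve(S[i], Bs[i].T))
Sd = np.zeros_like(A)
for i in range(n):
    Sd[i*m:(i+1)*m, i*m:(i+1)*m] = S[i]

# eigenvalues of M = S^{-1} A via the congruent symmetric matrix L^{-1} A L^{-T}
L = np.linalg.cholesky(Sd)
lam = np.linalg.eigvalsh(np.linalg.solve(L, np.linalg.solve(L, A).T))

upper = 2 * np.cos(np.pi / (2 * n + 1))        # bound of Theorem 2.2
lower = 2 * np.sin(np.pi / (2 * (2 * n + 1)))  # from Theorem 2.3: |lam| >= lower
assert np.max(np.abs(lam)) <= upper + 1e-10
assert np.min(np.abs(lam)) >= lower - 1e-10
```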
Theorem 2.2 For the operator $$\mathcal{M}$$ the following estimate holds: \begin{equation*} \Vert \mathcal{M}\Vert_{\mathcal{L}\left(\textbf{X},\textbf{X}\right)} \leq 2\cos\left(\frac{\pi}{2n+1}\right). \end{equation*} Proof. First we note that the norm can be written in the following way: $$\Vert \mathcal{M}\Vert_{\mathcal{L}\left(\textbf{X},\textbf{X}\right)} = \sup_{0 \neq \textbf{x}\in \textbf{X}} \frac{\vert\left( \mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}}\vert}{\left(\textbf{x},\textbf{x}\right)_{\mathcal{S}}}.$$ (2.6) We will now estimate the numerator $$\left (\mathcal{M}\textbf{x},\textbf{x}\right )_{\mathcal{S}}$$. Let x ∈ X and let $$x_i \in X_i$$ be the ith component of x. Then it follows from (2.5) that \begin{equation*} \left(\mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}} = \Vert x_1\Vert_{S_1}^2 + 2\sum^{n-1}_{i=1} \left(C^{*}_{i} x_{i+1}, x_i\right)_{S_i} + \sum^{n-1}_{i=1} (-1)^{i} \left(\left(I-C_{i}C^{*}_{i}\right)x_{i+1}, x_{i+1}\right)_{S_{i+1}} . \end{equation*} By applying Cauchy’s inequality and Young’s inequality we obtain for parameters $$\epsilon _i> 0$$, \begin{align*} 2\left(C^{*}_{i} x_{i+1}, x_i\right)_{S_i} \leq 2\left\Vert C^{*}_{i} x_{i+1}\right\Vert_{S_i} \Vert x_i\Vert_{S_i} &\leq \epsilon_i\left(C^{*}_{i}x_{i+1}, C^{*}_{i}x_{i+1}\right)_{S_i}+ \frac{1}{\epsilon_i} \Vert x_i\Vert_{S_i}^2\\ &= \epsilon_i\left(C_{i}C^{*}_{i}x_{i+1}, x_{i+1}\right)_{S_{i+1}} + \frac{1}{\epsilon_i} \Vert x_i\Vert_{S_i}^2. \end{align*} Therefore, \begin{align*} \left(\mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}} \leq \Vert x_1\Vert_{S_1}^2 + \sum^{n-1}_{i=1} \frac{1}{\epsilon_i}\Vert x_{i}\Vert_{S_{i}}^2 +\sum^{n-1}_{i=1} \left( \epsilon_i - (-1)^i\right) \, \left(C_{i}C^{*}_{i}x_{i+1}, x_{i+1}\right)_{S_{i+1}}+ \sum^{n-1}_{i=1} (-1)^{i}\Vert x_{i+1}\Vert_{S_{i+1}}^2. 
\end{align*} Since $$A_{i+1}$$ is positive semidefinite it follows that \begin{align} \left(C_iC^{*}_ix_{i+1}, x_{i+1}\right)_{S_{i+1}} =&\, \left\langle B_{i}S^{-1}_{i}B_{i}^{\prime}x_{i+1}, x_{i+1}\right\rangle \nonumber\\ \leq&\, \left\langle A_{i+1}x_{i+1}, x_{i+1}\right\rangle +\left\langle B_{i}S^{-1}_{i}B_{i}^{\prime}x_{i+1}, x_{i+1}\right\rangle = \left\langle S_{i+1}x_{i+1}, x_{i+1}\right\rangle = \Vert x_{i+1}\Vert_{S_{i+1}}^2. \end{align} (2.7) Now we make an essential assumption on the choice of the parameters $$\epsilon_i$$: $$\epsilon_i \geq 1 \quad \textrm{for} \ i = 1,\ldots,n-1.$$ (2.8) By using (2.8) and (2.7) the estimate for $$\left (\mathcal{M}\textbf{x},\textbf{x}\right )_{\mathcal{S}}$$ from above simplifies to \begin{equation*} \left(\mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}} \leq \Vert x_1\Vert_{S_1}^2 + \sum^{n-1}_{i=1}\frac{1}{\epsilon_i} \Vert x_{i}\Vert_{S_{i}}^2 +\sum^{n-1}_{i=1} \epsilon_i \Vert x_{i+1}\Vert_{S_{i+1}}^2 = \textbf{y}^\top D_u \, \textbf{y}, \end{equation*} where \begin{equation*} D_u = \left(\begin{array}{ccccc} 1+\frac{1}{\epsilon_1} & & & &\\ & \epsilon_1 +\frac{1}{\epsilon_2}& & &\\ & & \ddots & & \\ & & & \epsilon_{n-2} +\frac{1}{\epsilon_{n-1}} & \\ & & & & \epsilon_{n-1} \end{array}\right) \quad \textrm{and} \quad \textbf{y} = \left(\begin{array}{c} \|x_1\|_{S_1} \\ \|x_2\|_{S_2} \\ \vdots \\ \|x_n\|_{S_n} \end{array}\right). \end{equation*} Next we choose $$\epsilon_1, \ldots , \epsilon_{n-1}$$ such that the diagonal elements in $$D_u$$ are all equal, that is, \begin{equation*} 1 +\frac{1}{\epsilon_1} = \epsilon_1 +\frac{1}{\epsilon_2} = \cdots = \epsilon_{n-2} +\frac{1}{\epsilon_{n-1}} = \epsilon_{n-1}. 
\end{equation*} We can successively eliminate $$\epsilon_1, \ldots , \epsilon_{n-2}$$ from these equations and obtain \begin{equation*} 1 = \epsilon_{n-1} -\cfrac{1}{\epsilon_1} = \epsilon_{n-1} -\cfrac{1}{\epsilon_{n-1} - \cfrac{1}{\epsilon_2} } = \epsilon_{n-1} -\cfrac{1}{\epsilon_{n-1} - \cfrac{1}{\epsilon_{n-1} - \cfrac{1}{\epsilon_3}} } = \cdots , \end{equation*} which eventually leads to the following equation for $$\epsilon_{n-1}$$: $$1 = \epsilon_{n-1} -\cfrac{1}{\epsilon_{n-1} - \cfrac{1}{\epsilon_{n-1} - \cfrac{1}{\epsilon_{n-1} - \ddots \genfrac{}{}{0pt}{}{}{\cfrac{1}{\epsilon_{n-1}}} }}} \ .$$ (2.9) The right-hand side of (2.9) is a continued fraction of depth n−1. It can easily be shown that this continued fraction is a rational function in $$\epsilon_{n-1}$$ of the form $$P_n(\epsilon_{n-1})/P_{n-1}(\epsilon_{n-1})$$, where $$P_j(\epsilon )$$ are polynomials of degree j, recursively given by \begin{equation*} P_0 (\epsilon) = 1, \quad P_1(\epsilon) = \epsilon \quad \textrm{and} \quad P_{i+1}(\epsilon) = \epsilon \, P_i(\epsilon)-P_{i-1}(\epsilon) \quad \textrm{for } i \geq 1. \end{equation*} Therefore, (2.9) becomes $$1 = P_n(\epsilon_{n-1})/P_{n-1}(\epsilon_{n-1})$$ or, equivalently, \begin{equation*} \bar{P}_n(\epsilon_{n-1}) = 0 \quad \textrm{with} \quad \bar{P}_j(\epsilon) = P_j(\epsilon) - P_{j-1}(\epsilon). \end{equation*} For the other parameters $$\epsilon_1,\ldots ,\epsilon_{n-2}$$ it follows that \begin{equation*} \epsilon_{n-i} = \frac{P_{i}(\epsilon_{n-1})}{P_{i-1}(\epsilon_{n-1}) } \quad \textrm{for}\ \ i = 2,\ldots,n-1. \end{equation*} With this setting of the parameters the basic assumption (2.8) is equivalent to the following conditions: $$\bar{P}_i(\epsilon_{n-1}) \geq 0 \quad \textrm{and} \quad \epsilon_{n-1} \geq 1.$$ (2.10) To summarize, the parameter $$\epsilon_{n-1}$$ must be a root of $$\bar{P}_n$$ satisfying (2.10). 
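The polynomials $$P_j$$ are (up to rescaling of the argument) Chebyshev polynomials of the second kind, so the recurrence is easy to evaluate numerically. The sketch below (Python/NumPy, illustrative only) checks that the value $$2\cos(\pi/(2n+1))$$ identified next is indeed a root of $$\bar{P}_n$$ and that the induced parameters $$\epsilon_{n-i} = P_i/P_{i-1}$$ satisfy the admissibility condition (2.8).

```python
import numpy as np

def P(j, eps):
    # P_0 = 1, P_1 = eps, P_{i+1} = eps * P_i - P_{i-1}
    p_prev, p = 1.0, eps
    if j == 0:
        return p_prev
    for _ in range(j - 1):
        p_prev, p = p, eps * p - p_prev
    return p

for n in range(2, 9):
    eps = 2.0 * np.cos(np.pi / (2 * n + 1))
    # eps is a root of Pbar_n = P_n - P_{n-1}
    assert abs(P(n, eps) - P(n - 1, eps)) < 1e-9
    # the induced parameters eps_{n-i} = P_i / P_{i-1} all satisfy (2.8)
    for i in range(2, n):
        assert P(i, eps) / P(i - 1, eps) >= 1.0 - 1e-12
```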
In the appendix it will be shown that \begin{equation*} \epsilon_{n-1} = 2\cos\left(\frac{\pi}{2n+1}\right), \end{equation*} which is the largest root of $$\bar{P}_n$$, is an appropriate choice. Hence, \begin{equation*} \left(\mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}} \leq \textbf{y}^\top D_u \, \textbf{y} = 2\cos\left(\frac{\pi}{2n+1}\right)\left(\textbf{x}, \textbf{x}\right)_{\mathcal{S}}, \end{equation*} and therefore \begin{equation*} \frac{\left( \mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}}}{\left(\textbf{x},\textbf{x}\right)_{\mathcal{S}}} \leq 2\cos\left(\frac{\pi}{2n+1}\right) . \end{equation*} Following the same line of arguments a lower bound of $$\left (\mathcal{M}\textbf{x},\textbf{x}\right )_{\mathcal{S}}$$ can be derived: \begin{equation*} \left(\mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}} \geq \textbf{y}^\top D_l \, \textbf{y}, \end{equation*} where \begin{equation*} D_l = \left(\begin{array}{ccccc} 1-\frac{1}{\epsilon_1} & & & &\\ & -\epsilon_1 -\frac{1}{\epsilon_2}& & &\\ & & \ddots & & \\ & & & -\epsilon_{n-2} -\frac{1}{\epsilon_{n-1}} & \\ & & & & -\epsilon_{n-1} \end{array}\right), \end{equation*} with the same values for $$\epsilon_i$$ as before. Comparing $$D_u$$ and $$D_l$$ shows that the diagonal elements of $$D_l$$ are equal to $$-2\cos (\pi /(2n+1))$$, except for the first element, which has the larger value $$2-2\cos \left (\pi /(2n+1)\right )$$. This directly implies \begin{equation*} \left(\mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}} \geq \textbf{y}^\top D_l \, \textbf{y} \geq -2\cos\left(\frac{\pi}{2n+1}\right)\left(\textbf{x}, \textbf{x}\right)_{\mathcal{S}}, \end{equation*} and therefore \begin{equation*} - 2\cos\left(\frac{\pi}{2n+1}\right) \le \frac{\left( \mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}}}{\left(\textbf{x},\textbf{x}\right)_{\mathcal{S}}}. 
\end{equation*} To summarize, we have shown that \begin{equation*} \frac{\vert\left( \mathcal{M}\textbf{x},\textbf{x}\right)_{\mathcal{S}}\vert}{\left(\textbf{x},\textbf{x}\right)_{\mathcal{S}}} \leq 2\cos\left(\frac{\pi}{2n+1}\right), \end{equation*} which completes the proof using (2.6). For investigating the inverse operator $$\mathcal{M}^{-1}$$ notice first that $$\mathcal{M} = \mathcal{M}_n$$, where $$\mathcal{M}_j$$, j = 1, 2, … , n are recursively given by \begin{equation*} \mathcal{M}_1 = I, \quad \mathcal{M}_{i+1} = \left(\begin{array}{ccc} \mathcal{M}_i & \textbf{C}^{*}_i \\ \textbf{C}_i & (-1)^i \, \big(I - C_i C_i^{\ast}\big) \end{array}\right) \quad \textrm{with} \ \textbf{C}_i = \left(0\quad \ldots\quad 0\quad C_i \right) \quad \textrm{for}\ \ i \geq 1. \end{equation*} Under the assumptions of Lemma 2.1 one can show by elementary calculations that $$\mathcal{M}_j^{-1}$$, j = 1, 2, … , n exist and satisfy the following recurrence relation: $$\mathcal{M}^{-1}_1 = I, \quad \mathcal{M}^{-1}_{i+1} = \begin{pmatrix} \mathcal{M}^{-1}_i & 0 \\ 0 & 0 \end{pmatrix} + (-1)^i \begin{pmatrix} \mathcal{M}^{-1}_i\textbf{C}^{*}_i\textbf{C}_i \mathcal{M}^{-1}_i & - \mathcal{M}^{-1}_i\textbf{C}^{*}_i \\ - \textbf{C}_i\mathcal{M}^{-1}_i & I \end{pmatrix} \quad \textrm{for}\ \ i \geq 1.$$ (2.11) Theorem 2.3 The operator $$\mathcal{M}$$ is invertible and we have \begin{equation*} \big\Vert \mathcal{M}^{-1}\big\Vert_{\mathcal{L}\left(\textbf{X},\textbf{X}\right)} \leq \frac{1}{2\sin\left(\frac{\pi}{2\left(2n+1\right)}\right)} . \end{equation*} Proof. Let $$\textbf{x} \in \textbf{X} = X_1\times \cdots \times X_n$$ with components $$x_i \in X_i$$. The restriction of x to its first j components is denoted by $$\textbf{x}_j \in X_1\times \cdots \times X_j$$. The corresponding restriction of $$\mathcal{S}$$ to its first j components is denoted by $$\mathcal{S}_j$$. 
From (2.11) we obtain \begin{equation*} \left(\mathcal{M}_{i+1}^{-1} \textbf{x}_{i+1},\textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}} = \left(\mathcal{M}_i^{-1} \textbf{x}_i,\textbf{x}_i\right)_{\mathcal{S}_i} + (-1)^i \, \left\|\textbf{C}_i\mathcal{M}^{-1}_i \textbf{x}_i - x_{i+1}\right\|_{S_{i+1}}^2, \end{equation*} which implies that $$\left(\mathcal{M}_{i+1}^{-1} \textbf{x}_{i+1},\textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}} \begin{cases} \le \left(\mathcal{M}_i^{-1} \textbf{x}_i,\textbf{x}_i\right)_{\mathcal{S}_i} & \textrm{for odd}\ \ i, \\[4pt] \ge \left(\mathcal{M}_i^{-1} \textbf{x}_i,\textbf{x}_i\right)_{\mathcal{S}_i} & \textrm{for even}\ \ i . \end{cases}$$ (2.12) For estimates in the opposite direction observe that (2.11) also implies that \begin{aligned} \left(\mathcal{M}_{i+1}^{-1} \textbf{x}_{i+1},\textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}} =&\, \left(\left[\mathcal{M}_i^{-1} + (-1)^i \, \mathcal{M}^{-1}_i\textbf{C}^{*}_i\textbf{C}_i \mathcal{M}^{-1}_i\right] \, \textbf{x}_i,\textbf{x}_i\right)_{\mathcal{S}_i} \\ &+ (-1)^i \, \left[ \Vert x_{i+1}\Vert_{S_{i+1}}^2 - 2 \, \left(\textbf{C}_i\mathcal{M}^{-1}_i \textbf{x}_i,x_{i+1}\right)_{S_{i+1}} \right]. \end{aligned} (2.13) For the first term on the right-hand side of (2.13) observe that \begin{equation*} \mathcal{M}^{-1}_i + (-1)^i \, \mathcal{M}^{-1}_i \textbf{C}^{*}_i \textbf{C}_i \mathcal{M}^{-1}_i = \begin{pmatrix} \mathcal{M}_{i-1}^{-1} & 0 \\[2pt] 0 & 0 \end{pmatrix} + (-1)^{i-1} \, \mathcal{P}_{i} \quad \textrm{for} \ i> 1 \end{equation*} with \begin{equation*} \mathcal{P}_i = \begin{pmatrix} \mathcal{M}_{i-1}^{-1} \textbf{C}_{i-1}^{\ast} \big(I -C_i^{\ast} \, C_i\big) \textbf{C}_{i-1} \mathcal{M}_{i-1}^{-1} & -\mathcal{M}_{i-1}^{-1} \textbf{C}_{i-1}^{\ast} \big(I -C_i^{\ast} \, C_i\big)\\[4pt] - \big(I -C_i^{\ast} \, C_i\big) \textbf{C}_{i-1} \mathcal{M}_{i-1}^{-1} & \big(I -C_i^{\ast} \, C_i\big) \end{pmatrix}, \end{equation*} which easily follows by using (2.11) with i replaced by i − 1. 
The operator $$\mathcal{P}_i$$ is positive semidefinite, since \begin{equation*} ( \mathcal{P}_i \textbf{x}_i, \textbf{x}_i )_{\mathcal{S}_i} = \left(\left[I -C_i^{\ast} \, C_i\right]\left(\textbf{C}_{i-1} \mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1} - x_i\right), \textbf{C}_{i-1} \mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1} - x_i\right)_{S_i} \end{equation*} and $$I -C_i^{\ast } \, C_i$$ is positive semidefinite. The positive semidefiniteness of $$I -C_i^{\ast } \, C_i$$ is a consequence of (2.7), which can also be written as \begin{equation*} \left\|C_i^{\ast} x_{i+1}\right\|_{S_i} \le \|x_{i+1}\|_{S_{i+1}} \quad \forall \, x_{i+1} \in X_{i+1}, \end{equation*} that is, $$\|C_i^{\ast }\|_{\mathcal{L}(X_{i+1},X_i)} \le 1$$. Since $$\|C_i\|_{\mathcal{L}(X_i,X_{i+1})} = \|C_i^{\ast }\|_{\mathcal{L}(X_{i+1},X_i)}$$ we have $$\|C_i\|_{\mathcal{L}(X_i,X_{i+1})} \le 1$$, that is, $$\|C_i x_i\|_{S_{i+1}} \le \|x_i\|_{S_i} \quad \forall \, x_i \in X_i,$$ (2.14) or, equivalently, $$I -C_i^{\ast } \, C_i$$ is positive semidefinite. Therefore, \begin{equation*} \left(\left[\mathcal{M}^{-1}_i + (-1)^i \, \mathcal{M}^{-1}_i \textbf{C}^{*}_i \textbf{C}_i \mathcal{M}^{-1}_i\right] \textbf{x}_i,\textbf{x}_i\right)_{\mathcal{S}_i} \begin{cases} \ge \left(\mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1},\textbf{x}_{i-1}\right)_{\mathcal{S}_{i-1}} & \textrm{for odd} \ i {}> 1, \\[6pt] \le \left(\mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1},\textbf{x}_{i-1}\right)_{\mathcal{S}_{i-1}} & \textrm{for even} \ i. 
\end{cases} \end{equation*} Then it follows from (2.13) that for odd i > 1, \begin{equation*} \left(\mathcal{M}_{i+1}^{-1} \textbf{x}_{i+1}, \textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}} \ge \left(\mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1},\textbf{x}_{i-1}\right)_{\mathcal{S}_{i-1}} + 2 \left(\textbf{C}_i \mathcal{M}_i^{-1} \textbf{x}_i,x_{i+1}\right)_{S_{i+1}} - \|x_{i+1}\|_{S_{i+1}}^2 \end{equation*} and for even i, \begin{equation*} \left(\mathcal{M}_{i+1}^{-1} \textbf{x}_{i+1}, \textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}} \le \left(\mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1},\textbf{x}_{i-1}\right)_{\mathcal{S}_{i-1}} - 2 \left(\textbf{C}_i \mathcal{M}_i^{-1} \textbf{x}_i,x_{i+1}\right)_{S_{i+1}} + \|x_{i+1}\|_{S_{i+1}}^2 . \end{equation*} In order to estimate $$(\textbf{C}_i \mathcal{M}_i^{-1} \textbf{x}_i,x_{i+1})_{S_{i+1}}$$ observe that \begin{equation*} \textbf{C}_i \mathcal{M}_i^{-1} = -(-1)^i \, C_i \, \begin{pmatrix} -\textbf{C}_{i-1} \mathcal{M}_{i-1}^{-1} & I \end{pmatrix} \quad \forall \, i> 1, \end{equation*} which is obtained from (2.11) with i replaced by i − 1 by multiplying with $$\textbf{C}_i$$ from the left. By using (2.14) it follows that \begin{align*} \big\|\textbf{C}_i \mathcal{M}_i^{-1} \textbf{x}_i\big\|_{S_{i+1}} & \le \left\| C_i \textbf{C}_{i-1} \mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1}\right\|_{S_{i+1}} + \|C_i x_i\|_{S_{i+1}} \\ & \le \left\|\textbf{C}_{i-1} \mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1}\right\|_{S_i} + \|x_i\|_{S_i} \quad \forall \, i> 1, \end{align*} which, applied recursively, eventually leads to \begin{align*} \left\|\textbf{C}_i \mathcal{M}_i^{-1} \textbf{x}_i\right\|_{S_{i+1}} &\le \left\|\textbf{C}_1 \mathcal{M}_1^{-1} \textbf{x}_1\right\|_{S_{2}}+ \sum_{j=2}^i \|x_j\|_{S_j}\\ &= \|C_1 x_1\|_{S_{2}}+ \sum_{j=2}^i \|x_j\|_{S_j} \le \sum_{j=1}^i \|x_j\|_{S_j} \quad \forall \, i \ge 1, \end{align*} where (2.14) was used again in the last step. 
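Inequality (2.14) can also be observed numerically: with $$S_{i+1}$$ built by the recurrence (2.3), the map $$C_i = S_{i+1}^{-1}B_i$$ is a contraction from the $$S_i$$-norm to the $$S_{i+1}$$-norm. A sketch (Python/NumPy, random matrices purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
m = 5

def spd(k):
    X = rng.standard_normal((k, k))
    return X @ X.T + k * np.eye(k)     # symmetric positive definite

Si = spd(m)                            # stands in for S_i
Ai1 = spd(m)                           # stands in for A_{i+1} (PSD suffices)
Bi = rng.standard_normal((m, m))
Si1 = Ai1 + Bi @ np.linalg.solve(Si, Bi.T)   # recurrence (2.3)

# (2.14): ||C_i x||_{S_{i+1}} <= ||x||_{S_i} for C_i = S_{i+1}^{-1} B_i
for _ in range(100):
    x = rng.standard_normal(m)
    Cx = np.linalg.solve(Si1, Bi @ x)
    assert Cx @ Si1 @ Cx <= x @ Si @ x + 1e-10 * (x @ Si @ x)
```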
Hence, \begin{equation*} \left|\left(\textbf{C}_i \mathcal{M}_i^{-1} \textbf{x}_i,x_{i+1}\right)_{S_{i+1}}\right| \le \Big(\sum_{j=1}^i \|x_j\|_{S_j}\Big) \|x_{i+1}\|_{S_{i+1}} \quad \textrm{for} \ i \ge 1. \end{equation*} Using this estimate we obtain for odd i > 1, \begin{align*} \left(\mathcal{M}_{i+1}^{-1} \textbf{x}_{i+1}, \textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}} & \ge \left(\mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1},\textbf{x}_{i-1}\right)_{\mathcal{S}_{i-1}} - 2 \Big(\sum_{j=1}^i \|x_j\|_{S_j}\Big) \|x_{i+1}\|_{S_{i+1}} - \|x_{i+1}\|_{S_{i+1}}^2 \\ & = \left(\mathcal{M}_{i-1}^{-1} \textbf{x}_{i-1},\textbf{x}_{i-1}\right)_{\mathcal{S}_{i-1}} + y_{i+1}^\top L_{i+1} \, y_{i+1}, \end{align*} where $$y_j= (\Vert x_1\Vert _{S_1},\Vert x_2\Vert _{S_2},\ldots ,\Vert x_j\Vert _{S_j})^\top$$ and $$L_{i+1}$$ is the $$(i+1)\times (i+1)$$ matrix whose only nonzero entries are −1 in the last row and last column. For the remaining case i = 1 we have \begin{equation*} \left(\left[\mathcal{M}^{-1}_1 - \mathcal{M}^{-1}_1 \textbf{C}^{*}_1 \textbf{C}_1 \mathcal{M}^{-1}_1\right] \textbf{x}_1,\textbf{x}_1\right)_{\mathcal{S}_1} = \left(\left[I - C_1^{\ast} C_1\right] x_1,x_1\right)_{S_1} \ge 0. \end{equation*} Then it follows directly from (2.13) that \begin{align*} \left(\mathcal{M}_2^{-1} \textbf{x}_2, \textbf{x}_2\right)_{\mathcal{S}_2} & \ge 2 \, \left(\textbf{C}_1\mathcal{M}^{-1}_1 \textbf{x}_1,x_2\right)_{S_2} - \Vert x_2\Vert_{S_2}^2 = 2 \, \left(C_1 x_1,x_2\right)_{S_2} - \Vert x_2\Vert_{S_2}^2 \\ & \ge - 2 \, \|x_1\|_{S_1} \, \|x_2\|_{S_2} - \Vert x_2\Vert_{S_2}^2 = y_2^\top L_2 \, y_2 . 
\end{align*} Applying these estimates recursively eventually leads to \begin{equation*} \left(\mathcal{M}_j^{-1} \textbf{x}_j, \textbf{x}_j\right)_{\mathcal{S}_j} \ge y_j^\top Q_j \, y_j, \end{equation*} where $$Q_j$$, j = 2, 4, 6, … are given by the recurrence relation \begin{equation*} Q_2 = \begin{pmatrix} 0 & -1 \\ -1 & -1\end{pmatrix}, \quad Q_{i+2} = \left(\begin{array}{@{}c|cc@{}} Q_{i} & \begin{matrix} 0 \\ \vdots \end{matrix} & \begin{matrix} -1 \\ \vdots \end{matrix} \\ \hline \begin{matrix} 0 & \cdots \end{matrix} & 0 & -1\\ \begin{matrix} -1 & \cdots \end{matrix} & -1 & -1 \end{array}\right) \quad \textrm{for} \ i = 2,4,\ldots . \end{equation*} Therefore, \begin{equation*} \left(\mathcal{M}_j^{-1} \textbf{x}_j, \textbf{x}_j\right)_{\mathcal{S}_j} \ge - \|Q_j\| \, \left(\textbf{x}_j, \textbf{x}_j\right)_{\mathcal{S}_j} \quad \textrm{for even} \ j, \end{equation*} where $$\|Q_j\|$$ denotes the spectral norm of $$Q_j$$. It follows analogously that \begin{equation*} \left(\mathcal{M}_j^{-1} \textbf{x}_j, \textbf{x}_j\right)_{\mathcal{S}_j} \le \|Q_j\| \, \left(\textbf{x}_j, \textbf{x}_j\right)_{\mathcal{S}_j} \quad \textrm{for odd} \ j, \end{equation*} where $$Q_j$$, j = 1, 3, 5, … are given by the recurrence relation \begin{equation*} Q_1 = 1, \quad Q_{i+2} = \left(\begin{array}{@{}c|cc@{}} Q_{i} & \begin{matrix} 0 \\ \vdots \end{matrix} & \begin{matrix} 1 \\ \vdots \end{matrix} \\ \hline \begin{matrix} 0 & \cdots \end{matrix} & 0 & 1\\ \begin{matrix} 1 & \cdots \end{matrix} & 1 & 1 \end{array}\right) \quad \textrm{for} \ i = 1,3,\ldots \end{equation*} Together with (2.12) it follows for odd i that \begin{equation*} - \|Q_{i+1}\| \, \left(\textbf{x}_{i+1}, \textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}} \le \left(\mathcal{M}_{i+1}^{-1} \textbf{x}_{i+1},\textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}} \le \left(\mathcal{M}_i^{-1} \textbf{x}_i,\textbf{x}_i\right)_{\mathcal{S}_i} \le \|Q_i\| \, \left(\textbf{x}_i, \textbf{x}_i\right)_{\mathcal{S}_i}, \end{equation*} 
and for even i that \begin{equation*} - \|Q_i\| \, \left(\textbf{x}_i, \textbf{x}_i\right)_{\mathcal{S}_i} \le \left(\mathcal{M}_i^{-1} \textbf{x}_i,\textbf{x}_i\right)_{\mathcal{S}_i} \le \left(\mathcal{M}_{i+1}^{-1} \textbf{x}_{i+1},\textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}} \le \|Q_{i+1}\| \, \left(\textbf{x}_{i+1}, \textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}}. \end{equation*} So in both cases we obtain \begin{equation*} \frac{\left|\left(\mathcal{M}_{i+1}^{-1} \textbf{x}_{i+1},\textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}}\right|} {\left(\textbf{x}_{i+1}, \textbf{x}_{i+1}\right)_{\mathcal{S}_{i+1}}} \le \max\left(\|Q_i\|,\|Q_{i+1}\|\right) \quad \forall\, \textbf{x}_{i+1} \neq 0. \end{equation*} Since \begin{equation*} \|Q_j\| = \frac{1}{2\sin\left(\frac{\pi}{2\left(2 j+1\right)}\right)} \end{equation*} (see the appendix), the proof is completed. A direct consequence of Theorems 2.2 and 2.3 is the following corollary. Corollary 2.4 Under the assumptions of Lemma 2.1 the block operators $$\mathcal{A}$$ and $$\mathcal{M}$$, given by (2.1) and (2.5), are isomorphisms from X to X′ and from X to X, respectively. Moreover, the following condition number estimate holds: \begin{equation*} \kappa\left(\mathcal{A}\right)= \kappa\left(\mathcal{M}\right) \leq \frac{\cos\left(\frac{\pi}{2n+1}\right)}{\sin\left(\frac{\pi}{2\left(2n+1\right)}\right)} . \end{equation*} Remark 2.5 For n = 2 we have $$2 \sin \left( \frac{\pi}{10} \right) = \frac{1}{2}\big(\sqrt{5}-1\big) \quad \textrm{and} \quad 2 \cos \left( \frac{\pi}{5} \right) = \frac{1}{2}\big(\sqrt{5}+1\big),$$ and therefore \begin{equation*} \kappa\left(\mathcal{M}\right) \le \frac{\sqrt{5}+1}{\sqrt{5}-1} = \frac{3+\sqrt{5}}{2}. \end{equation*} This result is well known for finite-dimensional spaces; see the studies by Kuznetsov (1995) and Murphy et al. (2000). In Kuznetsov (1995) and Murphy et al. 
(2000) it was also shown for the case n = 2 and $$A_2 = 0$$ that $$\mathcal{M}$$ has only three eigenvalues: \begin{equation*} \left\{-\frac{1}{2}\big(\sqrt{5}-1\big), 1, \frac{1}{2}\big(\sqrt{5}+1\big) \right\}. \end{equation*} This result can also be extended to n ≥ 2 and to general Hilbert spaces. Theorem 2.6 If the assumptions of Lemma 2.1 hold and if, additionally, $$A_i=0$$ for i = 2, … , n, then the set $$\sigma _p(\mathcal{M})$$ of all eigenvalues of $$\mathcal{M}$$ is given by \begin{equation*} \sigma_p(\mathcal{M}) = \left\lbrace 2\cos\left(\frac{2i-1}{2j+1}\pi\right) \colon j = 1,\ldots,n, \ i = 1,\ldots, j \right\rbrace. \end{equation*} Moreover, \begin{equation*} \|\mathcal{M}\|_{\mathcal{L}(\textbf{X},\textbf{X})} = \max \left\{|\lambda| : \lambda \in \sigma_p(\mathcal{M}) \right\} = 2\cos\left(\frac{\pi}{2n+1}\right) \end{equation*} and \begin{equation*} \big\|\mathcal{M}^{-1}\big\|_{\mathcal{L}(\textbf{X},\textbf{X})} = \max \left\{\frac{1}{|\lambda|} : \lambda \in \sigma_p(\mathcal{M}) \right\} = \frac{1}{2\sin\left(\frac{\pi}{2\left(2n+1\right)}\right)}. \end{equation*} So equality is attained in the estimates of Theorems 2.2 and 2.3 and of Corollary 2.4; in this sense all these estimates are sharp. Proof. Since $$A_i=0$$ for i = 2, … , n it follows that $$C_iC^{*}_i = I$$ for i = 1, … , n − 1 and the block operator $$\mathcal{M}$$ simplifies to \begin{equation*} \mathcal{M} = \left(\begin{array}{cccc} I & C^{*}_1 & &\\ C_1 & 0 & \ddots &\\ & \ddots& \ddots& C^{*}_{n-1}\\ & & C_{n-1} & 0 \end{array}\right). \end{equation*} The eigenvalue problem $$\mathcal{M} \textbf{x} = \lambda \, \textbf{x}$$ reads, in detail, \begin{align*} x_1 +C^{*}_{1}x_2 &= \lambda x_1,\\ C_1x_1 +C^{*}_{2}x_3 &= \lambda x_2,\\ &\vdots\\ C_{n-2}x_{n-2} +C^{*}_{n-1}x_{n} &= \lambda x_{n-1},\\ C_{n-1}x_{n-1} &= \lambda x_{n}.
\end{align*} From the first equation \begin{equation*} C^{*}_{1} x_2 = \bar{P}_1(\lambda) \, x_1, \quad \textrm{where} \quad \bar{P}_1(\lambda) = \lambda - 1, \end{equation*} we conclude that the root $$\lambda _{11} = 1$$ of $$\bar{P}_1(\lambda )$$ is an eigenvalue by setting $$x_2 = x_3 = \cdots = x_n = 0$$. If $$\lambda \neq \lambda _{11}$$ then we obtain from the second equation, by eliminating $$x_1$$, \begin{equation*} C^{*}_{2}x_3 = \lambda \, x_2 - C_1 x_1 = \lambda \, x_2 - \frac{1}{\bar{P}_1(\lambda)} \, C_1 C_1^{\ast} x_2 = \lambda \, x_2 - \frac{1}{\bar{P}_1(\lambda)} \, x_2 = R_2(\lambda) \, x_2, \end{equation*} where \begin{equation*} R_2(\lambda) = \lambda -\frac{1}{\bar{P}_1(\lambda)} = \frac{\bar{P}_2(\lambda)}{\bar{P}_1(\lambda)} \quad \textrm{with} \quad \bar{P}_2(\lambda) = \lambda \, \bar{P}_1(\lambda) -1. \end{equation*} We conclude that the two roots $$\lambda _{21}$$ and $$\lambda _{22}$$ of the polynomial $$\bar{P}_2(\lambda )$$ of degree 2 are eigenvalues by setting $$x_3 = \cdots = x_n = 0$$. Repeating this procedure gives \begin{equation*} C^{*}_{j}x_{j+1} = R_j(\lambda) \, x_j, \textrm{ for } j=2,\ldots,n-1, \textrm{ and } 0 = R_n(\lambda) \, x_n \textrm{ with } R_j(\lambda) = \frac{\bar{P}_j(\lambda)}{\bar{P}_{j-1}(\lambda)}, \end{equation*} where the polynomials $$\bar{P}_j(\lambda )$$ are recursively given by \begin{equation*} \bar{P}_0(\lambda) = 1,\quad \bar{P}_1(\lambda) = \lambda - 1, \quad \bar{P}_{i+1}(\lambda) = \lambda \, \bar{P}_i(\lambda) - \bar{P}_{i-1}(\lambda) \quad \textrm{for}\ \, i \ge 1. \end{equation*} So the eigenvalues of $$\mathcal{M}$$ are the roots of the polynomials $$\bar{P}_1\left (\lambda \right ),\bar{P}_2(\lambda ),\ldots ,\bar{P}_n\left (\lambda \right )$$. For the roots of $$\bar{P}_j\left (\lambda \right )$$ we obtain \begin{equation*} \lambda = 2 \,\cos\left(\frac{2i-1}{2j+1}\pi\right) \quad \textrm{for } i= 1,\ldots,j; \end{equation*} see Lemma A1.
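The polynomial recurrence and the resulting eigenvalue structure are easy to check numerically. The following sketch (NumPy; the block dimensions and random co-isometries $$C_i$$ are illustrative choices, not taken from the paper) verifies for n = 3 that the roots of $$\bar{P}_j$$ are $$2\cos\big(\tfrac{2i-1}{2j+1}\pi\big)$$ and that a finite-dimensional operator $$\mathcal{M}$$ with $$C_iC_i^{*}=I$$ has exactly these eigenvalues:

```python
import numpy as np

n = 3
# Recurrence: P_0 = 1, P_1(t) = t - 1, P_{i+1}(t) = t*P_i(t) - P_{i-1}(t)
P = [np.poly1d([1.0]), np.poly1d([1.0, -1.0])]
for _ in range(n - 1):
    P.append(np.poly1d([1.0, 0.0]) * P[-1] - P[-2])

# Predicted spectrum: 2*cos((2i-1)*pi/(2j+1)), j = 1..n, i = 1..j
targets = np.sort([2 * np.cos((2 * i - 1) * np.pi / (2 * j + 1))
                   for j in range(1, n + 1) for i in range(1, j + 1)])
roots = np.sort(np.concatenate([P[j].roots.real for j in range(1, n + 1)]))
assert np.allclose(roots, targets)

# Finite-dimensional model of Theorem 2.6: random C_i with C_i C_i^* = I
rng = np.random.default_rng(0)
dims = [5, 3, 2]  # dim X_1 > dim X_2 > dim X_3, so the C_i have nontrivial kernels

def coisometry(rows, cols):
    q, _ = np.linalg.qr(rng.standard_normal((cols, cols)))
    return q[:rows, :]  # orthonormal rows: C @ C.T = I

C1, C2 = coisometry(dims[1], dims[0]), coisometry(dims[2], dims[1])
Z = lambda a, b: np.zeros((a, b))
Mop = np.block([[np.eye(dims[0]),     C1.T,                Z(dims[0], dims[2])],
                [C1,                  Z(dims[1], dims[1]), C2.T],
                [Z(dims[2], dims[0]), C2,                  Z(dims[2], dims[2])]])
lam = np.linalg.eigvalsh(Mop)
# every eigenvalue of Mop matches one of the predicted values
assert np.abs(lam[:, None] - targets[None, :]).min(axis=1).max() < 1e-8
```

In particular the extreme eigenvalues $$2\cos(\pi /7)\approx 1.802$$ and $$2\cos(5\pi /7)\approx -1.247$$ are attained, consistent with the sharpness statement.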
It is easy to see that \begin{equation*} \max \{|\lambda| : \lambda \in \sigma_p(\mathcal{M}) \} = 2\cos\left(\frac{\pi}{2n+1}\right). \end{equation*} Therefore, with Theorem 2.2 it follows that \begin{equation*} 2\cos\left(\frac{\pi}{2n+1}\right) \ge \|\mathcal{M}\|_{\mathcal{L}(\textbf{X},\textbf{X})} \ge \max \left\{|\lambda| : \lambda \in \sigma_p(\mathcal{M}) \right\} = 2\cos\left(\frac{\pi}{2n+1}\right), \end{equation*} which implies equality. An analogous argument applies to $$\mathcal{M}^{-1}$$. 3. Application: optimization problems in function space In this section we apply the theory from Section 2 to optimization problems in function spaces with an elliptic partial differential equation as constraint. First, we present a standard model problem. Then we look at two more challenging variations of the model problem in Sections 3.2 and 3.3. 3.1 Distributed observation and distributed control We start with the following model problem: find u ∈ U and $$f \in F = L^2(\varOmega )$$ that minimize the objective functional \begin{equation*} \frac12 \Vert u-d\Vert^2_{L^2\left(\varOmega\right)} + \frac{\alpha}{2}\Vert f\Vert^2_{L^2\left(\varOmega\right)} \end{equation*} subject to the constraints \begin{align*} -\varDelta u + f &= 0\quad \textrm{in} \quad \varOmega,\\ u &= 0 \quad \textrm{on} \quad \partial\varOmega, \end{align*} where $$\varOmega$$ is a bounded open subset of $$\mathbb{R}^d$$ with Lipschitz boundary $$\partial \varOmega$$, $$\varDelta$$ is the Laplacian operator, $$d\in L^2\left (\varOmega \right )$$, $$\alpha> 0$$ are given data and $$U \subset L^2\left (\varOmega \right )$$ is a Hilbert space. Here $$L^2(\varOmega )$$ denotes the standard Lebesgue space of square-integrable functions on $$\varOmega$$ with inner product $$\left (\cdot , \cdot \right )_{L^2(\varOmega )}$$ and norm $$\Vert \cdot \Vert _{L^2(\varOmega )}$$. 
This problem can be seen either as an inverse problem for identifying f from the data d or as an optimal control problem with state u, control f and the desired state d. In the first case the parameter $$\alpha$$ is a regularization parameter, and in the second case, a cost parameter. Throughout this paper we adopt the terminology of an optimal control problem and call U the state space and F the control space. We now discuss the construction of preconditioners for the associated optimality system such that the condition number of the preconditioned system is bounded independently of $$\alpha$$. We will call such preconditioners $$\alpha$$-robust. This is of particular interest in the context of inverse problems, where $$\alpha$$ is typically small, in which case the unpreconditioned system becomes severely ill-conditioned. The problem is not yet fully specified. We need a variational formulation for the constraint, which will eventually lead to the definition of the state space U. The most natural way is to use the standard weak formulation with $$U =H_0^1(\varOmega )$$: \begin{equation*} \big(\nabla u, \nabla w\big)_{L^2\left(\varOmega\right)} +\left(f, w\right)_{L^2\left(\varOmega\right)} = 0\quad \forall\, w\in W = H_0^1(\varOmega), \end{equation*} where ∇ denotes the gradient of a scalar function. Here we use $$H^m(\varOmega )$$ to denote the standard Sobolev spaces of functions on $$\varOmega$$ with associated norm $$\Vert \cdot \Vert _{H^m(\varOmega )}$$. The subspace of functions $$u\in H^1(\varOmega )$$ with vanishing trace is denoted by $$H_0^1(\varOmega )$$. This problem is well studied; see, for example, the book by Tröltzsch (2010).
Other options for the state equation are the very weak form in the following sense: $$U = L^2(\varOmega )$$ and \begin{equation*} - \left(u, \varDelta w\right)_{L^2\left(\varOmega\right)} +\left(\,f, w\right)_{L^2\left(\varOmega\right)} = 0\quad \forall\, w\in W = H^2(\varOmega) \cap H_0^1(\varOmega), \end{equation*} and the strong form with $$U = H^2(\varOmega ) \cap H_0^1(\varOmega )$$ and \begin{equation*} -\left(\varDelta u, w\right)_{L^2\left(\varOmega\right)} +\left(\,f, w\right)_{L^2\left(\varOmega\right)} = 0\quad \forall\, w\in W = L^2(\varOmega). \end{equation*} In each of these different formulations of the optimization problem, only two bilinear forms are involved: the $$L^2$$-inner product (twice in the objective functional and once in the state equation as the second term) and the bilinear form representing the negative Laplacian. In operator notation we use $$M \colon L^2(\varOmega ) \rightarrow (L^2(\varOmega ))^{\prime}$$ for representing the $$L^2$$-inner product, \begin{equation*} \langle M y,z \rangle = \left(y, z\right)_{L^2(\varOmega)} \end{equation*} and $$K \colon U \rightarrow W^{\prime}$$ for representing the bilinear form associated to the negative Laplacian \begin{equation*} \langle K u,w \rangle = \begin{cases} \left(\nabla u, \nabla w\right)_{L^2\left(\varOmega\right)}& \textrm{with }U=W = H_0^1(\varOmega), \\ -\left(u, \varDelta w\right)_{L^2\left(\varOmega\right)}& \textrm{with }U=L^2(\varOmega),\ W = H^2(\varOmega) \cap H_0^1(\varOmega), \\ -\left(\varDelta u, w\right)_{L^2\left(\varOmega\right)}& \textrm{with }U=H^2(\varOmega) \cap H_0^1(\varOmega),\ W=L^2(\varOmega), \end{cases} \end{equation*} depending on the choice of U. With this notation the state equation reads \begin{equation*} \left\langle K u, w\right\rangle + \left\langle M f, w\right\rangle = 0 \quad \forall\, w\in W . \end{equation*} For discretizing the problem we use the optimize-then-discretize approach. 
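For sufficiently smooth functions that vanish on the boundary, the three representations of $$\langle K u, w\rangle$$ above agree, since they differ only by integration by parts. A quick symbolic check in one dimension (with $$\varOmega = (0,1)$$; the concrete functions are illustrative choices):

```python
import sympy as sym

x = sym.symbols("x")
u = sym.sin(sym.pi * x)  # vanishes at x = 0 and x = 1
w = x * (1 - x)          # vanishes at x = 0 and x = 1

weak = sym.integrate(sym.diff(u, x) * sym.diff(w, x), (x, 0, 1))
very_weak = -sym.integrate(u * sym.diff(w, x, 2), (x, 0, 1))
strong = -sym.integrate(sym.diff(u, x, 2) * w, (x, 0, 1))

# all three bilinear forms give the same value (here 4/pi)
assert sym.simplify(weak - very_weak) == 0
assert sym.simplify(weak - strong) == 0
```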
Therefore, we start by introducing the associated Lagrangian functional which reads \begin{equation*} \mathcal{L}\left(u,f,w\right) = \frac12 \Vert u-d\Vert^2_{L^2\left(\varOmega\right)} + \frac{\alpha}{2}\Vert f\Vert^2_{L^2\left(\varOmega\right)} +\left\langle Ku, w\right\rangle +\left\langle Mf, w\right\rangle, \end{equation*} with u ∈ U, f ∈ F and the Lagrangian multiplier w ∈ W. From the first-order necessary optimality conditions \begin{equation*} \frac{\partial \mathcal{L}}{\partial u}\left(u,f,w\right)=0, \quad \frac{\partial \mathcal{L}}{\partial f}\left(u,f,w\right)=0, \quad \frac{\partial \mathcal{L}}{\partial w}\left(u,f,w\right)=0, \end{equation*} which are also sufficient here, we obtain the optimality system, which leads to the following problem. Problem 3.1 Find $$\left (u,f,w\right )\in U\times F \times W$$ such that \begin{equation*} \mathcal{A}_\alpha \begin{pmatrix} u \\ f \\ w \end{pmatrix} = \begin{pmatrix} M d\\ 0 \\ 0 \end{pmatrix} \quad \textrm{with} \quad \mathcal{A}_\alpha = \begin{pmatrix} M & 0 & K^{\prime} \\ 0 & \alpha M & M \\ K & M & 0 \end{pmatrix} . \end{equation*} Strictly speaking, the four operators M appearing in $$\mathcal{A}_\alpha$$ are restrictions of the original operator M introduced above on the corresponding spaces U, F and W. The block operator in Problem 3.1 is of the form (2.1) for n = 2 with $$A_1 = \begin{pmatrix} M & 0 \\ 0 & \alpha M \end{pmatrix},\quad A_2 = 0 \quad\textrm{and}\quad B_1 = \begin{pmatrix} K & M \end{pmatrix}.$$ (3.1) We now analyse the three possible choices of U, which were considered above: 1. We start with the weak formulation of the state equation, where $$U =H_0^1(\varOmega )$$, $$W=H_0^1(\varOmega )$$ and $$\left \langle Ku, w\right \rangle = \left (\nabla u, \nabla w\right )_{L^2\left (\varOmega \right )}$$. In this case it is obvious that $$A_1$$ is not positive definite on $$X_1 = U \times F = H_0^1(\varOmega ) \times L^2(\varOmega )$$. So the results of Section 2 do not apply. 
However, there exist other preconditioners that are $$\alpha$$-robust for this choice of $$U =H_0^1(\varOmega )$$; see the study by Schöberl & Zulehner (2007). 2. Next we examine the very weak form of the state equation, where $$U = L^2(\varOmega )$$, $$W = H^2(\varOmega ) \cap H_0^1(\varOmega )$$ and $$\left \langle Ku, v\right \rangle = -\left (u, \varDelta v\right )_{L^2\left (\varOmega \right )}$$. For this choice it is easy to see that $$S_1: U \times F \rightarrow U^{\prime} \times F^{\prime}$$ is positive definite and that $$S_2$$ is well defined, with $$S_1 = \begin{pmatrix} M & 0 \\ 0 & \alpha M \end{pmatrix}\quad \textrm{and} \quad S_2 = \frac{1}{\alpha} \, M+K M^{-1}K^{\prime}.$$ (3.2) In order to apply the results of Section 2 we are left with showing that $$S_2: W\rightarrow W^{\prime}$$ is positive definite. First, observe that we have the following alternative representation of $$S_2$$. Lemma 3.2 \begin{equation*} S_2 = \frac{1}{\alpha} \, M + B, \end{equation*} where the (biharmonic) operator B is given by $$\left\langle B y, z\right\rangle = \left(\varDelta y, \varDelta z\right)_{L^2\left(\varOmega\right)} \quad \forall\, y, z \in H^2(\varOmega) \cap H_0^1(\varOmega).$$ (3.3) Proof. For $$w \in W = H^2(\varOmega ) \cap H_0^1(\varOmega )$$ we have \begin{equation*} \left\langle K M^{-1} K^{\prime} w, w\right\rangle = \sup_{0 \neq v \in U}\frac{\left\langle K^{\prime} w, v\right\rangle^{2}}{\left\langle M v, v\right\rangle} = \sup_{0 \neq v \in U}\frac{\left(v, -\varDelta w\right)^{2}_{L^2(\varOmega)}}{\left(v, v\right)_{L^2\left(\varOmega\right)}} = \Vert \varDelta w\Vert_{L^2\left(\varOmega\right)}^2, \end{equation*} from which it follows that $$K M^{-1} K^{\prime} = B$$, since both operators are self-adjoint. The second ingredient for showing the positive definiteness of $$S_2$$ is the following result from the books by Grisvard (1992, 2011).
Lemma 3.3 Assume that $$\varOmega$$ is a bounded open subset in $$\mathbb{R}^2$$$$(\mathbb{R}^3)$$ with a polygonal (polyhedral) Lipschitz boundary $$\partial \varOmega$$. Then \begin{equation*} \Vert v\Vert_{H^2\left(\varOmega\right)} \leq c \, \Vert \varDelta v\Vert_{L^2\left(\varOmega\right)} \quad \forall\, v \in H^2\left(\varOmega\right)\cap H^1_0\left(\varOmega\right), \end{equation*} for some constant c. Proof. According to Grisvard (2011, Theorem 3.1.1.2) we have \begin{equation*} \int_\varOmega \left|\operatorname{div} q(x)\right|^2 \, \mathrm{d} x = \int_\varOmega \nabla q(x) : \left(\nabla q(x)\right)^\top \, \mathrm{d}x \end{equation*} for all $$q \in H^2(\varOmega )^d$$ with $$q_T := q - (q\cdot n) \, n = 0$$ on $$\partial\varOmega$$. Applying this identity to q = ∇v with $$v \in H^3(\varOmega )\,\cap\, H_0^1(\varOmega )$$ we obtain \begin{equation*} \int_\varOmega \left|\varDelta v(x)\right|^2 \, \mathrm{d} x = \int_\varOmega \big\|\nabla^2 v(x)\big\|_F^2 \, \mathrm{d} x = \big\|\nabla^2 v\big\|^2_{L^2\left(\varOmega\right)}. \end{equation*} Since $$H^3(\varOmega )\cap H_0^1(\varOmega )$$ is dense in $$H^2(\varOmega )\cap H_0^1(\varOmega )$$ (see Grisvard, 1992, Theorem 1.6.2), it follows that the identity holds for all $$v \in H^2(\varOmega )\cap H_0^1(\varOmega )$$. Friedrichs' inequality gives \begin{equation*} \Vert v\Vert_{L^2\left(\varOmega\right)} \leq c_F \,\|\nabla v\|_{L^2\left(\varOmega\right)} \quad \forall \, v\in H_0^1(\varOmega). \end{equation*} From Poincaré's inequality applied to ∇v it follows that \begin{equation*} \|\nabla v\|_{L^2\left(\varOmega\right)} \leq c_P \, \big\|\nabla^2 v\big\|_{L^2\left(\varOmega\right)} \quad \forall \, v\in H^2(\varOmega) \cap H_0^1(\varOmega), \end{equation*} since $$\int _\varOmega \nabla v \, \mathrm{d} x = \int _{\partial \varOmega } v \, n \, \mathrm{d} s = 0$$. Combining these inequalities and the equality completes the proof with the constant $$c^2 = 1 + c_P^2 \big(1+c_F^2\big)$$.
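The key identity in the proof, $$\int _\varOmega |\varDelta v|^2 \, \mathrm{d}x = \Vert \nabla ^2 v\Vert ^2_{L^2(\varOmega )}$$ for functions vanishing on the boundary, can be checked symbolically for a concrete smooth function; here v = sin(πx) sin(πy) on the unit square, an illustrative choice lying in $$H^2(\varOmega )\cap H^1_0(\varOmega )$$:

```python
import sympy as sym

x, y = sym.symbols("x y")
v = sym.sin(sym.pi * x) * sym.sin(sym.pi * y)  # vanishes on the boundary of (0,1)^2

vxx, vyy = sym.diff(v, x, 2), sym.diff(v, y, 2)
vxy = sym.diff(v, x, y)
# squared L2 norm of the Laplacian ...
lhs = sym.integrate((vxx + vyy) ** 2, (x, 0, 1), (y, 0, 1))
# ... equals the squared L2 norm of the (Frobenius) Hessian
rhs = sym.integrate(vxx**2 + 2 * vxy**2 + vyy**2, (x, 0, 1), (y, 0, 1))
assert sym.simplify(lhs - rhs) == 0  # both equal pi**4
```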
From this a priori estimate the required property of $$S_2$$ follows. Lemma 3.4 Under the assumptions of Lemma 3.3 the operator $$S_2: W \rightarrow W^{\prime}$$ given by (3.2) is bounded, self-adjoint and positive definite, where $$W = H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )$$. Proof. It is obvious that $$S_2$$ is bounded and self-adjoint. Moreover, it follows from Lemma 3.3 that B is positive definite. Since $$S_2 \ge B$$, $$S_2$$ is positive definite too. As a direct consequence of Corollary 2.4, Lemma 3.4 and the results of Section 2 we have the following result. Corollary 3.5 Under the assumptions of Lemma 3.3 the operator $$\mathcal{A}_\alpha$$ in Problem 3.1 is an isomorphism from $$\textbf{X} = L^2\left (\varOmega \right )\times L^2\left (\varOmega \right ) \times \left (H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )\right )$$ to its dual space with respect to the norm in X given by \begin{equation*} \left\Vert \left(u,f,w\right)\right\Vert^2 = \Vert u\Vert^2_U + \Vert f\Vert^2_F + \Vert w\Vert^2_W \end{equation*} with \begin{equation*} \Vert u\Vert^2_{U} = \Vert u\Vert^2_{L^2\left(\varOmega\right)},\quad \Vert f\Vert^2_{F} = \alpha \Vert f\Vert^2_{L^2\left(\varOmega\right)},\quad \Vert w\Vert^2_{W} = \frac{1}{\alpha}\Vert w\Vert^2_{L^2\left(\varOmega\right)} + \Vert \varDelta w\Vert^2_{L^2\left(\varOmega\right)}. \end{equation*} Furthermore, the following condition number estimate holds: \begin{equation*} \kappa\left(\mathcal{S}^{-1}_\alpha\mathcal{A}_\alpha\right) \leq \frac{\cos\left(\frac{\pi}{5}\right)}{\sin\left(\frac{\pi}{10}\right)}\approx 2.62 \quad \textrm{with} \quad \mathcal{S}_\alpha = \begin{pmatrix} M & & \\ & \alpha \, M & \\ & & \frac{1}{\alpha} M + B \end{pmatrix}. \end{equation*} 3.
Finally, we examine the strong form of the state equation, where $$U = H^2(\varOmega ) \cap H_0^1(\varOmega )$$, $$W = L^2(\varOmega )$$ and $$\left \langle Ku, v\right \rangle = -\left (\varDelta u, v\right )_{L^2\left (\varOmega \right )}$$. With the original setting (3.1) we cannot apply the results of Section 2, since $$A_1$$ is not positive definite. To overcome this we change the ordering of the variables and equations and obtain the following equivalent form of the optimality conditions: $$\widetilde{\mathcal{A}}_\alpha \begin{pmatrix} f \\ w \\ u \end{pmatrix} = \begin{pmatrix} 0\\ 0 \\ M d \end{pmatrix} \quad \textrm{with} \quad \widetilde{\mathcal{A}}_\alpha = \begin{pmatrix} \alpha M & M & 0 \\ M & 0 & K \\ 0 & K^{\prime} & M \end{pmatrix} .$$ (3.4) Here we view $$\widetilde{\mathcal{A}}_\alpha$$ as a block operator of the form (2.1) for n = 3 with \begin{equation*} A_1 = \alpha M, \quad A_2 = 0, \quad A_3 = M, \quad B_1 =M \quad \textrm{and} \quad B_2=K^{\prime}. \end{equation*} The corresponding Schur complements are given by \begin{equation*} S_1 = \alpha M, \quad S_2=\frac{1}{\alpha}M \quad \textrm{and} \quad S_3 = M+\alpha K^{\prime} M^{-1}K. \end{equation*} As before we have the following alternative representation of $$S_3$$: $$S_3 = M+\alpha \, B,$$ with the biharmonic operator, given by (3.3). It is obvious that $$S_1$$ and $$S_2$$ are positive definite. We are left with showing that $$S_3$$ is positive definite, which follows from Lemma 3.4, since K and $$S_3$$ in this case are identical to K′ and $$\alpha \, S_2$$ from the previous case. So, finally we obtain the following result analogously to Corollary 3.5. 
Corollary 3.6 Under the assumptions of Lemma 3.3 the operator $$\widetilde{\mathcal{A}}_\alpha$$ in equation (3.4) is an isomorphism from $$\textbf{X} = L^2\left (\varOmega \right )\times L^2\left (\varOmega \right ) \times \left (H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )\right )$$ to its dual space with respect to the norm in X given by \begin{equation*} \left\Vert \left(f,w,u\right)\right\Vert^2 = \Vert f\Vert^2_F + \Vert w\Vert^2_W + \Vert u\Vert^2_U \end{equation*} with \begin{equation*} \Vert f\Vert^2_{F} = \alpha \Vert f\Vert^2_{L^2\left(\varOmega\right)},\quad \Vert w\Vert^2_{W} = \frac1\alpha \Vert w\Vert^2_{L^2\left(\varOmega\right)},\quad \Vert u\Vert^2_{U} = \alpha \Vert u\Vert^2_{L^2\left(\varOmega\right)}+\Vert \varDelta u\Vert^2_{L^2\left(\varOmega\right)} . \end{equation*} Furthermore, the following condition number estimate holds: \begin{equation*} \kappa\left(\mathcal{S}^{-1}_\alpha \widetilde{\mathcal{A}}_\alpha\right) \leq \frac{\cos\left({\pi}/{7}\right)}{\sin\left({\pi}/{14}\right)}\approx4.05 \quad \textrm{with} \quad \mathcal{S}_\alpha = \begin{pmatrix} \alpha M & & \\ & \frac{1}{\alpha} \, M & \\ & & M+\alpha \, B \end{pmatrix}. \end{equation*} The model problem of this subsection is characterized by distributed observation, that is, the first term in the objective functional compares the state u to the given data on the whole domain $$\varOmega$$, and by distributed control, that is, the state u is controlled by f, which is allowed to live on the whole domain $$\varOmega$$. Alternatively, the comparison with given data might be done on a set $$\varOmega _o$$ different from $$\varOmega$$, which is called limited observation. Similarly, the control might live on a set $$\varOmega _c$$ different from $$\varOmega$$, which is called limited control.
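At the purely algebraic level the bound of Corollary 3.6 can be observed numerically: with any symmetric positive definite surrogate for M and any invertible surrogate for K (random matrices below, chosen for illustration only), the condition number of the exactly Schur-complement-preconditioned system stays below $$\cos (\pi /7)/\sin (\pi /14) \approx 4.05$$ uniformly in $$\alpha$$:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 6
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
M = Q @ np.diag(rng.uniform(0.5, 2.0, m)) @ Q.T  # SPD surrogate for the mass matrix
K = rng.standard_normal((m, m))                  # surrogate for the state operator
Z = np.zeros((m, m))
bound = np.cos(np.pi / 7) / np.sin(np.pi / 14)   # ~4.05

for alpha in [1.0, 1e-3, 1e-6]:
    # block system of (3.4) and the Schur complements S_1, S_2, S_3
    A = np.block([[alpha * M, M, Z], [M, Z, K], [Z, K.T, M]])
    S3 = M + alpha * K.T @ np.linalg.solve(M, K)  # S_3 = M + alpha K' M^{-1} K
    S = np.block([[alpha * M, Z, Z], [Z, M / alpha, Z], [Z, Z, S3]])
    lam = np.linalg.eigvals(np.linalg.solve(S, A)).real  # S^{-1}A is S-self-adjoint
    kappa = np.abs(lam).max() / np.abs(lam).min()
    assert kappa <= bound + 1e-6  # alpha-robust condition number
```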
In the next two subsections we will see that the results based on the very weak form of the state equation and on the strong form of the state equation can be extended to problems with distributed observation and limited control and to problems with distributed control and limited observation, respectively. For simplicity we will focus on model problems with $$\varOmega _o = \partial \varOmega$$ or $$\varOmega _c = \partial \varOmega$$. 3.2 Distributed observation and limited control We consider the following variation of the model problem from Section 3.1 as a model problem for distributed observation and limited control: find u ∈ U and $$f \in F = L^2(\partial \varOmega )$$ that minimize the objective functional \begin{equation*} \frac12 \Vert u-d\Vert^2_{L^2\left(\varOmega\right)} + \frac{\alpha}{2}\Vert f\Vert^2_{L^2(\partial\varOmega)} \end{equation*} subject to the constraints \begin{align*} -\varDelta u &= 0\quad \textrm{in} \quad \varOmega,\\ u &= f \quad \textrm{on} \quad \partial\varOmega, \end{align*} where $$d\in U = L^2\left (\varOmega \right )$$ and $$\alpha>0$$ are given data. This model problem and error estimates for a finite element discretization are analysed in the study by May et al. (2013) for convex domains $$\varOmega$$. As in May et al. (2013) we consider the very weak form of the state equation: \begin{equation*} \left( u, -\varDelta w\right)_{L^2\left(\varOmega\right)} +\left(f, \partial_{n} w \right)_{L^2(\partial\varOmega)} = 0\quad \forall\, w\in W, \end{equation*} with $$u \in U = L^2(\varOmega )$$ and $$W = H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )$$. Here $$\partial _n w$$ denotes the normal derivative of w on $$\partial \varOmega$$. Contrary to May et al. (2013) we do not assume that $$\varOmega$$ is convex. See also the study by Berggren (2004) for another version of a very weak formulation, which coincides with the formulation from above for convex domains. 
Analogously to Section 3.1 the optimality system can be derived, and it reads as follows. Problem 3.7 Find $$\left (u,f,w\right )\in U\times F \times W$$ such that $$\mathcal{A}_\alpha \begin{pmatrix} u \\ f \\ w \end{pmatrix} = \begin{pmatrix} M d \\ 0 \\ 0 \end{pmatrix} \quad \textrm{with}\quad \mathcal{A}_\alpha= \begin{pmatrix} M & 0 & K^{\prime} \\ 0 & \alpha M_{\partial} & N \\ K & N^{\prime} & 0 \end{pmatrix},$$ (3.5) where \begin{align*} \left\langle M y, z\right\rangle &= \left(y, z\right)_{L^2\left(\varOmega\right)}, & \left\langle K u, w\right\rangle &= -\left(u, \varDelta w\right)_{L^2\left(\varOmega\right)}, \\ \left\langle M_\partial f, g\right\rangle &= \left(f, g\right)_{L^2(\partial\varOmega)}, & \left\langle N w, f\right\rangle &= \left(\partial_n w, f\right)_{L^2(\partial\varOmega)}, \end{align*} and $$U = L^2\left (\varOmega \right )$$, $$F = L^2(\partial \varOmega )$$ and $$W=H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )$$. Using similar arguments as for Problem 3.1 with the very weak formulation of the state equation we obtain the following result. Corollary 3.8 Under the same assumptions as Lemma 3.3, the operator $$\mathcal{A}_\alpha$$ in Problem 3.7 is an isomorphism between $$\textbf{X}=L^2\left (\varOmega \right ) \times L^2(\partial \varOmega ) \times \left (H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )\right )$$ and its dual space with respect to the norm in X given by \begin{equation*} \Vert \left(u,f,w\right)\Vert^2 = \Vert u\Vert^2_U + \Vert f\Vert^2_F + \Vert w\Vert^2_W \end{equation*} with \begin{equation*} \Vert u\Vert^2_{U} = \Vert u\Vert^2_{L^2\left(\varOmega\right)},\quad \Vert f\Vert^2_{F} = \alpha \Vert f\Vert^2_{L^2(\partial\varOmega)},\quad \Vert w\Vert^2_{W} = \Vert \varDelta w\Vert^2_{L^2\left(\varOmega\right)}+ \frac{1}{\alpha}\Vert \partial_{n}w\Vert^2_{L^2(\partial\varOmega)} .
\end{equation*} Furthermore, the following condition number estimate holds: \begin{equation*} \kappa\left(\mathcal{S}^{-1}_\alpha\mathcal{A}_\alpha\right) \leq \frac{\cos\left({\pi}/{5}\right)}{\sin\left({\pi}/{10}\right)} \approx 2.62 \quad \textrm{with} \quad{\mathcal{S}}_\alpha = \begin{pmatrix} M & & \\ & \alpha \, M_\partial & \\ & & \frac{1}{\alpha}\, K_{\partial} + B \end{pmatrix} \end{equation*} where \begin{equation*} \left\langle K_\partial y, z\right\rangle = \left(\partial_n y, \partial_n z\right)_{L^2(\partial\varOmega)}. \end{equation*} Remark 3.9 So far we have considered only a model problem with Dirichlet boundary control u = f on $$\partial \varOmega$$. A similar result holds for Neumann/Robin boundary control, that is, $$\partial _n u + a u = f$$ on $$\partial \varOmega$$, where a ≥ 0, and distributed observation with the spaces \begin{equation*} U = L^2\left(\varOmega\right), \quad F = L^2(\partial\varOmega), \quad W = \big\lbrace w\in H^2\left(\varOmega\right) \,|\, \partial_n w + a w = 0 \ \textrm{on} \ \partial\varOmega \big\rbrace, \end{equation*} equipped with the norms \begin{equation*} \Vert u\Vert^2_{U} = \Vert u\Vert^2_{L^2\left(\varOmega\right)},\quad \Vert f\Vert^2_{F} = \alpha \Vert f\Vert^2_{L^2(\partial\varOmega)},\quad \Vert w\Vert^2_{W} = \Vert \varDelta w\Vert^2_{L^2\left(\varOmega\right)}+ \frac{1}{\alpha}\Vert w\Vert^2_{L^2(\partial\varOmega)} . \end{equation*} 3.3 Distributed control and limited observation Finally, we consider a model problem with distributed control and limited observation: find u ∈ U and $$f \in F = L^2(\varOmega )$$ that minimize the objective functional \begin{equation*} \frac12 \Vert \partial_n u-d\Vert^2_{L^2(\partial\varOmega)} + \frac{\alpha}{2}\Vert f\Vert^2_{L^2\left(\varOmega\right)} \end{equation*} subject to the constraints \begin{align*} -\varDelta u + f &= 0\quad \textrm{in} \quad \varOmega,\\ u &= 0 \quad \textrm{on} \quad \partial\varOmega, \end{align*} where $$d\in L^2(\partial \varOmega )$$ and $$\alpha> 0$$ are given data.
Robust preconditioners for this problem were first analysed in the study by Mardal et al. (2017). As in Mardal et al. (2017) the strong form of the state equation is used: $$u \in U = H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )$$ and \begin{equation*} - \left(\varDelta u, w\right)_{L^2\left(\varOmega\right)} + \left(f, w\right)_{L^2\left(\varOmega\right)} = 0 \quad \forall\, w \in W = L^2\left(\varOmega\right). \end{equation*} Following the same procedure as for Problem 3.1 with $$U = H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )$$ we obtain the (reordered) optimality system. Problem 3.10 Find $$\left (f,w,u\right )\in F\times W \times U$$ such that \begin{equation*} \widetilde{\mathcal{A}}_\alpha \begin{pmatrix} f \\ w \\ u \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ N^{\prime} d \end{pmatrix}\quad \textrm{with} \quad \widetilde{\mathcal{A}}_\alpha= \begin{pmatrix} \alpha M & M & 0 \\ M& 0 & K \\ 0 & K^{\prime} & K_\partial \end{pmatrix}, \end{equation*} where \begin{equation*} \left\langle K u, w\right\rangle = -\left(\varDelta u, w\right)_{L^2\left(\varOmega\right)},\qquad \left\langle N^{\prime} d, v\right\rangle = \left\langle N v, d\right\rangle = \left(\partial_n v, d\right)_{L^2(\partial\varOmega)}, \end{equation*} and $$W = F = L^2\left (\varOmega \right )$$ and $$U=H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )$$. Using similar arguments as for Problem 3.1 with $$U = H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )$$ we obtain the following result.
Corollary 3.11 Under the same assumptions as Lemma 3.3 the operator $$\widetilde{\mathcal{A}}_\alpha$$ in Problem 3.10 is an isomorphism between $$\textbf{X}= L^2\left (\varOmega \right )\times L^2\left (\varOmega \right ) \times \left (H^2\left (\varOmega \right )\cap H^1_0\left (\varOmega \right )\right )$$ and its dual space, with respect to the norm in X given by \begin{equation*} \Vert \left(f,w,u\right)\Vert^2 = \Vert f\Vert^2_F + \Vert w\Vert^2_W + \Vert u\Vert^2_U \end{equation*} with \begin{equation*} \Vert f\Vert^2_{F} = \alpha \Vert f\Vert^2_{L^2\left(\varOmega\right)}, \quad \Vert w\Vert^2_{W} = \frac1\alpha \Vert w\Vert^2_{L^2\left(\varOmega\right)},\quad \Vert u\Vert^2_{U} = \Vert \partial_{n}u\Vert^2_{L^2(\partial\varOmega)} +\alpha \Vert \varDelta u\Vert^2_{L^2\left(\varOmega\right)}. \end{equation*} Furthermore, the following condition number estimate holds: \begin{equation*} \kappa\left(\mathcal{S}^{-1}_\alpha\widetilde{\mathcal{A}}_\alpha\right) \leq \frac{\cos\left({\pi}/{7}\right)}{\sin\left({\pi}/{14}\right)}\approx4.05 \quad \textrm{with} \quad \mathcal{S}_\alpha = \begin{pmatrix} \alpha \,M & & \\ & \frac{1}{\alpha} \, M & \\ & & K_{\partial}+\alpha \, B \end{pmatrix}. \end{equation*} Corollary 3.11 with the preconditioner $$\mathcal{S}_\alpha$$ was originally proven in the study by Mardal et al. (2017), which was the main motivation for this article. In Mardal et al. (2017) convexity of $$\varOmega$$ was required. Remark 3.12 Throughout this section we have discussed optimal control problems with the operator K being the Laplace operator. In principle the discussion can be extended to other state operators as long as the corresponding Schur complements are well defined and positive definite. In order to ensure that the first Schur complement is positive definite we need that either the observation or the control is distributed. The challenging part for extending the results is to ensure that the last Schur complement is positive definite. 
This relies on an a priori estimate of the form \begin{equation*} \|v\|_W \le c\, \|K^{\prime}v\|_{L^2(\varOmega)} \quad \forall \, v \in W, \quad \text{resp.} \quad \|v\|_U \le c\, \|K v\|_{L^2(\varOmega)} \quad \forall \, v \in U, \end{equation*} for problems with distributed observation, resp. distributed control, as provided in Lemma 3.3 for K being the Laplace operator. A discussion of such a priori estimates for other state operators is beyond the scope of this paper. 4. Preconditioners for discretized optimality systems So far we have addressed optimality systems only on the continuous level. In this section we discuss the discretization of optimality systems and efficient preconditioners for the discretized problems. We will focus on Problem 3.10. The same approach also applies to Problems 3.1 and 3.7. Let $$U_h$$ and $$W_h$$ be conforming finite-dimensional approximation spaces for Problem 3.10, that is, \begin{equation*} U_h \subset H^2(\varOmega)\cap H_0^1(\varOmega), \quad W_h \subset L^2(\varOmega). \end{equation*} Applying Galerkin’s principle to Problem 3.10 leads to the following problem. Problem 4.1 Find $$(f_h,w_h,u_h) \in W_h \times W_h \times U_h$$ such that $$\widetilde{\mathcal{A}}_{\alpha,h} \begin{pmatrix} \underline{f}_h \\ \underline{w}_h \\ \underline{u}_h \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \underline{d}_h \end{pmatrix} \quad \textrm{with} \quad \widetilde{\mathcal{A}}_{\alpha,h} = \begin{pmatrix} \alpha M_h & M_h & 0 \\ M_h & 0 & K_h \\ 0 & K_h^T & K_{\partial,h} \end{pmatrix},$$ (4.1) where $$M_h$$, $$K_h$$, $$K_{\partial ,h}$$ are the matrix representations of the linear operators M, K, $$K_\partial$$ on $$W_h$$, $$U_h$$, $$U_h$$ relative to chosen bases in these spaces, respectively, and $$\underline{f}_h$$, $$\underline{w}_h$$, $$\underline{u}_h$$, $$\underline{d}_h$$ are the corresponding vector representations of $$f_h$$, $$w_h$$, $$u_h$$, N′d. 
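To fix ideas, the block layout of (4.1) can be sketched with small stand-in matrices; a minimal sketch, assuming SciPy, where the mass, stiffness and boundary-term matrices are hypothetical 1D surrogates rather than the actual IgA matrices of the paper — only the 3-by-3 block structure and its symmetry are the point here:

```python
import numpy as np
import scipy.sparse as sp

def assemble_block_system(n=8, alpha=1e-2):
    """Assemble a small stand-in for the 3x3 block matrix in (4.1).

    M, K and Kb are toy surrogates (lumped mass, second-order
    difference, identity), NOT the IgA matrices of the paper;
    only the block layout is illustrated.
    """
    h = 1.0 / (n + 1)
    M = h * sp.identity(n, format="csr")                     # mass surrogate
    K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], (n, n)) / h  # -Laplacian surrogate
    Kb = sp.identity(n, format="csr")                        # boundary-term surrogate
    return sp.bmat([[alpha * M, M,    None],
                    [M,         None, K],
                    [None,      K.T,  Kb]], format="csr")

A = assemble_block_system()
print(A.shape)                                  # (24, 24)
print(np.allclose(A.toarray(), A.toarray().T))  # True: symmetric (but indefinite)
```

The zero blocks are passed as `None` so that no storage is allocated for them, mirroring the block tridiagonal sparsity of the discretized optimality system.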
Motivated by Corollary 3.11 we propose the following preconditioner for (4.1): $$\mathcal{S}_{\alpha,h} = \begin{pmatrix} \alpha M_h & 0 & 0 \\ 0 & \frac{1}{\alpha} M_h & 0\\ 0 & 0 & K_{\partial,h} + \alpha \, B_h \end{pmatrix},$$ (4.2) where $$B_h$$ is given by $$\left\langle B_h \underline{u}_h, \underline{v}_h\right\rangle = \left(\varDelta u_h, \varDelta v_h\right)_{L^2\left(\varOmega\right)} \quad \forall\, u_h, v_h \in U_h.$$ (4.3) The operator $$\mathcal{S}_{\alpha }$$ is self-adjoint and positive definite. Therefore, the preconditioner $$\mathcal{S}_{\alpha ,h}$$, obtained from it by Galerkin’s principle, is symmetric and positive definite. Moreover, the preconditioner $$\mathcal{S}_{\alpha ,h}$$ is a sparse matrix, provided the underlying bases of $$U_h$$ and $$W_h$$ consist of functions with local support, which we assume from now on. The application of the preconditioner within a preconditioned Krylov subspace method requires solving linear systems of the form $$\mathcal{S}_{\alpha,h} \, \underline{\textbf{w}}_h = \underline{\textbf{r}}_h$$ (4.4) for given vectors $$\underline{\textbf{r}}_h$$. We use sparse direct solvers on the diagonal blocks (the Schur complements) to solve these systems. Observe that, in general, the preconditioner $$\mathcal{S}_{\alpha ,h}$$ introduced above is different from the Schur complement preconditioner $$\mathcal{S}(\widetilde{\mathcal{A}}_{\alpha,h}) = \begin{pmatrix} \alpha M_h & 0 & 0 \\ 0 & \frac{1}{\alpha} M_h & 0\\ 0 & 0 & K_{\partial,h} + \alpha \, K_h^{\top}M^{-1}_h K_h \end{pmatrix},$$ (4.5) as introduced in (2.4). Therefore, in general, the condition number estimates derived in Section 2 do not hold for $$\mathcal{S}_{\alpha ,h}$$. There is one exception to this rule, provided by the next lemma, which is due to Mardal et al. (2017). We include the short proof for completeness. 
Lemma 4.2 Let $$U_h$$ and $$W_h$$ be conforming discretization spaces for Problem 4.1 with $$\varDelta U_h \subset W_h.$$ (4.6) Then we have \begin{equation*} K_h^{\top}M^{-1}_h K_h = B_h. \end{equation*} Proof. We have \begin{align*} \left\langle K_h^{\top}M^{-1}_h K_h \underline{u}_h, \underline{u}_h\right\rangle & = \sup_{0 \neq w_h\in W_h}\frac{\left\langle K_h \underline{u}_h, \underline{w}_h\right\rangle^{2}}{\left\langle M_h\underline{w}_h, \underline{w}_h\right\rangle} = \sup_{0 \neq w_h\in W_h}\frac{\big(-\varDelta u_h, w_h\big)^{2}_{L^2\left(\varOmega\right)}}{\left(w_h, w_h\right)_{L^2\left(\varOmega\right)}} \\ & = \Vert \varDelta u_h\Vert_{L^2\left(\varOmega\right)}^2 = \left\langle B_h \underline{u}_h, \underline{u}_h\right\rangle, \end{align*} where the supremum is attained for $$w_h = -\varDelta u_h$$, which lies in $$W_h$$ by (4.6). Since both $$K_h^{\top}M^{-1}_h K_h$$ and $$B_h$$ are symmetric matrices, equality follows. So, under the assumptions of Lemma 4.2 we have $$\mathcal{S}_{\alpha ,h} = \mathcal{S}(\widetilde{\mathcal{A}}_{\alpha ,h})$$, and, therefore, it follows that $$\kappa\big(\mathcal{S}_{\alpha,h}^{-1} \, \widetilde{\mathcal{A}}_{\alpha,h}\big) \le \frac{\cos(\pi/7)}{\sin(\pi/14)} \approx 4.05,$$ (4.7) showing that $$\mathcal{S}_{\alpha ,h}$$ is a robust preconditioner in $$\alpha$$ and in h. Remark 4.3 If condition (4.6) does not hold, the matrix $$K_h^{\top}M^{-1}_h K_h$$ will, in general, be dense. This makes the application of the Schur complement preconditioner $$\mathcal{S}(\widetilde{\mathcal{A}}_{\alpha ,h})$$ computationally too expensive, while $$\mathcal{S}_{\alpha ,h}$$ is always sparse. While $$\mathcal{S}_{\alpha ,h}$$ is always positive definite the Schur complement preconditioner $$\mathcal{S}(\widetilde{\mathcal{A}}_{\alpha ,h})$$ is symmetric but, in general, only positive semidefinite. 
However, a simple and mild condition guarantees definiteness: Lemma 4.4 Let $$U_h$$ and $$W_h$$ be conforming discretization spaces for Problem 4.1 with $$U_h \subset W_h.$$ (4.8) Then the matrix $$K_{\partial ,h} + \alpha K_h^{\top}M^{-1}_h K_h$$ is symmetric and positive definite. Proof. If (4.8) holds then it follows that \begin{equation*} \left\langle K_h^{\top}M^{-1}_h K_h \underline{u}_h, \underline{u}_h\right\rangle = \sup_{0 \neq w_h\in W_h}\frac{\left(-\varDelta u_h, w_h\right)^{2}_{L^2\left(\varOmega\right)}}{\left(w_h, w_h\right)_{L^2\left(\varOmega\right)}} \ge \frac{\left(-\varDelta u_h, u_h\right)^{2}_{L^2\left(\varOmega\right)}}{\left(u_h, u_h\right)_{L^2\left(\varOmega\right)}}, \end{equation*} by choosing $$w_h = u_h \in W_h$$. Therefore, if $$\underline{u}_h$$ is in the kernel of $$K_h^{\top}M^{-1}_h K_h$$ then $$u_h \in U_h \subset H^2(\varOmega ) \cap H_0^1(\varOmega )$$ and, by integration by parts, $$\left (\nabla u_h, \nabla u_h\right )_{L^2\left (\varOmega \right )} = \left (-\varDelta u_h, u_h\right )_{L^2\left (\varOmega \right )} = 0$$, which implies $$u_h = 0$$. This lemma shows the importance of condition (4.8), which we will adopt as a condition for our choice of $$U_h$$ and $$W_h$$; see below. Additionally, it allows us to compare the practical preconditioner $$\mathcal{S}_{\alpha ,h}$$ with the theoretical Schur complement preconditioner $$\mathcal{S}(\widetilde{\mathcal{A}}_{\alpha ,h})$$, which would guarantee the derived uniform bound on the condition number but is computationally too costly. Observe that it is required that $$U_h \subset H^2(\varOmega )\cap H_0^1(\varOmega )$$. In order to meet this condition, $$C^1$$ finite element spaces were proposed for $$U_h$$ by Mardal et al. (2017); in particular, the Bogner–Fox–Schmit element on a rectangular mesh was used. Here we advocate instead spline spaces of sufficiently smooth functions as provided in IgA. 
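The identity of Lemma 4.2 is easy to confirm numerically in a toy setting where condition (4.6) holds by construction; a minimal sketch, assuming NumPy, with illustrative 1D sine bases (these bases are stand-ins for the discussion only, not the spline spaces used in the paper):

```python
import numpy as np

# Toy 1D check of Lemma 4.2: if Delta(U_h) is contained in W_h, then
# K_h^T M_h^{-1} K_h = B_h.
x = np.linspace(0.0, 1.0, 20001)
# Trapezoid rule; every integrand below vanishes at x = 0 and x = 1.
dot = lambda f, g: (f * g).sum() * (x[1] - x[0])

def u(j):      # basis of U_h, in H^2 cap H^1_0; mixed so matrices are not diagonal
    return np.sin(j * np.pi * x) + np.sin((j + 1) * np.pi * x)

def lap_u(j):  # exact Laplacian of u_j; lies in span{sin(k pi x) : k <= j+1}
    return (-(j * np.pi) ** 2 * np.sin(j * np.pi * x)
            - ((j + 1) * np.pi) ** 2 * np.sin((j + 1) * np.pi * x))

def w(k):      # basis of W_h = span{sin(k pi x) : k = 1..n_w}
    return np.sin(k * np.pi * x)

n_u, n_w = 4, 5  # n_w = n_u + 1 ensures Delta(U_h) subset W_h, i.e. (4.6)
M = np.array([[dot(w(i + 1), w(j + 1)) for j in range(n_w)] for i in range(n_w)])
K = np.array([[dot(-lap_u(j + 1), w(i + 1)) for j in range(n_u)] for i in range(n_w)])
B = np.array([[dot(lap_u(i + 1), lap_u(j + 1)) for j in range(n_u)] for i in range(n_u)])

S = K.T @ np.linalg.solve(M, K)  # K_h^T M_h^{-1} K_h
print(np.allclose(S, B, rtol=1e-6, atol=1e-6))  # True, up to quadrature error
```

Shrinking `W_h` to `n_w = n_u` breaks condition (4.6), and the identity then fails, which mirrors the situation for the equal-order spline spaces chosen below.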
For the purpose of this paper we restrict ourselves to a simple version of such approximation spaces, which we briefly describe now. Let $$\hat{S}^{p}_{k, \ell }$$ be the space of spline functions on the unit interval (0, 1) which are k-times continuously differentiable and piecewise polynomials of degree p on a mesh of mesh size $$2^{-\ell }$$, obtained by ℓ uniform refinements of (0, 1). The value k = −1 is used for discontinuous spline functions. On $$(0,1)^d$$ we use the corresponding tensor-product spline space, which, for simplicity, is again denoted by $$\hat{S}^{p}_{k,\ell }$$. It will always be clear from the context what the actual space dimension d is. It is assumed that the physical domain $$\varOmega$$ can be parametrized by a mapping $$\textbf{F}: (0,1)^d\rightarrow \varOmega$$ with components $$\textbf{F}_i \in \hat{S}^{p}_{k,\ell }$$. The discretization space $$S^{p}_{k,\ell }$$ on the domain $$\varOmega$$ is defined by \begin{equation*} S^{p}_{k,\ell} := \left\lbrace f \circ \textbf{F}^{-1} : f \in \hat{S}^{p}_{k,\ell}\right\rbrace . \end{equation*} All information on this discretization space is summarized by the triple h = (p, k, ℓ). See the monograph by Cottrell et al. (2009) for more details and more sophisticated discretization spaces in IgA. We propose the following approximation spaces of equal order: \begin{equation*} U_h = \big\{ v_h \in S^{p}_{k,\ell} \colon M_{\partial,h} \underline{v}_h = 0 \big\}, \end{equation*} where $$M_{\partial ,h}$$ is the matrix representation of $$M_\partial$$ on $$S^{p}_{k,\ell }$$, and \begin{equation*} W_h = S^{p}_{k,\ell}. \end{equation*} For this setting condition (4.6) is not satisfied and the analysis of the proposed preconditioner is not covered by the results of Section 2. Condition (4.8) is obviously satisfied, and we will report on promising numerical results in Section 5. Remark 4.5 Condition (4.6) is rather restrictive. 
Even if the geometry mapping F is the identity, the smallest tensor-product spline space for $$W_h$$ for which condition (4.6) holds is the space $$S^{p}_{k-2,\ell }$$ if d ≥ 2. This space has a much higher dimension than the choice $$S^{p}_{k,\ell }$$ from above without significantly improved approximation properties. Remark 4.6 A completely analogous discussion applies to the model problems in Sections 3.1 and 3.2. For example, a sparse preconditioner for the discretized version of Problem 3.7 is given by \begin{equation*} {\mathcal{S}}_{\alpha,h} = \begin{pmatrix} M_h & & \\ & \alpha \, M_{\partial,h} & \\ & & \frac{1}{\alpha}\, K_{\partial,h} + B_h \end{pmatrix}, \end{equation*} motivated by Corollary 3.8. 5. Numerical results In this section we present numerical results for two examples of Problem 3.10. First we consider a two-dimensional example, where the physical domain $$\varOmega$$ is given by its parametrization $$\textbf{F}\colon (0,1)^2 \rightarrow \mathbb{R}^2$$ with \begin{equation*} \textbf{F}(\xi) = \begin{pmatrix} 1+\xi_1 - \xi_1 \xi_2 - \xi_2^2 \\ 2 \xi_1 \xi_2 - \xi_1 \xi_2^2 +2 \xi_2 - \xi_2^2 \end{pmatrix}, \end{equation*} and the prescribed data d are given by $$d(\textbf{x}) = \partial _n (\sin{\left (2\pi x_1\right )} \sin{\left (4\pi x_2\right )})$$ on the boundary $$\partial \varOmega$$. The domain $$\varOmega = \textbf{F}((0,1)^2)$$ is a close approximation of a quarter of an annulus; see Fig. 1. Fig. 1. The two-dimensional domain $$\varOmega$$ is an approximation of a quarter of an annulus. For a fixed polynomial degree p we choose the following discretization spaces of maximal smoothness k = p − 1: \begin{equation*} U_h = \big\{ v_h \in S^{p}_{p-1,\ell} \colon M_{\partial,h} \underline{v}_h = 0 \big\} \quad \textrm{and} \quad W_h = S^{p}_{p-1,\ell}. 
\end{equation*} The resulting linear system of equations \begin{equation*} \widetilde{\mathcal{A}}_{\alpha,h} \, \underline{\textbf{x}}_h = \underline{\textbf{b}}_h \end{equation*} is solved by the preconditioned minimal residual method (MINRES). We present results for the preconditioner $$\mathcal{S}_{\alpha ,h}$$ (see (4.2)) and, for comparison only, for the Schur complement preconditioner $$\mathcal{S}(\widetilde{\mathcal{A}}_{\alpha ,h})$$ (see (4.5)). The iteration starts with the initial guess 0 and stops when $$\Vert \underline{\textbf{r}}_k\Vert \le \epsilon \, \Vert \underline{\textbf{r}}_0\Vert \quad \textrm{with} \quad \epsilon = 10^{-8},$$ (5.1) where $$\underline{\textbf{r}}_k:=\underline{\textbf{b}}_h-\widetilde{\mathcal{A}}_{\alpha ,h}\underline{\textbf{x}}_k$$ denotes the residual of the problem at $$\underline{\textbf{x}}_k$$ and ∥⋅∥ is the Euclidean norm. All computations are done with the C++ library G+Smo (Hofreither et al., 2014). For polynomial degree p = 2, Table 1 shows the total number of degrees of freedom (dof) and the number of iterations for different values of the refinement level ℓ and the parameter $$\alpha$$, when using the (more costly) Schur complement preconditioner $$\mathcal{S}(\widetilde{\mathcal{A}}_{\alpha ,h})$$. 
Table 1. Two-dimensional example; numbers of iterations for the preconditioner $$\mathcal{S}(\widetilde{\mathcal{A}}_{\alpha ,h})$$.

| ℓ | dof | α = 1 | α = 0.1 | α = 0.01 | α = 1e−3 | α = 1e−5 | α = 1e−7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | 264 | 21 | 36 | 33 | 22 | 9 | 5 |
| 4 | 904 | 21 | 35 | 38 | 26 | 9 | 5 |
| 5 | 3 336 | 21 | 35 | 35 | 29 | 10 | 5 |
| 6 | 12 808 | 19 | 34 | 34 | 27 | 9 | 4 |

As predicted by the analysis, the numbers of iterations are bounded uniformly with respect to $$\alpha$$ and ℓ. Observe that the iteration numbers are lower for $$\alpha = 1$$ and for small $$\alpha$$. Further numerical investigations, partly supported by analytic studies, showed that the eigenvalues of the preconditioned system matrix cluster around (some of) the values given in Theorem 2.6 for large as well as for small values of $$\alpha$$. This might explain why the iteration numbers are lower for $$\alpha = 1$$ (which turns out to be already large in this context) and for small values of $$\alpha$$. Table 2 shows the number of iterations when using the sparse preconditioner $$\mathcal{S}_{\alpha , h}$$. 
Table 2. Two-dimensional example; numbers of iterations for the preconditioner $$\mathcal{S}_{\alpha , h}$$.

| ℓ | dof | α = 1 | α = 0.1 | α = 0.01 | α = 1e−3 | α = 1e−5 | α = 1e−7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | 264 | 24 | 38 | 39 | 34 | 23 | 19 |
| 4 | 904 | 25 | 38 | 41 | 36 | 22 | 18 |
| 5 | 3 336 | 25 | 38 | 40 | 34 | 22 | 17 |
| 6 | 12 808 | 25 | 38 | 39 | 31 | 19 | 13 |

The numbers in Table 2 are only slightly larger than the numbers in Table 1 for large $$\alpha$$. For small $$\alpha$$ some additional iterations are required; nevertheless, the method appears to be robust with respect to $$\alpha$$ and the refinement level ℓ. As a second example we consider a three-dimensional variant of the two-dimensional example. The physical domain $$\varOmega$$ is obtained by twisting a cylindrical extension of the two-dimensional domain from the first example. The parametrization is given by the geometry map $$\textbf{F}\colon (0,1)^3 \rightarrow \mathbb{R}^3$$ with \begin{equation*} \textbf{F}(\xi) = \begin{pmatrix} \frac32 \xi_1 \xi_2^3 \xi_3 - \xi_1 \xi_2^3 -\frac32 \xi_1 \xi_2^2 \xi_3 + \xi_1 +\frac12 \xi_2^3 \xi_3 +\frac12 \xi_2^3 + \frac32 \xi_2^2 \xi_3 -\frac32 \xi_2^2 + 1\\[6pt] \xi_2\left(\xi_1 \xi_2^2 - 3 \xi_1 \xi_2 + 3\xi_1 -\frac12 \xi_2^2 +\frac32 \right)\\[6pt] -\xi_2^3 \xi_3 +\frac12 \xi_2^3 + \frac32 \xi_2^2 + \xi_3 \end{pmatrix} \end{equation*} and the prescribed data d are given by $$d(\textbf{x}) = \partial _n (\sin{\left (2\pi x_1\right )} \sin{\left (4\pi x_2\right )} \sin{\left (6\pi x_3\right )})$$ on the boundary $$\partial \varOmega$$. 
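As a quick sanity check, the geometry map of the three-dimensional example can be evaluated directly; a small sketch, assuming NumPy (the two corner values follow by substituting ξ into the formula, and along the edge ξ₂ = ξ₃ = 0 the map reduces to the edge (1 + ξ₁, 0) of the two-dimensional parametrization):

```python
import numpy as np

def F(xi1, xi2, xi3):
    """Geometry map F: (0,1)^3 -> R^3 of the three-dimensional example."""
    x = (1.5 * xi1 * xi2**3 * xi3 - xi1 * xi2**3 - 1.5 * xi1 * xi2**2 * xi3
         + xi1 + 0.5 * xi2**3 * xi3 + 0.5 * xi2**3
         + 1.5 * xi2**2 * xi3 - 1.5 * xi2**2 + 1)
    y = xi2 * (xi1 * xi2**2 - 3 * xi1 * xi2 + 3 * xi1 - 0.5 * xi2**2 + 1.5)
    z = -xi2**3 * xi3 + 0.5 * xi2**3 + 1.5 * xi2**2 + xi3
    return np.array([x, y, z])

# Two reference corners, consistent with the inner and outer radius of the
# quarter-annulus cross-section of the two-dimensional example:
print(F(0.0, 0.0, 0.0))  # [1. 0. 0.]
print(F(1.0, 0.0, 0.0))  # [2. 0. 0.]
```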
For polynomial degree p = 3, Table 3 shows the numbers of iterations for the three-dimensional example (see Fig. 2), using the preconditioner $$\mathcal{S}_{\alpha , h}$$.

Table 3. Three-dimensional example; numbers of iterations for the preconditioner $$\mathcal{S}_{\alpha , h}$$.

| ℓ | dof | α = 1 | α = 0.1 | α = 0.01 | α = 1e−3 | α = 1e−5 | α = 1e−7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2 | 811 | 20 | 35 | 41 | 32 | 19 | 16 |
| 3 | 3 391 | 23 | 35 | 43 | 40 | 22 | 18 |
| 4 | 18 631 | 23 | 35 | 43 | 37 | 22 | 17 |
| 5 | 121 687 | 19 | 33 | 38 | 34 | 20 | 13 |

Fig. 2. The three-dimensional domain viewed from two different angles.

The numbers of iterations for the three-dimensional example are similar to their two-dimensional counterparts. Remark 5.1 The solution strategy proposed in this paper for solving discretized linear optimality systems is a combination of an iterative solver and direct solvers. For the overall system we use a Krylov subspace method. For preconditioning we use sparse direct solvers on the Schur complements $$S_1$$, $$S_2$$ and $$S_3$$ to solve (4.4). One might wonder whether it would be better to use a sparse direct solver directly on the overall system. Note that the matrices $$S_1$$, $$S_2$$ and $$S_3$$ are symmetric and positive definite, whereas the matrix $$\widetilde{\mathcal{A}}_{\alpha ,h}$$ is symmetric but indefinite, which makes a significant difference for sparse direct solvers. 
Indeed, our numerical experiments showed that our strategy outperforms direct solvers applied to the overall indefinite optimality system for mid-sized problems. For large-sized problems the direct sparse solvers eventually failed due to memory limits. Therefore, we believe our hybrid strategy exploits the advantages of both (iterative and direct) methods very well. However, an attractive alternative is to consider iterative methods, like multigrid methods (see, e.g., Trottenberg et al., 2001), also for solving (4.4). There is ongoing work on multigrid methods for biharmonic problems as they occur in (4.4), which we will address in a forthcoming paper for IgA-based discretization methods. 6. Conclusions Two main results have been shown: new existence results for optimality systems in Hilbert spaces and sharp condition number estimates. Typical applications for the new existence results are model problems from optimal control problems with second-order elliptic state equations. For boundary observation and distributed control the existence of the optimal state in $$H^2(\varOmega )$$ follows for polygonal/polyhedral domains without additional convexity assumptions, although the state equation alone does not guarantee the existence of a solution in $$H^2(\varOmega )$$ if the right-hand side lies in $$L^2(\varOmega )$$. For this class of problems, which initially are seen as classical saddle point problems, it turned out that the reformulation as multiple saddle point problems is beneficial. Similarly, for distributed observation and boundary control the existence of the Lagrangian multiplier w in $$H^2(\varOmega )$$ follows for polygonal/polyhedral domains without convexity assumptions. These new existence results were obtained by replacing the standard weak formulation of the second-order problem by a strong or a very weak formulation depending on the type of optimal control problems. 
The new sharp condition number estimates for multiple saddle point problems are to be seen as extensions of well-known sharp bounds for standard saddle point problems. The analysis of saddle point problems in function spaces motivates the construction of sparse preconditioners for discretized optimality systems. The interpretation of standard saddle point problems with primal and dual variables as multiple saddle point problems with possibly more than two types of variables allows the construction of preconditioners based on Schur complements for a wider class of problems. Finally, the required discretization spaces of higher smoothness can be handled with techniques from IgA, which open the door to possible extensions to optimal control problems with other classes of state equations like biharmonic equations.

Acknowledgements The authors would like to thank the anonymous referees for their valuable comments and suggestions, which helped to improve this manuscript.

Funding Austrian Science Fund (S11702-N23).

References

Beik, F. P. A. & Benzi, M. (2017) Iterative methods for double saddle point systems. Technical Report. Atlanta, GA: Department of Mathematics and Computer Science, Emory University.

Benzi, M., Golub, G. H. & Liesen, J. (2005) Numerical solution of saddle point problems. Acta Numer., 14, 1–137.

Berggren, M. (2004) Approximations of very weak solutions to boundary-value problems. SIAM J. Numer. Anal., 42, 860–877.

Cottrell, J. A., Hughes, T. J. R. & Bazilevs, Y. (2009) Isogeometric Analysis: Toward Integration of CAD and FEA. New Jersey: John Wiley.

Gatica, G. N., Gatica, L. F. & Stephan, E. P. (2007) A dual-mixed finite element method for nonlinear incompressible elasticity with mixed boundary conditions. Comput. Methods Appl. Mech. Eng., 196, 3348–3369. 
Gatica, G. N. & Heuer, N. (2000) A dual-dual formulation for the coupling of mixed-FEM and BEM in hyperelasticity. SIAM J. Numer. Anal., 38, 380–400.

Gatica, G. & Heuer, N. (2002) Conjugate gradient method for dual-dual mixed formulations. Math. Comput., 71, 1455–1472.

Grisvard, P. (1992) Singularities in Boundary Value Problems. Berlin: Springer.

Grisvard, P. (2011) Elliptic Problems in Nonsmooth Domains. Philadelphia: SIAM. Reprint of the 1985 hardback edn.

Hofreither, C., Mantzaflaris, A. & Sogn, J. (2014) G+Smo v0.8. Available at http://gs.jku.at/gismo.

Kuznetsov, Y. A. (1995) Efficient iterative solvers for elliptic finite element problems on nonmatching grids. Russian J. Numer. Anal. Math. Modelling, 10, 187–212.

Langer, U., Of, G., Steinbach, O. & Zulehner, W. (2007) Inexact data-sparse boundary element tearing and interconnecting methods. SIAM J. Sci. Comput., 29, 290–314.

Mardal, K.-A., Nielsen, B. F. & Nordaas, M. (2017) Robust preconditioners for PDE-constrained optimization with limited observations. BIT Numer. Math., 57, 405–431.

Mardal, K.-A. & Winther, R. (2011) Preconditioning discretizations of systems of partial differential equations. Numer. Linear Algebra Appl., 18, 1–40.

May, S., Rannacher, R. & Vexler, B. (2013) Error analysis for a finite element approximation of elliptic Dirichlet boundary control problems. SIAM J. Control Optim., 51, 2585–2611.

Murphy, M. F., Golub, G. H. & Wathen, A. J. (2000) A note on preconditioning for indefinite linear systems. SIAM J. Sci. Comput., 21, 1969–1972.

Pearson, J. W. & Wathen, A. J.
(2012) A new approximation of the Schur complement in preconditioners for PDE-constrained optimization. Numer. Linear Algebra Appl., 19, 816–829.

Pestana, J. & Rees, T. (2016) Null-space preconditioners for saddle point systems. SIAM J. Matrix Anal. Appl., 37, 1103–1128.

Rivlin, T. J. (1990) Chebyshev Polynomials: From Approximation Theory to Algebra and Number Theory, 2nd edn. New Jersey: John Wiley.

Schöberl, J. & Zulehner, W. (2007) Symmetric indefinite preconditioners for saddle point problems with applications to PDE-constrained optimization problems. SIAM J. Matrix Anal. Appl., 29, 752–773.

Tröltzsch, F. (2010) Optimal Control of Partial Differential Equations. Theory, Methods and Applications. Providence, RI: American Mathematical Society.

Trottenberg, U., Oosterlee, C. W. & Schüller, A. (2001) Multigrid. With guest contributions by Brandt, A., Oswald, P. & Stüben, K. Orlando: Academic Press.

Appendix A. Chebyshev polynomials of the second kind are defined by the recurrence relation \begin{equation*} U_{0}(x) = 1, \quad U_{1}(x) = 2x, \quad U_{i+1}(x) = 2xU_{i}(x)-U_{i-1}(x) \quad \textrm{for} \ i \ge 1. \end{equation*} Their closed-form representation is given by $$U_j(\cos \theta) = \frac{\sin\left(\left(j+1\right)\theta\right)}{\sin\left(\theta\right)};$$ (A.1) see the book by Rivlin (1990). It immediately follows that the polynomials $$P_j(x) := U_j(x/2)$$ satisfy the related recurrence relation \begin{equation*} P_{0}(x) = 1, \quad P_{1}(x) = x, \quad P_{i+1}(x) = xP_{i}(x)-P_{i-1}(x) \quad \textrm{for} \ i \ge 1, \end{equation*} which shows that the polynomials $$P_j(x)$$ coincide with the polynomials used in the proof of Theorem 2.2. 
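The recurrence, the closed form (A.1) and the relation $$P_j(x) = U_j(x/2)$$ can all be cross-checked numerically; a short sketch, assuming NumPy:

```python
import numpy as np

def U(j, x):
    """Chebyshev polynomials of the second kind via their recurrence."""
    u_prev, u = np.ones_like(x), 2.0 * x
    if j == 0:
        return u_prev
    for _ in range(j - 1):
        u_prev, u = u, 2.0 * x * u - u_prev
    return u

# Closed form (A.1): U_j(cos t) = sin((j+1) t) / sin(t)
theta = np.linspace(0.1, 3.0, 200)  # avoid sin(theta) = 0
for j in range(6):
    assert np.allclose(U(j, np.cos(theta)),
                       np.sin((j + 1) * theta) / np.sin(theta))

# P_j(x) = U_j(x/2) satisfies P_{i+1}(x) = x P_i(x) - P_{i-1}(x)
x = np.linspace(-3.0, 3.0, 100)
P = lambda j: U(j, x / 2.0)
for i in range(1, 5):
    assert np.allclose(P(i + 1), x * P(i) - P(i - 1))
print("ok")
```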
Analogously, it follows that the polynomials $$\bar{P}_j(x) := P_j(x) - P_{j-1}(x) = U_j(x/2) - U_{j-1}(x/2)$$ satisfy the related recurrence relation \begin{equation*} \bar{P}_{0}(x) = 1, \quad \bar{P}_{1}(x) = x-1, \quad \bar{P}_{i+1}(x) = x\bar{P}_{i}(x)-\bar{P}_{i-1}(x) \quad \textrm{for} \ i \ge 1, \end{equation*} which shows that the polynomials $$\bar{P}_j(x)$$ coincide with the polynomials used in the proofs of Theorems 2.2 and 2.6. In the next lemma properties of the roots of the polynomials $$\bar{P}_j(x)$$ are collected, which were used in these theorems. Lemma A1 1. The roots of the polynomial $$\bar{P}_j$$ are given by \begin{equation*} x^{i}_{j} = 2 \, \cos\left(\frac{2i-1}{2j+1}\pi\right), \quad \textrm{for } i= 1,\ldots,j. \end{equation*} 2. For fixed j the root of largest modulus is $$x_j^1$$. Moreover, \begin{equation*} x_j^1> 1 \quad \textrm{and} \quad \bar{P}_i\big(x_j^1\big) > 0 \quad \forall \, i = 0,1,\ldots,j-1. \end{equation*} 3. For fixed j the root of smallest modulus $$x_j^{\ast }$$ is given by $$x_j^{\ast } = x_j^{i^{\ast }}$$ with $$i^{\ast } = [j/2]+1$$, where [y] denotes the largest integer less than or equal to y. Moreover, \begin{equation*} \big|x_j^{\ast}\big| = 2 \sin \left( \frac{1}{2(2j+1)} \pi \right). \end{equation*} Proof. From (A.1) we obtain \begin{equation*} \bar{P}_j(2\cos \theta) = \frac{1}{\sin \theta} \left(\sin\left(\left(j+1\right)\theta\right)-\sin\left(j\theta\right)\right) = \frac{2\sin(\theta/2)}{\sin \theta} \, \cos\left(\frac{2j+1}{2}\theta\right). \end{equation*} Then the roots of $$\bar{P}_j$$ directly follow from the known zeros $$\frac{2i-1}{2} \pi$$ of $$\cos (x)$$. For fixed j, $$x_j^i$$ is a decreasing sequence in i, for which the rest of the lemma can be deduced by elementary calculations. In the proof of Theorem 2.3 a sequence of matrices $$Q_j$$ is introduced whose spectral norms are needed. 
It is easy to verify that $$Q^{-1}_{j} = \left(\begin{array}{ccccc} 1 & -1 & & &\\ -1 & 0 & 1 & &\\ & 1 & 0& \ddots&\\ & &\ddots & \ddots & (-1)^{j-1} \\ & & & (-1)^{j-1} & 0 \end{array}\right).$$ (A.2) By Laplace’s formula one sees that the polynomials $$\det (\lambda \, I - Q_j^{-1})$$ satisfy the same recurrence relation as the polynomials $$\bar{P}_j(\lambda )$$, and therefore we have \begin{equation*} \det \big(\lambda \, I - Q_j^{-1}\big) = \bar{P}_j(\lambda). \end{equation*} Hence, with the notation from above it follows that \begin{equation*} \|Q_j\| = \frac{1}{\big|x_j^{\ast}\big|} = \frac{1}{2 \sin \left( \frac{1}{2(2j+1)} \pi \right)}, \end{equation*} which was used for the calculation of $$\|\mathcal{M}^{-1}\|_{\mathcal{L}(\textbf{X},\textbf{X})}$$ in Theorem 2.3. © The Author(s) 2018. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.

### Journal

IMA Journal of Numerical Analysis, Oxford University Press

Published: May 21, 2018
