Add Journal to My Library
IMA Journal of Numerical Analysis
, Volume Advance Article – Sep 16, 2017

48 pages

/lp/ou_press/parametric-pdes-sparse-or-low-rank-approximations-QkKHkQDK6s

- Publisher
- Oxford University Press
- Copyright
- © The authors 2017. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.
- ISSN
- 0272-4979
- eISSN
- 1464-3642
- D.O.I.
- 10.1093/imanum/drx052
- Publisher site
- See Article on Publisher Site

Abstract We consider adaptive approximations of the parameter-to-solution map for elliptic operator equations depending on a large or infinite number of parameters, comparing approximation strategies of different degrees of nonlinearity: sparse polynomial expansions, general low-rank approximations separating spatial and parametric variables, and hierarchical tensor decompositions separating all variables. We describe corresponding adaptive algorithms based on a common generic template and show their near-optimality with respect to natural approximability assumptions for each type of approximation. A central ingredient in the resulting bounds for the total computational complexity is a new operator compression result in the case of infinitely many parameters. We conclude with a comparison of the complexity estimates based on the actual approximability properties of classes of parametric model problems, which shows that the computational costs of optimized low-rank expansions can be significantly lower or higher than those of sparse polynomial expansions, depending on the particular type of parametric problem. 1. Introduction Complex design, optimization or uncertainty quantification tasks based on parameter dependent families of partial differential equations (PDEs) arise in virtually all branches of science and engineering. Typical scenarios are models whose physical properties—such as diffusivity, transport velocity or domain geometry—are described by a finite number of real parameter values. In certain instances, one may even encounter infinitely many parameters of decreasing influence. This occurs, for instance, in the case of a random stochastic diffusion field represented by an infinite expansion in a given basis. The development and analysis of numerical strategies for capturing the dependence of the PDE on the parameters has been the subject of intensive research efforts in recent years. 1.1 Problem formulation The problems that are addressed in this article have the following general form. Let $$V$$ be a separable Hilbert space. We consider a parametric operator $$A(y)\colon V\to V'$$ of the form \begin{equation}\label{Arep} A(y) := A_0 + \sum_{j\in{\mathcal{I}}} y_j A_j,\quad y\in{Y} := [-1,1]^{\mathcal{I}}\!, \end{equation} (1.1) where $${\mathcal{I}}=\{1,\dots,d\}$$ or $${\mathcal{I}}= {\mathbb{N}}$$ in the finite or infinite dimensional case, respectively. In the infinite dimensional case, we require that the above series converges in $${\cal L}(V,V')$$ for any $$y\in {Y}$$. We assume uniform boundedness and ellipticity of $$A(y)$$ over the parameter domain, that is, \begin{equation} \label{ellip} \langle A(y)v,w\rangle \leq R \|v\|_V\|w\|_V \quad {\rm and} \quad \langle A(y)v,v\rangle \geq r \|v\|_V^2, \quad v,w\in V, \; y\in {Y}, \end{equation} (1.2) for some $$0<r\leq R<\infty$$, which implies in particular that $$A(y)$$ is boundedly invertible uniformly in $$y\in {Y}$$, with \begin{equation} \|A(y)\|_{{\cal L}(V',V)}\leq r^{-1}, \quad y\in {Y}. \end{equation} (1.3) We also consider parametric data $$f\colon {Y} \to V'$$, and for each $$y\in Y$$, we define $$u(y)\in V$$ the solution to the equation \begin{equation}\label{paramgeneral} A(y)\,u(y) = f(y). \end{equation} (1.4) A guiding example is provided by affinely parametrized diffusion problems of the form \begin{equation} \label{paramdiffusion-0} A(y) u := -{\mathrm{div}}\bigl( a(y) \nabla u \bigr) = f , \quad a(y) := \bar a + \sum_{j \in {\mathcal{I}}} y_j {\theta}_j , \end{equation} (1.5) with homogeneous Dirichlet boundary conditions, posed in the weak sense on a spatial domain $$D \subset {\mathbb{R}}^m$$. In this particular case of frequent interest, the right-hand side $$f\in V'$$ is independent of $$y$$. The validity of (1.2) is then usually ensured by the uniform ellipticity assumption \begin{equation}\label{uea} \sum_{j\geq 1} {\lvert{{\theta}_j(x)}\rvert} \leq \bar a(x) - \underline{\alpha}, \quad x\in D, \end{equation} (1.6) for some $$\underline{\alpha}>0$$. We then have $$V=H^{1}_0(D)$$, and the corresponding operators $$A_j \colon V \to V'$$ for $$j \in \{0\}\cup {\mathcal{I}}$$ are defined by \begin{equation*} \langle A_0 u, v \rangle := \int_D \bar{a} \nabla u\cdot\nabla v \,{\rm{d}}x ,\qquad \langle A_j u, v \rangle := \int_D {\theta}_j \nabla u\cdot\nabla v \,{\rm{d}}x ,\quad i \in{\mathcal{I}}, \end{equation*} for $$u,v\in V$$. Thus, $$V$$ is typically a function space defined over some physical domain $$D\subset {\mathbb{R}}^m$$, with $$m=1,2,3$$, for example, the Sobolev space $$H^1_0(D)$$ in the above case of second-order elliptic equations with homogeneous Dirichlet boundary conditions. Therefore, the solution may either be viewed as the Hilbert space-valued map \begin{equation} y\mapsto u(y), \label{solutionmap} \end{equation} (1.7) which acts from $${Y}$$ to $$V$$ or as the scalar-valued map \begin{equation} (x,y)\mapsto u(x,y):=u(y)(x), \end{equation} (1.8) where $$x\in D$$ and $$y\in {Y}$$ are referred to as the spatial and parametric variables. Approximating such solution maps amounts to approximating functions of a large or even infinite number of variables. In applications, one is often interested in specific functionals of the solution. Here, we focus on the basic question of approximating the entire solution map in an appropriate norm. The guiding questions, to be made precise below, are the following: What are the most suitable approximations to cope with the high dimensionality in problems of the form (1.4), and what features of problems (1.4) favour certain approaches over others? Moreover, at what numerical cost can one find these approximations, and how do these costs depend on particular features of the given problem? To address the latter question, for each setting, we construct adaptive computational methods that exhibit near-optimal complexity, in a sense to be made precise below. 1.2 Sparse and low-rank approximability Before addressing any concrete numerical schemes, we discuss the basic concepts of approximations for the solution map $$u$$ in (1.7). We focus on the mean squared error $$\|u-\tilde u\|_{L^2({Y},V)}$$ for an approximation $$\tilde u$$, where \[ \|v\|_{L^2({Y},V)}^2:=\int_{Y} \|v(y)\|_V^2 {\rm \,{d}}\mu(y), \] for a given probability measure $$\mu$$ over $${Y}$$. In what follows, we assume that $$\mu$$ is the uniform probability measure on $${Y}$$. The results carry over, however, to other product measures on $${Y}$$. The following types of approximation make essential use of the tensor product structure of the Bochner space $$L^2({Y},V)=V\otimes L^2({Y})$$, where $$L^2({Y})=L^2({Y},\mu)$$. Sparse polynomial expansions A first approach to approximating $$y\mapsto u(y)$$ is to employ an a priori chosen basis $$\{u_1^{{\rm y}}, \dots, u_n^{{\rm y}}\}\subset L^2(Y)$$ and compute the $$u_i^{{\rm x}}\in V$$ as the corresponding coefficients of this approximation. One prominent example of this approach is the orthogonal polynomial expansion method (see, e.g., Ghanem & Spanos, 1990, 1991; Le Maître & Knio, 2010; Xiu, 2010). In this case, the parametric functions $$u_i^{{\rm y}}$$ are picked from the set of tensorized Legendre polynomials \begin{equation} L_\nu(y)=\prod_{j\geq 1} L_{\nu_j}(y_j), \quad \nu=(\nu_j)_{j\geq 1}, \label{tensorleg} \end{equation} (1.9) with $$(L_k)_{k\geq 1}$$ the univariate Legendre polynomial sequence normalized in $$L^2([-1,1],\frac {\,{\rm d}t} {2})$$. The functions $$(L_\nu)_{\nu\in{\mathcal{F}}}$$ are an orthonormal basis of $$L^2({Y})$$, where $${\mathcal{F}}$$ is $${\mathbb{N}}_0^d$$ in the case $${\mathcal{I}}=\{1,\dots,d\}$$ or the set of finitely supported sequences of non-negative integers in the case $${\mathcal{I}}={\mathbb{N}}$$, that is, \begin{equation} {\mathcal{F}} := \{ \nu\in{\mathbb{N}}_0^{{\mathbb{N}}} \colon \#\,{\mathrm{supp}}\,\, \nu < \infty \}. \end{equation} (1.10) One thus has \begin{equation} \label{Legendre} u(y)=\sum_{\nu\in{\mathcal{F}}} u_\nu L_\nu(y), \quad u_\nu= \int_{Y} u(y)L_\nu(y) {\rm \,{d}}\mu(y). \end{equation} (1.11) Then, one natural choice is the best $$n$$-term approximation $$u_n$$ obtained by restricting the above expansion to the set $${\it{\Lambda}}^{\rm y}_{n}\subset {\mathcal{F}}$$ of indices $$\nu$$ corresponding to the $$n$$ largest $$\|u_\nu\|_V$$, because this set minimizes the error $$\|u-u_n\|_{L^2({Y},V)}$$ among all possible choices of $$n$$-term truncations. This strategy for generating sparse polynomial approximations in the context of parametric PDEs was first introduced and analysed in Cohen et al. (2010, 2011). In practice, the set $${\it{\Lambda}}^{\rm y}_n$$ is not accessible, but provides a benchmark for the performance of algorithms. This representation complexity, however, does not yet determine the resulting computational complexity, because the coefficients $$u_\nu$$ in (1.11) in turn need to be approximated as well. For instance, one may choose a fixed basis $$\{ \psi_\lambda\}_{\lambda\in{\mathcal{S}}}$$ of $$V$$ and expand \[ u_\nu = \sum_{\lambda\in {\mathcal{S}}} {\bf u}_{\lambda,\nu} \psi_\lambda \] in (1.11), where $${\bf u}_{\lambda,\nu}\in{\mathbb{R}}$$. The simplest strategy is to use the same discretization for all $$u_\nu$$ by selecting a finite $${\it{\Lambda}}^{\rm x}$$, which yields the approximation \[ u \approx \sum_{\nu \in {\it{\Lambda}}_{n}^{\rm y}} \left( \sum_{\lambda\in {\it{\Lambda}}^{\rm x}} {\bf u}_{\lambda,\nu} \psi_\lambda \right)\otimes L_\nu. \] Using instead an independently adapted spatial discretization for each $$u_\nu$$ corresponds to adaptive sparse polynomial approximations of the form \begin{equation}\label{fullnterm} u \approx \sum_{(\lambda, \nu) \in {\it{\Lambda}}} {\bf u}_{\lambda,\nu}\, \psi_\lambda\otimes L_\nu, \end{equation} (ASP) with $${\it{\Lambda}} \subset {\mathcal{S}} \times {\mathcal{F}}$$. It is natural to quantify the complexity of such an approximation by the number of activated degrees of freedom $$\#{\it{\Lambda}}$$. Here, one can again ask for best $$N$$-term approximations, now with respect to the fixed basis $$\{\psi_\lambda\otimes L_\nu\}_{\lambda\in{\mathcal{S}},\nu\in{\mathcal{F}}}$$, obtained by minimizing the error over all $${\it{\Lambda}}$$ with $$\#{\it{\Lambda}} = N$$. This now results in a fully discrete approximation. Low-rank approximation More generally, one may consider approximations of the form \begin{equation}\label{basiclrapprox} u \approx u_n :=\sum_{k=1}^n u^{\rm x}_k \otimes u^{\rm y}_k, \end{equation} (1.12) where $$u^{\rm x}_k$$ and $$u^{\rm y}_k$$ are functions of the spatial and parametric variables, respectively. This contains (1.11) as a special case, but we also now allow $$u^{\rm y}_k \in L^2(Y)$$ to be arbitrary functions that are not given a priori, but adapted to the given problem. The shortest expansion of the form (1.12) that achieves a prescribed error in $$L^2({Y},V)$$ is given by truncation of the Hilbert–Schmidt decomposition of $$u$$ interpreted as the operator \begin{equation}\label{Tu} T_u: v \to \int_Y u(x,y)v(y){\rm \,{d}}\mu(y), \end{equation} (1.13) acting from $$L^2({Y})$$ to $$V$$. In this context, we define $${\mathrm{rank}}(u)$$ as the rank of the operator $$T_u$$, so that in particular $$u_n$$ with a representation by $$n$$ separable terms as in (1.12) has $${\mathrm{rank}}(u_n)\leq n$$. The functions $$u_1^{\rm x}, \dots, u_n^{\rm x}$$ and $$u_1^{\rm y}, \dots, u_n^{\rm y}$$ are given by the left and right singular functions, respectively, which yield the optimal rank-$$n$$ approximation of $$u$$ in $$L^2({Y},V)$$. This particular system of basis functions is a natural benchmark, as it minimizes the rank $$n=n(\varepsilon)$$ required to ensure a mean squared accuracy $$\varepsilon$$. However, it is not obvious how to compute sufficiently good approximations of these basis functions at affordable cost, a point to be taken up again later. The methods considered in this article are based on computing approximations of both $$u_k^{\rm x}$$ and $$u_k^{\rm y}$$. A low-rank approximation trying to approximately realizing a truncated Hilbert–Schmidt decomposition would be a first example for this category aiming at meeting the aforementioned benchmark. In this case, the error caused by truncation should ideally be balanced against the error in approximating the unknown basis functions $$u^{\rm x}_k, u^{\rm y}_k$$. Note that there exist alternative approaches for deriving computable expansions of the form (1.12), where only the functions $$u_1^{{\rm x}}, \dots, u_n^{{\rm x}}$$ and their span $$V_n$$ are constructed, which we comment on in Section 1.4. To obtain numerically realizable approximations, we may again use bases of $$V$$ and $$L^2(Y)$$ as in (ASP) and consider expansions for $$u^{\rm x}_k$$ and $$u^{\rm y}_k$$ to arrive at fully discrete low-rank approximations of the form \begin{equation}\label{fulllr} u \approx \sum_{k=1}^n \left( \sum_{\lambda\in {\it{\Lambda}}^{\rm x}_k} {\bf u}^{\rm x}_{k,\lambda} \psi_\lambda \right)\otimes \left(\sum_{\nu\in {\it{\Lambda}}^{\rm y}_k} {\bf u}^{\rm y}_{k,\nu} L_\nu \right)\! , \end{equation} (LR) with $${\it{\Lambda}}^{\rm x}_k \subset {\mathcal{S}}$$, $${\it{\Lambda}}^{\rm y}_k\subset {\mathcal{F}}$$, $$k=1,\ldots,n$$. Hierarchical tensor decompositions One may as well go beyond the Hilbert–Schmidt decomposition (1.12) and consider higher-order low-rank tensor representations that correspond to further decompositions of the factors $$u^{\rm y}_k$$ in (1.12). For simplicity, at this point, let us consider this in the finite-dimensional case $$d<\infty$$, possibly after truncating the expansion (1.1) for $$A(y)$$. Introducing an additional tensor decomposition of the factors $${\bf u}^{\rm y}_k$$, we obtain the general approximations in subspace-based tensor formats, \begin{equation} \label{general} u_n = \sum_{k_{\rm x}=1}^{r_{\rm x}} u_{k_{\rm x}}^{\rm x} \otimes \left(\sum_{k_1=1}^{r_1}\cdots\sum_{k_d=1}^{r_d} {\bf a}_{k_{\rm x},k_1,\ldots,k_d}\bigotimes_{j=1}^d u^{{\rm y},j}_{k_j} \right)\!, \end{equation} (STF) where each $$u_{k_j}^{{\rm y},j}$$ is a function of the individual variable $$y_j$$. The minimal $$r_j$$ such that $$u_n$$ can be represented in the form (STF) are called multilinear ranks of $$u_n$$. We confine our discussion to hierarchical tensor representations, with the tensor train format as a special case (see, e.g., Oseledets & Tyrtyshnikov, 2009; Grasedyck, 2010; Hackbusch, 2012), where the high-order core tensor $${{\bf a}}=({\bf a}_{k_{\rm x},k_1,\ldots,k_d})_{k_{\rm x},k_1,\ldots,k_d}$$ is further decomposed in terms of lower-order tensors, based on matricizations of $${\bf a}$$. For instance, if \begin{equation} \label{ranks} \tilde r_i= {\mathrm{rank}}\big({\bf a}_{(k_0,\ldots, k_i),(k_{i+1},\ldots k_d)}\big), \end{equation} (1.14) one has a factorized representation of the form \begin{equation}\label{htcore} {{\bf a}}_{k_{\rm x},k_1,\ldots,k_d} = \sum_{\ell_1=1}^{\tilde r_1}{\mathbf{M}}^{(1)}_{k_{\rm x},\ell_1} \sum_{\ell_2=1}^{\tilde r_2} {\mathbf{M}}^{(2)}_{\ell_1,k_1,\ell_2} \cdots \sum_{\ell_{d-1}=1}^{\tilde r_{d-1}} {\mathbf{M}}^{(d-1)}_{\ell_{d-2},k_{d-2},\ell_{d-1}} {\mathbf{M}}^{(d)}_{\ell_{d-1},k_{d-1},k_d} \end{equation} (1.15) in terms of the tensors $${\mathbf{M}}^{(i)}$$, $$i=1,\ldots,d$$, of order at most three, and only these low-order tensors need to be stored and manipulated. The representation (STF) contains (ASP) and (LR) as special cases. For instance, to recover a sparse polynomial expansion (ASP), let $$\nu(k_{\rm x})$$, $$k_{\rm x}=1,\ldots,r_{\rm x}$$, be an enumeration of elements of $${\mathbb{N}}^d_0$$, and choose $${\bf a}_{k_{\rm x},k_1,\ldots,k_d} = \delta_{\nu(k_{\rm x}),(k_1,\ldots,k_d)}$$, $$u^{{\rm y},j}_{k_j}= L_{k_j}$$. With $$r_{\rm x} = r_1=\ldots=r_d$$ and diagonal core tensor $${\bf a}$$ having nonzero entries $${\bf a}_{k,k,\ldots,k}=1$$ for $$k = 1,\ldots, r_{\rm x}$$, one obtains representations (LR). 1.3 Guiding questions In the different types of approximation outlined above, the degrees of freedom enter in varying degrees of nonlinearity. More strongly nonlinear approximations (STF) with hierarchical decomposition (1.15) can potentially yield more strongly compressed representations, in the sense that the number of degrees of freedom $$n_{\text{dof}}(\varepsilon)$$ required for a target accuracy $$\varepsilon$$ in $$L^2(Y,V)$$ scales more favourably. Handling this stronger compression in the computation of such representations, however, leads to additional difficulties, and the number of required operations $$n_{\rm op}(\varepsilon)$$ may in fact scale less favourably than $$n_{\text{dof}}(\varepsilon)$$. Here, we aim for algorithms which, for each of the above types of approximation, are guaranteed to achieve any prescribed accuracy $$\varepsilon$$ and which are universal. This means that they do not require a priori knowledge on the approximability of the solution (e.g., on the decay of coefficients), but adjust to such approximability automatically. This goes hand in hand with a mechanism for obtaining a posteriori error bounds, making use only of the given data. This leads us to our first guiding question: (I) For a given parametric problem and approximation format (ASP), (LR) or (STF), can one contrive a universal numerical scheme that can achieve any given target accuracy $$\varepsilon$$, with approximate solutions close to the minimum required representation complexity $$n_{\rm dof} (\varepsilon)$$, and can $$n_{\rm op}$$ be related to $$\varepsilon$$, and hence to $$n_{\rm dof}(\varepsilon)$$? The minimum required representation complexity can be expressed in terms of the intrinsic approximability properties of the parametrized solutions $$u$$ in each of the formats. The corresponding required number of operations also depends on the problem data that are used in the solution process. We construct algorithms, based on a common generic strategy for (ASP), (LR) and (STF), which are near-optimal in this regard. With such algorithms at hand, a natural further question is the following. (II) Which of the approximation types (ASP), (LR) or (STF) is best suited for a given parametric problem, in the sense of leading to the smallest growth of $$n_{\rm op}(\varepsilon)$$ as $$\varepsilon\to 0$$? This amounts to asking for which parametric problems the investment into approximations of higher structural nonlinearity pays off, or conversely, for which problems possible gains in approximation efficiency are offset by more demanding computations. We address this point by analyses of the approximability of model problems, complemented by numerical experiments, with conclusions depending on the particular problem type. For problems with finitely many parameters that are each of comparable influence, hierarchical tensor representations of the form (STF) with (1.15) turn out to be clearly advantageous. In the case of an anisotropic dependence on infinitely many parameters, for representative model problems, we demonstrate that (ASP) can in general yield faster convergence than (LR) or (STF). The particular structure of such infinite parameter expansions also turns out to have a major influence on the efficiency of the adaptive schemes. 1.4 Relation to previous work There are a variety of results on the convergence of sparse polynomial expansions (1.11) (see, e.g., Cohen et al., 2010, 2011; Bachmayr et al., 2017). Furthermore, some estimates are available that include multilevel spatial discretizations and, hence, provide upper bounds for the error of best $$n$$-term approximation (ASP) (see, e.g., Cohen et al., 2011; Dũng, 2015). Concerning our question (II), there are only few specialized results comparing the different approximation formats. In the case of general bivariate functions, a systematic comparison between sparse grids and low-rank approximation is discussed in Griebel & Harbrecht (2014), showing in particular that for Sobolev classes the latter does not bring any improvement. In the case of high-dimensional functions associated with parametric PDEs, possible gains by low-rank approximations have been identified in Lassila et al. (2013) and Bachmayr & Cohen (2017) by exploiting the particular structure of the problem, all concerning the case of finitely many parameters. There are various approaches for generating sparse polynomial expansions, for instance, based on collocation (Babuška et al., 2010; Beck et al., 2012) or adaptive Taylor expansion (Chkifa et al., 2014). Note that these strategies do not currently yield a posteriori error bounds for the computed solutions, and their performance is thus described by a priori estimates that may not be sharp. The adaptive methods proposed in Eigel et al. (2014, 2015), based on finite element discretization for the spatial variable, yields a posteriori error bounds for the full approximations. However, the complexity bounds proved in Eigel et al. (2015) are given only in terms of the resulting finite element meshes. Adaptive schemes using wavelet-based spatial discretizations, which yield approximations of the form (ASP), have been studied by Gittelson (2013, 2014). In this case, bounds for the complete computational complexity are proved which, however, do not fully comply with the approximability properties of the solution. Reduced basis and proper orthogonal decomposition (POD) methods (Kahlbacher & Volkwein, 2007; Rozza et al., 2008; Lassila et al., 2013; Rozza, 2014) correspond to expansions of the form (1.12), where only the spatial basis elements $$u^{\rm x}_k$$ spanning $$V_n$$ are explicitly computed in an offline stage. Then, in an online stage, for any given $$y\in {Y}$$, the approximate solution $$u_r(y)$$ is defined as the Galerkin projection of $$u(y)$$ on the space $$V_n$$. For known variants of these methods, accuracy guarantees in the respective norms (where reduced basis methods usually aim at the error in $$L^\infty$$-norm $$\|v\|_{L^\infty({Y},V)} :=\sup_{y\in {Y}} \|v(y)\|_V$$) require a sufficiently dense sampling of the parameter domain. This becomes prohibitive for large $$d$$, and only one obtains a posteriori bounds for the resulting $$V$$-error in each given $$y\in{Y}$$. In methods based on higher-order tensor representations, instead of sampling in the parameter domain, one also approximates $$u^{\rm y}_k$$ as in (STF), at the price of additional approximability requirements as in (1.15). A variety of schemes have been proposed that operate on fixed discretizations (Khoromskij & Oseledets, 2010; Khoromskij & Schwab, 2011; Kressner & Tobler, 2011; Matthies & Zander, 2012), which do not yield information on the discretization error. Based on Eigel et al. (2014), an adaptive scheme for hierarchical tensor approximation is proposed in Eigel et al. (2017). It provides rigorous a posteriori bounds for the approximation error, but is not proven to converge. 1.5 Novelty of the article and outline Question (I) is addressed in Sections 2–5. A generic algorithm is described in Section 2 based on the work in Bachmayr & Dahmen (2015), which is guaranteed to converge without any a priori assumptions on the solution. Furthermore, it yields rigorous a posteriori error bounds, using only information on the problem data. Suitable specifications cover all the aforementioned types of approximations (ASP), (LR) and (STF). The scheme is formulated in a general sequence space framework, using a discretization of the space $$L^2({Y},V)$$ through a basis with elements of the form $$\psi_\lambda\otimes L_\nu$$. Here, $$\{\psi_\lambda\}_{\mu\in {\mathcal{S}}}$$ is a given Riesz basis of $$V$$ (e.g., a wavelet basis in the case where $$V$$ is a Sobolev space) and $$\{L_\nu\}_{\nu \in{\mathcal{F}}}$$ is the previously described multivariate Legendre basis. The algorithm performs an iteration in the sequence space $$\ell^2({\mathcal{S}}\times {\mathcal{F}})$$. It involves at each step specific routines $${\small{\text{RECOMPRESS}}}$$ and $${\small{\text{COARSEN}}}$$ aiming at controlling the ranks of the current approximation as well as the number of degrees of freedom in each of its factors, respectively. We then describe realizations of this generic algorithm corresponding to two distinct settings. In Section 3, we apply the algorithm for the generation of approximations (STF) in the setting of finitely many parametric variables. In this case, the $${\small{\text{RECOMPRESS}}}$$ routine is based on a truncation of a hierarchical singular value decomposition of the coefficient tensor. We analyse the performance of the algorithms for classes described by the decay of the corresponding singular values and joint sparsity of the corresponding singular vectors. Sections 4 and 5 are devoted to the case of anisotropic dependence on infinitely many parameters in the diffusion problem (1.5). In Section 4, we analyse a specialized version of Algorithm 2.1 producing $$n$$-term sparse Legendre expansions, see (ASP). In this version, the routine $${\small{\text{RECOMPRESS}}}$$ is simply the identity, and hence, Algorithm 2.1 agrees with the adaptive solver developed and analysed in Cohen et al. (2002). In Section 5, we consider, in the same setting as in Section 4, a solver for approximations (LR). In this case, the $${\small{\text{RECOMPRESS}}}$$ routine is based on standard SVD truncation. The corresponding notions of approximability are analogous to those arising in Section 3. A key ingredient in Sections 4 and 5 is the adaptive approximation of the operator based on matrix compression results in Appendix A. Here, we obtain new estimates for wavelet-type multilevel expansions of the parametrized coefficients that are more favourable than what is known for Karhunen–Loève-type expansions. Our further algorithmic developments also require substantially weaker assumptions on the $$A_j$$ in (1.1) than the methods in Eigel et al. (2014, 2017), which require summability of $$({\lVert{A_j}\rVert})_{j\geq 1}$$. By the new operator compression results, we establish, in particular, computational complexity estimates for (ASP), which significantly improve on those of similar schemes in Gittelson (2013, 2014). Based on these complexity estimates, Question (II) is then addressed in Section 6. Although the presented algorithms are guaranteed to converge, the corresponding computational cost can only be quantified in terms of approximability properties of solutions $$u$$. In Section 6, we study the corresponding properties, which are different for each realization of the scheme, in representative examples of parametric problems of the form (1.5). In particular, for a certain class of such problems, we prove that the best $$n$$-term Legendre approximation is already asymptotically near-optimal among all rank-$$n$$ approximations. For other examples, we prove that optimized low-rank approximations can achieve significantly better complexity than best $$n$$-term Legendre approximations. This is illustrated further by numerical tests, demonstrating that these observations also hold for more involved model problems. 2. A generic algorithm In this section, we follow the approach developed in Bachmayr & Dahmen (2015) by first reformulating the general equation (1.4) in a sequence space, and then introducing a generic resolution algorithm based on this equivalent formulation. We first notice that (1.4) may also be written as \begin{equation} Au=f, \label{paramgeneral1} \end{equation} (2.1) where $$A$$ is elliptic and boundedly invertible from $$L^2({Y},V)$$ to $$L^2({Y},V')$$ and can be defined in a weak sense by \begin{equation} \langle Au,v\rangle :=\int_{{Y}}\langle A(y)u(y),v(y)\rangle {\rm \,{d}}\mu(y), \quad u,v\in L^2({Y},V). \end{equation} (2.2) We assume that $$f\in L^2({Y},V')$$, so that there exists a unique solution $$u\in L^2({Y},V)$$. Given a Riesz basis $$\{ \psi_\lambda \}_{\lambda \in {\mathcal{S}}}$$ of $$V$$, we tensorize it with the orthonormal basis $$\{L_\nu\}_{\nu\in {\mathcal{F}}}$$ of $$L^2({Y})$$. The resulting system $$\{\psi_\lambda \otimes L_\nu\}_{(\lambda,\nu)\in {\mathcal{S}} \times {\mathcal{F}}}$$ is a Riesz basis of $$L_2({Y},V)$$, which we now use to discretize (2.1). For this purpose, we define the matrices \begin{equation} {\mathbf{A}}_j := \bigl( \langle A_j \psi_{\lambda'}, \psi_{\lambda}\rangle \bigr)_{\lambda,\lambda' \in {\mathcal{S}}} \quad{\rm and} \quad {\mathbf{M}}_j = \left( \int_{{Y}} y_j L_\nu(y) L_{\nu'}(y) \,{\rm{d}}\mu(y) \right)_{\nu,\nu'\in{\mathcal{F}}}, \end{equation} (2.3) where $${\mathbf{M}}_0$$ is set to be the identity on $$\ell^2({\mathcal{F}})$$, and the right-hand side column vector \begin{equation} {\mathbf{f}} := \big(\langle f, \psi_\lambda\otimes L_\nu\rangle\big)_{(\lambda,\nu)\in {\mathcal{S}}\times{\mathcal{F}}}. \end{equation} (2.4) We thus obtain an equivalent problem \begin{equation} \label{equiv} {\mathbf{A}} {\bf u} = {\mathbf{f}} \end{equation} (2.5) on $$\ell^{2}({\mathcal{S}} \times {\mathcal{F}})$$, where \begin{equation} \label{Arep2} {\mathbf{A}} := \sum_{j\geq 0} {\mathbf{A}}_j \otimes {\mathbf{M}}_j \end{equation} (2.6) and $${\bf u}=\big(u_{\lambda,\nu}\big)_{(\mu,\nu)\in {\mathcal{S}}\times{\mathcal{F}}}$$ is the coordinate vector of $$u$$ in the basis $$\{\psi_\mu \otimes L_\nu\}_{(\mu,\nu)\in {\mathcal{S}}\times{\mathcal{F}}}$$. Regarding $$\nu\in{\mathcal{F}}$$ as the column index of the infinite matrix $${\bf u}=({\bf u}_{\mu,\nu})_{\mu \in {\mathcal{S}},\nu\in{\mathcal{F}}}$$, we denote by $${\bf u}_\nu$$ the columns of $${\bf u}$$, which are precisely the basis representations of the Legendre coefficients $$u_\nu\in V$$. In what follows, we always denote by $${\lVert{\cdot}\rVert}$$ the $$\ell^{2}$$-norm on the respective index set that could be $${\mathcal{S}}$$, $${\mathcal{F}}$$ or $${\mathcal{S}}\times{\mathcal{F}}$$ or the corresponding operator norm when this is clear from the context. Since $$\{\psi_\mu\}_{\mu\in {\mathcal{S}}}$$ is a Riesz basis for $$V$$ we have $$\|u_\nu\|_{V}\sim \|{\bf u}_\nu\|$$ uniformly in $$\nu\in{\mathcal{F}}$$, which together with boundedness and ellipticity of $$A$$ implies that $${\mathbf{A}}$$ is bounded and elliptic on $$\ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$, and that we have \begin{equation} \label{normequiv2} \|{\bf u}\|\sim \|{\mathbf{A}}{\bf u}\|\sim \|Au\|_{L^2({Y},V')}\sim \|u\|_{L^2({Y},V)} \end{equation} (2.7) with uniform constants. On account of (2.7), solving (2.5) approximately up to some target accuracy is equivalent to solving (2.5) in $$\ell^{2}$$ to essentially the same accuracy. As a further consequence, one can find a fixed positive $$\omega$$ such that $${\lVert{\mathbf{I} - \omega {\mathbf{A}}}\rVert}\le \rho <1$$, ensuring that a simple Richardson iteration converges with a fixed error reduction rate per step. This serves as the conceptual starting point for the adaptive low-rank approximation scheme introduced in Bachmayr & Dahmen (2015) as given in Algorithm 2.1. This basic algorithmic template can be used to produce various types of sparse and low-rank approximations, with appropriate choices of the subroutines $${\small{\text{APPLY}}}$$, $${\small{\text{RHS}}}$$, $${\small{\text{COARSEN}}}$$ and $${\small{\text{RECOMPRESS}}}$$. The procedures $${\small{\text{COARSEN}}}$$ and $${\small{\text{RECOMPRESS}}}$$ are independent of the considered $${\mathbf{A}}$$ and $${\mathbf{f}}$$, and satisfy \begin{equation}\label{craccuracy} {\lVert{{\small{\text{COARSEN}}}({\bf v} ; \eta ) - {\bf v} }\rVert} \leq \eta,\quad {\lVert{{\small{\text{RECOMPRESS}}}({\bf v};\eta) - {\bf v}}\rVert} \leq \eta, \end{equation} (2.8) for any $$\eta \geq 0$$ and any compactly supported $${\bf v} \in \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$. Here, $${\small{\text{COARSEN}}}$$ is intended to reduce the support of the sequence $${\bf v}$$, whereas $${\small{\text{RECOMPRESS}}}$$ reduces the rank of $${\bf v}$$ in a low-rank tensor representation. The particular realizations of these routines depend on the dimensionality of the problem and on the type of approximation. We shall use the constructions given in Bachmayr & Dahmen (2015). In the case of the sparse approximations considered in Section 4, $${\small{\text{RECOMPRESS}}}$$ is chosen as the identity, and Algorithm 2.1 essentially reduces to the method analysed in Cohen et al. (2002). The routines $${\small{\text{APPLY}}}$$ and $${\small{\text{RHS}}}$$ are assumed to satisfy, for compactly supported $${\bf v}$$ and any $$\eta>0$$, the requirements \begin{equation}\label{applyaccuracy} {\lVert{{\small{\text{APPLY}}}({\bf v} ; \eta ) - {\mathbf{A}} {\bf v} }\rVert} \leq \eta,\quad {\lVert{{\small{\text{RHS}}}(\eta) - {\mathbf{f}}}\rVert} \leq \eta. \end{equation} (2.9) Their construction depends on not only the type of approximation, but also the specific problem under consideration. These two routines are indeed the main driver of adaptivity in Algorithm 2.1, and a major part of what follows concerns the construction of $${\small{\text{APPLY}}}$$ in different scenarios. It hinges on the compression of matrices by exploiting their near-sparsity in certain basis representations. We use the following notion introduced in Cohen et al. (2001): A bi-infinite matrix $${\mathbf{B}}$$ is called $$s^*$$-compressible if there exist matrices $${\mathbf{B}}_{n}$$ with $$\alpha_{n} 2^n$$ entries per row and column and such that \begin{equation} \label{compress} {\lVert{{\mathbf{B}} - {\mathbf{B}}_{n}}\rVert} \leq \beta_{n} 2^{-s n}, \quad \mbox{ for }\,\, 0 < s < s^*, \end{equation} (2.10) and where the sequences $${\boldsymbol{\alpha}} = (\alpha_n)_{n\in{\mathbb{N}}_0}$$ and $${\boldsymbol{\beta}} = (\beta_n)_{n\in{\mathbb{N}}_0}$$ are summable. Here, we always assume $${\mathbf{B}}_{0} = 0$$. Remark 2.1 As shown in Bachmayr & Dahmen (2015), regardless of the specifications of the routines $${\small{\text{APPLY}}}, {\small{\text{RHS}}}$$, $${\small{\text{COARSEN}}}\,\, and \,\,{\small{\text{RECOMPRESS}}}$$, Algorithm 2.1 terminates after finitely many steps and its output $${\bf u}_\varepsilon$$ satisfies $$\|{\bf u} - {\bf u}_\varepsilon\|\le \varepsilon$$. At this point, we record for later usage a particular feature of $${\mathbf{A}}$$ that arises as a consequence of our choice of tensor product orthogonal polynomials for parameter dependence: the approximate application of $${\mathbf{A}}$$ is facilitated by the fact that the matrices $${\mathbf{M}}_j$$ are bidiagonal. That is, in view of the three-term recurrence relation \begin{equation} \label{threeterm} tL_n(t)= p_{n+1} L_{n+1}(t) + p_{n} L_{n-1}(t), \quad L_{-1}\equiv 0, \end{equation} (2.11) where \begin{equation} \label{recurr_coeff} p_0 = 0, \qquad p_n= \frac{1}{\sqrt{4 - n^{-2}}} , \quad n>0, \end{equation} (2.12) one has $$\int_Y y_j\,L_\nu(y)L_{\nu'}(y)\,{\rm{d}}\mu(y) =0$$ unless $$| \nu_j - \nu'_j | = 1$$ and $$\nu_i = \nu'_i$$ for $$i \neq j$$ providing \begin{equation} \label{threeterm_matrix} ({\mathbf{M}}_j)_{\nu,\nu'}= p_{\nu_j + 1}\delta_{\nu +{\it \,{e}}^j,\nu'}+ p_{\nu_j}\delta_{\nu-{\it \,{e}}^j,\nu'} \end{equation} (2.13) with the Kronecker sequence $$({\it {e}}^j_i)_{i\in {\mathcal{I}}}:=(\delta_{i,j} )_{i\in{\mathcal{I}}} \in {\mathcal{F}}$$. 3. Hierarchical tensor representations in the case of finitely many parameters We begin by considering the setting \begin{equation} \label{cId} {\mathcal{I}}=\{1,\dots,d\}, \end{equation} (3.1) where $${\mathcal{F}} = {\mathbb{N}}_0^d$$ and $${\bf u} \in \ell^{2}({\mathcal{S}} \times {\mathbb{N}}_0\times\cdots\times{\mathbb{N}}_0)$$. Here, we are interested in the case that all coordinates in $${\mathcal{I}}$$ have comparable influence. As illustrated in Section 6, a direct sparse Legendre expansion of $${\bf u}$$ over $${\mathcal{S}}\times{\mathcal{F}}$$ will then in general be infeasible already for moderately large $$d$$. However, one may as well exploit Cartesian product structure in $${\mathcal{F}}$$, regarding $${\bf u}$$ as a higher-order tensor, and use corresponding hierarchical low-rank representations. As we shall detail in what follows, the results of Bachmayr & Dahmen (2015) can be adapted to this problem in a rather straightforward manner. It will be convenient to introduce a numbering of tensor modes as follows: $${\mathcal{G}}_{\rm x} := {\mathcal{S}}$$, $${\mathcal{G}}_1 := {\mathbb{N}}_0, \ldots$$, $${\mathcal{G}}_d := {\mathbb{N}}_0$$. We additionally introduce the notation \[ \hat{\mathcal{I}} := \{{\rm x}\}\cup{\mathcal{I}} . \] The representations of higher-order tensors that we consider are built on the Hilbert–Schmidt case via matricizations: for each nonempty $$M \subset \hat{\mathcal{I}}$$, any $${\bf v} \in \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$ induces a compact operator $$T_{\bf v}^{(M)} \colon \ell^{2}({\Large{\times}}_{i\in \hat{\mathcal{I}}\setminus M} {\mathcal{G}}_i) \to \ell^{2}({\Large{\times}}_{i\in M} {\mathcal{G}}_i)$$. In terms of the left singular vectors $$\{ {\bf U}^{(i)}_k \}_{k \in {\mathbb{N}}}$$ of $$T_{\bf v}^{(\{i\})}$$, $$i\in \hat{\mathcal{I}}$$, we obtain the HOSVD representation (Lathauwer et al., 2000) in the Tucker format (Tucker, 1964, 1966), \begin{equation}\label{tuckerformat} {\bf v} = \sum_{k_{\rm x}=1}^{r_{\rm x}} \sum_{k_1 = 1}^{r_1} \cdots \sum_{k_d=1}^{r_d} {\bf a}_{k_{\rm x},k_1,\ldots,k_d} \bigotimes_{i\in\hat{\mathcal{I}}} {\bf U}^{(i)}_{k_i}. \end{equation} (3.2) Here the tensor $${\bf a} = ({\bf a}_{\mathsf{k}})_{\mathsf{k}\in{\mathbb{N}}^{d+1}}$$ of order $$d+1$$ is referred to as core tensor and $$(r_{\rm x},r_1,\ldots,r_d)$$ as the multilinear ranks of $${\bf v}$$. The hierarchical tensor format (Hackbusch & Kühn, 2009), on which the variant of our scheme described in this section is based, can be interpreted as a further decomposition of $${\bf a}$$ into tensors of order at most three. This decomposition is obtained using further matricizations of the tensor according to a recursive decomposition of the set of modes $$\hat{\mathcal{I}}$$ into a binary tree, which we denote by $$\mathcal{D}$$. For simplicity, we focus in our exposition on linear trees corresponding to factorizations (1.15), where \begin{equation}\label{lindimtree} \mathcal{D} = \bigl\{ \hat{\mathcal{I}}, \{{\rm x}\}, \{1,\ldots,d\}, \{1\}, \{2,\ldots, d\}, \ldots, \{d-1\}, \{d\} \bigr\}. \end{equation} (3.3) For each $$\alpha \in \mathcal{D}$$, the rank of the corresponding matricization $$T^{(\alpha)}_{\bf v}$$ is denoted by $${\mathrm{rank}}_\alpha({\bf v})$$, its singular values by $$(\sigma^{(\alpha)}_k)_{k\in{\mathbb{N}}}$$. Here, $${\mathrm{rank}}_{\hat{\mathcal{I}}}({\bf v}) = 1$$ for all $${\bf v} \neq 0$$, and we set \begin{equation} {\mathrm{rank}}({\bf v}) := \bigl({\mathrm{rank}}_\alpha({\bf v})\bigr)_{\alpha\in\mathcal{D}\setminus\hat{\mathcal{I}}}. \label{multirank} \end{equation} (3.4) In this section, we specialize the generic template Algorithm 2.1 to produce approximate solutions to (2.1) of the form (STF) with core tensor in hierarchical form as in (1.15). More precisely, the output is given in the form of a tensor $${\bf u}_\varepsilon$$ of order $$d+1$$, supported on $${\it{\Lambda}}^\varepsilon_{\rm x}\times {\it{\Lambda}}^\varepsilon_1 \times\cdots\times{\it{\Lambda}}^\varepsilon_d$$ with finite $${\it{\Lambda}}^\varepsilon_i \subset {\mathcal{G}}_i$$. This tensor is given (assuming $$\mathcal{D}$$ as in (3.3)) in the following form: let $$r^\varepsilon_i := {\mathrm{rank}}_{\{i\}}({\bf u}_\varepsilon)$$ for $$i \in \hat {\mathcal{I}}$$ and $$\tilde r^\varepsilon_i := {\mathrm{rank}}_{\{i+1,\ldots, d\}}({\bf u}_\varepsilon)$$ for $$i = 1,\ldots,d-1$$, then \begin{align*} {\bf u}_{\varepsilon, \lambda,\nu_1,\ldots,\nu_d} &= \sum_{k_{\rm x}=1}^{r^\varepsilon_{\rm x}} \cdots \sum_{k_d=1}^{r^\varepsilon_d} \sum_{\ell_1=1}^{\tilde r^\varepsilon_1}\cdots\sum_{\ell_{d-1}=1}^{\tilde r^\varepsilon_{d-1}} {\mathbf{M}}^{(1),\varepsilon}_{k_{\rm x},\ell_1} \,{\mathbf{M}}^{(2),\varepsilon}_{\ell_1,k_1,\ell_2} \cdots {\mathbf{M}}^{(d-1),\varepsilon}_{\ell_{d-2},k_{d-2},\ell_{d-1}} \\ &\quad\times {\mathbf{M}}^{(d),\varepsilon}_{\ell_{d-1},k_{d-1},k_d} \, {\bf U}^{({\rm x}),\varepsilon}_{k_{\rm x}, \lambda} \, {\bf U}^{(1),\varepsilon}_{k_1, \nu_1}\cdots {\bf U}^{(d),\varepsilon}_{k_d, \nu_d}, \quad \lambda\in {\it{\Lambda}}^\varepsilon_{\rm x},\; \nu_i \in {\it{\Lambda}}^\varepsilon_i. \end{align*} The adaptive scheme identifies, in an intertwined fashion, the ranks $$r^\varepsilon_i$$, $$\tilde r^\varepsilon_i$$, the sets $${\it{\Lambda}}^\varepsilon_i$$, as well as the coefficient tensors $${\mathbf{M}}^{(i),\varepsilon}$$ and $${\bf U}^{(i),\varepsilon}$$ of $${\bf u}_\varepsilon$$. The function represented by $${\bf u}_\varepsilon$$ has precisely the form (STF), where \[ u^{{\rm x}}_{k_{\rm x}} = \sum_{\lambda \in {\mathcal{S}}} {\bf U}^{({\rm x}),\varepsilon}_{k_{\rm x}, \lambda} \psi_\lambda, \quad u^{{\rm y},i}_{k_i} = \sum_{n \in {\mathcal{G}}_i} {\bf U}^{(i),\varepsilon}_{k_i, n} L_n . \] The hierarchical format can offer substantially more favourable complexity characteristics for large $$d$$ than (3.2). The left singular vectors of the involved matricizations yield a hierarchical singular value decomposition (Grasedyck, 2010). We refer also to Falcó & Hackbusch (2012), Grasedyck et al. (2013), Hackbusch & Kühn (2009), Hackbusch (2012) and Kolda & Bader (2009) for detailed expositions regarding the finitely supported case (see also Oseledets & Tyrtyshnikov, 2009; Oseledets, 2011 for the related tensor train representation) and to Bachmayr & Dahmen (2015) for analogous results of tensors in sequence spaces, with notation analogous to this article. For quantifying the approximability of tensors on $${\Large{\times}}_{i \in \hat{\mathcal{I}}}{\mathcal{G}}_i$$ in terms of the best selection of finite $${\it{\Lambda}}^\varepsilon_i \subset {\mathcal{G}}_i$$ as above, a pivotal role is played by the quantitites \begin{equation} \label{contr} \pi^{(i)}({\bf v}) \,{:=}\, \bigl(\pi^{(i)}_{\nu_i}({\bf v})\bigr)_{\nu_i\in {\mathcal{G}}_i},\quad \pi^{(i)}_{\mu}({\bf v}) \,{:=}\, \left(\sum_{\substack{\nu\in{\mathcal{F}}\\ \nu_i=\mu}} |{\bf v}_\nu|^2\right)^{1/2} , \quad i \in \hat{\mathcal{I}} = \{{\rm x}, 1, \ldots, d\}. \end{equation} (3.5) They are introduced in Bachmayr & Dahmen (2015) and called contractions in analogy to the terminology in tensor analysis. An efficient evaluation (without any $$d$$-dimensional summations) is possible due to the relation \begin{equation} \label{sumN} \pi^{(i)}_{\mu}({\bf v})= \left(\sum_{k}|{\bf U}^{(i)}_{k,\mu}|^2|\sigma^{(i)}_k|^2\right)^{1/2}, \end{equation} (3.6) where $$\sigma^{(i)}_k$$ are the mode-$$i$$ singular values of $${\bf v}$$. As in our previous notation, we abbreviate $${\mathrm{supp}}_i {\bf v}:= {\mathrm{supp}}\bigl(\pi^{(i)}({\bf v})\bigr)$$, $$i \in \hat{\mathcal{I}}$$. 3.1 Adaptive scheme In this case, we consider Algorithm 2.1 with the routines $${\small{\text{RECOMPRESS}}}$$ and $${\small{\text{COARSEN}}}$$ for the hierarchical format as given in Bachmayr & Dahmen (2015, Remark 15). $${\small{\text{RECOMPRESS}}}$$ is based on a truncation of a hierarchical singular value decomposition up to a prescribed accuracy $$\eta>0$$, which can be ensured based on the $$\ell^2$$-norm of omitted singular values of matricizations. We denote this operation by $$\hat{\mathrm{P}}_{\eta}$$. As shown in Grasedyck (2010), it satisfies the quasi-optimality property \begin{equation}\label{recompress quasiopt} {\lVert{ {\bf v} - \hat{\mathrm{P}}_{\eta}({\bf v})}\rVert} \leq \sqrt{2d-3} \inf \bigl\{ {\lVert{{\bf v} - {\bf w}}\rVert} \colon {{\mathrm{rank}}({\bf w})\leq {\mathrm{rank}}(\hat{\mathrm{P}}_{\eta}({\bf v}))} \bigr\}, \end{equation} (3.7) with the inequality between ranks as defined in (3.4) to be understood componentwise. $${\small{\text{COARSEN}}}$$ retains the degrees of freedom for each mode that correspond to the largest contractions (3.5). Let $$(\mu^*_{i,k})_{k \in {\mathbb{N}}}$$ be such that $$(\pi^{(i)}_{\mu^*_{i,k}}({\bf v}))_{k\in{\mathbb{N}}}$$ is nonincreasing. Denote for $${\it{\Lambda}} \subset {\mathcal{S}}\times{\mathcal{F}}$$ by $${\mathrm{R}_{{\it{\Lambda}}}}{\bf v}$$ the array obtained by retaining all entries of $${\bf v}$$ corresponding to indices in $${\it{\Lambda}}$$, while replacing all others by zero. Given $$\eta >0$$, we define the product set \[ {\it{\Lambda}}(\eta)= {\Large{\times}}_{i \in \hat{\mathcal{I}}}\{ \mu^*_{i,k} \colon k \leq N_i\}, \] where $$N_i$$, $$i\in \hat{\mathcal{I}}$$, are chosen to such that $$\sum_{i \in \hat{\mathcal{I}}} N_i$$ is minimal subject to the condition \begin{equation} \label{hatC} \left( \sum_{i \in \hat{\mathcal{I}}} \sum_{k > N_i\vphantom{\hat{\mathcal{I}}}}|\pi^{(i)}_{\mu^*_{i,k}}({\bf v})|^2 \right)^{1/2} \leq \eta. \end{equation} (3.8) Noting that the left side in (3.8) is an upper bound for $${\lVert{{\bf v} - {\mathrm{R}_{{\it{\Lambda}}(\eta)}}{\bf v}}\rVert}$$, we define $${\small{\text{COARSEN}}}$$ as a numerical realization of $$\hat{\mathrm{C}}_{\eta}{\bf v} := {\mathrm{R}_{{\it{\Lambda}}(\eta)}}{\bf v}$$, for which one has an analogous quasi-optimality property as in (3.7) with constant $$\sqrt{d}$$. Furthermore, $${\mathbf{A}}$$ as defined in (2.6) is in this case of finitely many parameters a finite sum of Kronecker product operators, \[ {\mathbf{A}} = \sum_{j=0}^d {\mathbf{A}}_j \otimes {\mathbf{M}}_j, \] which considerably simplifies the construction of the corresponding routine $${\small{\text{APPLY}}}$$. The action of $${\mathbf{A}}$$ can thus increase each hierarchical rank of its argument at most by a factor of $$d+1$$. Remark 3.1 In contrast to the case considered in Bachmayr & Dahmen (2016), here the Hilbert space $${\mathcal{H}} =V \otimes L^{2}({Y})$$ on which the problem is posed is endowed with a cross norm. As a consequence, the isomorphism that takes $$v\in {\mathcal{H}}$$ to its coefficients $${\bf v}\in \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$ with respect to the tensor product basis is of Kronecker rank one. The original low-rank structure (1.1) of $$A(y)$$ is therefore preserved in the $$\ell^{2}$$-representation (2.6) of the problem. Consequently, the routine $${\small{\text{APPLY}}}$$ that adaptively approximates the action of $${\mathbf{A}}$$ can be obtained following the generic construction given in Bachmayr & Dahmen (2015), provided that the operators $${\mathbf{A}}_j$$ and $${\mathbf{M}}_j$$ acting on each mode have the required compressibility properties. Recall that by (2.13), the infinite matrices $${\mathbf{M}}_j$$ are bidiagonal and, hence, do not require any further approximation. To use the construction of Bachmayr & Dahmen (2015), we thus only need that the operators $${\mathbf{A}}_0, \ldots, {\mathbf{A}}_d$$ acting on the spatial variables are $$s^*$$-compressible. 3.2 Convergence analysis Our complexity results aim at the following type of statements: given a certain approximability of the solution, the algorithm recovers the corresponding convergence rates without their explicit knowledge. To describe these approximability properties, we now recall the definition of approximation classes to quantify the convergence of hierarchical low-rank approximations from Bachmayr & Dahmen (2015), in terms of the hierarchical rank defined by (3.4). Let $$\gamma = \bigl(\gamma(n)\bigr)_{n\in {\mathbb{N}}_0}$$ be positive and strictly increasing with $$\gamma(0)=1$$ and $$\gamma(n)\to\infty$$ as $$n\to\infty$$, for $${\bf v} \in \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$ let \[ {\lvert{{\bf v}}\rvert}_{{\mathcal{A}_\mathcal{H}({\gamma})}} \,{:=}\, \sup_{r\in{\mathbb{N}}_0} \gamma({r})\,\inf_{{\lvert{{\mathrm{rank}}({\bf w})}\rvert}_\infty\leq r} {\lVert{{\bf v} - {\bf w}}\rVert} , \] where $$\inf \{ {\lVert{{\bf v} - {\bf w}}\rVert}\colon {\lvert{{\mathrm{rank}}({\bf w})}\rvert}_\infty\leq r\}$$ – that is, the errors of approximation by hierarchical ranks at most $$r$$ – can be bounded in terms of the hierarchical singular values $$(\sigma^{(\alpha)}_k)_{k\in{\mathbb{N}}}$$ for $$\alpha\in{\cal D}$$. We introduce the approximation classes \[ {\mathcal{A}_\mathcal{H}({\gamma})} \,{:=}\, \bigl\{{\bf v}\in \ell^{2}({\mathcal{S}}\times{\mathcal{F}}) : {\lvert{{\bf v}}\rvert}_{{\mathcal{A}_\mathcal{H}({\gamma})}} <\infty \bigr\} , \quad {\lVert{{\bf v}}\rVert}_{{\mathcal{A}_\mathcal{H}({\gamma})}} \,{:=}\, {\lVert{{\bf v}}\rVert} + {\lvert{{\bf v}}\rvert}_{{\mathcal{A}_\mathcal{H}({\gamma})}}. \] We restrict our considerations to $$\gamma$$ that satisfy \begin{equation*} \rho_\gamma \,{:=}\, \sup_{n\in{\mathbb{N}}} {\gamma}(n)/{\gamma}(n-1)<\infty, \end{equation*} which corresponds to a restriction to at most exponential growth; our considerations would still apply for faster growth, but lead to less sharp results. For an approximation $${\bf v}$$ of bounded support to $${\bf u}$$, the number of nonzero coefficients $$\#{\mathrm{supp}}_i {\bf v}$$ required in each tensor mode to achieve a certain accuracy depends on the best $$n$$-term approximability of the sequences $$\pi^{(i)}({\bf u})$$. This approximability by sparse sequences is quantified by the classical approximation classes$${\mathcal{A}^s} = {\mathcal{A}^s}(\mathcal{J})$$, where $$s>0$$ and $$\mathcal{J}$$ is a countable index set, comprised of all $${\bf w}\in \ell_2(\mathcal{J})$$ for which the quasi-norm \begin{equation} \label{Asdef} {\lVert{\mathbf{{\bf w}}}\rVert}_{{\mathcal{A}^s}(\mathcal{J})} := \sup_{N\in{\mathbb{N}}_0} (N+1)^s \inf_{\substack{{\it{\Lambda}}\subset \mathcal{J}\\ \#{\it{\Lambda}}\leq N}} {\lVert{{\bf w} - {\mathrm{R}_{{\it{\Lambda}}}} {\bf w}}\rVert} \end{equation} (3.9) is finite. In particular, if $$\pi^{(i)}({\bf u}) \in {\mathcal{A}^s}(\mathcal{G}_i)$$ then these sequences can be approximated within accuracy $$\eta$$ by finitely supported sequences with $${\cal \rm O}(\eta^{-1/s})$$ nonzero entries. In what follows, we do not explicitly specify the index set in the spaces $${\mathcal{A}^s}$$ when this is clear from the context. We analyse the complexity of the algorithm under the following benchmark assumptions, see the discussion in Section 6.1. Assumption 3.2 For the hierarchical tensor approximation in the case (3.1) of $$d$$ parametric variables, we assume the following: (i) $$\pi^{(i)}({\bf u}), \pi^{(i)}({\mathbf{f}}) \in {\mathcal{A}}^{s}({\mathcal{G}}_i)$$, $$i\in\hat{\mathcal{I}}$$, for an $$s>0$$. (ii) $${\bf u}, {\mathbf{f}} \in {\mathcal{A}}_{\mathcal{H}}(\gamma)$$, where $$\gamma(n) \,{:=}\, {\it \,{e}}^{\bar c n^{1/\bar b}}$$ with $$\bar b\geq 1$$, $$\bar c>0$$. (iii) The $${\mathbf{A}}_j$$, $$j \in \hat{\mathcal{I}}$$, are $$s^*$$-compressible for an $$s^*>s$$, and hence there exist matrices $${\mathbf{A}}_{j,n}$$ with $$\alpha_{j,n} 2^n$$ entries per row and column and such that $${\lVert{{\mathbf{A}}_j - {\mathbf{A}}_{j,n}}\rVert} \leq \beta_{j,n} 2^{-s n}$$, and where the sequences $${\boldsymbol{\alpha}}_j = (\alpha_{j,n})_{n\in{\mathbb{N}}_0}$$ and $${\boldsymbol{\beta}}_j = (\beta_{j,n})_{n\in{\mathbb{N}}_0}$$ are summable. We will use the above assumptions as a reference point for the scaling with respect to $$\varepsilon$$ of the computational complexity. Note that Assumption 3.2(i),(ii) mean in particular that best $$N$$-term approximations of $$\pi^{(i)}({\bf u})$$ converge at least as $${\mathcal{O}} (N^{-s})$$ for $$i\in\hat{\mathcal{I}}$$ and low-rank approximations $${\bf u}_n$$ of $${\bf u}$$ with $${\lvert{{\mathrm{rank}}({\bf u}_n)}\rvert}_\infty\,{\leq}\,n$$ converge as $${\mathcal{O}}(\!{\it {e}}^{-\bar c n^{1/\bar b}})$$. Assumption 3.2(iii) needs to be established for each given problem. Under these conditions, it is possible to realize a routine $${\small{\text{RHS}}}$$ that satisfies the following conditions: for sufficiently small $$\eta>0$$ and $${\mathbf{f}}_\eta:={\small{\text{RHS}}}(\eta)$$, \begin{equation}\label{rhsass1} \begin{gathered} \sum_{i\in \hat{\mathcal{I}}} \#{\mathrm{supp}}_i ({\mathbf{f}}_\eta) \lesssim d \eta^{-\frac {1}{s}} \left( \sum_{i\in \hat{\mathcal{I}}} {\lVert{\pi^{(i)}({\mathbf{f}})}\rVert}_{{\cal A}^s} \right)^{\frac {1}{s}}, \quad \sum_{i\in \hat{\mathcal{I}}} {\lVert{\pi^{(i)}({\mathbf{f}}_\eta)}\rVert}_{{\cal A}^s} \lesssim d^{1+\max\{1,s\}} \sum_{i\in \hat{\mathcal{I}}} {\lVert{\pi^{(i)}({\mathbf{f}})}\rVert}_{{\cal A}^s}, \\ {\lvert{{\mathrm{rank}}({\mathbf{f}}_\eta)}\rvert}_\infty \lesssim \bigl(d^{-1} \ln ({\lVert{{\mathbf{f}}}\rVert}_{{\mathcal{A}}_{\mathcal{H}}(\gamma)}/\eta) \bigr)^{\bar{b}}, \end{gathered} \end{equation} (3.10) with hidden constants that do not depend on $$d$$. Such approximations can be obtained, for instance, by combining $${\small{\text{COARSEN}}}$$ and $${\small{\text{RECOMPRESS}}}$$ with parameters adjusted as in Bachmayr & Dahmen (2015, Theorem 7). Assuming full knowledge of $${\mathbf{f}}$$, $${\small{\text{RHS}}}$$ can be realized such that the required number of operations is bounded, with $$C>0$$ independent of $$d$$, by \begin{equation}\label{rhsass2} C\left( d {\lvert{{\mathrm{rank}}({\mathbf{f}}_\eta)}\rvert}_\infty^3 + {\lvert{{\mathrm{rank}}({\mathbf{f}}_\eta)}\rvert}_\infty \sum_{i \in \hat{\mathcal{I}}} \#{\mathrm{supp}}_i({\mathbf{f}}_\eta) \right)\!, \end{equation} (3.11) and we shall make this idealized assumption in the subsequent analysis. Note that these requirements greatly simplify when the corresponding right-hand side $$f$$ is independent of the parametric variable. To compare different parametric dimensionalities $$d$$ in the complexity bounds, we additionally need a specific reference family of $$d$$-dependent problems. We introduce the following model assumptions, which we shall also consider in more detail for a concrete class of problems in Section 6. Assumption 3.3 For the quantities in Assumption 3.2, in addition, let the following hold: (i) There exist $$C_1, c_1>0$$ independent of $$d$$ such that \begin{align*} &\max\bigl\{{\lVert{\pi^{({\rm x})}({\bf u})}\rVert}_{{\mathcal{A}}^s}, {\lVert{\pi^{(1)}({\bf u})}\rVert}_{{\mathcal{A}}^s}, \ldots,{\lVert{\pi^{(x)}({\mathbf{f}})}\rVert}_{{\mathcal{A}}^s}, {\lVert{\pi^{(1)}({\mathbf{f}})}\rVert}_{{\mathcal{A}}^s},\ldots,\\ &\quad\bar c^{-1} ,\, {\lVert{{\bf u}}\rVert}_{{\mathcal{A}_\mathcal{H}({\gamma})}}, {\lVert{{\mathbf{f}}}\rVert}_{{\mathcal{A}_\mathcal{H}({\gamma})}}\bigr\}\leq C_1 d^{c_1} . \end{align*} (ii) There exists $$C_2>0$$ independent of $$d$$ such that \[ \max\bigl\{ \bar b,\, {\lVert{{\mathbf{A}}_{\rm x}}\rVert}, {\lVert{{\mathbf{A}}_1}\rVert}, \ldots,{\lVert{\boldsymbol{\alpha}_{\rm x}}\rVert}_{\ell^1}, {\lVert{\boldsymbol{\alpha}_1}\rVert}_{\ell^1}, \ldots, {\lVert{\boldsymbol{\beta}_{\rm x}}\rVert}_{\ell^1}, {\lVert{\boldsymbol{\beta}_1}\rVert}_{\ell^1}, \ldots\} \leq C_2. \] It needs to be emphasized that Algorithm 2.1 does not require any knowledge on the approximability of $${\bf u}$$ stated in Assumptions 3.2 and 3.3; these merely describe a model case for complexity bounds and their dependence on $$d$$. Recall from Remark 2.1 that Algorithm 2.1 always produces $${\bf u}_\varepsilon$$, satisfying $${\lVert{{\bf u}-{\bf u}_\varepsilon}\rVert}\leq \varepsilon$$ in finitely many steps. Theorem 3.4 Let Assumption 3.2 hold, let $$\alpha>0$$ and let $$\kappa_1,\kappa_2,\kappa_3$$ in Algorithm 2.1 be chosen as \begin{gather*} \kappa_1 = \bigl( 1 + (1 +\alpha)(\sqrt{d} + \sqrt{2d-3} + \sqrt{d(2d-3)}) \bigr)^{-1}, \\ \kappa_2 = \sqrt{2d-3}(1+\alpha) \kappa_1,\quad \kappa_3 = \sqrt{d}(1+\sqrt{2d-3})(1+\alpha) \kappa_1. \end{gather*} Then for each $$\varepsilon>0$$ with $$\varepsilon < \varepsilon_0$$, the approximation $${\bf u}_\varepsilon$$ produced by Algorithm 2.1 satisfies \begin{equation}\label{htrankbound} {\lvert{{\mathrm{rank}}({\bf u}_\varepsilon)}\rvert}_\infty \leq \, \bigl( \bar c^{-1} \ln\bigl[ 2(\alpha \kappa_1)^{-1} \rho_{\gamma} \,{\lVert{{\bf u}}\rVert}_{{\mathcal{A}_\mathcal{H}({\gamma})}}\,\varepsilon^{-1}\bigr] \bigr)^{ \bar b} \lesssim \biggl(\frac{{\lvert{\ln \varepsilon}\rvert} + \ln d}{\bar c}\biggr)^{\bar b} \end{equation} (3.12) as well as \begin{equation}\label{htsuppbound} \sum_{i\in\hat{\mathcal{I}}} \#{\mathrm{supp}}_i({\bf u}_\varepsilon) \lesssim d^{1+\frac {1}{s}}\left( \sum_{i\in\hat{\mathcal{I}}} {\lVert{\pi^{(i)}({\bf u})}\rVert}_{{\cal A}^s} \right)^{\frac {1}{s}} \varepsilon^{-\frac {1}{s}}. \end{equation} (3.13) Let in addition Assumption 3.3 hold and let $${\small{\text{RHS}}}$$ satisfy (3.10), (3.11), then there exist $$c,C>0$$, such that the number of required operations is bounded by \begin{equation}\label{htcomplexity} C d^{c\ln d} {\lvert{\ln \varepsilon}\rvert}^{2\bar b} \varepsilon^{-\frac {1}{s}}, \end{equation} (3.14) where $$c$$ and $$C$$ depend on $$\alpha, \rho,\omega, s$$ and on the constants $$C_1,c_1,C_2$$ in Assumption 3.3. Proof. The validity of (3.12) and (3.13) follows by Bachmayr & Dahmen (2015, Theorem 7), which can be immediately applied to the result of line 11 in Algorithm 2.1. Concerning (3.14), we can apply Bachmayr & Dahmen (2015, Theorem 8) (with $$R_i = d$$ and uniform constants $$\hat C^{(i)}_{\hat \alpha}$$, $$\hat C^{(i)}_{\hat \beta}$$ and $$\hat C^{(i)}_{\tilde{\mathbf{A}}}$$ in the notation used there) to obtain, for $${\bf w}_\eta\,{:=}\, {\small{\text{APPLY}}}({\bf v};\eta)$$, \[ \#{\mathrm{supp}}_i ({\bf w}_\eta) \lesssim d^{1+s^{-1}} \eta^{-\frac {1}{s}} \left(\sum_{j \in {\mathcal{I}}_{\rm x}} {\lVert{\pi^{(i)}({\bf v})}\rVert}_{{\mathcal{A}}^s} \right)^{\frac {1}{s}} , \quad {\lVert{\pi^{(i)}({\bf w}_\eta)}\rVert}_{{\mathcal{A}}^s} \lesssim d^{1+s} {\lVert{\pi^{(i)}({\bf v})}\rVert}_{{\mathcal{A}}^s}, \] as well as $${\mathrm{rank}}({\bf w}_\eta)\leq (d+1) {\mathrm{rank}}({\bf v})$$. With these estimates, (3.14) follows exactly as in Bachmayr & Dahmen (2015, Theorem 9). □ For each fixed $$d$$, the produced solution $${\bf u}_\varepsilon$$ thus requires a number of parameters that is proportional to the optimal one for this type of approximation. Taking into account that in the operation count, the best approximation ranks of order $${\mathcal{O}}({\lvert{\ln \varepsilon}\rvert}^{\bar b})$$ enter at least quadratically due to orthogonalizations required in the algorithm, the number of operations in (3.14) also scales optimally with respect to $$\varepsilon$$. The dependence of the constant on the parametric dimension is subexponential, since $$d^{c\ln d} = {\it \,{e}}^{c (\ln d)^2}$$. 4. Spatial-parametric sparse approximation We now turn to the case $${\mathcal{I}}={\mathbb{N}}$$, that is, problems involving countably many parameters $$(y_j)_{j\geq 1}$$ that have decreasing influence as $$j$$ increases. Here, we consider problems of the type (1.5), \begin{equation} \label{thetaj} a(y) = \bar a + \sum_{j = 1}^\infty y_j {\theta}_j , \end{equation} (4.1) under the uniform ellipticity assumption (1.6) on $$a$$. This variant of Algorithm 2.1 is similar to the scheme proposed in Gittelson (2014), following the approach of Cohen et al. (2001, 2002). In this section, we consider a version of Algorithm 2.1 that produces $$n$$-term approximations to $$u\in L^2({Y},V)$$ in terms of the wavelet–Legendre tensor product basis $$\{\psi_\lambda\otimes L_\nu\}_{\lambda\in{\mathcal{S}},\nu\in{\mathcal{F}}}$$. That is, the approximation that we seek in this case is of the form (ASP), that is, \begin{equation}\label{fullsparse} u \approx u_n = \sum_{(\lambda,\nu)\in {\it{\Lambda}}_n} {\bf u}_{\lambda \nu} \psi_\lambda\otimes L_\nu, \end{equation} (4.2) where we aim to identify $${\it{\Lambda}}_n$$, which yields an error close to that of the best $$n$$-term approximation in this basis. Here, $${\small{\text{COARSEN}}}$$ performs a standard coarsening operation on a sequence, and we set $${\small{\text{RECOMPRESS}}}({\bf v};\eta) \,\,{:=}\,\, {\bf v}$$ for any $$\eta$$. The scheme thus reduces to the adaptive method of Cohen et al. (2002), which has been considered for this particular type of approximation of parametric PDEs also in Gittelson (2014). The key ingredient that remains to be described is the adaptive application of $${\mathbf{A}}$$ to representations of the form (4.2) based on its $$s^*$$-compressibility. Let $${\bf u} \in {\mathcal{A}}^s({\mathcal{S}}\times{\mathcal{F}})$$ and let $${\mathbf{A}}$$ be $$s^*$$-compressible with $$s^*>s$$ according to (2.10). Then it follows by Cohen et al. (2001, Proposition 3.8) that $${\mathbf{f}} \in {\mathcal{A}}^s$$, hence we can construct $${\small{\text{RHS}}}$$, satisfying \[ \#{\mathrm{supp}} ({\small{\text{RHS}}}(\eta)) \lesssim \eta^{-\frac {1}{s}} {\lVert{{\mathbf{f}}}\rVert}_{{\mathcal{A}}^s}^{\frac {1}{s}},\quad {\lVert{{\small{\text{RHS}}}(\eta)}\rVert}_{{\mathcal{A}}^s} \lesssim {\lVert{{\mathbf{f}}}\rVert}_{{\mathcal{A}}^s}. \] Moreover, by the standard construction of $${\small{\text{APPLY}}}$$ in Cohen et al. (2001, Corollary 3.10) based on the $$s^*$$-compressibility of $${\mathbf{A}}$$, the results in Cohen et al. (2002) yield the following complexity bound for the present realization of Algorithm 2.1. Theorem 4.1 Let $${\bf u}\in{\mathcal{A}}^s$$ and let $${\mathbf{A}}$$ be $$s^*$$-compressible with $$0<s<s^*$$. Then for any given $$\varepsilon>0$$, the approximation $${\bf u}_\varepsilon$$ produced by the above variant of Algorithm 2.1 operating on approximations of the form (4.2) satisfies \[ \# {\mathrm{supp}} ({\bf u}_\varepsilon ) \lesssim \varepsilon^{-\frac {1}{s}} {\lVert{{\bf u}}\rVert}_{{\mathcal{A}}^s}^{\frac {1} {s}}, \quad {\lVert{{\bf u}_\varepsilon}\rVert}_{{\mathcal{A}}^s} \lesssim {\lVert{{\bf u}}\rVert}_{{\mathcal{A}}^s}, \] and the number of operations is bounded up to a multiplicative constant by $$ 1 + \varepsilon^{-\frac {1}{s}} {\lVert{{\bf u}}\rVert}_{{\mathcal{A}}^s}^{\frac {1}{s}} $$. We next consider the compressibility of $${\mathbf{A}}$$, which determines the range of $$s$$ for which Theorem 4.1 yields optimality, and a corresponding procedure $${\small{\text{APPLY}}}$$. In Section 6.2, we consider in further detail for which values of $$s$$ one can indeed expect $${\bf u} \in {\mathcal{A}}^s$$. 4.1 Adaptive operator application Any numerical scheme $${\small{\text{APPLY}}}$$ necessarily involves a truncation of the series (4.1). Defining for each non-negative integer $$M$$ the corresponding truncation error \begin{equation} \label{eM} e_{M} := {\left\lVert{ \sum_{j > M } {\mathbf{A}}_j \otimes {\mathbf{M}}_j }\right\rVert} \end{equation} (4.3) of replacing $${\mathbf{A}}$$ by $$\sum_{j=1}^M {\mathbf{A}}_j \otimes {\mathbf{M}}_j$$, where $$e_0 = {\lVert{{\mathbf{A}}}\rVert}$$, the decay of $$e_M$$ describes the approximability of $${\mathbf{A}}$$. We will be concerned with algebraic rates \begin{equation} \label{S} e_M\le C M^{-S}, \quad M\in {\mathbb{N}}, \end{equation} (4.4) where $$C, S>0$$ are fixed constants. Note that, in particular, our further developments do not require summability of $$({\lVert{{\theta}_j}\rVert}_{L^\infty})_{j\geq 1}$$ as assumed, e.g., in Eigel et al. (2014, 2017). A first limitation to the $$s^*$$-compressibility of $${\mathbf{A}}$$ lies in the decay of the truncation errors (4.4), which arise from replacing $${\mathbf{A}}$$ by a finite sum. This amounts to approximating all, but finitely many $${\mathbf{A}}_j$$ by zero. A second limitation is the compressibility of the remaining $${\mathbf{A}}_j$$, which depends on the particular expansion system $$({\theta}_j)_{j\in{\mathbb{N}}}$$. As we show next, a favourable $$s^*$$-compressibility result for $${\mathbf{A}}$$ (almost matching the truncation error decay (4.4)) can be obtained when $$({\theta}_j)_{j\in{\mathbb{N}}}$$ has multiscale structure, as summarized in the following set of assumptions. In Section 4.2, our findings are compared with previous results that hold under more generic assumptions. Assumption 4.2 Let $$\{\xi_\mu\}_{\mu \in {\it{\Lambda}}}$$ be a system of compactly supported multilevel basis functions with $$\operatorname{diam}({\rm supp}(\xi_\mu))\sim 2^{-|\mu|}$$ and $${\lVert{\xi_\mu}\rVert}_{L^\infty(D)} = 1$$. With $$(\mu_j)_{j\geq 1}$$ an enumeration of $${\it{\Lambda}}$$ by increasing level and some fixed $$\alpha>0$$, let \begin{equation} \label{multiscaleexpansion} {\theta}_j = c_{\mu_j} \xi_{\mu_j}, \quad \text{ where $c_{\mu_j} = 2^{-\alpha {\lvert{\mu_j}\rvert}}$.} \end{equation} (4.5) To simplify notation, let $$c_{\mu_0}\,{\,{:=}\,}\,1$$, $$\xi_{\mu_0}\,{\,{:=}\,}\,\bar a$$, and $${\lvert{\mu_0}\rvert}\,{\,{:=}\,}\,0$$. Note that for what follows, it would in fact suffice to assume $$c_{\mu} \sim 2^{-\alpha {\lvert{\mu}\rvert}}$$, with a constant that is uniform over $${\it{\Lambda}}$$, but we assume equality to simplify the exposition. Under Assumption 4.2, \begin{equation} \label{multiscale eMbound} e_M \leq \sup_{y\in Y} {\left\lVert{\sum_{j>M} y_j \theta_j }\right\rVert}_{L^\infty(D)} \leq \sum_{\ell \geq \max\{ {\lvert{\mu_j}\rvert} \colon j \leq M\}} 2^{-\alpha\ell} \lesssim 2^{-\alpha \max\{ {\lvert{\mu_j}\rvert} \colon j \leq M\}} \lesssim M^{-\frac{\alpha}{m}}, \end{equation} (4.6) and we thus obtain (4.4) with $$S = \alpha/m$$. We now give a new result for the compressibility of $${\mathbf{A}}$$ arising from a wavelet-type parametrization as in (4.5). As we shall see, making use of a multilevel structure in the parametrization, one can obtain substantially better compressibility of $${\mathbf{A}}$$ than under the more generic assumptions used in Gittelson (2014). The result is based on compressibility properties of the corresponding matrices $${\mathbf{A}}_j$$. These are analysed in Appendix A, where the assumptions of the following Proposition are established under conditions on the regularity of $$\xi_\mu$$ and of the spatial wavelets. Proposition 4.3 Let $$\{ {\theta}_j \}_{j\geq 1}$$ and $$c_{\mu_j}$$ be as in Assumption 4.2, and let us assume that there exist a $$\tau > \frac{\alpha}{m}$$ and matrices $${\mathbf{A}}_{j,n}$$, $$n\in {\mathbb{N}}_0$$, where $${\mathbf{A}}_{j,0} = 0$$, with the following properties: (i) One has \begin{equation} \label{compress2} \|{\mathbf{A}}_j - {\mathbf{A}}_{j,n}\|\lesssim c_{\mu_j} 2^{-\tau n},\quad n\in{\mathbb{N}}_0, \end{equation} (4.7) where the hidden constant is independent of $$j,n$$. (ii) The number of nonvanishing entries in each column of $${\mathbf{A}}_{n,j}$$ does not exceed a uniform constant multiple of $$\bigl(1+|\mu_j|^q\bigr)2^n$$, for some $$q\geq 1$$. Then $${\mathbf{A}}$$ is $$s^*$$-compressible with \begin{equation}\label{sparsecomprlimit} s^* = \frac{\alpha}{m} \frac{2\tau}{1+2\tau}. \end{equation} (4.8) Specifically, it is shown in Appendix A that the above assumptions can be realized for arbitrarily large $$\tau$$ by choosing the functions $$\xi_\mu$$ and the spatial wavelets sufficiently smooth, the latter having sufficiently many vanishing moments. By the above result, we thus obtain that $${\mathbf{A}}$$ is then $$s^*$$-compressible, where $$s^*<\alpha/m$$ comes as close to $$\alpha/m$$ as one wishes when $$\tau$$ is suitably large. As discussed in further detail in Section 6, this means that the $$n$$-term approximability of $${\bf u}$$ can be essentially fully exploited by the adaptive scheme. Proof. We construct approximations $${\mathbf{A}}_{\mathbf{n}}$$ of $${\mathbf{A}}$$ by choosing sequences $$\mathbf{n} = (n_{j})_{j\geq 0}$$ of bounded support and defining $${\mathbf{A}}_{\mathbf{n}}\colon \ell^{2}({\mathcal{S}}\times{\mathcal{F}})\to \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$ by \begin{equation}\label{defAn} {\mathbf{A}}_\mathbf{n} \,{:=}\, \sum_{j\geq 0} {\mathbf{A}}_{j, n_{j}} \otimes {\mathbf{M}}_j. \end{equation} (4.9) Our aim is to find such $$\mathbf{n}^J$$ such that the corresponding $${\mathbf{A}}^J := {\mathbf{A}}_{\mathbf{n}^J}$$ satisfy \begin{equation} {\lVert{{\mathbf{A}} - {\mathbf{A}}^J}\rVert} \lesssim J^{-2}2^{-s J},\quad J\in{\mathbb{N}}, \end{equation} (4.10) with $$s<s^*$$ and $$s^*$$ as in the assertion, and such that the number of nonzero entries in the each row and column of $${\mathbf{A}}^{J}$$ is bounded by a fixed constant multiple of $$J^{-2}2^J$$. We take $$L\in{\mathbb{N}}$$ arbitrary but fixed. Recall that we assume $$\mu_j$$ to be ordered by increasing level, that is, $${\lvert{\mu_{j+1}}\rvert} \geq {\lvert{\mu_j}\rvert}$$. We now consider $$(n_j)_{j\geq 0}$$, satisfying $$n_j = 0$$ for $$j > M_L \,{:=}\, {\lceil{L^{2m/\alpha}2^{mL}}\rceil}$$, so that $${\mathbf{A}}_{j,n_j} = 0$$ for $$j > M_L$$. Since $$e_{M_L} \lesssim L^{-2} 2^{-\alpha L}$$ by (4.4) and (4.6), we obtain \begin{equation}\label{operr_triineq} {\lVert{{\mathbf{A}} - {\mathbf{A}}_\mathbf{n}}\rVert} \lesssim {\left\lVert{ \sum_{j=0}^{M_L} {\bigl( {\mathbf{A}}_j - {\mathbf{A}}_{j,n_j} \bigr)} \otimes {\mathbf{M}}_j }\right\rVert} + L^{-2} 2^{-\alpha L}. \end{equation} (4.11) Within each level $$\ell \geq 0$$, that is, for each $$\mu$$ with $${\lvert{\mu}\rvert}=\ell$$, there are only finitely many $$\mu'$$ with $${\lvert{\mu'}\rvert}=\ell$$ such that $${\mathrm{supp}}\,\, \xi_\mu \cap {\mathrm{supp}} \,\, \xi_{\mu'} \neq \emptyset$$. Since the images of $${\mathbf{A}}_j$$ corresponding to $$\xi_{\mu_j}$$ with disjoint support are orthogonal, we obtain \begin{equation} {\left\lVert{ \sum_{j=0}^{M_L} {\bigl( {\mathbf{A}}_j - {\mathbf{A}}_{j,n_j} \bigr) } \otimes {\mathbf{M}}_j }\right\rVert} \lesssim \sum_{\ell=0}^{{\lceil{L+ \frac{2}{\alpha}\log_2 L}\rceil}} \biggl( \sum_{j\colon{\lvert{\mu_j}\rvert} = \ell} {\lVert{{\mathbf{A}}_j - {\mathbf{A}}_{j,n_j}}\rVert}^2 \biggr)^{\frac {1}{2}} , \end{equation} (4.12) where the constant depends on the maximum number of $$\xi_\mu$$ of overlapping support on each level. Taking \[ n_j = n_\ell = \left\lceil{ {\frac{m \ell}{2\tau} + \frac{\alpha}{\tau} \left(L+ \frac{2}{\alpha}\log_2 L - \ell\right)} + \frac {1}{\tau} \log_2(1+\ell)^2 } \right\rceil \] for $$\mu_j$$ of level $$\ell$$ and recalling that for such $$j$$ we have $$|c_{\mu_j}|=2^{-\alpha\ell}$$ gives \begin{equation} \label{errL} \sum_{\ell=0}^{{\lceil{L+ \frac{2}{\alpha}\log_2 L}\rceil}} \biggl( \sum_{j\colon{\lvert{\mu_j}\rvert} = \ell} {\lVert{{\mathbf{A}}_j - {\mathbf{A}}_{j,n_j}}\rVert}^2 \biggr)^{\frac {1}{2}} \lesssim \sum_{\ell = 0}^{{\lceil{L+ \frac{2}{\alpha}\log_2 L}\rceil}} 2^{\frac{m}{2} \ell} 2^{-\alpha \ell} 2^{-\tau n_\ell} \lesssim L^{-2} 2^{-\alpha L}. \end{equation} (4.13) Let $$N_L$$ be the resulting maximum number of entries per row and column in $${\mathbf{A}}_{\mathbf{n}}$$, then \begin{align} N_L &\lesssim \sum_{\ell = 0}^{{\lceil{L+ \frac{2}{\alpha}\log_2 L}\rceil}} (1+\ell^q)2^{m\ell} 2^{n_\ell} \lesssim L^{\frac{2}{\alpha}}2^{\frac{\alpha}{\tau} L } \sum_{\ell = 0}^{{\lceil{L+ \frac{2}{\alpha}\log_2 L}\rceil}} (1+\ell)^{q+\frac {2}{\tau}} 2^{(1 + \frac {1}{2\tau}) m \ell - \frac {\alpha}{\tau} \ell} \nonumber\\ & \lesssim L^{q + \frac{2(1+m)}{\alpha}} 2^{\frac{1+2\tau}{2\tau} m L}, \end{align} (4.14) where we have used $$\tau > \alpha/m$$. We now fix $$s\,{>}\,0$$ with $$s\,{<}\,t\,{\,{:=}\,}\,\frac{\alpha}{m} \frac{2\tau}{1+2\tau}$$ and take $$J\,{\,{:=}\,}\,{\lceil{\frac{t}{s} \frac{1+2\tau}{2\tau} m L}\rceil}\,{=}\,{\lceil{\frac {\alpha}{s} L}\rceil}$$ and $$\mathbf{n}^J\,{\,{:=}\,}\,\mathbf{n}$$. Since then $$N_L \lesssim J^{q+ \frac{2(1+m)}{\alpha}}2^{\frac{s}{t} J}$$ we see that $$N_L\lesssim J^{-2}2^J$$ with a constant that depends on $$\alpha, m$$ and increases when $$s$$ approaches $$t$$. It immediately follows from (4.13) that \begin{equation} {\lVert{{\mathbf{A}} - {\mathbf{A}}^J}\rVert} \lesssim J^{-2}2^{- s J}, \end{equation} (4.15) with a constant depending on $$ m$$. Thus, $${\mathbf{A}}$$ is $$s^*$$-compressible with $$s^* = t$$. □ 4.2 Coefficient expansions In our compressibility result Proposition 4.3 for $${\mathbf{A}}$$, we have made use of the multiscale structure of the expansion functions $$\theta_j$$. Let us now briefly compare this with previous results for globally supported $$\theta_j$$ as they arise in Karhunen–Loève expansions. In fact, for certain problems, one has equivalent expansions in either globally supported or wavelet-type $$\theta_j$$. This is demonstrated, for instance, in Bachmayr et al. (2016) for log-normal diffusion coefficients with Gaussian random fields of Matérn covariance. To illustrate the basic issues in approximation $${\mathbf{A}}$$ in the case of typical globally supported $$\theta_j$$, we consider the following spatially one-dimensional setting with $$D=]0,1[$$ as in Gittelson (2014): for a monotonically decreasing positive sequence $$(c_j)_{j\in{\mathbb{N}}}$$ with $$\sum_{j\geq 1} c_j \leq \frac {1}{2}$$, take $${\theta}_j = c_j \sin(j\pi \cdot)$$, so that \begin{equation} \label{cosexp} a(y) = 1 + \sum_{ j \geq 1 } y_j c_j \sin(j\pi \cdot). \end{equation} (4.16) This model is representative in that such increasingly oscillatory $${\theta}_j$$ as $$j\to\infty$$ also arise in more general Karhunen–Loève expansions. As a concrete example, with $$\beta> 1$$, let $$c_j := \delta j^{-\beta}$$ with $$\delta>0$$ sufficiently small. Then, $${\lVert{{\mathbf{A}}_j}\rVert} \sim c_j$$, from which we only obtain \[ e_M \lesssim M^{-\beta + 1}, \] and therefore $$S = \beta - 1$$. As shown in Gittelson (2014), taking the compression of the individual $${\mathbf{A}}_j$$ into account, one obtains $$s^*$$-compressibility of $${\mathbf{A}}$$ with $$s^* = \frac{1}{2}(\beta - 1) = \frac {1}{2} S$$. In this case, instead of conditions (i) and (ii) in Proposition 4.3, one has $$ {\lVert{{\mathbf{A}}_j -{\mathbf{A}}_{j,n}}\rVert} \lesssim j^{-(\alpha+\frac {1}{2})} 2^{-\gamma n}$$ with $${\mathcal{O}}( j ( 1 + \log_2 j) 2^n)$$ entries per row and column. We comment further in Remark A.3 on how this leads to the limitation to $$s^* = \frac{1}{2}(\beta - 1)$$. 5. Low-rank approximation We now turn to an adaptive method for finding low-rank approximations of the form (LR), based on the Hilbert–Schmidt decomposition of $${\bf u}$$. These approximations are of the form \begin{align}\label{fulllrcoeff} {\bf u} \approx \sum_{k=1}^n {\bf u}^{\rm x}_{k,\lambda} \otimes {\bf u}^{\rm y}_{k,\nu}, \end{align} (5.1) with finitely supported vectors $${\bf u}^{\rm x}_{k}$$, $${\bf u}^{\rm y}_{k}$$, $$k=1,\ldots,n$$. As in the scheme considered in Section 3, adaptivity in rank and in the basis expansions is intertwined by iteratively improving low-rank expansions of varying ranks, while at the same time identifying finitely supported approximations in $$\ell^{2}({\mathcal{S}})$$ and $$\ell^{2}({\mathcal{F}})$$, both based on approximate residual evaluations. In principle, the results of Section 3 concerning a full separation of variables based on hierarchical tensor formats could be applied with any finite truncation dimension $$d$$. However, assuming (4.4), a total error of order $$\varepsilon$$ requires $$\,{\rm d}(\varepsilon) \sim \varepsilon^{-1/S}$$. As a consequence, due to the $$d$$-dependent quasi-optimality (3.7) of the hierarchical SVD truncation, we can only obtain a highly suboptimal complexity bound in (3.14) for the hierarchical format. Concerning low-rank decompositions, we therefore concentrate here on a more basic case, namely, a separation of spatial and parametric variables as in (5.1). Since this separation also occurs in any hierarchical representation, the resulting Hilbert–Schmidt rank provides a lower bound for the hierarchical ranks that are required in a hierarchical format involving further matricizations. The efficiency of the obtained low-rank approximations is measured against the singular value decomposition of the Hilbert–Schmidt operator $$\ell^{2}({\mathcal{F}})\to \ell^{2}({\mathcal{S}})$$ induced by $${\bf u}$$, \begin{align}\label{hsapprox} {\bf u} = \sum_{k=1}^\infty \sigma_k {\bf U}^{({\rm x})}_k \otimes {\bf U}^{({\rm y})}_k\!, \end{align} (5.2) where $$\sigma_k \geq 0$$, $$\{ {\bf U}^{({\rm x})}_k \}$$, $$\{ {\bf U}^{({\rm y})}_k \}$$ are orthonormal in $$\ell^{2}({\mathcal{S}})$$ and $$\ell^{2}({\mathcal{F}})$$, respectively, and \begin{align}\label{hsoptcoeff} {\left\lVert{ {\bf u} - \sum_{k=1}^r \sigma_k {\bf U}^{({\rm x})}_k \otimes {\bf U}^{({\rm y})}_k}\right\rVert}^2 = \sum_{k>r} \sigma_k^2 = \min_{{\mathrm{rank}}({\bf w})\leq r} {\lVert{{\bf u}-{\bf w}}\rVert}^2. \end{align} (5.3) Ideally, the ranks of computed approximations should be comparable to the minimum $$r$$ for achieving the same error in (5.3). Moreover, we quantify in terms of $$\#\left( \bigcup_{k=1}^n {\mathrm{supp}}\,\, {\bf u}^i_{k} \right)$$, $$i={\rm x},{\rm y}$$ the number of nonzero coefficients in (5.1). The reasons for not considering each individual $$\# {\mathrm{supp}}\,\, {\bf u}^i_{k}$$ separately are mainly algorithmic: because the numerical methods require orthogonalizations of the sets $$( {\bf u}^i_{k} )_{k=1,\ldots,n}$$, their complexity is determined by the unions of the respective supports. To understand the joint approximability of the infinite vectors $${\bf U}^{(i)}_k$$, $$i={\rm x},{\rm y}$$, in (5.2) serving as our reference point, we consider the particular contractions defined, for $${\bf v} \in \ell^{2}({\mathcal{S}} \times {\mathcal{F}})$$, by $$\pi^{({\rm x})}({\bf v}) = (\pi^{({\rm x})}_\lambda({\bf v}))_{\lambda\in{\mathcal{S}}}$$, $$\pi^{({\rm y})}({\bf v}) = (\pi^{({\rm y})}_\nu({\bf v}))_{\nu\in{\mathcal{F}}}$$ with \begin{align} \label{contrxy} \pi^{({\rm x})}_\lambda({\bf v}) := \left(\sum_{\nu\in {\mathcal{F}}}|{\bf v}_{\lambda,\nu}|^2\right)^{1/2},\quad \pi^{({\rm y})}_\nu({\bf v}) := \left(\sum_{\lambda\in {\mathcal{S}}}|{\bf v}_{\lambda,\nu}|^2\right)^{1/2}. \end{align} (5.4) Note that $$\pi^{({\rm y})}_\nu({\bf u})$$ is uniformly proportional to the norm of the corresponding Legendre coefficient of $$u$$, that is, $$\pi^{({\rm y})}_\nu({\bf u}) \sim {\lVert{u_\nu}\rVert}_V$$. Let $$(\lambda^*_{k})_{k \in {\mathbb{N}}}$$ and $$(\nu^*_{k})_{k \in {\mathbb{N}}}$$ be such that $$(\pi^{({\rm x})}_{\lambda^*_{k}}({\bf v}))_{k\in{\mathbb{N}}}$$ and $$(\pi^{({\rm y})}_{\nu^*_{k}}({\bf v}))_{k\in{\mathbb{N}}}$$ are nonincreasing, respectively. Then, the singular values $$\sigma_k({\bf v})$$ of $${\bf v}$$ satisfy \begin{equation} \label{svpibound} \sigma_k({\bf v}) \, \leq \, \pi^{({\rm x})}_{\lambda^*_k}({\bf v}),\,\pi^{({\rm y})}_{\nu^*_k}({\bf v}), \quad k \in {\mathbb{N}}. \end{equation} (5.5) In view of our results, for Example 6.4 (and the further numerical experiments of Example 6.10), we cannot generally expect faster than algebraic decay of singular values, which we quantify in terms of classes $${\mathcal{A}_\mathcal{H}({\gamma})}$$ specialized to tensors of order 2 and to the specific sequence $$\gamma(k):= (1+k)^{\bar s}$$. This yields the approximation classes \[ {{\it{\Sigma}}^{\bar s}} :=\left\{{\bf v}\in \ell^{2}({\mathcal{S}}\times{\mathcal{F}}): \sup_{k\in{\mathbb{N}}}(1+k)^{\bar s}\left(\sum_{j>k}\sigma_k({\bf v})^2\right)^{1/2} =: \|{\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}<\infty \right\}. \] The approximate sparsity of the sequences $$\pi^{({\rm x})}({\bf v})$$, $$\pi^{({\rm y})}({\bf v})$$ is measured in terms of the largest $$s_{\rm x}, s_{\rm y}>0$$, such that $$\pi^{({\rm x})}({\bf v}) \in {\mathcal{A}}^{s_{\rm x}}({\mathcal{S}})$$, $$\pi^{({\rm y})}({\bf v}) \in {\mathcal{A}}^{s_{\rm y}}({\mathcal{F}})$$ according to (3.9). For the low-rank approximation, the routines $${\small{\text{RECOMPRESS}}}$$ and $${\small{\text{COARSEN}}}$$ used in Algorithm 2.1 are based on the specialization to tensors of order 2 of the routines described in the previous section. $${\small{\text{RECOMPRESS}}}({\bf v};\eta)$$ is a numerical realization of $$\hat{\mathrm{P}}_{\eta}({\bf v})$$, which we define as the operator producing the best low-rank approximation of $${\bf v}$$ with error at most $$\eta$$ with respect to $${\lVert{\cdot}\rVert}$$, obtained by truncating the singular value decomposition of its argument. The routine $${\small{\text{COARSEN}}}({\bf v};\eta)$$ is constructed as in Section 3 based on the contractions $$\pi^{({\rm x})}({\bf v})$$, $$\pi^{({\rm y})}({\bf v})$$ defined as in (5.4). The following result differs from Bachmayr & Dahmen (2015, Theorem 7), which is formulated for general hierarchical tensors, in that we now consider differing sparsity classes for the contractions $$\pi^{(i)}$$, $$i={\rm x},{\rm y}$$. In view of the preceding discussion, it is reasonable to assume possibly different, but algebraic decay for both contractions. Theorem 5.1 Let $$\mathbf{u}, \mathbf{v} \in \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$ with $${\lVert{\mathbf{u}-\mathbf{v}}\rVert} \leq \eta$$. Then for \begin{align} \label{weta} \mathbf{w}_{\eta} := \hat{\mathrm{C}}_{2^{3/2}(1+\alpha)\eta} \bigl(\hat{\mathrm{P}}_{ (1+\alpha)\eta} (\mathbf{v}) \bigr)\!, \end{align} (5.6) we have \begin{align} \label{eq:combinedcoarsen_errest} {\lVert{{\bf u} - {\bf w}_\eta}\rVert} \leq \bigl( 2+\alpha + 2^{3/2} (1+\alpha)\bigr) \eta . \end{align} (5.7) Moreover, when $$\mathbf{u}\in{{\it{\Sigma}}^{\bar s}}$$, $$\pi^{(i)}(\mathbf{u}) \in {\cal A}^{s_i}$$, $$i={\rm x},{\rm y}$$, we have \begin{align}\label{eq:combinedcoarsen_rankest} {\lvert{{\mathrm{rank}}({\bf w}_\eta)}\rvert}_\infty \leq 2 \bigl(\alpha^{-1} {\lVert{{\bf u}}\rVert}_{{{\it{\Sigma}}^{\bar s}}} \bigr)^{\frac {1}{\bar s}} \eta^{-\frac {1}{\bar s}},\qquad {\lVert{{\bf w}_\eta}\rVert}_{{{\it{\Sigma}}^{\bar s}}} \leq 2 (1 + \alpha^{-1}) {\lVert{{\bf u}}\rVert}_{{{\it{\Sigma}}^{\bar s}}} , \end{align} (5.8) and \begin{equation}\label{eq:combinedcoarsen_suppest} \begin{aligned} \#{\mathrm{supp}}_{\rm x} (\mathbf{w}_\eta) + \#{\mathrm{supp}}_{\rm y} (\mathbf{w}_\eta) &\leq 2+ \biggl( \frac{2\|\pi^{({\rm x})}({\bf u})\|_{{\mathcal{A}}^{s_{\rm x}}}}{\alpha\eta}\biggr)^{\frac {1}{s_{\rm x}}} + \biggl( \frac{2\|\pi^{({\rm y})}({\bf u})\|_{{\mathcal{A}}^{s_{\rm y}}}}{\alpha\eta}\biggr)^{\frac {1}{s_{\rm y}}} \\ {\lVert{\pi^{(i)}({\bf w}_\eta)}\rVert}_{{\mathcal{A}}^{s_i}} &\leq C {\lVert{ \pi^{(i)}({\bf u})}\rVert}_{{\mathcal{A}}^{s_i}} , \quad i={\rm x},{\rm y}, \end{aligned} \end{equation} (5.9) where $$C$$ depends on $$\alpha$$ and $$s_i$$, $$i={\rm x},{\rm y}$$. The estimates (5.6) and (5.7) have been already shown in Bachmayr & Dahmen (2015). The only deviation concerns the stability estimate (5.9), which we prove in Appendix B. To apply Algorithm 2.1 it remains to specify the approximate application of $${\mathbf{A}}$$ by the procedure $${\small{\text{APPLY}}}$$ to representations of the form (5.2). As part of this procedure, we shall also use a modified routine $${\small{\text{COARSEN}}}_{\rm y}$$, which operates only on the second tensor mode and leaves $${\mathrm{supp}}_{\rm x}$$ unchanged. For this routine, we shall only use the simpler statement that for any $${\bf v}\in\ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$ with $$\pi^{({\rm y})}({\bf v}) \in {\mathcal{A}}^{s_{\rm y}}({\mathcal{F}})$$, $${\bf v}_{\rm y}\,{\,{:=}\,}\,{\small{\text{COARSEN}}}_{\rm y}({\bf v};\eta)$$ satisfies \[ \#{\mathrm{supp}}_{\rm y} ({\bf v}_{\rm y}) \lesssim \eta^{-\frac {1}{s_{\rm y}}} {\lVert{\pi^{({\rm y})}({\bf v})}\rVert}_{{\mathcal{A}}^{s_{\rm y}}}^{\frac {1}{s_{\rm y}}}, \quad {\lVert{\pi^{({\rm y})}({\bf v}_{\rm y})}\rVert}_{{\mathcal{A}}^{s_{\rm y}}} \lesssim {\lVert{\pi^{({\rm y})}({\bf v})}\rVert}_{{\mathcal{A}}^{s_{\rm y}}}\!. \] 5.1 Adaptive operator application We now describe a specification of the more generic routine $${\small{\text{APPLY}}}$$ used in Bachmayr & Dahmen (2015) that is tailored to exploit anisotropy in the parametrizations of parametric operators. For any given $$\eta>0$$ and finitely supported $${\bf v}$$, we aim to construct $${\bf w}_\eta$$ such that $${\lVert{{\mathbf{A}} {\bf v} - {\bf w}_\eta}\rVert} \leq \eta$$. We follow here the general strategy of combining a priori knowledge on $${\mathbf{A}}$$ with a posteriori information on $${\bf v}$$, which is given in terms of a suitable decomposition of $${\bf v}$$. The routine $${\small{\text{APPLY}}}$$ is structured as follows: (S1) Preprocessing and decomposing the input: We first apply a preprocessing step to the finitely supported input $${\bf v}$$ that consists of applications of $${\small{\text{RECOMPRESS}}}$$ and $${\small{\text{COARSEN}}}_{\rm y}$$. We choose for a given $$\eta >0$$ the tolerances of order $$\eta$$ in such a way that the resulting $${\bf v}_\eta$$ satisfies \begin{equation} \label{etahalf} \|{\bf v} - {\bf v}_\eta\|\le \frac {\eta}{2\|{\mathbf{A}}\|}. \end{equation} (5.10) As a consequence, for any positive $$s_{\rm y}$$, $$\bar s$$ we have \begin{equation} \label{vcoarsened} {\mathrm{rank}}({\bf v}_\eta)\lesssim \eta^{-\frac{1}{\bar s}}\|{\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}^{\frac{1}{\bar s}},\quad \|{\bf v}_\eta\|_{{{\it{\Sigma}}^{\bar s}}}\lesssim \|{\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}, \end{equation} (5.11) and \begin{equation} \label{supps} \#{\mathrm{supp}}_{\rm y} (\mathbf{v}_\eta ) \lesssim {\eta}^{-\frac {1}{s_{\rm y}}} {\lVert{\pi^{({\rm y})}({\bf v})}\rVert}^{\frac {1}{s_{\rm y}}}_{{\mathcal{A}}^{s_{\rm y}}}, \quad {\lVert{\pi^{({\rm y})}({\bf v}_\eta)}\rVert}_{{\mathcal{A}}^{s_{\rm y}}} \lesssim {\lVert{ \pi^{({\rm y})}( {\bf v})}\rVert}_{{\mathcal{A}}^{s_{\rm y}}} \!. \end{equation} (5.12) We then have the SVD of $${\bf v}_\eta$$ at hand, \begin{equation} \label{SVD} {\bf v}_\eta = \sum_{k = 1}^K \sigma_k {\bf U}^{({\rm x})}_k \otimes {\bf U}^{({\rm y})}_k\!, \end{equation} (5.13) and set $$K_p = \{ 2^p ,\ldots, \min\{K,2^{p+1} - 1\}\}$$, for $$p = 0,1,\ldots$$, $$p\le \log_2 K$$. Furthermore, for $$q=0,1,\ldots$$, let $$\hat{\it{\Lambda}}^{({\rm y})}_{q}$$ be the support of the best $$2^q$$-term approximation of $$\pi^{({\rm y})}({\bf v}_\eta)$$. We set $${\it{\Lambda}}^{({\rm y})}_{0} := \hat{\it{\Lambda}}^{({\rm y})}_{0}$$ and $${\it{\Lambda}}^{({\rm y})}_{q} :=\hat {\it{\Lambda}}^{({\rm y})}_{q} \setminus \hat{\it{\Lambda}}^{({\rm y})}_{q-1}$$ for $$q>0$$. With this subdivision of $${\mathrm{supp}}_{\rm y}({\bf v}_\eta)$$, we now define \begin{equation} \label{vpq} {\bf v}_{[p,q]} := {\mathrm{R}_{{\mathcal{S}} \times {\it{\Lambda}}^{({\rm y})}_{q}}} \sum_{k\in K_p} \sigma_k {\bf U}^{({\rm x})}_k \otimes {\bf U}^{({\rm y})}_k = \sum_{k\in K_p} \sigma_k {\bf U}^{({\rm x})}_k\otimes {\mathrm{R}_{{\it{\Lambda}}^{({\rm y})}_{q}}} {\bf U}^{({\rm y})}_k \, \end{equation} (5.14) and obtain \begin{equation} \label{decoAv} {\mathbf{A}} {\bf v}_\eta = \sum_{p, q \geq 0} \sum_{j=0}^\infty ({\mathbf{A}}_j \otimes {\mathbf{M}}_j) {\bf v}_{[p,q]} = \sum_{p, q \geq 0} \sum_{j=0}^\infty \sum_{k\in K_p} \sigma_k \,\bigl( {\mathbf{A}}_j{\bf U}^{({\rm x})}_k\bigr) \otimes \bigl({\mathbf{M}}_j {\mathrm{R}_{{\it{\Lambda}}^{({\rm y})}_{q}}} {\bf U}^{({\rm y})}_k \bigr)\!. \end{equation} (5.15) (S2) Adaptive operator truncation: To construct an approximation $${\bf w}_\eta$$ of $${\mathbf{A}} {\bf v}_\eta$$ based on this decomposition, we truncate the summations over $$j$$ for each $$p,q$$ at some index $$M_{p,q}\in {\mathbb{N}}$$, to be determined later, and then replace the remaining terms $${\mathbf{A}}_j$$ by compressed versions, again depending on the respective $$p,q$$. With $$e_M$$ defined for non-negative integer $$M$$ as in (4.3), for any given choice of $$M_{p,q}$$, we have \begin{equation} \label{est1} {\left\lVert{{\mathbf{A}} {\bf v} - \sum_{p, q \geq 0} \sum_{j=0}^{M_{p,q}} ({\mathbf{A}}_{j} \otimes {\mathbf{M}}_j) {\bf v}_{[p,q]} }\right\rVert} \leq \sum_{p, q \geq 0 } e_{M_{p,q}} {\lVert{{\bf v}_{[p,q]}}\rVert}. \end{equation} (5.16) We now choose the $$M_{p,q}= M_{p,q}(\eta)$$ such that \begin{equation} \label{est2} \sum_{p, q \geq 0 } e_{M_{p,q}} {\lVert{{\bf v}_{[p,q]}}\rVert} \leq \frac{\eta}{4}. \end{equation} (5.17) This can be done by choosing positive weights $$\alpha_{p,q}$$ such that $$\sum_{p,q}\alpha_{p,q}=1$$, computing $${\lVert{{\bf v}_{[p,q]}}\rVert}$$ and adjusting the $$M_{p,q}$$ so as to guarantee that \begin{equation} \label{etapq0} e_{M_{p,q}}\|{\bf v}_{[p,q]}\|\le \eta_{p,q} := \alpha_{p,q}\eta/4. \end{equation} (5.18) We will give an a priori choice for $$M_{p,q}$$ in (5.29) below, but one may as well use, e.g., the Greedy scheme proposed in Gittelson (2013) for selecting these values. (S3) Adaptive application of the spatial components $${\mathbf{A}}_j$$: Next, to realize an approximate application of the (generally) infinite matrices $${\mathbf{A}}_j$$ to $${\bf U}^{({\rm x})}_k$$ in (5.15), we replace $${\mathbf{A}}_j {\bf v}_{[p,q]}$$ by an approximation $$\tilde{\mathbf{A}}_{j,p,q} {\bf v}_{[p,q]} $$ using (2.10) so as to satisfy \begin{equation} \label{etapq} {\left\lVert{ \sum_{j=0}^{M_{p,q}} ({\mathbf{A}}_j - \tilde{\mathbf{A}}_{j,p,q})\otimes {\mathbf{M}}_j \, {\bf v}_{[p,q]} }\right\rVert} \leq \eta_{p,q}. \end{equation} (5.19) The approximate operators $$\tilde{\mathbf{A}}_{j,p,q}$$ will be specified later. The sought approximation of $${\mathbf{A}}{\bf v}$$ can now be obtained as \begin{equation} \label{weta assembly} {\bf w}_\eta := \sum_{p,q \geq 0} \sum_{j=0}^{M_{p,q}} (\tilde{\mathbf{A}}_{j,p,q} \otimes {\mathbf{M}}_j) {\bf v}_{[p,q]} , \end{equation} (5.20) which by the above construction satisfies the computable error bound \begin{equation} \label{errorbound} {\lVert{{\mathbf{A}} {\bf v}_\eta - {\bf w}_\eta}\rVert} \leq \sum_{p,q \geq 0 } \bigl( e_{M_{p,q}} {\lVert{{\bf v}_{[p,q]}}\rVert} + \eta_{p,q} \bigr) \leq \eta/2 , \end{equation} (5.21) so that in summary $$\|{\mathbf{A}} {\bf v} - {\bf w}_\eta\|\le \eta$$. In summary, the above adaptive approximation of $${\mathbf{A}}$$ to a given finitely supported $${\bf v}$$ involves the following steps: $${\small{\text{APPLY}}}\!: {\bf v} \to {\bf w}_\eta$$, with $${\bf v}$$ given by its SVD (S1): compute $${\bf v}_\eta \,{:=}\, {\small{\text{COARSEN}}}_{\rm y}({\small{\text{RECOMPRESS}}}({\bf v}; \eta/4{\lVert{{\mathbf{A}}}\rVert}); \eta/4{\lVert{{\mathbf{A}}}\rVert})$$ and (quasi-)sort1 the entries of $$\pi^{({\rm y})}({\bf v}_\eta)$$ to obtain the sets $${\it{\Lambda}}^{({\rm y})}_{q}$$; (S2): compute the quantities $${\lVert{{\bf v}_{p,q}}\rVert}$$ and determine the truncation values $$M_{p,q}= M_{p,q}(\eta)$$; (S3): compute the quantities $$\bigl(\pi^{({\rm x})}_\nu({\bf v}_{[p,q]})\bigr)_{\nu\in{\mathcal{S}}}$$ and use these to obtain the compressed matrices $$\tilde{\mathbf{A}}_{j,p,q}$$, using (5.14) in the assembly step (5.20). $${\small{\text{APPLY}}}\!: {\bf v} \to {\bf w}_\eta$$, with $${\bf v}$$ given by its SVD (S1): compute $${\bf v}_\eta \,{:=}\, {\small{\text{COARSEN}}}_{\rm y}({\small{\text{RECOMPRESS}}}({\bf v}; \eta/4{\lVert{{\mathbf{A}}}\rVert}); \eta/4{\lVert{{\mathbf{A}}}\rVert})$$ and (quasi-)sort1 the entries of $$\pi^{({\rm y})}({\bf v}_\eta)$$ to obtain the sets $${\it{\Lambda}}^{({\rm y})}_{q}$$; (S2): compute the quantities $${\lVert{{\bf v}_{p,q}}\rVert}$$ and determine the truncation values $$M_{p,q}= M_{p,q}(\eta)$$; (S3): compute the quantities $$\bigl(\pi^{({\rm x})}_\nu({\bf v}_{[p,q]})\bigr)_{\nu\in{\mathcal{S}}}$$ and use these to obtain the compressed matrices $$\tilde{\mathbf{A}}_{j,p,q}$$, using (5.14) in the assembly step (5.20). $${\small{\text{APPLY}}}\!: {\bf v} \to {\bf w}_\eta$$, with $${\bf v}$$ given by its SVD (S1): compute $${\bf v}_\eta \,{:=}\, {\small{\text{COARSEN}}}_{\rm y}({\small{\text{RECOMPRESS}}}({\bf v}; \eta/4{\lVert{{\mathbf{A}}}\rVert}); \eta/4{\lVert{{\mathbf{A}}}\rVert})$$ and (quasi-)sort1 the entries of $$\pi^{({\rm y})}({\bf v}_\eta)$$ to obtain the sets $${\it{\Lambda}}^{({\rm y})}_{q}$$; (S2): compute the quantities $${\lVert{{\bf v}_{p,q}}\rVert}$$ and determine the truncation values $$M_{p,q}= M_{p,q}(\eta)$$; (S3): compute the quantities $$\bigl(\pi^{({\rm x})}_\nu({\bf v}_{[p,q]})\bigr)_{\nu\in{\mathcal{S}}}$$ and use these to obtain the compressed matrices $$\tilde{\mathbf{A}}_{j,p,q}$$, using (5.14) in the assembly step (5.20). $${\small{\text{APPLY}}}\!: {\bf v} \to {\bf w}_\eta$$, with $${\bf v}$$ given by its SVD (S1): compute $${\bf v}_\eta \,{:=}\, {\small{\text{COARSEN}}}_{\rm y}({\small{\text{RECOMPRESS}}}({\bf v}; \eta/4{\lVert{{\mathbf{A}}}\rVert}); \eta/4{\lVert{{\mathbf{A}}}\rVert})$$ and (quasi-)sort1 the entries of $$\pi^{({\rm y})}({\bf v}_\eta)$$ to obtain the sets $${\it{\Lambda}}^{({\rm y})}_{q}$$; (S2): compute the quantities $${\lVert{{\bf v}_{p,q}}\rVert}$$ and determine the truncation values $$M_{p,q}= M_{p,q}(\eta)$$; (S3): compute the quantities $$\bigl(\pi^{({\rm x})}_\nu({\bf v}_{[p,q]})\bigr)_{\nu\in{\mathcal{S}}}$$ and use these to obtain the compressed matrices $$\tilde{\mathbf{A}}_{j,p,q}$$, using (5.14) in the assembly step (5.20). 5.2 Complexity analysis To quantify the complexity of computing $${\bf w}_\eta$$ in (5.20), we need to specify the properties of the operator $$A(y)$$ as well as the sparsity properties of the input. In view of our preceding discussion, in the scenario of primary interest, the singular values of the solution $${\bf u}$$, as well as the best $$n$$-term approximations of the contractions $$\pi^{(i)}({\bf u})$$, $$i\in \{{\rm x}, {\rm y}\}$$, exhibit algebraic decay rates. As before, these rates are denoted by $$\bar s$$ and $$s_{\rm x}$$, $$s_{\rm y}$$, respectively. As indicated earlier, the complexity of the above scheme depends, in particular, on the operator approximability by truncation. We adhere to the natural assumption that $$e_M \le CM^{-S}$$ for some positive $$S$$, see (4.4). In the subsequent discussion, we assume $$S> s_i$$, $$i\in \{{\rm x},{\rm y}\}$$. As discussed in detail in Section 6.2, this holds true for the expansion model of Assumption 4.2 with $$S = \frac{\alpha}{m}$$. We gather next the properties upon which the complexity analysis will be based. Assumption 5.2 The solution $${\bf u}$$ to (2.5) and the matrix $${\mathbf{A}}$$ have the following properties: (i) One has $$\pi^{(i)}({\bf u}), \pi^{(i)}({\mathbf{f}}) \in {\mathcal{A}}^{s_i}$$, $$i={\rm x},{\rm y}$$, with $$s_{\rm x}, s_{\rm y} >0$$. (ii) $${\bf u}, {\mathbf{f}} \in {{\it{\Sigma}}^{\bar s}}$$ for some $$\bar s \ge s_{\rm x}, s_{\rm y}$$. (iii) There exists a constant $$C$$ such that $$e_M \le CM^{-S}$$, $$M\in {\mathbb{N}}$$, where $$e_M$$ is defined by (4.3) and \begin{align} \label{Ssi} S\ge \bar s , s_{\rm y}. \end{align} (5.22) (iv) The representations $${\mathbf{A}}_j$$, $$j\in {\mathbb{N}}$$, satisfy the assumptions of Proposition 4.3, where $$\tau$$ satisfies \begin{align} \label{tau} \frac{2\tau}{1+2\tau} \frac {\alpha}{m} = \frac{2\tau}{1+2\tau} S > s_{\rm x}. \end{align} (5.23) Under these conditions, one can construct a routine $${\small{\text{RHS}}}$$ that satisfies, for sufficiently small $$\eta>0$$ and $${\mathbf{f}}_\eta \,{:=}\, {\small{\text{RHS}}}(\eta)$$, \begin{gather*} \#{\mathrm{supp}}_i ({\mathbf{f}}_\eta) \lesssim \eta^{-\frac {1}{s_i}} {\lVert{\pi^{(i)}({\mathbf{f}})}\rVert}_{{\cal A}^{s_i}}, \quad {\lVert{\pi^{(i)}({\mathbf{f}}_\eta)}\rVert}_{{\cal A}^{s_i}} \lesssim {\lVert{\pi^{(i)}({\mathbf{f}})}\rVert}_{{\cal A}^{s_i}}, \quad i \in \{{\rm x},{\rm y} \}, \\ {\mathrm{rank}}({\mathbf{f}}_\eta) \lesssim \eta^{-\frac {1}{\bar s}} {\lVert{{\mathbf{f}}}\rVert}_{{{\it{\Sigma}}^{\bar s}}}^{\frac {1}{\bar s}} , \quad {\lVert{{\mathbf{f}}_\eta}\rVert}_{{{\it{\Sigma}}^{\bar s}}} \lesssim {\lVert{{\mathbf{f}}}\rVert}_{{{\it{\Sigma}}^{\bar s}}}. \end{gather*} Assuming full knowledge of $${\mathbf{f}}$$, one can also realize $${\small{\text{RHS}}}$$ using $$\mathcal{O}\Big(\eta^{-\tfrac{1}{\bar s} - \tfrac{1}{\min\{ s_{\rm x}, s_{\rm y} \}}}\Big)$$ operations, and we shall make this idealized assumption in what follows. As in Section 3, the requirements on $${\small{\text{RHS}}}$$ simplify substantially when the corresponding right-hand side $$f$$ is independent of the parametric variable. The main result of this section states that up to a logarithmic factor the sparsity properties of the input are preserved by the output of $${\small{\text{APPLY}}}$$. Theorem 5.3 Suppose that the properties listed under Assumption 5.2 hold. Then, given any finitely supported input $${\bf v}\in \ell^{2}({\cal S}\times {\cal F})$$, the output $${\bf w}_\eta$$ produced by the procedure $${\small{\text{APPLY}}}$$, based on the steps (S1)–(S3), satisfies \begin{equation} \label{accuracy} \|{\mathbf{A}} {\bf v} - {\bf w}_\eta\|\le \eta. \end{equation} (5.24) Moreover, with some $$b\le 2+ \frac{4}{s_{\rm x}}$$ one has \begin{equation} \label{rankbars} {\mathrm{rank}}({\bf w}_\eta)\lesssim \eta^{-\frac{1}{\bar s}}\| {\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}^{\frac{1}{\bar s}} (1+{\lvert{\log\eta}\rvert})^b , \quad \|{\bf w}_\eta\|_{{{\it{\Sigma}}^{\bar s}}}\lesssim \|{\bf v}\|_{{{\it{\Sigma}}^{\bar s}}} (1+{\lvert{\log\eta}\rvert})^{\bar s b }, \end{equation} (5.25) and \begin{equation} \label{rp} \begin{gathered} \#{\mathrm{supp}}_{\rm y}({\bf w}_\eta)\lesssim \eta^{-\frac{1}{s_{\rm y}}} \|\pi^{({\rm y})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm y}}}^{\frac{1}{s_{\rm y}}}(1+{\lvert{\log \eta}\rvert})^{b}, \\ \|\pi^{({\rm y})}({\bf w}_\eta)\|_{{\mathcal{A}}^{s_{\rm y}}} \lesssim \|{\bf v}\|_{{\mathcal{A}}^{s_{\rm y}}} (1+{\lvert{\log \eta}\rvert})^{s_{\rm y} b}, \end{gathered} \end{equation} (5.26) as well as \begin{equation} \label{sstab} \begin{gathered} \#{\mathrm{supp}}_{\rm x}({\bf w}_\eta)\lesssim \eta^{-\frac {1}{s_{\rm x}}}\|\pi^{({\rm x})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm x}}}^{\frac {1}{s_{\rm x}}} (1+{\lvert{\log \eta}\rvert})^{b}, \\ \|\pi^{({\rm x})}({\bf w}_\eta)\|_{{\mathcal{A}}^{s_{\rm x}}} \lesssim \|\pi^{({\rm x})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm x}}} (1+{\lvert{\log \eta}\rvert})^{s_{\rm x} b}, \end{gathered} \end{equation} (5.27) where the constants depend also on $$s_i$$, on $${\lvert{\log \|\pi^{(i)}({\bf v})\|_{{\mathcal{A}}^{s_i}}}\rvert}$$, $$i\in \{{\rm x}, {\rm y}\}$$ and on $$\tau$$ in Assumption 5.2. Proof. The error bound (5.24) is implied by the construction. As for the remaining claims, to assess the complexity of computing $${\bf w}_\eta$$, given by (5.20), we estimate first $$M_{p,q}=M_{p,q}(\eta)$$ in terms of $$\eta$$. To obtain a priori bounds for the $$M_{p,q}$$, we use Assumption 5.2(i) and (ii) to conclude that \begin{equation} \label{vpqest} {\lVert{{\bf v}_{[p,q]}}\rVert} \leq 2^{-s_{\rm y} q}{\lVert{\pi^{({\rm y})}({\bf v})}\rVert} _{\mathcal{A}^{s_{\rm y}}} , \qquad {\lVert{{\bf v}_{[p,q]}}\rVert} \leq 2^{-\bar s p} {\lVert{ {\bf v} }\rVert}_{{{\it{\Sigma}}^{\bar s}}}\!. \end{equation} (5.28) Then Assumption 5.2(iii) and (5.28) yield the sufficient conditions \begin{equation} \label{Mpqest} M_{p,q}= M_{p,q}(\eta) \ge \left( \frac{ 4 C \min \{ 2^{-\bar s p} {\lVert{ {\bf v} }\rVert}_{{{\it{\Sigma}}^{\bar s}}} , 2^{-s_{\rm y} q} {\lVert{ \pi^{({\rm y})}({\bf v}) }\rVert}_{\mathcal{A}^{s_{\rm y}}} \}}{ \alpha_{p,q} \, \eta } \right)^{\frac{1}{S}} . \end{equation} (5.29) From (5.20) and the decomposition (5.15), we see that \begin{equation} \label{rankw} {\mathrm{rank}}({\bf w}_\eta) \leq \sum_{p,q\geq 0} M_{p,q} 2^p , \quad \# {\mathrm{supp}}_{\rm y}({\bf w}_\eta) \leq \sum_{p,q\geq 0 } 3 M_{p,q} 2^q\!. \end{equation} (5.30) □ Note that the factor of $$3$$ in the bound for $$\# {\mathrm{supp}}_{\rm y}({\bf w}_\eta)$$ results from the bidiagonal form of the matrices $${\mathbf{M}}_j$$; that is, the action for each of these matrices can add at most twice the number of nonzero entries in the preimage sequence, in addition to the existing ones. The following lemma provides bounds for the right-hand sides in (5.30). Lemma 5.4 For any fixed constant $$a>1$$ choose \begin{align} \label{alphapq} \alpha_{p,q}=c\,\big((1+ p)(1+q)\big)^{-a}, \quad c: = \left(\sum_{p,q\geq 0} \big((1+ p)(1+q)\big)^{-a}\right)^{-1}, \end{align} (5.31) as weights in (5.29). Then for $$S\geq \bar s$$ one has \begin{align} \sum_{p,q}2^p M_{p,q}&\lesssim \eta^{-\frac {1}{S}}\|{\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}^{\frac {1}{S}}\bigl(1+\log_2\#{\mathrm{supp}}_{\rm y}({\bf v})\bigr)^{1+\frac {a}{S}} \notag\\ &\quad\times \bigl(1+\log_2{\mathrm{rank}}({\bf v}_\eta)\bigr)^{1+\frac {a}{S}}\bigl({\mathrm{rank}}({\bf v}_\eta)\bigr)^{1- \frac{\bar s}{S}},\label{simplep} \end{align} (5.32) where the constant depends on $$a,S$$ and $$\bar s$$ on $$c$$ in (5.31) and on $$C$$ in Assumption 5.2(iii). Similarly, for $$S \geq s_{\rm y}$$ one has \begin{align} \sum_{p,q}2^q M_{p,q} &\lesssim \eta^{-\frac {1}{S}} \|\pi^{({\rm y})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm y}}}^{\frac {1}{S}}\bigl(1+ \log_2 {\mathrm{rank}}({\bf v}_\eta)\bigr)^{1+\frac {a}{S}} \notag\\ &\quad\times \bigl(1+ \log_2\#{\mathrm{supp}}_{\rm y}({\bf v}_\eta)\bigr)^{1+\frac {a}{S}} \bigl(\#{\mathrm{supp}}_{\rm y}({\bf v}_\eta)\bigr)^{1- \frac{s_{\rm y}}{S}},\label{simpleq} \end{align} (5.33) with similar dependencies of the constants as before, but with $$\bar s$$ replaced by $$s_{\rm y}$$. Proof. Bounding $$M_{p,q}\lesssim \eta^{-\frac {1}{S}} \|{\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}^{\frac {1}{S}}(1+q)^{\frac {a}{S}}(1+p)^{\frac {a}{S}} 2^{-\frac{\bar s p}{S}}$$, we derive \begin{align} \label{pestimate} \sum_{p,q}2^p M_{p,q}\lesssim \eta^{-\frac {1}{S}}\|{\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}^{\frac {1}{S}}\bigl(1+\log_2 \#{\mathrm{supp}}_{\rm y}({\bf v}_\eta)\bigr)^{1+\frac {a}{S}}\sum_p (1+p)^{\frac {a}{S}} 2^{p \left(1-\frac{\bar s}{S} \right)}\!, \end{align} (5.34) which gives (5.32), where the constant depends on $$a,S, \bar s$$ and $$c,C$$ from (5.29). To bound $$\sum_{p,q}2^qM_{p,q}$$, we use $$M_{p,q}\lesssim \eta^{-\frac {1}{S}}\|\pi^{({\rm y})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm y}}}^{\frac {1}{S}} (1+p)^{\frac {a}{S}} (1+q)^{\frac {a}{S}}2^{-\frac{s_{\rm y} q}{S}}$$ and obtain \begin{align*} \sum_{p,q}2^q M_{p,q} \lesssim \eta^{-\frac {1}{S}}\|\pi^{({\rm y})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm y}}}^{\frac {1}{S}}\bigl(1+ \log_2{\mathrm{rank}}({\bf v}_\eta)\bigr)^{1+\frac {a}{S}}\sum_q (1+q)^{\frac {a}{S}}2^{q \left(1- \frac{s_{\rm y}}{S} \right)}\!, \end{align*} which yields (5.33). □ We proceed estimating the various sparsity norms of $${\bf w}_\eta$$. We first address rank growth and parametric sparsity, which are independent of the specific choice of $$\tilde{\mathbf{A}}_{j,p,q}$$. Using (5.30) and (5.32) in Lemma 5.4 together with (5.11) and (5.12), for $$S \geq \bar s$$ we obtain \begin{align} \label{rankfirst} {\mathrm{rank}}({\bf w}_\eta) &\lesssim \eta^{-\frac {1}{S}}(1+{\lvert{\log\eta}\rvert})^{2\left(1+\frac {a}{S}\right)}\|{\bf v}\|^{\frac {1}{S}}_{{{\it{\Sigma}}^{\bar s}}} \,\eta^{-\frac{1}{\bar s}\left(1-\frac{\bar s}{S}\right)}\| {\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}^{\frac{1}{\bar s}\left(1-\frac{\bar s}{S}\right)}\!, \nonumber\\ & = \eta^{-\frac{1}{\bar s}}\| {\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}^{\frac{1}{\bar s}} (1+{\lvert{\log\eta}\rvert})^{2\left(1+\frac {a}{S}\right)}\!, \end{align} (5.35) where the constant depends also on $${\lvert{\log \|\pi^{(i)}({\bf v})\|_{{\mathcal{A}}^{s_i}}}\rvert}$$, $$i\in \{{\rm x}, {\rm y}\}$$. Now suppose that $$N_\eta$$ is an upper bound for $${\mathrm{rank}}({\bf w}_\eta)$$. To simplify the exposition, let us assume without loss of generality that $$\eta \in(0,1)$$. Then, by definition, one has \begin{align*} \|{\bf w}_\eta\|_{{{\it{\Sigma}}^{\bar s}}}& = \sup_{N\le N_\eta}N^{\bar s}\inf_{{\mathrm{rank}}({\bf w})\le N}\|{\bf w}_\eta - {\bf w}\| \le \sup_{B\in [1,\eta^{-1}]}N_{B\eta}^{\bar s}\|{\bf w}_\eta-{\bf w}_{B\eta}\|\\ &\le \sup_{B\in [1,\eta^{-1}]} N_{B\eta}^{\bar s}\big(\|{\bf w}_\eta -{\mathbf{A}} {\bf v}_\eta\|+ \|{\mathbf{A}} {\bf v}_\eta-{\bf w}_{B\eta}\|\big)\le \sup_{B\in [1,\eta^{-1}]} 2B\eta N_{B\eta}^{\bar s}. \end{align*} Now we can invoke for each $$B\in [1,\eta^{-1}]$$ the upper bound for $${\mathrm{rank}}({\bf v}_\eta)$$ given by (5.35) and observe that the resulting bound is maximized for $$B=\eta^{-1}$$ when $$S\ge \bar s$$. This gives \begin{equation} \label{stabbars} \|{\bf w}_\eta\|_{{{\it{\Sigma}}^{\bar s}}}\lesssim \|{\bf v}\|_{{{\it{\Sigma}}^{\bar s}}} (1+{\lvert{\log\eta}\rvert})^{2\bar s\left(1+\frac {a}{S}\right)}\!, \end{equation} (5.36) which confirms (5.25). Similarly, using the second estimate in (5.30) and (5.33) in Lemma 5.4 and invoking (5.12) yields, for $$S \geq s_{\rm y}$$, \begin{equation} \label{suppp} \#{\mathrm{supp}}_{\rm y}({\bf w}_\eta) \lesssim \eta^{-\frac {1}{S}} \|\pi^{({\rm y})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm y}}}^{\frac {1}{S}} (1+{\lvert{\log_2 \eta}\rvert})^{2+\frac{2 a}{S}} \left(\frac{\|\pi^{({\rm y})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm y}}}}{\eta} \right)^{\frac{1}{s_{\rm y}}\left(1- \frac{s_{\rm y}}{S}\right)}\!. \end{equation} (5.37) By the same argument as before, one obtains \begin{equation} \label{suppp2} \#{\mathrm{supp}}_{\rm y}({\bf w}_\eta)\lesssim \eta^{-\frac{1}{s_{\rm y}}} \|\pi^{({\rm y})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm y}}}^{\frac{1}{s_{\rm y}}} (1+{\lvert{\log_2 \eta}\rvert})^{2+\frac{2 a}{S}}. \end{equation} (5.38) We can then continue as above, denoting by $$M_\eta$$ an upper bound for $$\#{\mathrm{supp}}_{\rm y}({\bf w}_\eta)$$, to argue \begin{align*} \|\pi^{({\rm y})}({\bf w}_\eta)\|_{{\mathcal{A}}^{s_{\rm y}}} &\le \sup_{B\in [1,\eta^{-1}]} M_{B\eta}^{s_{\rm y}}\big(\|\pi^{({\rm y})}({\bf w}_\eta)-\pi^{({\rm y})}({\mathbf{A}}{\bf v}_\eta)\| + \|\pi^{({\rm y})}({\bf w}_{B\eta})-\pi^{({\rm y})}({\mathbf{A}}{\bf v}_\eta)\|\big)\\ &\leq \sup_{B\in [1,\eta^{-1}]} M_{B\eta}^{s_{\rm y}}\big(\| {\bf w}_\eta - {\mathbf{A}}{\bf v}\| + \| {\bf w}_{B\eta}- {\mathbf{A}}{\bf v}\|\big)\\ &\le \sup_{B\in [1,\eta^{-1}]} 2B\eta M_{B\eta}^{s_{\rm y}}. \end{align*} Thus, we obtain \begin{equation} \label{stabsp} \|\pi^{({\rm y})}({\bf w}_\eta)\|_{{\mathcal{A}}^{s_{\rm y}}} \lesssim \|\pi^{({\rm y})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm y}}} (1+{\lvert{\log_2 \eta}\rvert})^{2s_{\rm y}\left(1+\frac{a}{S}\right)}\!, \end{equation} (5.39) which together with (5.38) shows (5.26). We now turn to estimating $$\#{\mathrm{supp}}_{\rm x}({\bf w}_\eta)$$ and $$\|\pi^{({\rm x})}({\bf w}_\eta)\|_{{\mathcal{A}}^{s_{\rm x}}}$$. To this end, we specify suitable compressed matrices $$\tilde{\mathbf{A}}_{j,p,q}$$ in (5.19). Denoting by $$\pi^{({\rm x})}({\bf v}_{[p,q]})_{\ell}$$ the best $$\ell$$-term approximation of $$\pi^{({\rm x})}({\bf v}_{[p,q]})$$, we set $${\it{\Lambda}}_{p,q,0} := {\mathrm{supp}}(\pi^{({\rm x})}({\bf v}_{[p,q]})_{1})$$ and \[ {\it{\Lambda}}_{p,q,n}:= {\mathrm{supp}}\big(\pi^{({\rm x})}({\bf v}_{[p,q]})_{2^n}\big) \setminus {\mathrm{supp}}\big(\pi^{({\rm x})}({\bf v}_{[p,q]})_{2^{n-1}}\big), \quad n\in{\mathbb{N}}. \] Note that \[ {\lVert{ {\mathrm{R}_{{\it{\Lambda}}_{p,q,n}\times {\mathcal{F}}}} {\bf v}_{[p,q]} }\rVert} \leq {\lVert{ {\mathrm{R}_{{\it{\Lambda}}_{p,q,n}}} \pi^{({\rm x})}({\bf v}_{[p,q]}) }\rVert} \leq 2^{-s_{\rm x} n} {\lVert{ \pi^{({\rm x})}({\bf v}_{[p,q]})}\rVert}_{{\mathcal{A}}^{s_{\rm x}}}\!. \] To proceed, we employ the following convenient reformulation of Proposition 4.3. Remark 5.5 Let $$M \in {\mathbb{N}}$$ and $$s < \frac{2\tau}{1+2\tau} S$$. Then for any $$J\in {\mathbb{N}}$$ we can find $${\mathbf{A}}_j^J$$, $$j\geq 0$$, such that \begin{equation*} {\left\lVert{\sum_{j=0}^M \bigl( {\mathbf{A}}_j - {\mathbf{A}}_j^J \bigr) \otimes {\mathbf{M}}_j}\right\rVert} \leq \beta_J 2^{-s J}, \end{equation*} and the following holds: for each $$\lambda\in{\mathcal{S}}$$, for the sum of the number of corresponding nonzero column entries of the $${\mathbf{A}}_j^J$$ we have the bound \begin{equation}\label{sparsecompr_sum} \sum_{j = 0}^M \#{\mathrm{supp}} \,\bigl( {\mathbf{A}}_{j,\lambda'\lambda}^J \bigr)_{\lambda'\in{\mathcal{S}}} \leq \alpha_J 2^J. \end{equation} (5.40) Here $$\boldsymbol{\alpha},\boldsymbol{\beta}$$ are positive summable sequences. For a suitable non-negative integer $$N=N_{j,p,q,\eta}$$, let $$\tilde{\mathbf{A}}_{j,p,q} \,{:=}\, \sum_{n = 0}^N {\mathbf{A}}_j^{N - n} {\mathrm{R}_{{\it{\Lambda}}_{p,q,n}}}$$ and \begin{equation}\label{wpq} {\bf w}_{p,q} := \sum_{j=0}^{M_{p,q}}(\tilde{\mathbf{A}}_{j,p,q} \otimes {\mathbf{M}}_j){\bf v}_{[p,q]}. \end{equation} (5.41) Then \begin{equation*} {\left\lVert{{\bf w}_{p,q} - \sum_{j=0}^{M_{p,q}}\big( {\mathbf{A}}_j \otimes{\mathbf{M}}_j \big){\bf v}_{[p,q]} }\right\rVert} = {\left\lVert{\sum_{j=0}^{M_{p,q}} \sum_{n = 0}^N \big( ({\mathbf{A}}_j^{N-n} - {\mathbf{A}}_{j}){\mathrm{R}_{{\it{\Lambda}}_{p,q,n}}} \otimes{\mathbf{M}}_j\big) {\bf v}_{[p,q]} }\right\rVert} . \end{equation*} Using Remark 5.5 with $$s=s_{\rm x}$$, the right side can be estimated by \begin{align*} &\sum_{n = 0}^N \beta_{N-n} 2^{-s_{\rm x}(N-n)} 2^{-s_{\rm x} n} {\lVert{ \pi^{({\rm x})}({\bf v}_{[p,q]})}\rVert}_{{\mathcal{A}}^{s_{\rm x}}} + 2 {\lVert{{\mathbf{A}}}\rVert} \sum_{n>N} 2^{-s_{\rm x} n} {\lVert{ \pi^{({\rm x})}({\bf v}_{[p,q]})}\rVert}_{{\mathcal{A}}^{s_{\rm x}}} \\ &\quad\lesssim 2^{-s_{\rm x} N} {\lVert{ \pi^{({\rm x})}({\bf v}_{[p,q]})}\rVert}_{{\mathcal{A}}^{s_{\rm x}}} , \end{align*} where the constant depends on $$s_{\rm x}$$, $${\lVert{{\mathbf{A}}}\rVert}$$ and $${\lVert{ \boldsymbol{\beta} }\rVert}_{\ell^1}$$. By (5.40), we obtain \begin{equation}\label{supp pq} \#{\mathrm{supp}}_{\rm x} ({\bf w}_{p,q}) \lesssim \sum_{n = 0}^N 2^n \alpha_{N-n} 2^{N-n} \lesssim 2^N. \end{equation} (5.42) If we now choose the smallest $$N$$ such that (5.19) holds, i.e., $$2^{-s_{\rm x} N} {\lVert{ \pi^{({\rm x})}({\bf v}_{[p,q]})}\rVert}_{{\mathcal{A}}^{s_{\rm x}}} \lesssim \eta_{p,q}$$, we obtain \begin{equation*} \#{\mathrm{supp}}_{\rm x} ({\bf w}_{p,q}) \lesssim \eta_{p,q}^{-\frac {1}{s_{\rm x}}} {\lVert{ \pi^{({\rm x})}({\bf v}_{[p,q]})}\rVert}_{{\mathcal{A}}^{s_{\rm x}}}^{\frac {1}{s_{\rm x}}} \lesssim \eta_{p,q}^{-\frac {1}{s_{\rm x}}}{\lVert{ \pi^{({\rm x})}({\bf v}_{\eta})}\rVert}_{{\mathcal{A}}^{s_{\rm x}}}^{\frac {1}{s_{\rm x}}}\!. \end{equation*} Keeping the definition of $$\eta_{p,q}=\alpha_{p,q}\eta$$ and (5.11), (5.12) in mind, summing over $$p,q$$ gives (5.27) with $$b= 2\big(1+\frac{a}{s_{\rm x}}\big) > 2\big(1+\frac{a}{S}\big)$$, where the bound on $${\lVert{\pi^{({\rm y})}({\bf w}_\eta)}\rVert}_{{\mathcal{A}}^{s_{\rm y}}}$$ follows as in (5.36) and (5.39). Remark 5.6 Note that in Assumption 5.2, we state that $$S\geq \bar s, s_{\rm y}$$ and $$S > s_{\rm x}$$. Although other cases can in principle be considered in the same manner, the convergence rate $$S$$ of the operator truncation then limits the achievable efficiency: if $$S< \bar s$$, for instance, it is easy to see that in general one can only obtain $${\mathrm{rank}}({\bf w}_\eta) = {\mathcal{O}}(\eta^{-1/S})$$. Proposition 5.7 Under the assumptions of Theorem 5.3, let $${\bf v}$$ be given by its SVD with $$r\,{\,{:=}\,}\,{\mathrm{rank}}({\bf v})$$ and $$n_i\,{\,{:=}\,}\,\#{\mathrm{supp}}_i({\bf v})$$ for $$i\in \{{\rm x},{\rm y}\}$$. Then for the number of operations $${\mathrm{ops}}({\bf w}_\eta)$$ required to obtain $${\bf w}_\eta$$, one has \begin{align} {\mathrm{ops}}({\bf w}_\eta) &\lesssim (n_{\rm x} + n_{\rm y}) r^2 + \left( ( 1 + {\lvert{\log \eta}\rvert})^{\frac{2a}{s_{\rm x}}} \eta^{-\frac {1}{s_{\rm x}}}{\lVert{ \pi^{({\rm x})}({\bf v})}\rVert}_{{\mathcal{A}}^{s_{\rm x}}}^{\frac {1}{s_{\rm x}}} \right.\notag\\ &\quad\left. + (1+{\lvert{\log \eta}\rvert} )^{\frac {2a}{S}} \eta^{-\frac {1}{s_{\rm y}}} {\lVert{\pi^{({\rm y})}({\bf v})}\rVert}^{\frac {1}{s_{\rm y}}}_{{\mathcal{A}}^{s_{\rm y}}} \right) \eta^{-\frac 1{\bar s}}\|{\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}^{\frac 1{\bar s}}. \end{align} (5.43) For the proof of this proposition, we refer to Appendix B. With these preparations, we obtain the following complexity estimate for Algorithm 2.1. Theorem 5.8 Let Assumption 5.2 hold. Then for any $$\varepsilon>0$$, the approximation $${\bf u}_\varepsilon$$ of the form (LR) produced by Algorithm 2.1 satisfies \begin{equation}\label{hsrankbound} {\mathrm{rank}}({\bf u}_\varepsilon) \lesssim \varepsilon^{-\frac {1}{\bar s}} {\lVert{{\bf u}}\rVert}_{{{\it{\Sigma}}^{\bar s}}}^{\frac {1}{\bar s}}, \quad {\lVert{{\bf u}_\varepsilon}\rVert}_{{{\it{\Sigma}}^{\bar s}}} \lesssim {\lVert{{\bf u}}\rVert}_{{{\it{\Sigma}}^{\bar s}}} \end{equation} (5.44) and \begin{equation}\label{hssuppbound} \sum_{i\in\{{\rm x},{\rm y}\}} \#{\mathrm{supp}}_i ({\bf u}_\varepsilon) \lesssim \sum_{i\in\{{\rm x},{\rm y}\}} \varepsilon^{-\frac {1}{s_i}} {\lVert{\pi^{(i)}({\bf u})}\rVert}_{{\mathcal{A}}^{s_i}}^{-\frac{1}{s_i}}, \quad {\lVert{\pi^{(i)}({\bf u}_\varepsilon)}\rVert}_{{\mathcal{A}}^{s_i}} \lesssim {\lVert{\pi^{(i)}({\bf u})}\rVert}_{{\mathcal{A}}^{s_i}}\!. \end{equation} (5.45) The number of operations $${\mathrm{ops}}({\bf u}_\varepsilon)$$ required to produce $$\varepsilon$$ then satisfies \begin{equation}\label{hsopbound} {\mathrm{ops}}({\bf u}_\varepsilon) \lesssim 1+ ( 1 + {\lvert{\log \varepsilon}\rvert})^\zeta \left( \varepsilon^{-\frac {1}{\bar s}} {\lVert{{\bf u}}\rVert}_{{{\it{\Sigma}}^{\bar s}}}^{\frac {1}{\bar s}} \right)^2 \sum_{i\in\{{\rm x},{\rm y}\}} \varepsilon^{-\frac {1}{s_i}} {\lVert{\pi^{(i)}({\bf u})}\rVert}_{{\mathcal{A}}^{s_i}}^{-\frac{1}{s_i}} , \end{equation} (5.46) where $$\zeta>0$$ depends on $$s_{\rm x}$$, on $$\operatorname{cond}({\mathbf{A}})$$, and on the choice of $$\kappa_1, \beta$$ in Algorithm 2.1. The constants in (5.44), (5.45) and (5.46) may also depend on $$S$$, $$\bar s$$, $$s_{\rm y}$$ and on the further parameters of Algorithm 2.1. Proof. We follow the general strategy of the proofs as in Bachmayr & Dahmen (2015) and in Theorem 3.4, combining the properties of the complexity reduction procedures $${\small{\text{COARSEN}}}$$ and $${\small{\text{RECOMPRESS}}}$$ with the specific adaptive operator application that we have constructed for the present problem. The bound (5.44) and (5.45) follow from Theorem 5.1 applied to the result of line 11 in Algorithm 2.1. Note that here, the number $$J$$ of inner iterations depends only on $$\operatorname{cond}({\mathbf{A}})$$ (via $$\rho,\omega$$) and on the choice of $$\kappa_1$$ and $$\beta$$. With the complexity estimates for $${\small{\text{APPLY}}}$$ from Theorem 5.3 and Proposition 5.7 at hand, we obtain (5.46). □ Remark 5.9 The present version of Algorithm 2.1 necessarily performs orthogonalizations of the basis vectors in the computed low-rank expansions. Under the ensuing requirement of common supports, Theorem 5.8 shows that the output of Algorithm 2.1 for the format (LR) has optimal representation complexity. Regarding the computational complexity of the algorithm, as can be seen from the proofs of Theorem 5.3 and Proposition 5.7, the numerical cost for the approximate operator application is dominated by the cost of performing orthogonalizations of the input. In particular, this leads to a quadratic dependence on the approximation ranks, and up to the logarithmic term, (5.46) represents the best possible bound for an algorithm performing such orthogonalizations. The number of subsequent operations required to construct the low-rank representation of the output, however, remains proportional to the respective number of degrees of freedom. 6. Approximability of parametric problems In this section, we consider representative instances of (1.5) to compare the respective properties that determine the efficiency of the variants of our scheme for (ASP), (LR) and (STF). 6.1 Isotropic dependence on finitely many parameters As simple yet instructive examples, we consider problems with $$\bar{a}=1$$ and \begin{equation} \label{cm} {\theta}_j = b_j \chi_{D_j}, \end{equation} (6.1) where $$b_j \in ]0,1[$$ are constants and the subdomains $$D_j$$ of the domain $$D$$ have disjoint closures so that the diffusion coefficient is a strictly positive piecewise constant, \begin{equation} \label{inclusions} a(y) = \bar{a} + \sum_{j\geq 1} y_j b_j \chi_{D_j}. \end{equation} (6.2) As a first problem of this type with $$D = ]0,1[$$, we consider the following. Example 6.1 Let $$d := \#\mathcal{I} < \infty$$, $$D_j\subset D\,{=}\,] 0, 1[$$ for $$j=1,\ldots,d$$ with pairwise disjoint $$\overline{D_j}$$, and $$b_j =\xi$$ for some $$\xi\,{\in}\,]0,1[$$. For low-rank approximation, we then have the following result for the rank of the Hilbert–Schmidt decomposition (5.2). Proposition 6.2 In Example 6.1, for any $$f\in V'$$, one has $${\mathrm{rank}}({\bf u}) \leq 4 d + 1$$. Proof. This follows by the same arguments as in Bachmayr & Cohen (2017, Example 2.2): the end points of the $$D_j$$ induce a partition of $$]0,1[$$ into at most $$2d+1$$ intervals. For each such interval $$I$$, for any $$F$$ such that $$F''=f$$, we have $$u(y)|_I \in {\mathrm{span}}\{ \chi_{I} ,x\,\chi_{I} , F\,\chi_{I} \}$$. Hence, $$u(y)$$ is contained in a $$y$$-independent space of dimension $$6d+3$$ for all $$y$$. In addition, there are $$2d+2$$ continuity conditions, independent of $$y$$, at the interval boundaries, which leaves at most $$4d+1$$ degrees of freedom. □ We observe, on the other hand, that the Legendre expansions for this problem involves infinitely many nonzero coefficients, that is, the solution map $$y\mapsto u(y)$$ is not a polynomial in $$y$$. This can be checked, for example, by considering the Taylor coefficients of $$u$$. For any $$\nu=(\nu_j)_{j\geq 1}\in {\mathcal{F}}$$, the coefficients in the Taylor expansion of $$u$$ are given by \begin{equation} t_\nu(y) = \frac {1}{\nu!}\partial^\nu u(y), \quad \nu !:=\prod_{j \geq 1}\nu_j !. \end{equation} (6.3) Denoting by $$e^j=(0,\dots,0,1,0,\dots)$$ the $$j$$th Kronecker sequence, differentiating the equation we find that these coefficients are given by the recursion \begin{equation}\label{eq:taylor_recursion} t_\nu(y) := -A(y)^{-1}\sum_{j\in{\mathrm{supp}}\nu} A_j t_{\nu - {\it \,{e}}^j}(y), \quad t_0(y) = A(y)^{-1} f=u(y). \end{equation} (6.4) We now consider the Taylor coefficients of order $$n$$ in a given variable $$j$$ at the origin, that is, \begin{equation} t_{n,j}:=t_{ne^j} (0)=\frac 1 {n!} \partial^{n}_{y_j} u(0). \end{equation} (6.5) As a particular case of (6.4), we have \begin{equation} \int_D \bar a \nabla t_{n,j}\cdot \nabla v \,{\rm{d}}x =-\int_D \theta_j \nabla t_{n-1,j} \cdot\nabla v \,{\rm{d}}x . \end{equation} (6.6) Since $$t_{0,j}=u(0)$$ is not trivial, there is at least one variable $$j$$ such that $$t_{1,j}$$ does not vanish on $$D_j$$. Then, taking $$v=t_{n-1,j}$$ in the above recursion shows by contradiction that $$t_{n,j}$$ does not vanish on $$D_j$$, for all values of $$n\geq 0$$. Thus $$y\mapsto u(y)$$ cannot be a polynomial. Low-rank approximations thus give substantially faster convergence than Legendre expansions in this case. Similar results showing substantial advantages of best low-rank approximations have also been obtained for spatially two-dimensional examples of analogous structure in Bachmayr & Cohen (2017). The test problems considered there are of the form (6.2) as well, with the coefficients piecewise constant on $$D\,{:=}\,]0,1[^2$$ and where $$D_j$$, $$j=1,\ldots, d$$ are a partition of $$D$$ into congruent square subdomains. The resulting ‘checkerboard’ geometry is illustrated for $$d = 16$$ in Fig. 1. Fig. 1. View largeDownload slide Example geometry of piecewise constant coefficients $$a$$ with $$d=16$$ and decay of Legendre coefficients of corresponding $$u$$. Fig. 1. View largeDownload slide Example geometry of piecewise constant coefficients $$a$$ with $$d=16$$ and decay of Legendre coefficients of corresponding $$u$$. The low-rank approximability of such problems with respect to space–parameter separation has been studied in Bachmayr & Cohen (2017). For the case $$d=4$$ (i.e., a $$2\times 2$$-checkerboard), it is shown in Bachmayr & Cohen (2017) that for each $$n\in {\mathbb{N}}$$ one can find $$u^{\rm x}_k$$, $$u^{\rm y}_k$$ for $$k=1,\ldots,n$$ such that for some $$c>0$$, \[ {\left\lVert{ u - \sum_{k=1}^n u^{\rm x}_k \otimes u^{\rm y}_k}\right\rVert}_{L^2({Y},V)} \lesssim {\it \,{e}}^{- c n}. \] Numerical tests indicate that an analogous estimate can be achieved also for geometries of the type shown in Fig. 1 with $$d=9,16,25,\ldots$$, where $$c$$ has a moderate dependence on $$d$$. Note also that for a hierarchical tensor representation, the ranks of further matricizations enter as well. We are not aware of any bounds for these additional ranks. The numerically observed decay of the corresponding singular values for different values of $$d$$ (using a linear dimension tree) are shown in Fig. 2. Note that the singular values of the matricization $$T^{\{{\rm x}\}}_{\bf u}$$ are precisely those in the decomposition (5.2) underlying (LR). Fig. 2. View largeDownload slide Hierarchical singular values of $${\bf u}$$, where $$D$$ has $$\sqrt{d}\times\sqrt{d}$$-checkerboard geometry as in Fig. 1. Solid lines: singular values of matricizations $$T^{(\{i\})}_\mathbf{u}$$ associated with $$i\in\hat {\mathcal{I}}$$ (essentially identical for $$i\in{\mathcal{I}}$$, with the line of slower decay corresponding to $$i={\rm x}$$) and dashed lines: singular values of further matricizations in the hierarchical representation. The horizontal axes show the numbers of the decreasingly ordered singular values. Fig. 2. View largeDownload slide Hierarchical singular values of $${\bf u}$$, where $$D$$ has $$\sqrt{d}\times\sqrt{d}$$-checkerboard geometry as in Fig. 1. Solid lines: singular values of matricizations $$T^{(\{i\})}_\mathbf{u}$$ associated with $$i\in\hat {\mathcal{I}}$$ (essentially identical for $$i\in{\mathcal{I}}$$, with the line of slower decay corresponding to $$i={\rm x}$$) and dashed lines: singular values of further matricizations in the hierarchical representation. The horizontal axes show the numbers of the decreasingly ordered singular values. Remark 6.3 As we have noted for the spatially one-dimensional case in Example 6.1 in Section 6, for the separation between spatial and parametric variables for that case one always obtains fixed finite ranks that grow linearly in the number of parameters $$d$$. Note, however, that the approximation ranks corresponding to further separations among the parametric variables may then still not be uniformly bounded; see e.g., (Khoromskij & Schwab, 2011, Proposition 2.5) for an analysis of a simple example. In all examples considered above, we observe exponential-type decay of singular values. In particular, the numerical results in Fig. 2 indicate that Assumptions 3.2 and 3.3 are met for this family of problems. In fact, the obtained hierarchical singular values are consistent with the decay $$\exp({-\bar c n^{1/\bar b}})$$ with some $$\bar b > 1$$ independent of $$d$$ and $$\bar c >0$$ algebraic in $$d$$. The compressibility of any desired order of the operators $${\mathbf{A}}_j$$ is known from classical wavelet theory. In contrast, the decay of Legendre coefficients $${\lVert{u_\nu}\rVert} \sim \pi^{({\rm y})}_\nu({\bf u})$$ is significantly slower, where as in Fig. 1 one only observes a decay of the form $$\exp({-c n^{1/d}})$$ for the $$n$$th largest $${\lVert{u_\nu}\rVert}$$ as predicted by the available estimates (see, e.g., Bachmayr & Cohen, 2017). This indicates a clear advantage of hierarchical tensor approximations of the form (STF) over sparse polynomial expansions (ASP) for such problems. 6.2 Anisotropic dependence on infinitely many parameters We next consider a problem of the form (6.2) with countably many parameters of decreasing influence, where our conclusion are quite different from those concerning Example 6.1. Example 6.4 Let $$\mathcal{I} = {\mathbb{N}}$$, and let $$D_j\,{\subset}\,]0,1[$$ be disjoint with $${\lvert{D_j}\rvert}>0$$ for all $$j$$. In addition, let $$(b_j)_{j\geq 1} \in \ell^{q}({\mathbb{N}})$$ for some $$q>0$$. As an immediate consequence of the results in Bachmayr et al. (2017, Section 4.1), one has the following: Proposition 6.5 In Example 6.4, for all right-hand sides $$f\,{\in}\,V'$$, one has $$({\lVert{u_\nu}\rVert}_V)_{\nu \in {\mathcal{F}}}\,{\in}\,\ell^{p}({\mathcal{F}})$$ for $$p = \frac{2q}{2+q}$$. If $$(b_j)\notin \ell^{q'}({\mathbb{N}})$$ for any $$0<q'<q$$, then there exists $$f\in V'$$ such that $$({\lVert{u_\nu}\rVert}_V)_{\nu \in {\mathcal{F}}} \notin \ell^{p'}({\mathcal{F}})$$ for $$0\,{<}\,p'\,{<}\,p$$. If $$\sigma_n$$ are the singular values of $$u$$, then for the decreasing rearrangement $$(u^*_n)_{n\geq 1}$$ of $$({\lVert{u_\nu}\rVert}_V)_{\nu\in{\mathcal{F}}}$$ we clearly have $$u^*_n \geq \sigma_n$$. As the following new result shows by similar arguments as in Bachmayr et al. (2017, Section 4.1), the singular values do not necessarily have faster asymptotic decay in this situation than the ordered norms of the Legendre coefficients. Proposition 6.6 In Example 6.4, if $$(b_j)\notin \ell^{q'}({\mathbb{N}})$$ for any $$0<q'<q$$ then there exists an $$f \in V'$$ such that the singular values of $$u$$ are not in $$\ell^{p'}({\mathbb{N}})$$ for $$0<p' < p = \frac{2q}{2+q}$$. Proof. We first observe that the singular values of $$u = \sum_{\nu\in{\mathcal{F}}} u_\nu\otimes L_\nu$$ are bounded from below by those of $$\tilde u =\sum_{j\geq 1} u_{e^j} \otimes L_{e^j}$$, with $$e^j$$ denoting the $$j$$th Kronecker sequence. This follows from the fact that $$\tilde u = ({\rm I} \otimes \tilde P) u$$, where $$\tilde P$$ is the projector onto $$\overline{\operatorname{span}}\{L_{e^j}\}_{j\geq 1}$$. For $$u_{e^j}$$, one has by Rodrigues’ formula the explicit representation \begin{equation}\label{rodrigues2} u_{e^j} = \frac{\sqrt3}{2} \int_Y t_{e^j}(y)\, (1 - y_j^2) \,{\rm{d}}\mu(y) \end{equation} (6.7) in terms of the first-order derivatives $$t_{e^j}(y)=\partial_{y_j} u(y)$$. L