# Parametric PDEs: sparse or low-rank approximations?

**Abstract.** We consider adaptive approximations of the parameter-to-solution map for elliptic operator equations depending on a large or infinite number of parameters, comparing approximation strategies of different degrees of nonlinearity: sparse polynomial expansions, general low-rank approximations separating spatial and parametric variables, and hierarchical tensor decompositions separating all variables. We describe corresponding adaptive algorithms based on a common generic template and show their near-optimality with respect to natural approximability assumptions for each type of approximation. A central ingredient in the resulting bounds for the total computational complexity is a new operator compression result in the case of infinitely many parameters. We conclude with a comparison of the complexity estimates based on the actual approximability properties of classes of parametric model problems, which shows that the computational costs of optimized low-rank expansions can be significantly lower or higher than those of sparse polynomial expansions, depending on the particular type of parametric problem.

## 1. Introduction

Complex design, optimization or uncertainty quantification tasks based on parameter-dependent families of partial differential equations (PDEs) arise in virtually all branches of science and engineering. Typical scenarios are models whose physical properties—such as diffusivity, transport velocity or domain geometry—are described by a finite number of real parameter values. In certain instances, one may even encounter infinitely many parameters of decreasing influence. This occurs, for instance, when a random diffusion field is represented by an infinite expansion in a given basis. The development and analysis of numerical strategies for capturing the dependence of the PDE on the parameters has been the subject of intensive research efforts in recent years.
### 1.1 Problem formulation

The problems addressed in this article have the following general form. Let $$V$$ be a separable Hilbert space. We consider a parametric operator $$A(y)\colon V\to V'$$ of the form $$\label{Arep}A(y) := A_0 + \sum_{j\in{\mathcal{I}}} y_j A_j,\quad y\in{Y} := [-1,1]^{\mathcal{I}}\!,$$ (1.1) where $${\mathcal{I}}=\{1,\dots,d\}$$ or $${\mathcal{I}}= {\mathbb{N}}$$ in the finite- or infinite-dimensional case, respectively. In the infinite-dimensional case, we require that the above series converges in $${\cal L}(V,V')$$ for any $$y\in {Y}$$. We assume uniform boundedness and ellipticity of $$A(y)$$ over the parameter domain, that is, $$\label{ellip}\langle A(y)v,w\rangle \leq R \|v\|_V\|w\|_V \quad {\rm and} \quad\langle A(y)v,v\rangle \geq r \|v\|_V^2, \quad v,w\in V, \; y\in {Y},$$ (1.2) for some $$0<r\leq R<\infty$$, which implies in particular that $$A(y)$$ is boundedly invertible uniformly in $$y\in {Y}$$, with $$\|A(y)^{-1}\|_{{\cal L}(V',V)}\leq r^{-1}, \quad y\in {Y}.$$ (1.3) We also consider parametric data $$f\colon {Y} \to V'$$ and, for each $$y\in Y$$, define $$u(y)\in V$$ as the solution to the equation $$\label{paramgeneral}A(y)\,u(y) = f(y).$$ (1.4) A guiding example is provided by affinely parametrized diffusion problems of the form $$\label{paramdiffusion-0}A(y) u := -{\mathrm{div}}\bigl( a(y) \nabla u \bigr) = f , \quad a(y) := \bar a + \sum_{j \in {\mathcal{I}}} y_j {\theta}_j ,$$ (1.5) with homogeneous Dirichlet boundary conditions, posed in the weak sense on a spatial domain $$D \subset {\mathbb{R}}^m$$. In this particular case of frequent interest, the right-hand side $$f\in V'$$ is independent of $$y$$. The validity of (1.2) is then usually ensured by the uniform ellipticity assumption $$\label{uea}\sum_{j\geq 1} {\lvert{{\theta}_j(x)}\rvert} \leq \bar a(x) - \underline{\alpha}, \quad x\in D,$$ (1.6) for some $$\underline{\alpha}>0$$.
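The affine structure (1.1) and the ellipticity conditions (1.2), (1.6) can be made concrete with a small numerical sketch. The following snippet assembles a one-dimensional finite-difference analogue of the diffusion problem (1.5); the choices of $$\bar a$$, $${\theta}_j$$, the grid and the number of parameters are illustrative assumptions, not taken from the text.

```python
import numpy as np

n, d = 50, 4                                  # interior grid points, parameters (illustrative)
h = 1.0 / (n + 1)
xm = (np.arange(n + 1) + 0.5) * h             # midpoints of the n+1 cells on D = (0,1)

abar = np.ones_like(xm)                       # \bar a == 1
thetas = [0.9 / d * np.sin((j + 1) * np.pi * xm) for j in range(d)]

# Uniform ellipticity assumption (1.6): sum_j |theta_j(x)| <= abar(x) - alpha
alpha = (abar - sum(np.abs(t) for t in thetas)).min()
assert alpha > 0

def stiffness(c):
    """Symmetric tridiagonal FD matrix for -div(c grad u), c given at cell midpoints."""
    return (np.diag(c[:-1] + c[1:]) - np.diag(c[1:-1], 1) - np.diag(c[1:-1], -1)) / h**2

A0 = stiffness(abar)
Aj = [stiffness(t) for t in thetas]

def A(y):
    # Affine family (1.1): A(y) = A_0 + sum_j y_j A_j (exact, since stiffness is linear in c)
    return A0 + sum(yj * Aj_ for yj, Aj_ in zip(y, Aj))

# Ellipticity (1.2): the smallest eigenvalue stays positive uniformly over Y = [-1,1]^d
rng = np.random.default_rng(0)
for _ in range(5):
    y = rng.uniform(-1, 1, d)
    assert np.linalg.eigvalsh(A(y)).min() > 0
```

Since the discrete stiffness matrix depends linearly on the coefficient, the affine expansion of $$a(y)$$ carries over verbatim to the matrix family, mirroring the passage from (1.5) to (1.1).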
We then have $$V=H^{1}_0(D)$$, and the corresponding operators $$A_j \colon V \to V'$$ for $$j \in \{0\}\cup {\mathcal{I}}$$ are defined by \begin{equation*}\langle A_0 u, v \rangle := \int_D \bar{a} \nabla u\cdot\nabla v \,{\rm{d}}x ,\qquad \langle A_j u, v \rangle := \int_D {\theta}_j \nabla u\cdot\nabla v \,{\rm{d}}x ,\quad j \in{\mathcal{I}}, \end{equation*} for $$u,v\in V$$. Thus, $$V$$ is typically a function space defined over some physical domain $$D\subset {\mathbb{R}}^m$$, with $$m=1,2,3$$, for example, the Sobolev space $$H^1_0(D)$$ in the above case of second-order elliptic equations with homogeneous Dirichlet boundary conditions. Accordingly, the solution may be viewed either as the Hilbert space-valued map $$y\mapsto u(y), \label{solutionmap}$$ (1.7) which acts from $${Y}$$ to $$V$$, or as the scalar-valued map $$(x,y)\mapsto u(x,y):=u(y)(x),$$ (1.8) where $$x\in D$$ and $$y\in {Y}$$ are referred to as the spatial and parametric variables. Approximating such solution maps amounts to approximating functions of a large or even infinite number of variables. In applications, one is often interested in specific functionals of the solution. Here, we focus on the basic question of approximating the entire solution map in an appropriate norm. The guiding questions, to be made precise below, are the following: What are the most suitable approximations to cope with the high dimensionality in problems of the form (1.4), and what features of problems (1.4) favour certain approaches over others? Moreover, at what numerical cost can one find these approximations, and how do these costs depend on particular features of the given problem? To address the latter question, for each setting, we construct adaptive computational methods that exhibit near-optimal complexity, in a sense to be made precise below.
### 1.2 Sparse and low-rank approximability

Before addressing any concrete numerical schemes, we discuss the basic concepts of approximations for the solution map $$u$$ in (1.7). We focus on the mean squared error $$\|u-\tilde u\|_{L^2({Y},V)}$$ for an approximation $$\tilde u$$, where $\|v\|_{L^2({Y},V)}^2:=\int_{Y} \|v(y)\|_V^2 {\rm \,{d}}\mu(y),$ for a given probability measure $$\mu$$ over $${Y}$$. In what follows, we assume that $$\mu$$ is the uniform probability measure on $${Y}$$. The results carry over, however, to other product measures on $${Y}$$. The following types of approximation make essential use of the tensor product structure of the Bochner space $$L^2({Y},V)=V\otimes L^2({Y})$$, where $$L^2({Y})=L^2({Y},\mu)$$.

**Sparse polynomial expansions.** A first approach to approximating $$y\mapsto u(y)$$ is to employ an a priori chosen basis $$\{u_1^{{\rm y}}, \dots, u_n^{{\rm y}}\}\subset L^2(Y)$$ and compute the $$u_i^{{\rm x}}\in V$$ as the corresponding coefficients of this approximation. One prominent example of this approach is the orthogonal polynomial expansion method (see, e.g., Ghanem & Spanos, 1990, 1991; Le Maître & Knio, 2010; Xiu, 2010). In this case, the parametric functions $$u_i^{{\rm y}}$$ are picked from the set of tensorized Legendre polynomials $$L_\nu(y)=\prod_{j\geq 1} L_{\nu_j}(y_j), \quad \nu=(\nu_j)_{j\geq 1}, \label{tensorleg}$$ (1.9) with $$(L_k)_{k\geq 0}$$ the univariate Legendre polynomial sequence normalized in $$L^2([-1,1],\frac {\,{\rm d}t} {2})$$.
The functions $$(L_\nu)_{\nu\in{\mathcal{F}}}$$ are an orthonormal basis of $$L^2({Y})$$, where $${\mathcal{F}}$$ is $${\mathbb{N}}_0^d$$ in the case $${\mathcal{I}}=\{1,\dots,d\}$$ or the set of finitely supported sequences of non-negative integers in the case $${\mathcal{I}}={\mathbb{N}}$$, that is, $${\mathcal{F}} := \{ \nu\in{\mathbb{N}}_0^{{\mathbb{N}}} \colon \#\,{\mathrm{supp}}\,\, \nu < \infty \}.$$ (1.10) One thus has $$\label{Legendre} u(y)=\sum_{\nu\in{\mathcal{F}}} u_\nu L_\nu(y), \quad u_\nu= \int_{Y} u(y)L_\nu(y) {\rm \,{d}}\mu(y).$$ (1.11) Then, one natural choice is the best $$n$$-term approximation $$u_n$$ obtained by restricting the above expansion to the set $${\it{\Lambda}}^{\rm y}_{n}\subset {\mathcal{F}}$$ of indices $$\nu$$ corresponding to the $$n$$ largest $$\|u_\nu\|_V$$, because this set minimizes the error $$\|u-u_n\|_{L^2({Y},V)}$$ among all possible choices of $$n$$-term truncations. This strategy for generating sparse polynomial approximations in the context of parametric PDEs was first introduced and analysed in Cohen et al. (2010, 2011). In practice, the set $${\it{\Lambda}}^{\rm y}_n$$ is not accessible, but provides a benchmark for the performance of algorithms. This representation complexity, however, does not yet determine the resulting computational complexity, because the coefficients $$u_\nu$$ in (1.11) in turn need to be approximated as well. For instance, one may choose a fixed basis $$\{ \psi_\lambda\}_{\lambda\in{\mathcal{S}}}$$ of $$V$$ and expand $u_\nu = \sum_{\lambda\in {\mathcal{S}}} {\bf u}_{\lambda,\nu} \psi_\lambda$ in (1.11), where $${\bf u}_{\lambda,\nu}\in{\mathbb{R}}$$. 
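For a concrete handle on the tensorized basis (1.9) and the expansion (1.11), the univariate polynomials $$(L_k)$$ normalized in $$L^2([-1,1],\frac{{\rm d}t}{2})$$ can be generated from a three-term recurrence with coefficients $$p_n = 1/\sqrt{4-n^{-2}}$$, as stated later in (2.11)–(2.12). A minimal sketch, with orthonormality checked by Gauss–Legendre quadrature:

```python
import numpy as np

def p(n):
    # Recurrence coefficients (2.12): p_0 = 0, p_n = 1/sqrt(4 - n^{-2})
    return 0.0 if n == 0 else 1.0 / np.sqrt(4.0 - n**-2)

def legendre_normalized(t, nmax):
    """[L_0(t), ..., L_nmax(t)] via t L_n = p_{n+1} L_{n+1} + p_n L_{n-1}, cf. (2.11)."""
    L = [np.ones_like(t)]
    if nmax >= 1:
        L.append(t / p(1))
    for n in range(1, nmax):
        L.append((t * L[n] - p(n) * L[n - 1]) / p(n + 1))
    return L

def L_nu(nu, y):
    """Tensorized Legendre polynomial (1.9) for a finitely supported index nu."""
    val = 1.0
    for nuj, yj in zip(nu, y):
        val = val * legendre_normalized(yj, nuj)[nuj]
    return val

# Orthonormality in L^2([-1,1], dt/2): Gauss-Legendre weights rescaled by 1/2
t, w = np.polynomial.legendre.leggauss(20)
w = w / 2.0
L = legendre_normalized(t, 5)
G = np.array([[np.sum(w * Li * Lj) for Lj in L] for Li in L])
assert np.allclose(G, np.eye(6), atol=1e-12)
```

The quadrature is exact here since all integrands are polynomials of degree at most 10.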
The simplest strategy is to use the same discretization for all $$u_\nu$$ by selecting a finite $${\it{\Lambda}}^{\rm x}$$, which yields the approximation $u \approx \sum_{\nu \in {\it{\Lambda}}_{n}^{\rm y}} \left( \sum_{\lambda\in {\it{\Lambda}}^{\rm x}} {\bf u}_{\lambda,\nu} \psi_\lambda \right)\otimes L_\nu.$ Using instead an independently adapted spatial discretization for each $$u_\nu$$ corresponds to adaptive sparse polynomial approximations of the form $$\label{fullnterm}u \approx \sum_{(\lambda, \nu) \in {\it{\Lambda}}} {\bf u}_{\lambda,\nu}\, \psi_\lambda\otimes L_\nu,$$ (ASP) with $${\it{\Lambda}} \subset {\mathcal{S}} \times {\mathcal{F}}$$. It is natural to quantify the complexity of such an approximation by the number of activated degrees of freedom $$\#{\it{\Lambda}}$$. Here, one can again ask for best $$N$$-term approximations, now with respect to the fixed basis $$\{\psi_\lambda\otimes L_\nu\}_{\lambda\in{\mathcal{S}},\nu\in{\mathcal{F}}}$$, obtained by minimizing the error over all $${\it{\Lambda}}$$ with $$\#{\it{\Lambda}} = N$$. This now results in a fully discrete approximation.

**Low-rank approximation.** More generally, one may consider approximations of the form $$\label{basiclrapprox}u \approx u_n :=\sum_{k=1}^n u^{\rm x}_k \otimes u^{\rm y}_k,$$ (1.12) where $$u^{\rm x}_k$$ and $$u^{\rm y}_k$$ are functions of the spatial and parametric variables, respectively. This contains (1.11) as a special case, but now we also allow $$u^{\rm y}_k \in L^2(Y)$$ to be arbitrary functions that are not given a priori, but adapted to the given problem. The shortest expansion of the form (1.12) that achieves a prescribed error in $$L^2({Y},V)$$ is given by truncation of the Hilbert–Schmidt decomposition of $$u$$ interpreted as the operator $$\label{Tu}T_u\colon v \mapsto \int_Y u(x,y)v(y){\rm \,{d}}\mu(y),$$ (1.13) acting from $$L^2({Y})$$ to $$V$$.
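A discrete analogue of the optimal rank-$$n$$ truncation behind (1.12)–(1.13): sample a parametric function on spatial and parametric grids, view the samples as a matrix and truncate its SVD. The Frobenius error of the best rank-$$n$$ truncation is exactly the norm of the tail of singular values; the sampled function below is a synthetic stand-in, not taken from the text.

```python
import numpy as np

nx, ny = 200, 100
x = np.linspace(0, 1, nx)
y = np.linspace(-1, 1, ny)
U = 1.0 / (1.5 + np.outer(x, y))        # samples u(x_i, y_j); singular values decay fast

# Discrete counterpart of the Hilbert-Schmidt decomposition of T_u in (1.13)
Ux, s, Vy = np.linalg.svd(U, full_matrices=False)

def rank_n(n):
    # Best rank-n approximation in the Frobenius norm (Eckart-Young)
    return (Ux[:, :n] * s[:n]) @ Vy[:n, :]

# The truncation error equals the l2 norm of the discarded singular values
for n in (1, 3, 5):
    err = np.linalg.norm(U - rank_n(n))
    assert np.isclose(err, np.linalg.norm(s[n:]))
```

The columns of `Ux` and rows of `Vy` play the roles of the spatial and parametric singular functions $$u^{\rm x}_k$$, $$u^{\rm y}_k$$.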
In this context, we define $${\mathrm{rank}}(u)$$ as the rank of the operator $$T_u$$, so that in particular $$u_n$$ with a representation by $$n$$ separable terms as in (1.12) has $${\mathrm{rank}}(u_n)\leq n$$. The functions $$u_1^{\rm x}, \dots, u_n^{\rm x}$$ and $$u_1^{\rm y}, \dots, u_n^{\rm y}$$ are given by the left and right singular functions, respectively, which yield the optimal rank-$$n$$ approximation of $$u$$ in $$L^2({Y},V)$$. This particular system of basis functions is a natural benchmark, as it minimizes the rank $$n=n(\varepsilon)$$ required to ensure a mean squared accuracy $$\varepsilon$$. However, it is not obvious how to compute sufficiently good approximations of these basis functions at affordable cost, a point to be taken up again later. The methods considered in this article are based on computing approximations of both $$u_k^{\rm x}$$ and $$u_k^{\rm y}$$. A first example in this category is a low-rank approximation that approximately realizes a truncated Hilbert–Schmidt decomposition, thereby aiming to meet the aforementioned benchmark. In this case, the error caused by truncation should ideally be balanced against the error in approximating the unknown basis functions $$u^{\rm x}_k, u^{\rm y}_k$$. Note that there exist alternative approaches for deriving computable expansions of the form (1.12), where only the functions $$u_1^{{\rm x}}, \dots, u_n^{{\rm x}}$$ and their span $$V_n$$ are constructed, which we comment on in Section 1.4. To obtain numerically realizable approximations, we may again use bases of $$V$$ and $$L^2(Y)$$ as in (ASP) and consider expansions for $$u^{\rm x}_k$$ and $$u^{\rm y}_k$$ to arrive at fully discrete low-rank approximations of the form $$\label{fulllr}u \approx \sum_{k=1}^n \left( \sum_{\lambda\in {\it{\Lambda}}^{\rm x}_k} {\bf u}^{\rm x}_{k,\lambda} \psi_\lambda \right)\otimes \left(\sum_{\nu\in {\it{\Lambda}}^{\rm y}_k} {\bf u}^{\rm y}_{k,\nu} L_\nu \right)\!
,$$ (LR) with $${\it{\Lambda}}^{\rm x}_k \subset {\mathcal{S}}$$, $${\it{\Lambda}}^{\rm y}_k\subset {\mathcal{F}}$$, $$k=1,\ldots,n$$.

**Hierarchical tensor decompositions.** One may as well go beyond the Hilbert–Schmidt decomposition (1.12) and consider higher-order low-rank tensor representations that correspond to further decompositions of the factors $$u^{\rm y}_k$$ in (1.12). For simplicity, at this point, let us consider this in the finite-dimensional case $$d<\infty$$, possibly after truncating the expansion (1.1) for $$A(y)$$. Introducing an additional tensor decomposition of the factors $$u^{\rm y}_k$$, we obtain the general approximations in subspace-based tensor formats, $$\label{general}u_n = \sum_{k_{\rm x}=1}^{r_{\rm x}} u_{k_{\rm x}}^{\rm x} \otimes \left(\sum_{k_1=1}^{r_1}\cdots\sum_{k_d=1}^{r_d}{\bf a}_{k_{\rm x},k_1,\ldots,k_d}\bigotimes_{j=1}^d u^{{\rm y},j}_{k_j} \right)\!,$$ (STF) where each $$u_{k_j}^{{\rm y},j}$$ is a function of the individual variable $$y_j$$. The minimal $$r_j$$ such that $$u_n$$ can be represented in the form (STF) are called the multilinear ranks of $$u_n$$. We confine our discussion to hierarchical tensor representations, with the tensor train format as a special case (see, e.g., Oseledets & Tyrtyshnikov, 2009; Grasedyck, 2010; Hackbusch, 2012), where the high-order core tensor $${{\bf a}}=({\bf a}_{k_{\rm x},k_1,\ldots,k_d})_{k_{\rm x},k_1,\ldots,k_d}$$ is further decomposed in terms of lower-order tensors, based on matricizations of $${\bf a}$$.
For instance, if $$\label{ranks} \tilde r_i= {\mathrm{rank}}\big({\bf a}_{(k_{\rm x},k_1,\ldots, k_i),(k_{i+1},\ldots, k_d)}\big),$$ (1.14) one has a factorized representation of the form $$\label{htcore}{{\bf a}}_{k_{\rm x},k_1,\ldots,k_d} = \sum_{\ell_1=1}^{\tilde r_1}{\mathbf{M}}^{(1)}_{k_{\rm x},\ell_1} \sum_{\ell_2=1}^{\tilde r_2} {\mathbf{M}}^{(2)}_{\ell_1,k_1,\ell_2} \cdots \sum_{\ell_{d-1}=1}^{\tilde r_{d-1}} {\mathbf{M}}^{(d-1)}_{\ell_{d-2},k_{d-2},\ell_{d-1}} {\mathbf{M}}^{(d)}_{\ell_{d-1},k_{d-1},k_d}$$ (1.15) in terms of the tensors $${\mathbf{M}}^{(i)}$$, $$i=1,\ldots,d$$, of order at most three, and only these low-order tensors need to be stored and manipulated. The representation (STF) contains (ASP) and (LR) as special cases. For instance, to recover a sparse polynomial expansion (ASP), let $$\nu(k_{\rm x})$$, $$k_{\rm x}=1,\ldots,r_{\rm x}$$, be an enumeration of elements of $${\mathbb{N}}^d_0$$, and choose $${\bf a}_{k_{\rm x},k_1,\ldots,k_d} = \delta_{\nu(k_{\rm x}),(k_1,\ldots,k_d)}$$, $$u^{{\rm y},j}_{k_j}= L_{k_j}$$. With $$r_{\rm x} = r_1=\ldots=r_d$$ and diagonal core tensor $${\bf a}$$ having nonzero entries $${\bf a}_{k,k,\ldots,k}=1$$ for $$k = 1,\ldots, r_{\rm x}$$, one obtains representations (LR).

### 1.3 Guiding questions

In the different types of approximation outlined above, the degrees of freedom enter in varying degrees of nonlinearity. More strongly nonlinear approximations (STF) with hierarchical decomposition (1.15) can potentially yield more strongly compressed representations, in the sense that the number of degrees of freedom $$n_{\text{dof}}(\varepsilon)$$ required for a target accuracy $$\varepsilon$$ in $$L^2(Y,V)$$ scales more favourably. Handling this stronger compression in the computation of such representations, however, leads to additional difficulties, and the number of required operations $$n_{\rm op}(\varepsilon)$$ may in fact scale less favourably than $$n_{\text{dof}}(\varepsilon)$$.
Here, we aim for algorithms which, for each of the above types of approximation, are guaranteed to achieve any prescribed accuracy $$\varepsilon$$ and which are universal. This means that they do not require a priori knowledge of the approximability of the solution (e.g., of the decay of coefficients), but adjust to such approximability automatically. This goes hand in hand with a mechanism for obtaining a posteriori error bounds, making use only of the given data. This leads us to our first guiding question: (I) For a given parametric problem and approximation format (ASP), (LR) or (STF), can one devise a universal numerical scheme that achieves any given target accuracy $$\varepsilon$$ with approximate solutions close to the minimum required representation complexity $$n_{\rm dof} (\varepsilon)$$, and can $$n_{\rm op}$$ be related to $$\varepsilon$$, and hence to $$n_{\rm dof}(\varepsilon)$$? The minimum required representation complexity can be expressed in terms of the intrinsic approximability properties of the parametrized solutions $$u$$ in each of the formats. The corresponding required number of operations also depends on the problem data that are used in the solution process. We construct algorithms, based on a common generic strategy for (ASP), (LR) and (STF), which are near-optimal in this regard. With such algorithms at hand, a natural further question is the following. (II) Which of the approximation types (ASP), (LR) or (STF) is best suited for a given parametric problem, in the sense of leading to the smallest growth of $$n_{\rm op}(\varepsilon)$$ as $$\varepsilon\to 0$$? This amounts to asking for which parametric problems the investment into approximations of higher structural nonlinearity pays off, or conversely, for which problems possible gains in approximation efficiency are offset by more demanding computations.
We address this point by analyses of the approximability of model problems, complemented by numerical experiments, with conclusions depending on the particular problem type. For problems with finitely many parameters that are each of comparable influence, hierarchical tensor representations of the form (STF) with (1.15) turn out to be clearly advantageous. In the case of an anisotropic dependence on infinitely many parameters, for representative model problems, we demonstrate that (ASP) can in general yield faster convergence than (LR) or (STF). The particular structure of such infinite parameter expansions also turns out to have a major influence on the efficiency of the adaptive schemes.

### 1.4 Relation to previous work

There are a variety of results on the convergence of sparse polynomial expansions (1.11) (see, e.g., Cohen et al., 2010, 2011; Bachmayr et al., 2017). Furthermore, some estimates are available that include multilevel spatial discretizations and, hence, provide upper bounds for the error of best $$n$$-term approximation (ASP) (see, e.g., Cohen et al., 2011; Dũng, 2015). Concerning our question (II), there are only a few specialized results comparing the different approximation formats. In the case of general bivariate functions, a systematic comparison between sparse grids and low-rank approximation is discussed in Griebel & Harbrecht (2014), showing in particular that for Sobolev classes the latter does not bring any improvement. In the case of high-dimensional functions associated with parametric PDEs, possible gains by low-rank approximations have been identified in Lassila et al. (2013) and Bachmayr & Cohen (2017) by exploiting the particular structure of the problem, all concerning the case of finitely many parameters. There are various approaches for generating sparse polynomial expansions, for instance, based on collocation (Babuška et al., 2010; Beck et al., 2012) or adaptive Taylor expansion (Chkifa et al., 2014).
Note that these strategies do not currently yield a posteriori error bounds for the computed solutions, and their performance is thus described by a priori estimates that may not be sharp. The adaptive methods proposed in Eigel et al. (2014, 2015), based on finite element discretization for the spatial variable, yield a posteriori error bounds for the full approximations. However, the complexity bounds proved in Eigel et al. (2015) are given only in terms of the resulting finite element meshes. Adaptive schemes using wavelet-based spatial discretizations, which yield approximations of the form (ASP), have been studied by Gittelson (2013, 2014). In this case, bounds for the complete computational complexity are proved which, however, do not fully match the approximability properties of the solution. Reduced basis and proper orthogonal decomposition (POD) methods (Kahlbacher & Volkwein, 2007; Rozza et al., 2008; Lassila et al., 2013; Rozza, 2014) correspond to expansions of the form (1.12), where only the spatial basis elements $$u^{\rm x}_k$$ spanning $$V_n$$ are explicitly computed in an offline stage. Then, in an online stage, for any given $$y\in {Y}$$, the approximate solution $$u_n(y)$$ is defined as the Galerkin projection of $$u(y)$$ onto the space $$V_n$$. For known variants of these methods, accuracy guarantees in the respective norms (where reduced basis methods usually aim at the error in the $$L^\infty$$-norm $$\|v\|_{L^\infty({Y},V)} :=\sup_{y\in {Y}} \|v(y)\|_V$$) require a sufficiently dense sampling of the parameter domain. This becomes prohibitive for large $$d$$, and one only obtains a posteriori bounds for the resulting $$V$$-error at each given $$y\in{Y}$$. In methods based on higher-order tensor representations, instead of sampling in the parameter domain, one also approximates $$u^{\rm y}_k$$ as in (STF), at the price of additional approximability requirements as in (1.15).
A variety of schemes have been proposed that operate on fixed discretizations (Khoromskij & Oseledets, 2010; Khoromskij & Schwab, 2011; Kressner & Tobler, 2011; Matthies & Zander, 2012), which do not yield information on the discretization error. Based on Eigel et al. (2014), an adaptive scheme for hierarchical tensor approximation is proposed in Eigel et al. (2017). It provides rigorous a posteriori bounds for the approximation error, but is not proven to converge.

### 1.5 Novelty of the article and outline

Question (I) is addressed in Sections 2–5. A generic algorithm is described in Section 2 based on the work in Bachmayr & Dahmen (2015), which is guaranteed to converge without any a priori assumptions on the solution. Furthermore, it yields rigorous a posteriori error bounds, using only information on the problem data. Suitable specifications cover all the aforementioned types of approximations (ASP), (LR) and (STF). The scheme is formulated in a general sequence space framework, using a discretization of the space $$L^2({Y},V)$$ through a basis with elements of the form $$\psi_\lambda\otimes L_\nu$$. Here, $$\{\psi_\lambda\}_{\lambda\in {\mathcal{S}}}$$ is a given Riesz basis of $$V$$ (e.g., a wavelet basis in the case where $$V$$ is a Sobolev space) and $$\{L_\nu\}_{\nu \in{\mathcal{F}}}$$ is the previously described multivariate Legendre basis. The algorithm performs an iteration in the sequence space $$\ell^2({\mathcal{S}}\times {\mathcal{F}})$$. It involves at each step specific routines $${\small{\text{RECOMPRESS}}}$$ and $${\small{\text{COARSEN}}}$$ aiming at controlling the ranks of the current approximation as well as the number of degrees of freedom in each of its factors, respectively. We then describe realizations of this generic algorithm corresponding to two distinct settings. In Section 3, we apply the algorithm to the generation of approximations (STF) in the setting of finitely many parametric variables.
In this case, the $${\small{\text{RECOMPRESS}}}$$ routine is based on a truncation of a hierarchical singular value decomposition of the coefficient tensor. We analyse the performance of the algorithms for classes described by the decay of the corresponding singular values and joint sparsity of the corresponding singular vectors. Sections 4 and 5 are devoted to the case of anisotropic dependence on infinitely many parameters in the diffusion problem (1.5). In Section 4, we analyse a specialized version of Algorithm 2.1 producing $$n$$-term sparse Legendre expansions, see (ASP). In this version, the routine $${\small{\text{RECOMPRESS}}}$$ is simply the identity, and hence, Algorithm 2.1 agrees with the adaptive solver developed and analysed in Cohen et al. (2002). In Section 5, we consider, in the same setting as in Section 4, a solver for approximations (LR). In this case, the $${\small{\text{RECOMPRESS}}}$$ routine is based on standard SVD truncation. The corresponding notions of approximability are analogous to those arising in Section 3. A key ingredient in Sections 4 and 5 is the adaptive approximation of the operator based on matrix compression results in Appendix A. Here, we obtain new estimates for wavelet-type multilevel expansions of the parametrized coefficients that are more favourable than what is known for Karhunen–Loève-type expansions. Our further algorithmic developments also require substantially weaker assumptions on the $$A_j$$ in (1.1) than the methods in Eigel et al. (2014, 2017), which require summability of $$({\lVert{A_j}\rVert})_{j\geq 1}$$. By the new operator compression results, we establish, in particular, computational complexity estimates for (ASP), which significantly improve on those of similar schemes in Gittelson (2013, 2014). Based on these complexity estimates, Question (II) is then addressed in Section 6. 
Although the presented algorithms are guaranteed to converge, the corresponding computational cost can only be quantified in terms of approximability properties of solutions $$u$$. In Section 6, we study the corresponding properties, which are different for each realization of the scheme, in representative examples of parametric problems of the form (1.5). In particular, for a certain class of such problems, we prove that the best $$n$$-term Legendre approximation is already asymptotically near-optimal among all rank-$$n$$ approximations. For other examples, we prove that optimized low-rank approximations can achieve significantly better complexity than best $$n$$-term Legendre approximations. This is illustrated further by numerical tests, demonstrating that these observations also hold for more involved model problems.

## 2. A generic algorithm

In this section, we follow the approach developed in Bachmayr & Dahmen (2015) by first reformulating the general equation (1.4) in a sequence space, and then introducing a generic resolution algorithm based on this equivalent formulation. We first notice that (1.4) may also be written as $$Au=f, \label{paramgeneral1}$$ (2.1) where $$A$$ is elliptic and boundedly invertible from $$L^2({Y},V)$$ to $$L^2({Y},V')$$ and can be defined in a weak sense by $$\langle Au,v\rangle :=\int_{{Y}}\langle A(y)u(y),v(y)\rangle {\rm \,{d}}\mu(y), \quad u,v\in L^2({Y},V).$$ (2.2) We assume that $$f\in L^2({Y},V')$$, so that there exists a unique solution $$u\in L^2({Y},V)$$. Given a Riesz basis $$\{ \psi_\lambda \}_{\lambda \in {\mathcal{S}}}$$ of $$V$$, we tensorize it with the orthonormal basis $$\{L_\nu\}_{\nu\in {\mathcal{F}}}$$ of $$L^2({Y})$$. The resulting system $$\{\psi_\lambda \otimes L_\nu\}_{(\lambda,\nu)\in {\mathcal{S}} \times {\mathcal{F}}}$$ is a Riesz basis of $$L^2({Y},V)$$, which we now use to discretize (2.1).
For this purpose, we define the matrices $${\mathbf{A}}_j := \bigl( \langle A_j \psi_{\lambda'}, \psi_{\lambda}\rangle \bigr)_{\lambda,\lambda' \in {\mathcal{S}}}\quad{\rm and} \quad {\mathbf{M}}_j = \left( \int_{{Y}} y_j L_\nu(y) L_{\nu'}(y) \,{\rm{d}}\mu(y) \right)_{\nu,\nu'\in{\mathcal{F}}},$$ (2.3) where $${\mathbf{M}}_0$$ is set to be the identity on $$\ell^2({\mathcal{F}})$$, and the right-hand side column vector $${\mathbf{f}} := \big(\langle f, \psi_\lambda\otimes L_\nu\rangle\big)_{(\lambda,\nu)\in {\mathcal{S}}\times{\mathcal{F}}}.$$ (2.4) We thus obtain an equivalent problem $$\label{equiv}{\mathbf{A}} {\bf u} = {\mathbf{f}}$$ (2.5) on $$\ell^{2}({\mathcal{S}} \times {\mathcal{F}})$$, where $$\label{Arep2}{\mathbf{A}} := \sum_{j\geq 0} {\mathbf{A}}_j \otimes {\mathbf{M}}_j$$ (2.6) and $${\bf u}=\big({\bf u}_{\lambda,\nu}\big)_{(\lambda,\nu)\in {\mathcal{S}}\times{\mathcal{F}}}$$ is the coordinate vector of $$u$$ in the basis $$\{\psi_\lambda \otimes L_\nu\}_{(\lambda,\nu)\in {\mathcal{S}}\times{\mathcal{F}}}$$. Regarding $$\nu\in{\mathcal{F}}$$ as the column index of the infinite matrix $${\bf u}=({\bf u}_{\lambda,\nu})_{\lambda \in {\mathcal{S}},\nu\in{\mathcal{F}}}$$, we denote by $${\bf u}_\nu$$ the columns of $${\bf u}$$, which are precisely the basis representations of the Legendre coefficients $$u_\nu\in V$$. In what follows, we always denote by $${\lVert{\cdot}\rVert}$$ the $$\ell^{2}$$-norm on the respective index set, which could be $${\mathcal{S}}$$, $${\mathcal{F}}$$ or $${\mathcal{S}}\times{\mathcal{F}}$$, or the corresponding operator norm when this is clear from the context.
Since $$\{\psi_\lambda\}_{\lambda\in {\mathcal{S}}}$$ is a Riesz basis for $$V$$, we have $$\|u_\nu\|_{V}\sim \|{\bf u}_\nu\|$$ uniformly in $$\nu\in{\mathcal{F}}$$, which together with boundedness and ellipticity of $$A$$ implies that $${\mathbf{A}}$$ is bounded and elliptic on $$\ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$, and that we have $$\label{normequiv2}\|{\bf u}\|\sim \|{\mathbf{A}}{\bf u}\|\sim \|Au\|_{L^2({Y},V')}\sim \|u\|_{L^2({Y},V)}$$ (2.7) with uniform constants. On account of (2.7), solving (2.1) approximately up to some target accuracy is equivalent to solving (2.5) in $$\ell^{2}$$ to essentially the same accuracy. As a further consequence, one can find a fixed positive $$\omega$$ such that $${\lVert{\mathbf{I} - \omega {\mathbf{A}}}\rVert}\le \rho <1$$, ensuring that a simple Richardson iteration converges with a fixed error reduction rate per step. This serves as the conceptual starting point for the adaptive low-rank approximation scheme introduced in Bachmayr & Dahmen (2015) as given in Algorithm 2.1. This basic algorithmic template can be used to produce various types of sparse and low-rank approximations, with appropriate choices of the subroutines $${\small{\text{APPLY}}}$$, $${\small{\text{RHS}}}$$, $${\small{\text{COARSEN}}}$$ and $${\small{\text{RECOMPRESS}}}$$. The procedures $${\small{\text{COARSEN}}}$$ and $${\small{\text{RECOMPRESS}}}$$ are independent of the considered $${\mathbf{A}}$$ and $${\mathbf{f}}$$, and satisfy $$\label{craccuracy}{\lVert{{\small{\text{COARSEN}}}({\bf v} ; \eta ) - {\bf v} }\rVert} \leq \eta,\quad {\lVert{{\small{\text{RECOMPRESS}}}({\bf v};\eta) - {\bf v}}\rVert} \leq \eta,$$ (2.8) for any $$\eta \geq 0$$ and any compactly supported $${\bf v} \in \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$. Here, $${\small{\text{COARSEN}}}$$ is intended to reduce the support of the sequence $${\bf v}$$, whereas $${\small{\text{RECOMPRESS}}}$$ reduces the rank of $${\bf v}$$ in a low-rank tensor representation.
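The contraction property $${\lVert{\mathbf{I}-\omega\mathbf{A}}\rVert}\le\rho<1$$ and the resulting fixed per-step error reduction of the Richardson iteration can be illustrated on a small symmetric positive definite matrix standing in for $$\mathbf{A}$$; the classical choice $$\omega = 2/(r+R)$$ for a spectrum contained in $$[r,R]$$ gives $$\rho = (R-r)/(R+r)$$. A minimal sketch under these assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)               # SPD stand-in for the operator A in (2.5)

lam = np.linalg.eigvalsh(A)
r, R = lam[0], lam[-1]
omega = 2 / (r + R)                       # optimal damping for spectrum in [r, R]
rho = (R - r) / (R + r)
assert np.linalg.norm(np.eye(5) - omega * A, 2) <= rho + 1e-12

f = rng.standard_normal(5)
u_exact = np.linalg.solve(A, f)

u = np.zeros(5)
errs = []
for _ in range(30):
    u = u + omega * (f - A @ u)           # Richardson step: u <- u + omega (f - A u)
    errs.append(np.linalg.norm(u - u_exact))

# Geometric error reduction: each step contracts the error by at least rho
assert all(e2 <= rho * e1 + 1e-14 for e1, e2 in zip(errs, errs[1:]))
```

Algorithm 2.1 replaces the exact residual $$f - Au$$ by the inexact evaluations of APPLY and RHS, with COARSEN and RECOMPRESS keeping the iterates compressed.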
The particular realizations of these routines depend on the dimensionality of the problem and on the type of approximation. We shall use the constructions given in Bachmayr & Dahmen (2015). In the case of the sparse approximations considered in Section 4, $${\small{\text{RECOMPRESS}}}$$ is chosen as the identity, and Algorithm 2.1 essentially reduces to the method analysed in Cohen et al. (2002). The routines $${\small{\text{APPLY}}}$$ and $${\small{\text{RHS}}}$$ are assumed to satisfy, for compactly supported $${\bf v}$$ and any $$\eta>0$$, the requirements $$\label{applyaccuracy}{\lVert{{\small{\text{APPLY}}}({\bf v} ; \eta ) - {\mathbf{A}} {\bf v} }\rVert} \leq \eta,\quad {\lVert{{\small{\text{RHS}}}(\eta) - {\mathbf{f}}}\rVert} \leq \eta.$$ (2.9) Their construction depends not only on the type of approximation, but also on the specific problem under consideration. These two routines are indeed the main drivers of adaptivity in Algorithm 2.1, and a major part of what follows concerns the construction of $${\small{\text{APPLY}}}$$ in different scenarios. It hinges on the compression of matrices by exploiting their near-sparsity in certain basis representations. We use the following notion introduced in Cohen et al. (2001): a bi-infinite matrix $${\mathbf{B}}$$ is called $$s^*$$-compressible if there exist matrices $${\mathbf{B}}_{n}$$ with $$\alpha_{n} 2^n$$ entries per row and column such that $$\label{compress}{\lVert{{\mathbf{B}} - {\mathbf{B}}_{n}}\rVert} \leq \beta_{n} 2^{-s n}, \quad \mbox{ for }\,\, 0 < s < s^*,$$ (2.10) where the sequences $${\boldsymbol{\alpha}} = (\alpha_n)_{n\in{\mathbb{N}}_0}$$ and $${\boldsymbol{\beta}} = (\beta_n)_{n\in{\mathbb{N}}_0}$$ are summable. Here, we always assume $${\mathbf{B}}_{0} = 0$$.
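The compressibility notion (2.10) can be visualized for a matrix with off-diagonal decay: retaining only the entries within distance $$2^n$$ of the diagonal yields approximations $$\mathbf{B}_n$$ with $$O(2^n)$$ entries per row and column and geometrically decaying error. The decay profile and matrix size below are illustrative assumptions.

```python
import numpy as np

N = 512
i = np.arange(N)
dist = np.abs(i[:, None] - i[None, :])
B = 2.0 ** (-dist.astype(float))            # |B_{ij}| = 2^{-|i-j|}: off-diagonal decay

def truncate(n):
    # B_n: keep entries within distance 2^n of the diagonal (at most 2^{n+1}+1 per row)
    return np.where(dist <= 2 ** n, B, 0.0)

assert (truncate(3) != 0).sum(axis=1).max() <= 2 * 2**3 + 1

# Spectral-norm errors ||B - B_n|| decay geometrically in n, as required in (2.10)
errs = [np.linalg.norm(B - truncate(n), 2) for n in range(1, 7)]
assert all(e2 < e1 for e1, e2 in zip(errs, errs[1:]))
```

For this exponential decay profile the error even decays like $$2^{-2^n}$$, far faster than the rate $$2^{-sn}$$ demanded by (2.10); algebraic off-diagonal decay gives the borderline cases.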
Remark 2.1 As shown in Bachmayr & Dahmen (2015), regardless of the specifications of the routines $${\small{\text{APPLY}}}$$, $${\small{\text{RHS}}}$$, $${\small{\text{COARSEN}}}$$ and $${\small{\text{RECOMPRESS}}}$$, Algorithm 2.1 terminates after finitely many steps and its output $${\bf u}_\varepsilon$$ satisfies $$\|{\bf u} - {\bf u}_\varepsilon\|\le \varepsilon$$. At this point, we record for later use a particular feature of $${\mathbf{A}}$$ that arises as a consequence of our choice of tensor product orthogonal polynomials for parameter dependence: the approximate application of $${\mathbf{A}}$$ is facilitated by the fact that the matrices $${\mathbf{M}}_j$$ are bidiagonal. That is, in view of the three-term recurrence relation $$\label{threeterm}tL_n(t)= p_{n+1} L_{n+1}(t) + p_{n} L_{n-1}(t), \quad L_{-1}\equiv 0,$$ (2.11) where $$\label{recurr_coeff}p_0 = 0, \qquad p_n= \frac{1}{\sqrt{4 - n^{-2}}} , \quad n>0,$$ (2.12) one has $$\int_Y y_j\,L_\nu(y)L_{\nu'}(y)\,{\rm{d}}\mu(y) =0$$ unless $$| \nu_j - \nu'_j | = 1$$ and $$\nu_i = \nu'_i$$ for $$i \neq j$$, which yields $$\label{threeterm_matrix}({\mathbf{M}}_j)_{\nu,\nu'}= p_{\nu_j + 1}\delta_{\nu +{\it \,{e}}^j,\nu'}+ p_{\nu_j}\delta_{\nu-{\it \,{e}}^j,\nu'}$$ (2.13) with the Kronecker sequence $$({\it {e}}^j_i)_{i\in {\mathcal{I}}}:=(\delta_{i,j} )_{i\in{\mathcal{I}}} \in {\mathcal{F}}$$. 3. Hierarchical tensor representations in the case of finitely many parameters We begin by considering the setting $$\label{cId}{\mathcal{I}}=\{1,\dots,d\},$$ (3.1) where $${\mathcal{F}} = {\mathbb{N}}_0^d$$ and $${\bf u} \in \ell^{2}({\mathcal{S}} \times {\mathbb{N}}_0\times\cdots\times{\mathbb{N}}_0)$$. Here, we are interested in the case that all coordinates in $${\mathcal{I}}$$ have comparable influence. As illustrated in Section 6, a direct sparse Legendre expansion of $${\bf u}$$ over $${\mathcal{S}}\times{\mathcal{F}}$$ will then in general be infeasible already for moderately large $$d$$.
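Returning briefly to the recurrence (2.11)–(2.12) and the resulting bidiagonal structure (2.13): both are easy to check numerically. The Python snippet below uses $$L_n = \sqrt{2n+1}\,P_n$$, orthonormal with respect to the uniform probability measure on $$[-1,1]$$, and assembles a finite section of the bidiagonal matrix for a single mode; the truncation size is illustrative.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def p(n):
    # recurrence coefficients (2.12): p_0 = 0, p_n = 1/sqrt(4 - n^(-2))
    return 0.0 if n == 0 else 1.0 / np.sqrt(4.0 - float(n) ** -2)

def L(n, t):
    # Legendre polynomials normalized in L^2 of the uniform probability
    # measure on [-1, 1]: L_n = sqrt(2n+1) P_n, with L_{-1} = 0
    if n < 0:
        return np.zeros_like(t)
    return np.sqrt(2.0 * n + 1.0) * leg.legval(t, np.eye(n + 1)[n])

t = np.linspace(-1.0, 1.0, 101)
for n in range(6):
    # three-term recurrence (2.11): t L_n = p_{n+1} L_{n+1} + p_n L_{n-1}
    assert np.allclose(t * L(n, t), p(n + 1) * L(n + 1, t) + p(n) * L(n - 1, t))

# finite section of the bidiagonal, symmetric multiplication matrix (2.13)
N = 8
M = np.zeros((N, N))
for nu in range(N - 1):
    M[nu, nu + 1] = M[nu + 1, nu] = p(nu + 1)
```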
However, one may as well exploit Cartesian product structure in $${\mathcal{F}}$$, regarding $${\bf u}$$ as a higher-order tensor, and use corresponding hierarchical low-rank representations. As we shall detail in what follows, the results of Bachmayr & Dahmen (2015) can be adapted to this problem in a rather straightforward manner. It will be convenient to introduce a numbering of tensor modes as follows: $${\mathcal{G}}_{\rm x} := {\mathcal{S}}$$, $${\mathcal{G}}_1 := {\mathbb{N}}_0, \ldots$$, $${\mathcal{G}}_d := {\mathbb{N}}_0$$. We additionally introduce the notation $\hat{\mathcal{I}} := \{{\rm x}\}\cup{\mathcal{I}} .$ The representations of higher-order tensors that we consider are built on the Hilbert–Schmidt case via matricizations: for each nonempty $$M \subset \hat{\mathcal{I}}$$, any $${\bf v} \in \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$ induces a compact operator $$T_{\bf v}^{(M)} \colon \ell^{2}({\Large{\times}}_{i\in \hat{\mathcal{I}}\setminus M} {\mathcal{G}}_i) \to \ell^{2}({\Large{\times}}_{i\in M} {\mathcal{G}}_i)$$. In terms of the left singular vectors $$\{ {\bf U}^{(i)}_k \}_{k \in {\mathbb{N}}}$$ of $$T_{\bf v}^{(\{i\})}$$, $$i\in \hat{\mathcal{I}}$$, we obtain the HOSVD representation (Lathauwer et al., 2000) in the Tucker format (Tucker, 1964, 1966), $$\label{tuckerformat}{\bf v} = \sum_{k_{\rm x}=1}^{r_{\rm x}} \sum_{k_1 = 1}^{r_1} \cdots \sum_{k_d=1}^{r_d} {\bf a}_{k_{\rm x},k_1,\ldots,k_d} \bigotimes_{i\in\hat{\mathcal{I}}} {\bf U}^{(i)}_{k_i}.$$ (3.2) Here the tensor $${\bf a} = ({\bf a}_{\mathsf{k}})_{\mathsf{k}\in{\mathbb{N}}^{d+1}}$$ of order $$d+1$$ is referred to as core tensor and $$(r_{\rm x},r_1,\ldots,r_d)$$ as the multilinear ranks of $${\bf v}$$. The hierarchical tensor format (Hackbusch & Kühn, 2009), on which the variant of our scheme described in this section is based, can be interpreted as a further decomposition of $${\bf a}$$ into tensors of order at most three. 
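For a small dense tensor of order 3, the HOSVD (3.2) can be computed directly from the matricizations. The Python sketch below (plain numpy on dense tensors, not the sequence-space setting of the analysis) assembles the mode frames and the core tensor and recovers an exactly low-multilinear-rank example.

```python
import numpy as np

rng = np.random.default_rng(1)

def matricize(v, i):
    """Matricization T_v^({i}): mode i indexes rows, all other modes columns."""
    return np.moveaxis(v, i, 0).reshape(v.shape[i], -1)

def hosvd(v):
    """HOSVD in the Tucker format (3.2): mode frames U^(i) and core tensor a."""
    U = [np.linalg.svd(matricize(v, i), full_matrices=False)[0]
         for i in range(v.ndim)]
    a = v
    for i, Ui in enumerate(U):      # contract each mode with U^(i)^T
        a = np.moveaxis(np.tensordot(Ui.T, np.moveaxis(a, i, 0), axes=1), 0, i)
    return a, U

# an order-3 tensor with multilinear ranks (2, 2, 2)
v = np.einsum('i,j,k->ijk', *rng.standard_normal((3, 10))) \
    + np.einsum('i,j,k->ijk', *rng.standard_normal((3, 10)))
a, U = hosvd(v)

w = a
for i, Ui in enumerate(U):          # reassemble v from core and mode frames
    w = np.moveaxis(np.tensordot(Ui, np.moveaxis(w, i, 0), axes=1), 0, i)
assert np.allclose(w, v)
```

The multilinear ranks show up as the number of nonnegligible singular values of each matricization.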
This decomposition is obtained using further matricizations of the tensor according to a recursive decomposition of the set of modes $$\hat{\mathcal{I}}$$ into a binary tree, which we denote by $$\mathcal{D}$$. For simplicity, we focus in our exposition on linear trees corresponding to factorizations (1.15), where $$\label{lindimtree}\mathcal{D} = \bigl\{ \hat{\mathcal{I}}, \{{\rm x}\}, \{1,\ldots,d\}, \{1\}, \{2,\ldots, d\}, \ldots, \{d-1\}, \{d\} \bigr\}.$$ (3.3) For each $$\alpha \in \mathcal{D}$$, the rank of the corresponding matricization $$T^{(\alpha)}_{\bf v}$$ is denoted by $${\mathrm{rank}}_\alpha({\bf v})$$, its singular values by $$(\sigma^{(\alpha)}_k)_{k\in{\mathbb{N}}}$$. Here, $${\mathrm{rank}}_{\hat{\mathcal{I}}}({\bf v}) = 1$$ for all $${\bf v} \neq 0$$, and we set $${\mathrm{rank}}({\bf v}) := \bigl({\mathrm{rank}}_\alpha({\bf v})\bigr)_{\alpha\in\mathcal{D}\setminus\hat{\mathcal{I}}}.\label{multirank}$$ (3.4) In this section, we specialize the generic template Algorithm 2.1 to produce approximate solutions to (2.1) of the form (STF) with core tensor in hierarchical form as in (1.15). More precisely, the output is given in the form of a tensor $${\bf u}_\varepsilon$$ of order $$d+1$$, supported on $${\it{\Lambda}}^\varepsilon_{\rm x}\times {\it{\Lambda}}^\varepsilon_1 \times\cdots\times{\it{\Lambda}}^\varepsilon_d$$ with finite $${\it{\Lambda}}^\varepsilon_i \subset {\mathcal{G}}_i$$. 
This tensor is given (assuming $$\mathcal{D}$$ as in (3.3)) in the following form: let $$r^\varepsilon_i := {\mathrm{rank}}_{\{i\}}({\bf u}_\varepsilon)$$ for $$i \in \hat {\mathcal{I}}$$ and $$\tilde r^\varepsilon_i := {\mathrm{rank}}_{\{i+1,\ldots, d\}}({\bf u}_\varepsilon)$$ for $$i = 1,\ldots,d-1$$, then \begin{align*}{\bf u}_{\varepsilon, \lambda,\nu_1,\ldots,\nu_d} &= \sum_{k_{\rm x}=1}^{r^\varepsilon_{\rm x}} \cdots \sum_{k_d=1}^{r^\varepsilon_d} \sum_{\ell_1=1}^{\tilde r^\varepsilon_1}\cdots\sum_{\ell_{d-1}=1}^{\tilde r^\varepsilon_{d-1}} {\mathbf{M}}^{(1),\varepsilon}_{k_{\rm x},\ell_1} \,{\mathbf{M}}^{(2),\varepsilon}_{\ell_1,k_1,\ell_2} \cdots {\mathbf{M}}^{(d-1),\varepsilon}_{\ell_{d-2},k_{d-2},\ell_{d-1}} \\ &\quad\times {\mathbf{M}}^{(d),\varepsilon}_{\ell_{d-1},k_{d-1},k_d} \, {\bf U}^{({\rm x}),\varepsilon}_{k_{\rm x}, \lambda} \, {\bf U}^{(1),\varepsilon}_{k_1, \nu_1}\cdots {\bf U}^{(d),\varepsilon}_{k_d, \nu_d}, \quad \lambda\in {\it{\Lambda}}^\varepsilon_{\rm x},\; \nu_i \in {\it{\Lambda}}^\varepsilon_i. \end{align*} The adaptive scheme identifies, in an intertwined fashion, the ranks $$r^\varepsilon_i$$, $$\tilde r^\varepsilon_i$$, the sets $${\it{\Lambda}}^\varepsilon_i$$, as well as the coefficient tensors $${\mathbf{M}}^{(i),\varepsilon}$$ and $${\bf U}^{(i),\varepsilon}$$ of $${\bf u}_\varepsilon$$. The function represented by $${\bf u}_\varepsilon$$ has precisely the form (STF), where $u^{{\rm x}}_{k_{\rm x}} = \sum_{\lambda \in {\mathcal{S}}} {\bf U}^{({\rm x}),\varepsilon}_{k_{\rm x}, \lambda} \psi_\lambda, \quad u^{{\rm y},i}_{k_i} = \sum_{n \in {\mathcal{G}}_i} {\bf U}^{(i),\varepsilon}_{k_i, n} L_n .$ The hierarchical format can offer substantially more favourable complexity characteristics for large $$d$$ than (3.2). The left singular vectors of the involved matricizations yield a hierarchical singular value decomposition (Grasedyck, 2010). We refer also to Falcó & Hackbusch (2012), Grasedyck et al. 
(2013), Hackbusch & Kühn (2009), Hackbusch (2012) and Kolda & Bader (2009) for detailed expositions regarding the finitely supported case (see also Oseledets & Tyrtyshnikov, 2009; Oseledets, 2011 for the related tensor train representation) and to Bachmayr & Dahmen (2015) for analogous results for tensors in sequence spaces, with notation analogous to that of this article. For quantifying the approximability of tensors on $${\Large{\times}}_{i \in \hat{\mathcal{I}}}{\mathcal{G}}_i$$ in terms of the best selection of finite $${\it{\Lambda}}^\varepsilon_i \subset {\mathcal{G}}_i$$ as above, a pivotal role is played by the quantities $$\label{contr}\pi^{(i)}({\bf v}) \,{:=}\, \bigl(\pi^{(i)}_{\nu_i}({\bf v})\bigr)_{\nu_i\in {\mathcal{G}}_i},\quad \pi^{(i)}_{\mu}({\bf v}) \,{:=}\, \left(\sum_{\substack{\nu\in{\mathcal{F}}\\ \nu_i=\mu}}|{\bf v}_\nu|^2\right)^{1/2} , \quad i \in \hat{\mathcal{I}} = \{{\rm x}, 1, \ldots, d\}.$$ (3.5) They are introduced in Bachmayr & Dahmen (2015) and called contractions in analogy to the terminology in tensor analysis. An efficient evaluation (without any $$d$$-dimensional summations) is possible due to the relation $$\label{sumN}\pi^{(i)}_{\mu}({\bf v})= \left(\sum_{k}|{\bf U}^{(i)}_{k,\mu}|^2|\sigma^{(i)}_k|^2\right)^{1/2},$$ (3.6) where $$\sigma^{(i)}_k$$ are the mode-$$i$$ singular values of $${\bf v}$$. As in our previous notation, we abbreviate $${\mathrm{supp}}_i {\bf v}:= {\mathrm{supp}}\bigl(\pi^{(i)}({\bf v})\bigr)$$, $$i \in \hat{\mathcal{I}}$$. 3.1 Adaptive scheme In this case, we consider Algorithm 2.1 with the routines $${\small{\text{RECOMPRESS}}}$$ and $${\small{\text{COARSEN}}}$$ for the hierarchical format as given in Bachmayr & Dahmen (2015, Remark 15). $${\small{\text{RECOMPRESS}}}$$ is based on a truncation of a hierarchical singular value decomposition up to a prescribed accuracy $$\eta>0$$, which can be ensured based on the $$\ell^2$$-norm of omitted singular values of matricizations.
We denote this operation by $$\hat{\mathrm{P}}_{\eta}$$. As shown in Grasedyck (2010), it satisfies the quasi-optimality property $$\label{recompress quasiopt}{\lVert{ {\bf v} - \hat{\mathrm{P}}_{\eta}({\bf v})}\rVert} \leq \sqrt{2d-3} \inf \bigl\{ {\lVert{{\bf v} - {\bf w}}\rVert} \colon {{\mathrm{rank}}({\bf w})\leq {\mathrm{rank}}(\hat{\mathrm{P}}_{\eta}({\bf v}))} \bigr\},$$ (3.7) with the inequality between ranks as defined in (3.4) to be understood componentwise. $${\small{\text{COARSEN}}}$$ retains the degrees of freedom for each mode that correspond to the largest contractions (3.5). Let $$(\mu^*_{i,k})_{k \in {\mathbb{N}}}$$ be such that $$(\pi^{(i)}_{\mu^*_{i,k}}({\bf v}))_{k\in{\mathbb{N}}}$$ is nonincreasing. Denote for $${\it{\Lambda}} \subset {\mathcal{S}}\times{\mathcal{F}}$$ by $${\mathrm{R}_{{\it{\Lambda}}}}{\bf v}$$ the array obtained by retaining all entries of $${\bf v}$$ corresponding to indices in $${\it{\Lambda}}$$, while replacing all others by zero. Given $$\eta >0$$, we define the product set ${\it{\Lambda}}(\eta)= {\Large{\times}}_{i \in \hat{\mathcal{I}}}\{ \mu^*_{i,k} \colon k \leq N_i\},$ where $$N_i$$, $$i\in \hat{\mathcal{I}}$$, are chosen such that $$\sum_{i \in \hat{\mathcal{I}}} N_i$$ is minimal subject to the condition $$\label{hatC} \left( \sum_{i \in \hat{\mathcal{I}}} \sum_{k > N_i\vphantom{\hat{\mathcal{I}}}}|\pi^{(i)}_{\mu^*_{i,k}}({\bf v})|^2 \right)^{1/2}\leq \eta.$$ (3.8) Noting that the left-hand side in (3.8) is an upper bound for $${\lVert{{\bf v} - {\mathrm{R}_{{\it{\Lambda}}(\eta)}}{\bf v}}\rVert}$$, we define $${\small{\text{COARSEN}}}$$ as a numerical realization of $$\hat{\mathrm{C}}_{\eta}{\bf v} := {\mathrm{R}_{{\it{\Lambda}}(\eta)}}{\bf v}$$, for which one has a quasi-optimality property analogous to (3.7) with constant $$\sqrt{d}$$.
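Both the contraction identity (3.6) and the coarsening rule (3.8) can be illustrated on a dense toy tensor. In the Python sketch below, the minimization of $$\sum_i N_i$$ subject to (3.8) is realized greedily by discarding the globally smallest contraction values; this is optimal here because, within each mode, the retained indices must form a prefix of the decreasing rearrangement. The tensor sizes and decay are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
v = rng.standard_normal((6, 7, 8)) * np.exp(-0.8 * np.arange(6))[:, None, None]

def contraction(v, i):
    """pi^(i)(v) as in (3.5): l2 norm over all modes except the i-th."""
    axes = tuple(a for a in range(v.ndim) if a != i)
    return np.sqrt((v ** 2).sum(axis=axes))

# identity (3.6): the same quantities from the mode-i SVD, no full summation
for i in range(v.ndim):
    U, s, _ = np.linalg.svd(np.moveaxis(v, i, 0).reshape(v.shape[i], -1),
                            full_matrices=False)
    assert np.allclose(contraction(v, i), np.sqrt(((U * s) ** 2).sum(axis=1)))

def coarsen_product_set(v, eta):
    """Index sets Lambda(eta): per mode keep the largest contractions,
    greedily discarding the globally smallest ones while (3.8) permits."""
    pis = [contraction(v, i) for i in range(v.ndim)]
    keep = [set(range(len(pi))) for pi in pis]
    budget = eta ** 2
    for p, i, k in sorted((p, i, k) for i, pi in enumerate(pis)
                          for k, p in enumerate(pi)):
        if p ** 2 <= budget:
            budget -= p ** 2
            keep[i].discard(k)
    return [sorted(s) for s in keep]

keep = coarsen_product_set(v, eta=0.5)
w = np.zeros_like(v)
w[np.ix_(*keep)] = v[np.ix_(*keep)]
assert np.linalg.norm(v - w) <= 0.5     # (3.8) bounds the coarsening error
```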
Furthermore, $${\mathbf{A}}$$ as defined in (2.6) is in this case of finitely many parameters a finite sum of Kronecker product operators, ${\mathbf{A}} = \sum_{j=0}^d {\mathbf{A}}_j \otimes {\mathbf{M}}_j,$ which considerably simplifies the construction of the corresponding routine $${\small{\text{APPLY}}}$$. The action of $${\mathbf{A}}$$ can thus increase each hierarchical rank of its argument at most by a factor of $$d+1$$. Remark 3.1 In contrast to the case considered in Bachmayr & Dahmen (2016), here the Hilbert space $${\mathcal{H}} =V \otimes L^{2}({Y})$$ on which the problem is posed is endowed with a cross norm. As a consequence, the isomorphism that takes $$v\in {\mathcal{H}}$$ to its coefficients $${\bf v}\in \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$ with respect to the tensor product basis is of Kronecker rank one. The original low-rank structure (1.1) of $$A(y)$$ is therefore preserved in the $$\ell^{2}$$-representation (2.6) of the problem. Consequently, the routine $${\small{\text{APPLY}}}$$ that adaptively approximates the action of $${\mathbf{A}}$$ can be obtained following the generic construction given in Bachmayr & Dahmen (2015), provided that the operators $${\mathbf{A}}_j$$ and $${\mathbf{M}}_j$$ acting on each mode have the required compressibility properties. Recall that by (2.13), the infinite matrices $${\mathbf{M}}_j$$ are bidiagonal and, hence, do not require any further approximation. To use the construction of Bachmayr & Dahmen (2015), we thus only need that the operators $${\mathbf{A}}_0, \ldots, {\mathbf{A}}_d$$ acting on the spatial variables are $$s^*$$-compressible. 3.2 Convergence analysis Our complexity results aim at the following type of statements: given a certain approximability of the solution, the algorithm recovers the corresponding convergence rates without their explicit knowledge. 
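Returning to the Kronecker sum displayed above: the rank bound is transparent in the matricization separating the spatial mode from all parametric modes, since each Kronecker term maps a rank-one tensor to a rank-one tensor. A minimal Python check, using random stand-ins for the operators (only $${\mathbf{M}}_0 = \mathbf{I}$$ reflects the actual structure):

```python
import numpy as np

rng = np.random.default_rng(4)
d, nx, ny = 3, 20, 15

# random stand-ins for A_0,...,A_d and M_1,...,M_d; M_0 is the identity
A_ops = [rng.standard_normal((nx, nx)) for _ in range(d + 1)]
M_ops = [np.eye(ny)] + [rng.standard_normal((ny, ny)) for _ in range(d)]

# v of rank one in the matricization separating x from all parametric modes
v = np.outer(rng.standard_normal(nx), rng.standard_normal(ny))
w = sum(A @ v @ M.T for A, M in zip(A_ops, M_ops))

# each Kronecker term maps rank-1 to rank-1, so rank(w) <= d + 1
assert np.linalg.matrix_rank(w) <= d + 1
```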
To describe these approximability properties, we now recall the definition of approximation classes to quantify the convergence of hierarchical low-rank approximations from Bachmayr & Dahmen (2015), in terms of the hierarchical rank defined by (3.4). Let $$\gamma = \bigl(\gamma(n)\bigr)_{n\in {\mathbb{N}}_0}$$ be positive and strictly increasing with $$\gamma(0)=1$$ and $$\gamma(n)\to\infty$$ as $$n\to\infty$$; for $${\bf v} \in \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$, let ${\lvert{{\bf v}}\rvert}_{{\mathcal{A}_\mathcal{H}({\gamma})}} \,{:=}\, \sup_{r\in{\mathbb{N}}_0} \gamma({r})\,\inf_{{\lvert{{\mathrm{rank}}({\bf w})}\rvert}_\infty\leq r} {\lVert{{\bf v} - {\bf w}}\rVert} ,$ where $$\inf \{ {\lVert{{\bf v} - {\bf w}}\rVert}\colon {\lvert{{\mathrm{rank}}({\bf w})}\rvert}_\infty\leq r\}$$ – that is, the errors of approximation by hierarchical ranks at most $$r$$ – can be bounded in terms of the hierarchical singular values $$(\sigma^{(\alpha)}_k)_{k\in{\mathbb{N}}}$$ for $$\alpha\in{\cal D}$$. We introduce the approximation classes ${\mathcal{A}_\mathcal{H}({\gamma})} \,{:=}\, \bigl\{{\bf v}\in \ell^{2}({\mathcal{S}}\times{\mathcal{F}}) : {\lvert{{\bf v}}\rvert}_{{\mathcal{A}_\mathcal{H}({\gamma})}} <\infty \bigr\} , \quad{\lVert{{\bf v}}\rVert}_{{\mathcal{A}_\mathcal{H}({\gamma})}} \,{:=}\, {\lVert{{\bf v}}\rVert} + {\lvert{{\bf v}}\rvert}_{{\mathcal{A}_\mathcal{H}({\gamma})}}.$ We restrict our considerations to $$\gamma$$ that satisfy \begin{equation*}\rho_\gamma \,{:=}\, \sup_{n\in{\mathbb{N}}} {\gamma}(n)/{\gamma}(n-1)<\infty, \end{equation*} which corresponds to a restriction to at most exponential growth; our considerations would still apply for faster growth, but lead to less sharp results.
For an approximation $${\bf v}$$ of bounded support to $${\bf u}$$, the number of nonzero coefficients $$\#{\mathrm{supp}}_i {\bf v}$$ required in each tensor mode to achieve a certain accuracy depends on the best $$n$$-term approximability of the sequences $$\pi^{(i)}({\bf u})$$. This approximability by sparse sequences is quantified by the classical approximation classes $${\mathcal{A}^s} = {\mathcal{A}^s}(\mathcal{J})$$, where $$s>0$$ and $$\mathcal{J}$$ is a countable index set, consisting of all $${\bf w}\in \ell_2(\mathcal{J})$$ for which the quasi-norm $$\label{Asdef}{\lVert{\mathbf{{\bf w}}}\rVert}_{{\mathcal{A}^s}(\mathcal{J})} := \sup_{N\in{\mathbb{N}}_0} (N+1)^s \inf_{\substack{{\it{\Lambda}}\subset \mathcal{J}\\ \#{\it{\Lambda}}\leq N}} {\lVert{{\bf w} - {\mathrm{R}_{{\it{\Lambda}}}} {\bf w}}\rVert}$$ (3.9) is finite. In particular, if $$\pi^{(i)}({\bf u}) \in {\mathcal{A}^s}(\mathcal{G}_i)$$ then these sequences can be approximated within accuracy $$\eta$$ by finitely supported sequences with $${\mathcal{O}}(\eta^{-1/s})$$ nonzero entries. In what follows, we do not explicitly specify the index set in the spaces $${\mathcal{A}^s}$$ when this is clear from the context. We analyse the complexity of the algorithm under the following benchmark assumptions; see the discussion in Section 6.1. Assumption 3.2 For the hierarchical tensor approximation in the case (3.1) of $$d$$ parametric variables, we assume the following: (i) $$\pi^{(i)}({\bf u}), \pi^{(i)}({\mathbf{f}}) \in {\mathcal{A}}^{s}({\mathcal{G}}_i)$$, $$i\in\hat{\mathcal{I}}$$, for an $$s>0$$. (ii) $${\bf u}, {\mathbf{f}} \in {\mathcal{A}}_{\mathcal{H}}(\gamma)$$, where $$\gamma(n) \,{:=}\, {\it \,{e}}^{\bar c n^{1/\bar b}}$$ with $$\bar b\geq 1$$, $$\bar c>0$$.
(iii) The $${\mathbf{A}}_j$$, $$j \in \hat{\mathcal{I}}$$, are $$s^*$$-compressible for an $$s^*>s$$; that is, there exist matrices $${\mathbf{A}}_{j,n}$$ with at most $$\alpha_{j,n} 2^n$$ entries per row and column such that $${\lVert{{\mathbf{A}}_j - {\mathbf{A}}_{j,n}}\rVert} \leq \beta_{j,n} 2^{-s n}$$, where the sequences $${\boldsymbol{\alpha}}_j = (\alpha_{j,n})_{n\in{\mathbb{N}}_0}$$ and $${\boldsymbol{\beta}}_j = (\beta_{j,n})_{n\in{\mathbb{N}}_0}$$ are summable. We will use the above assumptions as a reference point for the scaling with respect to $$\varepsilon$$ of the computational complexity. Note that Assumption 3.2(i),(ii) mean in particular that best $$N$$-term approximations of $$\pi^{(i)}({\bf u})$$ converge at least as $${\mathcal{O}} (N^{-s})$$ for $$i\in\hat{\mathcal{I}}$$ and low-rank approximations $${\bf u}_n$$ of $${\bf u}$$ with $${\lvert{{\mathrm{rank}}({\bf u}_n)}\rvert}_\infty\,{\leq}\,n$$ converge as $${\mathcal{O}}(\!{\it {e}}^{-\bar c n^{1/\bar b}})$$. Assumption 3.2(iii) needs to be established for each given problem. Under these conditions, it is possible to realize a routine $${\small{\text{RHS}}}$$ satisfying the following: for sufficiently small $$\eta>0$$ and $${\mathbf{f}}_\eta:={\small{\text{RHS}}}(\eta)$$, $$\label{rhsass1}\begin{gathered}\sum_{i\in \hat{\mathcal{I}}} \#{\mathrm{supp}}_i ({\mathbf{f}}_\eta) \lesssim d \eta^{-\frac {1}{s}} \left( \sum_{i\in \hat{\mathcal{I}}} {\lVert{\pi^{(i)}({\mathbf{f}})}\rVert}_{{\cal A}^s} \right)^{\frac {1}{s}}, \quad \sum_{i\in \hat{\mathcal{I}}} {\lVert{\pi^{(i)}({\mathbf{f}}_\eta)}\rVert}_{{\cal A}^s} \lesssim d^{1+\max\{1,s\}} \sum_{i\in \hat{\mathcal{I}}} {\lVert{\pi^{(i)}({\mathbf{f}})}\rVert}_{{\cal A}^s}, \\ {\lvert{{\mathrm{rank}}({\mathbf{f}}_\eta)}\rvert}_\infty \lesssim \bigl(d^{-1} \ln ({\lVert{{\mathbf{f}}}\rVert}_{{\mathcal{A}}_{\mathcal{H}}(\gamma)}/\eta) \bigr)^{\bar{b}}, \end{gathered}$$ (3.10) with hidden constants that do not depend on $$d$$.
Such approximations can be obtained, for instance, by combining $${\small{\text{COARSEN}}}$$ and $${\small{\text{RECOMPRESS}}}$$ with parameters adjusted as in Bachmayr & Dahmen (2015, Theorem 7). Assuming full knowledge of $${\mathbf{f}}$$, $${\small{\text{RHS}}}$$ can be realized such that the required number of operations is bounded, with $$C>0$$ independent of $$d$$, by $$\label{rhsass2}C\left( d {\lvert{{\mathrm{rank}}({\mathbf{f}}_\eta)}\rvert}_\infty^3 + {\lvert{{\mathrm{rank}}({\mathbf{f}}_\eta)}\rvert}_\infty \sum_{i \in \hat{\mathcal{I}}} \#{\mathrm{supp}}_i({\mathbf{f}}_\eta) \right)\!,$$ (3.11) and we shall make this idealized assumption in the subsequent analysis. Note that these requirements greatly simplify when the corresponding right-hand side $$f$$ is independent of the parametric variable. To compare different parametric dimensionalities $$d$$ in the complexity bounds, we additionally need a specific reference family of $$d$$-dependent problems. We introduce the following model assumptions, which we shall also consider in more detail for a concrete class of problems in Section 6. Assumption 3.3 For the quantities in Assumption 3.2, in addition, let the following hold: (i) There exist $$C_1, c_1>0$$ independent of $$d$$ such that \begin{align*}&\max\bigl\{{\lVert{\pi^{({\rm x})}({\bf u})}\rVert}_{{\mathcal{A}}^s}, {\lVert{\pi^{(1)}({\bf u})}\rVert}_{{\mathcal{A}}^s}, \ldots,{\lVert{\pi^{(x)}({\mathbf{f}})}\rVert}_{{\mathcal{A}}^s}, {\lVert{\pi^{(1)}({\mathbf{f}})}\rVert}_{{\mathcal{A}}^s},\ldots,\\ &\quad\bar c^{-1} ,\, {\lVert{{\bf u}}\rVert}_{{\mathcal{A}_\mathcal{H}({\gamma})}}, {\lVert{{\mathbf{f}}}\rVert}_{{\mathcal{A}_\mathcal{H}({\gamma})}}\bigr\}\leq C_1 d^{c_1} . 
\end{align*} (ii) There exists $$C_2>0$$ independent of $$d$$ such that $\max\bigl\{ \bar b,\, {\lVert{{\mathbf{A}}_{\rm x}}\rVert}, {\lVert{{\mathbf{A}}_1}\rVert}, \ldots,{\lVert{\boldsymbol{\alpha}_{\rm x}}\rVert}_{\ell^1}, {\lVert{\boldsymbol{\alpha}_1}\rVert}_{\ell^1}, \ldots, {\lVert{\boldsymbol{\beta}_{\rm x}}\rVert}_{\ell^1}, {\lVert{\boldsymbol{\beta}_1}\rVert}_{\ell^1}, \ldots \bigr\} \leq C_2.$ It needs to be emphasized that Algorithm 2.1 does not require any knowledge of the approximability of $${\bf u}$$ stated in Assumptions 3.2 and 3.3; these merely describe a model case for complexity bounds and their dependence on $$d$$. Recall from Remark 2.1 that Algorithm 2.1 always produces $${\bf u}_\varepsilon$$ satisfying $${\lVert{{\bf u}-{\bf u}_\varepsilon}\rVert}\leq \varepsilon$$ in finitely many steps. Theorem 3.4 Let Assumption 3.2 hold, let $$\alpha>0$$ and let $$\kappa_1,\kappa_2,\kappa_3$$ in Algorithm 2.1 be chosen as \begin{gather*}\kappa_1 = \bigl( 1 + (1 +\alpha)(\sqrt{d} + \sqrt{2d-3} + \sqrt{d(2d-3)}) \bigr)^{-1}, \\ \kappa_2 = \sqrt{2d-3}(1+\alpha) \kappa_1,\quad \kappa_3 = \sqrt{d}(1+\sqrt{2d-3})(1+\alpha) \kappa_1.
\end{gather*} Then for each $$\varepsilon>0$$ with $$\varepsilon < \varepsilon_0$$, the approximation $${\bf u}_\varepsilon$$ produced by Algorithm 2.1 satisfies $$\label{htrankbound}{\lvert{{\mathrm{rank}}({\bf u}_\varepsilon)}\rvert}_\infty \leq \, \bigl( \bar c^{-1}\ln\bigl[ 2(\alpha \kappa_1)^{-1} \rho_{\gamma}\,{\lVert{{\bf u}}\rVert}_{{\mathcal{A}_\mathcal{H}({\gamma})}}\,\varepsilon^{-1}\bigr] \bigr)^{ \bar b}\lesssim \biggl(\frac{{\lvert{\ln \varepsilon}\rvert} + \ln d}{\bar c}\biggr)^{\bar b}$$ (3.12) as well as $$\label{htsuppbound}\sum_{i\in\hat{\mathcal{I}}} \#{\mathrm{supp}}_i({\bf u}_\varepsilon) \lesssim d^{1+\frac {1}{s}}\left( \sum_{i\in\hat{\mathcal{I}}} {\lVert{\pi^{(i)}({\bf u})}\rVert}_{{\cal A}^s} \right)^{\frac {1}{s}} \varepsilon^{-\frac {1}{s}}.$$ (3.13) If, in addition, Assumption 3.3 holds and $${\small{\text{RHS}}}$$ satisfies (3.10) and (3.11), then there exist $$c,C>0$$ such that the number of required operations is bounded by $$\label{htcomplexity}C d^{c\ln d} {\lvert{\ln \varepsilon}\rvert}^{2\bar b} \varepsilon^{-\frac {1}{s}},$$ (3.14) where $$c$$ and $$C$$ depend on $$\alpha, \rho,\omega, s$$ and on the constants $$C_1,c_1,C_2$$ in Assumption 3.3. Proof. The validity of (3.12) and (3.13) follows from Bachmayr & Dahmen (2015, Theorem 7), which can be applied immediately to the result of line 11 in Algorithm 2.1.
Concerning (3.14), we can apply Bachmayr & Dahmen (2015, Theorem 8) (with $$R_i = d$$ and uniform constants $$\hat C^{(i)}_{\hat \alpha}$$, $$\hat C^{(i)}_{\hat \beta}$$ and $$\hat C^{(i)}_{\tilde{\mathbf{A}}}$$ in the notation used there) to obtain, for $${\bf w}_\eta\,{:=}\, {\small{\text{APPLY}}}({\bf v};\eta)$$, $\#{\mathrm{supp}}_i ({\bf w}_\eta) \lesssim d^{1+s^{-1}} \eta^{-\frac {1}{s}} \left(\sum_{j \in \hat{\mathcal{I}}} {\lVert{\pi^{(j)}({\bf v})}\rVert}_{{\mathcal{A}}^s} \right)^{\frac {1}{s}} , \quad{\lVert{\pi^{(i)}({\bf w}_\eta)}\rVert}_{{\mathcal{A}}^s} \lesssim d^{1+s} {\lVert{\pi^{(i)}({\bf v})}\rVert}_{{\mathcal{A}}^s},$ as well as $${\mathrm{rank}}({\bf w}_\eta)\leq (d+1) {\mathrm{rank}}({\bf v})$$. With these estimates, (3.14) follows exactly as in Bachmayr & Dahmen (2015, Theorem 9). □ For each fixed $$d$$, the produced solution $${\bf u}_\varepsilon$$ thus requires a number of parameters that is proportional to the optimal one for this type of approximation. Taking into account that in the operation count, the best approximation ranks of order $${\mathcal{O}}({\lvert{\ln \varepsilon}\rvert}^{\bar b})$$ enter at least quadratically due to orthogonalizations required in the algorithm, the number of operations in (3.14) also scales optimally with respect to $$\varepsilon$$. The dependence of the constant on the parametric dimension is subexponential, since $$d^{c\ln d} = {\it \,{e}}^{c (\ln d)^2}$$. 4. Spatial-parametric sparse approximation We now turn to the case $${\mathcal{I}}={\mathbb{N}}$$, that is, problems involving countably many parameters $$(y_j)_{j\geq 1}$$ that have decreasing influence as $$j$$ increases. Here, we consider problems of the type (1.5), $$\label{thetaj}a(y) = \bar a + \sum_{j = 1}^\infty y_j {\theta}_j ,$$ (4.1) under the uniform ellipticity assumption (1.6) on $$a$$. This variant of Algorithm 2.1 is similar to the scheme proposed in Gittelson (2014), following the approach of Cohen et al. (2001, 2002).
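The approximation classes $${\mathcal{A}}^s$$ from (3.9) reappear throughout this section. For finitely supported sequences, the quasi-norm can be evaluated directly from the decreasing rearrangement; in the Python sketch below (decay exponent and cutoff chosen only for illustration), a sequence with $$w_k \sim k^{-(s+1/2)}$$ has bounded $${\mathcal{A}}^s$$ quasi-norm, matching the $${\mathcal{O}}(\eta^{-1/s})$$ support bound.

```python
import numpy as np

def As_quasinorm(w, s):
    """sup_N (N+1)^s * (best N-term error), cf. (3.9); the best N-term error
    is the l2 tail of the decreasing rearrangement of |w|."""
    a = np.sort(np.abs(w))[::-1]
    # tails[N] = error of the best N-term approximation
    tails = np.sqrt(np.cumsum((a ** 2)[::-1])[::-1])
    tails = np.append(tails, 0.0)
    N = np.arange(len(a) + 1.0)
    return ((N + 1.0) ** s * tails).max()

# w_k ~ k^(-(s+1/2)) lies in A^s: the quasi-norm stays bounded as the cutoff grows
s = 0.5
w = np.arange(1.0, 10001.0) ** -(s + 0.5)
assert 1.0 < As_quasinorm(w, s) < 2.0
```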
In this section, we consider a version of Algorithm 2.1 that produces $$n$$-term approximations to $$u\in L^2({Y},V)$$ in terms of the wavelet–Legendre tensor product basis $$\{\psi_\lambda\otimes L_\nu\}_{\lambda\in{\mathcal{S}},\nu\in{\mathcal{F}}}$$. That is, the approximation we seek in this case is of the form (ASP), $$\label{fullsparse}u \approx u_n = \sum_{(\lambda,\nu)\in {\it{\Lambda}}_n} {\bf u}_{\lambda \nu} \psi_\lambda\otimes L_\nu,$$ (4.2) where we aim to identify $${\it{\Lambda}}_n$$, which yields an error close to that of the best $$n$$-term approximation in this basis. Here, $${\small{\text{COARSEN}}}$$ performs a standard coarsening operation on a sequence, and we set $${\small{\text{RECOMPRESS}}}({\bf v};\eta) \,\,{:=}\,\, {\bf v}$$ for any $$\eta$$. The scheme thus reduces to the adaptive method of Cohen et al. (2002), which has also been considered for this particular type of approximation of parametric PDEs in Gittelson (2014). The key ingredient that remains to be described is the adaptive application of $${\mathbf{A}}$$ to representations of the form (4.2) based on its $$s^*$$-compressibility. Let $${\bf u} \in {\mathcal{A}}^s({\mathcal{S}}\times{\mathcal{F}})$$ and let $${\mathbf{A}}$$ be $$s^*$$-compressible with $$s^*>s$$ according to (2.10). Then it follows from Cohen et al. (2001, Proposition 3.8) that $${\mathbf{f}} \in {\mathcal{A}}^s$$; hence we can construct $${\small{\text{RHS}}}$$ satisfying $\#{\mathrm{supp}} ({\small{\text{RHS}}}(\eta)) \lesssim \eta^{-\frac {1}{s}} {\lVert{{\mathbf{f}}}\rVert}_{{\mathcal{A}}^s}^{\frac {1}{s}},\quad {\lVert{{\small{\text{RHS}}}(\eta)}\rVert}_{{\mathcal{A}}^s} \lesssim {\lVert{{\mathbf{f}}}\rVert}_{{\mathcal{A}}^s}.$ Moreover, by the standard construction of $${\small{\text{APPLY}}}$$ in Cohen et al. (2001, Corollary 3.10) based on the $$s^*$$-compressibility of $${\mathbf{A}}$$, the results in Cohen et al.
(2002) yield the following complexity bound for the present realization of Algorithm 2.1. Theorem 4.1 Let $${\bf u}\in{\mathcal{A}}^s$$ and let $${\mathbf{A}}$$ be $$s^*$$-compressible with $$0<s<s^*$$. Then for any given $$\varepsilon>0$$, the approximation $${\bf u}_\varepsilon$$ produced by the above variant of Algorithm 2.1 operating on approximations of the form (4.2) satisfies $\# {\mathrm{supp}} ({\bf u}_\varepsilon ) \lesssim \varepsilon^{-\frac {1}{s}} {\lVert{{\bf u}}\rVert}_{{\mathcal{A}}^s}^{\frac {1} {s}}, \quad{\lVert{{\bf u}_\varepsilon}\rVert}_{{\mathcal{A}}^s} \lesssim {\lVert{{\bf u}}\rVert}_{{\mathcal{A}}^s},$ and the number of operations is bounded up to a multiplicative constant by $$1 + \varepsilon^{-\frac {1}{s}} {\lVert{{\bf u}}\rVert}_{{\mathcal{A}}^s}^{\frac {1}{s}}$$. We next consider the compressibility of $${\mathbf{A}}$$, which determines the range of $$s$$ for which Theorem 4.1 yields optimality, and a corresponding procedure $${\small{\text{APPLY}}}$$. In Section 6.2, we consider in further detail for which values of $$s$$ one can indeed expect $${\bf u} \in {\mathcal{A}}^s$$. 4.1 Adaptive operator application Any numerical scheme $${\small{\text{APPLY}}}$$ necessarily involves a truncation of the series (4.1). Defining for each non-negative integer $$M$$ the corresponding truncation error $$\label{eM}e_{M} := {\left\lVert{ \sum_{j > M } {\mathbf{A}}_j \otimes {\mathbf{M}}_j }\right\rVert}$$ (4.3) of replacing $${\mathbf{A}}$$ by $$\sum_{j=1}^M {\mathbf{A}}_j \otimes {\mathbf{M}}_j$$, where $$e_0 = {\lVert{{\mathbf{A}}}\rVert}$$, the decay of $$e_M$$ describes the approximability of $${\mathbf{A}}$$. We will be concerned with algebraic rates $$\label{S}e_M\le C M^{-S}, \quad M\in {\mathbb{N}},$$ (4.4) where $$C, S>0$$ are fixed constants. Note that, in particular, our further developments do not require summability of $$({\lVert{{\theta}_j}\rVert}_{L^\infty})_{j\geq 1}$$ as assumed, e.g., in Eigel et al. (2014, 2017). 
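The standard $${\small{\text{APPLY}}}$$ construction can be sketched as follows: order the entries of $${\bf v}$$ by magnitude into dyadic blocks and apply progressively cruder, and hence cheaper, compressions $${\mathbf{A}}_n$$ to the blocks of smaller norm. The Python toy below uses a bandwidth-truncation compression of a matrix with off-diagonal decay, with fixed block and compression levels rather than the error-balanced choices of Cohen et al. (2001); matrix, sizes and tolerances are illustrative.

```python
import numpy as np

N = 256
i, j = np.indices((N, N))
A = np.exp(-np.abs(i - j).astype(float))     # compressible stand-in matrix

def A_n(n):
    """Compressed matrices as in (2.10): truncation to bandwidth 2^n."""
    return np.where(np.abs(i - j) < 2 ** n, A, 0.0)

def apply_adaptive(v, nmax=8):
    """Split v into blocks of decreasing magnitude; block k gets A_(nmax-k)."""
    order = np.argsort(-np.abs(v))
    w = np.zeros_like(v)
    for k in range(nmax + 1):
        lo, hi = 2 ** k - 1, min(2 ** (k + 1) - 1, N)
        block = np.zeros_like(v)
        block[order[lo:hi]] = v[order[lo:hi]]
        w += A_n(nmax - k) @ block
        if hi == N:
            break
    return w

v = 1.0 / np.arange(1, N + 1)                # a decaying input sequence
rel_err = np.linalg.norm(A @ v - apply_adaptive(v)) / np.linalg.norm(A @ v)
assert rel_err < 0.05
```

The small blocks carry little mass, so the crude compressions applied to them contribute little to the total error while saving most of the work.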
A first limitation on the $$s^*$$-compressibility of $${\mathbf{A}}$$ lies in the decay of the truncation errors (4.4), which arise from replacing $${\mathbf{A}}$$ by a finite sum. This amounts to approximating all but finitely many $${\mathbf{A}}_j$$ by zero. A second limitation is the compressibility of the remaining $${\mathbf{A}}_j$$, which depends on the particular expansion system $$({\theta}_j)_{j\in{\mathbb{N}}}$$. As we show next, a favourable $$s^*$$-compressibility result for $${\mathbf{A}}$$ (almost matching the truncation error decay (4.4)) can be obtained when $$({\theta}_j)_{j\in{\mathbb{N}}}$$ has multiscale structure, as summarized in the following set of assumptions. In Section 4.2, our findings are compared with previous results that hold under more generic assumptions. Assumption 4.2 Let $$\{\xi_\mu\}_{\mu \in {\it{\Lambda}}}$$ be a system of compactly supported multilevel basis functions with $$\operatorname{diam}({\rm supp}(\xi_\mu))\sim 2^{-|\mu|}$$ and $${\lVert{\xi_\mu}\rVert}_{L^\infty(D)} = 1$$. With $$(\mu_j)_{j\geq 1}$$ an enumeration of $${\it{\Lambda}}$$ by increasing level and some fixed $$\alpha>0$$, let $$\label{multiscaleexpansion}{\theta}_j = c_{\mu_j} \xi_{\mu_j}, \quad \text{where } c_{\mu_j} = 2^{-\alpha {\lvert{\mu_j}\rvert}}.$$ (4.5) To simplify notation, let $$c_{\mu_0} := 1$$, $$\xi_{\mu_0} := \bar a$$, and $${\lvert{\mu_0}\rvert} := 0$$. Note that for what follows, it would in fact suffice to assume $$c_{\mu} \sim 2^{-\alpha {\lvert{\mu}\rvert}}$$, with a constant that is uniform over $${\it{\Lambda}}$$, but we assume equality to simplify the exposition.
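The truncation-error decay resulting from Assumption 4.2 can be observed numerically for a toy system of Haar-type bumps in one spatial dimension ($$m = 1$$, with $$2^\ell$$ disjoint bumps on level $$\ell$$); grid size, number of levels and the value of $$\alpha$$ in this Python check are arbitrary illustrative choices.

```python
import numpy as np

alpha, L = 1.5, 10
G = 2 ** L                                    # spatial grid for D = [0,1), m = 1
theta = []
for ell in range(L + 1):                      # 2^ell disjoint bumps per level
    for k in range(2 ** ell):
        chi = np.zeros(G)
        chi[k * (G >> ell):(k + 1) * (G >> ell)] = 1.0
        theta.append(2.0 ** (-alpha * ell) * chi)
theta = np.array(theta)                       # enumeration by increasing level

def e_M_bound(M):
    """sup over y in Y of ||sum_{j>M} y_j theta_j||_inf; since the theta_j are
    nonnegative here, the sup is attained at y_j = 1."""
    return theta[M:].sum(axis=0).max()

# observed decay rate should be close to alpha/m = alpha
rates = [np.log2(e_M_bound(M) / e_M_bound(4 * M)) / 2 for M in (4, 16, 64)]
assert all(abs(r - alpha) < 0.1 for r in rates)
```

Because only one bump per level is nonzero at any point, the pointwise tail is a geometric sum over levels, which is what produces the rate $$M^{-\alpha/m}$$ rather than the slower rate one would get from summing all coefficients.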
Under Assumption 4.2, $$\label{multiscale eMbound}e_M \leq \sup_{y\in Y} {\left\lVert{\sum_{j>M} y_j \theta_j }\right\rVert}_{L^\infty(D)}\leq \sum_{\ell \geq \max\{ {\lvert{\mu_j}\rvert} \colon j \leq M\}} 2^{-\alpha\ell}\lesssim 2^{-\alpha \max\{ {\lvert{\mu_j}\rvert} \colon j \leq M\}} \lesssim M^{-\frac{\alpha}{m}},$$ (4.6) and we thus obtain (4.4) with $$S = \alpha/m$$. We now give a new result for the compressibility of $${\mathbf{A}}$$ arising from a wavelet-type parametrization as in (4.5). As we shall see, making use of a multilevel structure in the parametrization, one can obtain substantially better compressibility of $${\mathbf{A}}$$ than under the more generic assumptions used in Gittelson (2014). The result is based on compressibility properties of the corresponding matrices $${\mathbf{A}}_j$$. These are analysed in Appendix A, where the assumptions of the following proposition are established under conditions on the regularity of $$\xi_\mu$$ and of the spatial wavelets. Proposition 4.3 Let $$\{ {\theta}_j \}_{j\geq 1}$$ and $$c_{\mu_j}$$ be as in Assumption 4.2, and let us assume that there exist a $$\tau > \frac{\alpha}{m}$$ and matrices $${\mathbf{A}}_{j,n}$$, $$n\in {\mathbb{N}}_0$$, where $${\mathbf{A}}_{j,0} = 0$$, with the following properties: (i) One has $$\label{compress2}\|{\mathbf{A}}_j - {\mathbf{A}}_{j,n}\|\lesssim c_{\mu_j} 2^{-\tau n},\quad n\in{\mathbb{N}}_0,$$ (4.7) where the hidden constant is independent of $$j,n$$. (ii) The number of nonvanishing entries in each column of $${\mathbf{A}}_{j,n}$$ does not exceed a uniform constant multiple of $$\bigl(1+|\mu_j|^q\bigr)2^n$$, for some $$q\geq 1$$.
Then $${\mathbf{A}}$$ is $$s^*$$-compressible with $$\label{sparsecomprlimit}s^* = \frac{\alpha}{m} \frac{2\tau}{1+2\tau}.$$ (4.8) Specifically, it is shown in Appendix A that the above assumptions can be realized for arbitrarily large $$\tau$$ by choosing the functions $$\xi_\mu$$ and the spatial wavelets sufficiently smooth, the latter having sufficiently many vanishing moments. By the above result, $${\mathbf{A}}$$ is thus $$s^*$$-compressible, where $$s^*<\alpha/m$$ comes as close to $$\alpha/m$$ as one wishes when $$\tau$$ is suitably large. As discussed in further detail in Section 6, this means that the $$n$$-term approximability of $${\bf u}$$ can be essentially fully exploited by the adaptive scheme. Proof. We construct approximations $${\mathbf{A}}_{\mathbf{n}}$$ of $${\mathbf{A}}$$ by choosing sequences $$\mathbf{n} = (n_{j})_{j\geq 0}$$ of bounded support and defining $${\mathbf{A}}_{\mathbf{n}}\colon \ell^{2}({\mathcal{S}}\times{\mathcal{F}})\to \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$ by $$\label{defAn}{\mathbf{A}}_\mathbf{n} \,{:=}\, \sum_{j\geq 0} {\mathbf{A}}_{j, n_{j}} \otimes {\mathbf{M}}_j.$$ (4.9) Our aim is to find $$\mathbf{n}^J$$ such that the corresponding $${\mathbf{A}}^J := {\mathbf{A}}_{\mathbf{n}^J}$$ satisfy $${\lVert{{\mathbf{A}} - {\mathbf{A}}^J}\rVert} \lesssim J^{-2}2^{-s J},\quad J\in{\mathbb{N}},$$ (4.10) with $$s<s^*$$ and $$s^*$$ as in the assertion, and such that the number of nonzero entries in each row and column of $${\mathbf{A}}^{J}$$ is bounded by a fixed constant multiple of $$J^{-2}2^J$$. We take $$L\in{\mathbb{N}}$$ arbitrary but fixed. Recall that we assume $$\mu_j$$ to be ordered by increasing level, that is, $${\lvert{\mu_{j+1}}\rvert} \geq {\lvert{\mu_j}\rvert}$$. We now consider $$(n_j)_{j\geq 0}$$ satisfying $$n_j = 0$$ for $$j > M_L \,{:=}\, {\lceil{L^{2m/\alpha}2^{mL}}\rceil}$$, so that $${\mathbf{A}}_{j,n_j} = 0$$ for $$j > M_L$$.
Since $$e_{M_L} \lesssim L^{-2} 2^{-\alpha L}$$ by (4.4) and (4.6), we obtain $$\label{operr_triineq}{\lVert{{\mathbf{A}} - {\mathbf{A}}_\mathbf{n}}\rVert} \lesssim {\left\lVert{ \sum_{j=0}^{M_L} {\bigl( {\mathbf{A}}_j - {\mathbf{A}}_{j,n_j} \bigr)} \otimes {\mathbf{M}}_j }\right\rVert} + L^{-2} 2^{-\alpha L}.$$ (4.11) Within each level $$\ell \geq 0$$, that is, for each $$\mu$$ with $${\lvert{\mu}\rvert}=\ell$$, there are only finitely many $$\mu'$$ with $${\lvert{\mu'}\rvert}=\ell$$ such that $${\mathrm{supp}}\,\, \xi_\mu \cap {\mathrm{supp}} \,\, \xi_{\mu'} \neq \emptyset$$. Since the images of $${\mathbf{A}}_j$$ corresponding to $$\xi_{\mu_j}$$ with disjoint support are orthogonal, we obtain $${\left\lVert{ \sum_{j=0}^{M_L} {\bigl( {\mathbf{A}}_j - {\mathbf{A}}_{j,n_j} \bigr) } \otimes {\mathbf{M}}_j }\right\rVert}\lesssim \sum_{\ell=0}^{{\lceil{L+ \frac{2}{\alpha}\log_2 L}\rceil}} \biggl( \sum_{j\colon{\lvert{\mu_j}\rvert} = \ell} {\lVert{{\mathbf{A}}_j - {\mathbf{A}}_{j,n_j}}\rVert}^2 \biggr)^{\frac {1}{2}} ,$$ (4.12) where the constant depends on the maximum number of $$\xi_\mu$$ of overlapping support on each level. 
Taking $n_j = n_\ell = \left\lceil{ {\frac{m \ell}{2\tau} + \frac{\alpha}{\tau} \left(L+ \frac{2}{\alpha}\log_2 L - \ell\right)} + \frac {1}{\tau} \log_2(1+\ell)^2 } \right\rceil$ for $$\mu_j$$ of level $$\ell$$ and recalling that for such $$j$$ we have $$|c_{\mu_j}|=2^{-\alpha\ell}$$ gives $$\label{errL}\sum_{\ell=0}^{{\lceil{L+ \frac{2}{\alpha}\log_2 L}\rceil}} \biggl( \sum_{j\colon{\lvert{\mu_j}\rvert} = \ell} {\lVert{{\mathbf{A}}_j - {\mathbf{A}}_{j,n_j}}\rVert}^2 \biggr)^{\frac {1}{2}}\lesssim \sum_{\ell = 0}^{{\lceil{L+ \frac{2}{\alpha}\log_2 L}\rceil}} 2^{\frac{m}{2} \ell} 2^{-\alpha \ell} 2^{-\tau n_\ell} \lesssim L^{-2} 2^{-\alpha L}.$$ (4.13) Let $$N_L$$ be the resulting maximum number of entries per row and column in $${\mathbf{A}}_{\mathbf{n}}$$, then \begin{align}N_L &\lesssim \sum_{\ell = 0}^{{\lceil{L+ \frac{2}{\alpha}\log_2 L}\rceil}} (1+\ell^q)2^{m\ell} 2^{n_\ell}\lesssim L^{\frac{2}{\alpha}}2^{\frac{\alpha}{\tau} L } \sum_{\ell = 0}^{{\lceil{L+ \frac{2}{\alpha}\log_2 L}\rceil}} (1+\ell)^{q+\frac {2}{\tau}} 2^{(1 + \frac {1}{2\tau}) m \ell - \frac {\alpha}{\tau} \ell} \nonumber\\ & \lesssim L^{q + \frac{2(1+m)}{\alpha}} 2^{\frac{1+2\tau}{2\tau} m L}, \end{align} (4.14) where we have used $$\tau > \alpha/m$$. We now fix $$s\,{>}\,0$$ with $$s\,{<}\,t\,{\,{:=}\,}\,\frac{\alpha}{m} \frac{2\tau}{1+2\tau}$$ and take $$J\,{\,{:=}\,}\,{\lceil{\frac{t}{s} \frac{1+2\tau}{2\tau} m L}\rceil}\,{=}\,{\lceil{\frac {\alpha}{s} L}\rceil}$$ and $$\mathbf{n}^J\,{\,{:=}\,}\,\mathbf{n}$$. Since then $$N_L \lesssim J^{q+ \frac{2(1+m)}{\alpha}}2^{\frac{s}{t} J}$$ we see that $$N_L\lesssim J^{-2}2^J$$ with a constant that depends on $$\alpha, m$$ and increases when $$s$$ approaches $$t$$. It immediately follows from (4.13) that $${\lVert{{\mathbf{A}} - {\mathbf{A}}^J}\rVert} \lesssim J^{-2}2^{- s J},$$ (4.15) with a constant depending on $$m$$. Thus, $${\mathbf{A}}$$ is $$s^*$$-compressible with $$s^* = t$$. 
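The choice of $n_\ell$ is made so that each summand in (4.13) is bounded by $L^{-2}2^{-\alpha L}(1+\ell)^{-2}$, after which the series over $\ell$ converges. This bookkeeping can be verified numerically; the parameter values below are arbitrary admissible choices:

```python
import math

def level_error_sum(L, alpha, m, tau):
    """Evaluate sum_l 2^{(m/2) l} 2^{-alpha l} 2^{-tau n_l} with the
    truncation levels n_l from the proof; by construction each summand
    is at most L^{-2} 2^{-alpha L} (1+l)^{-2}, so the total is below
    (pi^2/6) L^{-2} 2^{-alpha L}."""
    l_max = math.ceil(L + (2.0 / alpha) * math.log2(L))
    total = 0.0
    for l in range(l_max + 1):
        n_l = math.ceil(m * l / (2 * tau)
                        + (alpha / tau) * (L + (2.0 / alpha) * math.log2(L) - l)
                        + (1.0 / tau) * math.log2((1 + l) ** 2))
        n_l = max(n_l, 0)  # a truncation level cannot be negative
        total += 2.0 ** (0.5 * m * l - alpha * l - tau * n_l)
    return total, (math.pi ** 2 / 6) * L ** (-2) * 2.0 ** (-alpha * L)

lhs, bound = level_error_sum(L=8, alpha=1.0, m=2, tau=3.0)
```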
□ 4.2 Coefficient expansions In our compressibility result Proposition 4.3 for $${\mathbf{A}}$$, we have made use of the multiscale structure of the expansion functions $$\theta_j$$. Let us now briefly compare this with previous results for globally supported $$\theta_j$$ as they arise in Karhunen–Loève expansions. In fact, for certain problems, one has equivalent expansions in either globally supported or wavelet-type $$\theta_j$$. This is demonstrated, for instance, in Bachmayr et al. (2016) for log-normal diffusion coefficients with Gaussian random fields of Matérn covariance. To illustrate the basic issues in approximating $${\mathbf{A}}$$ in the case of typical globally supported $$\theta_j$$, we consider the following spatially one-dimensional setting with $$D=]0,1[$$ as in Gittelson (2014): for a monotonically decreasing positive sequence $$(c_j)_{j\in{\mathbb{N}}}$$ with $$\sum_{j\geq 1} c_j \leq \frac {1}{2}$$, take $${\theta}_j = c_j \sin(j\pi \cdot)$$, so that $$\label{cosexp}a(y) = 1 + \sum_{ j \geq 1 } y_j c_j \sin(j\pi \cdot).$$ (4.16) This model is representative in that the $${\theta}_j$$ become increasingly oscillatory as $$j\to\infty$$, as is also the case in more general Karhunen–Loève expansions. As a concrete example, with $$\beta> 1$$, let $$c_j := \delta j^{-\beta}$$ with $$\delta>0$$ sufficiently small. Then, $${\lVert{{\mathbf{A}}_j}\rVert} \sim c_j$$, from which we only obtain $e_M \lesssim M^{-\beta + 1},$ and therefore $$S = \beta - 1$$. As shown in Gittelson (2014), taking the compression of the individual $${\mathbf{A}}_j$$ into account, one obtains $$s^*$$-compressibility of $${\mathbf{A}}$$ with $$s^* = \frac{1}{2}(\beta - 1) = \frac {1}{2} S$$. In this case, instead of conditions (i) and (ii) in Proposition 4.3, one has $${\lVert{{\mathbf{A}}_j -{\mathbf{A}}_{j,n}}\rVert} \lesssim j^{-(\beta+\frac {1}{2})} 2^{-\gamma n}$$ with $${\mathcal{O}}( j ( 1 + \log_2 j) 2^n)$$ entries per row and column.
We comment further in Remark A.3 on how this leads to the limitation to $$s^* = \frac{1}{2}(\beta - 1)$$. 5. Low-rank approximation We now turn to an adaptive method for finding low-rank approximations of the form (LR), based on the Hilbert–Schmidt decomposition of $${\bf u}$$. These approximations are of the form \begin{align}\label{fulllrcoeff}{\bf u} \approx \sum_{k=1}^n {\bf u}^{\rm x}_{k} \otimes {\bf u}^{\rm y}_{k}, \end{align} (5.1) with finitely supported vectors $${\bf u}^{\rm x}_{k}$$, $${\bf u}^{\rm y}_{k}$$, $$k=1,\ldots,n$$. As in the scheme considered in Section 3, adaptivity in rank and in the basis expansions is intertwined by iteratively improving low-rank expansions of varying ranks, while at the same time identifying finitely supported approximations in $$\ell^{2}({\mathcal{S}})$$ and $$\ell^{2}({\mathcal{F}})$$, both based on approximate residual evaluations. In principle, the results of Section 3 concerning a full separation of variables based on hierarchical tensor formats could be applied with any finite truncation dimension $$d$$. However, assuming (4.4), a total error of order $$\varepsilon$$ requires $$d(\varepsilon) \sim \varepsilon^{-1/S}$$. As a consequence, due to the $$d$$-dependent quasi-optimality (3.7) of the hierarchical SVD truncation, we can only obtain a highly suboptimal complexity bound in (3.14) for the hierarchical format. Concerning low-rank decompositions, we therefore concentrate here on a more basic case, namely, a separation of spatial and parametric variables as in (5.1). Since this separation also occurs in any hierarchical representation, the resulting Hilbert–Schmidt rank provides a lower bound for the hierarchical ranks that are required in a hierarchical format involving further matricizations.
The efficiency of the obtained low-rank approximations is measured against the singular value decomposition of the Hilbert–Schmidt operator $$\ell^{2}({\mathcal{F}})\to \ell^{2}({\mathcal{S}})$$ induced by $${\bf u}$$, \begin{align}\label{hsapprox}{\bf u} = \sum_{k=1}^\infty \sigma_k {\bf U}^{({\rm x})}_k \otimes {\bf U}^{({\rm y})}_k\!, \end{align} (5.2) where $$\sigma_k \geq 0$$, $$\{ {\bf U}^{({\rm x})}_k \}$$, $$\{ {\bf U}^{({\rm y})}_k \}$$ are orthonormal in $$\ell^{2}({\mathcal{S}})$$ and $$\ell^{2}({\mathcal{F}})$$, respectively, and \begin{align}\label{hsoptcoeff}{\left\lVert{ {\bf u} - \sum_{k=1}^r \sigma_k {\bf U}^{({\rm x})}_k \otimes {\bf U}^{({\rm y})}_k}\right\rVert}^2 = \sum_{k>r} \sigma_k^2 = \min_{{\mathrm{rank}}({\bf w})\leq r} {\lVert{{\bf u}-{\bf w}}\rVert}^2. \end{align} (5.3) Ideally, the ranks of computed approximations should be comparable to the minimum $$r$$ for achieving the same error in (5.3). Moreover, we quantify the number of nonzero coefficients in (5.1) in terms of $$\#\left( \bigcup_{k=1}^n {\mathrm{supp}}\,\, {\bf u}^i_{k} \right)$$, $$i={\rm x},{\rm y}$$. The reasons for not considering each individual $$\# {\mathrm{supp}}\,\, {\bf u}^i_{k}$$ separately are mainly algorithmic: because the numerical methods require orthogonalizations of the sets $$( {\bf u}^i_{k} )_{k=1,\ldots,n}$$, their complexity is determined by the unions of the respective supports.
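Relation (5.3) is the Eckart–Young property of the SVD in the Hilbert–Schmidt (Frobenius) norm; a quick numerical check on a random surrogate for a finitely supported $\mathbf{v}$:

```python
import numpy as np

rng = np.random.default_rng(1)
V = rng.standard_normal((20, 12))                 # surrogate coefficient array
U, sig, Wt = np.linalg.svd(V, full_matrices=False)

r = 5
V_r = U[:, :r] @ np.diag(sig[:r]) @ Wt[:r, :]     # rank-r SVD truncation

err = np.linalg.norm(V - V_r, 'fro')              # Hilbert-Schmidt error
tail = np.sqrt(np.sum(sig[r:] ** 2))              # (sum_{k>r} sigma_k^2)^{1/2}
```

The truncation error coincides with the tail of the singular values, and no other rank-$r$ matrix does better in this norm.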
To understand the joint approximability of the infinite vectors $${\bf U}^{(i)}_k$$, $$i={\rm x},{\rm y}$$, in (5.2) serving as our reference point, we consider the particular contractions defined, for $${\bf v} \in \ell^{2}({\mathcal{S}} \times {\mathcal{F}})$$, by $$\pi^{({\rm x})}({\bf v}) = (\pi^{({\rm x})}_\lambda({\bf v}))_{\lambda\in{\mathcal{S}}}$$, $$\pi^{({\rm y})}({\bf v}) = (\pi^{({\rm y})}_\nu({\bf v}))_{\nu\in{\mathcal{F}}}$$ with \begin{align}\label{contrxy}\pi^{({\rm x})}_\lambda({\bf v}) := \left(\sum_{\nu\in {\mathcal{F}}}|{\bf v}_{\lambda,\nu}|^2\right)^{1/2},\quad \pi^{({\rm y})}_\nu({\bf v}) := \left(\sum_{\lambda\in {\mathcal{S}}}|{\bf v}_{\lambda,\nu}|^2\right)^{1/2}. \end{align} (5.4) Note that $$\pi^{({\rm y})}_\nu({\bf u})$$ is uniformly proportional to the norm of the corresponding Legendre coefficient of $$u$$, that is, $$\pi^{({\rm y})}_\nu({\bf u}) \sim {\lVert{u_\nu}\rVert}_V$$. Let $$(\lambda^*_{k})_{k \in {\mathbb{N}}}$$ and $$(\nu^*_{k})_{k \in {\mathbb{N}}}$$ be such that $$(\pi^{({\rm x})}_{\lambda^*_{k}}({\bf v}))_{k\in{\mathbb{N}}}$$ and $$(\pi^{({\rm y})}_{\nu^*_{k}}({\bf v}))_{k\in{\mathbb{N}}}$$ are nonincreasing, respectively. Then the singular values $$\sigma_k({\bf v})$$ of $${\bf v}$$ satisfy the tail bound $$\label{svpibound}\Bigl(\sum_{j\geq k}\sigma_j({\bf v})^2\Bigr)^{1/2} \,\leq\, \min\biggl\{ \Bigl(\sum_{j\geq k} \pi^{({\rm x})}_{\lambda^*_j}({\bf v})^2\Bigr)^{1/2},\, \Bigl(\sum_{j\geq k} \pi^{({\rm y})}_{\nu^*_j}({\bf v})^2\Bigr)^{1/2} \biggr\}, \quad k \in {\mathbb{N}},$$ (5.5) since retaining only the entries associated with the $$k-1$$ largest contractions yields an approximation of rank at most $$k-1$$. In view of our results for Example 6.4 (and the further numerical experiments of Example 6.10), we cannot generally expect faster than algebraic decay of singular values, which we quantify in terms of classes $${\mathcal{A}_\mathcal{H}({\gamma})}$$ specialized to tensors of order 2 and to the specific sequence $$\gamma(k):= (1+k)^{\bar s}$$.
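The contractions (5.4) are simply the row and column $\ell^2$-norms of the coefficient array, and the mechanism linking them to singular values is again Frobenius-norm Eckart–Young: zeroing all but the $k$ rows (or columns) of largest contraction leaves a matrix of rank at most $k$, so the squared singular value tails are dominated by the sorted squared contraction tails. A sketch of this check:

```python
import numpy as np

rng = np.random.default_rng(2)
V = rng.standard_normal((15, 10))

pi_x = np.linalg.norm(V, axis=1)   # pi^(x)_lambda: l2-norms over nu (rows)
pi_y = np.linalg.norm(V, axis=0)   # pi^(y)_nu: l2-norms over lambda (columns)
sig = np.linalg.svd(V, compute_uv=False)

# sort contractions nonincreasingly (the lambda*_k, nu*_k orderings)
pi_x_sorted = np.sort(pi_x)[::-1]
pi_y_sorted = np.sort(pi_y)[::-1]

# tails[k] = sum over j >= k of the squared sequence
sv_tails = np.cumsum((sig ** 2)[::-1])[::-1]
x_tails = np.cumsum((pi_x_sorted ** 2)[::-1])[::-1]
y_tails = np.cumsum((pi_y_sorted ** 2)[::-1])[::-1]
```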
This yields the approximation classes ${{\it{\Sigma}}^{\bar s}} :=\left\{{\bf v}\in \ell^{2}({\mathcal{S}}\times{\mathcal{F}}): \sup_{k\in{\mathbb{N}}}(1+k)^{\bar s}\left(\sum_{j>k}\sigma_j({\bf v})^2\right)^{1/2} =: \|{\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}<\infty \right\}.$ The approximate sparsity of the sequences $$\pi^{({\rm x})}({\bf v})$$, $$\pi^{({\rm y})}({\bf v})$$ is measured in terms of the largest $$s_{\rm x}, s_{\rm y}>0$$, such that $$\pi^{({\rm x})}({\bf v}) \in {\mathcal{A}}^{s_{\rm x}}({\mathcal{S}})$$, $$\pi^{({\rm y})}({\bf v}) \in {\mathcal{A}}^{s_{\rm y}}({\mathcal{F}})$$ according to (3.9). For the low-rank approximation, the routines $${\small{\text{RECOMPRESS}}}$$ and $${\small{\text{COARSEN}}}$$ used in Algorithm 2.1 are based on the specialization to tensors of order 2 of the routines described in the previous section. $${\small{\text{RECOMPRESS}}}({\bf v};\eta)$$ is a numerical realization of $$\hat{\mathrm{P}}_{\eta}({\bf v})$$, which we define as the operator producing the best low-rank approximation of $${\bf v}$$ with error at most $$\eta$$ with respect to $${\lVert{\cdot}\rVert}$$, obtained by truncating the singular value decomposition of its argument. The routine $${\small{\text{COARSEN}}}({\bf v};\eta)$$ is constructed as in Section 3 based on the contractions $$\pi^{({\rm x})}({\bf v})$$, $$\pi^{({\rm y})}({\bf v})$$ defined as in (5.4). The following result differs from Bachmayr & Dahmen (2015, Theorem 7), which is formulated for general hierarchical tensors, in that we now consider differing sparsity classes for the contractions $$\pi^{(i)}$$, $$i={\rm x},{\rm y}$$. In view of the preceding discussion, it is reasonable to assume possibly different, but algebraic decay for both contractions. Theorem 5.1 Let $$\mathbf{u}, \mathbf{v} \in \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$ with $${\lVert{\mathbf{u}-\mathbf{v}}\rVert} \leq \eta$$.
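Minimal sketches of the two routines for the order-2 case may be helpful. The even tolerance split inside the coarsening step is a simplifying assumption; the actual routines operate on contractions of hierarchical representations:

```python
import numpy as np

def recompress(V, eta):
    """SVD truncation at the smallest rank r whose tail
    (sum_{k>r} sigma_k^2)^{1/2} is at most eta (realizes P_eta)."""
    U, sig, Wt = np.linalg.svd(V, full_matrices=False)
    tails = np.append(np.sqrt(np.cumsum((sig ** 2)[::-1])[::-1]), 0.0)
    r = int(np.argmax(tails <= eta))           # tails is nonincreasing
    return U[:, :r] @ np.diag(sig[:r]) @ Wt[:r, :]

def coarsen(V, eta):
    """Zero the rows/columns of smallest contraction while the discarded
    mass stays within eta (budget split evenly between the two modes)."""
    W = V.copy()
    for axis in (0, 1):
        pi = np.linalg.norm(V, axis=1 - axis)  # row resp. column norms of V
        budget = eta ** 2 / 2
        for i in np.argsort(pi):               # smallest contractions first
            if pi[i] ** 2 <= budget:
                budget -= pi[i] ** 2
                if axis == 0:
                    W[i, :] = 0.0
                else:
                    W[:, i] = 0.0
            else:
                break
    return W

rng = np.random.default_rng(3)
scale = np.outer(2.0 ** -np.arange(12.0), 2.0 ** -np.arange(8.0))
V = rng.standard_normal((12, 8)) * scale       # decaying surrogate data
eta = 1e-3
Vr, Vc = recompress(V, eta), coarsen(V, eta)
```

Both outputs stay within distance $\eta$ of the input, one with reduced rank, the other with reduced support.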
Then for \begin{align}\label{weta}\mathbf{w}_{\eta} := \hat{\mathrm{C}}_{2^{3/2}(1+\alpha)\eta} \bigl(\hat{\mathrm{P}}_{ (1+\alpha)\eta} (\mathbf{v}) \bigr)\!, \end{align} (5.6) we have \begin{align}\label{eq:combinedcoarsen_errest}{\lVert{{\bf u} - {\bf w}_\eta}\rVert} \leq \bigl( 2+\alpha + 2^{3/2} (1+\alpha)\bigr) \eta . \end{align} (5.7) Moreover, when $$\mathbf{u}\in{{\it{\Sigma}}^{\bar s}}$$, $$\pi^{(i)}(\mathbf{u}) \in {\cal A}^{s_i}$$, $$i={\rm x},{\rm y}$$, we have \begin{align}\label{eq:combinedcoarsen_rankest}{\lvert{{\mathrm{rank}}({\bf w}_\eta)}\rvert}_\infty \leq 2 \bigl(\alpha^{-1} {\lVert{{\bf u}}\rVert}_{{{\it{\Sigma}}^{\bar s}}} \bigr)^{\frac {1}{\bar s}} \eta^{-\frac {1}{\bar s}},\qquad {\lVert{{\bf w}_\eta}\rVert}_{{{\it{\Sigma}}^{\bar s}}}\leq 2 (1 + \alpha^{-1}) {\lVert{{\bf u}}\rVert}_{{{\it{\Sigma}}^{\bar s}}} , \end{align} (5.8) and \label{eq:combinedcoarsen_suppest}\begin{aligned}\#{\mathrm{supp}}_{\rm x} (\mathbf{w}_\eta) + \#{\mathrm{supp}}_{\rm y} (\mathbf{w}_\eta) &\leq 2+ \biggl( \frac{2\|\pi^{({\rm x})}({\bf u})\|_{{\mathcal{A}}^{s_{\rm x}}}}{\alpha\eta}\biggr)^{\frac {1}{s_{\rm x}}} + \biggl( \frac{2\|\pi^{({\rm y})}({\bf u})\|_{{\mathcal{A}}^{s_{\rm y}}}}{\alpha\eta}\biggr)^{\frac {1}{s_{\rm y}}}\\ {\lVert{\pi^{(i)}({\bf w}_\eta)}\rVert}_{{\mathcal{A}}^{s_i}}&\leq C {\lVert{ \pi^{(i)}({\bf u})}\rVert}_{{\mathcal{A}}^{s_i}} , \quad i={\rm x},{\rm y}, \end{aligned} (5.9) where $$C$$ depends on $$\alpha$$ and $$s_i$$, $$i={\rm x},{\rm y}$$. The estimates (5.7) and (5.8) have already been shown in Bachmayr & Dahmen (2015). The only deviation concerns the stability estimate (5.9), which we prove in Appendix B. To apply Algorithm 2.1 it remains to specify the approximate application of $${\mathbf{A}}$$ by the procedure $${\small{\text{APPLY}}}$$ to representations of the form (5.2).
As part of this procedure, we shall also use a modified routine $${\small{\text{COARSEN}}}_{\rm y}$$, which operates only on the second tensor mode and leaves $${\mathrm{supp}}_{\rm x}$$ unchanged. For this routine, we shall only use the simpler statement that for any $${\bf v}\in\ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$ with $$\pi^{({\rm y})}({\bf v}) \in {\mathcal{A}}^{s_{\rm y}}({\mathcal{F}})$$, $${\bf v}_{\rm y}\,{\,{:=}\,}\,{\small{\text{COARSEN}}}_{\rm y}({\bf v};\eta)$$ satisfies $\#{\mathrm{supp}}_{\rm y} ({\bf v}_{\rm y}) \lesssim \eta^{-\frac {1}{s_{\rm y}}} {\lVert{\pi^{({\rm y})}({\bf v})}\rVert}_{{\mathcal{A}}^{s_{\rm y}}}^{\frac {1}{s_{\rm y}}}, \quad {\lVert{\pi^{({\rm y})}({\bf v}_{\rm y})}\rVert}_{{\mathcal{A}}^{s_{\rm y}}} \lesssim {\lVert{\pi^{({\rm y})}({\bf v})}\rVert}_{{\mathcal{A}}^{s_{\rm y}}}\!.$ 5.1 Adaptive operator application We now describe a specification of the more generic routine $${\small{\text{APPLY}}}$$ used in Bachmayr & Dahmen (2015) that is tailored to exploit anisotropy in the parametrizations of parametric operators. For any given $$\eta>0$$ and finitely supported $${\bf v}$$, we aim to construct $${\bf w}_\eta$$ such that $${\lVert{{\mathbf{A}} {\bf v} - {\bf w}_\eta}\rVert} \leq \eta$$. We follow here the general strategy of combining a priori knowledge on $${\mathbf{A}}$$ with a posteriori information on $${\bf v}$$, which is given in terms of a suitable decomposition of $${\bf v}$$. The routine $${\small{\text{APPLY}}}$$ is structured as follows: (S1) Preprocessing and decomposing the input: We first apply a preprocessing step to the finitely supported input $${\bf v}$$ that consists of applications of $${\small{\text{RECOMPRESS}}}$$ and $${\small{\text{COARSEN}}}_{\rm y}$$. 
We choose for a given $$\eta >0$$ the tolerances of order $$\eta$$ in such a way that the resulting $${\bf v}_\eta$$ satisfies $$\label{etahalf}\|{\bf v} - {\bf v}_\eta\|\le \frac {\eta}{2\|{\mathbf{A}}\|}.$$ (5.10) As a consequence, for any positive $$s_{\rm y}$$, $$\bar s$$ we have $$\label{vcoarsened}{\mathrm{rank}}({\bf v}_\eta)\lesssim \eta^{-\frac{1}{\bar s}}\|{\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}^{\frac{1}{\bar s}},\quad \|{\bf v}_\eta\|_{{{\it{\Sigma}}^{\bar s}}}\lesssim \|{\bf v}\|_{{{\it{\Sigma}}^{\bar s}}},$$ (5.11) and $$\label{supps}\#{\mathrm{supp}}_{\rm y} (\mathbf{v}_\eta ) \lesssim {\eta}^{-\frac {1}{s_{\rm y}}} {\lVert{\pi^{({\rm y})}({\bf v})}\rVert}^{\frac {1}{s_{\rm y}}}_{{\mathcal{A}}^{s_{\rm y}}}, \quad{\lVert{\pi^{({\rm y})}({\bf v}_\eta)}\rVert}_{{\mathcal{A}}^{s_{\rm y}}}\lesssim {\lVert{ \pi^{({\rm y})}( {\bf v})}\rVert}_{{\mathcal{A}}^{s_{\rm y}}} \!.$$ (5.12) We then have the SVD of $${\bf v}_\eta$$ at hand, $$\label{SVD}{\bf v}_\eta = \sum_{k = 1}^K \sigma_k {\bf U}^{({\rm x})}_k \otimes {\bf U}^{({\rm y})}_k\!,$$ (5.13) and set $$K_p = \{ 2^p ,\ldots, \min\{K,2^{p+1} - 1\}\}$$, for $$p = 0,1,\ldots$$, $$p\le \log_2 K$$. Furthermore, for $$q=0,1,\ldots$$, let $$\hat{\it{\Lambda}}^{({\rm y})}_{q}$$ be the support of the best $$2^q$$-term approximation of $$\pi^{({\rm y})}({\bf v}_\eta)$$. We set $${\it{\Lambda}}^{({\rm y})}_{0} := \hat{\it{\Lambda}}^{({\rm y})}_{0}$$ and $${\it{\Lambda}}^{({\rm y})}_{q} :=\hat {\it{\Lambda}}^{({\rm y})}_{q} \setminus \hat{\it{\Lambda}}^{({\rm y})}_{q-1}$$ for $$q>0$$. 
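The groupings of step (S1) can be sketched directly: $K_p$ collects the ranks in dyadic blocks, and $\Lambda^{(y)}_q$ is the increment between consecutive best $2^q$-term supports of $\pi^{(y)}(\mathbf{v}_\eta)$. The example values below are arbitrary:

```python
import numpy as np

def dyadic_groups(pi_y, K):
    """Return the rank blocks K_p and support blocks Lambda_q of (S1)."""
    # K_p = {2^p, ..., min(K, 2^{p+1} - 1)}  (1-based rank indices)
    K_blocks, p = [], 0
    while 2 ** p <= K:
        K_blocks.append(list(range(2 ** p, min(K, 2 ** (p + 1) - 1) + 1)))
        p += 1
    # hat_Lambda_q = support of the best 2^q-term approximation of pi_y;
    # Lambda_q = hat_Lambda_q \ hat_Lambda_{q-1}
    order = np.argsort(pi_y)[::-1]           # indices by decreasing pi_y
    L_blocks, prev, q = [], 0, 0
    while prev < len(order):
        end = min(2 ** q, len(order))
        L_blocks.append(set(order[prev:end]))
        prev, q = end, q + 1
    return K_blocks, L_blocks

pi_y = np.array([0.5, 3.0, 0.1, 1.2, 0.9, 0.05, 2.2])
K_blocks, L_blocks = dyadic_groups(pi_y, K=5)
```

Both families are disjoint and exhaust the ranks and the parametric support, respectively.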
With this subdivision of $${\mathrm{supp}}_{\rm y}({\bf v}_\eta)$$, we now define $$\label{vpq}{\bf v}_{[p,q]} := {\mathrm{R}_{{\mathcal{S}} \times {\it{\Lambda}}^{({\rm y})}_{q}}} \sum_{k\in K_p} \sigma_k {\bf U}^{({\rm x})}_k \otimes {\bf U}^{({\rm y})}_k = \sum_{k\in K_p} \sigma_k {\bf U}^{({\rm x})}_k\otimes {\mathrm{R}_{{\it{\Lambda}}^{({\rm y})}_{q}}}{\bf U}^{({\rm y})}_k \,$$ (5.14) and obtain $$\label{decoAv}{\mathbf{A}} {\bf v}_\eta = \sum_{p, q \geq 0} \sum_{j=0}^\infty ({\mathbf{A}}_j \otimes {\mathbf{M}}_j) {\bf v}_{[p,q]}= \sum_{p, q \geq 0} \sum_{j=0}^\infty \sum_{k\in K_p} \sigma_k \,\bigl( {\mathbf{A}}_j{\bf U}^{({\rm x})}_k\bigr) \otimes \bigl({\mathbf{M}}_j {\mathrm{R}_{{\it{\Lambda}}^{({\rm y})}_{q}}} {\bf U}^{({\rm y})}_k \bigr)\!.$$ (5.15) (S2) Adaptive operator truncation: To construct an approximation $${\bf w}_\eta$$ of $${\mathbf{A}} {\bf v}_\eta$$ based on this decomposition, we truncate the summations over $$j$$ for each $$p,q$$ at some index $$M_{p,q}\in {\mathbb{N}}$$, to be determined later, and then replace the remaining terms $${\mathbf{A}}_j$$ by compressed versions, again depending on the respective $$p,q$$. 
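Because the $K_p$ partition the ranks and the $\Lambda^{(y)}_q$ partition the parametric support, the pieces (5.14) sum back to $\mathbf{v}_\eta$ exactly, which is what makes (5.15) a decomposition rather than an approximation. A consistency check on random data; the block structure below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n_x, n_y, K = 9, 6, 4
U = np.linalg.qr(rng.standard_normal((n_x, K)))[0]   # U^(x)_k, orthonormal
W = np.linalg.qr(rng.standard_normal((n_y, K)))[0]   # U^(y)_k, orthonormal
sig = np.array([2.0, 1.0, 0.5, 0.25])
V = U @ np.diag(sig) @ W.T                           # v_eta via its SVD

# dyadic rank blocks K_p and column blocks Lambda_q (here in the natural
# column order, standing in for the sorted-contraction increments)
rank_blocks = [[0], [1, 2], [3]]                     # 1-based {1},{2,3},{4}
col_blocks = [[0], [1], [2, 3], [4, 5]]

V_sum = np.zeros_like(V)
for kp in rank_blocks:
    for cq in col_blocks:
        piece = np.zeros_like(V)
        block = (U[:, kp] * sig[kp]) @ W[:, kp].T    # ranks restricted to K_p
        piece[:, cq] = block[:, cq]                  # columns restricted to Lambda_q
        V_sum += piece                               # v_[p,q] as in (5.14)
```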
With $$e_M$$ defined for non-negative integer $$M$$ as in (4.3), for any given choice of $$M_{p,q}$$, we have $$\label{est1}{\left\lVert{{\mathbf{A}} {\bf v} - \sum_{p, q \geq 0} \sum_{j=0}^{M_{p,q}} ({\mathbf{A}}_{j} \otimes {\mathbf{M}}_j) {\bf v}_{[p,q]} }\right\rVert} \leq \sum_{p, q \geq 0 } e_{M_{p,q}} {\lVert{{\bf v}_{[p,q]}}\rVert}.$$ (5.16) We now choose the $$M_{p,q}= M_{p,q}(\eta)$$ such that $$\label{est2}\sum_{p, q \geq 0 } e_{M_{p,q}} {\lVert{{\bf v}_{[p,q]}}\rVert} \leq \frac{\eta}{4}.$$ (5.17) This can be done by choosing positive weights $$\alpha_{p,q}$$ such that $$\sum_{p,q}\alpha_{p,q}=1$$, computing $${\lVert{{\bf v}_{[p,q]}}\rVert}$$ and adjusting the $$M_{p,q}$$ so as to guarantee that $$\label{etapq0}e_{M_{p,q}}\|{\bf v}_{[p,q]}\|\le \eta_{p,q} := \alpha_{p,q}\eta/4.$$ (5.18) We will give an a priori choice for $$M_{p,q}$$ in (5.29) below, but one may as well use, e.g., the Greedy scheme proposed in Gittelson (2013) for selecting these values. (S3) Adaptive application of the spatial components $${\mathbf{A}}_j$$: Next, to realize an approximate application of the (generally) infinite matrices $${\mathbf{A}}_j$$ to $${\bf U}^{({\rm x})}_k$$ in (5.15), we replace $${\mathbf{A}}_j {\bf v}_{[p,q]}$$ by an approximation $$\tilde{\mathbf{A}}_{j,p,q} {\bf v}_{[p,q]}$$ using (2.10) so as to satisfy $$\label{etapq}{\left\lVert{ \sum_{j=0}^{M_{p,q}} ({\mathbf{A}}_j - \tilde{\mathbf{A}}_{j,p,q})\otimes {\mathbf{M}}_j \, {\bf v}_{[p,q]} }\right\rVert} \leq \eta_{p,q}.$$ (5.19) The approximate operators $$\tilde{\mathbf{A}}_{j,p,q}$$ will be specified later. 
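Under the model $e_M \le C M^{-S}$, the a priori choice (5.29) with weights of the form (5.31) guarantees (5.17) by construction, and the arithmetic can be checked directly. In this sketch, $C$, $S$, the decay parameters and the surrogate bounds for $\|\mathbf{v}_{[p,q]}\|$ are illustrative assumptions:

```python
import math

C, S, eta, a = 2.0, 1.5, 1e-2, 2.0
P, Q = 6, 6   # truncate the (p, q) grid for the check

# weights alpha_{p,q} = c ((1+p)(1+q))^{-a}, normalized over the grid
raw = {(p, q): ((1 + p) * (1 + q)) ** (-a) for p in range(P) for q in range(Q)}
c = 1.0 / sum(raw.values())
alpha = {pq: c * w for pq, w in raw.items()}

# surrogate bounds ||v_[p,q]|| <= min(2^{-sbar p}||v||, 2^{-sy q}||pi_y(v)||)
sbar, sy, nv, npi = 1.0, 0.8, 3.0, 2.5
def v_bound(p, q):
    return min(2.0 ** (-sbar * p) * nv, 2.0 ** (-sy * q) * npi)

total = 0.0
for (p, q), apq in alpha.items():
    # a priori choice (5.29): smallest integer M with C M^{-S} v <= apq*eta/4
    M_pq = math.ceil((4 * C * v_bound(p, q) / (apq * eta)) ** (1.0 / S))
    total += C * M_pq ** (-S) * v_bound(p, q)   # e_{M_pq} * ||v_[p,q]||
```

Summing the per-block contributions reproduces the budget $\eta/4$ of (5.17).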
The sought approximation of $${\mathbf{A}}{\bf v}$$ can now be obtained as $$\label{weta assembly}{\bf w}_\eta := \sum_{p,q \geq 0} \sum_{j=0}^{M_{p,q}} (\tilde{\mathbf{A}}_{j,p,q} \otimes {\mathbf{M}}_j) {\bf v}_{[p,q]} ,$$ (5.20) which by the above construction satisfies the computable error bound $$\label{errorbound}{\lVert{{\mathbf{A}} {\bf v}_\eta - {\bf w}_\eta}\rVert} \leq \sum_{p,q \geq 0 } \bigl( e_{M_{p,q}} {\lVert{{\bf v}_{[p,q]}}\rVert} + \eta_{p,q} \bigr) \leq \eta/2 ,$$ (5.21) so that, combined with (5.10), $$\|{\mathbf{A}} {\bf v} - {\bf w}_\eta\|\le \eta$$. In summary, the adaptive application of $${\mathbf{A}}$$ to a given finitely supported $${\bf v}$$ involves the following steps: $${\small{\text{APPLY}}}\!: {\bf v} \to {\bf w}_\eta$$, with $${\bf v}$$ given by its SVD (S1): compute $${\bf v}_\eta \,{:=}\, {\small{\text{COARSEN}}}_{\rm y}({\small{\text{RECOMPRESS}}}({\bf v}; \eta/4{\lVert{{\mathbf{A}}}\rVert}); \eta/4{\lVert{{\mathbf{A}}}\rVert})$$ and (quasi-)sort the entries of $$\pi^{({\rm y})}({\bf v}_\eta)$$ to obtain the sets $${\it{\Lambda}}^{({\rm y})}_{q}$$; (S2): compute the quantities $${\lVert{{\bf v}_{[p,q]}}\rVert}$$ and determine the truncation values $$M_{p,q}= M_{p,q}(\eta)$$; (S3): compute the quantities $$\bigl(\pi^{({\rm x})}_\lambda({\bf v}_{[p,q]})\bigr)_{\lambda\in{\mathcal{S}}}$$ and use these to obtain the compressed matrices $$\tilde{\mathbf{A}}_{j,p,q}$$, using (5.14) in the assembly step (5.20).
5.2 Complexity analysis To quantify the complexity of computing $${\bf w}_\eta$$ in (5.20), we need to specify the properties of the operator $$A(y)$$ as well as the sparsity properties of the input. In view of our preceding discussion, in the scenario of primary interest, the singular values of the solution $${\bf u}$$, as well as the best $$n$$-term approximations of the contractions $$\pi^{(i)}({\bf u})$$, $$i\in \{{\rm x}, {\rm y}\}$$, exhibit algebraic decay rates. As before, these rates are denoted by $$\bar s$$ and $$s_{\rm x}$$, $$s_{\rm y}$$, respectively. As indicated earlier, the complexity of the above scheme depends, in particular, on the operator approximability by truncation. We adhere to the natural assumption that $$e_M \le CM^{-S}$$ for some positive $$S$$, see (4.4). In the subsequent discussion, we assume $$S> s_i$$, $$i\in \{{\rm x},{\rm y}\}$$. As discussed in detail in Section 6.2, this holds true for the expansion model of Assumption 4.2 with $$S = \frac{\alpha}{m}$$. We gather next the properties upon which the complexity analysis will be based.
Assumption 5.2 The solution $${\bf u}$$ to (2.5) and the matrix $${\mathbf{A}}$$ have the following properties: (i) One has $$\pi^{(i)}({\bf u}), \pi^{(i)}({\mathbf{f}}) \in {\mathcal{A}}^{s_i}$$, $$i={\rm x},{\rm y}$$, with $$s_{\rm x}, s_{\rm y} >0$$. (ii) $${\bf u}, {\mathbf{f}} \in {{\it{\Sigma}}^{\bar s}}$$ for some $$\bar s \ge s_{\rm x}, s_{\rm y}$$. (iii) There exists a constant $$C$$ such that $$e_M \le CM^{-S}$$, $$M\in {\mathbb{N}}$$, where $$e_M$$ is defined by (4.3) and \begin{align}\label{Ssi}S\ge \bar s , s_{\rm y}. \end{align} (5.22) (iv) The representations $${\mathbf{A}}_j$$, $$j\in {\mathbb{N}}$$, satisfy the assumptions of Proposition 4.3, where $$\tau$$ satisfies \begin{align}\label{tau}\frac{2\tau}{1+2\tau} \frac {\alpha}{m} = \frac{2\tau}{1+2\tau} S > s_{\rm x}. \end{align} (5.23) Under these conditions, one can construct a routine $${\small{\text{RHS}}}$$ that satisfies, for sufficiently small $$\eta>0$$ and $${\mathbf{f}}_\eta \,{:=}\, {\small{\text{RHS}}}(\eta)$$, \begin{gather*}\#{\mathrm{supp}}_i ({\mathbf{f}}_\eta) \lesssim \eta^{-\frac {1}{s_i}} {\lVert{\pi^{(i)}({\mathbf{f}})}\rVert}_{{\cal A}^{s_i}}^{\frac {1}{s_i}}, \quad {\lVert{\pi^{(i)}({\mathbf{f}}_\eta)}\rVert}_{{\cal A}^{s_i}} \lesssim {\lVert{\pi^{(i)}({\mathbf{f}})}\rVert}_{{\cal A}^{s_i}}, \quad i \in \{{\rm x},{\rm y} \}, \\ {\mathrm{rank}}({\mathbf{f}}_\eta) \lesssim \eta^{-\frac {1}{\bar s}} {\lVert{{\mathbf{f}}}\rVert}_{{{\it{\Sigma}}^{\bar s}}}^{\frac {1}{\bar s}} , \quad {\lVert{{\mathbf{f}}_\eta}\rVert}_{{{\it{\Sigma}}^{\bar s}}} \lesssim {\lVert{{\mathbf{f}}}\rVert}_{{{\it{\Sigma}}^{\bar s}}}. \end{gather*} Assuming full knowledge of $${\mathbf{f}}$$, one can also realize $${\small{\text{RHS}}}$$ using $$\mathcal{O}\Big(\eta^{-\tfrac{1}{\bar s} - \tfrac{1}{\min\{ s_{\rm x}, s_{\rm y} \}}}\Big)$$ operations, and we shall make this idealized assumption in what follows.
As in Section 3, the requirements on $${\small{\text{RHS}}}$$ simplify substantially when the corresponding right-hand side $$f$$ is independent of the parametric variable. The main result of this section states that up to a logarithmic factor the sparsity properties of the input are preserved by the output of $${\small{\text{APPLY}}}$$. Theorem 5.3 Suppose that the properties listed under Assumption 5.2 hold. Then, given any finitely supported input $${\bf v}\in \ell^{2}({\cal S}\times {\cal F})$$, the output $${\bf w}_\eta$$ produced by the procedure $${\small{\text{APPLY}}}$$, based on the steps (S1)–(S3), satisfies $$\label{accuracy}\|{\mathbf{A}} {\bf v} - {\bf w}_\eta\|\le \eta.$$ (5.24) Moreover, with some $$b\le 2+ \frac{4}{s_{\rm x}}$$ one has $$\label{rankbars}{\mathrm{rank}}({\bf w}_\eta)\lesssim \eta^{-\frac{1}{\bar s}}\| {\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}^{\frac{1}{\bar s}} (1+{\lvert{\log\eta}\rvert})^b , \quad\|{\bf w}_\eta\|_{{{\it{\Sigma}}^{\bar s}}}\lesssim \|{\bf v}\|_{{{\it{\Sigma}}^{\bar s}}} (1+{\lvert{\log\eta}\rvert})^{\bar s b },$$ (5.25) and $$\label{rp}\begin{gathered}\#{\mathrm{supp}}_{\rm y}({\bf w}_\eta)\lesssim \eta^{-\frac{1}{s_{\rm y}}} \|\pi^{({\rm y})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm y}}}^{\frac{1}{s_{\rm y}}}(1+{\lvert{\log \eta}\rvert})^{b}, \\ \|\pi^{({\rm y})}({\bf w}_\eta)\|_{{\mathcal{A}}^{s_{\rm y}}} \lesssim \|{\bf v}\|_{{\mathcal{A}}^{s_{\rm y}}} (1+{\lvert{\log \eta}\rvert})^{s_{\rm y} b}, \end{gathered}$$ (5.26) as well as $$\label{sstab}\begin{gathered}\#{\mathrm{supp}}_{\rm x}({\bf w}_\eta)\lesssim \eta^{-\frac {1}{s_{\rm x}}}\|\pi^{({\rm x})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm x}}}^{\frac {1}{s_{\rm x}}} (1+{\lvert{\log \eta}\rvert})^{b}, \\ \|\pi^{({\rm x})}({\bf w}_\eta)\|_{{\mathcal{A}}^{s_{\rm x}}} \lesssim \|\pi^{({\rm x})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm x}}} (1+{\lvert{\log \eta}\rvert})^{s_{\rm x} b}, \end{gathered}$$ (5.27) where the constants depend also on $$s_i$$, on $${\lvert{\log \|\pi^{(i)}({\bf 
v})\|_{{\mathcal{A}}^{s_i}}}\rvert}$$, $$i\in \{{\rm x}, {\rm y}\}$$ and on $$\tau$$ in Assumption 5.2. Proof. The error bound (5.24) is implied by the construction. As for the remaining claims, to assess the complexity of computing $${\bf w}_\eta$$, given by (5.20), we estimate first $$M_{p,q}=M_{p,q}(\eta)$$ in terms of $$\eta$$. To obtain a priori bounds for the $$M_{p,q}$$, we use Assumption 5.2(i) and (ii) to conclude that $$\label{vpqest}{\lVert{{\bf v}_{[p,q]}}\rVert} \leq 2^{-s_{\rm y} q}{\lVert{\pi^{({\rm y})}({\bf v})}\rVert} _{\mathcal{A}^{s_{\rm y}}} , \qquad {\lVert{{\bf v}_{[p,q]}}\rVert} \leq 2^{-\bar s p} {\lVert{ {\bf v} }\rVert}_{{{\it{\Sigma}}^{\bar s}}}\!.$$ (5.28) Then Assumption 5.2(iii) and (5.28) yield the sufficient conditions $$\label{Mpqest}M_{p,q}= M_{p,q}(\eta) \ge \left( \frac{ 4 C \min \{ 2^{-\bar s p} {\lVert{ {\bf v} }\rVert}_{{{\it{\Sigma}}^{\bar s}}} , 2^{-s_{\rm y} q} {\lVert{ \pi^{({\rm y})}({\bf v}) }\rVert}_{\mathcal{A}^{s_{\rm y}}} \}}{ \alpha_{p,q} \, \eta } \right)^{\frac{1}{S}} .$$ (5.29) From (5.20) and the decomposition (5.15), we see that $$\label{rankw}{\mathrm{rank}}({\bf w}_\eta) \leq \sum_{p,q\geq 0} M_{p,q} 2^p , \quad\# {\mathrm{supp}}_{\rm y}({\bf w}_\eta) \leq \sum_{p,q\geq 0 } 3 M_{p,q} 2^q\!.$$ (5.30) Note that the factor of $$3$$ in the bound for $$\# {\mathrm{supp}}_{\rm y}({\bf w}_\eta)$$ results from the bidiagonal form of the matrices $${\mathbf{M}}_j$$; that is, the action of each of these matrices can add at most twice the number of nonzero entries in the preimage sequence, in addition to the existing ones. The following lemma provides bounds for the right-hand sides in (5.30). Lemma 5.4 For any fixed constant $$a>1$$ choose \begin{align}\label{alphapq}\alpha_{p,q}=c\,\big((1+ p)(1+q)\big)^{-a}, \quad c: = \left(\sum_{p,q\geq 0} \big((1+ p)(1+q)\big)^{-a}\right)^{-1}, \end{align} (5.31) as weights in (5.29).
Then for $$S\geq \bar s$$ one has \begin{align}\sum_{p,q}2^p M_{p,q}&\lesssim \eta^{-\frac {1}{S}}\|{\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}^{\frac {1}{S}}\bigl(1+\log_2\#{\mathrm{supp}}_{\rm y}({\bf v})\bigr)^{1+\frac {a}{S}} \notag\\ &\quad\times \bigl(1+\log_2{\mathrm{rank}}({\bf v}_\eta)\bigr)^{1+\frac {a}{S}}\bigl({\mathrm{rank}}({\bf v}_\eta)\bigr)^{1- \frac{\bar s}{S}},\label{simplep}\end{align} (5.32) where the constant depends on $$a,S$$ and $$\bar s$$ on $$c$$ in (5.31) and on $$C$$ in Assumption 5.2(iii). Similarly, for $$S \geq s_{\rm y}$$ one has \begin{align}\sum_{p,q}2^q M_{p,q}&\lesssim \eta^{-\frac {1}{S}} \|\pi^{({\rm y})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm y}}}^{\frac {1}{S}}\bigl(1+ \log_2 {\mathrm{rank}}({\bf v}_\eta)\bigr)^{1+\frac {a}{S}} \notag\\ &\quad\times \bigl(1+ \log_2\#{\mathrm{supp}}_{\rm y}({\bf v}_\eta)\bigr)^{1+\frac {a}{S}} \bigl(\#{\mathrm{supp}}_{\rm y}({\bf v}_\eta)\bigr)^{1- \frac{s_{\rm y}}{S}},\label{simpleq}\end{align} (5.33) with similar dependencies of the constants as before, but with $$\bar s$$ replaced by $$s_{\rm y}$$. Proof. Bounding $$M_{p,q}\lesssim \eta^{-\frac {1}{S}} \|{\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}^{\frac {1}{S}}(1+q)^{\frac {a}{S}}(1+p)^{\frac {a}{S}} 2^{-\frac{\bar s p}{S}}$$, we derive \begin{align}\label{pestimate} \sum_{p,q}2^p M_{p,q}\lesssim \eta^{-\frac {1}{S}}\|{\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}^{\frac {1}{S}}\bigl(1+\log_2 \#{\mathrm{supp}}_{\rm y}({\bf v}_\eta)\bigr)^{1+\frac {a}{S}}\sum_p (1+p)^{\frac {a}{S}}2^{p \left(1-\frac{\bar s}{S} \right)}\!, \end{align} (5.34) which gives (5.32), where the constant depends on $$a,S, \bar s$$ and $$c,C$$ from (5.29). 
To bound $$\sum_{p,q}2^qM_{p,q}$$, we use $$M_{p,q}\lesssim \eta^{-\frac {1}{S}}\|\pi^{({\rm y})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm y}}}^{\frac {1}{S}} (1+p)^{\frac {a}{S}} (1+q)^{\frac {a}{S}}2^{-\frac{s_{\rm y} q}{S}}$$ and obtain \begin{align*}\sum_{p,q}2^q M_{p,q} \lesssim \eta^{-\frac {1}{S}}\|\pi^{({\rm y})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm y}}}^{\frac {1}{S}}\bigl(1+ \log_2{\mathrm{rank}}({\bf v}_\eta)\bigr)^{1+\frac {a}{S}}\sum_q (1+q)^{\frac {a}{S}}2^{q \left(1- \frac{s_{\rm y}}{S} \right)}\!, \end{align*} which yields (5.33). □ We proceed by estimating the various sparsity norms of $${\bf w}_\eta$$. We first address rank growth and parametric sparsity, which are independent of the specific choice of $$\tilde{\mathbf{A}}_{j,p,q}$$. Using (5.30) and (5.32) in Lemma 5.4 together with (5.11) and (5.12), for $$S \geq \bar s$$ we obtain \begin{align}\label{rankfirst}{\mathrm{rank}}({\bf w}_\eta) &\lesssim \eta^{-\frac {1}{S}}(1+{\lvert{\log\eta}\rvert})^{2\left(1+\frac {a}{S}\right)}\|{\bf v}\|^{\frac {1}{S}}_{{{\it{\Sigma}}^{\bar s}}}\,\eta^{-\frac{1}{\bar s}\left(1-\frac{\bar s}{S}\right)}\| {\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}^{\frac{1}{\bar s}\left(1-\frac{\bar s}{S}\right)}\! \nonumber\\ &= \eta^{-\frac{1}{\bar s}}\| {\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}^{\frac{1}{\bar s}}(1+{\lvert{\log\eta}\rvert})^{2\left(1+\frac {a}{S}\right)}\!, \end{align} (5.35) where the constant depends also on $${\lvert{\log \|\pi^{(i)}({\bf v})\|_{{\mathcal{A}}^{s_i}}}\rvert}$$, $$i\in \{{\rm x}, {\rm y}\}$$. Now suppose that $$N_\eta$$ is an upper bound for $${\mathrm{rank}}({\bf w}_\eta)$$. To simplify the exposition, let us assume without loss of generality that $$\eta \in(0,1)$$.
Then, by definition, one has \begin{align*}\|{\bf w}_\eta\|_{{{\it{\Sigma}}^{\bar s}}}& = \sup_{N\le N_\eta}N^{\bar s}\inf_{{\mathrm{rank}}({\bf w})\le N}\|{\bf w}_\eta - {\bf w}\| \le \sup_{B\in [1,\eta^{-1}]}N_{B\eta}^{\bar s}\|{\bf w}_\eta-{\bf w}_{B\eta}\|\\ &\le \sup_{B\in [1,\eta^{-1}]} N_{B\eta}^{\bar s}\big(\|{\bf w}_\eta -{\mathbf{A}} {\bf v}_\eta\|+ \|{\mathbf{A}} {\bf v}_\eta-{\bf w}_{B\eta}\|\big)\le \sup_{B\in [1,\eta^{-1}]} 2B\eta N_{B\eta}^{\bar s}. \end{align*} Now we can invoke for each $$B\in [1,\eta^{-1}]$$ the upper bound for $${\mathrm{rank}}({\bf w}_{B\eta})$$ given by (5.35) and observe that the resulting bound is maximized for $$B=\eta^{-1}$$ when $$S\ge \bar s$$. This gives $$\label{stabbars} \|{\bf w}_\eta\|_{{{\it{\Sigma}}^{\bar s}}}\lesssim \|{\bf v}\|_{{{\it{\Sigma}}^{\bar s}}} (1+{\lvert{\log\eta}\rvert})^{2\bar s\left(1+\frac {a}{S}\right)}\!,$$ (5.36) which confirms (5.25). Similarly, using the second estimate in (5.30) and (5.33) in Lemma 5.4 and invoking (5.12) yields, for $$S \geq s_{\rm y}$$, $$\label{suppp}\#{\mathrm{supp}}_{\rm y}({\bf w}_\eta) \lesssim \eta^{-\frac {1}{S}} \|\pi^{({\rm y})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm y}}}^{\frac {1}{S}} (1+{\lvert{\log_2 \eta}\rvert})^{2+\frac{2 a}{S}}\left(\frac{\|\pi^{({\rm y})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm y}}}}{\eta}\right)^{\frac{1}{s_{\rm y}}\left(1- \frac{s_{\rm y}}{S}\right)}\!.$$ (5.37) By the same argument as before, one obtains $$\label{suppp2}\#{\mathrm{supp}}_{\rm y}({\bf w}_\eta)\lesssim \eta^{-\frac{1}{s_{\rm y}}} \|\pi^{({\rm y})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm y}}}^{\frac{1}{s_{\rm y}}} (1+{\lvert{\log_2 \eta}\rvert})^{2+\frac{2 a}{S}}.$$ (5.38) We can then continue as above, denoting by $$M_\eta$$ an upper bound for $$\#{\mathrm{supp}}_{\rm y}({\bf w}_\eta)$$, to argue \begin{align*}\|\pi^{({\rm y})}({\bf w}_\eta)\|_{{\mathcal{A}}^{s_{\rm y}}} &\le \sup_{B\in [1,\eta^{-1}]} M_{B\eta}^{s_{\rm y}}\big(\|\pi^{({\rm y})}({\bf w}_\eta)-\pi^{({\rm y})}({\mathbf{A}}{\bf
v}_\eta)\| + \|\pi^{({\rm y})}({\bf w}_{B\eta})-\pi^{({\rm y})}({\mathbf{A}}{\bf v}_\eta)\|\big)\\ &\leq \sup_{B\in [1,\eta^{-1}]} M_{B\eta}^{s_{\rm y}}\big(\| {\bf w}_\eta - {\mathbf{A}}{\bf v}\| + \| {\bf w}_{B\eta}- {\mathbf{A}}{\bf v}\|\big)\\ &\le \sup_{B\in [1,\eta^{-1}]} 2B\eta M_{B\eta}^{s_{\rm y}}. \end{align*} Thus, we obtain $$\label{stabsp} \|\pi^{({\rm y})}({\bf w}_\eta)\|_{{\mathcal{A}}^{s_{\rm y}}}\lesssim \|\pi^{({\rm y})}({\bf v})\|_{{\mathcal{A}}^{s_{\rm y}}} (1+{\lvert{\log_2 \eta}\rvert})^{2s_{\rm y}\left(1+\frac{a}{S}\right)}\!,$$ (5.39) which together with (5.38) shows (5.26). We now turn to estimating $$\#{\mathrm{supp}}_{\rm x}({\bf w}_\eta)$$ and $$\|\pi^{({\rm x})}({\bf w}_\eta)\|_{{\mathcal{A}}^{s_{\rm x}}}$$. To this end, we specify suitable compressed matrices $$\tilde{\mathbf{A}}_{j,p,q}$$ in (5.19). Denoting by $$\pi^{({\rm x})}({\bf v}_{[p,q]})_{\ell}$$ the best $$\ell$$-term approximation of $$\pi^{({\rm x})}({\bf v}_{[p,q]})$$, we set $${\it{\Lambda}}_{p,q,0} := {\mathrm{supp}}(\pi^{({\rm x})}({\bf v}_{[p,q]})_{1})$$ and ${\it{\Lambda}}_{p,q,n}:= {\mathrm{supp}}\big(\pi^{({\rm x})}({\bf v}_{[p,q]})_{2^n}\big) \setminus {\mathrm{supp}}\big(\pi^{({\rm x})}({\bf v}_{[p,q]})_{2^{n-1}}\big), \quad n\in{\mathbb{N}}.$ Note that ${\lVert{ {\mathrm{R}_{{\it{\Lambda}}_{p,q,n}\times {\mathcal{F}}}} {\bf v}_{[p,q]} }\rVert}\leq {\lVert{ {\mathrm{R}_{{\it{\Lambda}}_{p,q,n}}} \pi^{({\rm x})}({\bf v}_{[p,q]}) }\rVert} \leq 2^{-s_{\rm x} n} {\lVert{ \pi^{({\rm x})}({\bf v}_{[p,q]})}\rVert}_{{\mathcal{A}}^{s_{\rm x}}}\!.$ To proceed, we employ the following convenient reformulation of Proposition 4.3. Remark 5.5 Let $$M \in {\mathbb{N}}$$ and $$s < \frac{2\tau}{1+2\tau} S$$. 
Then for any $$J\in {\mathbb{N}}$$ we can find $${\mathbf{A}}_j^J$$, $$j\geq 0$$, such that \begin{equation*}{\left\lVert{\sum_{j=0}^M \bigl( {\mathbf{A}}_j - {\mathbf{A}}_j^J \bigr) \otimes {\mathbf{M}}_j}\right\rVert} \leq \beta_J 2^{-s J}, \end{equation*} and the following holds: for each $$\lambda\in{\mathcal{S}}$$, for the sum of the number of corresponding nonzero column entries of the $${\mathbf{A}}_j^J$$ we have the bound $$\label{sparsecompr_sum}\sum_{j = 0}^M \#{\mathrm{supp}} \,\bigl( {\mathbf{A}}_{j,\lambda'\lambda}^J \bigr)_{\lambda'\in{\mathcal{S}}}\leq \alpha_J 2^J.$$ (5.40) Here $$\boldsymbol{\alpha},\boldsymbol{\beta}$$ are positive summable sequences. For a suitable non-negative integer $$N=N_{j,p,q,\eta}$$, let $$\tilde{\mathbf{A}}_{j,p,q} \,{:=}\, \sum_{n = 0}^N {\mathbf{A}}_j^{N - n} {\mathrm{R}_{{\it{\Lambda}}_{p,q,n}}}$$ and $$\label{wpq}{\bf w}_{p,q} := \sum_{j=0}^{M_{p,q}}(\tilde{\mathbf{A}}_{j,p,q} \otimes {\mathbf{M}}_j){\bf v}_{[p,q]}.$$ (5.41) Then \begin{equation*}{\left\lVert{{\bf w}_{p,q} - \sum_{j=0}^{M_{p,q}}\big( {\mathbf{A}}_j \otimes{\mathbf{M}}_j \big){\bf v}_{[p,q]} }\right\rVert} = {\left\lVert{\sum_{j=0}^{M_{p,q}} \sum_{n = 0}^N \big( ({\mathbf{A}}_j^{N-n} - {\mathbf{A}}_{j}){\mathrm{R}_{{\it{\Lambda}}_{p,q,n}}} \otimes{\mathbf{M}}_j\big) {\bf v}_{[p,q]} }\right\rVert} . \end{equation*} Using Remark 5.5 with $$s=s_{\rm x}$$, the right side can be estimated by \begin{align*}&\sum_{n = 0}^N \beta_{N-n} 2^{-s_{\rm x}(N-n)} 2^{-s_{\rm x} n} {\lVert{ \pi^{({\rm x})}({\bf v}_{[p,q]})}\rVert}_{{\mathcal{A}}^{s_{\rm x}}} + 2 {\lVert{{\mathbf{A}}}\rVert} \sum_{n>N} 2^{-s_{\rm x} n} {\lVert{ \pi^{({\rm x})}({\bf v}_{[p,q]})}\rVert}_{{\mathcal{A}}^{s_{\rm x}}} \\ &\quad\lesssim 2^{-s_{\rm x} N} {\lVert{ \pi^{({\rm x})}({\bf v}_{[p,q]})}\rVert}_{{\mathcal{A}}^{s_{\rm x}}} , \end{align*} where the constant depends on $$s_{\rm x}$$, $${\lVert{{\mathbf{A}}}\rVert}$$ and $${\lVert{ \boldsymbol{\beta} }\rVert}_{\ell^1}$$. 
By (5.40), we obtain $$\label{supp pq}\#{\mathrm{supp}}_{\rm x} ({\bf w}_{p,q}) \lesssim \sum_{n = 0}^N 2^n \alpha_{N-n} 2^{N-n} \lesssim 2^N.$$ (5.42) If we now choose the smallest $$N$$ such that (5.19) holds, i.e., $$2^{-s_{\rm x} N} {\lVert{ \pi^{({\rm x})}({\bf v}_{[p,q]})}\rVert}_{{\mathcal{A}}^{s_{\rm x}}} \lesssim \eta_{p,q}$$, we obtain \begin{equation*}\#{\mathrm{supp}}_{\rm x} ({\bf w}_{p,q}) \lesssim \eta_{p,q}^{-\frac {1}{s_{\rm x}}} {\lVert{ \pi^{({\rm x})}({\bf v}_{[p,q]})}\rVert}_{{\mathcal{A}}^{s_{\rm x}}}^{\frac {1}{s_{\rm x}}}\lesssim \eta_{p,q}^{-\frac {1}{s_{\rm x}}}{\lVert{ \pi^{({\rm x})}({\bf v}_{\eta})}\rVert}_{{\mathcal{A}}^{s_{\rm x}}}^{\frac {1}{s_{\rm x}}}\!. \end{equation*} Keeping the definition of $$\eta_{p,q}=\alpha_{p,q}\eta$$ and (5.11), (5.12) in mind, summing over $$p,q$$ gives (5.27) with $$b= 2\big(1+\frac{a}{s_{\rm x}}\big) > 2\big(1+\frac{a}{S}\big)$$, where the bound on $${\lVert{\pi^{({\rm y})}({\bf w}_\eta)}\rVert}_{{\mathcal{A}}^{s_{\rm y}}}$$ follows as in (5.36) and (5.39). Remark 5.6 Note that in Assumption 5.2, we state that $$S\geq \bar s, s_{\rm y}$$ and $$S > s_{\rm x}$$. Although other cases can in principle be considered in the same manner, the convergence rate $$S$$ of the operator truncation then limits the achievable efficiency: if $$S< \bar s$$, for instance, it is easy to see that in general one can only obtain $${\mathrm{rank}}({\bf w}_\eta) = {\mathcal{O}}(\eta^{-1/S})$$. Proposition 5.7 Under the assumptions of Theorem 5.3, let $${\bf v}$$ be given by its SVD with $$r\,{\,{:=}\,}\,{\mathrm{rank}}({\bf v})$$ and $$n_i\,{\,{:=}\,}\,\#{\mathrm{supp}}_i({\bf v})$$ for $$i\in \{{\rm x},{\rm y}\}$$. 
Then for the number of operations $${\mathrm{ops}}({\bf w}_\eta)$$ required to obtain $${\bf w}_\eta$$, one has \begin{align}{\mathrm{ops}}({\bf w}_\eta) &\lesssim (n_{\rm x} + n_{\rm y}) r^2 + \left( ( 1 + {\lvert{\log \eta}\rvert})^{\frac{2a}{s_{\rm x}}} \eta^{-\frac {1}{s_{\rm x}}}{\lVert{ \pi^{({\rm x})}({\bf v})}\rVert}_{{\mathcal{A}}^{s_{\rm x}}}^{\frac {1}{s_{\rm x}}} \right.\notag\\ &\quad\left. + (1+{\lvert{\log \eta}\rvert} )^{\frac {2a}{S}} \eta^{-\frac {1}{s_{\rm y}}} {\lVert{\pi^{({\rm y})}({\bf v})}\rVert}^{\frac {1}{s_{\rm y}}}_{{\mathcal{A}}^{s_{\rm y}}} \right) \eta^{-\frac 1{\bar s}}\|{\bf v}\|_{{{\it{\Sigma}}^{\bar s}}}^{\frac 1{\bar s}}. \end{align} (5.43) For the proof of this proposition, we refer to Appendix B. With these preparations, we obtain the following complexity estimate for Algorithm 2.1. Theorem 5.8 Let Assumption 5.2 hold. Then for any $$\varepsilon>0$$, the approximation $${\bf u}_\varepsilon$$ of the form (LR) produced by Algorithm 2.1 satisfies $$\label{hsrankbound}{\mathrm{rank}}({\bf u}_\varepsilon) \lesssim \varepsilon^{-\frac {1}{\bar s}} {\lVert{{\bf u}}\rVert}_{{{\it{\Sigma}}^{\bar s}}}^{\frac {1}{\bar s}}, \quad {\lVert{{\bf u}_\varepsilon}\rVert}_{{{\it{\Sigma}}^{\bar s}}} \lesssim {\lVert{{\bf u}}\rVert}_{{{\it{\Sigma}}^{\bar s}}}$$ (5.44) and $$\label{hssuppbound}\sum_{i\in\{{\rm x},{\rm y}\}} \#{\mathrm{supp}}_i ({\bf u}_\varepsilon) \lesssim \sum_{i\in\{{\rm x},{\rm y}\}} \varepsilon^{-\frac {1}{s_i}} {\lVert{\pi^{(i)}({\bf u})}\rVert}_{{\mathcal{A}}^{s_i}}^{\frac{1}{s_i}}, \quad{\lVert{\pi^{(i)}({\bf u}_\varepsilon)}\rVert}_{{\mathcal{A}}^{s_i}} \lesssim {\lVert{\pi^{(i)}({\bf u})}\rVert}_{{\mathcal{A}}^{s_i}}\!.$$ (5.45) The number of operations $${\mathrm{ops}}({\bf u}_\varepsilon)$$ required to produce $${\bf u}_\varepsilon$$ then satisfies $$\label{hsopbound}{\mathrm{ops}}({\bf u}_\varepsilon) \lesssim 1+ ( 1 + {\lvert{\log \varepsilon}\rvert})^\zeta \left( \varepsilon^{-\frac {1}{\bar s}} {\lVert{{\bf
u}}\rVert}_{{{\it{\Sigma}}^{\bar s}}}^{\frac {1}{\bar s}} \right)^2 \sum_{i\in\{{\rm x},{\rm y}\}} \varepsilon^{-\frac {1}{s_i}} {\lVert{\pi^{(i)}({\bf u})}\rVert}_{{\mathcal{A}}^{s_i}}^{\frac{1}{s_i}} ,$$ (5.46) where $$\zeta>0$$ depends on $$s_{\rm x}$$, on $$\operatorname{cond}({\mathbf{A}})$$, and on the choice of $$\kappa_1, \beta$$ in Algorithm 2.1. The constants in (5.44), (5.45) and (5.46) may also depend on $$S$$, $$\bar s$$, $$s_{\rm y}$$ and on the further parameters of Algorithm 2.1. Proof. We follow the general strategy of the proofs as in Bachmayr & Dahmen (2015) and in Theorem 3.4, combining the properties of the complexity reduction procedures $${\small{\text{COARSEN}}}$$ and $${\small{\text{RECOMPRESS}}}$$ with the specific adaptive operator application that we have constructed for the present problem. The bounds (5.44) and (5.45) follow from Theorem 5.1 applied to the result of line 11 in Algorithm 2.1. Note that here, the number $$J$$ of inner iterations depends only on $$\operatorname{cond}({\mathbf{A}})$$ (via $$\rho,\omega$$) and on the choice of $$\kappa_1$$ and $$\beta$$. With the complexity estimates for $${\small{\text{APPLY}}}$$ from Theorem 5.3 and Proposition 5.7 at hand, we obtain (5.46). □ Remark 5.9 The present version of Algorithm 2.1 necessarily performs orthogonalizations of the basis vectors in the computed low-rank expansions. Under the ensuing requirement of common supports, Theorem 5.8 shows that the output of Algorithm 2.1 for the format (LR) has optimal representation complexity. Regarding the computational complexity of the algorithm, as can be seen from the proofs of Theorem 5.3 and Proposition 5.7, the numerical cost for the approximate operator application is dominated by the cost of performing orthogonalizations of the input.
In particular, this leads to a quadratic dependence on the approximation ranks, and up to the logarithmic term, (5.46) represents the best possible bound for an algorithm performing such orthogonalizations. The number of subsequent operations required to construct the low-rank representation of the output, however, remains proportional to the respective number of degrees of freedom. 6. Approximability of parametric problems In this section, we consider representative instances of (1.5) to compare the respective properties that determine the efficiency of the variants of our scheme for (ASP), (LR) and (STF). 6.1 Isotropic dependence on finitely many parameters As simple yet instructive examples, we consider problems with $$\bar{a}=1$$ and $$\label{cm}{\theta}_j = b_j \chi_{D_j},$$ (6.1) where $$b_j \in ]0,1[$$ are constants and the subdomains $$D_j$$ of the domain $$D$$ have disjoint closures so that the diffusion coefficient is a strictly positive piecewise constant, $$\label{inclusions}a(y) = \bar{a} + \sum_{j\geq 1} y_j b_j \chi_{D_j}.$$ (6.2) As a first problem of this type with $$D = ]0,1[$$, we consider the following. Example 6.1 Let $$d := \#\mathcal{I} < \infty$$, $$D_j\subset D\,{=}\,] 0, 1[$$ for $$j=1,\ldots,d$$ with pairwise disjoint $$\overline{D_j}$$, and $$b_j =\xi$$ for some $$\xi\,{\in}\,]0,1[$$. For low-rank approximation, we then have the following result for the rank of the Hilbert–Schmidt decomposition (5.2). Proposition 6.2 In Example 6.1, for any $$f\in V'$$, one has $${\mathrm{rank}}({\bf u}) \leq 4 d + 1$$. Proof. This follows by the same arguments as in Bachmayr & Cohen (2017, Example 2.2): the end points of the $$D_j$$ induce a partition of $$]0,1[$$ into at most $$2d+1$$ intervals. For each such interval $$I$$, for any $$F$$ such that $$F''=f$$, we have $$u(y)|_I \in {\mathrm{span}}\{ \chi_{I} ,x\,\chi_{I} , F\,\chi_{I} \}$$. Hence, $$u(y)$$ is contained in a $$y$$-independent space of dimension $$6d+3$$ for all $$y$$. 
In addition, there are $$2d+2$$ continuity conditions, independent of $$y$$, at the interval boundaries, which leaves at most $$4d+1$$ degrees of freedom. □ We observe, on the other hand, that the Legendre expansion for this problem involves infinitely many nonzero coefficients; that is, the solution map $$y\mapsto u(y)$$ is not a polynomial in $$y$$. This can be checked, for example, by considering the Taylor coefficients of $$u$$. For any $$\nu=(\nu_j)_{j\geq 1}\in {\mathcal{F}}$$, the coefficients in the Taylor expansion of $$u$$ are given by $$t_\nu(y) = \frac {1}{\nu!}\partial^\nu u(y), \quad \nu !:=\prod_{j \geq 1}\nu_j !.$$ (6.3) Denoting by $$e^j=(0,\dots,0,1,0,\dots)$$ the $$j$$th Kronecker sequence, differentiating the equation, we find that these coefficients are given by the recursion $$\label{eq:taylor_recursion}t_\nu(y) := -A(y)^{-1}\sum_{j\in{\mathrm{supp}}\nu} A_j t_{\nu - {\it \,{e}}^j}(y), \quad t_0(y) = A(y)^{-1} f=u(y).$$ (6.4) We now consider the Taylor coefficients of order $$n$$ in a given variable $$j$$ at the origin, that is, $$t_{n,j}:=t_{ne^j} (0)=\frac 1 {n!} \partial^{n}_{y_j} u(0).$$ (6.5) As a particular case of (6.4), we have $$\int_D \bar a \nabla t_{n,j}\cdot \nabla v \,{\rm{d}}x =-\int_D \theta_j \nabla t_{n-1,j} \cdot\nabla v \,{\rm{d}}x .$$ (6.6) Since $$t_{0,j}=u(0)$$ is not trivial, there is at least one variable $$j$$ such that $$t_{1,j}$$ does not vanish on $$D_j$$. Then, taking $$v=t_{n-1,j}$$ in the above recursion shows by contradiction that $$t_{n,j}$$ does not vanish on $$D_j$$, for all values of $$n\geq 0$$. Thus $$y\mapsto u(y)$$ cannot be a polynomial. Low-rank approximations thus give substantially faster convergence than Legendre expansions in this case. Similar results showing substantial advantages of best low-rank approximations have also been obtained for spatially two-dimensional examples of analogous structure in Bachmayr & Cohen (2017).
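The rank bound of Proposition 6.2 is easy to verify numerically. The following sketch (an illustration, not part of the analysis) discretizes $$-(a u')'=1$$ on $$]0,1[$$ by a conservative finite-difference scheme whose nodal values are exact when every interface of the piecewise constant coefficient coincides with a grid node, samples random parameters $$y$$ and checks that the resulting snapshot matrix has numerical rank at most $$4d+1$$; the grid size, sample count and the choice $$\xi=0.9$$ are arbitrary illustrative choices:

```python
import numpy as np

def solve_fd(a_cells, h):
    """Conservative FD scheme for -(a u')' = 1 on ]0,1[, u(0)=u(1)=0.
    Nodal values are exact when a is constant on every grid cell."""
    n = len(a_cells)                      # number of cells; unknowns at nodes 1..n-1
    main = (a_cells[:-1] + a_cells[1:]) / h**2
    off = -a_cells[1:-1] / h**2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.solve(A, np.ones(n - 1))

d = 3                                     # number of inclusions
m = 2 * d + 1                             # interfaces at multiples of 1/m
n = 10 * m                                # cells; all interfaces are grid nodes
h = 1.0 / n
x_mid = (np.arange(n) + 0.5) * h
# inclusions D_j = ](2j-1)/m, 2j/m[, with pairwise disjoint closures
chi = [((2*j - 1) / m < x_mid) & (x_mid < 2*j / m) for j in range(1, d + 1)]

rng = np.random.default_rng(0)
xi = 0.9                                  # b_j = xi, keeps a(y) >= 1 - xi > 0
snaps = []
for _ in range(30):                       # random parameter samples y in [-1,1]^d
    y = rng.uniform(-1.0, 1.0, d)
    a = np.ones(n)
    for j in range(d):
        a[chi[j]] += xi * y[j]
    snaps.append(solve_fd(a, h))

S = np.linalg.svd(np.column_stack(snaps), compute_uv=False)
num_rank = int((S > 1e-10 * S[0]).sum())
print(num_rank <= 4 * d + 1)              # → True
```

The trailing singular values of the snapshot matrix drop to roundoff level, reflecting that all sampled solutions lie in a fixed space of dimension at most $$4d+1$$.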
The test problems considered there are of the form (6.2) as well, with the coefficients piecewise constant on $$D\,{:=}\,]0,1[^2$$ and where the $$D_j$$, $$j=1,\ldots,d$$, form a partition of $$D$$ into congruent square subdomains. The resulting ‘checkerboard’ geometry is illustrated for $$d = 16$$ in Fig. 1. Fig. 1. Example geometry of piecewise constant coefficients $$a$$ with $$d=16$$ and decay of Legendre coefficients of corresponding $$u$$. The low-rank approximability of such problems with respect to space–parameter separation has been studied in Bachmayr & Cohen (2017). For the case $$d=4$$ (i.e., a $$2\times 2$$-checkerboard), it is shown in Bachmayr & Cohen (2017) that for each $$n\in {\mathbb{N}}$$ one can find $$u^{\rm x}_k$$, $$u^{\rm y}_k$$ for $$k=1,\ldots,n$$ such that for some $$c>0$$, ${\left\lVert{ u - \sum_{k=1}^n u^{\rm x}_k \otimes u^{\rm y}_k}\right\rVert}_{L^2({Y},V)} \lesssim {\it \,{e}}^{- c n}.$ Numerical tests indicate that an analogous estimate can be achieved also for geometries of the type shown in Fig. 1 with $$d=9,16,25,\ldots$$, where $$c$$ has a moderate dependence on $$d$$. Note also that for a hierarchical tensor representation, the ranks of further matricizations enter as well. We are not aware of any bounds for these additional ranks. The numerically observed decay of the corresponding singular values for different values of $$d$$ (using a linear dimension tree) is shown in Fig. 2. Note that the singular values of the matricization $$T^{\{{\rm x}\}}_{\bf u}$$ are precisely those in the decomposition (5.2) underlying (LR). Fig. 2. Hierarchical singular values of $${\bf u}$$, where $$D$$ has $$\sqrt{d}\times\sqrt{d}$$-checkerboard geometry as in Fig. 1.
Solid lines: singular values of matricizations $$T^{(\{i\})}_\mathbf{u}$$ associated with $$i\in\hat {\mathcal{I}}$$ (essentially identical for $$i\in{\mathcal{I}}$$, with the line of slower decay corresponding to $$i={\rm x}$$) and dashed lines: singular values of further matricizations in the hierarchical representation. The horizontal axes show the numbers of the decreasingly ordered singular values. Remark 6.3 As we have noted for the spatially one-dimensional case in Example 6.1, the separation between spatial and parametric variables always yields finite ranks for that case, growing linearly in the number of parameters $$d$$. Note, however, that the approximation ranks corresponding to further separations among the parametric variables may then still not be uniformly bounded; see e.g., (Khoromskij & Schwab, 2011, Proposition 2.5) for an analysis of a simple example. In all examples considered above, we observe exponential-type decay of singular values. In particular, the numerical results in Fig. 2 indicate that Assumptions 3.2 and 3.3 are met for this family of problems. In fact, the obtained hierarchical singular values are consistent with the decay $$\exp({-\bar c n^{1/\bar b}})$$ with some $$\bar b > 1$$ independent of $$d$$ and $$\bar c >0$$ algebraic in $$d$$.
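A decay law of this stretched-exponential form can be read off from computed singular values by linearizing $$\log(-\log\sigma_n)$$ against $$\log n$$. A small synthetic sketch (the sequence and the constants $$\bar c=2$$, $$\bar b=1.5$$ are chosen purely for illustration):

```python
import numpy as np

n = np.arange(1, 201)
sigma = np.exp(-2.0 * n ** (1 / 1.5))   # synthetic sigma_n = exp(-c n^(1/b))

# linearization: log(-log sigma_n) = log(c) + (1/b) * log(n)
slope, _ = np.polyfit(np.log(n), np.log(-np.log(sigma)), 1)
print(round(1 / slope, 6))              # → 1.5, the exponent b
```

For measured data the fit is applied to the tail of the sequence, where the asymptotic regime has set in.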
The compressibility of any desired order of the operators $${\mathbf{A}}_j$$ is known from classical wavelet theory. In contrast, the decay of Legendre coefficients $${\lVert{u_\nu}\rVert} \sim \pi^{({\rm y})}_\nu({\bf u})$$ is significantly slower, where, as in Fig. 1, one only observes a decay of the form $$\exp({-c n^{1/d}})$$ for the $$n$$th largest $${\lVert{u_\nu}\rVert}$$, as predicted by the available estimates (see, e.g., Bachmayr & Cohen, 2017). This indicates a clear advantage of hierarchical tensor approximations of the form (STF) over sparse polynomial expansions (ASP) for such problems. 6.2 Anisotropic dependence on infinitely many parameters We next consider a problem of the form (6.2) with countably many parameters of decreasing influence, where our conclusions are quite different from those concerning Example 6.1. Example 6.4 Let $$\mathcal{I} = {\mathbb{N}}$$, and let $$D_j\,{\subset}\,]0,1[$$ be disjoint with $${\lvert{D_j}\rvert}>0$$ for all $$j$$. In addition, let $$(b_j)_{j\geq 1} \in \ell^{q}({\mathbb{N}})$$ for some $$q>0$$. As an immediate consequence of the results in Bachmayr et al. (2017, Section 4.1), one has the following: Proposition 6.5 In Example 6.4, for all right-hand sides $$f\,{\in}\,V'$$, one has $$({\lVert{u_\nu}\rVert}_V)_{\nu \in {\mathcal{F}}}\,{\in}\,\ell^{p}({\mathcal{F}})$$ for $$p = \frac{2q}{2+q}$$. If $$(b_j)\notin \ell^{q'}({\mathbb{N}})$$ for any $$0<q'<q$$, then there exists $$f\in V'$$ such that $$({\lVert{u_\nu}\rVert}_V)_{\nu \in {\mathcal{F}}} \notin \ell^{p'}({\mathcal{F}})$$ for $$0\,{<}\,p'\,{<}\,p$$. If $$\sigma_n$$ are the singular values of $$u$$, then for the decreasing rearrangement $$(u^*_n)_{n\geq 1}$$ of $$({\lVert{u_\nu}\rVert}_V)_{\nu\in{\mathcal{F}}}$$ we clearly have $$u^*_n \geq \sigma_n$$. As the following new result shows by similar arguments as in Bachmayr et al.
(2017, Section 4.1), the singular values do not necessarily have faster asymptotic decay in this situation than the ordered norms of the Legendre coefficients. Proposition 6.6 In Example 6.4, if $$(b_j)\notin \ell^{q'}({\mathbb{N}})$$ for any $$0<q'<q$$ then there exists an $$f \in V'$$ such that the singular values of $$u$$ are not in $$\ell^{p'}({\mathbb{N}})$$ for $$0<p' < p = \frac{2q}{2+q}$$. Proof. We first observe that the singular values of $$u = \sum_{\nu\in{\mathcal{F}}} u_\nu\otimes L_\nu$$ are bounded from below by those of $$\tilde u =\sum_{j\geq 1} u_{e^j} \otimes L_{e^j}$$, with $$e^j$$ denoting the $$j$$th Kronecker sequence. This follows from the fact that $$\tilde u = ({\rm I} \otimes \tilde P) u$$, where $$\tilde P$$ is the projector onto $$\overline{\operatorname{span}}\{L_{e^j}\}_{j\geq 1}$$. For $$u_{e^j}$$, one has by Rodrigues’ formula the explicit representation $$\label{rodrigues2}u_{e^j} = \frac{\sqrt3}{2} \int_Y t_{e^j}(y)\, (1 - y_j^2) \,{\rm{d}}\mu(y)$$ (6.7) in terms of the first-order derivatives $$t_{e^j}(y)=\partial_{y_j} u(y)$$. Let $$h_j$$ be the symmetric hat functions with support $$D_j$$. 
We now choose $f= -\sum_{j\geq 1} c_j h_j'',$ where $$\sum_{j\geq 1} c_j^2 / {\lvert{D_j}\rvert} < \infty$$, which yields $$f\in V'$$ and $t_0(y) = \sum_{j\geq 1} (1 + b_j y_j)^{-1} c_j h_j.$ By (6.4), $$t_{e^j}(y) = -(1 + b_j y_j)^{-2} b_j c_j h_j$$ and as a consequence of (6.7), $u_{e^j} = -M_j b_j c_j h_j, \qquad M_j := \frac{\sqrt3}{2} \int_{-1}^1 \frac{1-y^2}{(1+b_j y)^2}\frac{{\rm \,{d}}y}{2} \geq \frac {1}{4\sqrt{3}(1-\max_j b_j)} =: M_0.$ We thus obtain $$\langle u_{e^i}, u_{e^j}\rangle_V = 0$$ for $$i\neq j$$ as well as ${\lVert{u_{e^j}}\rVert}_V \geq M_0 b_j c_j {\lVert{h_j}\rVert}_V = \frac{2 M_0 b_j c_j}{\sqrt{{\lvert{D_j}\rvert}}}.$ Since $$(b_j)$$ is precisely in $$\ell^{q}({\mathbb{N}})$$, by choosing $$c_j = b_j^{q/2} \sqrt{{\lvert{D_j}\rvert}}$$, which guarantees in particular that $$(c_j/\sqrt{{\lvert{D_j}\rvert}})_{j\geq 1} \in \ell^{2}({\mathbb{N}})$$ as required, we arrive at the statement. □ The above result shows that from an asymptotic point of view, in Example 6.4, there is not necessarily any gain by low-rank approximation: there always exist right-hand sides $$f$$ such that the singular values have precisely the same asymptotic decay as the ordered norms of Legendre coefficients. Numerical tests as in Example 6.10 indicate that this also holds true for problems with different types of parametrization and more general $$f$$. Remark 6.7 The conclusion of Proposition 6.6 reveals that in the case of Example 6.4 and if $$(b_j)_{j\geq 1}\notin\ell^{q'}$$ for all $$0<q'<q$$, then any separable approximation of the form (1.12) satisfies $$\|u-u_n\|_{L^2(Y,V)}\geq c_r n^{-r}, \quad n\geq 1,$$ (6.8) for some $$c_r>0$$, whenever $$r>\frac{1}{q}$$. 
In turn, we also have $$\|u-u_n\|_{L^\infty(Y,V)} \geq c_r n^{-r}, \quad n\geq 1.$$ (6.9) This implies that the Kolmogorov $$n$$-width $$d_n({\cal M})_V=\inf_{\dim(E)\leq n} \max_{v\in {\cal M}}\operatorname{dist}(v,E)_V,$$ (6.10) of the solution manifold $${\cal M}:=\{u(y)\; : \; y\in Y\}$$ satisfies a similar lower bound $$d_n({\cal M})_V\geq c_r n^{-r}, \quad n\geq 1.$$ (6.11) Although upper bounds for $$d_n({\cal M})_V$$ in parametric PDEs are typically proved by exhibiting a particular separable approximation and studying its convergence in $$L^\infty(Y,V)$$ (see Cohen & DeVore, 2015; Bachmayr & Cohen, 2017), lower bounds are generally out of reach and the ones given above constitute a notable exception. Remark 6.8 One arrives at analogous observations in similar higher-dimensional settings. The construction of Example 6.4 immediately carries over to spatial domains with $$m>1$$ when the definition of $$f$$ is based on higher-dimensional hat functions. We next summarize the available knowledge on approximability of $${\bf u}$$ by representations (ASP) and (LR). Proposition 6.9 Let Assumption 4.2 hold with $$\alpha \notin{\mathbb{N}}$$ and let $$\xi_\mu \in C^{\kappa}(D)$$, $$\mu\in {\it{\Lambda}}$$, for a $$\kappa>\alpha$$. Then, the following holds: (i) $$\pi^{({\rm y})}({\bf u}) \in {\mathcal{A}}^s({\mathcal{F}})$$ for any $$s < s^*_{\rm y} := \frac{\alpha}{m}$$. (ii) For sufficiently regular $$f$$ and $$D$$, and sufficiently regular wavelets $$\psi_\lambda$$, $\pi^{({\rm x})}({\bf u}) \in {\mathcal{A}}^s({\mathcal{S}})\quad\text{for any}\quad s < s^*_{\rm x} \,{:=}\, \frac{\alpha}{m}.$ (iii) $${\bf u} \in {\it{\Sigma}}^{\bar s}$$ for an $$\bar s \geq \max\{s_{\rm x}, s_{\rm y}\}$$. (iv) If $$0<\alpha \leq 1$$, then $${\bf u} \in {\mathcal{A}}^s({\mathcal{S}}\times{\mathcal{F}})$$ for any $$s < \frac{2}{3}\alpha$$ when $$m=1$$ and for any $$s < \frac{\alpha}{m}$$ when $$m=2,3$$. Proof. Statement (i) follows directly from Bachmayr et al. 
(2017, Corollary 4.2), since $$\pi^{({\rm y})}({\bf u}) \sim {\lVert{u_\nu}\rVert}_V$$; (iii) follows immediately from properties of the Hilbert–Schmidt decomposition, and (iv) is shown in Bachmayr et al. (2017, Section 8). To see (ii), note first that, by regularity of the $$\xi_\mu$$ and since $$\alpha \notin {\mathbb{N}}$$, we have $$a(y) \in B^\alpha_{\infty,\infty}(D)= C^\alpha(D)$$ for any $$y\in{Y}$$ and $$\sup_{y\in{Y}} {\lVert{a(y)}\rVert}_{C^\alpha} < \infty$$. Let $$0 < s < \frac{\alpha}{m}$$, and let $$\psi_\lambda$$ be sufficiently smooth to form a Riesz basis of $$H^{1+s}(D)$$. Then, $\sum_{\lambda \in {\mathcal{S}}} 2^{2s {\lvert{\lambda}\rvert}} {\lvert{\pi^{({\rm x})}_\lambda({\bf u})}\rvert}^2 = \sum_{\lambda\in{\mathcal{S}}}\sum_{\nu\in{\mathcal{F}}} 2^{2s{\lvert{\lambda}\rvert}} {\lvert{{\bf u}_{\lambda\nu}}\rvert}^2 \sim \int_{Y}{\lVert{u(y)}\rVert}_{H^{1+s}(D)}^2\,{\rm{d}}\mu(y).$ By Hackbusch (1992, Theorem 9.1.16), using regularity of $$f$$ and $$D$$, we have $${\lVert{u(y)}\rVert}_{H^{1+s}} \lesssim {\lVert{\,f}\rVert}_{H^{-1+s}}$$ uniformly in $$y$$ for any $$s < \alpha/m$$. Here uniformity in $$y$$ can be seen by inspection of the proof (see Charrier et al., 2013; Teckentrup et al., 2013). □ Let us next illustrate the above estimates by a numerical test for $$m=1$$ that confirms that in general, this is indeed the best that one can expect. Example 6.10 We consider $$m=1$$ with $$D\,{=}\,]0,1[$$, $$\bar a = 1$$, $$f=1$$ and $\theta_j(x) = c_\alpha 2^{-\alpha\ell} h(2^\ell x - k), \quad j = 2^\ell + k$ for $$\ell\geq 0$$ and $$k = 0,\ldots,2^\ell-1$$, where $$h(x) = (1 - {\lvert{2x-1}\rvert})_+$$ and $$c_\alpha$$ is chosen so as to ensure uniform ellipticity. In other words, the parameter is expanded in a Schauder hat function basis. As the spatial wavelet basis $$\psi_\lambda$$, we use piecewise polynomial multiwavelets (Donovan et al., 1999).
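The levelwise structure of this expansion makes uniform ellipticity easy to enforce: the hats within one level have disjoint interiors, so $$\sup_x\sum_j|\theta_j(x)|\leq c_\alpha\sum_{\ell\geq 0}2^{-\alpha\ell}$$. The following sketch checks this numerically; the particular normalization $$c_\alpha = 0.9\,(1-2^{-\alpha})$$ is an illustrative assumption that keeps the sum below $$0.9<\bar a=1$$:

```python
import numpy as np

def hat(t):
    """Reference hat h(t) = (1 - |2t - 1|)_+ supported on [0, 1]."""
    return np.maximum(1.0 - np.abs(2.0 * t - 1.0), 0.0)

def theta(j, x, alpha, c_alpha):
    """Schauder basis function theta_j, indexed as j = 2**ell + k."""
    ell = int(np.log2(j))
    k = j - 2**ell
    return c_alpha * 2.0**(-alpha * ell) * hat(2.0**ell * x - k)

alpha = 0.5
c_alpha = 0.9 * (1.0 - 2.0**(-alpha))     # assumed normalization for this sketch
x = np.linspace(0.0, 1.0, 2001)
L = 12                                    # levels 0..L-1, i.e. 1 <= j < 2**L
total = sum(theta(j, x, alpha, c_alpha) for j in range(1, 2**L))
print(total.max() < 0.9)                  # → True: a(y) >= 1 - 0.9 > 0 for all y
```

Since each level contributes at most $$c_\alpha 2^{-\alpha\ell}$$ at any point, the truncated sum stays below the geometric-series bound regardless of the truncation level $$L$$.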
Figure 3 shows the resulting observed decay of the decreasing rearrangements of $${\lvert{{\bf u}_{\lambda,\nu}}\rvert}$$, $$\pi^{({\rm x})}_\lambda({\bf u})$$, $$\pi^{({\rm y})}_\nu({\bf u})$$, and of the singular values $$\sigma_k(u)$$ as in (1.13) (which satisfy $$\sigma_k(u)\sim\sigma_k({\bf u})$$, where $$\sigma_k({\bf u})$$ are the singular values of $${\bf u}$$ as in (5.2)). Here, we focus on $$0<\alpha \leq 1$$. By Proposition 6.9, we expect $${\lvert{{\bf u}_{\lambda,\nu}}\rvert}$$ in Fig. 3(a) to decay at approximately the rate $$\frac{2}{3}\alpha + \frac {1}{2}$$, and the values $$\pi^{({\rm x})}_\lambda({\bf u})$$, $$\pi^{({\rm y})}_\nu({\bf u})$$ and $$\sigma_k(u)$$ in Fig. 3(b–d) to decay at the rate $$\alpha + \frac {1}{2}$$. Fig. 3. Observed decay rates for $$\alpha = 1$$ (black) and $$\alpha=\frac {1}{2}$$ (grey) in Example 6.10. The dashed lines show the respective decay rates expected according to Proposition 6.9. The numerical tests confirm in particular that the result $$\label{spstar}s^*_{\rm x} = s^*_{\rm y} = \frac{\alpha}{m},$$ (6.12) obtained for sufficiently regular problem data, is in general sharp. We also see that one cannot expect $${\bf u} \in {\it{\Sigma}}^{\bar s}$$ for any $$\bar s$$ significantly larger than $$\frac{\alpha}{m}$$, and the results indeed suggest the conjecture, similar to what we have shown in a more restricted setting in Proposition 6.6, that $$(\sigma_k)_{k\in{\mathbb{N}}} \notin {\it{\Sigma}}^{s}$$ for any $$s> \max\{s^*_{\rm x}, s^*_{\rm y}\}$$. Moreover, the results in Fig. 3(a) also demonstrate that in the present case with $$m=1$$, one indeed only obtains $${\bf u}\in {\cal A}^s({\mathcal{S}}\times{\mathcal{F}})$$ with $$s\approx \frac{2}{3}\alpha$$.
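Algebraic rates like those reported in Fig. 3 are typically extracted as slopes of a log-log fit; a minimal sketch on a synthetic sequence with known rate $$\alpha+\frac12=1.5$$ (the sequence is illustrative, not data from the example):

```python
import numpy as np

def decay_rate(seq):
    """Least-squares slope of log(seq_n) versus log(n), i.e. the
    algebraic decay exponent r in seq_n ~ C * n**(-r)."""
    n = np.arange(1, len(seq) + 1)
    slope, _ = np.polyfit(np.log(n), np.log(seq), 1)
    return -slope

sigma = np.arange(1.0, 1001.0) ** -1.5    # synthetic rearrangement, rate 1.5
print(round(decay_rate(sigma), 6))        # → 1.5
```

On measured coefficient sequences the fit would be restricted to the asymptotic tail, since preasymptotic effects dominate the first entries.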
In other words, the statement in Proposition 6.9(iv), shown in Bachmayr et al. (2017), appears to be sharp also for $$m=1$$. This is a surprising difference from the corresponding results for $$m=2,3$$ with $$s$$ up to $$\frac{\alpha}{m}$$, which are necessarily sharp. In contrast to parametric expansions with globally supported $$\theta_j$$ as considered in Section 4.2, expansions with $${\theta}_j$$ of multilevel type as in Assumption 4.2 lead to dimension truncation errors $$e_M$$ with decay $$e_M\lesssim M^{-\alpha/m}$$, which matches the approximability properties of $${\bf u}$$. The residual approximation based on $${\small{\text{APPLY}}}$$ also requires compressibility of the matrices $${\mathbf{A}}_j$$. Under Assumptions 4.2 and A.1 with sufficiently large $$\gamma$$, the resulting $$s^*$$-compressibility of $${\mathbf{A}}$$ comes arbitrarily close to $$s^*=\frac{\alpha}{m}$$. Consequently, the complexity of the resulting schemes (both in Theorems 4.1 and 5.8) can come arbitrarily close to the optimal rates determined by the approximability of $${\bf u}$$. This is again in contrast to the corresponding results for globally supported $$\theta_j$$, as noted in Section 4.2. Finally, we may compare the conclusions of Theorems 4.1 and 5.8 for the approximations (ASP) and (LR), respectively, based on Proposition 6.9 and Example 6.10. We first consider the implications of the results for $$m=1$$ in Fig. 3, assuming $$s^*$$ sufficiently close to its limiting value in each case. The scheme for (ASP), by Theorem 4.1, with $$n_\text{op}$$ operations then converges as $${\mathcal{O}}(n_\text{op}^{-s})$$ for any $$s< 2\alpha/3$$. By Theorem 5.8, the scheme for (LR) converges as $${\mathcal{O}}(n_\text{op}^{-s})$$ for any $$s < \alpha/3$$. Thus, in this setting, the sparse Legendre expansion (ASP) turns out to be clearly more efficient. By Proposition 6.9, one arrives at the same conclusion for $$m>1$$.
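To make the gap concrete: with convergence $${\mathcal{O}}(n_\text{op}^{-s})$$, reducing the error by a factor of $$10$$ multiplies the work by $$10^{1/s}$$. For $$\alpha=1$$, and assuming the limiting rates $$s=2\alpha/3$$ for (ASP) and $$s=\alpha/3$$ for (LR) are attained, this back-of-the-envelope comparison gives:

```python
# work multiplier for a 10x error reduction at convergence rate s: 10**(1/s)
for name, s in [("(ASP), s = 2/3", 2 / 3), ("(LR),  s = 1/3", 1 / 3)]:
    print(name, "->", round(10 ** (1 / s), 1))
# (ASP) needs about 31.6x the work, (LR) about 1000x
```
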
Note that the bound (5.46) for the computation of (LR) is essentially the best that can be expected for any such method that requires orthogonalization of the computed low-rank basis vectors in each step. Besides leading to complexity scaling quadratically in the ranks, this requirement also enforces uniform supports for each set of basis vectors, whereas no such restrictions play any role in the computation of (ASP).

7. Summary and conclusions

In this work, we have studied the approximation of the solution map $$Y \ni y \mapsto u(y) \in V$$ in $$L^2({Y},V)$$ for parametric diffusion problems, where the parameter domain $$Y$$ is of high or infinite dimensionality. We have considered approximations based on sparse expansions in terms of tensor product Legendre polynomials in $$y$$, low-rank approximations based on separation of spatial and parametric variables and higher-order tensor decompositions using further hierarchical low-rank approximation among the parametric variables. The central aim is to investigate the performance of adaptive algorithms for each type of approximation that require as input only information on the parametric operator and right-hand side and that produce rigorous and computable a posteriori error bounds. These goals are achieved, in a unified manner for all considered types of approximations, by Algorithm 2.1. Such algorithms are necessarily based on the approximate evaluation of residuals. They are also intrusive in that they do not treat the underlying parametrized problem as a black box; however, we are not aware of any nonintrusive method with comparable properties. Although the resulting schemes do not use a priori information on the convergence of the respective approximations of the solution map, they still produce approximations of near-optimal complexity (e.g., with respect to the number of terms or tensor ranks).
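The quadratic-in-rank cost of the orthogonalizations mentioned above can be made concrete by a simple operation count for modified Gram–Schmidt; this sketch is illustrative only (it is not the paper's algorithm, and the flop constants are rough conventions):

```python
# Illustrative flop count: orthogonalizing n basis vectors of length N by
# modified Gram-Schmidt costs O(N n^2) operations, i.e., quadratic in the
# rank n.  The constants 4 and 3 below are rough per-vector conventions.
def mgs_flops(N, n):
    flops = 0
    for j in range(n):
        for i in range(j):
            flops += 4 * N     # inner product and update against vector i
        flops += 3 * N         # norm computation and scaling of vector j
    return flops

ratio = mgs_flops(1000, 10) / mgs_flops(1000, 5)
print(ratio)   # about 3.8: doubling the rank roughly quadruples the cost
```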
The question of also guaranteeing a near-optimal operation count for constructing these approximations is more delicate: this computational complexity depends on the costs of approximating the residual, and thus on the approximability properties of the operator. In the case of low-rank approximations, due to the required orthogonalizations, the number of operations also scales at least quadratically with respect to the arising tensor ranks. Especially keeping the latter point in mind, there is no single type of approximation that is most favourable in all of the representative model scenarios that we have considered. In the case of finitely many parameters of comparable influence, hierarchical tensor representations of $$u$$ turn out to be advantageous: we can show near-optimal computational complexity on certain natural approximability classes (as in Assumptions 3.2 and 3.3) for the adaptive scheme based on the method in Bachmayr & Dahmen (2015). The situation turns out to be different in the case of infinitely many parameters of decreasing influence. We have proven in Section 6, for a certain class of such problems, that the norms of Legendre coefficients of $$u$$ have the same asymptotic decay as the singular values in its Hilbert–Schmidt decomposition. In other words, the ranks in a corresponding low-rank approximation need to increase at the same rate as the number of terms in a sparse Legendre expansion as accuracy is increased. The numerical tests given in Fig. 3 indicate that this holds true also for substantially more general problems. As a consequence, even with the careful residual evaluation given in Section 5, which can preserve near-optimal ranks, the nonlinear scaling with respect to the ranks means that the computational complexity of finding low-rank approximations scales worse than that of a direct sparse expansion as considered in Section 4.
This conclusion remains true also for hierarchical tensor decompositions involving the same separation between spatial and parametric variables. For both schemes in Sections 4 and 5, we have seen that whether the residual can be evaluated at a cost that matches the approximability of the solution depends on the type of parameter dependence in the diffusion coefficient. As the simple example given in Section 4.2 shows, in the case of diffusion coefficients expanded in terms of increasingly oscillatory functions of global support, due to insufficient operator compressibility, the complexity of the methods is in general worse than the approximability of the solution would allow. However, in the case of diffusion coefficients whose parametrization has a multilevel structure, we have demonstrated that one can come arbitrarily close to fully exploiting the approximability of $$u$$.

Funding

ERC AdG 338977 BREAD; Institut Universitaire de France; DFG SFB-Transregio 40; DFG Research Group 1779; the Excellence Initiative of the German Federal and State Governments (RWTH Aachen Distinguished Professorship); DARPA-BAA-15-13; and the Hausdorff Center of Mathematics, University of Bonn.

Appendix A. Compressibility of parametric operators

The approximate application of the operator $${\mathbf{A}}$$ in Algorithm 2.1 must involve, in particular, an approximate application of the spatial components $${\mathbf{A}}_j$$. Except for special cases, the infinite matrices $${\mathbf{A}}_j$$ are not sparse, but contain infinitely many nonzero entries in each column. Their approximation hinges on the compressibility of these operator representations as in Proposition 4.3.
These are closely related to $$s^*$$-compressibility of $${\mathbf{A}}_j$$ as in (2.10), which here means that there exist matrices $${\mathbf{A}}_{j,n}$$ with $$\alpha_{j,n} 2^n$$ entries per row and column such that $$\label{compressAj}{\lVert{{\mathbf{A}}_j - {\mathbf{A}}_{j,n}}\rVert} \leq \beta_{j,n} 2^{-s n}, \quad 0 < s < s^*,$$ (A.1) where $${\boldsymbol{\alpha}}_j, {\boldsymbol{\beta}}_j \in \ell^{1}({\mathbb{N}}_0)$$. This is known to hold for each fixed $$j$$ when employing a piecewise polynomial wavelet-type Riesz basis $$\{\psi_\lambda\}_{\lambda \in \mathcal{S}}$$ for $$V$$ (see, e.g., Cohen et al., 2001; Stevenson, 2004). However, when insisting on the same compressibility bound $$s^*$$ for all $${\mathbf{A}}_j$$, the quantities $$\|{\boldsymbol{\alpha}}_j\|_{\ell^{1}}$$ and $$\|{\boldsymbol{\beta}}_j\|_{\ell^{1}}$$ cannot, in general, both be expected to remain uniformly bounded in $$j$$ when the $${\theta}_j$$ become increasingly oscillatory. We consider operators $${\mathbf{A}}_j$$ arising from multilevel representations of the parameter of the form in Assumption 4.2. In the spatial variable, we use a wavelet Riesz basis $$\{ \psi_\lambda \}_{\lambda\in{\mathcal{S}}}$$, which yields compressible $${\mathbf{A}}_j$$. Their compressibility is governed by the modulus of the entries $$\langle {\theta}_j \nabla \psi_\lambda, \nabla \psi_{\lambda'}\rangle$$, where $$\langle\cdot,\cdot\rangle$$ denotes the $$L^2$$-inner product. Specifically, recall, e.g., from Cohen et al.
(2001) that compression strategies for wavelet representations of an elliptic second-order operator with diffusion field $$c$$ are based on bounds of the type $$\label{reference}|\langle c\nabla \psi_\lambda,\nabla \psi_{\lambda'}\rangle|\lesssim \|c\|_{W^{b- m/2}(L^{\infty}(D))}2^{-{\lvert{|\lambda|-|\lambda'|}\rvert}b} ,$$ (A.2) where $$m$$ is the dimensionality of the spatial domain and where $$b> m/2$$ depends on the smoothness of the diffusion coefficient $$c$$ and of the wavelets $$\psi_\lambda$$. However, in our case the higher-order norms of $$c$$ on the right-hand side of (A.2) with $$c=\theta_j$$ depend on $$j$$, and the overall compression rate is also limited by the decay of the operator truncation error (4.3). In view of Proposition 4.3, the objective here is thus to have a compression rate for the individual components $${\mathbf{A}}_j$$ that is as high as possible, so that one approaches the limiting value imposed by (4.3). We now summarize the conditions on the multilevel parametric expansion functions and the spatial wavelet basis under which we will verify the requirements of Proposition 4.3. To simplify notation, let $$S_\lambda \,{:=}\, {\mathrm{supp}}\,\, \psi_\lambda$$. Note that the $$\nabla\psi_\lambda$$ then have vanishing moments of order $$k+1>\gamma$$. 
If $${\lvert{\lambda}\rvert},{\lvert{\mu}\rvert} \leq {\lvert{\lambda'}\rvert}$$, using (A.3) we obtain the standard estimate $$\label{vanmom}{\lvert{\langle \xi_\mu \nabla \psi_{\lambda}, \nabla \psi_{\lambda'} \rangle}\rvert}\le \inf_{P\in {\it{\Pi}}_{k+1}^{m}}\|\xi_\mu\nabla\psi_{\lambda} - P\|_{L^{2}(S_{\lambda'})} {\lVert{\psi_{\lambda'}}\rVert}_{L^{2}}\lesssim 2^{-|\lambda'|\gamma} {\lvert{ \xi_\mu \nabla\psi_{\lambda} }\rvert}_{H^{\gamma}(S_{\lambda'})}.$$ (A.4) Combining this with $${\lvert{ \xi_\mu \nabla\psi_{\lambda} }\rvert}_{H^{\gamma}(S_{\lambda'})} \lesssim 2^{- \frac{m}{2} {\lvert{{\lvert{\lambda}\rvert}-{\lvert{\lambda'}\rvert}}\rvert}} 2^{ \gamma \max\{{\lvert{\mu}\rvert},{\lvert{\lambda}\rvert}\}}$$, we obtain $$\label{entryestimate}{\lvert{\langle \xi_\mu \nabla \psi_{\lambda} , \nabla \psi_{\lambda'} \rangle}\rvert}\lesssim 2^{-(\gamma + \frac{m}{2}) {\lvert{{\lvert{\lambda}\rvert}-{\lvert{\lambda'}\rvert}}\rvert}} 2^{\gamma({\lvert{\mu}\rvert}-{\lvert{\lambda}\rvert})_+}.$$ (A.5) Note that the requirement (A.3) could be weakened along the lines of Stevenson (2004) to piecewise smoothness, in which case combinations of wavelets with overlapping singular supports need to be considered separately. Because this is not essential for our purposes, to keep the exposition accessible we do not consider this in further detail. The consequences of the estimate (A.5) depend on the relations between $${\lvert{\mu}\rvert}$$, $${\lvert{\lambda}\rvert}$$ and $${\lvert{\lambda'}\rvert}$$. 
We distinguish the three following cases: If $${\lvert{\mu}\rvert} \leq {\lvert{\lambda}\rvert}, {\lvert{\lambda'}\rvert}$$, we obtain an estimate analogous to the standard case (A.2), $${\lvert{\langle \xi_\mu \nabla \psi_\lambda, \nabla \psi_{\lambda'} \rangle}\rvert}\lesssim 2^{-(\gamma + \frac{m}{2}) {\lvert{{\lvert{\lambda}\rvert}-{\lvert{\lambda'}\rvert}}\rvert}}.$$ (A.6) If $${\lvert{\lambda}\rvert} \leq {\lvert{\mu}\rvert} < {\lvert{\lambda'}\rvert}$$, we obtain the modified estimate $$\label{sandwichcompression}{\lvert{\langle \xi_\mu \nabla \psi_\lambda, \nabla \psi_{\lambda'} \rangle}\rvert}\lesssim 2^{- \gamma ({\lvert{\lambda'}\rvert}-{\lvert{\mu}\rvert})} 2^{- \frac{m}{2} {\lvert{{\lvert{\lambda}\rvert}-{\lvert{\lambda'}\rvert}}\rvert} }.$$ (A.7) Note that for each fixed $$\mu$$ and fixed levels $${\lvert{\lambda}\rvert}, {\lvert{\lambda'}\rvert}$$, there exist in this case $${\mathcal{O}}(2^{m({\lvert{\lambda'}\rvert}-{\lvert{\mu}\rvert})})$$ entries that may be nonzero. Finally, if $${\lvert{\lambda}\rvert}, {\lvert{\lambda'}\rvert} \leq {\lvert{\mu}\rvert}$$, then for each $$\mu$$, there exist $$\mathcal{O}(|\mu|)$$ indices $$\lambda$$ such that the corresponding supports overlap, and in turn there exist $${\mathcal{O}}({\lvert{\mu}\rvert}^2)$$ pairs of $$\lambda,\lambda'$$ that may give a nonvanishing entry. These entries satisfy $$\label{slowcompression}{\lvert{ \langle \xi_\mu \nabla\psi_\lambda, \nabla\psi_{\lambda'}\rangle }\rvert}\lesssim 2^{-m{\lvert{\mu}\rvert}} 2^{\frac{m}{2}({\lvert{\lambda}\rvert}+{\lvert{\lambda'}\rvert})}.$$ (A.8) Note that we do not assume any vanishing moments for $$\xi_\mu$$. Hence, in general not much can be gained by discarding further entries in this third case. Our strategy for dealing with the increasingly oscillatory nature of $$\xi_\mu$$ as $${\lvert{\mu}\rvert}\to\infty$$ is to retain a common compression rate $$s^*$$ in (A.1) uniformly in $$\mu$$ without losing the decay induced by the factors $$c_\mu$$. 
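For illustration, the three bounds (A.6)–(A.8) can be collected into a single function returning the predicted exponent $$e$$ in an entry bound of the form $$\lesssim 2^{e}$$, as a function of the levels only. The helper below is hypothetical (constants are suppressed, and the levels are passed as plain integers), with $$\gamma$$ and $$m$$ as in the text:

```python
# Hypothetical helper: exponent e such that, up to constants, the bound in
# cases (A.6)-(A.8) reads |<xi_mu grad psi_lambda, grad psi_lambda'>| <~ 2^e,
# for wavelet levels mu = |mu|, lam = |lambda|, lam2 = |lambda'|.
def entry_bound_exponent(mu, lam, lam2, gamma, m):
    lo, hi = sorted((lam, lam2))
    if mu <= lo:                                   # case (A.6)
        return -(gamma + m / 2) * (hi - lo)
    if mu < hi:                                    # case (A.7), symmetric in lam, lam2
        return -gamma * (hi - mu) - (m / 2) * (hi - lo)
    return -m * mu + (m / 2) * (lam + lam2)        # case (A.8)

print(entry_bound_exponent(2, 5, 9, gamma=3, m=2))   # -(3 + 1) * 4 = -16.0
```

Larger $$\gamma$$ (more vanishing moments and smoothness) steepens the decay in the first two cases, which is exactly what drives the compression rate discussed above.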
To this end, we retain additional entries of the $${\mathbf{A}}_j$$ in the cases (A.7) and (A.8). This results in a $$j$$-dependent number of nonzero entries in each row and column of the compressed operators $${\mathbf{A}}_{j,n}$$, which is of order $${\mathcal{O}}((1+{\lvert{\mu_j}\rvert}^q) 2^n)$$. Let $$a_{\mu_j,\lambda,\lambda'}$$ denote the entries of $${\mathbf{A}}_{j}$$, i.e., $a_{\mu_j,\lambda,\lambda'} = c_{\mu_j} \langle \xi_{\mu_j} \nabla \psi_{\lambda},\nabla\psi_{\lambda'}\rangle.$ Relations (A.14), (A.15) show that the resulting compression rate is limited by the smoothness of the expansion functions $$\xi_\mu$$ and the spatial wavelets $$\psi_\lambda$$, as well as by the number of vanishing moments of the $$\psi_\lambda$$, expressed by the value $$\gamma$$. As Proposition 4.3 shows, with increasing $$\gamma$$ the rate of compressibility of the complete operator $${\mathbf{A}}$$ approaches the limiting value determined by the decay of its tail (4.3).

Appendix B. Proofs of auxiliary results

References

Babuška, I., Nobile, F. & Tempone, R. (2010) A stochastic collocation method for elliptic partial differential equations with random input data. SIAM Rev., 52, 317–355.
Bachmayr, M. & Cohen, A. (2017) Kolmogorov widths and low-rank approximations of parametric elliptic PDEs. Math. Comp., 86, 701–724.
Bachmayr, M., Cohen, A., Dũng, D. & Schwab, C. (2017) Fully discrete approximation of parametric and stochastic elliptic PDEs. SIAM J. Numer. Anal., to appear.
Bachmayr, M., Cohen, A. & Migliorati, G. (2017) Representations of Gaussian random fields and approximation of elliptic PDEs with lognormal coefficients. J. Fourier Anal. Appl., to appear, https://doi.org/10.1007/s00041-017-9539-5.
Bachmayr, M., Cohen, A. & Migliorati, G. (2017) Sparse polynomial approximation of parametric elliptic PDEs. Part I: affine coefficients. ESAIM Math. Model. Numer. Anal., 51, 321–339.
Bachmayr, M. & Dahmen, W. (2015) Adaptive near-optimal rank tensor approximation for high-dimensional operator equations. Found. Comput. Math., 15, 839–898.
Bachmayr, M. & Dahmen, W. (2016) Adaptive low-rank methods: problems on Sobolev spaces. SIAM J. Numer. Anal., 54, 744–796.
Beck, J., Tempone, R., Nobile, F. & Tamellini, L. (2012) On the optimal polynomial approximation of stochastic PDEs by Galerkin and collocation methods. Math. Models Methods Appl. Sci., 22, 1250023, 33 pp.
Charrier, J., Scheichl, R. & Teckentrup, A. (2013) Finite element error analysis of elliptic PDEs with random coefficients and its application to multilevel Monte Carlo methods. SIAM J. Numer. Anal., 51, 322–352.
Chkifa, A., Cohen, A. & Schwab, C. (2014) High-dimensional adaptive sparse polynomial interpolation and applications to parametric PDEs. Found. Comput. Math., 14, 601–633.
Cohen, A., Dahmen, W. & DeVore, R. (2001) Adaptive wavelet methods for elliptic operator equations: convergence rates. Math. Comp., 70, 27–75.
Cohen, A., Dahmen, W. & DeVore, R. (2002) Adaptive wavelet methods II – beyond the elliptic case. Found. Comput. Math., 2, 203–245.
Cohen, A. & DeVore, R. (2015) Approximation of high-dimensional parametric PDEs. Acta Numer., 24, 1–159.
Cohen, A., DeVore, R. & Schwab, C. (2010) Convergence rates of best $$N$$-term Galerkin approximations for a class of elliptic sPDEs. Found. Comput. Math., 10, 615–646.
Cohen, A., DeVore, R. & Schwab, C. (2011) Analytic regularity and polynomial approximation of parametric and stochastic elliptic PDE's. Anal. Appl. (Singap.), 9, 11–47.
Donovan, G. C., Geronimo, J. S. & Hardin, D. P. (1999) Orthogonal polynomials and the construction of piecewise polynomial smooth wavelets. SIAM J. Math. Anal., 30, 1029–1056.
Dũng, D. (2015) Linear collective collocation and Galerkin methods for parametric and stochastic elliptic PDEs. Preprint, arXiv:1511.03377.
Eigel, M., Gittelson, C. J., Schwab, C. & Zander, E. (2014) Adaptive stochastic Galerkin FEM. Comput. Methods Appl. Mech. Engrg., 270, 247–269.
Eigel, M., Gittelson, C. J., Schwab, C. & Zander, E. (2015) A convergent adaptive stochastic Galerkin finite element method with quasi-optimal spatial meshes. ESAIM Math. Model. Numer. Anal., 49, 1367–1398.
Eigel, M., Pfeffer, M. & Schneider, R. (2017) Adaptive stochastic Galerkin FEM with hierarchical tensor representations. Numer. Math., 136, 765–803.
Falcó, A. & Hackbusch, W. (2012) On minimal subspaces in tensor representations. Found. Comput. Math., 12, 765–803.
Ghanem, R. G. & Spanos, P. D. (1990) Polynomial chaos in stochastic finite elements. J. Appl. Mech., 57, 197–202.
Ghanem, R. G. & Spanos, P. D. (1991) Stochastic Finite Elements: A Spectral Approach. New York: Springer (revised edition 2003, New York: Dover Publications).
Gittelson, C. J. (2013) An adaptive stochastic Galerkin method for random elliptic operators. Math. Comp., 82, 1515–1541.
Gittelson, C. J. (2014) Adaptive wavelet methods for elliptic partial differential equations with random operators. Numer. Math., 126, 471–513.
Grasedyck, L. (2010) Hierarchical singular value decomposition of tensors. SIAM J. Matrix Anal. Appl., 31, 2029–2054.
Grasedyck, L., Kressner, D. & Tobler, C. (2013) A literature survey of low-rank tensor approximation techniques. GAMM-Mitt., 36, 53–78.
Griebel, M. & Harbrecht, H. (2014) Approximation of bi-variate functions: singular value decomposition versus sparse grids. IMA J. Numer. Anal., 34, 28–54.
Hackbusch, W. (1992) Elliptic Differential Equations: Theory and Numerical Treatment. Springer Series in Computational Mathematics, vol. 18. Berlin: Springer.
Hackbusch, W. (2012) Tensor Spaces and Numerical Tensor Calculus. Springer Series in Computational Mathematics, vol. 42. Heidelberg: Springer.
Hackbusch, W. & Kühn, S. (2009) A new scheme for the tensor representation. J. Fourier Anal. Appl., 15, 706–722.
Kahlbacher, M. & Volkwein, S. (2007) Galerkin proper orthogonal decomposition methods for parameter dependent elliptic systems. Discuss. Math. Differ. Incl. Control Optim., 27, 95–117.
Khoromskij, B. N. & Oseledets, I. (2010) Quantics-TT collocation approximation of parameter-dependent and stochastic elliptic PDEs. Comput. Methods Appl. Math., 10, 376–394.
Khoromskij, B. N. & Schwab, C. (2011) Tensor-structured Galerkin approximation of parametric and stochastic elliptic PDEs. SIAM J. Sci. Comput., 33, 364–385.
Kolda, T. G. & Bader, B. W. (2009) Tensor decompositions and applications. SIAM Rev., 51, 455–500.
Kressner, D. & Tobler, C. (2011) Low-rank tensor Krylov subspace methods for parametrized linear systems. SIAM J. Matrix Anal. Appl., 32, 1288–1316.
Lassila, T., Manzoni, A., Quarteroni, A. & Rozza, G. (2013) Generalized reduced basis methods and $$n$$-width estimates for the approximation of the solution manifold of parametric PDEs. In: Analysis and Numerics of Partial Differential Equations (F. Brezzi, P. C. Franzone, U. Gianazza & G. Gilardi eds). Springer INdAM Series, vol. 4. Milano: Springer, pp. 307–329.
Lathauwer, L. D., Moor, B. D. & Vandewalle, J. (2000) A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl., 21, 1253–1278.
Le Maître, O. P. & Knio, O. M. (2010) Spectral Methods for Uncertainty Quantification. New York: Springer.
Matthies, H. G. & Zander, E. (2012) Solving stochastic systems with low-rank tensor compression. Linear Algebra Appl., 436, 3819–3838.
Oseledets, I. & Tyrtyshnikov, E. (2009) Breaking the curse of dimensionality, or how to use SVD in many dimensions. SIAM J. Sci. Comput., 31, 3744–3759.
Oseledets, I. V. (2011) Tensor-train decomposition. SIAM J. Sci. Comput., 33, 2295–2317.
Rozza, G. (2014) Fundamentals of reduced basis method for problems governed by parametrized PDEs and applications. In: Separated Representations and PGD-Based Model Reduction (F. Chinesta & P. Ladevèze eds). CISM International Centre for Mechanical Sciences, vol. 554. Vienna: Springer, pp. 153–227.
Rozza, G., Huynh, D. B. P. & Patera, A. T. (2008) Reduced basis approximation and a posteriori error estimation for affinely parametrized elliptic coercive partial differential equations: application to transport and continuum mechanics. Arch. Comput. Methods Eng., 15, 229–275.
Stevenson, R. (2004) On the compressibility of operators in wavelet coordinates. SIAM J. Math. Anal., 35, 1110–1132.
Teckentrup, A. L., Scheichl, R., Giles, M. B. & Ullmann, E. (2013) Further analysis of multilevel Monte Carlo methods for elliptic PDEs with random coefficient. Numer. Math., 125, 569–600.
Tucker, L. R. (1964) The extension of factor analysis to three-dimensional matrices. In: Contributions to Mathematical Psychology (N. Frederiksen & H. Gulliksen eds). New York: Holt, Rinehart & Winston, pp. 109–127.
Tucker, L. R. (1966) Some mathematical notes on three-mode factor analysis. Psychometrika, 31, 279–311.
Xiu, D. (2010) Numerical Methods for Stochastic Computations: A Spectral Method Approach. Princeton, NJ: Princeton University Press.

Footnotes

1. As usual, to warrant a linear scaling, instead of exact ordering it suffices to perform approximate sorting into buckets according to some fixed exponential decay.

© The authors 2017. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.

IMA Journal of Numerical Analysis, Volume 38 (4), 16 October 2018, 48 pages. Publisher: Oxford University Press.
1.1 Problem formulation

The problems that are addressed in this article have the following general form. Let $$V$$ be a separable Hilbert space. We consider a parametric operator $$A(y)\colon V\to V'$$ of the form $$\label{Arep}A(y) := A_0 + \sum_{j\in{\mathcal{I}}} y_j A_j,\quad y\in{Y} := [-1,1]^{\mathcal{I}}\!,$$ (1.1) where $${\mathcal{I}}=\{1,\dots,d\}$$ or $${\mathcal{I}}= {\mathbb{N}}$$ in the finite or infinite dimensional case, respectively. In the infinite dimensional case, we require that the above series converges in $${\cal L}(V,V')$$ for any $$y\in {Y}$$. We assume uniform boundedness and ellipticity of $$A(y)$$ over the parameter domain, that is, $$\label{ellip}\langle A(y)v,w\rangle \leq R \|v\|_V\|w\|_V \quad {\rm and} \quad\langle A(y)v,v\rangle \geq r \|v\|_V^2, \quad v,w\in V, \; y\in {Y},$$ (1.2) for some $$0<r\leq R<\infty$$, which implies in particular that $$A(y)$$ is boundedly invertible uniformly in $$y\in {Y}$$, with $$\|A(y)^{-1}\|_{{\cal L}(V',V)}\leq r^{-1}, \quad y\in {Y}.$$ (1.3) We also consider parametric data $$f\colon {Y} \to V'$$, and for each $$y\in Y$$, we define $$u(y)\in V$$ as the solution of the equation $$\label{paramgeneral}A(y)\,u(y) = f(y).$$ (1.4) A guiding example is provided by affinely parametrized diffusion problems of the form $$\label{paramdiffusion-0}A(y) u := -{\mathrm{div}}\bigl( a(y) \nabla u \bigr) = f , \quad a(y) := \bar a + \sum_{j \in {\mathcal{I}}} y_j {\theta}_j ,$$ (1.5) with homogeneous Dirichlet boundary conditions, posed in the weak sense on a spatial domain $$D \subset {\mathbb{R}}^m$$. In this particular case of frequent interest, the right-hand side $$f\in V'$$ is independent of $$y$$. The validity of (1.2) is then usually ensured by the uniform ellipticity assumption $$\label{uea}\sum_{j\geq 1} {\lvert{{\theta}_j(x)}\rvert} \leq \bar a(x) - \underline{\alpha}, \quad x\in D,$$ (1.6) for some $$\underline{\alpha}>0$$.
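As a minimal numerical sketch of (1.5) and (1.6), with illustrative coefficient functions $$\theta_j$$ that are not those of any example in the paper, one can assemble a one-dimensional finite-difference version of $$A(y)$$ and observe that ellipticity holds uniformly over the parameter box:

```python
import numpy as np

d = 8                                   # number of parameters (illustrative)

def a_coeff(pts, y):
    """a(y)(x) = abar + sum_j y_j theta_j(x), with illustrative theta_j.
    Here sum_j |theta_j(x)| <= 0.25 * sum_k 1/k^2 < 0.4 < abar = 1, so the
    uniform ellipticity assumption (1.6) holds."""
    a = np.ones_like(pts)
    for j in range(d):
        a += y[j] * 0.25 * np.sin((j + 1) * np.pi * pts) / (j + 1) ** 2
    return a

n = 50                                  # interior grid points on (0, 1)
h = 1.0 / (n + 1)
mid = np.linspace(h / 2, 1 - h / 2, n + 1)   # cell interface points

def A_of_y(y):
    """Finite-difference stiffness matrix of -div(a(y) grad u), Dirichlet BCs."""
    am = a_coeff(mid, y)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = (am[i] + am[i + 1]) / h**2
        if i + 1 < n:
            A[i, i + 1] = A[i + 1, i] = -am[i + 1] / h**2
    return A

# the smallest eigenvalue stays positive even for a worst-case parameter
print(np.linalg.eigvalsh(A_of_y(-np.ones(d))).min() > 0)   # True
```

The interface-averaged discretization keeps each $$A(y)$$ symmetric positive definite as long as the coefficient stays positive, mirroring (1.2) at the discrete level.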
We then have $$V=H^{1}_0(D)$$, and the corresponding operators $$A_j \colon V \to V'$$ for $$j \in \{0\}\cup {\mathcal{I}}$$ are defined by \begin{equation*}\langle A_0 u, v \rangle := \int_D \bar{a} \nabla u\cdot\nabla v \,{\rm{d}}x ,\qquad \langle A_j u, v \rangle := \int_D {\theta}_j \nabla u\cdot\nabla v \,{\rm{d}}x ,\quad j \in{\mathcal{I}}, \end{equation*} for $$u,v\in V$$. Thus, $$V$$ is typically a function space defined over some physical domain $$D\subset {\mathbb{R}}^m$$, with $$m=1,2,3$$, for example, the Sobolev space $$H^1_0(D)$$ in the above case of second-order elliptic equations with homogeneous Dirichlet boundary conditions. Therefore, the solution may either be viewed as the Hilbert space-valued map $$y\mapsto u(y), \label{solutionmap}$$ (1.7) which acts from $${Y}$$ to $$V$$ or as the scalar-valued map $$(x,y)\mapsto u(x,y):=u(y)(x),$$ (1.8) where $$x\in D$$ and $$y\in {Y}$$ are referred to as the spatial and parametric variables. Approximating such solution maps amounts to approximating functions of a large or even infinite number of variables. In applications, one is often interested in specific functionals of the solution. Here, we focus on the basic question of approximating the entire solution map in an appropriate norm. The guiding questions, to be made precise below, are the following: What are the most suitable approximations to cope with the high dimensionality in problems of the form (1.4), and what features of problems (1.4) favour certain approaches over others? Moreover, at what numerical cost can one find these approximations, and how do these costs depend on particular features of the given problem? To address the latter question, for each setting, we construct adaptive computational methods that exhibit near-optimal complexity, in a sense to be made precise below.
1.2 Sparse and low-rank approximability

Before addressing any concrete numerical schemes, we discuss the basic concepts of approximations for the solution map $$u$$ in (1.7). We focus on the mean squared error $$\|u-\tilde u\|_{L^2({Y},V)}$$ for an approximation $$\tilde u$$, where $\|v\|_{L^2({Y},V)}^2:=\int_{Y} \|v(y)\|_V^2 {\rm \,{d}}\mu(y),$ for a given probability measure $$\mu$$ over $${Y}$$. In what follows, we assume that $$\mu$$ is the uniform probability measure on $${Y}$$. The results carry over, however, to other product measures on $${Y}$$. The following types of approximation make essential use of the tensor product structure of the Bochner space $$L^2({Y},V)=V\otimes L^2({Y})$$, where $$L^2({Y})=L^2({Y},\mu)$$.

Sparse polynomial expansions

A first approach to approximating $$y\mapsto u(y)$$ is to employ an a priori chosen basis $$\{u_1^{{\rm y}}, \dots, u_n^{{\rm y}}\}\subset L^2(Y)$$ and compute the $$u_i^{{\rm x}}\in V$$ as the corresponding coefficients of this approximation. One prominent example of this approach is the orthogonal polynomial expansion method (see, e.g., Ghanem & Spanos, 1990, 1991; Le Maître & Knio, 2010; Xiu, 2010). In this case, the parametric functions $$u_i^{{\rm y}}$$ are picked from the set of tensorized Legendre polynomials $$L_\nu(y)=\prod_{j\geq 1} L_{\nu_j}(y_j), \quad \nu=(\nu_j)_{j\geq 1}, \label{tensorleg}$$ (1.9) with $$(L_k)_{k\geq 0}$$ the univariate Legendre polynomial sequence normalized in $$L^2([-1,1],\frac {\,{\rm d}t} {2})$$.
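With this normalization, the univariate polynomials are $$L_k = \sqrt{2k+1}\,P_k$$ in terms of the classical Legendre polynomials $$P_k$$. A brief sketch verifying orthonormality with respect to the uniform measure, using Gauss–Legendre quadrature (exact for the polynomial degrees involved):

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def L(k, t):
    """Univariate Legendre polynomial normalized in L^2([-1,1], dt/2)."""
    return np.sqrt(2 * k + 1) * Legendre.basis(k)(t)

def L_tensor(nu, y):
    """Tensorized polynomial L_nu(y) = prod_j L_{nu_j}(y_j) as in (1.9)."""
    return np.prod([L(k, yj) for k, yj in zip(nu, y)])

# orthonormality check with 20-point Gauss-Legendre quadrature
t, w = np.polynomial.legendre.leggauss(20)
gram = lambda k, l: 0.5 * np.sum(w * L(k, t) * L(l, t))
print(abs(gram(3, 3) - 1) < 1e-10, abs(gram(3, 5)) < 1e-10)   # True True
```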
The functions $$(L_\nu)_{\nu\in{\mathcal{F}}}$$ are an orthonormal basis of $$L^2({Y})$$, where $${\mathcal{F}}$$ is $${\mathbb{N}}_0^d$$ in the case $${\mathcal{I}}=\{1,\dots,d\}$$ or the set of finitely supported sequences of non-negative integers in the case $${\mathcal{I}}={\mathbb{N}}$$, that is, $${\mathcal{F}} := \{ \nu\in{\mathbb{N}}_0^{{\mathbb{N}}} \colon \#\,{\mathrm{supp}}\,\, \nu < \infty \}.$$ (1.10) One thus has $$\label{Legendre} u(y)=\sum_{\nu\in{\mathcal{F}}} u_\nu L_\nu(y), \quad u_\nu= \int_{Y} u(y)L_\nu(y) {\rm \,{d}}\mu(y).$$ (1.11) Then, one natural choice is the best $$n$$-term approximation $$u_n$$ obtained by restricting the above expansion to the set $${\it{\Lambda}}^{\rm y}_{n}\subset {\mathcal{F}}$$ of indices $$\nu$$ corresponding to the $$n$$ largest $$\|u_\nu\|_V$$, because this set minimizes the error $$\|u-u_n\|_{L^2({Y},V)}$$ among all possible choices of $$n$$-term truncations. This strategy for generating sparse polynomial approximations in the context of parametric PDEs was first introduced and analysed in Cohen et al. (2010, 2011). In practice, the set $${\it{\Lambda}}^{\rm y}_n$$ is not accessible, but provides a benchmark for the performance of algorithms. This representation complexity, however, does not yet determine the resulting computational complexity, because the coefficients $$u_\nu$$ in (1.11) in turn need to be approximated as well. For instance, one may choose a fixed basis $$\{ \psi_\lambda\}_{\lambda\in{\mathcal{S}}}$$ of $$V$$ and expand $u_\nu = \sum_{\lambda\in {\mathcal{S}}} {\bf u}_{\lambda,\nu} \psi_\lambda$ in (1.11), where $${\bf u}_{\lambda,\nu}\in{\mathbb{R}}$$. 
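A best $$n$$-term truncation as described above simply retains the indices $$\nu$$ with the $$n$$ largest coefficient norms $$\|u_\nu\|_V$$; by orthonormality of the $$L_\nu$$, the squared truncation error is the sum of the squared discarded norms. A small sketch with synthetic stand-in norms (not actual solution coefficients):

```python
import numpy as np

# synthetic stand-ins for the coefficient norms ||u_nu||_V
norms = {(0, 0): 1.0, (1, 0): 0.5, (0, 1): 0.4, (1, 1): 0.1, (2, 0): 0.3}

def best_n_term(norms, n):
    """Index set Lambda_n^y of the n largest coefficient norms."""
    return set(sorted(norms, key=norms.get, reverse=True)[:n])

kept = best_n_term(norms, 3)
# by orthonormality of the L_nu, the truncation error is the l2 tail
err = np.sqrt(sum(norms[nu] ** 2 for nu in set(norms) - kept))
print(sorted(kept), round(err, 4))   # [(0, 0), (0, 1), (1, 0)] 0.3162
```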
The simplest strategy is to use the same discretization for all $$u_\nu$$ by selecting a finite $${\it{\Lambda}}^{\rm x}$$, which yields the approximation $u \approx \sum_{\nu \in {\it{\Lambda}}_{n}^{\rm y}} \left( \sum_{\lambda\in {\it{\Lambda}}^{\rm x}} {\bf u}_{\lambda,\nu} \psi_\lambda \right)\otimes L_\nu.$ Using instead an independently adapted spatial discretization for each $$u_\nu$$ corresponds to adaptive sparse polynomial approximations of the form $$\label{fullnterm}u \approx \sum_{(\lambda, \nu) \in {\it{\Lambda}}} {\bf u}_{\lambda,\nu}\, \psi_\lambda\otimes L_\nu,$$ (ASP) with $${\it{\Lambda}} \subset {\mathcal{S}} \times {\mathcal{F}}$$. It is natural to quantify the complexity of such an approximation by the number of activated degrees of freedom $$\#{\it{\Lambda}}$$. Here, one can again ask for best $$N$$-term approximations, now with respect to the fixed basis $$\{\psi_\lambda\otimes L_\nu\}_{\lambda\in{\mathcal{S}},\nu\in{\mathcal{F}}}$$, obtained by minimizing the error over all $${\it{\Lambda}}$$ with $$\#{\it{\Lambda}} = N$$. This now results in a fully discrete approximation.

Low-rank approximation

More generally, one may consider approximations of the form $$\label{basiclrapprox}u \approx u_n :=\sum_{k=1}^n u^{\rm x}_k \otimes u^{\rm y}_k,$$ (1.12) where $$u^{\rm x}_k$$ and $$u^{\rm y}_k$$ are functions of the spatial and parametric variables, respectively. This contains (1.11) as a special case, but we also now allow $$u^{\rm y}_k \in L^2(Y)$$ to be arbitrary functions that are not given a priori, but adapted to the given problem. The shortest expansion of the form (1.12) that achieves a prescribed error in $$L^2({Y},V)$$ is given by truncation of the Hilbert–Schmidt decomposition of $$u$$ interpreted as the operator $$\label{Tu}T_u: v \to \int_Y u(x,y)v(y){\rm \,{d}}\mu(y),$$ (1.13) acting from $$L^2({Y})$$ to $$V$$.
In this context, we define $${\mathrm{rank}}(u)$$ as the rank of the operator $$T_u$$, so that in particular $$u_n$$ with a representation by $$n$$ separable terms as in (1.12) has $${\mathrm{rank}}(u_n)\leq n$$. The functions $$u_1^{\rm x}, \dots, u_n^{\rm x}$$ and $$u_1^{\rm y}, \dots, u_n^{\rm y}$$ are given by the left and right singular functions, respectively, which yield the optimal rank-$$n$$ approximation of $$u$$ in $$L^2({Y},V)$$. This particular system of basis functions is a natural benchmark, as it minimizes the rank $$n=n(\varepsilon)$$ required to ensure a mean squared accuracy $$\varepsilon$$. However, it is not obvious how to compute sufficiently good approximations of these basis functions at affordable cost, a point to be taken up again later. The methods considered in this article are based on computing approximations of both $$u_k^{\rm x}$$ and $$u_k^{\rm y}$$. A first example in this category is a low-rank approximation that aims to approximately realize a truncated Hilbert–Schmidt decomposition and thereby to meet the aforementioned benchmark. In this case, the error caused by truncation should ideally be balanced against the error in approximating the unknown basis functions $$u^{\rm x}_k, u^{\rm y}_k$$. Note that there exist alternative approaches for deriving computable expansions of the form (1.12), where only the functions $$u_1^{{\rm x}}, \dots, u_n^{{\rm x}}$$ and their span $$V_n$$ are constructed, which we comment on in Section 1.4. To obtain numerically realizable approximations, we may again use bases of $$V$$ and $$L^2(Y)$$ as in (ASP) and consider expansions for $$u^{\rm x}_k$$ and $$u^{\rm y}_k$$ to arrive at fully discrete low-rank approximations of the form $$\label{fulllr}u \approx \sum_{k=1}^n \left( \sum_{\lambda\in {\it{\Lambda}}^{\rm x}_k} {\bf u}^{\rm x}_{k,\lambda} \psi_\lambda \right)\otimes \left(\sum_{\nu\in {\it{\Lambda}}^{\rm y}_k} {\bf u}^{\rm y}_{k,\nu} L_\nu \right)\!
,$$ (LR) with $${\it{\Lambda}}^{\rm x}_k \subset {\mathcal{S}}$$, $${\it{\Lambda}}^{\rm y}_k\subset {\mathcal{F}}$$, $$k=1,\ldots,n$$. Hierarchical tensor decompositions One may as well go beyond the Hilbert–Schmidt decomposition (1.12) and consider higher-order low-rank tensor representations that correspond to further decompositions of the factors $$u^{\rm y}_k$$ in (1.12). For simplicity, at this point, let us consider this in the finite-dimensional case $$d<\infty$$, possibly after truncating the expansion (1.1) for $$A(y)$$. Introducing an additional tensor decomposition of the factors $${\bf u}^{\rm y}_k$$, we obtain the general approximations in subspace-based tensor formats, $$\label{general}u_n = \sum_{k_{\rm x}=1}^{r_{\rm x}} u_{k_{\rm x}}^{\rm x} \otimes \left(\sum_{k_1=1}^{r_1}\cdots\sum_{k_d=1}^{r_d}{\bf a}_{k_{\rm x},k_1,\ldots,k_d}\bigotimes_{j=1}^d u^{{\rm y},j}_{k_j} \right)\!,$$ (STF) where each $$u_{k_j}^{{\rm y},j}$$ is a function of the individual variable $$y_j$$. The minimal $$r_j$$ such that $$u_n$$ can be represented in the form (STF) are called multilinear ranks of $$u_n$$. We confine our discussion to hierarchical tensor representations, with the tensor train format as a special case (see, e.g., Oseledets & Tyrtyshnikov, 2009; Grasedyck, 2010; Hackbusch, 2012), where the high-order core tensor $${{\bf a}}=({\bf a}_{k_{\rm x},k_1,\ldots,k_d})_{k_{\rm x},k_1,\ldots,k_d}$$ is further decomposed in terms of lower-order tensors, based on matricizations of $${\bf a}$$. 
For instance, if $$\label{ranks} \tilde r_i= {\mathrm{rank}}\big({\bf a}_{(k_{\rm x},k_1,\ldots, k_i),(k_{i+1},\ldots, k_d)}\big),$$ (1.14) one has a factorized representation of the form $$\label{htcore}{{\bf a}}_{k_{\rm x},k_1,\ldots,k_d} = \sum_{\ell_1=1}^{\tilde r_1}{\mathbf{M}}^{(1)}_{k_{\rm x},\ell_1} \sum_{\ell_2=1}^{\tilde r_2} {\mathbf{M}}^{(2)}_{\ell_1,k_1,\ell_2} \cdots \sum_{\ell_{d-1}=1}^{\tilde r_{d-1}} {\mathbf{M}}^{(d-1)}_{\ell_{d-2},k_{d-2},\ell_{d-1}} {\mathbf{M}}^{(d)}_{\ell_{d-1},k_{d-1},k_d}$$ (1.15) in terms of the tensors $${\mathbf{M}}^{(i)}$$, $$i=1,\ldots,d$$, of order at most three, and only these low-order tensors need to be stored and manipulated. The representation (STF) contains (ASP) and (LR) as special cases. For instance, to recover a sparse polynomial expansion (ASP), let $$\nu(k_{\rm x})$$, $$k_{\rm x}=1,\ldots,r_{\rm x}$$, be an enumeration of elements of $${\mathbb{N}}^d_0$$, and choose $${\bf a}_{k_{\rm x},k_1,\ldots,k_d} = \delta_{\nu(k_{\rm x}),(k_1,\ldots,k_d)}$$, $$u^{{\rm y},j}_{k_j}= L_{k_j}$$. With $$r_{\rm x} = r_1=\ldots=r_d$$ and diagonal core tensor $${\bf a}$$ having nonzero entries $${\bf a}_{k,k,\ldots,k}=1$$ for $$k = 1,\ldots, r_{\rm x}$$, one obtains representations (LR). 1.3 Guiding questions In the different types of approximation outlined above, the degrees of freedom enter in varying degrees of nonlinearity. More strongly nonlinear approximations (STF) with hierarchical decomposition (1.15) can potentially yield more strongly compressed representations, in the sense that the number of degrees of freedom $$n_{\text{dof}}(\varepsilon)$$ required for a target accuracy $$\varepsilon$$ in $$L^2(Y,V)$$ scales more favourably. Handling this stronger compression in the computation of such representations, however, leads to additional difficulties, and the number of required operations $$n_{\rm op}(\varepsilon)$$ may in fact scale less favourably than $$n_{\text{dof}}(\varepsilon)$$.
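A factorization of the core tensor as in (1.15) can be computed by successive SVDs of matricizations. The following sketch is a generic tensor-train-style sweep over a small dense tensor (index conventions simplified relative to (1.15)); reconstruction is exact up to the tolerance because no further singular values are discarded:

```python
import numpy as np

def tt_decompose(a, tol=1e-12):
    # Successive SVDs of matricizations, cf. (1.15): returns order-3
    # cores M_i with a[i1,...,id] = M_1[:,i1,:] @ ... @ M_d[:,id,:].
    cores, r = [], 1
    rest = a.reshape(1, -1)
    for n in a.shape[:-1]:
        rest = rest.reshape(r * n, -1)
        W, s, Vt = np.linalg.svd(rest, full_matrices=False)
        rank = max(1, int(np.sum(s > tol * s[0])))   # ranks as in (1.14)
        cores.append(W[:, :rank].reshape(r, n, rank))
        rest = s[:rank, None] * Vt[:rank, :]
        r = rank
    cores.append(rest.reshape(r, a.shape[-1], 1))
    return cores

def tt_eval(cores, idx):
    # evaluate one entry by multiplying the core slices
    v = np.ones((1, 1))
    for M, i in zip(cores, idx):
        v = v @ M[:, i, :]
    return v[0, 0]

a = np.random.default_rng(1).standard_normal((2, 3, 2))
cores = tt_decompose(a)
```

Only the low-order cores are stored; for tensors with rapidly decaying matricization singular values, this is where the compression of the hierarchical format originates.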
Here, we aim for algorithms which, for each of the above types of approximation, are guaranteed to achieve any prescribed accuracy $$\varepsilon$$ and which are universal. This means that they do not require a priori knowledge of the approximability of the solution (e.g., on the decay of coefficients), but adjust to such approximability automatically. This goes hand in hand with a mechanism for obtaining a posteriori error bounds, making use only of the given data. This leads us to our first guiding question: (I) For a given parametric problem and approximation format (ASP), (LR) or (STF), can one contrive a universal numerical scheme that can achieve any given target accuracy $$\varepsilon$$, with approximate solutions close to the minimum required representation complexity $$n_{\rm dof} (\varepsilon)$$, and can $$n_{\rm op}$$ be related to $$\varepsilon$$, and hence to $$n_{\rm dof}(\varepsilon)$$? The minimum required representation complexity can be expressed in terms of the intrinsic approximability properties of the parametrized solutions $$u$$ in each of the formats. The corresponding required number of operations also depends on the problem data that are used in the solution process. We construct algorithms, based on a common generic strategy for (ASP), (LR) and (STF), which are near-optimal in this regard. With such algorithms at hand, a natural further question is the following. (II) Which of the approximation types (ASP), (LR) or (STF) is best suited for a given parametric problem, in the sense of leading to the smallest growth of $$n_{\rm op}(\varepsilon)$$ as $$\varepsilon\to 0$$? This amounts to asking for which parametric problems the investment into approximations of higher structural nonlinearity pays off, or conversely, for which problems possible gains in approximation efficiency are offset by more demanding computations.
We address this point by analyses of the approximability of model problems, complemented by numerical experiments, with conclusions depending on the particular problem type. For problems with finitely many parameters that are each of comparable influence, hierarchical tensor representations of the form (STF) with (1.15) turn out to be clearly advantageous. In the case of an anisotropic dependence on infinitely many parameters, for representative model problems, we demonstrate that (ASP) can in general yield faster convergence than (LR) or (STF). The particular structure of such infinite parameter expansions also turns out to have a major influence on the efficiency of the adaptive schemes. 1.4 Relation to previous work There are a variety of results on the convergence of sparse polynomial expansions (1.11) (see, e.g., Cohen et al., 2010, 2011; Bachmayr et al., 2017). Furthermore, some estimates are available that include multilevel spatial discretizations and, hence, provide upper bounds for the error of best $$n$$-term approximation (ASP) (see, e.g., Cohen et al., 2011; Dũng, 2015). Concerning our question (II), there are only a few specialized results comparing the different approximation formats. In the case of general bivariate functions, a systematic comparison between sparse grids and low-rank approximation is discussed in Griebel & Harbrecht (2014), showing in particular that for Sobolev classes the latter does not bring any improvement. In the case of high-dimensional functions associated with parametric PDEs, possible gains by low-rank approximations have been identified in Lassila et al. (2013) and Bachmayr & Cohen (2017) by exploiting the particular structure of the problem, all concerning the case of finitely many parameters. There are various approaches for generating sparse polynomial expansions, for instance, based on collocation (Babuška et al., 2010; Beck et al., 2012) or adaptive Taylor expansion (Chkifa et al., 2014).
Note that these strategies do not currently yield a posteriori error bounds for the computed solutions, and their performance is thus described by a priori estimates that may not be sharp. The adaptive methods proposed in Eigel et al. (2014, 2015), based on finite element discretization for the spatial variable, yield a posteriori error bounds for the full approximations. However, the complexity bounds proved in Eigel et al. (2015) are given only in terms of the resulting finite element meshes. Adaptive schemes using wavelet-based spatial discretizations, which yield approximations of the form (ASP), have been studied by Gittelson (2013, 2014). In this case, bounds for the complete computational complexity are proved which, however, do not fully comply with the approximability properties of the solution. Reduced basis and proper orthogonal decomposition (POD) methods (Kahlbacher & Volkwein, 2007; Rozza et al., 2008; Lassila et al., 2013; Rozza, 2014) correspond to expansions of the form (1.12), where only the spatial basis elements $$u^{\rm x}_k$$ spanning $$V_n$$ are explicitly computed in an offline stage. Then, in an online stage, for any given $$y\in {Y}$$, the approximate solution $$u_n(y)$$ is defined as the Galerkin projection of $$u(y)$$ on the space $$V_n$$. For known variants of these methods, accuracy guarantees in the respective norms (where reduced basis methods usually aim at the error in the $$L^\infty$$-norm $$\|v\|_{L^\infty({Y},V)} :=\sup_{y\in {Y}} \|v(y)\|_V$$) require a sufficiently dense sampling of the parameter domain. This becomes prohibitive for large $$d$$, and one only obtains a posteriori bounds for the resulting $$V$$-error at each given $$y\in{Y}$$. In methods based on higher-order tensor representations, instead of sampling in the parameter domain, one also approximates $$u^{\rm y}_k$$ as in (STF), at the price of additional approximability requirements as in (1.15).
A variety of schemes have been proposed that operate on fixed discretizations (Khoromskij & Oseledets, 2010; Khoromskij & Schwab, 2011; Kressner & Tobler, 2011; Matthies & Zander, 2012), which do not yield information on the discretization error. Based on Eigel et al. (2014), an adaptive scheme for hierarchical tensor approximation is proposed in Eigel et al. (2017). It provides rigorous a posteriori bounds for the approximation error, but is not proven to converge. 1.5 Novelty of the article and outline Question (I) is addressed in Sections 2–5. A generic algorithm is described in Section 2 based on the work in Bachmayr & Dahmen (2015), which is guaranteed to converge without any a priori assumptions on the solution. Furthermore, it yields rigorous a posteriori error bounds, using only information on the problem data. Suitable specifications cover all the aforementioned types of approximations (ASP), (LR) and (STF). The scheme is formulated in a general sequence space framework, using a discretization of the space $$L^2({Y},V)$$ through a basis with elements of the form $$\psi_\lambda\otimes L_\nu$$. Here, $$\{\psi_\lambda\}_{\lambda\in {\mathcal{S}}}$$ is a given Riesz basis of $$V$$ (e.g., a wavelet basis in the case where $$V$$ is a Sobolev space) and $$\{L_\nu\}_{\nu \in{\mathcal{F}}}$$ is the previously described multivariate Legendre basis. The algorithm performs an iteration in the sequence space $$\ell^2({\mathcal{S}}\times {\mathcal{F}})$$. It involves at each step specific routines $${\small{\text{RECOMPRESS}}}$$ and $${\small{\text{COARSEN}}}$$ aiming at controlling the ranks of the current approximation as well as the number of degrees of freedom in each of its factors, respectively. We then describe realizations of this generic algorithm corresponding to two distinct settings. In Section 3, we apply the algorithm for the generation of approximations (STF) in the setting of finitely many parametric variables.
In this case, the $${\small{\text{RECOMPRESS}}}$$ routine is based on a truncation of a hierarchical singular value decomposition of the coefficient tensor. We analyse the performance of the algorithms for classes described by the decay of the corresponding singular values and joint sparsity of the corresponding singular vectors. Sections 4 and 5 are devoted to the case of anisotropic dependence on infinitely many parameters in the diffusion problem (1.5). In Section 4, we analyse a specialized version of Algorithm 2.1 producing $$n$$-term sparse Legendre expansions, see (ASP). In this version, the routine $${\small{\text{RECOMPRESS}}}$$ is simply the identity, and hence, Algorithm 2.1 agrees with the adaptive solver developed and analysed in Cohen et al. (2002). In Section 5, we consider, in the same setting as in Section 4, a solver for approximations (LR). In this case, the $${\small{\text{RECOMPRESS}}}$$ routine is based on standard SVD truncation. The corresponding notions of approximability are analogous to those arising in Section 3. A key ingredient in Sections 4 and 5 is the adaptive approximation of the operator based on matrix compression results in Appendix A. Here, we obtain new estimates for wavelet-type multilevel expansions of the parametrized coefficients that are more favourable than what is known for Karhunen–Loève-type expansions. Our further algorithmic developments also require substantially weaker assumptions on the $$A_j$$ in (1.1) than the methods in Eigel et al. (2014, 2017), which require summability of $$({\lVert{A_j}\rVert})_{j\geq 1}$$. By the new operator compression results, we establish, in particular, computational complexity estimates for (ASP), which significantly improve on those of similar schemes in Gittelson (2013, 2014). Based on these complexity estimates, Question (II) is then addressed in Section 6. 
Although the presented algorithms are guaranteed to converge, the corresponding computational cost can only be quantified in terms of approximability properties of solutions $$u$$. In Section 6, we study the corresponding properties, which are different for each realization of the scheme, in representative examples of parametric problems of the form (1.5). In particular, for a certain class of such problems, we prove that the best $$n$$-term Legendre approximation is already asymptotically near-optimal among all rank-$$n$$ approximations. For other examples, we prove that optimized low-rank approximations can achieve significantly better complexity than best $$n$$-term Legendre approximations. This is illustrated further by numerical tests, demonstrating that these observations also hold for more involved model problems. 2. A generic algorithm In this section, we follow the approach developed in Bachmayr & Dahmen (2015) by first reformulating the general equation (1.4) in a sequence space, and then introducing a generic resolution algorithm based on this equivalent formulation. We first notice that (1.4) may also be written as $$Au=f, \label{paramgeneral1}$$ (2.1) where $$A$$ is elliptic and boundedly invertible from $$L^2({Y},V)$$ to $$L^2({Y},V')$$ and can be defined in a weak sense by $$\langle Au,v\rangle :=\int_{{Y}}\langle A(y)u(y),v(y)\rangle {\rm \,{d}}\mu(y), \quad u,v\in L^2({Y},V).$$ (2.2) We assume that $$f\in L^2({Y},V')$$, so that there exists a unique solution $$u\in L^2({Y},V)$$. Given a Riesz basis $$\{ \psi_\lambda \}_{\lambda \in {\mathcal{S}}}$$ of $$V$$, we tensorize it with the orthonormal basis $$\{L_\nu\}_{\nu\in {\mathcal{F}}}$$ of $$L^2({Y})$$. The resulting system $$\{\psi_\lambda \otimes L_\nu\}_{(\lambda,\nu)\in {\mathcal{S}} \times {\mathcal{F}}}$$ is a Riesz basis of $$L^2({Y},V)$$, which we now use to discretize (2.1).
For this purpose, we define the matrices $${\mathbf{A}}_j := \bigl( \langle A_j \psi_{\lambda'}, \psi_{\lambda}\rangle \bigr)_{\lambda,\lambda' \in {\mathcal{S}}}\quad{\rm and} \quad {\mathbf{M}}_j = \left( \int_{{Y}} y_j L_\nu(y) L_{\nu'}(y) \,{\rm{d}}\mu(y) \right)_{\nu,\nu'\in{\mathcal{F}}},$$ (2.3) where $${\mathbf{M}}_0$$ is set to be the identity on $$\ell^2({\mathcal{F}})$$, and the right-hand side column vector $${\mathbf{f}} := \big(\langle f, \psi_\lambda\otimes L_\nu\rangle\big)_{(\lambda,\nu)\in {\mathcal{S}}\times{\mathcal{F}}}.$$ (2.4) We thus obtain an equivalent problem $$\label{equiv}{\mathbf{A}} {\bf u} = {\mathbf{f}}$$ (2.5) on $$\ell^{2}({\mathcal{S}} \times {\mathcal{F}})$$, where $$\label{Arep2}{\mathbf{A}} := \sum_{j\geq 0} {\mathbf{A}}_j \otimes {\mathbf{M}}_j$$ (2.6) and $${\bf u}=\big({\bf u}_{\lambda,\nu}\big)_{(\lambda,\nu)\in {\mathcal{S}}\times{\mathcal{F}}}$$ is the coordinate vector of $$u$$ in the basis $$\{\psi_\lambda \otimes L_\nu\}_{(\lambda,\nu)\in {\mathcal{S}}\times{\mathcal{F}}}$$. Regarding $$\nu\in{\mathcal{F}}$$ as the column index of the infinite matrix $${\bf u}=({\bf u}_{\lambda,\nu})_{\lambda \in {\mathcal{S}},\nu\in{\mathcal{F}}}$$, we denote by $${\bf u}_\nu$$ the columns of $${\bf u}$$, which are precisely the basis representations of the Legendre coefficients $$u_\nu\in V$$. In what follows, we always denote by $${\lVert{\cdot}\rVert}$$ the $$\ell^{2}$$-norm on the respective index set, which could be $${\mathcal{S}}$$, $${\mathcal{F}}$$ or $${\mathcal{S}}\times{\mathcal{F}}$$, or the corresponding operator norm when this is clear from the context.
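For a single parameter ($$d=1$$) and Legendre degrees truncated at $$m$$, the system (2.5)–(2.6) can be assembled explicitly with Kronecker products. A toy sketch, with hypothetical $$2\times 2$$ spatial matrices standing in for $${\mathbf{A}}_0$$ and $${\mathbf{A}}_1$$ (the coefficients $$p_n$$ defining $${\mathbf{M}}_1$$ are those of (2.12)):

```python
import numpy as np

def p(n):
    # recurrence coefficients, cf. (2.12): p_0 = 0, p_n = 1/sqrt(4 - n^-2)
    return 0.0 if n == 0 else 1.0 / np.sqrt(4.0 - n ** -2)

def M_trunc(m):
    # truncation of the bidiagonal matrix (2.13) for a single
    # parameter, Legendre degrees 0..m-1
    M = np.zeros((m, m))
    for k in range(m - 1):
        M[k, k + 1] = M[k + 1, k] = p(k + 1)
    return M

# hypothetical 2-dof spatial matrices: A_0 (mean field) and A_1 (fluctuation)
A0 = np.array([[2.0, -1.0], [-1.0, 2.0]])
A1 = 0.1 * np.eye(2)

m = 4
A = np.kron(A0, np.eye(m)) + np.kron(A1, M_trunc(m))   # (2.6) with d = 1
```

Ellipticity of $${\mathbf{A}}$$ here reflects the uniform ellipticity of $$A(y)$$: the parametric perturbation $${\mathbf{A}}_1\otimes{\mathbf{M}}_1$$ is dominated by the mean-field part.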
Since $$\{\psi_\lambda\}_{\lambda\in {\mathcal{S}}}$$ is a Riesz basis for $$V$$, we have $$\|u_\nu\|_{V}\sim \|{\bf u}_\nu\|$$ uniformly in $$\nu\in{\mathcal{F}}$$, which together with boundedness and ellipticity of $$A$$ implies that $${\mathbf{A}}$$ is bounded and elliptic on $$\ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$, and that we have $$\label{normequiv2}\|{\bf u}\|\sim \|{\mathbf{A}}{\bf u}\|\sim \|Au\|_{L^2({Y},V')}\sim \|u\|_{L^2({Y},V)}$$ (2.7) with uniform constants. On account of (2.7), solving (2.1) approximately up to some target accuracy is equivalent to solving (2.5) in $$\ell^{2}$$ to essentially the same accuracy. As a further consequence, one can find a fixed positive $$\omega$$ such that $${\lVert{\mathbf{I} - \omega {\mathbf{A}}}\rVert}\le \rho <1$$, ensuring that a simple Richardson iteration converges with a fixed error reduction rate per step. This serves as the conceptual starting point for the adaptive low-rank approximation scheme introduced in Bachmayr & Dahmen (2015) as given in Algorithm 2.1. This basic algorithmic template can be used to produce various types of sparse and low-rank approximations, with appropriate choices of the subroutines $${\small{\text{APPLY}}}$$, $${\small{\text{RHS}}}$$, $${\small{\text{COARSEN}}}$$ and $${\small{\text{RECOMPRESS}}}$$. The procedures $${\small{\text{COARSEN}}}$$ and $${\small{\text{RECOMPRESS}}}$$ are independent of the considered $${\mathbf{A}}$$ and $${\mathbf{f}}$$, and satisfy $$\label{craccuracy}{\lVert{{\small{\text{COARSEN}}}({\bf v} ; \eta ) - {\bf v} }\rVert} \leq \eta,\quad {\lVert{{\small{\text{RECOMPRESS}}}({\bf v};\eta) - {\bf v}}\rVert} \leq \eta,$$ (2.8) for any $$\eta \geq 0$$ and any compactly supported $${\bf v} \in \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$. Here, $${\small{\text{COARSEN}}}$$ is intended to reduce the support of the sequence $${\bf v}$$, whereas $${\small{\text{RECOMPRESS}}}$$ reduces the rank of $${\bf v}$$ in a low-rank tensor representation.
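Schematically, the template is a perturbed Richardson iteration: a few damped steps reduce the error by a fixed factor, after which the iterate is coarsened (and, in the low-rank case, recompressed) with tolerances tied to the current a posteriori error bound. The following deliberately simplified Python sketch uses exact matrix products in place of the APPLY and RHS routines and entrywise thresholding as a stand-in for COARSEN, with RECOMPRESS taken as the identity as in the sparse case; it is a schematic of the idea, not the actual Algorithm 2.1:

```python
import numpy as np

def coarsen(v, eta):
    # drop the smallest entries so that the omitted tail is <= eta, cf. (2.8)
    idx = np.argsort(np.abs(v))                 # ascending
    tails = np.sqrt(np.cumsum(v[idx] ** 2))
    v = v.copy()
    v[idx[tails <= eta]] = 0.0
    return v

def generic_solver(A, f, eps, omega, rho):
    # perturbed Richardson iteration with ||I - omega*A|| <= rho < 1;
    # err majorizes the true error since ||A^-1|| <= omega / (1 - rho)
    u = np.zeros_like(f)
    while True:
        for _ in range(8):                      # fixed error reduction rho^8
            u = u + omega * (f - A @ u)
        err = omega * np.linalg.norm(f - A @ u) / (1.0 - rho)
        if err <= eps:
            return u
        u = coarsen(u, err)                     # RECOMPRESS would act here too

A = np.array([[2.0, 0.3], [0.3, 1.0]])          # hypothetical SPD system
f = np.array([1.0, 1.0])
u = generic_solver(A, f, 1e-8, 2.0 / 3.0, 0.5)
```

The a posteriori bound `err` is computable from the data alone, which is what allows the algorithm to certify the accuracy of its output without knowledge of the exact solution.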
The particular realizations of these routines depend on the dimensionality of the problem and on the type of approximation. We shall use the constructions given in Bachmayr & Dahmen (2015). In the case of the sparse approximations considered in Section 4, $${\small{\text{RECOMPRESS}}}$$ is chosen as the identity, and Algorithm 2.1 essentially reduces to the method analysed in Cohen et al. (2002). The routines $${\small{\text{APPLY}}}$$ and $${\small{\text{RHS}}}$$ are assumed to satisfy, for compactly supported $${\bf v}$$ and any $$\eta>0$$, the requirements $$\label{applyaccuracy}{\lVert{{\small{\text{APPLY}}}({\bf v} ; \eta ) - {\mathbf{A}} {\bf v} }\rVert} \leq \eta,\quad {\lVert{{\small{\text{RHS}}}(\eta) - {\mathbf{f}}}\rVert} \leq \eta.$$ (2.9) Their construction depends not only on the type of approximation, but also on the specific problem under consideration. These two routines are indeed the main drivers of adaptivity in Algorithm 2.1, and a major part of what follows concerns the construction of $${\small{\text{APPLY}}}$$ in different scenarios. It hinges on the compression of matrices by exploiting their near-sparsity in certain basis representations. We use the following notion introduced in Cohen et al. (2001): A bi-infinite matrix $${\mathbf{B}}$$ is called $$s^*$$-compressible if there exist matrices $${\mathbf{B}}_{n}$$ with $$\alpha_{n} 2^n$$ entries per row and column such that $$\label{compress}{\lVert{{\mathbf{B}} - {\mathbf{B}}_{n}}\rVert} \leq \beta_{n} 2^{-s n}, \quad \mbox{ for }\,\, 0 < s < s^*,$$ (2.10) where the sequences $${\boldsymbol{\alpha}} = (\alpha_n)_{n\in{\mathbb{N}}_0}$$ and $${\boldsymbol{\beta}} = (\beta_n)_{n\in{\mathbb{N}}_0}$$ are summable. Here, we always assume $${\mathbf{B}}_{0} = 0$$.
Remark 2.1 As shown in Bachmayr & Dahmen (2015), regardless of the specifications of the routines $${\small{\text{APPLY}}}$$, $${\small{\text{RHS}}}$$, $${\small{\text{COARSEN}}}$$ and $${\small{\text{RECOMPRESS}}}$$, Algorithm 2.1 terminates after finitely many steps and its output $${\bf u}_\varepsilon$$ satisfies $$\|{\bf u} - {\bf u}_\varepsilon\|\le \varepsilon$$. At this point, we record for later use a particular feature of $${\mathbf{A}}$$ that arises as a consequence of our choice of tensor product orthogonal polynomials for the parameter dependence: the approximate application of $${\mathbf{A}}$$ is facilitated by the fact that the matrices $${\mathbf{M}}_j$$ are bidiagonal. That is, in view of the three-term recurrence relation $$\label{threeterm}tL_n(t)= p_{n+1} L_{n+1}(t) + p_{n} L_{n-1}(t), \quad L_{-1}\equiv 0,$$ (2.11) where $$\label{recurr_coeff}p_0 = 0, \qquad p_n= \frac{1}{\sqrt{4 - n^{-2}}} , \quad n>0,$$ (2.12) one has $$\int_Y y_j\,L_\nu(y)L_{\nu'}(y)\,{\rm{d}}\mu(y) =0$$ unless $$| \nu_j - \nu'_j | = 1$$ and $$\nu_i = \nu'_i$$ for $$i \neq j$$, so that $$\label{threeterm_matrix}({\mathbf{M}}_j)_{\nu,\nu'}= p_{\nu_j + 1}\delta_{\nu +{\it \,{e}}^j,\nu'}+ p_{\nu_j}\delta_{\nu-{\it \,{e}}^j,\nu'}$$ (2.13) with the Kronecker sequence $$({\it {e}}^j_i)_{i\in {\mathcal{I}}}:=(\delta_{i,j} )_{i\in{\mathcal{I}}} \in {\mathcal{F}}$$. 3. Hierarchical tensor representations in the case of finitely many parameters We begin by considering the setting $$\label{cId}{\mathcal{I}}=\{1,\dots,d\},$$ (3.1) where $${\mathcal{F}} = {\mathbb{N}}_0^d$$ and $${\bf u} \in \ell^{2}({\mathcal{S}} \times {\mathbb{N}}_0\times\cdots\times{\mathbb{N}}_0)$$. Here, we are interested in the case that all coordinates in $${\mathcal{I}}$$ have comparable influence. As illustrated in Section 6, a direct sparse Legendre expansion of $${\bf u}$$ over $${\mathcal{S}}\times{\mathcal{F}}$$ will then in general be infeasible already for moderately large $$d$$.
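The coefficients (2.12) and the recurrence (2.11) behind the bidiagonal structure (2.13) are easy to verify numerically. A short check using numpy's Legendre module, with $$L_n = \sqrt{2n+1}\,P_n$$ orthonormal with respect to the uniform probability measure on $$[-1,1]$$:

```python
import numpy as np
from numpy.polynomial import legendre as leg

def p(n):
    # recurrence coefficients (2.12)
    return 0.0 if n == 0 else 1.0 / np.sqrt(4.0 - n ** -2)

def L(n, t):
    # Legendre polynomials orthonormal w.r.t. dmu = dt/2 on [-1,1]
    c = np.zeros(n + 1)
    c[n] = 1.0
    return np.sqrt(2.0 * n + 1.0) * leg.legval(t, c)

t = np.linspace(-1.0, 1.0, 9)
ok = all(
    np.allclose(t * L(n, t), p(n + 1) * L(n + 1, t) + p(n) * L(n - 1, t))
    for n in range(1, 6)
)
```

Consequently, each $${\mathbf{M}}_j$$ couples only Legendre degrees $$\nu_j$$ and $$\nu_j\pm1$$ in the $$j$$th variable, which is what makes its application cheap.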
However, one may as well exploit Cartesian product structure in $${\mathcal{F}}$$, regarding $${\bf u}$$ as a higher-order tensor, and use corresponding hierarchical low-rank representations. As we shall detail in what follows, the results of Bachmayr & Dahmen (2015) can be adapted to this problem in a rather straightforward manner. It will be convenient to introduce a numbering of tensor modes as follows: $${\mathcal{G}}_{\rm x} := {\mathcal{S}}$$, $${\mathcal{G}}_1 := {\mathbb{N}}_0, \ldots$$, $${\mathcal{G}}_d := {\mathbb{N}}_0$$. We additionally introduce the notation $\hat{\mathcal{I}} := \{{\rm x}\}\cup{\mathcal{I}} .$ The representations of higher-order tensors that we consider are built on the Hilbert–Schmidt case via matricizations: for each nonempty $$M \subset \hat{\mathcal{I}}$$, any $${\bf v} \in \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$ induces a compact operator $$T_{\bf v}^{(M)} \colon \ell^{2}({\Large{\times}}_{i\in \hat{\mathcal{I}}\setminus M} {\mathcal{G}}_i) \to \ell^{2}({\Large{\times}}_{i\in M} {\mathcal{G}}_i)$$. In terms of the left singular vectors $$\{ {\bf U}^{(i)}_k \}_{k \in {\mathbb{N}}}$$ of $$T_{\bf v}^{(\{i\})}$$, $$i\in \hat{\mathcal{I}}$$, we obtain the HOSVD representation (Lathauwer et al., 2000) in the Tucker format (Tucker, 1964, 1966), $$\label{tuckerformat}{\bf v} = \sum_{k_{\rm x}=1}^{r_{\rm x}} \sum_{k_1 = 1}^{r_1} \cdots \sum_{k_d=1}^{r_d} {\bf a}_{k_{\rm x},k_1,\ldots,k_d} \bigotimes_{i\in\hat{\mathcal{I}}} {\bf U}^{(i)}_{k_i}.$$ (3.2) Here the tensor $${\bf a} = ({\bf a}_{\mathsf{k}})_{\mathsf{k}\in{\mathbb{N}}^{d+1}}$$ of order $$d+1$$ is referred to as core tensor and $$(r_{\rm x},r_1,\ldots,r_d)$$ as the multilinear ranks of $${\bf v}$$. The hierarchical tensor format (Hackbusch & Kühn, 2009), on which the variant of our scheme described in this section is based, can be interpreted as a further decomposition of $${\bf a}$$ into tensors of order at most three. 
This decomposition is obtained using further matricizations of the tensor according to a recursive decomposition of the set of modes $$\hat{\mathcal{I}}$$ into a binary tree, which we denote by $$\mathcal{D}$$. For simplicity, we focus in our exposition on linear trees corresponding to factorizations (1.15), where $$\label{lindimtree}\mathcal{D} = \bigl\{ \hat{\mathcal{I}}, \{{\rm x}\}, \{1,\ldots,d\}, \{1\}, \{2,\ldots, d\}, \ldots, \{d-1\}, \{d\} \bigr\}.$$ (3.3) For each $$\alpha \in \mathcal{D}$$, the rank of the corresponding matricization $$T^{(\alpha)}_{\bf v}$$ is denoted by $${\mathrm{rank}}_\alpha({\bf v})$$, its singular values by $$(\sigma^{(\alpha)}_k)_{k\in{\mathbb{N}}}$$. Here, $${\mathrm{rank}}_{\hat{\mathcal{I}}}({\bf v}) = 1$$ for all $${\bf v} \neq 0$$, and we set $${\mathrm{rank}}({\bf v}) := \bigl({\mathrm{rank}}_\alpha({\bf v})\bigr)_{\alpha\in\mathcal{D}\setminus\hat{\mathcal{I}}}.\label{multirank}$$ (3.4) In this section, we specialize the generic template Algorithm 2.1 to produce approximate solutions to (2.1) of the form (STF) with core tensor in hierarchical form as in (1.15). More precisely, the output is given in the form of a tensor $${\bf u}_\varepsilon$$ of order $$d+1$$, supported on $${\it{\Lambda}}^\varepsilon_{\rm x}\times {\it{\Lambda}}^\varepsilon_1 \times\cdots\times{\it{\Lambda}}^\varepsilon_d$$ with finite $${\it{\Lambda}}^\varepsilon_i \subset {\mathcal{G}}_i$$. 
This tensor is given (assuming $$\mathcal{D}$$ as in (3.3)) in the following form: let $$r^\varepsilon_i := {\mathrm{rank}}_{\{i\}}({\bf u}_\varepsilon)$$ for $$i \in \hat {\mathcal{I}}$$ and $$\tilde r^\varepsilon_i := {\mathrm{rank}}_{\{i+1,\ldots, d\}}({\bf u}_\varepsilon)$$ for $$i = 1,\ldots,d-1$$, then \begin{align*}{\bf u}_{\varepsilon, \lambda,\nu_1,\ldots,\nu_d} &= \sum_{k_{\rm x}=1}^{r^\varepsilon_{\rm x}} \cdots \sum_{k_d=1}^{r^\varepsilon_d} \sum_{\ell_1=1}^{\tilde r^\varepsilon_1}\cdots\sum_{\ell_{d-1}=1}^{\tilde r^\varepsilon_{d-1}} {\mathbf{M}}^{(1),\varepsilon}_{k_{\rm x},\ell_1} \,{\mathbf{M}}^{(2),\varepsilon}_{\ell_1,k_1,\ell_2} \cdots {\mathbf{M}}^{(d-1),\varepsilon}_{\ell_{d-2},k_{d-2},\ell_{d-1}} \\ &\quad\times {\mathbf{M}}^{(d),\varepsilon}_{\ell_{d-1},k_{d-1},k_d} \, {\bf U}^{({\rm x}),\varepsilon}_{k_{\rm x}, \lambda} \, {\bf U}^{(1),\varepsilon}_{k_1, \nu_1}\cdots {\bf U}^{(d),\varepsilon}_{k_d, \nu_d}, \quad \lambda\in {\it{\Lambda}}^\varepsilon_{\rm x},\; \nu_i \in {\it{\Lambda}}^\varepsilon_i. \end{align*} The adaptive scheme identifies, in an intertwined fashion, the ranks $$r^\varepsilon_i$$, $$\tilde r^\varepsilon_i$$, the sets $${\it{\Lambda}}^\varepsilon_i$$, as well as the coefficient tensors $${\mathbf{M}}^{(i),\varepsilon}$$ and $${\bf U}^{(i),\varepsilon}$$ of $${\bf u}_\varepsilon$$. The function represented by $${\bf u}_\varepsilon$$ has precisely the form (STF), where $u^{{\rm x}}_{k_{\rm x}} = \sum_{\lambda \in {\mathcal{S}}} {\bf U}^{({\rm x}),\varepsilon}_{k_{\rm x}, \lambda} \psi_\lambda, \quad u^{{\rm y},i}_{k_i} = \sum_{n \in {\mathcal{G}}_i} {\bf U}^{(i),\varepsilon}_{k_i, n} L_n .$ The hierarchical format can offer substantially more favourable complexity characteristics for large $$d$$ than (3.2). The left singular vectors of the involved matricizations yield a hierarchical singular value decomposition (Grasedyck, 2010). We refer also to Falcó & Hackbusch (2012), Grasedyck et al. 
(2013), Hackbusch & Kühn (2009), Hackbusch (2012) and Kolda & Bader (2009) for detailed expositions regarding the finitely supported case (see also Oseledets & Tyrtyshnikov, 2009; Oseledets, 2011 for the related tensor train representation) and to Bachmayr & Dahmen (2015) for analogous results for tensors in sequence spaces, with notation analogous to this article. For quantifying the approximability of tensors on $${\Large{\times}}_{i \in \hat{\mathcal{I}}}{\mathcal{G}}_i$$ in terms of the best selection of finite $${\it{\Lambda}}^\varepsilon_i \subset {\mathcal{G}}_i$$ as above, a pivotal role is played by the quantities $$\label{contr}\pi^{(i)}({\bf v}) \,{:=}\, \bigl(\pi^{(i)}_{\nu_i}({\bf v})\bigr)_{\nu_i\in {\mathcal{G}}_i},\quad \pi^{(i)}_{\mu}({\bf v}) \,{:=}\, \left(\sum_{\substack{\nu\in{\mathcal{F}}\\ \nu_i=\mu}}|{\bf v}_\nu|^2\right)^{1/2} , \quad i \in \hat{\mathcal{I}} = \{{\rm x}, 1, \ldots, d\}.$$ (3.5) They are introduced in Bachmayr & Dahmen (2015) and called contractions in analogy to the terminology in tensor analysis. An efficient evaluation (without any $$d$$-dimensional summations) is possible due to the relation $$\label{sumN}\pi^{(i)}_{\mu}({\bf v})= \left(\sum_{k}|{\bf U}^{(i)}_{k,\mu}|^2|\sigma^{(i)}_k|^2\right)^{1/2},$$ (3.6) where $$\sigma^{(i)}_k$$ are the mode-$$i$$ singular values of $${\bf v}$$. As in our previous notation, we abbreviate $${\mathrm{supp}}_i {\bf v}:= {\mathrm{supp}}\bigl(\pi^{(i)}({\bf v})\bigr)$$, $$i \in \hat{\mathcal{I}}$$. 3.1 Adaptive scheme In this case, we consider Algorithm 2.1 with the routines $${\small{\text{RECOMPRESS}}}$$ and $${\small{\text{COARSEN}}}$$ for the hierarchical format as given in Bachmayr & Dahmen (2015, Remark 15). $${\small{\text{RECOMPRESS}}}$$ is based on a truncation of a hierarchical singular value decomposition up to a prescribed accuracy $$\eta>0$$, which can be ensured based on the $$\ell^2$$-norm of omitted singular values of matricizations.
We denote this operation by $$\hat{\mathrm{P}}_{\eta}$$. As shown in Grasedyck (2010), it satisfies the quasi-optimality property $$\label{recompress quasiopt}{\lVert{ {\bf v} - \hat{\mathrm{P}}_{\eta}({\bf v})}\rVert} \leq \sqrt{2d-3} \inf \bigl\{ {\lVert{{\bf v} - {\bf w}}\rVert} \colon {{\mathrm{rank}}({\bf w})\leq {\mathrm{rank}}(\hat{\mathrm{P}}_{\eta}({\bf v}))} \bigr\},$$ (3.7) with the inequality between ranks as defined in (3.4) to be understood componentwise. $${\small{\text{COARSEN}}}$$ retains the degrees of freedom for each mode that correspond to the largest contractions (3.5). Let $$(\mu^*_{i,k})_{k \in {\mathbb{N}}}$$ be such that $$(\pi^{(i)}_{\mu^*_{i,k}}({\bf v}))_{k\in{\mathbb{N}}}$$ is nonincreasing. Denote for $${\it{\Lambda}} \subset {\mathcal{S}}\times{\mathcal{F}}$$ by $${\mathrm{R}_{{\it{\Lambda}}}}{\bf v}$$ the array obtained by retaining all entries of $${\bf v}$$ corresponding to indices in $${\it{\Lambda}}$$, while replacing all others by zero. Given $$\eta >0$$, we define the product set ${\it{\Lambda}}(\eta)= {\Large{\times}}_{i \in \hat{\mathcal{I}}}\{ \mu^*_{i,k} \colon k \leq N_i\},$ where $$N_i$$, $$i\in \hat{\mathcal{I}}$$, are chosen such that $$\sum_{i \in \hat{\mathcal{I}}} N_i$$ is minimal subject to the condition $$\label{hatC} \left( \sum_{i \in \hat{\mathcal{I}}} \sum_{k > N_i\vphantom{\hat{\mathcal{I}}}}|\pi^{(i)}_{\mu^*_{i,k}}({\bf v})|^2 \right)^{1/2}\leq \eta.$$ (3.8) Noting that the left side in (3.8) is an upper bound for $${\lVert{{\bf v} - {\mathrm{R}_{{\it{\Lambda}}(\eta)}}{\bf v}}\rVert}$$, we define $${\small{\text{COARSEN}}}$$ as a numerical realization of $$\hat{\mathrm{C}}_{\eta}{\bf v} := {\mathrm{R}_{{\it{\Lambda}}(\eta)}}{\bf v}$$, for which one has an analogous quasi-optimality property as in (3.7) with constant $$\sqrt{d}$$.
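The selection of the retained product set from the contractions can be sketched as a greedy procedure: repeatedly keep the degree of freedom with the largest remaining contraction until the tail bound (3.8) drops below $$\eta$$. (Since within each mode the contractions are processed in decreasing order, this greedy choice minimizes $$\sum_i N_i$$; the actual routine of Bachmayr & Dahmen (2015) achieves the same with approximate sorting.) A sketch for a small dense tensor:

```python
import numpy as np

def contractions(v):
    # pi^(i)_mu(v), cf. (3.5): l2 norm over all modes except i
    d = v.ndim
    sq = v ** 2
    return [np.sqrt(sq.sum(axis=tuple(a for a in range(d) if a != i)))
            for i in range(d)]

def coarsen_product_set(v, eta):
    # retain, per mode, the indices with largest contractions such that
    # the tail bound (3.8) is <= eta, growing the sets greedily
    pis = contractions(v)
    order = [np.argsort(pi)[::-1] for pi in pis]      # descending
    N = [0] * v.ndim
    def tail():
        return np.sqrt(sum((pis[i][order[i][N[i]:]] ** 2).sum()
                           for i in range(v.ndim)))
    while tail() > eta:
        i = max((i for i in range(v.ndim) if N[i] < v.shape[i]),
                key=lambda i: pis[i][order[i][N[i]]])
        N[i] += 1
    return [set(order[i][:N[i]].tolist()) for i in range(v.ndim)]

v = np.zeros((3, 3, 3))
v[0, 1, 2] = 2.0                    # dominant entry
v[1, 0, 0] = 0.01                   # small perturbation
Lambda = coarsen_product_set(v, 0.1)
```

For this toy tensor, the dominant entry forces exactly one index per mode into the retained product set, while the small perturbation falls below the tolerance.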
Furthermore, in this case of finitely many parameters, $${\mathbf{A}}$$ as defined in (2.6) is a finite sum of Kronecker product operators, ${\mathbf{A}} = \sum_{j=0}^d {\mathbf{A}}_j \otimes {\mathbf{M}}_j,$ which considerably simplifies the construction of the corresponding routine $${\small{\text{APPLY}}}$$. The action of $${\mathbf{A}}$$ can thus increase each hierarchical rank of its argument by a factor of at most $$d+1$$. Remark 3.1 In contrast to the case considered in Bachmayr & Dahmen (2016), here the Hilbert space $${\mathcal{H}} =V \otimes L^{2}({Y})$$ on which the problem is posed is endowed with a cross norm. As a consequence, the isomorphism that takes $$v\in {\mathcal{H}}$$ to its coefficients $${\bf v}\in \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$ with respect to the tensor product basis is of Kronecker rank one. The original low-rank structure (1.1) of $$A(y)$$ is therefore preserved in the $$\ell^{2}$$-representation (2.6) of the problem. Consequently, the routine $${\small{\text{APPLY}}}$$ that adaptively approximates the action of $${\mathbf{A}}$$ can be obtained following the generic construction given in Bachmayr & Dahmen (2015), provided that the operators $${\mathbf{A}}_j$$ and $${\mathbf{M}}_j$$ acting on each mode have the required compressibility properties. Recall that by (2.13), the infinite matrices $${\mathbf{M}}_j$$ are bidiagonal and, hence, do not require any further approximation. To use the construction of Bachmayr & Dahmen (2015), we thus only need that the operators $${\mathbf{A}}_0, \ldots, {\mathbf{A}}_d$$ acting on the spatial variables are $$s^*$$-compressible. 3.2 Convergence analysis Our complexity results aim at the following type of statements: given a certain approximability of the solution, the algorithm recovers the corresponding convergence rates without their explicit knowledge.
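The rank growth under such a Kronecker sum can be seen in a minimal two-mode (matrix) analogue: each term $${\mathbf{A}}_j \otimes {\mathbf{M}}_j$$ maps a factored $$V = XY^{\top}$$ to $$(\mathbf{A}_j X)(\mathbf{M}_j Y)^{\top}$$, so stacking the factors of the $$d+1$$ terms realizes the sum at rank at most $$(d+1)r$$. The sketch below uses random stand-in matrices rather than the operators of the article:

```python
import numpy as np

# Two-mode analogue (spatial mode x parametric mode) of applying
# A = sum_{j=0}^d A_j (x) M_j to a rank-r argument kept in factored form.
rng = np.random.default_rng(1)
n, m, r, d = 6, 7, 2, 3
A = [rng.standard_normal((n, n)) for _ in range(d + 1)]
M = [rng.standard_normal((m, m)) for _ in range(d + 1)]
X = rng.standard_normal((n, r))   # spatial factor of V = X @ Y.T
Y = rng.standard_normal((m, r))   # parametric factor

# Factored application: stack the per-term factors instead of forming dense sums.
X_out = np.hstack([Aj @ X for Aj in A])   # n x (d+1)r
Y_out = np.hstack([Mj @ Y for Mj in M])   # m x (d+1)r

# Agrees with the dense evaluation; the rank grew by a factor of at most d+1.
dense = sum(Aj @ (X @ Y.T) @ Mj.T for Aj, Mj in zip(A, M))
assert np.allclose(X_out @ Y_out.T, dense)
assert X_out.shape[1] == (d + 1) * r
```

In the hierarchical format the same mechanism applies to every matricization, which is why each hierarchical rank is multiplied by at most $$d+1$$.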
To describe these approximability properties, we now recall the definition of approximation classes to quantify the convergence of hierarchical low-rank approximations from Bachmayr & Dahmen (2015), in terms of the hierarchical rank defined by (3.4). Let $$\gamma = \bigl(\gamma(n)\bigr)_{n\in {\mathbb{N}}_0}$$ be positive and strictly increasing with $$\gamma(0)=1$$ and $$\gamma(n)\to\infty$$ as $$n\to\infty$$, and for $${\bf v} \in \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$ let ${\lvert{{\bf v}}\rvert}_{{\mathcal{A}_\mathcal{H}({\gamma})}} \,{:=}\, \sup_{r\in{\mathbb{N}}_0} \gamma({r})\,\inf_{{\lvert{{\mathrm{rank}}({\bf w})}\rvert}_\infty\leq r} {\lVert{{\bf v} - {\bf w}}\rVert} ,$ where $$\inf \{ {\lVert{{\bf v} - {\bf w}}\rVert}\colon {\lvert{{\mathrm{rank}}({\bf w})}\rvert}_\infty\leq r\}$$ – that is, the errors of approximation by hierarchical ranks at most $$r$$ – can be bounded in terms of the hierarchical singular values $$(\sigma^{(\alpha)}_k)_{k\in{\mathbb{N}}}$$ for $$\alpha\in{\cal D}$$. We introduce the approximation classes ${\mathcal{A}_\mathcal{H}({\gamma})} \,{:=}\, \bigl\{{\bf v}\in \ell^{2}({\mathcal{S}}\times{\mathcal{F}}) : {\lvert{{\bf v}}\rvert}_{{\mathcal{A}_\mathcal{H}({\gamma})}} <\infty \bigr\} , \quad{\lVert{{\bf v}}\rVert}_{{\mathcal{A}_\mathcal{H}({\gamma})}} \,{:=}\, {\lVert{{\bf v}}\rVert} + {\lvert{{\bf v}}\rvert}_{{\mathcal{A}_\mathcal{H}({\gamma})}}.$ We restrict our considerations to $$\gamma$$ that satisfy \begin{equation*}\rho_\gamma \,{:=}\, \sup_{n\in{\mathbb{N}}} {\gamma}(n)/{\gamma}(n-1)<\infty, \end{equation*} which corresponds to a restriction to at most exponential growth; our considerations would still apply for faster growth, but would lead to less sharp results.
For an approximation $${\bf v}$$ of bounded support to $${\bf u}$$, the number of nonzero coefficients $$\#{\mathrm{supp}}_i {\bf v}$$ required in each tensor mode to achieve a certain accuracy depends on the best $$n$$-term approximability of the sequences $$\pi^{(i)}({\bf u})$$. This approximability by sparse sequences is quantified by the classical approximation classes $${\mathcal{A}^s} = {\mathcal{A}^s}(\mathcal{J})$$, where $$s>0$$ and $$\mathcal{J}$$ is a countable index set, consisting of all $${\bf w}\in \ell_2(\mathcal{J})$$ for which the quasi-norm $$\label{Asdef}{\lVert{\mathbf{{\bf w}}}\rVert}_{{\mathcal{A}^s}(\mathcal{J})} := \sup_{N\in{\mathbb{N}}_0} (N+1)^s \inf_{\substack{{\it{\Lambda}}\subset \mathcal{J}\\ \#{\it{\Lambda}}\leq N}} {\lVert{{\bf w} - {\mathrm{R}_{{\it{\Lambda}}}} {\bf w}}\rVert}$$ (3.9) is finite. In particular, if $$\pi^{(i)}({\bf u}) \in {\mathcal{A}^s}(\mathcal{G}_i)$$, then these sequences can be approximated within accuracy $$\eta$$ by finitely supported sequences with $${\mathcal{O}}(\eta^{-1/s})$$ nonzero entries. In what follows, we do not explicitly specify the index set in the spaces $${\mathcal{A}^s}$$ when this is clear from the context. We analyse the complexity of the algorithm under the following benchmark assumptions; see the discussion in Section 6.1. Assumption 3.2 For the hierarchical tensor approximation in the case (3.1) of $$d$$ parametric variables, we assume the following: (i) $$\pi^{(i)}({\bf u}), \pi^{(i)}({\mathbf{f}}) \in {\mathcal{A}}^{s}({\mathcal{G}}_i)$$, $$i\in\hat{\mathcal{I}}$$, for an $$s>0$$. (ii) $${\bf u}, {\mathbf{f}} \in {\mathcal{A}}_{\mathcal{H}}(\gamma)$$, where $$\gamma(n) \,{:=}\, {\it \,{e}}^{\bar c n^{1/\bar b}}$$ with $$\bar b\geq 1$$, $$\bar c>0$$.
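The relation between the quasi-norm (3.9) and the $${\mathcal{O}}(\eta^{-1/s})$$ support bound can be checked numerically. The sketch below uses an arbitrary nonincreasing test sequence $$w_n = n^{-(s+1/2)-\delta}$$ of our own choosing (so that its best $$N$$-term approximation simply retains the first $$N$$ entries); it is an illustration, not part of the article's analysis:

```python
import numpy as np

s = 1.0
n = np.arange(1, 10001)
w = n ** -(s + 0.5 + 0.01)  # nonincreasing: best N terms are the first N entries

# tail[N] = || w - R_Lambda w || for Lambda = the N largest entries, cf. (3.9)
tail = np.sqrt(np.cumsum((w**2)[::-1])[::-1])
errs = np.concatenate([tail, [0.0]])        # errors for N = 0, ..., len(w)

# The sup in (3.9): finite, so w lies in the class A^s.
N = np.arange(len(errs))
quasi_norm = float(np.max((N + 1) ** s * errs))

# Accuracy eta is reached with O(eta^{-1/s}) retained entries.
eta = 1e-2
N_eta = int(np.argmax(errs <= eta))         # smallest N with error <= eta
assert errs[N_eta] <= eta
```

Here $$1/s = 1$$, so halving $$\eta$$ roughly doubles the required number of retained entries, in line with the $${\mathcal{O}}(\eta^{-1/s})$$ bound.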
(iii) The operators $${\mathbf{A}}_j$$, $$j = 0, \ldots, d$$, are $$s^*$$-compressible for an $$s^*>s$$; hence there exist matrices $${\mathbf{A}}_{j,n}$$ with $$\alpha_{j,n} 2^n$$ entries per row and column such that $${\lVert{{\mathbf{A}}_j - {\mathbf{A}}_{j,n}}\rVert} \leq \beta_{j,n} 2^{-s n}$$, where the sequences $${\boldsymbol{\alpha}}_j = (\alpha_{j,n})_{n\in{\mathbb{N}}_0}$$ and $${\boldsymbol{\beta}}_j = (\beta_{j,n})_{n\in{\mathbb{N}}_0}$$ are summable. We will use the above assumptions as a reference point for the scaling with respect to $$\varepsilon$$ of the computational complexity. Note that Assumptions 3.2(i) and (ii) mean in particular that best $$N$$-term approximations of $$\pi^{(i)}({\bf u})$$ converge at least as $${\mathcal{O}} (N^{-s})$$ for $$i\in\hat{\mathcal{I}}$$, and that low-rank approximations $${\bf u}_n$$ of $${\bf u}$$ with $${\lvert{{\mathrm{rank}}({\bf u}_n)}\rvert}_\infty\,{\leq}\,n$$ converge as $${\mathcal{O}}(\!{\it {e}}^{-\bar c n^{1/\bar b}})$$. Assumption 3.2(iii) needs to be established for each given problem. Under these conditions, it is possible to realize a routine $${\small{\text{RHS}}}$$ that satisfies the following conditions: for sufficiently small $$\eta>0$$ and $${\mathbf{f}}_\eta:={\small{\text{RHS}}}(\eta)$$, $$\label{rhsass1}\begin{gathered}\sum_{i\in \hat{\mathcal{I}}} \#{\mathrm{supp}}_i ({\mathbf{f}}_\eta) \lesssim d \eta^{-\frac {1}{s}} \left( \sum_{i\in \hat{\mathcal{I}}} {\lVert{\pi^{(i)}({\mathbf{f}})}\rVert}_{{\cal A}^s} \right)^{\frac {1}{s}}, \quad \sum_{i\in \hat{\mathcal{I}}} {\lVert{\pi^{(i)}({\mathbf{f}}_\eta)}\rVert}_{{\cal A}^s} \lesssim d^{1+\max\{1,s\}} \sum_{i\in \hat{\mathcal{I}}} {\lVert{\pi^{(i)}({\mathbf{f}})}\rVert}_{{\cal A}^s}, \\ {\lvert{{\mathrm{rank}}({\mathbf{f}}_\eta)}\rvert}_\infty \lesssim \bigl(d^{-1} \ln ({\lVert{{\mathbf{f}}}\rVert}_{{\mathcal{A}}_{\mathcal{H}}(\gamma)}/\eta) \bigr)^{\bar{b}}, \end{gathered}$$ (3.10) with hidden constants that do not depend on $$d$$.
Such approximations can be obtained, for instance, by combining $${\small{\text{COARSEN}}}$$ and $${\small{\text{RECOMPRESS}}}$$ with parameters adjusted as in Bachmayr & Dahmen (2015, Theorem 7). Assuming full knowledge of $${\mathbf{f}}$$, $${\small{\text{RHS}}}$$ can be realized such that the required number of operations is bounded, with $$C>0$$ independent of $$d$$, by $$\label{rhsass2}C\left( d {\lvert{{\mathrm{rank}}({\mathbf{f}}_\eta)}\rvert}_\infty^3 + {\lvert{{\mathrm{rank}}({\mathbf{f}}_\eta)}\rvert}_\infty \sum_{i \in \hat{\mathcal{I}}} \#{\mathrm{supp}}_i({\mathbf{f}}_\eta) \right)\!,$$ (3.11) and we shall make this idealized assumption in the subsequent analysis. Note that these requirements greatly simplify when the corresponding right-hand side $$f$$ is independent of the parametric variable. To compare different parametric dimensionalities $$d$$ in the complexity bounds, we additionally need a specific reference family of $$d$$-depen