Abstract We consider adaptive approximations of the parameter-to-solution map for elliptic operator equations depending on a large or infinite number of parameters, comparing approximation strategies of different degrees of nonlinearity: sparse polynomial expansions, general low-rank approximations separating spatial and parametric variables, and hierarchical tensor decompositions separating all variables. We describe corresponding adaptive algorithms based on a common generic template and show their near-optimality with respect to natural approximability assumptions for each type of approximation. A central ingredient in the resulting bounds for the total computational complexity is a new operator compression result in the case of infinitely many parameters. We conclude with a comparison of the complexity estimates based on the actual approximability properties of classes of parametric model problems, which shows that the computational costs of optimized low-rank expansions can be significantly lower or higher than those of sparse polynomial expansions, depending on the particular type of parametric problem. 1. Introduction Complex design, optimization or uncertainty quantification tasks based on parameter dependent families of partial differential equations (PDEs) arise in virtually all branches of science and engineering. Typical scenarios are models whose physical properties—such as diffusivity, transport velocity or domain geometry—are described by a finite number of real parameter values. In certain instances, one may even encounter infinitely many parameters of decreasing influence. This occurs, for instance, in the case of a random diffusion field represented by an infinite expansion in a given basis. The development and analysis of numerical strategies for capturing the dependence of the PDE on the parameters has been the subject of intensive research efforts in recent years. 1.1 Problem formulation The problems that are addressed in this article have the following general form. Let $$V$$ be a separable Hilbert space. We consider a parametric operator $$A(y)\colon V\to V'$$ of the form \begin{equation}\label{Arep}A(y) := A_0 + \sum_{j\in{\mathcal{I}}} y_j A_j,\quad y\in{Y} := [-1,1]^{\mathcal{I}}\!, \end{equation} (1.1) where $${\mathcal{I}}=\{1,\dots,d\}$$ or $${\mathcal{I}}= {\mathbb{N}}$$ in the finite or infinite dimensional case, respectively. In the infinite dimensional case, we require that the above series converges in $${\cal L}(V,V')$$ for any $$y\in {Y}$$. We assume uniform boundedness and ellipticity of $$A(y)$$ over the parameter domain, that is, \begin{equation}\label{ellip}\langle A(y)v,w\rangle \leq R \|v\|_V\|w\|_V \quad {\rm and} \quad\langle A(y)v,v\rangle \geq r \|v\|_V^2, \quad v,w\in V, \; y\in {Y}, \end{equation} (1.2) for some $$0<r\leq R<\infty$$, which implies in particular that $$A(y)$$ is boundedly invertible uniformly in $$y\in {Y}$$, with \begin{equation}\|A(y)^{-1}\|_{{\cal L}(V',V)}\leq r^{-1}, \quad y\in {Y}. \end{equation} (1.3) We also consider parametric data $$f\colon {Y} \to V'$$, and for each $$y\in Y$$, we define $$u(y)\in V$$ as the solution to the equation \begin{equation}\label{paramgeneral}A(y)\,u(y) = f(y).
\end{equation} (1.4) A guiding example is provided by affinely parametrized diffusion problems of the form \begin{equation} \label{paramdiffusion-0}A(y) u := -{\mathrm{div}}\bigl( a(y) \nabla u \bigr) = f , \quad a(y) := \bar a + \sum_{j \in {\mathcal{I}}} y_j {\theta}_j , \end{equation} (1.5) with homogeneous Dirichlet boundary conditions, posed in the weak sense on a spatial domain $$D \subset {\mathbb{R}}^m$$. In this particular case of frequent interest, the right-hand side $$f\in V'$$ is independent of $$y$$. The validity of (1.2) is then usually ensured by the uniform ellipticity assumption \begin{equation}\label{uea}\sum_{j\geq 1} {\lvert{{\theta}_j(x)}\rvert} \leq \bar a(x) - \underline{\alpha}, \quad x\in D, \end{equation} (1.6) for some $$\underline{\alpha}>0$$. We then have $$V=H^{1}_0(D)$$, and the corresponding operators $$A_j \colon V \to V'$$ for $$j \in \{0\}\cup {\mathcal{I}}$$ are defined by \begin{equation*}\langle A_0 u, v \rangle := \int_D \bar{a} \nabla u\cdot\nabla v \,{\rm{d}}x ,\qquad \langle A_j u, v \rangle := \int_D {\theta}_j \nabla u\cdot\nabla v \,{\rm{d}}x ,\quad j \in{\mathcal{I}}, \end{equation*} for $$u,v\in V$$. Thus, $$V$$ is typically a function space defined over some physical domain $$D\subset {\mathbb{R}}^m$$, with $$m=1,2,3$$, for example, the Sobolev space $$H^1_0(D)$$ in the above case of second-order elliptic equations with homogeneous Dirichlet boundary conditions. Therefore, the solution may either be viewed as the Hilbert space-valued map \begin{equation}y\mapsto u(y), \label{solutionmap}\end{equation} (1.7) which acts from $${Y}$$ to $$V$$ or as the scalar-valued map \begin{equation}(x,y)\mapsto u(x,y):=u(y)(x), \end{equation} (1.8) where $$x\in D$$ and $$y\in {Y}$$ are referred to as the spatial and parametric variables. Approximating such solution maps amounts to approximating functions of a large or even infinite number of variables. In applications, one is often interested in specific functionals of the solution. Here, we focus on the basic question of approximating the entire solution map in an appropriate norm. The guiding questions, to be made precise below, are the following: What are the most suitable approximations to cope with the high dimensionality in problems of the form (1.4), and what features of problems (1.4) favour certain approaches over others? Moreover, at what numerical cost can one find these approximations, and how do these costs depend on particular features of the given problem? To address the latter question, for each setting, we construct adaptive computational methods that exhibit near-optimal complexity, in a sense to be made precise below. 1.2 Sparse and low-rank approximability Before addressing any concrete numerical schemes, we discuss the basic concepts of approximations for the solution map $$u$$ in (1.7). We focus on the mean squared error $$\|u-\tilde u\|_{L^2({Y},V)}$$ for an approximation $$\tilde u$$, where \[\|v\|_{L^2({Y},V)}^2:=\int_{Y} \|v(y)\|_V^2 {\rm \,{d}}\mu(y), \] for a given probability measure $$\mu$$ over $${Y}$$. In what follows, we assume that $$\mu$$ is the uniform probability measure on $${Y}$$. The results carry over, however, to other product measures on $${Y}$$. The following types of approximation make essential use of the tensor product structure of the Bochner space $$L^2({Y},V)=V\otimes L^2({Y})$$, where $$L^2({Y})=L^2({Y},\mu)$$.
Sparse polynomial expansions A first approach to approximating $$y\mapsto u(y)$$ is to employ an a priori chosen basis $$\{u_1^{{\rm y}}, \dots, u_n^{{\rm y}}\}\subset L^2(Y)$$ and compute the $$u_i^{{\rm x}}\in V$$ as the corresponding coefficients of this approximation. One prominent example of this approach is the orthogonal polynomial expansion method (see, e.g., Ghanem & Spanos, 1990, 1991; Le Maître & Knio, 2010; Xiu, 2010). In this case, the parametric functions $$u_i^{{\rm y}}$$ are picked from the set of tensorized Legendre polynomials \begin{equation} L_\nu(y)=\prod_{j\geq 1} L_{\nu_j}(y_j), \quad \nu=(\nu_j)_{j\geq 1}, \label{tensorleg} \end{equation} (1.9) with $$(L_k)_{k\geq 0}$$ the univariate Legendre polynomial sequence normalized in $$L^2([-1,1],\frac {\,{\rm d}t} {2})$$. The functions $$(L_\nu)_{\nu\in{\mathcal{F}}}$$ form an orthonormal basis of $$L^2({Y})$$, where $${\mathcal{F}}$$ is $${\mathbb{N}}_0^d$$ in the case $${\mathcal{I}}=\{1,\dots,d\}$$ or the set of finitely supported sequences of non-negative integers in the case $${\mathcal{I}}={\mathbb{N}}$$, that is, \begin{equation} {\mathcal{F}} := \{ \nu\in{\mathbb{N}}_0^{{\mathbb{N}}} \colon \#\,{\mathrm{supp}}\, \nu < \infty \}.\end{equation} (1.10) One thus has \begin{equation}\label{Legendre} u(y)=\sum_{\nu\in{\mathcal{F}}} u_\nu L_\nu(y), \quad u_\nu= \int_{Y} u(y)L_\nu(y) {\rm \,{d}}\mu(y). \end{equation} (1.11) Then, one natural choice is the best $$n$$-term approximation $$u_n$$ obtained by restricting the above expansion to the set $${\it{\Lambda}}^{\rm y}_{n}\subset {\mathcal{F}}$$ of indices $$\nu$$ corresponding to the $$n$$ largest $$\|u_\nu\|_V$$, because this set minimizes the error $$\|u-u_n\|_{L^2({Y},V)}$$ among all possible choices of $$n$$-term truncations. This strategy for generating sparse polynomial approximations in the context of parametric PDEs was first introduced and analysed in Cohen et al. (2010, 2011). In practice, the set $${\it{\Lambda}}^{\rm y}_n$$ is not accessible, but provides a benchmark for the performance of algorithms. This representation complexity, however, does not yet determine the resulting computational complexity, because the coefficients $$u_\nu$$ in (1.11) in turn need to be approximated. For instance, one may choose a fixed basis $$\{ \psi_\lambda\}_{\lambda\in{\mathcal{S}}}$$ of $$V$$ and expand \[u_\nu = \sum_{\lambda\in {\mathcal{S}}} {\bf u}_{\lambda,\nu} \psi_\lambda \] in (1.11), where $${\bf u}_{\lambda,\nu}\in{\mathbb{R}}$$. The simplest strategy is to use the same discretization for all $$u_\nu$$ by selecting a finite $${\it{\Lambda}}^{\rm x}$$, which yields the approximation \[u \approx \sum_{\nu \in {\it{\Lambda}}_{n}^{\rm y}} \left( \sum_{\lambda\in {\it{\Lambda}}^{\rm x}} {\bf u}_{\lambda,\nu} \psi_\lambda \right)\otimes L_\nu. \] Using instead an independently adapted spatial discretization for each $$u_\nu$$ corresponds to adaptive sparse polynomial approximations of the form \begin{equation}\label{fullnterm}u \approx \sum_{(\lambda, \nu) \in {\it{\Lambda}}} {\bf u}_{\lambda,\nu}\, \psi_\lambda\otimes L_\nu, \end{equation} (ASP) with $${\it{\Lambda}} \subset {\mathcal{S}} \times {\mathcal{F}}$$. It is natural to quantify the complexity of such an approximation by the number of activated degrees of freedom $$\#{\it{\Lambda}}$$.
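As a concrete illustration of the tensorized basis (1.9) and of the best $$n$$-term selection behind $${\it{\Lambda}}^{\rm y}_n$$, the following minimal Python sketch evaluates $$L_\nu$$ for a finitely supported index $$\nu$$ and selects the $$n$$ largest coefficient norms; it assumes the normalization $$L_k=\sqrt{2k+1}\,P_k$$ with respect to $$L^2([-1,1],\frac{{\rm d}t}{2})$$, with $$P_k$$ the classical Legendre polynomials, and the listed coefficient norms are purely hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def L(k, t):
    """Univariate Legendre polynomial normalized in L^2([-1,1], dt/2): L_k = sqrt(2k+1) P_k."""
    c = np.zeros(k + 1); c[k] = 1.0
    return np.sqrt(2 * k + 1) * leg.legval(t, c)

def L_nu(nu, y):
    """Tensorized Legendre polynomial (1.9); nu = {j: nu_j} lists the nonzero entries only."""
    val = 1.0
    for j, k in nu.items():
        val *= L(k, y[j])
    return val

# Evaluation at a parameter point y (only the active coordinates are needed).
y = {1: 0.3, 2: -0.7, 5: 0.1}
print(L_nu({1: 2, 5: 1}, y))

# Best n-term selection: given (hypothetical) norms ||u_nu||_V, the set Lambda_n^y
# consists of the indices of the n largest norms.
norms = {(0, 0): 1.0, (1, 0): 0.5, (0, 1): 0.4, (2, 0): 0.2, (1, 1): 0.15}
n = 3
Lambda_n = sorted(norms, key=norms.get, reverse=True)[:n]
print(Lambda_n)   # [(0, 0), (1, 0), (0, 1)]
```

In an actual computation the norms $$\|u_\nu\|_V$$ are of course not available in advance, which is precisely why $${\it{\Lambda}}^{\rm y}_n$$ serves only as a benchmark.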
Here, one can again ask for best $$N$$-term approximations, now with respect to the fixed basis $$\{\psi_\lambda\otimes L_\nu\}_{\lambda\in{\mathcal{S}},\nu\in{\mathcal{F}}}$$, obtained by minimizing the error over all $${\it{\Lambda}}$$ with $$\#{\it{\Lambda}} = N$$. This now results in a fully discrete approximation. Low-rank approximation More generally, one may consider approximations of the form \begin{equation}\label{basiclrapprox}u \approx u_n :=\sum_{k=1}^n u^{\rm x}_k \otimes u^{\rm y}_k, \end{equation} (1.12) where $$u^{\rm x}_k$$ and $$u^{\rm y}_k$$ are functions of the spatial and parametric variables, respectively. This contains (1.11) as a special case, but we also now allow $$u^{\rm y}_k \in L^2(Y)$$ to be arbitrary functions that are not given a priori, but adapted to the given problem. The shortest expansion of the form (1.12) that achieves a prescribed error in $$L^2({Y},V)$$ is given by truncation of the Hilbert–Schmidt decomposition of $$u$$ interpreted as the operator \begin{equation}\label{Tu}T_u\colon v \mapsto \int_Y u(x,y)v(y){\rm \,{d}}\mu(y), \end{equation} (1.13) acting from $$L^2({Y})$$ to $$V$$. In this context, we define $${\mathrm{rank}}(u)$$ as the rank of the operator $$T_u$$, so that in particular $$u_n$$ with a representation by $$n$$ separable terms as in (1.12) has $${\mathrm{rank}}(u_n)\leq n$$. The functions $$u_1^{\rm x}, \dots, u_n^{\rm x}$$ and $$u_1^{\rm y}, \dots, u_n^{\rm y}$$ are given by the left and right singular functions, respectively, which yield the optimal rank-$$n$$ approximation of $$u$$ in $$L^2({Y},V)$$. This particular system of basis functions is a natural benchmark, as it minimizes the rank $$n=n(\varepsilon)$$ required to ensure a mean squared accuracy $$\varepsilon$$. However, it is not obvious how to compute sufficiently good approximations of these basis functions at affordable cost, a point to be taken up again later. The methods considered in this article are based on computing approximations of both $$u_k^{\rm x}$$ and $$u_k^{\rm y}$$. A first example in this category is a low-rank approximation that aims to approximately realize a truncated Hilbert–Schmidt decomposition, and thereby to meet the aforementioned benchmark. In this case, the error caused by truncation should ideally be balanced against the error in approximating the unknown basis functions $$u^{\rm x}_k, u^{\rm y}_k$$. Note that there exist alternative approaches for deriving computable expansions of the form (1.12), where only the functions $$u_1^{{\rm x}}, \dots, u_n^{{\rm x}}$$ and their span $$V_n$$ are constructed, which we comment on in Section 1.4. To obtain numerically realizable approximations, we may again use bases of $$V$$ and $$L^2(Y)$$ as in (ASP) and consider expansions for $$u^{\rm x}_k$$ and $$u^{\rm y}_k$$ to arrive at fully discrete low-rank approximations of the form \begin{equation}\label{fulllr}u \approx \sum_{k=1}^n \left( \sum_{\lambda\in {\it{\Lambda}}^{\rm x}_k} {\bf u}^{\rm x}_{k,\lambda} \psi_\lambda \right)\otimes \left(\sum_{\nu\in {\it{\Lambda}}^{\rm y}_k} {\bf u}^{\rm y}_{k,\nu} L_\nu \right)\! , \end{equation} (LR) with $${\it{\Lambda}}^{\rm x}_k \subset {\mathcal{S}}$$, $${\it{\Lambda}}^{\rm y}_k\subset {\mathcal{F}}$$, $$k=1,\ldots,n$$. Hierarchical tensor decompositions One may as well go beyond the Hilbert–Schmidt decomposition (1.12) and consider higher-order low-rank tensor representations that correspond to further decompositions of the factors $$u^{\rm y}_k$$ in (1.12).
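Before turning to such higher-order decompositions, it may help to see the rank truncation behind (1.12) and (1.13) in the simplest discrete setting: for a function sampled on finite grids in $$x$$ and $$y$$, the Hilbert–Schmidt decomposition reduces to a matrix singular value decomposition. The following sketch, with a hypothetical smooth bivariate sample standing in for $$u$$, shows the rank-$$n$$ truncation and its error.

```python
import numpy as np

# Hypothetical samples of u(x, y) on tensor grids: U[i, j] ~ u(x_i, y_j).
x = np.linspace(0.0, 1.0, 200)
y = np.linspace(-1.0, 1.0, 150)
U = 1.0 / (2.0 + np.outer(x, y))          # a smooth bivariate toy function

# The SVD of the sample matrix plays the role of the Hilbert-Schmidt decomposition (1.13).
Ux, s, Vy = np.linalg.svd(U, full_matrices=False)

n = 5                                      # rank of the truncated expansion (1.12)
U_n = Ux[:, :n] * s[:n] @ Vy[:n, :]        # sum of the n leading separable terms
err = np.linalg.norm(U - U_n)              # Frobenius norm ~ discrete L^2 error
tail = np.sqrt((s[n:] ** 2).sum())         # l^2 norm of the discarded singular values
print(err, tail)                           # the two values coincide up to rounding
```

The $$\ell^2$$ norm of the discarded singular values equals the error of the best rank-$$n$$ truncation, which is precisely the benchmark referred to above.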
For simplicity, at this point, let us consider this in the finite-dimensional case $$d<\infty$$, possibly after truncating the expansion (1.1) for $$A(y)$$. Introducing an additional tensor decomposition of the factors $${\bf u}^{\rm y}_k$$, we obtain the general approximations in subspace-based tensor formats, \begin{equation} \label{general}u_n = \sum_{k_{\rm x}=1}^{r_{\rm x}} u_{k_{\rm x}}^{\rm x} \otimes \left(\sum_{k_1=1}^{r_1}\cdots\sum_{k_d=1}^{r_d}{\bf a}_{k_{\rm x},k_1,\ldots,k_d}\bigotimes_{j=1}^d u^{{\rm y},j}_{k_j} \right)\!, \end{equation} (STF) where each $$u_{k_j}^{{\rm y},j}$$ is a function of the individual variable $$y_j$$. The minimal $$r_j$$ such that $$u_n$$ can be represented in the form (STF) are called multilinear ranks of $$u_n$$. We confine our discussion to hierarchical tensor representations, with the tensor train format as a special case (see, e.g., Oseledets & Tyrtyshnikov, 2009; Grasedyck, 2010; Hackbusch, 2012), where the high-order core tensor $${{\bf a}}=({\bf a}_{k_{\rm x},k_1,\ldots,k_d})_{k_{\rm x},k_1,\ldots,k_d}$$ is further decomposed in terms of lower-order tensors, based on matricizations of $${\bf a}$$. For instance, if \begin{equation}\label{ranks} \tilde r_i= {\mathrm{rank}}\big({\bf a}_{(k_{\rm x},k_1,\ldots, k_i),(k_{i+1},\ldots, k_d)}\big), \end{equation} (1.14) one has a factorized representation of the form \begin{equation}\label{htcore}{{\bf a}}_{k_{\rm x},k_1,\ldots,k_d} = \sum_{\ell_1=1}^{\tilde r_1}{\mathbf{M}}^{(1)}_{k_{\rm x},\ell_1} \sum_{\ell_2=1}^{\tilde r_2} {\mathbf{M}}^{(2)}_{\ell_1,k_1,\ell_2} \cdots \sum_{\ell_{d-1}=1}^{\tilde r_{d-1}} {\mathbf{M}}^{(d-1)}_{\ell_{d-2},k_{d-2},\ell_{d-1}} {\mathbf{M}}^{(d)}_{\ell_{d-1},k_{d-1},k_d}\end{equation} (1.15) in terms of the tensors $${\mathbf{M}}^{(i)}$$, $$i=1,\ldots,d$$, of order at most three, and only these low-order tensors need to be stored and manipulated. The representation (STF) contains (ASP) and (LR) as special cases. For instance, to recover a sparse polynomial expansion (ASP), let $$\nu(k_{\rm x})$$, $$k_{\rm x}=1,\ldots,r_{\rm x}$$, be an enumeration of elements of $${\mathbb{N}}^d_0$$, and choose $${\bf a}_{k_{\rm x},k_1,\ldots,k_d} = \delta_{\nu(k_{\rm x}),(k_1,\ldots,k_d)}$$, $$u^{{\rm y},j}_{k_j}= L_{k_j}$$. With $$r_{\rm x} = r_1=\ldots=r_d$$ and diagonal core tensor $${\bf a}$$ having nonzero entries $${\bf a}_{k,k,\ldots,k}=1$$ for $$k = 1,\ldots, r_{\rm x}$$, one obtains representations (LR). 1.3 Guiding questions In the different types of approximation outlined above, the degrees of freedom enter in varying degrees of nonlinearity. More strongly nonlinear approximations (STF) with hierarchical decomposition (1.15) can potentially yield more strongly compressed representations, in the sense that the number of degrees of freedom $$n_{\text{dof}}(\varepsilon)$$ required for a target accuracy $$\varepsilon$$ in $$L^2(Y,V)$$ scales more favourably. Handling this stronger compression in the computation of such representations, however, leads to additional difficulties, and the number of required operations $$n_{\rm op}(\varepsilon)$$ may in fact scale less favourably than $$n_{\text{dof}}(\varepsilon)$$. Here, we aim for algorithms which, for each of the above types of approximation, are guaranteed to achieve any prescribed accuracy $$\varepsilon$$ and which are universal. This means that they do not require a priori knowledge on the approximability of the solution (e.g., on the decay of coefficients), but adjust to such approximability automatically.
This goes hand in hand with a mechanism for obtaining a posteriori error bounds, making use only of the given data. This leads us to our first guiding question: (I) For a given parametric problem and approximation format (ASP), (LR) or (STF), can one contrive a universal numerical scheme that can achieve any given target accuracy $$\varepsilon$$, with approximate solutions close to the minimum required representation complexity $$n_{\rm dof} (\varepsilon)$$, and can $$n_{\rm op}$$ be related to $$\varepsilon$$, and hence to $$n_{\rm dof}(\varepsilon)$$? The minimum required representation complexity can be expressed in terms of the intrinsic approximability properties of the parametrized solutions $$u$$ in each of the formats. The corresponding required number of operations also depends on the problem data that are used in the solution process. We construct algorithms, based on a common generic strategy for (ASP), (LR) and (STF), which are near-optimal in this regard. With such algorithms at hand, a natural further question is the following. (II) Which of the approximation types (ASP), (LR) or (STF) is best suited for a given parametric problem, in the sense of leading to the smallest growth of $$n_{\rm op}(\varepsilon)$$ as $$\varepsilon\to 0$$? This amounts to asking for which parametric problems the investment into approximations of higher structural nonlinearity pays off, or conversely, for which problems possible gains in approximation efficiency are offset by more demanding computations. We address this point by analyses of the approximability of model problems, complemented by numerical experiments, with conclusions depending on the particular problem type. For problems with finitely many parameters that are each of comparable influence, hierarchical tensor representations of the form (STF) with (1.15) turn out to be clearly advantageous. In the case of an anisotropic dependence on infinitely many parameters, for representative model problems, we demonstrate that (ASP) can in general yield faster convergence than (LR) or (STF). The particular structure of such infinite parameter expansions also turns out to have a major influence on the efficiency of the adaptive schemes. 1.4 Relation to previous work There are a variety of results on the convergence of sparse polynomial expansions (1.11) (see, e.g., Cohen et al., 2010, 2011; Bachmayr et al., 2017). Furthermore, some estimates are available that include multilevel spatial discretizations and, hence, provide upper bounds for the error of best $$n$$-term approximation (ASP) (see, e.g., Cohen et al., 2011; Dũng, 2015). Concerning our question (II), there are only a few specialized results comparing the different approximation formats. In the case of general bivariate functions, a systematic comparison between sparse grids and low-rank approximation is discussed in Griebel & Harbrecht (2014), showing in particular that for Sobolev classes the latter does not bring any improvement. In the case of high-dimensional functions associated with parametric PDEs, possible gains by low-rank approximations have been identified in Lassila et al. (2013) and Bachmayr & Cohen (2017) by exploiting the particular structure of the problem, both concerning the case of finitely many parameters. There are various approaches for generating sparse polynomial expansions, for instance, based on collocation (Babuška et al., 2010; Beck et al., 2012) or adaptive Taylor expansion (Chkifa et al., 2014).
Note that these strategies do not currently yield a posteriori error bounds for the computed solutions, and their performance is thus described by a priori estimates that may not be sharp. The adaptive methods proposed in Eigel et al. (2014, 2015), based on finite element discretization for the spatial variable, yield a posteriori error bounds for the full approximations. However, the complexity bounds proved in Eigel et al. (2015) are given only in terms of the resulting finite element meshes. Adaptive schemes using wavelet-based spatial discretizations, which yield approximations of the form (ASP), have been studied by Gittelson (2013, 2014). In this case, bounds for the complete computational complexity are proved which, however, do not fully reflect the approximability properties of the solution. Reduced basis and proper orthogonal decomposition (POD) methods (Kahlbacher & Volkwein, 2007; Rozza et al., 2008; Lassila et al., 2013; Rozza, 2014) correspond to expansions of the form (1.12), where only the spatial basis elements $$u^{\rm x}_k$$ spanning $$V_n$$ are explicitly computed in an offline stage. Then, in an online stage, for any given $$y\in {Y}$$, the approximate solution $$u_r(y)$$ is defined as the Galerkin projection of $$u(y)$$ on the space $$V_n$$. For known variants of these methods, accuracy guarantees in the respective norms (where reduced basis methods usually aim at the error in $$L^\infty$$-norm $$\|v\|_{L^\infty({Y},V)} :=\sup_{y\in {Y}} \|v(y)\|_V$$) require a sufficiently dense sampling of the parameter domain. This becomes prohibitive for large $$d$$, and one obtains a posteriori bounds only for the resulting $$V$$-error at each given $$y\in{Y}$$. In methods based on higher-order tensor representations, instead of sampling in the parameter domain, one also approximates $$u^{\rm y}_k$$ as in (STF), at the price of additional approximability requirements as in (1.15). A variety of schemes have been proposed that operate on fixed discretizations (Khoromskij & Oseledets, 2010; Khoromskij & Schwab, 2011; Kressner & Tobler, 2011; Matthies & Zander, 2012), which do not yield information on the discretization error. Based on Eigel et al. (2014), an adaptive scheme for hierarchical tensor approximation is proposed in Eigel et al. (2017). It provides rigorous a posteriori bounds for the approximation error, but is not proven to converge. 1.5 Novelty of the article and outline Question (I) is addressed in Sections 2–5. A generic algorithm is described in Section 2 based on the work in Bachmayr & Dahmen (2015), which is guaranteed to converge without any a priori assumptions on the solution. Furthermore, it yields rigorous a posteriori error bounds, using only information on the problem data. Suitable specifications cover all the aforementioned types of approximations (ASP), (LR) and (STF). The scheme is formulated in a general sequence space framework, using a discretization of the space $$L^2({Y},V)$$ through a basis with elements of the form $$\psi_\lambda\otimes L_\nu$$. Here, $$\{\psi_\lambda\}_{\lambda\in {\mathcal{S}}}$$ is a given Riesz basis of $$V$$ (e.g., a wavelet basis in the case where $$V$$ is a Sobolev space) and $$\{L_\nu\}_{\nu \in{\mathcal{F}}}$$ is the previously described multivariate Legendre basis. The algorithm performs an iteration in the sequence space $$\ell^2({\mathcal{S}}\times {\mathcal{F}})$$.
It involves at each step specific routines $${\small{\text{RECOMPRESS}}}$$ and $${\small{\text{COARSEN}}}$$ aiming at controlling the ranks of the current approximation as well as the number of degrees of freedom in each of its factors, respectively. We then describe realizations of this generic algorithm corresponding to two distinct settings. In Section 3, we apply the algorithm for the generation of approximations (STF) in the setting of finitely many parametric variables. In this case, the $${\small{\text{RECOMPRESS}}}$$ routine is based on a truncation of a hierarchical singular value decomposition of the coefficient tensor. We analyse the performance of the algorithms for classes described by the decay of the corresponding singular values and joint sparsity of the corresponding singular vectors. Sections 4 and 5 are devoted to the case of anisotropic dependence on infinitely many parameters in the diffusion problem (1.5). In Section 4, we analyse a specialized version of Algorithm 2.1 producing $$n$$-term sparse Legendre expansions, see (ASP). In this version, the routine $${\small{\text{RECOMPRESS}}}$$ is simply the identity, and hence, Algorithm 2.1 agrees with the adaptive solver developed and analysed in Cohen et al. (2002). In Section 5, we consider, in the same setting as in Section 4, a solver for approximations (LR). In this case, the $${\small{\text{RECOMPRESS}}}$$ routine is based on standard SVD truncation. The corresponding notions of approximability are analogous to those arising in Section 3. A key ingredient in Sections 4 and 5 is the adaptive approximation of the operator based on matrix compression results in Appendix A. Here, we obtain new estimates for wavelet-type multilevel expansions of the parametrized coefficients that are more favourable than what is known for Karhunen–Loève-type expansions. Our further algorithmic developments also require substantially weaker assumptions on the $$A_j$$ in (1.1) than the methods in Eigel et al. (2014, 2017), which require summability of $$({\lVert{A_j}\rVert})_{j\geq 1}$$. By the new operator compression results, we establish, in particular, computational complexity estimates for (ASP), which significantly improve on those of similar schemes in Gittelson (2013, 2014). Based on these complexity estimates, Question (II) is then addressed in Section 6. Although the presented algorithms are guaranteed to converge, the corresponding computational cost can only be quantified in terms of approximability properties of solutions $$u$$. In Section 6, we study the corresponding properties, which are different for each realization of the scheme, in representative examples of parametric problems of the form (1.5). In particular, for a certain class of such problems, we prove that the best $$n$$-term Legendre approximation is already asymptotically near-optimal among all rank-$$n$$ approximations. For other examples, we prove that optimized low-rank approximations can achieve significantly better complexity than best $$n$$-term Legendre approximations. This is illustrated further by numerical tests, demonstrating that these observations also hold for more involved model problems. 2. A generic algorithm In this section, we follow the approach developed in Bachmayr & Dahmen (2015) by first reformulating the general equation (1.4) in a sequence space, and then introducing a generic resolution algorithm based on this equivalent formulation. 
We first notice that (1.4) may also be written as \begin{equation} Au=f, \label{paramgeneral1}\end{equation} (2.1) where $$A$$ is elliptic and boundedly invertible from $$L^2({Y},V)$$ to $$L^2({Y},V')$$ and can be defined in a weak sense by \begin{equation}\langle Au,v\rangle :=\int_{{Y}}\langle A(y)u(y),v(y)\rangle {\rm \,{d}}\mu(y), \quad u,v\in L^2({Y},V). \end{equation} (2.2) We assume that $$f\in L^2({Y},V')$$, so that there exists a unique solution $$u\in L^2({Y},V)$$. Given a Riesz basis $$\{ \psi_\lambda \}_{\lambda \in {\mathcal{S}}}$$ of $$V$$, we tensorize it with the orthonormal basis $$\{L_\nu\}_{\nu\in {\mathcal{F}}}$$ of $$L^2({Y})$$. The resulting system $$\{\psi_\lambda \otimes L_\nu\}_{(\lambda,\nu)\in {\mathcal{S}} \times {\mathcal{F}}}$$ is a Riesz basis of $$L^2({Y},V)$$, which we now use to discretize (2.1). For this purpose, we define the matrices \begin{equation}{\mathbf{A}}_j := \bigl( \langle A_j \psi_{\lambda'}, \psi_{\lambda}\rangle \bigr)_{\lambda,\lambda' \in {\mathcal{S}}}\quad{\rm and} \quad {\mathbf{M}}_j = \left( \int_{{Y}} y_j L_\nu(y) L_{\nu'}(y) \,{\rm{d}}\mu(y) \right)_{\nu,\nu'\in{\mathcal{F}}}, \end{equation} (2.3) where $${\mathbf{M}}_0$$ is set to be the identity on $$\ell^2({\mathcal{F}})$$, and the right-hand side column vector \begin{equation}{\mathbf{f}} := \big(\langle f, \psi_\lambda\otimes L_\nu\rangle\big)_{(\lambda,\nu)\in {\mathcal{S}}\times{\mathcal{F}}}. \end{equation} (2.4) We thus obtain an equivalent problem \begin{equation}\label{equiv}{\mathbf{A}} {\bf u} = {\mathbf{f}}\end{equation} (2.5) on $$\ell^{2}({\mathcal{S}} \times {\mathcal{F}})$$, where \begin{equation}\label{Arep2}{\mathbf{A}} := \sum_{j\geq 0} {\mathbf{A}}_j \otimes {\mathbf{M}}_j \end{equation} (2.6) and $${\bf u}=\big({\bf u}_{\lambda,\nu}\big)_{(\lambda,\nu)\in {\mathcal{S}}\times{\mathcal{F}}}$$ is the coordinate vector of $$u$$ in the basis $$\{\psi_\lambda \otimes L_\nu\}_{(\lambda,\nu)\in {\mathcal{S}}\times{\mathcal{F}}}$$. Regarding $$\nu\in{\mathcal{F}}$$ as the column index of the infinite matrix $${\bf u}=({\bf u}_{\lambda,\nu})_{\lambda \in {\mathcal{S}},\nu\in{\mathcal{F}}}$$, we denote by $${\bf u}_\nu$$ the columns of $${\bf u}$$, which are precisely the basis representations of the Legendre coefficients $$u_\nu\in V$$. In what follows, we always denote by $${\lVert{\cdot}\rVert}$$ the $$\ell^{2}$$-norm on the respective index set, which could be $${\mathcal{S}}$$, $${\mathcal{F}}$$ or $${\mathcal{S}}\times{\mathcal{F}}$$, or the corresponding operator norm when this is clear from the context. Since $$\{\psi_\lambda\}_{\lambda\in {\mathcal{S}}}$$ is a Riesz basis for $$V$$, we have $$\|u_\nu\|_{V}\sim \|{\bf u}_\nu\|$$ uniformly in $$\nu\in{\mathcal{F}}$$, which together with boundedness and ellipticity of $$A$$ implies that $${\mathbf{A}}$$ is bounded and elliptic on $$\ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$, and that we have \begin{equation}\label{normequiv2}\|{\bf u}\|\sim \|{\mathbf{A}}{\bf u}\|\sim \|Au\|_{L^2({Y},V')}\sim \|u\|_{L^2({Y},V)}\end{equation} (2.7) with uniform constants. On account of (2.7), solving (2.1) approximately up to some target accuracy is equivalent to solving (2.5) in $$\ell^{2}$$ to essentially the same accuracy. As a further consequence, one can find a fixed positive $$\omega$$ such that $${\lVert{\mathbf{I} - \omega {\mathbf{A}}}\rVert}\le \rho <1$$, ensuring that a simple Richardson iteration converges with a fixed error reduction rate per step.
This serves as the conceptual starting point for the adaptive low-rank approximation scheme introduced in Bachmayr & Dahmen (2015) as given in Algorithm 2.1. This basic algorithmic template can be used to produce various types of sparse and low-rank approximations, with appropriate choices of the subroutines $${\small{\text{APPLY}}}$$, $${\small{\text{RHS}}}$$, $${\small{\text{COARSEN}}}$$ and $${\small{\text{RECOMPRESS}}}$$. The procedures $${\small{\text{COARSEN}}}$$ and $${\small{\text{RECOMPRESS}}}$$ are independent of the considered $${\mathbf{A}}$$ and $${\mathbf{f}}$$, and satisfy \begin{equation}\label{craccuracy}{\lVert{{\small{\text{COARSEN}}}({\bf v} ; \eta ) - {\bf v} }\rVert} \leq \eta,\quad {\lVert{{\small{\text{RECOMPRESS}}}({\bf v};\eta) - {\bf v}}\rVert} \leq \eta, \end{equation} (2.8) for any $$\eta \geq 0$$ and any compactly supported $${\bf v} \in \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$. Here, $${\small{\text{COARSEN}}}$$ is intended to reduce the support of the sequence $${\bf v}$$, whereas $${\small{\text{RECOMPRESS}}}$$ reduces the rank of $${\bf v}$$ in a low-rank tensor representation. The particular realizations of these routines depend on the dimensionality of the problem and on the type of approximation. We shall use the constructions given in Bachmayr & Dahmen (2015). In the case of the sparse approximations considered in Section 4, $${\small{\text{RECOMPRESS}}}$$ is chosen as the identity, and Algorithm 2.1 essentially reduces to the method analysed in Cohen et al. (2002). The routines $${\small{\text{APPLY}}}$$ and $${\small{\text{RHS}}}$$ are assumed to satisfy, for compactly supported $${\bf v}$$ and any $$\eta>0$$, the requirements \begin{equation}\label{applyaccuracy}{\lVert{{\small{\text{APPLY}}}({\bf v} ; \eta ) - {\mathbf{A}} {\bf v} }\rVert} \leq \eta,\quad {\lVert{{\small{\text{RHS}}}(\eta) - {\mathbf{f}}}\rVert} \leq \eta. \end{equation} (2.9) Their construction depends not only on the type of approximation, but also on the specific problem under consideration. These two routines are indeed the main drivers of adaptivity in Algorithm 2.1, and a major part of what follows concerns the construction of $${\small{\text{APPLY}}}$$ in different scenarios. It hinges on the compression of matrices by exploiting their near-sparsity in certain basis representations. We use the following notion introduced in Cohen et al. (2001): A bi-infinite matrix $${\mathbf{B}}$$ is called $$s^*$$-compressible if there exist matrices $${\mathbf{B}}_{n}$$ with $$\alpha_{n} 2^n$$ entries per row and column such that \begin{equation}\label{compress}{\lVert{{\mathbf{B}} - {\mathbf{B}}_{n}}\rVert} \leq \beta_{n} 2^{-s n}, \quad \mbox{ for }\,\, 0 < s < s^*, \end{equation} (2.10) where the sequences $${\boldsymbol{\alpha}} = (\alpha_n)_{n\in{\mathbb{N}}_0}$$ and $${\boldsymbol{\beta}} = (\beta_n)_{n\in{\mathbb{N}}_0}$$ are summable. Here, we always assume $${\mathbf{B}}_{0} = 0$$. Remark 2.1 As shown in Bachmayr & Dahmen (2015), regardless of the specifications of the routines $${\small{\text{APPLY}}}$$, $${\small{\text{RHS}}}$$, $${\small{\text{COARSEN}}}$$ and $${\small{\text{RECOMPRESS}}}$$, Algorithm 2.1 terminates after finitely many steps and its output $${\bf u}_\varepsilon$$ satisfies $$\|{\bf u} - {\bf u}_\varepsilon\|\le \varepsilon$$.
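In essence, the generic template of Algorithm 2.1 is a perturbed Richardson iteration in which the tolerances passed to the four routines are tightened as the iteration proceeds, interleaved with rank and support reduction. The following Python-style sketch conveys this structure; the tolerance schedule, the number of inner steps and the error bookkeeping are simplified illustrations and not the precise parameter choices of Bachmayr & Dahmen (2015).

```python
def adaptive_solve(apply_A, rhs, coarsen, recompress, omega, rho, eps,
                   kappa=0.5, inner_steps=10):
    """Simplified sketch of the generic template (Algorithm 2.1).

    apply_A(v, eta)    -- approximates A*v within accuracy eta, cf. (2.9)
    rhs(eta)           -- approximates f within accuracy eta, cf. (2.9)
    coarsen(v, eta)    -- reduces the support of v within accuracy eta, cf. (2.8)
    recompress(v, eta) -- reduces the ranks of v within accuracy eta, cf. (2.8)
    omega, rho         -- step size and contraction factor, ||I - omega*A|| <= rho < 1
    eps                -- target accuracy for the output
    """
    u = 0 * rhs(1.0)                 # zero starting vector in the same format as f
    delta = 1.0                      # assumed bound on the initial error
    while delta > eps:
        # perturbed Richardson steps; the routine tolerances are tied to delta
        eta = (1.0 - rho) * kappa * delta / (4.0 * inner_steps)
        for _ in range(inner_steps):
            u = u + omega * (rhs(eta) - apply_A(u, eta))
        # after sufficiently many inner steps the iteration error is of order
        # kappa*delta/2; rank and support reduction add at most kappa*delta/2 more
        u = recompress(u, kappa * delta / 4.0)
        u = coarsen(u, kappa * delta / 4.0)
        delta = kappa * delta        # error bound carried to the next outer step
    return u
```

With `apply_A = lambda v, eta: A @ v`, `rhs = lambda eta: f` and both reduction routines returning their input unchanged, the sketch degenerates to a plain Richardson iteration for a finite linear system.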
At this point, we record for later usage a particular feature of $${\mathbf{A}}$$ that arises as a consequence of our choice of tensor product orthogonal polynomials for parameter dependence: the approximate application of $${\mathbf{A}}$$ is facilitated by the fact that the matrices $${\mathbf{M}}_j$$ are bidiagonal. That is, in view of the three-term recurrence relation \begin{equation}\label{threeterm}tL_n(t)= p_{n+1} L_{n+1}(t) + p_{n} L_{n-1}(t), \quad L_{-1}\equiv 0, \end{equation} (2.11) where \begin{equation}\label{recurr_coeff}p_0 = 0, \qquad p_n= \frac{1}{\sqrt{4 - n^{-2}}} , \quad n>0, \end{equation} (2.12) one has $$\int_Y y_j\,L_\nu(y)L_{\nu'}(y)\,{\rm{d}}\mu(y) =0$$ unless $$| \nu_j - \nu'_j | = 1$$ and $$\nu_i = \nu'_i$$ for $$i \neq j$$, which yields \begin{equation}\label{threeterm_matrix}({\mathbf{M}}_j)_{\nu,\nu'}= p_{\nu_j + 1}\delta_{\nu +e^j,\nu'}+ p_{\nu_j}\delta_{\nu-e^j,\nu'}\end{equation} (2.13) with the Kronecker sequence $$(e^j_i)_{i\in {\mathcal{I}}}:=(\delta_{i,j} )_{i\in{\mathcal{I}}} \in {\mathcal{F}}$$. 3. Hierarchical tensor representations in the case of finitely many parameters We begin by considering the setting \begin{equation}\label{cId}{\mathcal{I}}=\{1,\dots,d\}, \end{equation} (3.1) where $${\mathcal{F}} = {\mathbb{N}}_0^d$$ and $${\bf u} \in \ell^{2}({\mathcal{S}} \times {\mathbb{N}}_0\times\cdots\times{\mathbb{N}}_0)$$. Here, we are interested in the case that all coordinates in $${\mathcal{I}}$$ have comparable influence. As illustrated in Section 6, a direct sparse Legendre expansion of $${\bf u}$$ over $${\mathcal{S}}\times{\mathcal{F}}$$ will then in general be infeasible already for moderately large $$d$$. However, one may as well exploit Cartesian product structure in $${\mathcal{F}}$$, regarding $${\bf u}$$ as a higher-order tensor, and use corresponding hierarchical low-rank representations. As we shall detail in what follows, the results of Bachmayr & Dahmen (2015) can be adapted to this problem in a rather straightforward manner. It will be convenient to introduce a numbering of tensor modes as follows: $${\mathcal{G}}_{\rm x} := {\mathcal{S}}$$, $${\mathcal{G}}_1 := {\mathbb{N}}_0, \ldots$$, $${\mathcal{G}}_d := {\mathbb{N}}_0$$. We additionally introduce the notation \[\hat{\mathcal{I}} := \{{\rm x}\}\cup{\mathcal{I}} . \] The representations of higher-order tensors that we consider are built on the Hilbert–Schmidt case via matricizations: for each nonempty $$M \subset \hat{\mathcal{I}}$$, any $${\bf v} \in \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$ induces a compact operator $$T_{\bf v}^{(M)} \colon \ell^{2}({\Large{\times}}_{i\in \hat{\mathcal{I}}\setminus M} {\mathcal{G}}_i) \to \ell^{2}({\Large{\times}}_{i\in M} {\mathcal{G}}_i)$$. In terms of the left singular vectors $$\{ {\bf U}^{(i)}_k \}_{k \in {\mathbb{N}}}$$ of $$T_{\bf v}^{(\{i\})}$$, $$i\in \hat{\mathcal{I}}$$, we obtain the HOSVD representation (Lathauwer et al., 2000) in the Tucker format (Tucker, 1964, 1966), \begin{equation}\label{tuckerformat}{\bf v} = \sum_{k_{\rm x}=1}^{r_{\rm x}} \sum_{k_1 = 1}^{r_1} \cdots \sum_{k_d=1}^{r_d} {\bf a}_{k_{\rm x},k_1,\ldots,k_d} \bigotimes_{i\in\hat{\mathcal{I}}} {\bf U}^{(i)}_{k_i}. \end{equation} (3.2) Here the tensor $${\bf a} = ({\bf a}_{\mathsf{k}})_{\mathsf{k}\in{\mathbb{N}}^{d+1}}$$ of order $$d+1$$ is referred to as core tensor and $$(r_{\rm x},r_1,\ldots,r_d)$$ as the multilinear ranks of $${\bf v}$$.
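For a small order-3 tensor, the HOSVD representation (3.2) can be computed directly from the mode-wise matricizations; the following sketch (with a random tensor standing in for $${\bf v}$$) shows how the mode frames $${\bf U}^{(i)}$$, the core tensor $${\bf a}$$ and the multilinear ranks arise.

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal((6, 5, 4))        # stand-in for a small order-3 coefficient tensor

def mode_matricization(v, i):
    """Mode-i matricization, the finite analogue of T_v^({i})."""
    return np.moveaxis(v, i, 0).reshape(v.shape[i], -1)

U = []          # mode frames U^(i) (left singular vectors)
ranks = []      # multilinear ranks (numerical, up to a small tolerance)
for i in range(v.ndim):
    Ui, s, _ = np.linalg.svd(mode_matricization(v, i), full_matrices=False)
    r = int((s > 1e-12 * s[0]).sum())
    U.append(Ui[:, :r])
    ranks.append(r)

# Core tensor a: contraction of v with the transposed mode frames, cf. (3.2).
a = np.einsum('ijk,ix,jp,kq->xpq', v, U[0], U[1], U[2])

# Reconstruction of v from the core and the mode frames; exact here, since no
# singular values were discarded.
v_rec = np.einsum('xpq,ix,jp,kq->ijk', a, U[0], U[1], U[2])
print(ranks, np.allclose(v, v_rec))
```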
The hierarchical tensor format (Hackbusch & Kühn, 2009), on which the variant of our scheme described in this section is based, can be interpreted as a further decomposition of $${\bf a}$$ into tensors of order at most three. This decomposition is obtained using further matricizations of the tensor according to a recursive decomposition of the set of modes $$\hat{\mathcal{I}}$$ into a binary tree, which we denote by $$\mathcal{D}$$. For simplicity, we focus in our exposition on linear trees corresponding to factorizations (1.15), where \begin{equation}\label{lindimtree}\mathcal{D} = \bigl\{ \hat{\mathcal{I}}, \{{\rm x}\}, \{1,\ldots,d\}, \{1\}, \{2,\ldots, d\}, \ldots, \{d-1\}, \{d\} \bigr\}. \end{equation} (3.3) For each $$\alpha \in \mathcal{D}$$, the rank of the corresponding matricization $$T^{(\alpha)}_{\bf v}$$ is denoted by $${\mathrm{rank}}_\alpha({\bf v})$$, its singular values by $$(\sigma^{(\alpha)}_k)_{k\in{\mathbb{N}}}$$. Here, $${\mathrm{rank}}_{\hat{\mathcal{I}}}({\bf v}) = 1$$ for all $${\bf v} \neq 0$$, and we set \begin{equation}{\mathrm{rank}}({\bf v}) := \bigl({\mathrm{rank}}_\alpha({\bf v})\bigr)_{\alpha\in\mathcal{D}\setminus\hat{\mathcal{I}}}.\label{multirank}\end{equation} (3.4) In this section, we specialize the generic template Algorithm 2.1 to produce approximate solutions to (2.1) of the form (STF) with core tensor in hierarchical form as in (1.15). More precisely, the output is given in the form of a tensor $${\bf u}_\varepsilon$$ of order $$d+1$$, supported on $${\it{\Lambda}}^\varepsilon_{\rm x}\times {\it{\Lambda}}^\varepsilon_1 \times\cdots\times{\it{\Lambda}}^\varepsilon_d$$ with finite $${\it{\Lambda}}^\varepsilon_i \subset {\mathcal{G}}_i$$. This tensor is given (assuming $$\mathcal{D}$$ as in (3.3)) in the following form: let $$r^\varepsilon_i := {\mathrm{rank}}_{\{i\}}({\bf u}_\varepsilon)$$ for $$i \in \hat {\mathcal{I}}$$ and $$\tilde r^\varepsilon_i := {\mathrm{rank}}_{\{i+1,\ldots, d\}}({\bf u}_\varepsilon)$$ for $$i = 1,\ldots,d-1$$, then \begin{align*}{\bf u}_{\varepsilon, \lambda,\nu_1,\ldots,\nu_d} &= \sum_{k_{\rm x}=1}^{r^\varepsilon_{\rm x}} \cdots \sum_{k_d=1}^{r^\varepsilon_d} \sum_{\ell_1=1}^{\tilde r^\varepsilon_1}\cdots\sum_{\ell_{d-1}=1}^{\tilde r^\varepsilon_{d-1}} {\mathbf{M}}^{(1),\varepsilon}_{k_{\rm x},\ell_1} \,{\mathbf{M}}^{(2),\varepsilon}_{\ell_1,k_1,\ell_2} \cdots {\mathbf{M}}^{(d-1),\varepsilon}_{\ell_{d-2},k_{d-2},\ell_{d-1}} \\ &\quad\times {\mathbf{M}}^{(d),\varepsilon}_{\ell_{d-1},k_{d-1},k_d} \, {\bf U}^{({\rm x}),\varepsilon}_{k_{\rm x}, \lambda} \, {\bf U}^{(1),\varepsilon}_{k_1, \nu_1}\cdots {\bf U}^{(d),\varepsilon}_{k_d, \nu_d}, \quad \lambda\in {\it{\Lambda}}^\varepsilon_{\rm x},\; \nu_i \in {\it{\Lambda}}^\varepsilon_i. \end{align*} The adaptive scheme identifies, in an intertwined fashion, the ranks $$r^\varepsilon_i$$, $$\tilde r^\varepsilon_i$$, the sets $${\it{\Lambda}}^\varepsilon_i$$, as well as the coefficient tensors $${\mathbf{M}}^{(i),\varepsilon}$$ and $${\bf U}^{(i),\varepsilon}$$ of $${\bf u}_\varepsilon$$. The function represented by $${\bf u}_\varepsilon$$ has precisely the form (STF), where \[u^{{\rm x}}_{k_{\rm x}} = \sum_{\lambda \in {\mathcal{S}}} {\bf U}^{({\rm x}),\varepsilon}_{k_{\rm x}, \lambda} \psi_\lambda, \quad u^{{\rm y},i}_{k_i} = \sum_{n \in {\mathcal{G}}_i} {\bf U}^{(i),\varepsilon}_{k_i, n} L_n . \] The hierarchical format can offer substantially more favourable complexity characteristics for large $$d$$ than (3.2). 
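To make the factorized core (1.15) and the resulting representation concrete, the following sketch contracts hypothetical low-order factors $${\mathbf{M}}^{(i)}$$ (for $$d=3$$ and small ranks) into the full core tensor and then evaluates the represented function for given factor values at a single point; in the adaptive scheme only the low-order factors are stored, and the full contraction is formed here purely for illustration.

```python
import numpy as np

# Hypothetical small instance of (1.15) with d = 3:
# a[kx, k1, k2, k3] = sum_{l1, l2} M1[kx, l1] * M2[l1, k1, l2] * M3[l2, k2, k3].
rx, r1, r2, r3 = 4, 3, 3, 2          # multilinear ranks
t1, t2 = 3, 2                        # hierarchical ranks \tilde r_1, \tilde r_2
rng = np.random.default_rng(0)
M1 = rng.standard_normal((rx, t1))
M2 = rng.standard_normal((t1, r1, t2))
M3 = rng.standard_normal((t2, r2, r3))

# Contract the factors into the full core tensor a (for illustration only).
a = np.einsum('xa,apb,bqr->xpqr', M1, M2, M3)
print(a.shape)                       # (4, 3, 3, 2)

# Assembling the value of a representation of the form (STF) at one point:
# hypothetical factor values u^x_{kx}(x), u^{y,j}_{kj}(y_j) at a fixed (x, y).
ux  = rng.standard_normal(rx)
uy1 = rng.standard_normal(r1)
uy2 = rng.standard_normal(r2)
uy3 = rng.standard_normal(r3)
u_val = np.einsum('xpqr,x,p,q,r->', a, ux, uy1, uy2, uy3)
print(u_val)
```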
The left singular vectors of the involved matricizations yield a hierarchical singular value decomposition (Grasedyck, 2010). We refer also to Falcó & Hackbusch (2012), Grasedyck et al. (2013), Hackbusch & Kühn (2009), Hackbusch (2012) and Kolda & Bader (2009) for detailed expositions regarding the finitely supported case (see also Oseledets & Tyrtyshnikov, 2009; Oseledets, 2011 for the related tensor train representation) and to Bachmayr & Dahmen (2015) for analogous results for tensors in sequence spaces, with notation analogous to that of this article. For quantifying the approximability of tensors on $${\Large{\times}}_{i \in \hat{\mathcal{I}}}{\mathcal{G}}_i$$ in terms of the best selection of finite $${\it{\Lambda}}^\varepsilon_i \subset {\mathcal{G}}_i$$ as above, a pivotal role is played by the quantities \begin{equation}\label{contr}\pi^{(i)}({\bf v}) \,{:=}\, \bigl(\pi^{(i)}_{\nu_i}({\bf v})\bigr)_{\nu_i\in {\mathcal{G}}_i},\quad \pi^{(i)}_{\mu}({\bf v}) \,{:=}\, \left(\sum_{\substack{\nu\in{\mathcal{F}}\\ \nu_i=\mu}}|{\bf v}_\nu|^2\right)^{1/2} , \quad i \in \hat{\mathcal{I}} = \{{\rm x}, 1, \ldots, d\}. \end{equation} (3.5) They were introduced in Bachmayr & Dahmen (2015) and called contractions in analogy to the terminology in tensor analysis. An efficient evaluation (without any $$d$$-dimensional summations) is possible due to the relation \begin{equation}\label{sumN}\pi^{(i)}_{\mu}({\bf v})= \left(\sum_{k}|{\bf U}^{(i)}_{k,\mu}|^2|\sigma^{(i)}_k|^2\right)^{1/2}, \end{equation} (3.6) where $$\sigma^{(i)}_k$$ are the mode-$$i$$ singular values of $${\bf v}$$. As in our previous notation, we abbreviate $${\mathrm{supp}}_i {\bf v}:= {\mathrm{supp}}\bigl(\pi^{(i)}({\bf v})\bigr)$$, $$i \in \hat{\mathcal{I}}$$. 3.1 Adaptive scheme In this case, we consider Algorithm 2.1 with the routines $${\small{\text{RECOMPRESS}}}$$ and $${\small{\text{COARSEN}}}$$ for the hierarchical format as given in Bachmayr & Dahmen (2015, Remark 15). $${\small{\text{RECOMPRESS}}}$$ is based on a truncation of a hierarchical singular value decomposition up to a prescribed accuracy $$\eta>0$$, which can be ensured based on the $$\ell^2$$-norm of omitted singular values of matricizations. We denote this operation by $$\hat{\mathrm{P}}_{\eta}$$. As shown in Grasedyck (2010), it satisfies the quasi-optimality property \begin{equation}\label{recompress quasiopt}{\lVert{ {\bf v} - \hat{\mathrm{P}}_{\eta}({\bf v})}\rVert} \leq \sqrt{2d-3} \inf \bigl\{ {\lVert{{\bf v} - {\bf w}}\rVert} \colon {{\mathrm{rank}}({\bf w})\leq {\mathrm{rank}}(\hat{\mathrm{P}}_{\eta}({\bf v}))} \bigr\}, \end{equation} (3.7) with the inequality between ranks as defined in (3.4) to be understood componentwise. $${\small{\text{COARSEN}}}$$ retains the degrees of freedom for each mode that correspond to the largest contractions (3.5). Let $$(\mu^*_{i,k})_{k \in {\mathbb{N}}}$$ be such that $$(\pi^{(i)}_{\mu^*_{i,k}}({\bf v}))_{k\in{\mathbb{N}}}$$ is nonincreasing. Denote for $${\it{\Lambda}} \subset {\mathcal{S}}\times{\mathcal{F}}$$ by $${\mathrm{R}_{{\it{\Lambda}}}}{\bf v}$$ the array obtained by retaining all entries of $${\bf v}$$ corresponding to indices in $${\it{\Lambda}}$$, while replacing all others by zero.
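The contractions (3.5) and the identity (3.6) are easily checked on a small order-3 tensor, as in the following sketch; in the algorithm they are obtained from singular value decompositions that are computed anyway, so that no high-dimensional summations are needed.

```python
import numpy as np

rng = np.random.default_rng(1)
v = rng.standard_normal((4, 5, 6))       # small order-3 coefficient tensor

def pi_direct(v, i):
    """Contractions (3.5): pi^(i)_mu = sqrt of the sum of |v|^2 over all other indices."""
    axes = tuple(ax for ax in range(v.ndim) if ax != i)
    return np.sqrt((v ** 2).sum(axis=axes))

def pi_via_svd(v, i):
    """Relation (3.6): pi^(i)_mu = sqrt( sum_k |U^(i)_{k,mu}|^2 |sigma^(i)_k|^2 )."""
    mat = np.moveaxis(v, i, 0).reshape(v.shape[i], -1)   # mode-i matricization
    U, s, _ = np.linalg.svd(mat, full_matrices=False)
    return np.sqrt(((U * s) ** 2).sum(axis=1))

for i in range(3):
    print(np.allclose(pi_direct(v, i), pi_via_svd(v, i)))   # True for each mode
```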
Given $$\eta >0$$, we define the product set \[{\it{\Lambda}}(\eta)= {\Large{\times}}_{i \in \hat{\mathcal{I}}}\{ \mu^*_{i,k} \colon k \leq N_i\}, \] where $$N_i$$, $$i\in \hat{\mathcal{I}}$$, are chosen such that $$\sum_{i \in \hat{\mathcal{I}}} N_i$$ is minimal subject to the condition \begin{equation} \label{hatC} \left( \sum_{i \in \hat{\mathcal{I}}} \sum_{k > N_i\vphantom{\hat{\mathcal{I}}}}|\pi^{(i)}_{\mu^*_{i,k}}({\bf v})|^2 \right)^{1/2}\leq \eta. \end{equation} (3.8) Noting that the left-hand side of (3.8) is an upper bound for $${\lVert{{\bf v} - {\mathrm{R}_{{\it{\Lambda}}(\eta)}}{\bf v}}\rVert}$$, we define $${\small{\text{COARSEN}}}$$ as a numerical realization of $$\hat{\mathrm{C}}_{\eta}{\bf v} := {\mathrm{R}_{{\it{\Lambda}}(\eta)}}{\bf v}$$, for which one has a quasi-optimality property analogous to (3.7) with constant $$\sqrt{d}$$. Furthermore, $${\mathbf{A}}$$ as defined in (2.6) is in this case of finitely many parameters a finite sum of Kronecker product operators, \[{\mathbf{A}} = \sum_{j=0}^d {\mathbf{A}}_j \otimes {\mathbf{M}}_j, \] which considerably simplifies the construction of the corresponding routine $${\small{\text{APPLY}}}$$. The action of $${\mathbf{A}}$$ can thus increase each hierarchical rank of its argument at most by a factor of $$d+1$$. Remark 3.1 In contrast to the case considered in Bachmayr & Dahmen (2016), here the Hilbert space $${\mathcal{H}} =V \otimes L^{2}({Y})$$ on which the problem is posed is endowed with a cross norm. As a consequence, the isomorphism that takes $$v\in {\mathcal{H}}$$ to its coefficients $${\bf v}\in \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$ with respect to the tensor product basis is of Kronecker rank one. The original low-rank structure (1.1) of $$A(y)$$ is therefore preserved in the $$\ell^{2}$$-representation (2.6) of the problem. Consequently, the routine $${\small{\text{APPLY}}}$$ that adaptively approximates the action of $${\mathbf{A}}$$ can be obtained following the generic construction given in Bachmayr & Dahmen (2015), provided that the operators $${\mathbf{A}}_j$$ and $${\mathbf{M}}_j$$ acting on each mode have the required compressibility properties. Recall that by (2.13), the infinite matrices $${\mathbf{M}}_j$$ are bidiagonal and, hence, do not require any further approximation. To use the construction of Bachmayr & Dahmen (2015), we thus only need that the operators $${\mathbf{A}}_0, \ldots, {\mathbf{A}}_d$$ acting on the spatial variables are $$s^*$$-compressible. 3.2 Convergence analysis Our complexity results aim at the following type of statements: given a certain approximability of the solution, the algorithm recovers the corresponding convergence rates without their explicit knowledge. To describe these approximability properties, we now recall the definition of approximation classes to quantify the convergence of hierarchical low-rank approximations from Bachmayr & Dahmen (2015), in terms of the hierarchical rank defined by (3.4).
Let $$\gamma = \bigl(\gamma(n)\bigr)_{n\in {\mathbb{N}}_0}$$ be positive and strictly increasing with $$\gamma(0)=1$$ and $$\gamma(n)\to\infty$$ as $$n\to\infty$$. For $${\bf v} \in \ell^{2}({\mathcal{S}}\times{\mathcal{F}})$$, let \[{\lvert{{\bf v}}\rvert}_{{\mathcal{A}_\mathcal{H}({\gamma})}} \,{:=}\, \sup_{r\in{\mathbb{N}}_0} \gamma({r})\,\inf_{{\lvert{{\mathrm{rank}}({\bf w})}\rvert}_\infty\leq r} {\lVert{{\bf v} - {\bf w}}\rVert} , \] where $$\inf \{ {\lVert{{\bf v} - {\bf w}}\rVert}\colon {\lvert{{\mathrm{rank}}({\bf w})}\rvert}_\infty\leq r\}$$ – that is, the errors of approximation by hierarchical ranks at most $$r$$ – can be bounded in terms of the hierarchical singular values $$(\sigma^{(\alpha)}_k)_{k\in{\mathbb{N}}}$$ for $$\alpha\in{\cal D}$$. We introduce the approximation classes \[{\mathcal{A}_\mathcal{H}({\gamma})} \,{:=}\, \bigl\{{\bf v}\in \ell^{2}({\mathcal{S}}\times{\mathcal{F}}) : {\lvert{{\bf v}}\rvert}_{{\mathcal{A}_\mathcal{H}({\gamma})}} <\infty \bigr\} , \quad{\lVert{{\bf v}}\rVert}_{{\mathcal{A}_\mathcal{H}({\gamma})}} \,{:=}\, {\lVert{{\bf v}}\rVert} + {\lvert{{\bf v}}\rvert}_{{\mathcal{A}_\mathcal{H}({\gamma})}}. \] We restrict our considerations to $$\gamma$$ that satisfy \begin{equation*}\rho_\gamma \,{:=}\, \sup_{n\in{\mathbb{N}}} {\gamma}(n)/{\gamma}(n-1)<\infty, \end{equation*} which corresponds to a restriction to at most exponential growth; our considerations would still apply for faster growth, but lead to less sharp results. For an approximation $${\bf v}$$ of bounded support to $${\bf u}$$, the number of nonzero coefficients $$\#{\mathrm{supp}}_i {\bf v}$$ required in each tensor mode to achieve a certain accuracy depends on the best $$n$$-term approximability of the sequences $$\pi^{(i)}({\bf u})$$. This approximability by sparse sequences is quantified by the classical approximation classes $${\mathcal{A}^s} = {\mathcal{A}^s}(\mathcal{J})$$, where $$s>0$$ and $$\mathcal{J}$$ is a countable index set, consisting of all $${\bf w}\in \ell^{2}(\mathcal{J})$$ for which the quasi-norm \begin{equation}\label{Asdef}{\lVert{{\bf w}}\rVert}_{{\mathcal{A}^s}(\mathcal{J})} := \sup_{N\in{\mathbb{N}}_0} (N+1)^s \inf_{\substack{{\it{\Lambda}}\subset \mathcal{J}\\ \#{\it{\Lambda}}\leq N}} {\lVert{{\bf w} - {\mathrm{R}_{{\it{\Lambda}}}} {\bf w}}\rVert}\end{equation} (3.9) is finite. In particular, if $$\pi^{(i)}({\bf u}) \in {\mathcal{A}^s}(\mathcal{G}_i)$$ then these sequences can be approximated within accuracy $$\eta$$ by finitely supported sequences with $${\mathcal{O}}(\eta^{-1/s})$$ nonzero entries. In what follows, we do not explicitly specify the index set in the spaces $${\mathcal{A}^s}$$ when this is clear from the context. We analyse the complexity of the algorithm under the following benchmark assumptions; see the discussion in Section 6.1. Assumption 3.2 For the hierarchical tensor approximation in the case (3.1) of $$d$$ parametric variables, we assume the following: (i) $$\pi^{(i)}({\bf u}), \pi^{(i)}({\mathbf{f}}) \in {\mathcal{A}}^{s}({\mathcal{G}}_i)$$, $$i\in\hat{\mathcal{I}}$$, for some $$s>0$$. (ii) $${\bf u}, {\mathbf{f}} \in {\mathcal{A}}_{\mathcal{H}}(\gamma)$$, where $$\gamma(n) \,{:=}\, e^{\bar c n^{1/\bar b}}$$ with $$\bar b\geq 1$$, $$\bar c>0$$.
(iii) The $${\mathbf{A}}_j$$, $$j \in \hat{\mathcal{I}}$$, are $$s^*$$-compressible for some $$s^*>s$$; that is, there exist matrices $${\mathbf{A}}_{j,n}$$ with $$\alpha_{j,n} 2^n$$ entries per row and column such that $${\lVert{{\mathbf{A}}_j - {\mathbf{A}}_{j,n}}\rVert} \leq \beta_{j,n} 2^{-s n}$$, where the sequences $${\boldsymbol{\alpha}}_j = (\alpha_{j,n})_{n\in{\mathbb{N}}_0}$$ and $${\boldsymbol{\beta}}_j = (\beta_{j,n})_{n\in{\mathbb{N}}_0}$$ are summable. We will use the above assumptions as a reference point for the scaling with respect to $$\varepsilon$$ of the computational complexity. Note that Assumptions 3.2(i) and (ii) mean in particular that best $$N$$-term approximations of $$\pi^{(i)}({\bf u})$$ converge at least as $${\mathcal{O}} (N^{-s})$$ for $$i\in\hat{\mathcal{I}}$$ and low-rank approximations $${\bf u}_n$$ of $${\bf u}$$ with $${\lvert{{\mathrm{rank}}({\bf u}_n)}\rvert}_\infty\,{\leq}\,n$$ converge as $${\mathcal{O}}(e^{-\bar c n^{1/\bar b}})$$. Assumption 3.2(iii) needs to be established for each given problem. Under these conditions, it is possible to realize a routine $${\small{\text{RHS}}}$$ that satisfies the following conditions: for sufficiently small $$\eta>0$$ and $${\mathbf{f}}_\eta:={\small{\text{RHS}}}(\eta)$$, \begin{equation}\label{rhsass1}\begin{gathered}\sum_{i\in \hat{\mathcal{I}}} \#{\mathrm{supp}}_i ({\mathbf{f}}_\eta) \lesssim d \eta^{-\frac {1}{s}} \left( \sum_{i\in \hat{\mathcal{I}}} {\lVert{\pi^{(i)}({\mathbf{f}})}\rVert}_{{\cal A}^s} \right)^{\frac {1}{s}}, \quad \sum_{i\in \hat{\mathcal{I}}} {\lVert{\pi^{(i)}({\mathbf{f}}_\eta)}\rVert}_{{\cal A}^s} \lesssim d^{1+\max\{1,s\}} \sum_{i\in \hat{\mathcal{I}}} {\lVert{\pi^{(i)}({\mathbf{f}})}\rVert}_{{\cal A}^s}, \\ {\lvert{{\mathrm{rank}}({\mathbf{f}}_\eta)}\rvert}_\infty \lesssim \bigl(d^{-1} \ln ({\lVert{{\mathbf{f}}}\rVert}_{{\mathcal{A}}_{\mathcal{H}}(\gamma)}/\eta) \bigr)^{\bar{b}}, \end{gathered}\end{equation} (3.10) with hidden constants that do not depend on $$d$$. Such approximations can be obtained, for instance, by combining $${\small{\text{COARSEN}}}$$ and $${\small{\text{RECOMPRESS}}}$$ with parameters adjusted as in Bachmayr & Dahmen (2015, Theorem 7). Assuming full knowledge of $${\mathbf{f}}$$, $${\small{\text{RHS}}}$$ can be realized such that the required number of operations is bounded, with $$C>0$$ independent of $$d$$, by \begin{equation}\label{rhsass2}C\left( d {\lvert{{\mathrm{rank}}({\mathbf{f}}_\eta)}\rvert}_\infty^3 + {\lvert{{\mathrm{rank}}({\mathbf{f}}_\eta)}\rvert}_\infty \sum_{i \in \hat{\mathcal{I}}} \#{\mathrm{supp}}_i({\mathbf{f}}_\eta) \right)\!, \end{equation} (3.11) and we shall make this idealized assumption in the subsequent analysis. Note that these requirements greatly simplify when the corresponding right-hand side $$f$$ is independent of the parametric variable. To compare different parametric dimensionalities $$d$$ in the complexity bounds, we additionally need a specific reference family of $$d$$-dependent problems. We introduce the following model assumptions, which we shall also consider in more detail for a concrete class of problems in Section 6.
Assumption 3.3 For the quantities in Assumption 3.2, in addition, let the following hold: (i) There exist $$C_1, c_1>0$$ independent of $$d$$ such that \begin{align*}&\max\bigl\{{\lVert{\pi^{({\rm x})}({\bf u})}\rVert}_{{\mathcal{A}}^s}, {\lVert{\pi^{(1)}({\bf u})}\rVert}_{{\mathcal{A}}^s}, \ldots,{\lVert{\pi^{(x)}({\mathbf{f}})}\rVert}_{{\mathcal{A}}^s}, {\lVert{\pi^{(1)}({\mathbf{f}})}\rVert}_{{\mathcal{A}}^s},\ldots,\\ &\quad\bar c^{-1} ,\, {\lVert{{\bf u}}\rVert}_{{\mathcal{A}_\mathcal{H}({\gamma})}}, {\lVert{{\mathbf{f}}}\rVert}_{{\mathcal{A}_\mathcal{H}({\gamma})}}\bigr\}\leq C_1 d^{c_1} . \end{align*} (ii) There exists $$C_2>0$$ independent of $$d$$ such that \[