# A convergent time–space adaptive dG(s) finite element method for parabolic problems motivated by equal error distribution

## Abstract

We shall develop a fully discrete space-time adaptive method for linear parabolic problems based on a new, reliable and efficient a posteriori analysis for higher-order dG(s) finite element discretisations. The adaptive strategy is motivated by the principle of equally distributing the a posteriori indicators in time, and the convergence of the method is guaranteed by the uniform energy estimate from Kreuzer et al. (2012, Design and convergence analysis for an adaptive discretization of the heat equation. IMA J. Numer. Anal., 32, 1375–1403) under minimal assumptions on the regularity of the data.

## 1. Introduction

Let Ω be a bounded polyhedral domain in $${\mathbb{R}}^{d}$$, $$d\in{\mathbb{N}}$$. We consider the linear parabolic partial differential equation   \begin{align} \begin{aligned} {\partial_{t}} u + \mathcal{L} u &= f \qquad &&\textrm{in } \varOmega \times (0,T), \\ u &= 0 &&\textrm{on } \partial\varOmega\times (0,T),\\ u(\cdot,0) &= u_{0} &&\textrm{in } \varOmega. \end{aligned} \end{align} (1.1)

Hereafter, $$\mathcal{L} u=-\textrm{div}\mathbf{A}\nabla u + cu$$ is a second-order elliptic operator with respect to space and $${\partial _{t}} u=\frac{\partial u}{\partial t}$$ denotes the partial derivative with respect to time. In the simplest setting $$\mathcal{L}=-\varDelta$$, whence (1.1) is the heat equation. Precise assumptions on the data are provided in Section 2.1.

The objective of this paper is the design and a detailed convergence analysis of an efficient adaptive finite element method for solving (1.1) numerically. To this end we resort to adaptive finite elements in space combined with a discontinuous Galerkin dG(s) time-stepping scheme in Section 2.2; compare with Thomée (2006). The conforming finite element spaces are continuous piecewise polynomials of fixed degree over a simplicial triangulation of the domain Ω.
In each single time step we reduce or enlarge the local time-step size and refine and coarsen the underlying triangulation. The adaptive decisions are based on a posteriori error indicators. Numerous such estimators for various error notions are available in the literature. In fact, Eriksson & Johnson (1991, 1995) analysed adaptive dG(s) methods for parabolic problems, providing a priori and a posteriori bounds for the $$L^{\infty }\left (L^{2}\right )$$ error using duality techniques. Lakkis & Makridakis (2006) proved a posteriori bounds based on the elliptic reconstruction technique, which was introduced by Makridakis & Nochetto (2003) in the semidiscrete context. The $$L^{2}\left (H^{1}\right )$$, respectively $$L^{2}\left (H^{1}\right )\cap H^{1}\left (H^{-1}\right )$$, error bounds in the studies by Picasso (1998) and Verfürth (2003) are based on energy techniques and have been used with a dG(0) time-stepping scheme in the adaptive methods and convergence analysis presented in the studies by Chen & Feng (2004) and Kreuzer et al. (2012). For our purpose we generalise the residual-based estimator (Verfürth, 2003) to higher-order dG(s) schemes in Section 3. The estimator is built from five indicators: an indicator for the initial error, indicators for the temporal and spatial errors, a coarsening error indicator and an indicator controlling the so-called consistency error. It is important to notice that, besides the first indicator, all other indicators accumulate in L2 in time. The adaptation of the time-step size uses information from the indicators for the temporal error and the consistency error. The adaptation of the spatial triangulation is based on refinement by bisection using information from the indicators for the spatial error and for the coarsening error. Very recently, an independently developed guaranteed a posteriori estimator for higher-order dG(s) schemes was provided in the study by Ern et al.
(2017) using equilibrated flux-based bounds for the spatial error. By now the convergence and optimality of adaptive methods for stationary inf–sup stable, respectively coercive, problems are well established (Dörfler, 1996; Morin et al., 2000, 2002; Binev et al., 2004; Chen & Feng, 2004; Mekchay & Nochetto, 2005; Stevenson, 2007; Cascon et al., 2008; Diening & Kreuzer, 2008; Morin et al., 2008; Kreuzer & Siebert, 2011; Siebert, 2011; Diening et al., 2016); compare also with the overview article by Nochetto et al. (2009). The essential design principle motivating the adaptive strategies in most of the above methods is the equal distribution of the error. The importance of this principle is highlighted by the near characterisations of nonlinear approximation classes with the help of a thresholding algorithm in the studies by Binev et al. (2002) and Gaspoz & Morin (2014). In contrast to the situation for the above-mentioned problems, the convergence analysis of adaptive approximation of time-dependent problems is still in its infancy. In the study by Schwab & Stevenson (2009) optimal computational complexity of an adaptive wavelet method for parabolic problems is proved using a symmetric and coercive discretisation based on a least squares formulation. To the best of our knowledge there exist only two results (Chen & Feng, 2004; Kreuzer et al., 2012) concerned with a rigorous convergence analysis of time-space adaptive finite element methods. In the study by Chen & Feng (2004) it is proved for the dG(0) time-stepping scheme that each single time step terminates and that the error of the computed approximation is below a prescribed tolerance when the final time is reached. Reaching the final time, however, is not guaranteed; theoretically, the adaptively generated sequence of time instances {tn}n≥0 may fail to be finite, with tn → t⋆ < T as $$n\to \infty$$. This drawback has been overcome in the study by Kreuzer et al.
(2012) with the help of an a priori computed minimal time-step size in terms of the right-hand side f and the discrete initial value U0. However, the design of neither of the two methods heeds the principle of equally distributing the error. Let us shed some light on this fact with the help of the initial value problem   $${\partial_{t}} u + u = f\quad\textrm{in}\ (0,T) \qquad\textrm{and}\qquad u(0)=u_{0}.$$Let 0 = t0 < t1 < ⋯ < tN = T be some partition of (0, T). Using the dG(0) time-stepping scheme we obtain $$\{U_{n}\}_{n=0}^{N}$$ such that   $$\frac{U_{n}-U_{n-1}}{\tau_{n}}+U_{n}=f_{n}:=\frac 1{\tau_{n}}\int_{t_{n-1}}^{t_{n}}f\,\textrm{d}t,\quad n=1,\ldots, N,\ U_{0}=u_{0},$$where τn = tn − tn−1. Let $${\mathcal{U}}$$ be the piecewise affine interpolation of the nodal values $$\{U_{n}\}_{n=0}^{N}$$. Then we have with Young’s inequality that   \begin{align*} {\int_{0}^{T}}\frac12 {\partial_{t}}|u-{\mathcal{U}}|^{2}+|u-{\mathcal{U}}|^{2}\,\textrm{d}t&= \sum_{n=1}^{N}\int_{t_{n-1}}^{t_{n}} (f-f_{n})(u-{\mathcal{U}})+(U_{n}-{\mathcal{U}})( u-{\mathcal{U}})\,\textrm{d}t \\ &\leqslant \sum_{n=1}^{N}\int_{t_{n-1}}^{t_{n}} |\,f-f_{n}|^{2} + |U_{n}-{\mathcal{U}}|^{2}+ \frac12 |u-{\mathcal{U}}|^{2}\,\textrm{d}t. \end{align*} A simple computation reveals $$\int _{t_{n-1}}^{t_{n}}|U_{n}-{\mathcal{U}}|^{2}\,\textrm{d}t= \frac 13 \tau _{n} |U_{n}-U_{n-1}|^{2}$$. This term and $$\int _{t_{n-1}}^{t_{n}}|\,f-f_{n}|^{2}\,\textrm{d}t$$ are the so-called time and consistency a posteriori indicators. In order to illustrate the basic differences in the design of the adaptive schemes we shall concentrate on the time indicator. In the studies by Chen & Feng (2004) and Kreuzer et al.
(2012) the partition is constructed such that   $$|U_{n}-U_{n-1}|^{2} \le \frac{{\texttt{TOL}}^{2}}{T},\quad\textrm{which implies} \quad \sum_{n=1}^{N} \tau_{n} |U_{n}-U_{n-1}|^{2}\leqslant \sum_{n=1}^{N} \tau_{n} \frac{{\texttt{TOL}}^{2}}{T}={\texttt{TOL}}^{2},$$i.e., the accumulated indicator is below the prescribed tolerance $${\texttt{TOL}}$$. We call this the $$L^{\infty }$$-strategy and remark that it does not aim to equally distribute the local indicators. In contrast to this we shall use the L2-strategy   $$\tau_{n}|U_{n}-U_{n-1}|^{2}\le{\texttt{tol}}^{2}.$$Thanks to the uniform energy bound   \begin{align} \sum_{n=1}^{N}|U_{n}-U_{n-1}|^{2}\leqslant{\int_{0}^{T}}|\,f|^{2}\,\textrm{d}t +|U_{0}|^{2} \end{align} (1.2)(see Corollary 3.11 below) we then conclude that   \begin{align*} \sum_{n=1}^{N} \tau_{n} |U_{n}-U_{n-1}|^{2}=&\,\sum_{n:\,\tau_{n}\leqslant\delta} \tau_{n} |U_{n}-U_{n-1}|^{2}+\sum_{n:\,\tau_{n} \gt \delta} \tau_{n} |U_{n}-U_{n-1}|^{2} \\ \leqslant&\, \delta\, \left({\int_{0}^{T}}|\,f|^{2}\,\textrm{d}t +|U_{0}|^{2}\right)+\frac{T}{\delta}\, {\texttt{tol}}^{2} =2\,T^{1/2}\left({\int_{0}^{T}}|\,f|^{2}\,\textrm{d}t +|U_{0}|^{2}\right)^{\frac12}\,{\texttt{tol}}, \end{align*}where the first sum is bounded with (1.2), the second sum uses that at most T/δ steps can have τn > δ, and $$\delta ={\texttt{tol}}\,T^{1/2}/\left ({\int _{0}^{T}}|\,f|^{2}\,\textrm{d}t +|U_{0}|^{2}\right )^{1/2}$$ balances the two contributions. Taking $${\texttt{tol}}={\texttt{TOL}}^{2}/\delta$$ guarantees that the error is below the prescribed tolerance $${\texttt{TOL}}$$. These arguments directly generalise to semidiscretisations of (1.1) in time. In the case of a full space-time discretisation of (1.1) additional indicators regarding the space discretisation are involved. We shall roughly explain the principal adaptive strategies of the algorithm TAFEM for (1.1). In a preprocessing step an L2-tolerance for handling the consistency error is computed. Then in each time step we first control the consistency error with an L2-strategy since this does not require solving expensive linear systems.
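These two strategies are easy to experiment with for the scalar model problem above. The following sketch is our illustration only (the data f(t) = sin 5t, u0 = 1 and the uniform partition are arbitrary choices, not taken from the paper): it runs dG(0), computes the time indicators τn|Un − Un−1|² and checks the discrete analogue of the energy bound (1.2) together with the splitting estimate.

```python
import numpy as np

# dG(0) (implicit Euler) for the scalar model problem u' + u = f on (0,T],
# u(0) = u0, with f_n the mean value of f on I_n.
# Illustrative data (not from the paper): f(t) = sin(5t), u0 = 1, uniform steps.
T, N, u0 = 1.0, 100, 1.0
t = np.linspace(0.0, T, N + 1)
tau = np.diff(t)

def f_mean(a, b, m=50):
    # mean value of f on (a, b] via a simple midpoint rule
    s = a + (np.arange(m) + 0.5) * (b - a) / m
    return np.mean(np.sin(5.0 * s))

U = np.empty(N + 1)
U[0] = u0
for n in range(1, N + 1):
    fn = f_mean(t[n - 1], t[n])
    # (U_n - U_{n-1})/tau_n + U_n = f_n
    U[n] = (U[n - 1] + tau[n - 1] * fn) / (1.0 + tau[n - 1])

jumps2 = (U[1:] - U[:-1]) ** 2                          # |U_n - U_{n-1}|^2
# discrete analogue of int |f|^2 dt + |U_0|^2
A = sum(tau[n] * f_mean(t[n], t[n + 1]) ** 2 for n in range(N)) + u0 ** 2
assert jumps2.sum() <= A + 1e-12                        # energy bound as in (1.2)

# L2-strategy: every step satisfies tau_n |U_n - U_{n-1}|^2 <= tol^2
tol = np.sqrt((tau * jumps2).max())
delta = tol * np.sqrt(T) / np.sqrt(A)
# splitting argument: accumulated indicator <= delta*A + (T/delta)*tol^2
assert (tau * jumps2).sum() <= delta * A + (T / delta) * tol ** 2 + 1e-12
```

Replacing the maximal indicator by a prescribed tol and halving τn whenever the constraint fails turns this sketch into an L2-strategy time-step control.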
Second, we control the time indicator using an L2-strategy and (1.2). Unfortunately, for the spatial indicator a control similar to (1.2) is not available; therefore, we enforce in a third step that the spatial indicators are bounded by the time or the consistency indicator. In the case where these indicators are equally distributed in time and no massive undershooting appears, this likely results also in an equal distribution of the spatial indicators. In order to handle the other case we include the $$L^{\infty }$$-strategy from the studies by Chen & Feng (2004) and Kreuzer et al. (2012) for the spatial indicators as a backup strategy. The detailed algorithm TAFEM is presented in Section 4 and its convergence analysis is given in Section 5. The advantage of our new approach over the algorithms in Chen & Feng (2004) and Kreuzer et al. (2012) is twofold. First, since TAFEM aims for an equal distribution of the error, we expect an improved performance. Second, we use an L2-strategy for the consistency error, which requires only L2-regularity of f in time instead of the H1-regularity needed for the $$L^{\infty }$$-strategy in the studies by Chen & Feng (2004) and Kreuzer et al. (2012). This makes the proposed method suitable for problems where the existing approaches may even fail completely. We conclude the paper in Section 6 with comments on the implementation in DUNE (Blatt et al., 2016) and some numerical experiments. The experiments confirm the expectations and show that the performance of our algorithm TAFEM is more than competitive.

## 2. The continuous and discrete problems

In this section we state the weak formulation of the continuous problem together with the assumptions on data. Then the discretisation by adaptive finite elements in space combined with the dG(s) scheme in time is introduced.
### 2.1 The weak formulation

For $$d\in{\mathbb{N}}$$, let $$\varOmega \subset{\mathbb{R}}^{d}$$ be a bounded, polyhedral domain that is meshed by some conforming simplicial mesh $${\mathcal{G}_{\textrm{init}}}$$. We denote by H1(Ω) the Sobolev space of square integrable functions whose first derivatives are in L2(Ω) and we let $${\mathbb{V}}:={H_{0}^{1}}(\varOmega )$$ be the space of functions in H1(Ω) with vanishing trace on ∂Ω. For any measurable set ω and $$k\in{\mathbb{N}}$$ we denote by $$\Vert{\cdot }\Vert _\omega$$ the $$L^{2}(\omega ;{\mathbb{R}}^{k})$$-norm, whence $$\Vert \upsilon \Vert ^2_{H^{1}(\varOmega )}=\Vert{\upsilon }\Vert ^2_\varOmega +\Vert \nabla \upsilon \Vert ^{2}_\varOmega .$$ We suppose that the data of (1.1) have the following properties: $${\mathbf{A}}\!:\!\varOmega \!\rightarrow\! {\mathbb{R}}^{d\times d}$$ is piecewise Lipschitz over $${\mathcal{G}_{\textrm{init}}}$$ and is symmetric positive definite with eigenvalues $$0< a_{*}\leqslant a^{\ast }<\infty$$, i.e.,   \begin{align} a_{*}|\boldsymbol{\xi}|^{2}\leqslant{\mathbf{A}}(x)\boldsymbol{\xi}\cdot\boldsymbol{\xi}\leqslant a^{\ast}|\boldsymbol{\xi}|^{2} \qquad \textrm{for all } \boldsymbol{\xi}\in{\mathbb{R}}^{d}\!,\; x\in\varOmega; \end{align} (2.1)$$c\in L^{\infty }(\varOmega )$$ is non-negative, i.e., c ⩾ 0 in Ω; $$f\in L^{2}\left ((0,T);L^{2}(\varOmega )\right )= L^{2}(\varOmega \times (0,T))$$ and u0 ∈ L2(Ω).

We next turn to the weak formulation of (1.1); compare with Evans (2010, Chap. 7). We let $$\mathcal{B}\colon{\mathbb{V}}\times{\mathbb{V}}\rightarrow{\mathbb{R}}$$ be the symmetric bilinear form associated to the weak form of the elliptic operator $$\mathcal{L}$$, i.e.,   $${\mathcal{B}[w,\,v]} \mathrel{:=} \int_{\varOmega} {\mathbf{A}}\nabla v \cdot \nabla w +c\,vw \,\textrm{d}V \qquad\textrm{for all } v,\,w\in{\mathbb{V}}.$$Recalling the Poincaré–Friedrichs inequality $$\Vert{v}\Vert _{\varOmega }\leqslant{C_{F}}\Vert \nabla{v}\Vert _\varOmega$$ for all $$v\in{\mathbb{V}}$$ with CF = CF(d, Ω) (Gilbarg & Trudinger, 2001, p. 158) we deduce from (2.1) that $$\mathcal{B}$$ is a scalar product on $${\mathbb{V}}$$ with induced norm   $$|||{v}|||^{2}:={\mathcal{B}[v,\,v]}= \int_{\varOmega} {\mathbf{A}} \nabla v\cdot\nabla v + cv^{2}\,\textrm{d}V \qquad \textrm{for all } v\in{H_{0}^{1}}(\varOmega).$$This energy norm is equivalent to the H1-norm $$\Vert \cdot \Vert _{H^{1}(\varOmega )}$$ and we shall use the energy norm in the subsequent analysis. We denote the restriction of the energy norm to some subset ω ⊂ Ω by $${|||}\cdot{|||}_\omega$$ and let $${\mathbb{V}}^{\ast }\mathrel{:=} H^{-1}(\varOmega )$$ be the dual space of $${H^{1}_{0}}(\varOmega )$$ equipped with the operator norm $${|||}{g}{|||}_{\ast }\mathrel{:=}\sup _{v\in{\mathbb{V}}} \frac{{\langle g,\,v\rangle }}{{|||}v{|||}_\varOmega },$$ where ⟨⋅ , ⋅⟩ denotes the usual duality bracket in $$H^{-1}(\varOmega )\times{H^{1}_{0}}(\varOmega )$$. The weak solution space   $${\mathbb{W}}(0,T)\mathrel{:=} \left\lbrace u\in L^{2}(0,T; {\mathbb{V}}) \mid{\partial_{t}} u\in L^{2}(0,T;{\mathbb{V}}^{\ast}) \right\rbrace$$is a Banach space endowed with the norm   $$\Vert{v}\Vert^2_{\mathbb{W}(0,T)}=\int^T_{0}{|||}{\partial_{t}v}{|||}^2_\ast+ {|||}v{|||}^2_\varOmega\,\textrm{d}t+\Vert{v}(T)\Vert^2_\varOmega, \quad v\in{\mathbb{W}}(0,T).$$Moreover, it is continuously embedded into $$C^{0}\left ([0,T];L^{2}(\varOmega )\right )$$; see, e.g., Evans (2010, Chap. 5).
After this preparation we are in a position to state the weak formulation of (1.1): a function $$u\in \mathbb{W}(0,T)$$ is a weak solution to (1.1) if it satisfies   $${\left\langle{\partial_{t}} u(t),\,v\right\rangle} + {\mathcal{B}\left[u(t),\, v\right]} = \langle{\,f(t)},\,{v}\rangle_\varOmega \quad\textrm{for all } v\in{\mathbb{V}},\; \text{a.e. } t \in(0,T),$$ (2.2a)  \begin{align} u(0) = u_{0}. \end{align} (2.2b)Hereafter, ⟨⋅, ⋅⟩Ω denotes the L2(Ω) scalar product. Since the operator $$\mathcal{L}$$ is elliptic, problem (2.2) admits for any $$f\in L^{2}\left (0,T;L^{2}(\varOmega )\right )$$ and u0 ∈ L2(Ω) a unique weak solution; compare, e.g., with Evans (2010, Chap. 7).

### 2.2 The discrete problem

For the discretisation of (2.2) we use adaptive finite elements in space and a dG(s) scheme with adaptive time-step size control.

Adaptive grids and time steps. For the adaptive space discretisation we restrict ourselves to simplicial grids and local refinement by bisection; compare with the studies by Bänsch (1991); Kossaczký (1994); Maubach (1995); Traxler (1997), as well as Schmidt & Siebert (2005) and Nochetto et al. (2008) and the references therein. To be more precise, refinement is based on the initial conforming triangulation $${\mathcal{G}_{\textrm{init}}}$$ of Ω and a procedure $$\texttt{REFINE}$$ with the following properties: given a conforming triangulation $$\mathcal{G}$$ and a subset $$\mathcal{M}\subset \mathcal{G}$$ of marked elements, then   $$\texttt{REFINE}(\mathcal{G},\mathcal{M})$$outputs a conforming refinement $$\mathcal{G}_{+}$$ of $$\mathcal{G}$$ such that all elements in $$\mathcal{M}$$ are bisected at least once. In general, additional elements are refined in order to ensure conformity. The input $$\mathcal{G}$$ can be either $${\mathcal{G}_{\textrm{init}}}$$ or the output of a previous application of $$\texttt{REFINE}$$.
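In one space dimension conformity is automatic, so the contract of REFINE (bisect at least the marked elements, return a conforming refinement) reduces to a few lines. The following toy sketch is our illustration only, not the paper's mesh implementation:

```python
# Toy 1d analogue of REFINE (illustration only): a "mesh" is the sorted list of
# its element endpoints; marked elements are bisected. In 1d no further
# elements need to be refined to restore conformity.
def refine(mesh, marked):
    """mesh: endpoints [x_0 < ... < x_M]; marked: indices of elements to bisect."""
    out = []
    for i in range(len(mesh) - 1):
        out.append(mesh[i])
        if i in marked:                              # bisect element (x_i, x_{i+1})
            out.append(0.5 * (mesh[i] + mesh[i + 1]))
    out.append(mesh[-1])
    return out

G0 = [0.0, 0.5, 1.0]
G1 = refine(G0, {1})                                 # bisect the element (0.5, 1.0)
assert G1 == [0.0, 0.5, 0.75, 1.0]
assert set(G0) <= set(G1)                            # refinement is nested: G0 <= G1
```

In higher dimensions the analogous procedure must additionally bisect neighbouring elements to remove hanging nodes, which is precisely the conformity closure mentioned above.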
We denote by $${\mathbb{G}}$$ the class of all conforming triangulations that can be produced from $${\mathcal{G}_{\textrm{init}}}$$ by finitely many applications of $$\texttt{REFINE}$$. For $$\mathcal{G}\in{\mathbb{G}}$$ we call $$\mathcal{G}_{+}\in{\mathbb{G}}$$ a refinement of $$\mathcal{G}$$ if $$\mathcal{G}_{+}$$ is produced from $$\mathcal{G}$$ by a finite number of applications of $$\texttt{REFINE}$$ and we denote this by $$\mathcal{G}\leqslant \mathcal{G}_{+}$$ or $$\mathcal{G}_{+}\geqslant \mathcal{G}$$. Conversely, we call any $$\mathcal{G}_{-}\in{\mathbb{G}}$$ with $$\mathcal{G}_{-}\leqslant \mathcal{G}$$ a coarsening of $$\mathcal{G}$$. Throughout the discussion we deal only with conforming grids; this means that whenever we refer to some triangulations $$\mathcal{G}$$, $$\mathcal{G}_{+}$$ and $$\mathcal{G}_{-}$$ we tacitly assume $$\mathcal{G},\mathcal{G}_{+},\mathcal{G}_{-}\in{\mathbb{G}}$$. One key property of the refinement by bisection is uniform shape regularity for any $$\mathcal{G}\in{\mathbb{G}}$$. This means that all constants depending on the shape regularity are uniformly bounded depending on $${\mathcal{G}_{\textrm{init}}}$$. For the discretisation in time we let $$0=t_{0}<t_{1}<\dots <t_{N}=T$$ be a partition of (0, T) into half-open subintervals In = (tn−1, tn] with corresponding local time-step sizes $$\tau _{n}:={\left |I_{n}\right |} = t_{n}-t_{n-1}$$, $$n=1,\dots ,N$$. Space-time discretisation. For the spatial discretisation we use Lagrange finite elements. For any $$\mathcal{G}\in{\mathbb{G}}$$ the finite element space $${\mathbb{V}}(\mathcal{G})$$ consists of all continuous, piecewise polynomials of fixed degree ℓ ⩾ 1 over $$\mathcal{G}$$ that vanish on ∂Ω. This gives a conforming discretisation of $${\mathbb{V}}$$, i.e., $${\mathbb{V}}(\mathcal{G})\subset{\mathbb{V}}$$. 
Moreover, Lagrange finite elements give nested spaces, i.e., $${\mathbb{V}}(\mathcal{G})\subset{\mathbb{V}}(\mathcal{G}_{+})$$ whenever $$\mathcal{G}\leqslant \mathcal{G}_{+}$$. We denote by $$\mathcal{G}_0$$ the triangulation at t0 = 0 and for n ⩾ 1, we denote by $$\mathcal{G}_n$$ the grid in In and let $${\mathbb{V}}_{n}={\mathbb{V}}(\mathcal{G}_n)$$, n = 0, … , N be the corresponding finite element spaces. For $$\mathcal{G}\in{\mathbb{G}}$$ we denote by $$\varPi_{\mathcal{G}}\colon L^{2}(\varOmega )\to{\mathbb{V}}(\mathcal{G})$$ the L2-projection onto $${\mathbb{V}}(\mathcal{G})$$ and set $$\varPi_n\mathrel{:=}\varPi_{\mathcal{G}_n}$$. On each time interval the discrete approximation is polynomial in time over the corresponding spatial finite element space. Let $$s\in{\mathbb{N}}_{0}$$. For any real vector space $${\mathbb{U}}$$ and interval $$I\subset{\mathbb{R}}$$ we denote by   $$\mathbb{P}_s\left(I;{\mathbb{U}}\right):=\left\{t\mapsto \textstyle\sum\limits_{i=0}^{s} t^{i}V_{i}:V_{i}\in{\mathbb{U}}\right\}$$the space of all polynomials of degree less than or equal to s with coefficients in $${\mathbb{U}}$$. We write $$\mathbb{P}_s({\mathbb{U}}):=\mathbb{P}_s({\mathbb{R}};{\mathbb{U}})$$ and $$\mathbb{P}_s\mathrel{:=}\mathbb{P}_s({\mathbb{R}})$$. Furthermore, for an interval I ⊂ (0, T) we let   $$f_{I}\in \mathbb{P}_s\left(I;L^{2}(\varOmega)\right)$$be the best approximation of f|I in $$L^{2}\left (I;L^{2}(\varOmega )\right )$$. In particular we use $$f_{n}\mathrel{:=} f_{I_{n}}$$ as a time discretisation of f on In. For s = 0, $$f_{I}=\frac{1}{|I|}\int _{I} f\,\textrm{d}t$$ is the mean value of f on I. In the following we introduce the so-called discontinuous Galerkin time-stepping scheme dG(s) of degree s, where dG(0) is the well-known implicit Euler scheme. Recall that for n ⩾ 1 the grid on In is $$\mathcal{G}_n$$ with corresponding finite element space $$\mathbb{V}_n={\mathbb{V}}(\mathcal{G}_n)$$.
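For a scalar integrand, the best approximation f_I is simply the L2(I)-orthogonal projection onto polynomials of degree at most s, computable via a Legendre expansion on I. The script below is a sketch under illustrative choices (the function f and the interval are ours, not from the paper):

```python
import numpy as np
from numpy.polynomial import legendre as leg

def best_poly_approx(f, a, b, s, quad=200):
    """L2(a,b)-orthogonal projection of f onto polynomials of degree <= s,
    via the transformed Legendre basis L_r(2(t-a)/(b-a) - 1), for which
    ||L_r||^2_{L2(a,b)} = (b-a)/(2r+1)."""
    x, w = leg.leggauss(quad)                   # Gauss nodes/weights on (-1,1)
    t = a + (x + 1.0) * (b - a) / 2.0           # map nodes to (a,b)
    coeff = [np.sum(w * f(t) * leg.Legendre.basis(r)(x)) * (2 * r + 1) / 2.0
             for r in range(s + 1)]
    def fI(tt):
        xx = 2.0 * (np.asarray(tt) - a) / (b - a) - 1.0
        return sum(c * leg.Legendre.basis(r)(xx) for r, c in enumerate(coeff))
    return fI

f = lambda t: t ** 2
# for s = 0 the projection is the mean value of f on (a,b)
assert abs(best_poly_approx(f, 0.0, 1.0, s=0)(0.3) - 1.0 / 3.0) < 1e-12
fI2 = best_poly_approx(f, 0.0, 1.0, s=2)
assert abs(fI2(0.7) - f(0.7)) < 1e-10           # degree-2 data is reproduced exactly
```

For the vector-valued f_I ∈ P_s(I; L2(Ω)) the same coefficients are computed mode by mode, i.e., the projection acts on the time variable only.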
We start with a suitable initial refinement $$\mathcal{G}_0$$ of $${\mathcal{G}_{\textrm{init}}}$$ and an approximation $$U_0=\varPi _{0} u_{0}=\varPi _{\mathcal{G}_0} u_{0}\in \mathbb{V}_0$$ of the initial value u0. Note that in principle any suitable interpolation operator can be used instead of Π0. We then inductively compute for n > 0 a solution $$U_{\mid{I}_n}\in \mathbb{P}_s(\mathbb{V}_n)$$ to the problem   \begin{align} \begin{split} \int_{I_{n}}\langle\partial_{t}U_{\mid{I}_n},V\rangle_\varOmega + \mathcal{B}\left[U_{\mid{I}_n},V\right]\textrm{d}t + \langle\left[\negthinspace\left[{U}\right]\negthinspace\right]_{n-1},V(t_{n-1})\rangle_\varOmega = \int_{I_{n}}\langle{\,f_n},V\rangle_\varOmega \end{split} \end{align} (2.3)for all $$V\in \mathbb{P}_s(\mathbb{V}_n)$$. Thereby, $$f_{n}\mathrel{:=} f_{I_{n}}$$ and $$\left [\negthinspace \left [{U}\right ]\negthinspace \right ]_{n-1}$$ denotes the jump   $$\left[\negthinspace\left[{U}\right]\negthinspace\right]_{n-1}:=U_{n-1}^{+}-U_{n-1}^{-}$$of U across tn−1, where we used $$U_{n-1}^{+}\mathrel{:=}\lim _{t\downarrow t_{n-1}}U_{\mid{I}_n}(t)$$, $$U_{n}^{-}\mathrel{:=}U_{\mid{I}_n}(t_{n})$$, n = 1, … , N and $$U_{0}^{-}\mathrel{:=}U_{0}$$. Note that with this definition we have $$U_{n-1}^{-}=U(t_{n-1})$$. The solution U is uniquely defined (Thomée, 2006) and we will see below that (2.3) is equivalent to an (s + 1)-dimensional second-order elliptic system. Note that U is allowed to be discontinuous across the nodal points t0, … , tN and hence in general $$U\not \in \mathbb{W}(0,T)$$. In order to construct from U a conforming function we shall now define the Radau reconstruction, which was first introduced and analysed by Makridakis & Nochetto (2006) and is motivated by the close relation of the dG(s) to Runge–Kutta Radau IIA collocation methods. 
In fact, let c1 < ⋯ < cs+1 and b1, … , bs+1 be the abscissae and weights of the Radau IIA quadrature formula, which is exact of degree 2s, i.e.,   \begin{align} \sum_{j=1}^{s+1}b_{j} P(c_{j}) ={\int_{0}^{1}}P(t)\,\textrm{d}t\qquad \textrm{for all}\ P\in\mathbb{P}_{2s}. \end{align} (2.4)We define $${\mathcal{U}}\in \mathbb{W}$$, $${\mathcal{U}}_{|I_{n}}\in \mathbb{P}_{s+1}(I_{n};{\mathbb{V}})$$ as the piecewise interpolation of U at the local Radau IIA points $${t_{n}^{\,j}}:=t_{n-1}+c_{j}\tau _{n}$$, i.e.,   \begin{align} {\mathcal{U}}\left({t_{n}^{\, j}}\right) = U_{\mid{I}_n}\left({t_{n}^{\,j}}\right)\in\mathbb{V}_n, \qquad j =1,\ldots,s+1. \end{align} (2.5a)The continuous embedding of $$\mathbb{W}$$ in $$C^{0}\left ([0,T];L^{2}(\varOmega )\right )$$ additionally enforces   \begin{align} {\mathcal{U}}(t_{n-1}) = U_{n-1}^{-}\in\mathbb{V}_{n-1}, \end{align} (2.5b)which uniquely defines $${\mathcal{U}}\in \mathbb{W}$$ since 0 =: c0 < c1 < ⋯ < cs+1 are pairwise distinct. For convenience, we shall adopt the representation of $${\mathcal{U}}$$ in terms of Legendre polynomials from Davis & Rabinowitz (1984), from which we observe that the $${t_{n}^{j}}$$, j = 1, … , s + 1 are the zeros of   $$\frac{(-1)^{s}}{2}\left({L_{s}^{n}}-L_{s+1}^{n}\right)\in\mathbb{P}_{s+1},$$where for $$r\in{\mathbb{N}}_{0}$$, $${L_{r}^{n}}$$ denotes the affine transformation of the rth Legendre polynomial Lr on (−1, 1) to the interval In, i.e.,   $${L_{r}^{n}}(t):=L_{r}\left(2\frac{t-t_{n-1}}{\tau_{n}}-1\right)\!.$$Therefore, the $$\{{L_{r}^{n}}\}_{r\in{\mathbb{N}}_{0}}$$ are L2-orthogonal on In and satisfy $${L_{r}^{n}}(t_{n})=1$$, $${L_{r}^{n}}(t_{n-1})=(-1)^{r}$$, as well as $$\int _{I_{n}}|{L_{r}^{n}}|^{2}\,\textrm{d}t=\frac{\tau _{n}}{2r+1}$$.
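The Radau IIA data in (2.4) and the Legendre properties just listed can be checked numerically. In the sketch below (our construction with standard numpy routines, not code from the paper) the abscissae are obtained as the roots of L_s − L_{s+1} mapped to (0, 1], and the weights by enforcing exactness on 1, t, …, t^s:

```python
import numpy as np
from numpy.polynomial import legendre as leg

def radau_iia(s):
    """Abscissae c_1 < ... < c_{s+1} = 1 and weights b_j of Radau IIA on [0,1]."""
    coef = np.zeros(s + 2)
    coef[s], coef[s + 1] = 1.0, -1.0                  # Legendre series of L_s - L_{s+1}
    c = np.sort((leg.legroots(coef) + 1.0) / 2.0)     # s+1 roots, the last one is 1
    V = np.vander(c, s + 1, increasing=True).T        # V[k, j] = c_j**k
    b = np.linalg.solve(V, 1.0 / np.arange(1, s + 2)) # exactness for t^k, k = 0..s
    return c, b

c, b = radau_iia(2)
for k in range(2 * 2 + 1):                            # (2.4): exact up to degree 2s
    assert abs(np.dot(b, c ** k) - 1.0 / (k + 1)) < 1e-10

for r in range(4):                                    # properties of the L_r
    Lr = leg.Legendre.basis(r)
    assert abs(Lr(1.0) - 1.0) < 1e-12                 # L_r^n(t_n) = 1
    assert abs(Lr(-1.0) - (-1.0) ** r) < 1e-12        # L_r^n(t_{n-1}) = (-1)^r
    x, w = leg.leggauss(r + 1)
    # ||L_r||^2 on (-1,1) equals 2/(2r+1), i.e. tau_n/(2r+1) after transformation
    assert abs(np.dot(w, Lr(x) ** 2) - 2.0 / (2 * r + 1)) < 1e-12
```

For s = 1 this reproduces the classical Radau IIA pair c = (1/3, 1), b = (3/4, 1/4).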
These properties readily imply the representation   \begin{align} {\mathcal{U}}_{|I_{n}}=U_{|I_{n}}+\frac{(-1)^{s}}{2}\left(L_{s+1}^{n}-{L_{s}^{n}}\right)\left(U_{n-1}^{+}-U_{n-1}^{-}\right)\!; \end{align} (2.6)compare with the studies by Ern et al. (2017) and Smears (2017). Using integration by parts with respect to time, (2.4) and (2.5), we observe that (2.3) is equivalent to   \begin{align} \int_{I_{n}}\langle{{\partial_{t}} {\mathcal{U}}},{V}\rangle + {\mathcal{B}[U,\,V]}\,\textrm{d}t = \int_{I_{n}}\langle{f_{n}},{ V}\rangle_\varOmega\textrm{d}t \end{align} (2.7)for all n = 1, … , N and $$V\in \mathbb{P}_s(\mathbb{V}_n)$$. We emphasise that $${\mathcal{U}}(t)$$ is a finite element function, since for t ∈ In, we have $${\mathcal{U}}(t)\in{\mathbb{V}}\left (\mathcal{G}_{n-1}\oplus \mathcal{G}_n\right )\subset{\mathbb{V}}$$, where $$\mathcal{G}_{n-1}\oplus \mathcal{G}_n\in{\mathbb{G}}$$ is the smallest common refinement of $$\mathcal{G}_{n-1}$$ and $$\mathcal{G}_{n}$$, which we call the overlay. Continuity of $${\mathcal{U}}$$ in time in combination with $${\mathcal{U}}(t)\in{\mathbb{V}}$$ for all t ∈ (0, T] then implies $${\mathcal{U}}\in \mathbb{W}$$.

Remark 2.1 For s = 0 we see from (2.3) that in each time step $$n\in{\mathbb{N}}$$ we need to solve for partial differential operators of the form − Δ + μ, with $$\mu =\frac 1{\tau _{n}}$$, in order to compute Un. Unfortunately, for s > 0, though still coercive, (2.3) becomes an (s + 1)-dimensional coupled nonsymmetric system. Recently, in the study by Smears (2017) a preconditioned conjugate gradient (PCG) method for a symmetrisation of (2.3) was proposed, which is fully robust with respect to the discretisation parameters s and τ, provided a solver for the operator − Δ + μ, μ ⩾ 0 is available.

## 3. A posteriori error estimation

One basic ingredient of adaptive methods is an a posteriori error estimator building up a reliable upper bound for the error in terms of the discrete solution and given data.
The dG(0) method corresponds to the implicit Euler scheme and residual-based estimators for the heat equation can be found in the study by Verfürth (2003). In this section we generalise this result and prove reliable and efficient residual-based estimators for the dG(s) schemes (2.3), with arbitrary $$s\in{\mathbb{N}}_{0}$$. Some arguments in this section are straightforward generalisations of those in the study by Verfürth (2003) and we only sketch their proofs; others are based on new ideas and therefore we shall prove them in detail.

### 3.1 Equivalence of error and residual

In order to prove residual-based error estimators one first has to relate the error to the residual. To this end we note that (2.2) can be taken as an operator equation in $$L^{2}(0,T;{\mathbb{V}}^{\ast })\times L^{2}(\varOmega )$$. Its residual $$\textrm{Res}({\mathcal{U}})$$ in $${\mathcal{U}}\in \mathbb{W}(0,T)$$ is given by   \begin{align} {\left\langle \textrm{Res}({\mathcal{U}}),\,v\right\rangle}=&\,{\left\langle{\partial_{t}}(u-{\mathcal{U}}),\,v\right\rangle}+{\mathcal{B}[u-{\mathcal{U}},\,v]}\nonumber\\ =&\,\langle{\,f-{\partial_{t}} {\mathcal{U}}},{v}\rangle-{\mathcal{B}[{\mathcal{U}},\,v]}\qquad\textrm{for all } v\in{\mathbb{V}}. \end{align} (3.1) From the study by Tantardini & Veeser (2016) we have the following relations between the residual and the error.

Proposition 3.1 (Abstract error bound). Let $$u\in \mathbb{W}$$ be the solution of (2.2) and let $${\mathcal{U}}\in \mathbb{W}$$ be the approximation defined in (2.5) for time instances t0 = 0, … , tN = T and time-step sizes τn := tn − tn−1, n = 1, … , N.
Then it holds that   \begin{align} \Vert{u}-\mathcal{U}\Vert^2_{\mathbb{W}(0,T)}=\Vert{u_0}-U_0\Vert^2_\varOmega+\Vert\textrm{Res}(\mathcal{U})\Vert^2_{{L^2}(0,T;\mathbb{V}^\ast)} \end{align} (3.2a) and, for 1 ⩽ n ⩽ N,  \begin{align} \Vert\textrm{Res}(\mathcal{U})\Vert^2_{{L^2}(I_{n};\mathbb{V}^\ast)}\leqslant 2\Vert\partial_{t}(u-\mathcal{U})\Vert^2_{{L^2}(I_{n};\mathbb{V}^\ast)}+\Vert{u}-\mathcal{U}\Vert^2_{{L^2}(I_{n};\mathbb{V})}. \end{align} (3.2b) The rest of this section concentrates on proving computable upper and lower bounds for the error. We note that the initial error $$\Vert{u_{0}-U_{0}}\Vert_\varOmega$$ in (3.2) is already a posteriori computable, whence it remains to estimate the dual norm of the residual. There is, however, the additional issue of separating the influence of the temporal and the spatial discretisations on the error. In particular, defining the temporal residual $${\textrm{Res}_{\tau }}({\mathcal{U}}) \in L^{2}(0,T;{\mathbb{V}}^{\ast })$$ as   \begin{align} {\langle{\textrm{Res}_{\tau}}({\mathcal{U}}),\,v\rangle}\mathrel{:=}{\mathcal{B}[U-{\mathcal{U}},\,v]} \end{align} (3.3)and the spatial residual $${\textrm{Res}_{h}}({\mathcal{U}})\in L^{2}(0,T;{\mathbb{V}}^{\ast })$$ as   \begin{align} {\langle{\textrm{Res}_{h}}({\mathcal{U}}),\,v\rangle}\mathrel{:=}{\langle\, f_{n}-{\partial_{t}}{\mathcal{U}},\,v\rangle}-{\mathcal{B}[U,\,v]} \quad\textrm{on}\quad I_{n}, \end{align} (3.4)we obtain   \begin{align} \textrm{Res}({\mathcal{U}})= {\textrm{Res}_{\tau}}({\mathcal{U}})+{\textrm{Res}_{h}}({\mathcal{U}})+f-f_{n} \quad\textrm{on}\quad I_{n}. \end{align} (3.5)In what follows we use this decomposition to prove separated time and space error indicators, which build up a reliable and efficient bound for the error.
### 3.2 Temporal residual

Recalling (2.6), we have   $${\mathcal{U}}(t)-U(t)=\frac{(-1)^{s}}{2}\left(L_{s+1}^{n} -{L_{s}^{n}}\right) \left(U_{n-1}^{+}-U_{n-1}^{-}\right)\quad\textrm{for}\ t\in I_{n},\ n=1,\ldots,N,$$and thanks to (3.3) and the L2-orthogonality of the $${L_{r}^{n}}$$, we obtain   \begin{align} \int_{I_{n}}{\left\Vert{\textrm{Res}_{\tau}}({\mathcal{U}})\right\Vert}_{\mathbb{V}^{\ast}}^{2}\,\textrm{d}t=&\, \int_{I_{n}} {|||}U- {\mathcal{U}}{|||}_{\varOmega}^{2} \,\textrm{d}t ={|||} U_{n-1}^{+}- U_{n-1}^{-}{|||}_{\varOmega}^{2}\int_{I_{n}}\frac14\left|L_{s+1}^{n}-{L_{s}^{n}}\right|^{2}\,\textrm{d}t\nonumber\\ =&\, \tau_{n}\, C_{\tau}\,{|||} U_{n-1}^{+}-U_{n-1}^{-}{|||}_{\varOmega}^{2}, \end{align} (3.6)where $$C_{\tau }=C_{\tau }(s):=\frac 14\left (\frac{1}{2s+3}+\frac 1{2s+1}\right )$$; see also the study by Ern et al. (2017).

### 3.3 The spatial residual

In this section we estimate the spatial residual.

Lemma 3.2 Let U be the approximation of (2.3) to the solution u of (2.2) and let $${\mathcal{U}}$$ be its interpolation defined by (2.5). Then there exists a constant $$C_{{\mathbb{G}}}>0$$ such that   $$\int_{I_{n}}{\Vert{\textrm{Res}_{h}}({\mathcal{U}})\Vert}_{\mathbb{V}^{\ast}}^{2}\,\textrm{d}t\leqslant C_{{\mathbb{G}}} \sum_{{E} \in\mathcal{G}_{n}}\int_{I_{n}} h_{{E}} ^{2}{\Vert{\partial_{t}} {\mathcal{U}}+\mathcal{L} U-f_{n}\Vert}_{E}^{2}+h_{{E}}{\Vert J(U)\Vert}_{\partial E}^{2}\,\textrm{d}t$$for all 1 ⩽ n ⩽ N. Thereby, for $$V\in\mathbb{V}_n$$ and an interior side S we denote by J(V)|S the jump of the normal flux $${\mathbf{A}}\nabla V\cdot{\vec{n}}$$ across S; for boundary sides S we set J(V)|S ≡ 0. The mesh size of an element $${E} \in \mathcal{G}$$ is given by hE := |E|1/d.

Proof.
Recalling (3.4), we first observe that $${\Vert{\textrm{Res}_{h}}({\mathcal{U}})\Vert}_{\mathbb{V}^{\ast}}^{2}\in \mathbb{P}_{2s}$$, whence by (2.4) we have   $$\int_{I_{n}} {\Vert{\textrm{Res}_{h}}({\mathcal{U}})\Vert}_{\mathbb{V}^{\ast}}^{2}\,\textrm{d}t = \tau_{n} \sum_{j=1}^{s+1}b_{j} {\left\Vert{\textrm{Res}_{h}}({\mathcal{U}})\left({t_{n}^{\, j}}\right)\right\Vert}_{\mathbb{V}^{\ast}}^{2}\!.$$Therefore, it suffices to estimate $${\Vert{\textrm{Res}_{h}}({\mathcal{U}})\Vert}_{{\mathbb{V}}^{\ast }}^{2}$$ at the abscissae of the Radau IIA quadrature formula. For arbitrary $$V_{j}\in{\mathbb{V}}_{n}$$, j = 1, … , s + 1 choose $$V\in \mathbb{P}_s({\mathbb{V}}_{n})$$ in (2.7) such that $$V\left (t_{n-1}+c_{i}\tau _{n}\right )=V_{j}\delta _{ij}$$, 1 ⩽ i ⩽ s + 1. Then again exploiting (2.4) yields the Galerkin orthogonality   $${\left\langle{\textrm{Res}_{h}}({\mathcal{U}})\left({t_{n}^{\, j}}\right)\!,\,V_{j}\right\rangle}= 0, \qquad j=1,\ldots,s+1.$$Since $$V_{j}\in \mathbb{V}_n$$ was arbitrary we have for any $$v\in{\mathbb{V}}$$ that   $${\left\langle{\textrm{Res}_{h}}({\mathcal{U}})\left({t_{n}^{\, j}}\right)\!,\,v\right\rangle}={\left\langle{\textrm{Res}_{h}}({\mathcal{U}})\left({t^{\, j}_{n}}\right)\!,\,v-V\right\rangle}\qquad \textrm{for all } V\in{\mathbb{V}}_{n}.$$Using integration by parts with respect to the space variable, the Cauchy–Schwarz inequality, and the scaled trace inequality and choosing V as a suitable interpolation of v, we arrive at   $${\left\Vert{\textrm{Res}_{h}}({\mathcal{U}})\left({t^{\, j}_{n}}\right)\right\Vert}_{\mathbb{V}^{\ast}}^{2} \leqslant C_{{\mathbb{G}}} \sum_{{E} \in\mathcal{G}_{n}} \left \{ h_{{E}} ^{2}{\left\Vert({\partial_{t}} {\mathcal{U}}+\mathcal{L} U-f_{n}) \left({t^{\, j}_{n}}\right)\right\Vert}_{E}^{2} +h_{{E}} {\left\Vert J(U) \left({t^{\, j}_{n}}\right)\right\Vert}_{\partial E}^{2}\right\}.$$ The right-hand side is a pointwise evaluation of a polynomial of degree 2s in t and thus the claimed upper bound follows from (2.4).
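Both the quadrature identity used in the proof above and formula (3.6) rest on exact integration of low-degree polynomial quantities. The following check (our verification script, not part of the paper) confirms the value of C_τ(s) numerically on the reference interval:

```python
import numpy as np
from numpy.polynomial import legendre as leg

def C_tau(s):
    # C_tau(s) = (1/4) * (1/(2s+3) + 1/(2s+1)), cf. (3.6)
    return 0.25 * (1.0 / (2 * s + 3) + 1.0 / (2 * s + 1))

for s in range(5):
    x, w = leg.leggauss(s + 2)                      # exact for degree 2s + 3
    d = leg.Legendre.basis(s + 1)(x) - leg.Legendre.basis(s)(x)
    # int_{-1}^{1} (1/4)|L_{s+1} - L_s|^2 dx = 2 * C_tau(s) by L2-orthogonality,
    # i.e. tau_n * C_tau(s) on an interval of length tau_n = 2
    assert abs(0.25 * np.dot(w, d ** 2) - 2.0 * C_tau(s)) < 1e-12
```

In particular C_τ(0) = 1/3, recovering the familiar dG(0) time indicator (τn/3)|||U⁺ − U⁻|||².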
The following result shows that the spatial indicators are locally efficient in time as well. Lemma 3.3 Under the conditions of Lemma 3.2 we have   \begin{align*} \sum_{{E} \in\mathcal{G}_{n}}\int_{I_{n}} h_{{E}} ^{2}{\Vert{\partial_{t}} {\mathcal{U}}+\mathcal{L} U-f_{n}\Vert}_{E}^{2}+h_{{E}} {\Vert J(U)\Vert}_{\partial E}^{2}\,\textrm{d}t \leqslant C \left\{ \int_{I_{n}}{\Vert{\textrm{Res}_{h}}({\mathcal{U}})\Vert}_{\mathbb{V}^{\ast}}^{2}+\textrm{osc}_{\mathcal{G}_{n}}^{2}(f_{n},{\mathcal{U}})\,\textrm{d}t\right\}\!, \end{align*}where   $$\textrm{osc}_{\mathcal{G}_{n}}^{2}(\,f_{n}, {\mathcal{U}}):=\sum_{{E} \in\mathcal{G}_{n}} h_{{E}} ^{2}{\Vert{\partial_{t}} {\mathcal{U}}+\mathcal{L} U-f_{n}-R_{{E}} \Vert}_{E}^{2} + h_{{E}} {\left\Vert J(U)-J_{E}\right\Vert}_{\partial E}^{2}$$with, at time t ∈ In, pointwise L2(Ω)-best approximations $$R_{{E}} (t)\in \mathbb{P}_{2\ell-2}(E)$$, respectively $$J_{E}(t)_{|{S} }\in \mathbb{P}_{2\ell-1}({S} )$$, for each side S ⊂ ∂E. The constant C > 0 depends solely on the shape regularity of $${\mathbb{G}}$$. Proof. With the same arguments as in the proof of Lemma 3.2, for each 1 ⩽ j ⩽ s + 1 it suffices to prove that   \begin{align*} &C_{{\mathbb{G}}} \sum_{E\in\mathcal{G}_{n}}h_{{E}} ^{2}{\left\Vert\left({\partial_{t}} {\mathcal{U}}+\mathcal{L} U-f_{n}\right)\left({t^{\, j}_{n}}\right)\right\Vert}_{E}^{2}+h_{{E}} {\left\Vert J(U)\left({t^{\, j}_{n}}\right)\right\Vert}_{\partial E}^{2} \\&\qquad\leqslant\! C\!\left\{{\left\Vert{\textrm{Res}_{h}}({\mathcal{U}})\left({t^{\, j}_{n}}\right)\right\Vert}_{\mathbb{V}^{\ast}}^{2}\!+\textrm{osc}_{\mathcal{G}_{n}}^{2}(\,f_{n},{\mathcal{U}})\left({t^{\, j}_{n}}\right)\right\}.\end{align*}This, however, follows with standard techniques used in a posteriori estimation of elliptic second-order problems; see, e.g., Verfürth (2013) and Mekchay & Nochetto (2005) and compare with the case of the implicit Euler scheme s = 0 in the study by Verfürth (2003).
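The constant $$C_{\tau}(s)=\frac14\left(\frac{1}{2s+3}+\frac{1}{2s+1}\right)$$ from (3.6) follows from the orthogonality and normalisation $$\int_{-1}^{1} L_{r}^{2}\,\textrm{d}x = 2/(2r+1)$$ of the Legendre polynomials. A small numerical verification (our sketch, assuming numpy; the function name is ours):

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def C_tau(s):
    """C_tau(s) = (1/4) * (1/(2s+3) + 1/(2s+1)), cf. (3.6)."""
    return 0.25 * (1.0 / (2 * s + 3) + 1.0 / (2 * s + 1))

for s in range(5):
    # (1/|I|) * int_I (1/4)|L_{s+1} - L_s|^2 on the reference interval [-1, 1]
    q = ((Legendre.basis(s + 1) - Legendre.basis(s)) ** 2).integ()
    val = 0.25 * (q(1.0) - q(-1.0)) / 2.0
    assert abs(val - C_tau(s)) < 1e-12
```

For s = 0 this gives C_τ = 1/3, and the normalised mean value agrees with C_τ(s) for every s, matching the computation leading to (3.6).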
3.4 Estimation of the error By means of the decomposition of the residual (3.5) we can combine the above results to obtain a reliable and efficient error estimator for (1.1). To this end we introduce the following error indicators for the sake of brevity of presentation: for $$\mathcal{G}\in{\mathbb{G}}$$ and $$v\in{\mathbb{V}}$$ the estimator for the initial value is given by   \begin{align} \mathcal{E}_{0}^{2}(v,\mathcal{G}):={\Vert{v-{{\mathcal{I}}_{\mathcal{G}}} v}\Vert}_{\varOmega}^{2}. \end{align} (3.7a)For $$f\in L^{2}\left (0,T;L^{2}(\varOmega )\right )$$, t⋆ ∈ (0, T) and I = (t⋆, t⋆ + τ] ⊂ (t⋆, T], the so-called consistency error, which is inherited from the decomposition of the residual (3.5), is defined by   \begin{align} \mathcal{E}_{f}^{2}(\,f,t_{\star},\tau):=3\frac{{C_{F}^{2}}}{a_{*}}\int_{I}{\Vert\, f- f_{I}\Vert}_{\varOmega}^{2}\,\textrm{d}t. \end{align} (3.7b)For $$v^{-},v^{+}\in{\mathbb{V}}$$, $$\mathcal{G}\in{\mathbb{G}}$$, $$V\in \mathbb{P}_{s}(\mathbb{V}(\mathcal{G}))$$, $${E} \in \mathcal{G}$$ and $$g\in \mathbb{P}_s (L^{2}(\varOmega ))$$ the indicator   \begin{align} \mathcal{E}_{c\tau}^{2}(v^{+},v^{-},\tau):= 3\, C_{\tau}\,\tau\,{{|||} v^{-}-v^{+}{|||}}_{\varOmega}^{2} \end{align} (3.7c)is motivated by (3.6) and Lemma 3.2 suggests defining the spatial indicators by   \begin{align} \mathcal{E}_{\mathcal{G}}^{2}(V,v^{-},t_{\star},\tau,g,\mathcal{G},{E} ) :=&\,3 C_{{\mathbb{G}}} \int_{I} h_{{E}} ^{2}{\Vert{\partial_{t}} \mathcal{V}+\mathcal{L} V-g\Vert}_{E}^{2} +h_{{E}} {\Vert J(V)\Vert}_{\partial E}^{2}\,\textrm{d}t\\=&\,3 C_{{\mathbb{G}}} \,\tau\,\sum_{j=1}^{s+1}b_{j}\left\{ h_{{E}} ^{2}{\left\Vert\left({\partial_{t}} \mathcal{V}+\mathcal{L} V-g\right)\left(t_{\star}+c_{j}\tau\right)\right\Vert}_{E}^{2}+h_{{E}} {\left\Vert J(V)(t_{\star}+c_{j}\tau)\right\Vert}_{\partial E}^{2}\right\}\!.\nonumber \end{align} (3.7d)Here we have used, analogously to (2.6), that   \begin{align}
\mathcal{V}(t):=V+\frac{(-1)^{s}}{2}\left(L_{s+1}^{I}-{L_{s}^{I}}\right)\left(V\left(t_{\star}^{+}\right)-v^{-}\right)\!, \end{align} (3.8)where for $$r\in{\mathbb{N}}_{0}$$, $${L_{r}^{I}}$$ denotes the affine transformation of the Legendre polynomial Lr to I. Proposition 3.4 (Upper bound). Let $$u\in \textrm{W}$$ be the solution of (2.2) and let $${\mathcal{U}}\in \textrm{W}$$ be the approximation defined in (2.5) for time instances t0 = 0, … , tN = T and time-step sizes τn := tn − tn−1, n = 1, … , N. Then we have the estimate   \begin{align*} {\Vert u-{\mathcal{U}}\Vert}_{\textrm{W}(0,T)}^{2}\leqslant&\, \mathcal{E}_{0}^{2}(u_{0},\mathcal{G}_{0})+\sum_{n=1}^{N}\left\{ \mathcal{E}_{c\tau}^{2}\left(U_{n-1}^{+},U_{n-1}^{-},\tau_{n}\right) + \mathcal{E}_{\mathcal{G}}^{2}\left(U,U_{n-1}^{-},t_{n-1},\tau_{n},f_{n},\mathcal{G}_{n}\right)\right. \\ &\qquad\qquad\qquad\qquad\left.+3 {\Vert \,f-f_{n}\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2}\right\}\!. \end{align*}In particular, this implies   \begin{align*} {\Vert u-{\mathcal{U}}\Vert}_{\textrm{W}(0,T)}^{2}\leqslant \mathcal{E}_{0}^{2}(u_{0},\mathcal{G}_{0})\!+\!\sum_{n=1}^{N}\!\left\{ \mathcal{E}_{c\tau}^{2}\!\left(U_{n-1}^{+},U_{n-1}^{-},\tau_{n}\right) \!+ \mathcal{E}_{\mathcal{G}}^{2}\left(U,U_{n-1}^{-},t_{n-1},\tau_{n},f_{n},\mathcal{G}_{n}\right) \!+\mathcal{E}_{f}^{2}(\,f,t_{n-1},\tau_{n})\right\}\!. \end{align*} Proof. By the decomposition of the residual (3.5) and the triangle inequality we estimate on each interval In, n = 1, … , N,  \begin{align*} {\left\Vert \textrm{Res}({\mathcal{U}})\right\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2}&\leqslant 3 {\Vert{\textrm{Res}_{\tau}}({\mathcal{U}})\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2} +3{\left\Vert{\textrm{Res}_{h}}({\mathcal{U}})\right\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2}+3 {\Vert\, f-f_{n}\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2}. \end{align*}Now the first assertion follows with Proposition 3.1, (3.6) and Lemma 3.2.
The second bound follows from the Friedrichs inequality and the ellipticity (2.1) of $$\mathcal{B}$$. Indeed, we have   \begin{align} {\Vert\, f-f_{n}\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2}\leqslant \frac{{C_{F}^{2}}}{a_{*}}{\Vert\, f-f_{n}\Vert}_{L^{2}\left(I_{n};L^{2}(\varOmega)\right)}^{2}. \end{align} (3.9) Proposition 3.5 (Lower bound). Supposing the conditions of Proposition 3.4 we have   \begin{align*} &\mathcal{E}_{c\tau}^{2}\left(U_{n-1}^{+},U_{n-1}^{-},\tau_{n}\right)+ \mathcal{E}_{\mathcal{G}}^{2}\left(U,U_{n-1}^{-},t_{n-1},\tau_{n},f_{n},\mathcal{G}_{n}\right)\\ &\qquad\leqslant C\,\left\{{\left\Vert{\partial_{t}}(u-{\mathcal{U}})\right\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2}+{\Vert u-{\mathcal{U}}\Vert}_{L^{2}(I_{n};{\mathbb{V}})}^{2}+\int_{I_{n}}\textrm{osc}_{\mathcal{G}_{n}}^{2}(\,f_{n}, {\mathcal{U}})\,\textrm{d}t+{\Vert \,f-f_{n}\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2} \right\}\!, \end{align*}where the constant C depends solely on the shape regularity of $${\mathbb{G}}$$ and on ℓ. Proof. We first consider the spatial indicators. By Lemma 3.3 there exists C > 0 such that   $$\mathcal{E}_{\mathcal{G}}^{2}\left(U,U_{n-1}^{-},t_{n-1},\tau_{n},f_{n},\mathcal{G}_{n}\right)\leqslant C {\left\Vert{\textrm{Res}_{h}}({\mathcal{U}})\right\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2}+C\int_{I_{n}}\textrm{osc}_{\mathcal{G}_{n}}^{2}(\,f_{n}, {\mathcal{U}})\,\textrm{d}t.$$The first term on the right-hand side can be further estimated using the decomposition of the residual, the triangle inequality and (3.6) to obtain   \begin{align} \left\Vert{\textrm{Res}_{h}}({\mathcal{U}})\right\Vert_{L^{2}(I_{n};\mathbb{V}^{\ast})} \leqslant \Vert{\textrm{Res}({\mathcal{U}})}\Vert_{L^{2}(I_{n};{\mathbb{V}}^{\ast})}+ \Vert{\,f-f_{n}}\Vert_{L^{2}(I_{n};{\mathbb{V}}^{\ast})} + {\mathcal{E}_{c\tau}}\left(U_{n-1}^{+},U_{n-1}^{-},\tau_{n}\right)\!. \end{align} (3.10) The time/coarsening indicator $$\mathcal{E}_{c\tau }$$ can be bounded as in the study by Ern et al. (2017).
In order to keep the paper self-contained we shall reproduce the proof here. Recalling the properties of the transformed Legendre functions we conclude with (2.6) that   \begin{align}\frac{1}{4}\frac{\tau_{n}}{2s+3}{{|||} U_{n-1}^{+}-U_{n-1}^{-}{|||}}_{\varOmega}^{2}=&\,\tfrac14 {{|||} U_{n-1}^{+}-U_{n-1}^{-}{|||}}_{\varOmega}^{2}\int_{I_{n}}\left(L_{s+1}^{n}\right)^{2}\,\textrm{d}t\nonumber \\ =&\, {{|||} U_{n-1}^{+}-U_{n-1}^{-}{|||}}_{\varOmega}^{2}\int_{I_{n}}\tfrac{(-1)^{s}}{2}\left(L_{s+1}^{n}-{L_{s}^{n}}\right) \tfrac{(-1)^{s}}{2} L_{s+1}^{n}\,\textrm{d}t \nonumber\\ =&\,\int_{I_{n}}\mathcal{B}\left[{\mathcal{U}}-U,\,\left(U_{n-1}^{+}-U_{n-1}^{-}\right) \tfrac{(-1)^{s}}{2} L_{s+1}^{n}\right]\,\textrm{d}t \nonumber\\ =&\,\int_{I_{n}}{\left\langle \textrm{Res}({\mathcal{U}}),\,\left(U_{n-1}^{-}-U_{n-1}^{+}\right)\tfrac{(-1)^{s}}{2} L_{s+1}^{n}\right\rangle}\,\textrm{d}t \nonumber\\ &+\int_{I_{n}}{\left\langle\, f-f_{n},\left(U_{n-1}^{-}-U_{n-1}^{+}\right) \tfrac{(-1)^{s}}{2} L_{s+1}^{n}\right\rangle}_{\varOmega}\,\textrm{d}t. \end{align} (3.11)Here we have used in the last step that $$L_{s+1}^{n}$$ is $$L^{2}(I_{n})$$-orthogonal to $$\mathbb{P}_{s}$$, whence for every $$v\in{\mathbb{V}}$$ we have $$\int _{I_{n}}{\left \langle{\partial _{t}}{\mathcal{U}},\,v\right \rangle }_{\varOmega}\,L_{s+1}^{n}\,\textrm{d}t=\int _{I_{n}}{\langle\, f_{n},\,v\rangle}_{\varOmega}\,L_{s+1}^{n}\,\textrm{d}t=\int _{I_{n}}{\mathcal{B}[U,\,v]}\,L_{s+1}^{n}\,\textrm{d}t=0$$. The assertion then follows from the Cauchy–Schwarz inequality. Indeed, recalling (3.7c) and observing that   $$\frac{\frac14\frac{1}{2s+3}}{C_{\tau}}=\frac{2s+1}{4s+4}\in\left[\frac14,\frac12\right)\!,$$we see that the involved constants are robust with respect to s. Remark 3.6 (Consistency indicator $$\mathcal{E}_{f}$$).
The upper and lower bounds in Propositions 3.4 and 3.5 in principle involve the consistency terms $${\Vert f-f_{n}\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2}$$, n = 1, … , N, which are not computable in general. Though these terms can be bounded by $$\mathcal{E}_{f}^{2}(f,t_{n-1},\tau _{n})$$ (compare with (3.9)), this still involves the best approximation $$f_{n}\in \mathbb{P} _{s}\left (I_{n};L^{2}(\varOmega )\right )$$ of $$f_{|I_{n}}$$ in $$L^{2}\left (I_{n};L^{2}(\varOmega )\right )$$. In a practical algorithm fn may therefore be replaced by some quasi-best approximation in $$\mathbb{P} _{s}\left (I_{n};L^{2}(\varOmega )\right )$$ like a Clément-type interpolation in time; see the study by Clément (1975). Remark 3.7 (Temporal residual). We emphasise that the techniques from the study by Ern et al. (2017) for bounding the indicator $$\mathcal{E}_{c\tau }$$ in the proof of Proposition 3.5 differ from those in the study by Verfürth (2003) for the dG(0) scheme, and it is noteworthy that this improves the constant even in this case. In fact, it follows from the proof of Proposition 3.5 that   $$\tfrac18\,\mathcal{E}_{c\tau}^{2}\left(U_{n-1}^{+}, U_{n-1}^{-},\tau_{n}\right)\leqslant{\Vert \textrm{Res}({\mathcal{U}})\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2}+ {\Vert \,f-f_{n}\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2};$$compare also with the study by Ern et al. (2017). Remark 3.8 (Time-space efficiency). We note that the lower bound in Proposition 3.5 is local in time but global in space. In fact, though the estimate of the indicators $$\mathcal{E}_{\mathcal{G}}$$ in the proof of Proposition 3.5 is in principle local also in space, the estimate of $$\mathcal{E}_{c\tau }$$ in (3.11) follows from a global relation in space. Ern et al.
(2017) presented an estimator, which is locally efficient in space and time for the modified error norm   \begin{align} \Vert{u-{\mathcal{U}}}\Vert_{{\mathcal{E}_{Y}}}:= \left(\Vert{u-{\mathcal{U}}\Vert}_{\textrm{W}(0,T)}^{2}+\sum_{n=1}^{N}\mathcal{E}_{c\tau}^{2}\left(U_{n-1}^{+},U_{n-1}^{-},\tau_{n}\right)\right)^{\!1/2}\!, \end{align} (3.12)which can be interpreted as a dG(s) norm controlling the temporal jumps of the discrete solution. With respect to this norm the local space and time efficiency of the indicator $$\mathcal{E}_{c\tau }$$ is then immediate. Moreover, by resorting to equilibrated flux estimators instead of the residual-based ones in (3.7d), the a posteriori bounds in the study by Ern et al. (2017) are also robust with respect to the polynomial degree ℓ in space. Remark 3.9 (Elliptic problem). In the case of the implicit Euler scheme dG(0) it is well known that for each $$1\leqslant n\leqslant N$$, $$U_{|I_{n}} \in \mathbb{P}_{0}(\mathbb{V}_n )=\mathbb{V}_{n}$$ is the Ritz approximation to a coercive elliptic problem. Moreover, the spatial estimators (3.7d) are the standard residual-based estimators for this elliptic problem. This observation transfers to the dG(s) scheme for $$s\geqslant 1$$. To see this we observe that (after transformation to the unit interval) (2.3) is a Galerkin approximation to the solution $$u_{\tau }\in \mathbb{P}_s ({\mathbb{V}})$$ of a problem of the kind   \begin{align} {\int_{0}^{1}}\frac{1}{\tau}\langle{{\partial_{t}} u_{\tau}},{v}\rangle_{\varOmega} + {\mathcal{B}[u_{\tau},\,v]}\,\textrm{d}t +\frac{1}{\tau}\langle{u_{\tau}(0)},{v(0)}\rangle_{\varOmega} = {\int_{0}^{1}}\left\langle{\,\bar f},{v}\right\rangle_{\varOmega} \,\textrm{d}t +\frac{1}{\tau}\left\langle{v^{-}},{v(0)}\right\rangle_{\varOmega} \end{align} (3.13)for all $$v\in \mathbb{P}_s ({\mathbb{V}})$$ and some data $$\bar f\in \mathbb{P}_s \left (L^{2}(\varOmega )\right )$$, v−∈ L2(Ω) and τ > 0.
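The unique solvability of the one-step problem (3.13) can be checked by hand in the simplest scalar setting. The sketch below is ours and purely illustrative: it uses the known identification of the dG(1) step for the linear test equation $$u'=\lambda u$$ with the two-stage Radau IIA method, assembles the 2 × 2 dG(1) system in the monomial basis and compares the end-of-step value with the Radau IIA stability function $$R(z)=(1+z/3)/(1-2z/3+z^{2}/6)$$:

```python
import numpy as np

def dg1_step(lam, tau, u_minus=1.0):
    """One dG(1) step for u' = lam*u on (0, tau] with exact integration:
    find U = a + b*t/tau in P_1 such that
    int_0^tau (U' - lam*U) v dt + U(0) v(0) = u_minus * v(0)
    for the test functions v = 1 and v = t/tau."""
    z = lam * tau
    # 2x2 system assembled by hand for the two test functions
    A = np.array([[1.0 - z, 1.0 - z / 2.0],
                  [-z / 2.0, 0.5 - z / 3.0]])
    a, b = np.linalg.solve(A, np.array([u_minus, 0.0]))
    return a + b                       # end-of-step value U(tau)

z = -0.7
R = (1 + z / 3) / (1 - 2 * z / 3 + z**2 / 6)   # Radau IIA(2) stability function
assert abs(dg1_step(z, 1.0) - R) < 1e-12
```

The agreement reflects that, for linear autonomous problems, the exact-integration dG(1) step and the Radau IIA quadrature version coincide, since the quadrature is exact on the integrands of degree at most 2s.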
The mappings v ↦ v(0) and v ↦ ∂tv are linear and continuous on $$\mathbb{P}_s ({\mathbb{V}})$$, whence this equation can be regarded as a vector-valued linear variational problem of second order on $${\mathbb{V}}^{s+1}$$. Testing with v = uτ proves coercivity:   \begin{align*} {\int_{0}^{1}}\frac{1}{\tau}\langle{{\partial_{t}} u_{\tau}},{u_{\tau}}\rangle_{\varOmega} + {\mathcal{B}[u_{\tau},\,u_{\tau}]}\,\textrm{d}t +\!\frac{1}{\tau}\langle{u_{\tau}(0)},{u_{\tau}(0)}\rangle_{\varOmega} =\frac{1}{2\tau}{\Vert u_{\tau}(0)\Vert}_{\varOmega}^{2}\!+\!\frac{1}{2\tau}{\Vert u_{\tau}(1)\Vert}_{\varOmega}^{2}+ {\int_{0}^{1}}{{|||} u_{\tau}{|||}}_{\varOmega}^{2}\,\textrm{d}t.\end{align*}Obviously, its residual in $$V\in \mathbb{P}_s ({\mathbb{V}})$$ is given by   $${\langle{\textrm{Res}_{h}}(\mathcal{V}),\,v\rangle}={\langle\, \bar f-{\partial_{t}}\mathcal{V},\,v\rangle}-{\mathcal{B}[V,\,v]},\quad v\in\mathbb{P}_s({\mathbb{V}}),$$where $$\mathcal{V}\in \mathbb{P}_{s+1}({\mathbb{V}})$$ is such that $$\mathcal{V}(c_{j})=V(c_{j})$$, j = 1, … , s + 1 and $$\mathcal{V}(0)=v^{-}$$; compare with (2.5). Thanks to Lemmas 3.2 and 3.3, for $$V\in \mathbb{P}_s (\mathbb{V}(\mathcal{G}) )$$, $$\mathcal{G}\in{\mathbb{G}}$$, the standard residual-based estimator for this problem is given by $$\mathcal{E}_{\mathcal{G}}^{2}(V,v^{-},0,\tau,\bar f,\mathcal{G})$$. 3.5 Energy estimation We shall now generalise the energy estimate from the study by Kreuzer et al. (2012) to higher-order dG(s) schemes. Similar estimates have been obtained in the studies by Eriksson & Johnson (1991, 1995), although they use different techniques involving restrictions on the time-step and mesh sizes, as well as constants resulting from inverse inequalities. Proposition 3.10 (Uniform global energy estimate). Assume $$N\in{\mathbb{N}}\cup \{\infty \}$$ and arbitrary time instances $$0=t_{0}<\cdots <t_{N}\leqslant T$$ with time-step sizes τ1, … , τN > 0.
Let $$U_{0}= \varPi_{0}u_{0}$$ and for $$1\leqslant n\leqslant N$$ let $$U_{|I_{n}} \in \mathbb{P}_s (\mathbb{V}_n )$$ be the discrete solutions to (2.3) and let $${\mathcal{U}}\in \textrm{W}$$ as defined in (2.5). Then for any $$m=1,\dots ,N$$ we have   $$\sum_{n=1}^{m}\left\{ \int_{I_{n}}{\Vert\varPi_{n}{\partial_{t}}{\mathcal{U}}\Vert}_{\varOmega}^{2}\,\textrm{d}t + {{|||} U_{n-1}^{+}-\varPi_{n} U_{n-1}^{-}{|||}}_{\varOmega}^{2} +{{|||} U_{n}^{-}{|||}}_{\varOmega}^{2} -{{|||} \varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2}\right\} \leqslant \sum_{n=1}^{m}\int_{I_{n}}{\Vert\, f_{n}\Vert}_{\varOmega}^{2}\,\textrm{d}t.$$ Proof. We choose $$V\mathrel{:=}\varPi_{n}{\partial _{t}}{\mathcal{U}}_{|I_{n}}\in \mathbb{P}_s (\mathbb{V}_n )$$ as a test function in (2.7), obtaining   \begin{align} \int_{I_{n}}{\Vert \varPi_{n}{\partial_{t}}{\mathcal{U}}\Vert}_{\varOmega}^{2} + {\mathcal{B}[U,\,\varPi_{n} {\partial_{t}}{\mathcal{U}}]}\,\textrm{d}t = \int_{I_{n}} \langle{\,f_{n}},{\varPi_{n}{\partial_{t}}{\mathcal{U}}}\rangle_{\varOmega}\,\textrm{d}t. \end{align} (3.14)In order to analyse the second term on the left-hand side we first observe that $$\varPi_{n}{\partial _{t}}{\mathcal{U}}_{|I_{n}}={\partial _{t}}\varPi_{n}{\mathcal{U}}_{|I_{n}}\in \mathbb{P}_s (\mathbb{V}_n )$$.
Recalling (2.5b) and that $$\mathcal{B}:{\mathbb{V}}\times{\mathbb{V}}\to{\mathbb{R}}$$ is constant in time we obtain, integrating by parts, that   $$\int_{I_{n}} {\mathcal{B}\left[U,\,\varPi_{n}{\partial_{t}}{\mathcal{U}}\right]}\,\textrm{d}t = - \int_{I_{n}} {\mathcal{B}\left[{\partial_{t}} U,\,\varPi_{n}{\mathcal{U}}\right]}\,\textrm{d}t + {{|||} U_{n}^{-}{|||}}_{\varOmega}^{2}-{\mathcal{B}\left[U_{n-1}^{+},\,\varPi_{n}U_{n-1}^{-}\right]}.$$Since $${\mathcal{B}[{\partial _{t}} U,\,\varPi_{n}{\mathcal{U}}]}_{|I_{n}}\in \mathbb{P}_{2s}$$ we can apply (2.4) and conclude with (2.5a) that   \begin{align*} \int_{I_{n}} {\mathcal{B}\left[U,\,\varPi_{n}{\partial_{t}}{\mathcal{U}}\right]}\,\textrm{d}t&= - \int_{I_{n}} {\mathcal{B}\left[{\partial_{t}} U,\,U\right]}\,\textrm{d}t + {{|||} U_n^{-}{|||}}_{\varOmega}^{2}-{\mathcal{B}\left[U_{n-1}^{+},\,\varPi_{n}U_{n-1}^{-}\right]}\\ &= \frac12{{|||}U_{n-1}^{+}-\varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2} - \frac12{ {|||}\varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2}+ \frac12{{|||} U_n^{-}{|||}}_{\varOmega}^{2}, \end{align*}where we used that $${\mathcal{B}\left [{\partial _{t}} U_{|I_n} ,\,U_{|I_n} \right ]}= \frac 12{\partial _{t}} {{|||} U_{|I_n}{|||}}_{\varOmega}^{2}$$. Inserting this in (3.14) yields   $$\int_{I_{n}}{\Vert\varPi_{n}{\partial_{t}}{\mathcal{U}}\Vert}_{\varOmega}^{2}\,\textrm{d}t+ \frac12{{|||}U_{n-1}^{+}-\varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2} - \frac12{{|||} \varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2}+ \frac12{{|||} U_{n}^{-}{|||}}_{\varOmega}^{2} = \int_{I_{n}} \langle f_{n}, \varPi_{n}{\partial_{t}}{\mathcal{U}}\rangle_\varOmega\,\textrm{d}t.$$Estimating the right-hand side with the help of the Cauchy–Schwarz and the Young inequalities proves the assertion.
Corollary 3.11 Under the conditions of Proposition 3.10 assume that   \begin{align} {{|||}U_{n-1}^{-}{|||}}_{\varOmega}^{2}-{{|||}\varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2} + \frac12\int_{I_{n}} {\left\Vert\varPi_{n}{\partial_{t}} {\mathcal{U}}\right\Vert}_{\varOmega}^{2}\,\textrm{d}t \geqslant 0\quad\textrm{ for}\quad n=1,\dots,N. \end{align} (3.15)Then we have the estimate   $$\sum_{n=1}^{m} \frac12\int_{I_{n}} {\left\Vert\varPi_{n}{\partial_{t}} {\mathcal{U}}\right\Vert}_{\varOmega}^{2}\,\textrm{d}t + {{|||}U_{n-1}^{+}-\varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2} \leqslant \Vert\,{f}\Vert_{\varOmega\times(0,t_{m})}^{2} + {|||} U_{0}{|||}_{\varOmega}^{2}-{{|||}U_{m}^{-}{|||}}_{\varOmega}^{2}.$$In particular, the series $$\sum _{n=1}^{N} {{|||} U_{n-1}^{+}-\varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2}$$ is uniformly bounded irrespective of the sequence of time-step sizes used. Proof. Summing the non-negative terms in (3.15) yields   $$0\leqslant \sum_{n=1}^{m}{{|||}U_{n-1}^{-}{|||}}_{\varOmega}^{2}-{{|||}\varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2} + \frac12\int_{I_{n}} {\left\Vert\varPi_{n}{\partial_{t}} {\mathcal{U}}\right\Vert}_{\varOmega}^{2}\,\textrm{d}t,$$which is equivalent to   $${{|||}U_{m}^{-}{|||}}_{\varOmega}^{2} - {{|||}U_0{|||}}_{\varOmega}^{2}\leqslant\sum_{n=1}^{m}{{|||}U_{n}^{-}{|||}}_{\varOmega}^{2}-{{|||}\varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2} + \frac12\int_{I_{n}} {\left\Vert\varPi_{n}{\partial_{t}} {\mathcal{U}}\right\Vert}_{\varOmega}^{2}\,\textrm{d}t.$$Using this in the estimate of Proposition 3.10 yields the desired estimate. A closer look at the indicator $${\mathcal{E}_{c\tau }}$$ reveals that, since we allow for coarsening, it is not a pure temporal error indicator. Coarsening may cause a loss of information, and too little information may lead to wrong decisions within the adaptive method.
For this reason we use the triangle inequality to split   \begin{align} {\mathcal{E}}_{c\tau }^{2}(v^{+},v^{-},\tau)\leqslant \mathcal{E}^2_{c}(v^{-},\tau,\mathcal{G}) + \mathcal{E}^2_{\tau}(v^{+},v^{-},\tau,\mathcal{G}) \end{align} (3.16a)into a measure   \begin{align} \mathcal{E}^2_c(v^{-},\tau,\mathcal{G}) {:}{=}\sum_{{E} \in\mathcal{G}} \mathcal{E}^2_c(v^{-},\tau,\mathcal{G},E) := 6 C_{\tau}\sum_{{E} \in\mathcal{G}}\tau{{|||}\varPi_{\mathcal{G}} v^{-} - v^{-}{|||}}_{E}^{2} \end{align} (3.16b)for the coarsening error and   \begin{align} \mathcal{E}^2_\tau(v^{+},v^{-},\tau,\mathcal{G}) \mathrel{:=} 6 C_{\tau}\tau{{|||} v^{+} - \varPi_{\mathcal{G}} v^{-}{|||}}_{\varOmega}^{2}, \end{align} (3.16c)which serves as an indicator for the temporal error. This allows us to control the coarsening error separately. Assuming that (3.15) holds, Corollary 3.11 provides control of the sum of the time error indicators $$\mathcal{E}_{\tau}^{2} \left({U_{n-1}^{+}},{U_{n-1}^{-}},\tau _{n},\mathcal{G}_{n}\right) = 6C_{\tau }\tau_{n}{{|||} U_{n-1}^{+}-\varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2}$$. Assumption (3.15) would trivially be satisfied for the Ritz projection $$R_{n}U_{n-1}^{-}$$ of $$U_{n-1}^{-}$$ into $$\mathbb{V}_n$$, since $${{|||} R_{n}U_{n-1}^{-}{|||}}_{\varOmega}\leqslant {|||}{U_{n-1}^{-}{|||}}_{\varOmega}$$. The L2-projection $$\varPi_n U_{n-1}^{-}$$, however, does not satisfy this monotonicity property in general and therefore coarsening may lead to an increase of energy. The algorithm presented below ensures that (3.15) is fulfilled at the end of every time step.
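The failure of the monotonicity property for the L2-projection is already visible in a two-dimensional linear-algebra model. The following sketch is ours and purely illustrative: the SPD matrix A plays the role of the energy inner product, the Euclidean product that of L2(Ω), and span{w} that of the coarser space:

```python
import numpy as np

A = np.diag([1.0, 100.0])           # SPD "stiffness": |||v|||^2 = v.T @ A @ v

def enorm2(v):
    """Squared energy norm induced by A."""
    return float(v @ A @ v)

w = np.array([1.0, 1.0])            # one-dimensional "coarse" subspace span{w}
v = np.array([1.0, 0.0])

P_l2 = (v @ w) / (w @ w) * w              # L2-orthogonal projection onto span{w}
P_ritz = (v @ A @ w) / (w @ A @ w) * w    # Ritz (energy) projection onto span{w}

assert enorm2(P_l2) > enorm2(v)       # the L2 projection can increase the energy
assert enorm2(P_ritz) <= enorm2(v)    # the Ritz projection never does
```

This is exactly why (3.15) is a genuine restriction for the L2-projection but would be automatic for the Ritz projection, and why the algorithm must enforce (3.15) explicitly.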
To this end, using the notation (3.8) we define for $$V\in \mathbb{P}_s \left ({\mathbb{V}}(\mathcal{G})\right )$$, $$v^{-}\in{\mathbb{V}}$$, t⋆ ∈ (0, T), I = (t⋆, t⋆ + τ] ⊂ (t⋆, T], $$\mathcal{G}\in{\mathbb{G}}$$ and $${E} \in \mathcal{G}$$ the indicators   $$\mathcal{E}^2_{\star}(V,v^{-},t_{\star},\tau,\mathcal{G},{E}) \mathrel{:=} {{|||}\varPi_{\mathcal{G}} v^{-}{|||}}_{E}^{2} - {|||}{v^{-}{|||}}_{E}^{2} - \frac12\int_{I} {\Vert \varPi_{\mathcal{G}}{\partial_{t}} \mathcal{V}\Vert}_{E}^{2}\,\textrm{d}t,$$as well as the convenient notation $$\mathcal{E}^2_{\star}(V,v^{-},t_{\star },\tau ,\mathcal{G}) \mathrel{:=} \sum _{E\in \mathcal{G}} \mathcal{E}_{\star}^2 (V,v^{-},t_{\star },\tau ,\mathcal{G},{E} )$$. Condition (3.15) is then equivalent to $$\mathcal{E}^2_{\star} (U,{U_{n-1}^{-}},t_{n-1},\tau _{n},\mathcal{G}_{n})\leqslant 0$$, $$n=1,\dots ,N$$. Note that the term $$-\frac12\int _{I_{n}} {\Vert\varPi_{n} {\partial _{t}} \mathcal{U}\Vert}_{E}^{2}\,\textrm{d}t$$ may compensate for $${{|||}\varPi_{n} U_{n-1}^{-}{|||}}_{E}^{2}> {|||}{U_{n-1}^{-}{|||}}_{E}^{2}$$. 4. The adaptive algorithm TAFEM Based on the observations in the previous section and a new concept for marking, we describe the adaptive algorithm TAFEM in this section. In contrast to the algorithms presented in the studies by Kreuzer et al. (2012) and Chen & Feng (2004), TAFEM is based on a different marking philosophy: all of these algorithms mark according to the same indicators, (3.7b)–(3.7d) and (3.16a), but TAFEM uses an $$L^{2}$$- instead of an $$L^{\infty }$$-strategy. Philosophically, this aims at an $$L^{2}$$ rather than an $$L^{\infty }$$ equal distribution of the error in time; compare also with the introductory Section 1.
We follow a bottom-up approach, i.e., we first state basic properties of some rudimentary modules that are treated as black-box routines, then describe three core modules in detail and finally combine these procedures in the adaptive algorithm TAFEM . 4.1 Black-box modules As in the study by Kreuzer et al. (2012) we use standard modules ADAPT_INIT, COARSEN, MARK_REFINE and SOLVE as black-box routines, and additionally use the module ENLARGE as a black box. In particular, we use the subroutine MARK_REFINE in an object-oriented fashion, i.e., the functionality of MARK_REFINE changes according to its arguments. We next state the basic properties of these routines. Assumption 4.1 (Properties of modules). We suppose that all rudimentary modules terminate with an output having the following properties: (1) For a given initial datum u0 ∈ L2(Ω) and tolerance $${\texttt{TOL}_{0}}>0$$ the output   $$(U_{0}, \mathcal{G}_0) = \texttt{ADAPT_INIT}{{\left(u_{0},{\mathcal{G}_{\textrm{init}}},{\texttt{TOL}_{0}}\right)}}$$is a refinement $$\mathcal{G}_0\geqslant{\mathcal{G}_{\textrm{init}}}$$ and an approximation $$U_{0}\in{\mathbb{V}}(\mathcal{G}_0)$$ to u0 such that $$\mathcal{E}^2_0({u_{0},\mathcal{G}_0})\leqslant \texttt{TOL}_{0}^{2}$$. (2) For given g ∈ L2(Ω), $$\bar f\in \mathbb{P}_s \left (L^{2}(\varOmega )\right )$$, t⋆ ∈ (0, T), I = (t⋆, t⋆ + τ] ⊂ (t⋆, T] and $$\mathcal{G}\in{\mathbb{G}}$$, the output   $$U_{I} = \texttt{SOLVE}{{(g, \bar f, t_{\star}, \tau, \mathcal{G})}}$$is the solution $$U_{I}\in \mathbb{P}_{s}\left(I;\mathbb{V}(\mathcal{G})\right)$$ to the discrete elliptic problem   $$\int_{I}\langle{{\partial_{t}} U_{I}},{V}\rangle_\varOmega + {\mathcal{B}[U_{I},\,V]}\,\textrm{d}t +\langle{U_{I}(t_{\star})},{V(t_{\star})}\rangle_\varOmega= \langle{g},{V(t_{\star})}\rangle_\varOmega + \int_{I}\langle{\bar f},{V}\rangle_\varOmega \,\textrm{d}t$$for all $$V\in \mathbb{P}_s \left (\mathbb{V}(\mathcal{G}) \right )$$; compare with (2.3). Here we assume exact integration and exact linear algebra.
(3) For a given grid $$\mathcal{G}\in{\mathbb{G}}$$ and a discrete function $$V\in{\mathbb{V}}(\mathcal{G})$$, the output   $$\mathcal{G}_{*} = \texttt{COARSEN}{(V,\mathcal{G})}$$satisfies $$\mathcal{G}_{*}\leqslant \mathcal{G}$$. (4) For a given function $$f\in L^{2}\left (\varOmega \times (0,T]\right )$$, an initial time t ∈ (0, T) and an initial step size τ ∈ (0, T − t], the output   $$\tau_{*} = \texttt{ENLARGE}\left(f, t, \tau,{\texttt{tol}_{f}}\right)$$satisfies $$\tau _{*}\geqslant \tau$$ or $$\tau_{*} = T - t$$. The argument $${\texttt{tol}_{f}}$$ can be used to additionally restrict the enlargement; compare with Remark 4.3 below. (5) For a given grid $$\mathcal{G}$$ and a set of indicators $$\{{\mathcal{E}_{{E} }}\}_{{E} \in \mathcal{G}}$$, the output   $$\mathcal{G}_{*} = \texttt{MARK_REFINE}{{\left(\{{\mathcal{E}_{{E}}}\}_{{E}\in\mathcal{G}}, \mathcal{G}\right)}}\in{\mathbb{G}}$$is a conforming refinement of $$\mathcal{G}$$, where at least one element in the subset $${\arg \!\max }\{{\mathcal{E}_{{E}}}\!:\!{E} \!\in\! \mathcal{G}\}\subset \mathcal{G}$$ has been refined. (6) For given grids $$\mathcal{G},\mathcal{G}_{\textrm{old}}\in{\mathbb{G}}$$ and a set of indicators $$\{{\mathcal{E}_{{E} }}\}_{{E} \in \mathcal{G}}$$, the output   $$\mathcal{G}_{*} = \texttt{MARK_REFINE}{{\left(\{{\mathcal{E}_{{E} }}\}_{{E} \in\mathcal{G}}, \mathcal{G},\mathcal{G}_{\textrm{old}}\right)}} \in{\mathbb{G}}$$is a conforming refinement of $$\mathcal{G}$$, where at least one element of the set $$\left \{{E} \in \mathcal{G}\colon h_{\mathcal{G}|{E} }>h_{\mathcal{G}_{\textrm{old}}|{E} }\right \}$$ of coarsened elements (with respect to $$\mathcal{G}_{\textrm{old}}$$) is refined. For a more detailed description of these modules see the study by Kreuzer et al. (2012, Section 3.3.1). 4.2 The core modules The first core module CONSISTENCY controls the consistency error $${\mathcal{E}_{f}}$$.
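The mechanism behind such a consistency-control module is a plain reduce-until-below-tolerance loop. The following Python sketch is ours and purely illustrative: the names E_f2 and kappa1 are hypothetical (standing in for a callable evaluating $$\mathcal{E}^2_f(f,t,\tau)$$ and the reduction factor κ1 ∈ (0, 1)), and the real module additionally invokes ENLARGE before reducing:

```python
def consistency(E_f2, t, tau, tol_f, T, kappa1=0.5):
    """Shrink the trial step size until the local consistency indicator
    E_f2(t, tau) -- standing in for E_f^2(f, t, tau) -- is below tol_f**2.
    Terminates because the indicator is nondecreasing in tau and tends
    to 0 as tau -> 0 (cf. Lemma 4.2)."""
    tau = min(tau, T - t)               # never step past the final time T
    while E_f2(t, tau) > tol_f**2:
        tau *= kappa1                   # reduce by the factor kappa1 in (0, 1)
    return tau

# monotone model indicator: E_f^2 proportional to the step size
tau = consistency(lambda t, tau: tau, 0.0, 1.0, 0.3, T=1.0)
```

With this model indicator the loop halves the step from 1.0 until the indicator value drops below 0.09, illustrating on which quantities the termination argument of Lemma 4.2 rests.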
Recalling its definition in (3.7b) we see that the consistency error is solely influenced by the time-step size and can be computed without solving expensive discrete systems. Therefore, CONSISTENCY is used in the initialisation of each time step to adjust the time-step size such that the local consistency indicator $$\mathcal{E}^2_f(f,t,\tau)$$ is below a local tolerance $${{\texttt{tol}}_{f}}$$. It is important to notice that this module can be chosen to follow the classic thresholding algorithm, which ensures quasi-optimal order of convergence in terms of the degrees of freedom (DoFs); see Remark 4.3. We start with termination of the module CONSISTENCY. Lemma 4.2 (Termination of CONSISTENCY). Assume $$f \in L^{2}\left ((0,T);L^{2}(\varOmega )\right )$$. Then for any t ∈ (0, T) and τin ∈ (0, T − t],   $$\left(\,\bar f,\tau\right)=\texttt{CONSISTENCY}\left(f,t,\tau^{\textrm{in}},{{\texttt{tol}}_{f}}\right)$$terminates and   \begin{align} \mathcal{E}^2_f(f,t,\tau) \leqslant {\texttt{tol}}_{f}^{2}. \end{align} (4.1) Proof. The proof is straightforward, since $$\mathcal{E}^2_f(f,t,\tau)$$ is nondecreasing in $$\tau$$ and $$\mathcal{E}^2_f(f,t,\tau) \rightarrow 0$$ as $$\tau \rightarrow 0$$. Remark 4.3 We note that the enlargement step in line 2 of CONSISTENCY is not relevant for Lemma 4.2, i.e., it could for instance be implemented by Algorithm 2. In this case the module CONSISTENCY becomes a classical thresholding algorithm; compare, e.g., with the study by Binev et al. (2002). However, an overly aggressive enlargement of the time-step size may have to be reverted later because of the time estimator $${\mathcal{E}_{\tau }}$$ (e.g., if $${\mathcal{E}_{f}}\equiv 0$$), which requires the solution of expensive linear systems.
Therefore, we decided to introduce the abstract enlargement routine $$\texttt{ENLARGE}$$, which can be adjusted individually to the characteristics of the considered problem; compare also with Section 6.2.3, where we choose κ2 > 1, i.e., $$\min \{\kappa _{2}\tau ,T-t\}=\texttt{ENLARGE}(f,t,\tau ,{{\texttt{tol}}_{f}})$$ for the rough initial data experiment. Obviously, a local control of the form (4.1) does not guarantee that the global consistency error is below some prescribed tolerance $${{\texttt{TOL}}_{f}}$$. For this reason we first precompute some local tolerance $${{\texttt{tol}}_{f}}$$ from the global tolerance $${{\texttt{TOL}}_{f}}$$ by the following module $$\texttt{TOLFIND}$$. The next result states that if all local consistency indicators are below the threshold $${{\texttt{tol}}_{f}}$$ then the accumulation of the consistency indicators stays indeed below the prescribed global tolerance $${{\texttt{TOL}}_{f}}$$. Lemma 4.4 (Termination of TOLFIND). Assume $$f \in L^{2}\left ((0,T);L^{2}(\varOmega )\right )$$. Then for any $${{\texttt{TOL}}_{f}}>0$$ the call   $${{\texttt{tol}}_{f}}=\texttt{TOLFIND}\left(\,f,T,{{\texttt{TOL}}_{f}}\right)$$terminates with $${{\texttt{tol}}_{f}}>0$$. Moreover, let 0 = t0 < t1 < ⋯< tN = T be arbitrary with $$\tau _{n}=t_n -t_{n-1}$$, n = 1, … , N; then   \begin{align} \mathcal{E}^2_f(f,{t_{n-1}},\tau_{n})\leqslant {\texttt{tol}}_{f}^{2},\quad n=1,\ldots, N \quad\Rightarrow\quad \sum_{n=1}^{N} \mathcal{E}^2_f(f,{t_{n-1}},\tau_{n}) \leqslant {\texttt{TOL}}_{f}^{2}. \end{align} (4.2) Proof. The proof is divided into three steps. (1) We show that the process from lines 4 to 8 terminates. To this end we recall the parameter κ1 ∈ (0, 1) from $$\texttt{CONSISTENCY}\left (\,f,{\tilde t_{n-1}},\tilde{\tau }_{n-1},{{\texttt{tol}}_{f}}\right )$$.
We argue by contradiction and assume that an infinite monotone sequence $$\left \{\tilde t_{n}\right \}_{n\geqslant 0} \subset [0,T]$$ is constructed by TOLFIND with $$\lim _{n\to \infty }\tilde t_{n}=t^{\star }\in (0,T]$$. Defining the intervals $$I_{n}^{\ast }=\left (\tilde t_{n}, \tilde{t}_{n}^{\ast }\right ]$$ where $$\tilde{t}_{n}^{\ast }=\min \left \{ \kappa _{1}^{-1}(t^{\ast }-\tilde{t}_{n})+\tilde{t}_{n} ,T\right \}$$ we have that there exists $$n_{0}\in{\mathbb{N}}$$ such that   $$\Vert\,{f}\Vert_{\varOmega\times\left(\tilde{t}_{n}, \tilde{t}_{n}^{\ast}\right)}^{2}\leqslant {\texttt{tol}}_{f}^{2}$$for all n ⩾ n0 since $$\tilde{t}_{n}\to t^{\ast }$$. Let m > n0 be such that the condition in line 3 of the module $$\texttt{CONSISTENCY} \left (\,f,{\tilde{t}_{m-1}},\tilde{\tau }_{m-1},{{\texttt{tol}}_{f}}\right )$$ is activated at least once (the existence of such m is a direct consequence of $$\tilde{t}_{n}\to t^{\ast }$$), then   \begin{align*} {\texttt{tol}}_{f}^{2} &< \mathcal{E}^2_f(\,f,{\tilde t_{m-1}},\kappa^{-1}_{1} \tilde{\tau}_{m} )\\ &\leqslant \Vert\,{f}\Vert_{\varOmega\times\left(\tilde t_{m-1},\tilde t_{m-1}+\kappa^{-1}_{1} \tilde{\tau}_{m}\right)}^{2} \\ &\leqslant \Vert\,{f}\Vert_{\varOmega\times\left(\tilde t_{m-1},\tilde t_{m-1}^{\ast}\right)}^{2}\leqslant {\texttt{tol}}_{f}^{2}. \end{align*}This contradiction implies that the sequence $$\left \{\tilde t_{n}\right \}_{n\geqslant 0}$$ is finite. (2) We next check that the condition of line 10 is violated after finitely many steps. Since the span of characteristic functions of dyadic intervals is dense in L2(0, T) we can choose M > 0 such that the squared consistency error on the grid of 2M uniform intervals is below $$\frac 14 {\texttt{TOL}}_{f}^{2}$$.
We split the intervals generated in $$\texttt{TOLFIND}\left (\,f,T,{{\texttt{tol}}_{f}}\right )$$ into   $${\mathbb{I}}_{\textrm{in}}:=\left\{n: \left(\tilde t_{n-1},\tilde t_{n}\right]\subset \big(Tm2^{-M},T(m+1)2^{-M}\big]\ \textrm{for some}\ m\in\big\{0,\ldots,2^{M}-1\big\}\right\}$$and $${\mathbb{I}}_{\textrm{out}}:=\left \{1,\ldots ,N_{f}\right \}\setminus{\mathbb{I}}_{\textrm{in}}$$ according to whether or not they are included in one of the dyadic intervals. Therefore, we have, with the monotonicity of the consistency error and $$\#{\mathbb{I}}_{\textrm{out}}\leqslant 2^{M}$$, that   $$\epsilon = \sum_{n\in{\mathbb{I}}_{\textrm{in}}}\mathcal{E}^2_f(\,f,{\tilde t_{n-1}},\tilde{\tau}_{n})+\sum_{n\in{\mathbb{I}}_{\textrm{out}}}\mathcal{E}^2_f(\,f,{\tilde t_{n-1}},\tilde{\tau}_{n})\leqslant \frac14{\texttt{TOL}}_{f}^{2} + 2^{M}{\texttt{tol}}_{f}^{2}.$$Taking $${\texttt{tol}}_{f}^{2}\!<\! 2^{-(M+2)}{\texttt{TOL}}_{f}^{2}$$ we see that the condition of line 10 is violated, which proves the assertion. (3) Combining the above steps we conclude that TOLFIND terminates and it remains to prove (4.2). To this end we proceed similarly to (2) and let   $${\mathbb{I}}_{\textrm{in}}:=\left\{n: \left(t_{n-1},t_{n}\right]\subset \left(\tilde t_{m-1},\tilde t_{m}\right]\ \textrm{for some}\ m\in\left\{1,\ldots,N_{f}\right\}\right\}$$and $${\mathbb{I}}_{\textrm{out}}:=\left \{1,\ldots , N\right \}\setminus{\mathbb{I}}_{\textrm{in}}$$.
By monotonicity we have $$\sum _{n\in{\mathbb{I}}_{\textrm{in}}}\mathcal{E}^2_f(\,f,{ t_{n-1}},\tau _{n})\leqslant \sum _{n=1}^{N_{f}}\mathcal{E}^2_f(\,f,{ \tilde t_{n-1}},\tilde \tau _{n})\leqslant {\texttt{TOL}}_{f}^{2}/2$$ and thus the assertion follows from   \begin{align*} \sum_{n=1}^{N}\mathcal{E}^2_f(\,f,{t_{n-1}},\tau_{n})&= \sum_{n\in{\mathbb{I}}_{\textrm{in}}}\mathcal{E}^2_f(\,f,{ t_{n-1}},\tau_{n})+\sum_{n\in{\mathbb{I}}_{\textrm{out}}}\mathcal{E}^2_f(f,{ t_{n-1}},\tau_{n}) \\ &\leqslant \frac{{\texttt{TOL}}_{f}^{2}}2 + N_{f}{\texttt{tol}}_{f}^{2} = \frac{{\texttt{TOL}}_{f}^{2}}2 + N_{f}\frac{{\texttt{TOL}}_{f}^{2}}{2N_{f}}\leqslant {\texttt{TOL}}_{f}^{2}. \end{align*} Remark 4.5 (Estimation of $${{\texttt{tol}}_{f}}$$ under regularity assumptions). Supposing the regularity assumption $$f \in H^{s}\left ((0,T);L^{2}(\varOmega )\right )$$, s ∈ (0, 1], the following idea may be used as an alternative to TOLFIND for the estimation of $${{\texttt{tol}}_{f}}$$. Let δ > 0. Then using Lemma 4.2 together with Poincaré’s inequality in Hs and the fact that there are at most $$\frac{T}{\delta }$$ disjoint intervals of length δ in (0, T], we obtain   \begin{align*} \sum_{n=1}^{N}\mathcal{E}^2_f(f,{t_{n-1}},\tau_{n})&= \sum_{\tau_{n}>\delta}\mathcal{E}^2_f(f,{ t_{n-1}},\tau_{n})+\sum_{\tau_{n}\leqslant\delta}\mathcal{E}^2_f(f,{ t_{n-1}},\tau_{n}) \\ &\leqslant \frac{T}{\delta}{\texttt{tol}}_{f}^{2} +\sum_{\tau_{n}\leqslant\delta} \tau_{n}^{2s} \Vert\,{f}\Vert_{H^{s}(t_{n-1},t_{n},L^{2}(\varOmega))}^{2} \\ &= \frac{T}{\delta}{\texttt{tol}}_{f}^{2} + \delta^{2s} \Vert\,{f}\Vert_{H^{s}(0,T,L^{2}(\varOmega))}^{2}.
\end{align*}By choosing $$\delta = \left (\frac{\sqrt{T} \, {{\texttt{tol}}_{f}}}{ \Vert\,{f}\Vert_{H^{s}\left (0,T,L^{2}(\varOmega )\right )}}\right )^{\frac{2}{2s+1}}$$ the previous estimate turns into   $$\sum_{n=1}^{N}\mathcal{E}^2_f(\,f,{t_{n-1}},\tau_{n})\leqslant 2 T^{\frac{2s}{2s+1}} \, \Vert\,{f}\Vert_{H^{s}\left(0,T,L^{2}(\varOmega)\right)}^{\frac{2}{2s+1}} {\texttt{tol}}_{f}^{\frac{4s}{2s+1}}.$$ In other words, if a priori knowledge of the regularity of the right-hand side is available, then TOLFIND can be replaced by the somewhat simpler formula   $${\texttt{tol}}_{f}^{2} = 2^{-\frac{2s+1}{2s}}\,T^{-1} \Vert{\, f}\Vert_{H^{s}\left(0,T,L^{2}(\varOmega)\right)}^{-\frac{1}{s}} \,{\texttt{TOL}}_{f}^{\frac{2s+1}{s}}.$$ We turn to the module ST_ADAPTATION, listed in Algorithm 4, which handles a single time step. The module adapts the grid and the time-step size according to the indicators involving the discrete solution of the current time step, namely the space indicator $${\mathcal{E}_{\mathcal{G}}}$$ and the separated coarsening and time indicators $${\mathcal{E}_{c}}$$ and $${\mathcal{E}_{\tau }}$$. The routine requires, right at the start of each iteration, the computation of the discrete solution on the current grid with the current time-step size; see line 5. Note that in ST_ADAPTATION only refinements are performed (both in space and in time). Recalling the discussion in the introductory Section 1, we aim to use a thresholding algorithm for the indicators $${\mathcal{E}_{\tau }}$$ in order to equally distribute the time error. To this end we need to guarantee $$\mathcal{E}^2_\star\leqslant 0$$ in order to control the global time error with the help of the uniform energy estimate from Corollary 3.11.
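The closed-form choice of $${{\texttt{tol}}_{f}}$$ from Remark 4.5 can be checked numerically: inserting it back into the derived consistency bound must recover exactly $${\texttt{TOL}}_{f}^{2}$$. A minimal sketch in Python (the function names and the sample values for T, s and the Hs-norm of f are hypothetical):

```python
def tolf_from_regularity(TOLf, T, f_Hs_norm, s):
    """Local tolerance tol_f from Remark 4.5 (closed form, no TOLFIND loop)."""
    tolf_sq = (2 ** (-(2 * s + 1) / (2 * s)) / T
               * f_Hs_norm ** (-1.0 / s)
               * TOLf ** ((2 * s + 1) / s))
    return tolf_sq ** 0.5

def consistency_bound(tolf, T, f_Hs_norm, s):
    """Right-hand side 2 T^{2s/(2s+1)} |f|^{2/(2s+1)} tol_f^{4s/(2s+1)}."""
    e = 2 * s + 1
    return 2 * T ** (2 * s / e) * f_Hs_norm ** (2 / e) * tolf ** (4 * s / e)

# sample (hypothetical) data: plugging tol_f back in recovers TOL_f^2 exactly
T, F, s, TOLf = 2.0, 3.5, 0.7, 1e-2
tolf = tolf_from_regularity(TOLf, T, F, s)
assert tolf > 0
assert abs(consistency_bound(tolf, T, F, s) - TOLf ** 2) < 1e-12
```

Note that $${{\texttt{tol}}_{f}}$$ scales like $${{\texttt{TOL}}_{f}}^{1+1/(2s)}$$, so for small s the local tolerance shrinks much faster than the global one.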
Since there is no similar control available for either the space or the coarsening errors, we relate the corresponding indicators to the time and consistency indicators, i.e., we adapt the spatial triangulation until   \begin{align} \mathcal{E}_{c}^{2},\mathcal{E}_{\mathcal{G}}^{2}\leqslant \mathcal{E}_{\tau}^{2}+\mathcal{E}_{f}^{2}. \end{align} (4.3)Here we have invoked the consistency indicator $${\mathcal{E}_{f}}$$ on the right-hand side although it is controlled by CONSISTENCY outside ST_ADAPTATION; note that $${\mathcal{E}_{f}}$$ does not depend on the discrete solution. In fact, from the uniform energy estimate, Corollary 3.11, we have that $${\mathcal{E}_{\tau }}$$ vanishes one order faster than $${\mathcal{E}_{f}}$$ when no additional regularity of f is assumed. Consequently, in this case the time-step size is dictated by $${\mathcal{E}_{f}}$$, which may lead to $${\mathcal{E}_{\tau }}\ll{{\texttt{tol}}_{\mathcal{G}\tau }}$$. Thanks to Lemma 4.4 we expect that (4.3) leads to an equal distribution of the errors in time in most cases. However, the case $$\max \left \{{\mathcal{E}_{\tau }},{\mathcal{E}_{f}}\right \}\ll \min \left \{{{\texttt{tol}}_{\mathcal{G}\tau }},{{\texttt{tol}}_{f}}\right \}$$ cannot be avoided theoretically; hence, we have supplemented (4.3) with the $$L^{\infty }$$ safeguard marking tolerance $$\tau{{\texttt{tol}}_{\mathcal{G}\tau }}$$; compare with lines 8 and 15 of ST_ADAPTATION.
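As an illustration (not the paper's pseudocode; ST_ADAPTATION tests the two indicators in separate lines), the stopping test behind (4.3) including the safeguard term can be sketched as:

```python
def needs_spatial_adaptation(E_c2, E_G2, E_tau2, E_f2, tau, tol_Gtau):
    """Sketch of the acceptance test behind (4.3) with the additive
    L-infinity safeguard tau * tol_Gtau: the spatial triangulation is
    adapted further as long as the squared coarsening or space indicator
    exceeds the threshold. (Function name and the combined check are
    illustrative.)"""
    threshold = E_tau2 + E_f2 + tau * tol_Gtau
    return E_c2 > threshold or E_G2 > threshold

# once both indicators drop below the threshold, the time step is accepted
assert needs_spatial_adaptation(0.5, 0.1, 0.2, 0.1, 0.01, 1.0)      # E_c^2 too large
assert not needs_spatial_adaptation(0.2, 0.1, 0.2, 0.1, 0.01, 1.0)  # accepted
```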
In particular, in order not to waste tolerances, we have implemented the safeguard additively, which is justified by   $$\max\left\{\mathcal{E}_{\tau}^{2}+\mathcal{E}_{f}^{2}, \tau{{\texttt{tol}}_{\mathcal{G}\tau}}\right\}\leqslant \mathcal{E}_{\tau}^{2}+\mathcal{E}_{f}^{2}+ \tau{{\texttt{tol}}_{\mathcal{G}\tau}}\leqslant 2 \max\left\{\mathcal{E}_{\tau}^{2}+\mathcal{E}_{f}^{2}, \tau{{\texttt{tol}}_{\mathcal{G}\tau}}\right\}\!.$$ Note that in the above discussion we have concentrated on an equal distribution in time and have tacitly assumed that in each time step the local space indicators are optimally distributed, which is motivated by the optimal convergence analysis for elliptic problems; compare, e.g., with the studies by Cascon et al. (2008), Diening et al. (2016) and Stevenson (2007). Remark 4.6 We note that the if conditions in lines 8 and 15 of ST_ADAPTATION may involve additional parameters. For instance, line 8 may be replaced by   $$8:\mathbf{else}\ \mathbf{if} \ \mathcal{E}^2_c (U_{t}^{-},\tau ,\mathcal{G})\geqslant \gamma _{c}\, \mathcal{E}^2_\tau(U_{t}^{+},U_{t}^{-},\tau ,\mathcal{G}) + \rho _{c}\mathcal{E}^2_f(f,t,\tau) +\sigma _{c} \tau{{\texttt{tol}}_{\mathcal{G}\tau }}\ \mathbf{then}$$with γc, ρc, σc > 0, and similarly for the space indicator $$\mathcal{E}_{\mathcal{G}}$$ in line 15 with constants $$\gamma _{\mathcal{G}},\rho _{\mathcal{G}}, \sigma _{\mathcal{G}}>0$$. This requires some modifications of TAFEM, which would make the presentation more technical. For the sake of clarity of the presentation we decided to skip these customisation possibilities; compare also with Remark 5.3. 4.3 The main module TAFEM We are now in a position to formulate TAFEM in Algorithm 5 below. In the initialisation phase the given tolerance $${{\texttt{TOL}}}>0$$ is split into tolerances $${{\texttt{TOL}}_{0}},{{\texttt{TOL}}_{f}},{{\texttt{TOL}}_{\mathcal{G}\tau }}>0$$. Next ADAPT_INIT provides a sufficiently good approximation $$U_{0}$$ of the initial datum u0.
The constant CT is computed in order to determine the right scaling of the tolerances in the module ST_ADAPTATION. In particular, the last term 2T in CT accounts for the $$L^{\infty }$$ safeguard marking, whereas the other terms result from (4.3) in combination with the estimate for $$\mathcal{E}_{\tau }$$ from Corollary 3.11; compare also with the proof of Theorem 5.2. Then the time-step iteration is entered, where each single time step consists of the following main steps. We first initialise the time-step size by CONSISTENCY and then conduct one coarsening step with COARSEN. The adaptation of the grid and time-step size with respect to the indicators for the spatial, temporal and coarsening error is done by ST_ADAPTATION. 5. Convergence In this section we first prove that the core modules and TAFEM terminate and then verify that the estimators and thus the error are below the given tolerance. Throughout the section we suppose that the black-box modules satisfy Assumption 4.1. Before turning to the module ST_ADAPTATION we shall, as an auxiliary result, consider convergence of the adaptive finite element method for stationary elliptic problems of the kind (3.13), which have to be solved in each time step. Proposition 4.7 (Convergence for the elliptic problem). Suppose that $$v^- \in L^{2}(\varOmega )$$, $$\bar f\in \mathbb{P}_s \left (L^{2}(\varOmega )\right )$$ and τ > 0. Then starting from any grid $$\mathcal{G}^{0}\in{\mathbb{G}}$$ we have for the sequence $$\{\mathcal{G}^{k},U_{\tau }^{k}\}_{k\geqslant 0}\subset{\mathbb{G}} \times \mathbb{P}_s ({\mathbb{V}})$$ generated by AFEM$$(v^{-},\bar f,t,\tau ,\mathcal{G}^{0})$$ that   $$\mathcal{E}_{\mathcal{G}}^{2}\left(U_\tau^k,v^-,\tau,t,\bar f,\mathcal{G}^k\right) \rightarrow 0\quad\textrm{as}\ k\to\infty.$$ Proof.
Recalling Remark 3.9 we have that $$\mathcal{E}_{\mathcal{G}}^{2}(U_{\tau }^{k},v^{-},\tau ,0,\bar f,\mathcal{G}^{k})$$ are the standard residual-based a posteriori error estimators for the coercive problem (3.13). From Lemmas 3.2 and 3.3 and Assumption 4.1 on MARK_REFINE we have that the conditions of Siebert (2011) are satisfied. This yields the assertion. Lemma 4.8 (Termination of ST_ADAPTATION). For any t ∈ (0, T), τin ∈ (0, T − t], $$\mathcal{G},\mathcal{G}_{\textrm{old}}\in{\mathbb{G}}$$ and $$U_{t}^{-}\in{\mathbb{V}}(\mathcal{G}_{\textrm{old}})$$ the call   $$\left(U_I,\tau,\bar f,\mathcal{G}\right)=\texttt{ST_ADAPTATION}{\left(U_t^-,f,t,\tau^{\textrm{in}},\mathcal{G}_{\textrm{in}},\mathcal{G}_{\textrm{old}}, {{\texttt{tol}}_{\mathcal{G}\tau}}\right)}$$terminates. Moreover, we have $$\mathcal{G}\geqslant \mathcal{G}_{0}$$, $$\mathcal{E}^2_\star ({U_{t}^{+}, U_{t}^{-},\tau ,\mathcal{G}})\leqslant 0$$,   $$\tau^{\textrm{in}}\geqslant \tau \geqslant \min\left\{\tau^{\textrm{in}},\frac{\kappa\,{\texttt{tol}}_{\mathcal{G}\tau}^{2}}{ 6 \left( \Vert\,{f}\Vert_{\varOmega\times(t,\, t+\tau)}^{2} + {{|||} U_{t}^{-}{|||}}_{\varOmega}^{2} \right)}\right\}$$and the indicators satisfy the tolerances   \begin{align*} \mathcal{E}^2_\tau(U_{t}^{+},U_{t}^{-},\tau,\mathcal{G}) &\leqslant {\texttt{tol}}_{\mathcal{G}\tau}^{2}, \\ \mathcal{E}_{\mathcal{G}}^{2}\left(U_{I},U_{t}^{-},t,\tau,\bar f,\mathcal{G}\right)&\leqslant \mathcal{E}^2_\tau(U_{t}^{+},U_{t}^{-},\tau,\mathcal{G}) +\mathcal{E}^2_f(f,t,\tau) + \tau{{\texttt{tol}}_{\mathcal{G}\tau}}, \\ \mathcal{E}^2_c(U_{t}^{-},\tau,\mathcal{G})&\leqslant \mathcal{E}^2_\tau(U_{t}^{+},U_{t}^{-},\tau,\mathcal{G})+ \mathcal{E}^2_f(f,t,\tau) + \tau{{\texttt{tol}}_{\mathcal{G}\tau}}, \end{align*}where $$U_{t}^{+}=\lim _{s\searrow t}U_{(t,t+\tau ]}(s)$$. Proof. In each iteration of the loop in ST_ADAPTATION, first a discrete solution UI is computed on the current grid $$\mathcal{G}$$ with the current time-step size τ.
Then either the time-step size is reduced or the current grid is refined. More precisely, exactly one of the statements labelled $$\fbox{\textsf{A}}$$,…,$$\fbox{\textsf{D}}$$ in Algorithm 4 is executed, each of which terminates by Assumption 4.1. Whenever one of these statements is executed the corresponding indicator is positive. In statement $$\fbox{\textsf{A}}$$ the grid is refined due to the coarsening indicator $${\mathcal{E}_{c}}$$. Thanks to Assumption 4.1 (5), after a finite number of executions of $$\fbox{\textsf{A}}$$, a grid $$\mathcal{G}$$ is obtained with $$\mathcal{G}_{\textrm{old}}\leqslant \mathcal{G}$$ and thus $$\mathcal{E}^2_c (U_{t}^{-},\tau,\mathcal{G})=0$$, i.e., statement $$\fbox{\textsf{A}}$$ is not entered anymore. This happens irrespective of refinements in other statements. In statement $$\fbox{\textsf{B}}$$ the grid is refined with respect to the indicators $$\mathcal{E}_{*}$$ controlling the energy gain due to coarsening. Therefore, it follows from the same reasoning as for statement $$\fbox{\textsf{A}}$$ that statement $$\fbox{\textsf{B}}$$ is also executed only finitely many times, namely until the coarsening is fully removed. Assume that statement $$\fbox{\textsf{C}}$$ is executed infinitely many times. Then according to the above considerations, we have that, after a finite number of iterations, statement $$\fbox{\textsf{B}}$$ is not executed anymore, i.e., (3.15) is always satisfied according to line 10. Consequently, the condition in line 12 and Corollary 3.11 imply   $$\frac{1}{\tau} \leqslant \frac{1}{\tau} 6\,\tau{{|||} U_{t}^{+}-\varPi_{\mathcal{G}} U_{t}^{-}{|||}}_{\varOmega}^{2} \frac{1}{{\texttt{tol}}_{\mathcal{G}\tau}^{2}} \leqslant \frac{1}{{\texttt{tol}}_{\mathcal{G}\tau}^{2}} 6\, \left( \Vert\,{f}\Vert_{\varOmega\times(t,t+\tau)}^{2} + {{|||} U^{-}_{t}{|||}}_{\varOmega}^{2} \right)\!,$$which contradicts the assumption that statement $$\fbox{\textsf{C}}$$ is executed infinitely many times.
The same argument also proves the asserted lower bound on the finally accepted time-step size. Assuming that ST_ADAPTATION does not terminate, we infer from the fact that all other statements are executed only finitely many times that statement $$\fbox{\textsf{D}}$$ has to be executed infinitely many times. In other words the loop reduces to the adaptive iteration AFEM with fixed data $$U_{t}^{-}$$, $$\bar f$$, t and τ. Therefore, Proposition 4.7 contradicts the condition in line 15. In summary, we deduce that ST_ADAPTATION terminates and the iteration is abandoned in line 18. This proves the assertion. We next address the termination of the main module TAFEM. Proposition 5.1 (Termination of TAFEM). The adaptive algorithm TAFEM terminates for any initial time-step size τ0 > 0 and produces a finite number of time instances $$0=t_{0}<\dots <t_{N}=T$$. Moreover, we have $$\mathcal{E}^2_0({u_{0},\mathcal{G}_0})\leqslant {\texttt{TOL}}_{0}^{2}$$ and the consistency error complies with (4.2). For all n = 1, … , N we have that the estimates in Lemma 4.8 are satisfied with t = tn−1, τ = τn, $$U_{I}=U_{|I_{n}}$$, $$U_{t}^{\pm }=U_{n-1}^{\pm }$$, $$\mathcal{G}=\mathcal{G}_{n}$$ and $$\mathcal{G}_{\textrm{old}}=\mathcal{G}_{n-1}$$. Proof. Each loop starts with setting the time-step size such that $$\tau _{n}\leqslant T-t_{n-1}$$, $$n\in{\mathbb{N}}$$. Thanks to Assumption 4.1 for the black-box modules, Lemma 4.2 for CONSISTENCY and Lemma 4.8 for ST_ADAPTATION, all modules of TAFEM terminate and in each time step the asserted properties are satisfied.
Since we have $$\mathcal{E}^2_\star({U_{n-1}^{+}, U_{n-1}^{-},\tau _{n},\mathcal{G}_{n}})\leqslant 0$$ for all n we may conclude $${{|||} U_{n-1}^{-}{|||}}_{\varOmega}^{2}\leqslant \Vert\, {f}\Vert_{\varOmega \times (0,T)}^{2}+ {{|||} U_0 {|||}}_{\varOmega}^{2}$$ from Corollary 3.11 and thus it follows with Lemma 4.8 that   $$\tau^{\textrm{in}}_{n}\geqslant \tau_{n} \geqslant \min\left\{\tau^{\textrm{in}}_{n},\frac{\kappa\,{\texttt{tol}}_{\mathcal{G}\tau}^{2}}{12 \left( \Vert{f}\Vert_{\varOmega\times(0,T)}^{2} + { {|||} U_{0}{|||}}_{\varOmega}^{2} \right)}\right\}\!,$$where $$\tau ^{\textrm{in}}_{n}=\texttt{CONSISTENCY}(\,{f,t_{n-1},\tau _{n-1}}, {{\texttt{tol}}_{f}})$$. If the final time were not reached, then τn → 0 as $$n\to \infty$$ and therefore there exists $$n_{0}\in{\mathbb{N}}$$ such that $$\tau_{n}=\tau^{\textrm{in}}_{n}$$ for all n ⩾ n0. Now the contradiction follows as in step (1) of the proof of Lemma 4.4. Collecting the results derived above allows us to prove the main result. Theorem 5.2 (Convergence into tolerance). Algorithm TAFEM computes for any prescribed tolerance $${{\texttt{TOL}}}>0$$ and initial time-step size τ0 > 0 a partition $$0=t_{0}<\dots <t_{N}=T$$ with associated meshes $$\{\mathcal{G}_n\}_{n=0,\dots ,N}$$, such that we have for the corresponding approximation $${\mathcal{U}}\in \textrm{W}$$ from (2.7) that   $$\Vert{u-{\mathcal{U}}}\Vert_{\mathbb{W}(0,T)} \leqslant{{\texttt{TOL}}}.$$ Proof. Thanks to Proposition 5.1 we have that TAFEM terminates and it remains to prove the error bound.
For the sake of brevity of the presentation we shall use the abbreviations   \begin{alignat*}{2} \mathcal{E}^2_{\tau}(n)&:= \mathcal{E}^2_{\tau}({U_{n-1}^{+},U_{n-1}^{-}, \tau_{n},\mathcal{G}_{n}}),&\qquad \mathcal{E}^2_f(n)&:= \mathcal{E}^2_f(\,f,t_{n-1},\tau_{n}), \\ \mathcal{E}_{\mathcal{G}}^{2}(n)&:=\mathcal{E}_{\mathcal{G}}^{2}({U,U_{n-1}^{-},t_{n-1},\tau_{n}, f_{n},\mathcal{G}_n}) &\quad\textrm{and}\quad \mathcal{E}^2_c (n)&:= \mathcal{E}^2_c ({U_{n-1}^{-},\tau_{n}, \mathcal{G}_n}). \end{alignat*} The initial error satisfies $$\mathcal{E}^2_0({u_{0},\mathcal{G}_0})\leqslant {\texttt{TOL}}_{0}^{2}$$ by Assumption 4.1. Thanks to the choice of the precomputed local tolerance $${{\texttt{tol}}_{f}}$$ we know from Lemma 4.4 that the consistency error is bounded by $${{\texttt{TOL}}_{f}}$$, i.e., we have (4.2). When finalising a time step, we also have from Proposition 5.1 that   $$\mathcal{E}^2_{\tau}(n) \leqslant {\texttt{tol}}_{\mathcal{G}\tau}^{2} \qquad \textrm{and} \qquad \mathcal{E}_{\mathcal{G}}^{2}(n), \mathcal{E}^2_c (n)\leqslant \mathcal{E}^2_\tau(n)+\mathcal{E}^2_f(n) +\tau_{n} {{\texttt{tol}}_{\mathcal{G}\tau}},$$with $${{\texttt{tol}}_{\mathcal{G}\tau }}={\texttt{TOL}}_{\mathcal{G}\tau }^{2}/C_{T}$$. Combining this with (3.16a) and (4.2) we conclude that  \begin{align*} \sum_{n=1}^{N}\mathcal{E}_{\mathcal{G}}^{2}(n) +\,\mathcal{E}_{c\tau}^{2}\left({U_{n-1}^{+}},{U_{n-1}^{-},\tau_{n}}\right) &\leqslant \sum_{n=1}^{N}\mathcal{E}_{\mathcal{G}}^{2}(n) +\mathcal{E}^2_c (n) +\mathcal{E}^2_{\tau}(n) \\ &\leqslant \sum_{n=1}^{N} 2\,\tau_{n} {{\texttt{tol}}_{\mathcal{G}\tau}}+2\,\mathcal{E}^2_f(n)+ 3\, \mathcal{E}^2_{\tau}(n) \\ &\leqslant 2 T \, {{\texttt{tol}}_{\mathcal{G}\tau}} + 2\,{\texttt{TOL}}_{f}^{2}+3 \sum_{n=1}^{N} \mathcal{E}^2_{\tau}(n).
\end{align*}Using Corollary 3.11 for the last term we get for any δ > 0 that   \begin{align*} \sum_{n=1}^{N} \mathcal{E}^2_{\tau}(n) &= \sum_{\tau_{n}>\delta} \mathcal{E}^2_{\tau}(n) + \sum_{\tau_{n}\leqslant\delta} \mathcal{E}^2_{\tau}(n) \\ & \leqslant \frac{T}{\delta} {\texttt{tol}}_{\mathcal{G}\tau}^{2} + \delta \sum_{n=1}^{N} 6\, C_{\tau}\,{{|||}U_{n-1}^{+} -\varPi_{\mathcal{G}_{n}}U_{n-1}^{-}{|||}}_{\varOmega}^{2} \\ &\leqslant \frac{T}{\delta} {\texttt{tol}}_{\mathcal{G}\tau}^{2} + \delta 6 \,C_{\tau} \left( \Vert\,{f}\Vert_{\varOmega\times(0,T)}^{2} + {{|||} U_{0}{|||}}_{\varOmega}^{2} \right) \end{align*}and by choosing   $$\delta = \left( \frac{T}{ 6 \,C_{\tau} \left( \Vert\,{f}\Vert_{\varOmega\times(0,T)}^{2} + {{|||} U_{0}{|||}}_{\varOmega}^{2} \right)}\right)^{\frac{1}{2}} {{\texttt{tol}}_{\mathcal{G}\tau}},$$ we obtain   $$\sum_{n=1}^{N} \mathcal{E}^2_{\tau}(n) \leqslant 2\left( 6 C_{\tau} \,T \left( \Vert\,{f}\Vert_{\varOmega\times(0,T)}^{2} + {{|||} U_{0}{|||}}_{\varOmega}^{2} \right) \right)^{\frac12} {{\texttt{tol}}_{\mathcal{G}\tau}} .$$Inserting this into the above estimate yields   \begin{align*} \sum_{n=1}^{N}\mathcal{E}_{\mathcal{G}}^{2}(n) +\mathcal{E}_{c\tau}^{2}\left({U_{n-1}^{+}},{U_{n-1}^{-},\tau_{n}}\right) &\leqslant \underbrace{\left( 6\,\sqrt{6 C_{\tau}\,T} \left( \Vert\,{f}\Vert_{\varOmega\times(0,T)}^{2} + {{|||} U_{0}{|||}}_{\varOmega}^{2} \right)^{\frac12}+ 2 T \right)}_{=C_{T}}\, {{\texttt{tol}}_{\mathcal{G}\tau}}+2\,{\texttt{TOL}}_{f}^{2} \\ &\leqslant{\texttt{TOL}}_{\mathcal{G}\tau}^{2}+2\,{\texttt{TOL}}_{f}^{2}. \end{align*} Collecting the bounds for the indicators $${\mathcal{E}_{0}}$$, $${\mathcal{E}_{\mathcal{G}}},$$$${\mathcal{E}_{c\tau }}$$, and $${\mathcal{E}_{f}}$$, recalling the splitting   $${\texttt{TOL}}_0^2 + 3\,{\texttt{TOL}}_f^2+ {\texttt{TOL}}_{\mathcal{G}\tau}^2 = {{\texttt{TOL}}}^2$$and taking into account the upper bound of Proposition 3.4 proves the assertion.
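The choice of δ in the proof above is the standard balancing of a term decaying like $$T\delta^{-1}{\texttt{tol}}_{\mathcal{G}\tau}^{2}$$ against a term growing like δ times a fixed constant. A minimal numerical sketch (all constants are hypothetical placeholders for the quantities in the proof):

```python
import math

def balanced_delta(T, tol2, C):
    """Choose delta so that T/delta * tol2 equals delta * C, as in the
    delta-optimisation of Theorem 5.2 (sketch with a generic constant C)."""
    return math.sqrt(T * tol2 / C)

T, tol2, C = 2.0, 1e-4, 7.3
d = balanced_delta(T, tol2, C)
total = T / d * tol2 + d * C
# equalising both terms yields the bound 2*sqrt(T*C*tol^2) used in the proof
assert abs(total - 2 * math.sqrt(T * C * tol2)) < 1e-12
# and is optimal: by the AM-GM inequality no other delta does better
for factor in (0.25, 0.5, 2.0, 4.0):
    dd = factor * d
    assert T / dd * tol2 + dd * C >= total - 1e-12
```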
Remark 5.3 In order to guarantee the main result (Theorem 5.2) also for the modifications of Remark 4.6, line 5 in TAFEM must be changed to 5: compute $$C_{T}:= \left (1+\gamma _{c}+\gamma _{\mathcal{G}}\right )\,2\,\sqrt{6 \,C_{\tau } \, T} \left ( \Vert\,{f}\Vert_{\varOmega \times (0,T)}^{2} + {{|||} U_{0}{|||}}_{\varOmega}^{2} \right )^{\frac 12} + \left (\sigma _{c}+\sigma _{\mathcal{G}}\right )\,T$$. Moreover, the splitting of the tolerances in line 2 must be changed to 2: split tolerance $${{\texttt{TOL}}}>0$$ such that $${\texttt{TOL}}_{0}^{2}+(1+\rho _{\mathcal{G}}+\rho _{c}){\texttt{TOL}}_{f}^{2}+{\texttt{TOL}}_{\mathcal{G}\tau }^{2}={\texttt{TOL}}^{2}$$. 6. Numerical aspects and experiments We conclude the article by illustrating some practical aspects of the implementation with three numerical experiments. We compare the presented algorithm TAFEM with the algorithm ASTFEM introduced in the study by Kreuzer et al. (2012). 6.1 The implementation The experiments are implemented in DUNE (Blatt et al., 2016) using the DUNE-ACFEM ( https://www.dune-project.org/modules/dune-acfem/) module. The computations utilise linear finite elements in space and the dG(0) time-stepping scheme. All simulations were performed on an AMD® Opteron™ 7274 Processor with 128 GB RAM. Both algorithms TAFEM and ASTFEM start from exactly the same initial mesh $${\mathcal{G}_{\textrm{init}}}$$. The initial values are interpolated on the mesh and local refinements are performed in order to comply with the initial tolerance. On the resulting meshes the needed constants are computed (the minimal time-step size τ* for ASTFEM and $${{\texttt{tol}}_{f}}$$ from TOLFIND for TAFEM). In order to control $$\mathcal{E}_{c}$$ and $$\mathcal{E}_{*}$$ the algorithms need to handle two meshes and corresponding finite element spaces at every new time step. This is realised by exploiting the tree structure of the refinements of macroelements as in the study by Kreuzer et al. (2012).
At every new time step the time-step size is adjusted by the module CONSISTENCY and all elements of the mesh are marked for coarsening up to two times and then adapted again if necessary. The mentioned estimators are computed only up to constants and used for the adaptive refinement process. The spatial marking relies on the equidistribution strategy, which marks every element with an estimator bigger than the arithmetic mean. The following remark lists the tolerance splitting used by ASTFEM. Remark 6.1 In the study by Kreuzer et al. (2012), ASTFEM uses the tolerance splitting   $${{\texttt{TOL}}}^{2}={{{\texttt{TOL}}}}_{0}^{2}+T\widetilde{{{\texttt{TOL}}}}_{f}^{2}+T\widetilde{{{\texttt{TOL}}}}_{\mathcal{G}\tau}^{2}+\widetilde{{{\texttt{TOL}}}}_{*}^{2}.$$Thereby, $$\widetilde{{{\texttt{TOL}}}}_{*}^{2}$$ is used to compute a minimal safeguard step size τ*. The method then computes an approximation $$\mathcal{U}\in \textrm{W}$$ to (2.2) such that   $$\mathcal{E}^2_0(u_{0},\mathcal{G}_{0})\leqslant{{\texttt{TOL}}}_{0}^{2},\qquad \sum_{n=1}^{N}\left\{\mathcal{E}^2_f(\,f,t_{n-1},\tau_{n})\right\}\leqslant T\widetilde{{{\texttt{TOL}}}}_{f}^{2}$$and   $$\sum_{n=1}^{N}\left\{ \mathcal{E}^2_{c\tau}(U_{n-1}^{+},U_{n-1}^{-},\tau_{n}) +\mathcal{E}_{\mathcal{G}}^{2}(U,U_{n-1}^{-},t_{n-1},\tau_{n},f_{n},\mathcal{G}_{n})\right\} \leqslant T\widetilde{{{\texttt{TOL}}}}_{\mathcal{G}\tau}^{2}+\widetilde{{{\texttt{TOL}}}}_{*}^{2}.$$This motivates the relations   $$T\widetilde{{{\texttt{TOL}}}}_{f}^{2}=3{{{\texttt{TOL}}}}_{f}^{2}\quad \textrm{and}\quad T\widetilde{{{\texttt{TOL}}}}_{\mathcal{G}\tau}^{2}+\widetilde{{{\texttt{TOL}}}}_{*}^{2} ={{\texttt{TOL}}}_{\mathcal{G}\tau}^{2}$$in the examples below.
For the simulations presented below we have used the following comparable splittings for the two methods ASTFEM and TAFEM relative to the total tolerance $${{\texttt{TOL}}}$$: $${{\texttt{TOL}}}_{0}^{2}=\frac 1{10}{\texttt{TOL}}^{2}$$, $$3{{\texttt{TOL}}}_{f}^{2}=T\widetilde{{{\texttt{TOL}}}}^{2}_{f} =\frac 3{10} {{\texttt{TOL}}}^{2}$$, $${{\texttt{TOL}}}_{\mathcal{G}\tau }^{2}=T\widetilde{{{\texttt{TOL}}}}_{\mathcal{G}\tau }^{2}+\widetilde{{{\texttt{TOL}}}}_{*}^{2} =\frac 6{10} {{\texttt{TOL}}}^{2}$$, $$\widetilde{{{\texttt{TOL}}}}_{*}^{2} =\frac 1{100} {{\texttt{TOL}}}^{2}$$. 6.2 The experiments In this section we introduce the three numerical experiments in detail and discuss the numerical results. 6.2.1 Singularity in time This numerical experiment is constructed on the spatial domain $$\varOmega =(0,1)^{2}\subset{\mathbb{R}}^{2}$$ over the time interval (0, T) = (0, 2) with homogeneous Dirichlet boundary conditions and homogeneous initial data. The right-hand side f is chosen such that the exact solution is given by   $$u(\boldsymbol{x},t) = |t-\bar{t}|^{\alpha}\sin\big(\pi\big(x^{2}-x\big)t\big)\sin\big(\pi(y^{2}-y)t\big)$$with parameters $$\bar{t}=\frac{\pi }{3}$$ and α = 0.7. The graph of u has a singularity in time at $$t=\bar{t}=\frac{\pi }{3}$$. Hence, the right-hand side contains the term $$\textrm{sgn}(t-\bar{t})\alpha |t-\bar{t}|^{\alpha -1}$$. A direct calculation shows that this term is square-integrable in time but does not belong to $$H^{1}\left((0,T);L^{2}(\varOmega)\right)$$. This particular example shows one main advantage of TAFEM. In fact, in contrast to ASTFEM, TAFEM does not require the right-hand side f to have a temporal derivative in L2 in order to control the consistency error $${\mathcal{E}_{f}}$$. As in the example of Section 6.2.2 the threshold-like version of ENLARGE from Remark 4.3 is used for the computations, with enlargement constant κ2 = 4.
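The two enlargement variants mentioned here can be sketched as follows. The exact listing of Remark 4.3 is not reproduced in this excerpt, so the threshold-like variant below is an assumption on its behaviour; `consistency2` stands for the local indicator $$(t,\tau)\mapsto\mathcal{E}^2_f(f,t,\tau)$$ and both function names are ours:

```python
def enlarge_fixed(t, tau, T, kappa2=4.0):
    """Fixed enlargement: always propose min{kappa2*tau, T-t}."""
    return min(kappa2 * tau, T - t)

def enlarge_threshold(consistency2, t, tau, T, tol_f2, kappa2=4.0):
    """Threshold-like variant (sketch in the spirit of Remark 4.3):
    keep enlarging the proposed step while the local consistency
    indicator stays below tol_f^2, capped by the remaining time T-t."""
    while tau < T - t and consistency2(t, min(kappa2 * tau, T - t)) <= tol_f2:
        tau = min(kappa2 * tau, T - t)
    return tau

# with f == 0 the threshold-like variant proposes the whole remaining time,
# whereas the fixed variant only multiplies the current step by kappa2
assert abs(enlarge_threshold(lambda t, dt: 0.0, 0.2, 0.1, 1.0, 1e-4) - 0.8) < 1e-12
assert abs(enlarge_fixed(0.2, 0.1, 1.0) - 0.4) < 1e-12
```

The first assertion mirrors the observation of Section 6.2.3 that for f ≡ 0 the threshold-like version proposes the full remaining interval.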
However, test runs with the fixed enlargement, i.e., $$\min \{\kappa _{2}\tau ,T-t\}=\texttt{ENLARGE}(\,f,t,\tau ,{{\texttt{tol}}_{f}})$$, produce nearly the same results. Fig. 1. DoFs and time-step sizes for the singularity in time problem. Fig. 2. The local error estimators $${\mathcal{E}_{\tau }}^{2}+{\mathcal{E}_{\mathcal{G}}}^{2}+{\mathcal{E}_{c}}^{2}+{\mathcal{E}_{f}}^{2}$$ for TAFEM and ASTFEM, as well as the sum of local $$L^{\infty }$$ indicators $$\frac 1\tau \left ({\mathcal{E}_{\tau }}^{2}+{\mathcal{E}_{\mathcal{G}}}^{2}+{\mathcal{E}_{c}}^{2}+{\mathcal{E}_{f}}^{2}\right )$$ used by ASTFEM for the singularity in time problem. ASTFEM was killed after time step 321, in which 27 513 668 DoFs and a time-step size of 7.3172e–10 were used. As can be observed from Fig. 1, ASTFEM massively refines in time and space. It was killed before reaching the singularity at $$\bar{t}=\frac{\pi }{3}$$, thereby accumulating a total number of 1 070 889 132 DoFs. The reason for this behaviour lies in the $$L^{\infty }$$ marking. Indeed, Fig. 2 shows that ASTFEM equally distributes the $$L^{\infty }$$ indicators, thereby leading to very small local errors, which cause the strong spatial refinement.
Note that the minimal step size τ* in ASTFEM applies only when temporal refinement is performed due to the time indicator $${\mathcal{E}_{\tau }}$$, i.e., time-step sizes below the threshold τ* can be chosen when required by the consistency estimator $$\mathcal{E}_{f}$$, which is the case close to the singularity. Consequently, the behaviour of ASTFEM cannot essentially be improved by a different choice of $${{\texttt{TOL}}}_{*}$$. In contrast, the local estimators in TAFEM appear to be quite equally distributed. It uses slightly larger time steps and far fewer DoFs close to the singularity; compare with Table 1. It completely outperforms ASTFEM and reaches the final time with a total of 2 481 808 DoFs in 575 time steps.

Table 1. Time steps and DoFs of ASTFEM and TAFEM for the singularity in time problem

Time    | Time step ASTFEM | DoFs ASTFEM | Time step TAFEM | DoFs TAFEM
1.0     | 0.00613614       | 97          | 0.00353591      | 221
1.02    | 0.00433893       | 85          | 0.00250028      | 221
1.03    | 0.0030681        | 97          | 0.0025003       | 105
1.04    | 0.00153406       | 97          | 0.00125015      | 181
1.042   | 0.00108475       | 97          | 0.00125016      | 128
1.044   | 0.00108475       | 97          | 0.000883996     | 128
1.047   | 3.3899e–05       | 713         | 0.0001105       | 478
1.0471  | 1.19852e–05      | 2 073       | 3.90677e–05     | 1 136
1.0472  | 9.36409e–08      | 27 513 668  | 4.76905e–09     | 224 914
1.0473  | –                | –           | 5.5251e–05      | 759

6.2.2 Jumping singularity Inspired by Morin et al. (2000, example 5.3) we construct an experiment where the solution has a strong spatial singularity that changes its position in time. In the domain Ω × (0, 4], with Ω = (0, 3) × (0, 3), we define the elliptic operator $$\mathcal{L}$$u = −div A∇u, where   $${\mathbf{A}}(t,x) = \begin{cases} a_{1}\mathbb{I}\qquad & \textrm{if } (x-x_{i})(y-y_{i})\geqslant0,\\ a_{2}\mathbb{I}\qquad & \textrm{if } (x-x_{i})(y-y_{i})<0, \end{cases}$$with a1 = 161.4476387975881, a2 = 1, i = ⌈t⌉, (x1, y1) = (1, 2), (x2, y2) = (1, 1), (x3, y3) = (2, 1) and (x4, y4) = (2, 2). This operator will ‘move’ the singularity through the points (xi, yi).
Let u be the function   $$u(x,t) = \sum_{i=1}^{4} s_{i}(t)\ r_{i}^{\gamma}\ \mu(\theta_{i}),$$where   $$s_{i}(t) = \begin{cases} (t-(i-1))^{2}(t-i)^{2}\quad &\textrm{if } i-1\leqslant t \leqslant i,\\ 0\qquad & \textrm{otherwise}, \end{cases}$$and   $$\mu(\theta) = \begin{cases} \cos\left(\left(\frac{\pi}{2}-\sigma\right)\gamma\right)\cos\left(\left(\theta-\frac{\pi}{2}+\rho\right)\gamma\right) \quad& \textrm{if } 0\leqslant\theta<\frac{1}{2}\pi,\\ \cos(\rho\gamma)\cos((\theta-\pi+\sigma)\gamma) & \textrm{if } \frac{1}{2}\pi\leqslant\theta<\pi,\\ \cos(\sigma\gamma)\cos((\theta-\pi-\rho)\gamma) & \textrm{if } \pi\leqslant\theta<\frac{3}{2}\pi,\\ \cos\left(\left(\frac{\pi}{2}-\rho\right)\gamma\right)\cos\left(\left(\theta-\frac{3\pi}{2}-\sigma\right)\gamma\right) & \textrm{if } \frac{3}{2}\pi\leqslant\theta<2\pi,\\ \end{cases}$$with γ = 0.1, $$\rho =\frac{\pi }{4}$$, σ = −14.92256510455152, $$x-x_{i}=r_{i}\cos (\theta _{i})$$ and $$y-y_{i}=r_{i}\sin (\theta _{i})$$. It is easy to check that u satisfies   $$\partial_{t} u(x,t) +\mathcal{L} u(x,t) = \sum_{i=1}^{4} r_{i}^{\gamma} \mu(\theta_{i})\ \partial_{t} s_{i}(t).$$ Based on the ideas presented in Remark 6.1 we compare TAFEM and ASTFEM with the same tolerance $${{\texttt{TOL}}}=0.007$$. ASTFEM makes excessive use of the nonstandard exit, i.e., the time-step size equals the minimal time-step size τ* = 0.0123477 in 231 of a total of 302 time steps, and it uses a total of 893 771 DoFs. The $$L^{2}\left (0,4,H^{1}(\varOmega )\right )$$ error is 0.0546689, the $$L^{2}\left (0,4,L^{2}(\varOmega )\right )$$ error is 0.0355061 and the total computation time was 628.398 seconds. TAFEM uses a total of 508 352 DoFs in 228 time steps. The $$L^{2}\left (0,4,H^{1}(\varOmega )\right )$$ error is 0.0563979, the $$L^{2}\left (0,4,L^{2}(\varOmega )\right )$$ error is 0.0367746 and the total computation time was 646.126 seconds (including TOLFIND). By looking at Fig. 3 we see that TAFEM makes more use of the spatial and temporal adaptivity and achieves a similar result with slightly less effort.
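For reference, the exact solution of the jumping singularity experiment can be transcribed directly from the formulas above; a sketch in Python (the angle θi is taken in [0, 2π) and the function names are ours):

```python
import math

GAMMA, RHO = 0.1, math.pi / 4
SIGMA = -14.92256510455152
CENTERS = [(1, 2), (1, 1), (2, 1), (2, 2)]  # (x_i, y_i), i = 1..4

def s(i, t):
    """Temporal bump s_i(t) = (t-(i-1))^2 (t-i)^2 on [i-1, i], else 0."""
    return (t - (i - 1)) ** 2 * (t - i) ** 2 if i - 1 <= t <= i else 0.0

def mu(theta):
    """Piecewise angular part; the four branches match continuously."""
    g = GAMMA
    if theta < 0.5 * math.pi:
        return math.cos((0.5 * math.pi - SIGMA) * g) * math.cos((theta - 0.5 * math.pi + RHO) * g)
    if theta < math.pi:
        return math.cos(RHO * g) * math.cos((theta - math.pi + SIGMA) * g)
    if theta < 1.5 * math.pi:
        return math.cos(SIGMA * g) * math.cos((theta - math.pi - RHO) * g)
    return math.cos((0.5 * math.pi - RHO) * g) * math.cos((theta - 1.5 * math.pi - SIGMA) * g)

def u(x, y, t):
    """Exact solution u(x, t) = sum_i s_i(t) r_i^gamma mu(theta_i)."""
    val = 0.0
    for i, (xi, yi) in enumerate(CENTERS, start=1):
        r = math.hypot(x - xi, y - yi)
        theta = math.atan2(y - yi, x - xi) % (2 * math.pi)
        val += s(i, t) * r ** GAMMA * mu(theta)
    return val

# the angular part is continuous across the branch boundary at pi/2,
# and u vanishes at integer times since every s_i does
assert abs(mu(math.pi / 2 - 1e-9) - mu(math.pi / 2)) < 1e-6
assert u(0.7, 0.4, 1.0) == 0.0
```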
The adaptive meshes generated by TAFEM are displayed in Fig. 4. We see that the spatial adaptivity captures the position of the singularity by local refinement and coarsens the region once the singularity has passed.

Fig. 3. DoFs and time-step sizes for the jumping singularity problem.

Fig. 4. Adaptive grids for the jumping singularity problem.

Fig. 5. The local time indicator $$\mathcal{E}_{\tau}^{2}$$ for TAFEM and ASTFEM, as well as the local $$L^{\infty }$$ indicators $$\frac{1}{\tau}\mathcal{E}_{\tau}^{2}$$ used by ASTFEM for the rough initial data problem.

Fig. 6. DoFs and time-step sizes for the rough initial data problem.

Fig. 7. Adapted meshes generated with ASTFEM (a–d) and TAFEM (e–h) for the rough initial data problem.

The advantages of TAFEM come fully into their own in the presence of singularities in time (see Section 6.2.1). For regular (in time) problems, TAFEM is expected to perform similarly to ASTFEM, up to the disadvantage that, at the beginning, the module TOLFIND needs several adaptive iterations over the time span, whereas the computation of the minimal time-step size in ASTFEM iterates only once over the time interval. 
This is reflected in the comparable computing times for the jumping singularity problem.

6.2.3 Rough initial data

We conclude with an example inspired by numerical experiment 5.3.2 of Kreuzer et al. (2012), with homogeneous Dirichlet boundary conditions and homogeneous right-hand side f ≡ 0. As initial data we choose a checkerboard pattern over $$\varOmega=(0,1)^{2}$$ where u0 ≡ −1 on $$\varOmega _{1}=\left (\frac{1}{3},\frac{2}{3}\right )\times \left ( \left (0,\frac{1}{3}\right )\cup \left (\frac{2}{3},1\right ) \right )\ \cup \ \left ( \left (0,\frac{1}{3}\right )\cup \left (\frac{2}{3},1\right ) \right ) \times \left (\frac{1}{3},\frac{2}{3}\right )$$, u0 ≡ 1 on Ω ∖ Ω1 and u0 ≡ 0 on ∂Ω. Starting with an initial mesh with only 5 DoFs, the approximation of u0 uses Lagrange interpolation and refines the mesh until $$\|U_{0}-u_{0}\|_{\varOmega }^{2}\leqslant {\texttt{TOL}}_{0}^{2} = 10^{-2}$$ is fulfilled. For this example we choose $$\min \left \{\kappa _{2}\tau ,T-t\right \}=\texttt{ENLARGE}\left (f,t,\tau ,{{\texttt{tol}}_{f}}\right )$$. In fact, since f ≡ 0, using the threshold-like version of ENLARGE from Remark 4.3 would yield $$T-t=\texttt{ENLARGE}\left (f,t,\tau ,{{\texttt{tol}}_{f}}\right )$$, and then in ST_ADAPTATION the time step must be reduced according to $${\mathcal{E}_{\tau}}$$ again, requiring the solution of an expensive linear system in each iteration. This slows down TAFEM by a factor of about 10. Starting ASTFEM and TAFEM with a tolerance of $${{\texttt{TOL}}}=10^{-1}$$ and running to the final time T = 1 we obtain the following results: ASTFEM needs 811 time steps and a total of 436 199 377 DoFs, with an estimated total error of 0.0230905 and a total computation time of 144 114 seconds. 
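The checkerboard initial datum can be sketched directly from its definition; a minimal Python helper (names are ours):

```python
def u0(x, y):
    """Checkerboard initial datum on Omega = (0,1)^2 from the text:
    -1 on Omega_1, +1 on the rest of Omega, 0 on the boundary."""
    if x <= 0.0 or x >= 1.0 or y <= 0.0 or y >= 1.0:
        return 0.0                      # boundary (and exterior) value
    third, two_thirds = 1.0 / 3.0, 2.0 / 3.0
    mid_x = third < x < two_thirds      # middle vertical strip
    mid_y = third < y < two_thirds      # middle horizontal strip
    outer_x = x < third or x > two_thirds
    outer_y = y < third or y > two_thirds
    # Omega_1: middle strip in one direction, outer thirds in the other
    return -1.0 if (mid_x and outer_y) or (outer_x and mid_y) else 1.0
```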
ASTFEM makes use of the nonstandard exit for the first 270 time steps, with a minimal time-step size of τ* = 7.77573e-7; the small time steps at the beginning are accompanied by extreme spatial refinements contributing to the large total number of DoFs. This is due to the $$L^{\infty }$$-strategy, which aims to equally distribute the time indicators $$\frac{1}{\tau} \mathcal{E}_{\tau}^{2}$$ rather than $$\mathcal{E}_{\tau}^{2}$$. In order to highlight this effect close to the initial time we have used a log scale for the time in Figs. 5 and 6. TAFEM needs only 81 time steps and a total of 672 159 DoFs, resulting in an estimated total error of 0.0574711. It is about 725 times faster, with a total computation time of 198.682 seconds (including TOLFIND). TAFEM refines the mesh initially and then almost steadily coarsens in time and space (see Figs. 6 and 7 (e–h)). Figure 5 shows that the time indicators $$\mathcal{E}_{\tau}^{2}$$ are nearly equally distributed. Both algorithms reduce the spatial resolution once the singular behaviour of the solution is reduced; see Figs. 6 and 7.

Funding

Deutsche Forschungsgemeinschaft (DFG) project SI 814/7-1.

References

Bänsch, E. (1991) Local mesh refinement in 2 and 3 dimensions. IMPACT Comput. Sci. Eng., 3, 181–191.
Binev, P., Dahmen, W. & DeVore, R. A. (2004) Adaptive finite element methods with convergence rates. Numer. Math., 97, 219–268.
Binev, P., Dahmen, W., DeVore, R. A. & Petrushev, P. (2002) Approximation classes for adaptive methods. Serdica Math. J., 28, 391–416.
Blatt, M., Burchardt, A., Dedner, A., Engwer, C., Fahlke, J., Flemisch, B., Gersbacher, C., Gräser, C., Gruber, F., Grüninger, C., Kempf, D., Klöfkorn, R., Malkmus, T., Müthing, S., Nolte, M., Piatkowski, M. & Sander, O. (2016) The distributed and unified numerics environment (DUNE). Arch. Numerical Software, 100, 13–29.
Cascon, J. M., Kreuzer, C., Nochetto, R. H.
& Siebert, K. G. (2008) Quasi-optimal convergence rate for an adaptive finite element method. SIAM J. Numer. Anal., 46, 2524–2550.
Chen, Z. & Feng, J. (2004) An adaptive finite element algorithm with reliable and efficient error control for linear parabolic problems. Math. Comp., 73, 1167–1193.
Clément, P. (1975) Approximation by finite element functions using local regularization. Rev. Française Automat. Informat. Recherche Opérationnelle Sér. RAIRO Analyse Numérique, 9, 77–84.
Davis, P. J. & Rabinowitz, P. (1984) Methods of Numerical Integration, 2nd edn. Computer Science and Applied Mathematics. Orlando, FL: Academic Press.
Diening, L. & Kreuzer, C. (2008) Convergence of an adaptive finite element method for the p-Laplacian equation. SIAM J. Numer. Anal., 46, 614–638.
Diening, L., Kreuzer, C. & Stevenson, R. (2016) Instance optimality of the adaptive maximum strategy. Found. Comput. Math., 16, 33–68.
Dörfler, W. (1996) A convergent adaptive algorithm for Poisson’s equation. SIAM J. Numer. Anal., 33, 1106–1124.
Eriksson, K. & Johnson, C. (1991) Adaptive finite element methods for parabolic problems I: a linear model problem. SIAM J. Numer. Anal., 28, 43–77.
Eriksson, K. & Johnson, C. (1995) Adaptive finite element methods for parabolic problems II: optimal error estimates in $$L_\infty L_2$$ and $$L_\infty L_\infty$$. SIAM J. Numer. Anal., 32, 706–740.
Ern, A., Smears, I. & Vohralík, M. (2017) Guaranteed, locally space-time efficient, and polynomial-degree robust a posteriori error estimates for high-order discretizations of parabolic problems. SIAM J. Numer. Anal., 55, 2811–2834.
Evans, L. C. (2010) Partial Differential Equations, 2nd edn. Graduate Studies in Mathematics, vol.
19. Providence, RI: American Mathematical Society.
Gaspoz, F. D. & Morin, P. (2014) Approximation classes for adaptive higher order finite element approximation. Math. Comp., 83, 2127–2160.
Gilbarg, D. & Trudinger, N. S. (2001) Elliptic Partial Differential Equations of Second Order. Classics in Mathematics. Berlin: Springer.
Kossaczký, I. (1994) A recursive approach to local mesh refinement in two and three dimensions. J. Comput. Appl. Math., 55, 275–288.
Kreuzer, C., Möller, C. A., Schmidt, A. & Siebert, K. G. (2012) Design and convergence analysis for an adaptive discretization of the heat equation. IMA J. Numer. Anal., 32, 1375–1403.
Kreuzer, C. & Siebert, K. G. (2011) Decay rates of adaptive finite elements with Dörfler marking. Numer. Math., 117, 679–716.
Lakkis, O. & Makridakis, C. (2006) Elliptic reconstruction and a posteriori error estimates for fully discrete linear parabolic problems. Math. Comp., 75, 1627–1658.
Makridakis, C. & Nochetto, R. H. (2003) Elliptic reconstruction and a posteriori error estimates for parabolic problems. SIAM J. Numer. Anal., 41, 1585–1594.
Makridakis, C. & Nochetto, R. H. (2006) A posteriori error analysis for higher order dissipative methods for evolution problems. Numer. Math., 104, 489–514.
Maubach, J. M. (1995) Local bisection refinement for n-simplicial grids generated by reflection. SIAM J. Sci. Comput., 16, 210–227.
Mekchay, K. & Nochetto, R. H. (2005) Convergence of adaptive finite element methods for general second order linear elliptic PDEs. SIAM J. Numer. Anal., 43, 1803–1827.
Morin, P., Nochetto, R. H. & Siebert, K. G.
(2000) Data oscillation and convergence of adaptive FEM. SIAM J. Numer. Anal., 38, 466–488.
Morin, P., Nochetto, R. H. & Siebert, K. G. (2002) Convergence of adaptive finite element methods. SIAM Rev., 44, 631–658.
Morin, P., Siebert, K. G. & Veeser, A. (2008) A basic convergence result for conforming adaptive finite elements. Math. Models Methods Appl. Sci., 18, 707–737.
Nochetto, R. H., Siebert, K. G. & Veeser, A. (2009) Theory of adaptive finite element methods: an introduction. Multiscale, Nonlinear and Adaptive Approximation. Berlin: Springer, pp. 409–542.
Picasso, M. (1998) Adaptive finite elements for a linear parabolic problem. Comput. Methods Appl. Mech. Eng., 167, 223–237.
Schmidt, A. & Siebert, K. G. (2005) Design of Adaptive Finite Element Software: The Finite Element Toolbox ALBERTA. Lecture Notes in Computational Science and Engineering, vol. 42. Berlin: Springer.
Schwab, C. & Stevenson, R. (2009) Space-time adaptive wavelet methods for parabolic evolution problems. Math. Comp., 78, 1293–1318.
Siebert, K. G. (2011) A convergence proof for adaptive finite elements without lower bound. IMA J. Numer. Anal., 31, 947–970.
Smears, I. (2017) Robust and efficient preconditioners for the discontinuous Galerkin time-stepping method. IMA J. Numer. Anal., 37, 1961–1985.
Stevenson, R. (2007) Optimality of a standard adaptive finite element method. Found. Comput. Math., 7, 245–269.
Tantardini, F. & Veeser, A. (2016) The L2-projection and quasi-optimality of Galerkin methods for parabolic equations. SIAM J. Numer. Anal., 54, 317–340.
Thomée, V. (2006) Galerkin Finite Element Methods for Parabolic Problems, 2nd edn.
Springer Series in Computational Mathematics, vol. 25. Berlin: Springer.
Traxler, C. T. (1997) An algorithm for adaptive mesh refinement in n dimensions. Computing, 59, 115–137.
Verfürth, R. (2003) A posteriori error estimates for finite element discretizations of the heat equation. Calcolo, 40, 195–212.
Verfürth, R. (2013) A Posteriori Error Estimation Techniques for Finite Element Methods. Numerical Mathematics and Scientific Computation. Oxford: Oxford University Press.

© The Author(s) 2018. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved. IMA Journal of Numerical Analysis, Oxford University Press.

# A convergent time–space adaptive dG(s) finite element method for parabolic problems motivated by equal error distribution

IMA Journal of Numerical Analysis, Advance Article, 4 April 2018, 37 pages.
Publisher: Oxford University Press. ISSN 0272-4979; eISSN 1464-3642.
DOI: 10.1093/imanum/dry005

In each time step we reduce or enlarge the local time-step size and refine and coarsen the underlying triangulation. 
The adaptive decisions are based on a posteriori error indicators. Numerous such estimators for various notions of error are available in the literature. Eriksson & Johnson (1991, 1995) analysed adaptive dG(s) methods for parabolic problems, providing a priori and a posteriori bounds for the $$L^{\infty }\left (L^{2}\right )$$ error using duality techniques. Lakkis & Makridakis (2006) proved a posteriori bounds based on the elliptic reconstruction technique, introduced by Makridakis & Nochetto (2003) in the semidiscrete context. The $$L^{2}\left (H^{1}\right )$$, respectively $$L^{2}\left (H^{1}\right )\cap H^{1}\left (H^{-1}\right )$$, error bounds of Picasso (1998) and Verfürth (2003) are based on energy techniques and have been used with a dG(0) time-stepping scheme in the adaptive methods and convergence analyses of Chen & Feng (2004) and Kreuzer et al. (2012). For our purpose we generalise the residual-based estimator of Verfürth (2003) to higher-order dG(s) schemes in Section 3. The estimator is built from five indicators: an indicator for the initial error, indicators for the temporal and spatial errors, a coarsening error indicator and an indicator controlling the so-called consistency error. It is important to notice that, apart from the first indicator, all indicators accumulate in L2 in time. The adaptation of the time-step size uses information from the indicators for the temporal error and the consistency error. The adaptation of the spatial triangulation is based on refinement by bisection using information from the indicators for the spatial error and for the coarsening error. Very recently, an independently developed guaranteed a posteriori estimator for higher-order dG(s) schemes was provided by Ern et al. (2017) using equilibrated flux-based bounds for the spatial error. 
By now the convergence and optimality of adaptive methods for stationary inf–sup stable, respectively coercive, problems are well established (Dörfler, 1996; Morin et al., 2000, 2002; Binev et al., 2004; Chen & Feng, 2004; Mekchay & Nochetto, 2005; Stevenson, 2007; Cascon et al., 2008; Diening & Kreuzer, 2008; Morin et al., 2008; Kreuzer & Siebert, 2011; Siebert, 2011; Diening et al., 2016); compare also with the overview article by Nochetto et al. (2009). The essential design principle motivating the adaptive strategies in most of the above methods is the equal distribution of the error. The importance of this principle is highlighted by the near characterisations of nonlinear approximation classes with the help of a thresholding algorithm in the studies by Binev et al. (2002) and Gaspoz & Morin (2014). In contrast to the situation for the above-mentioned problems, the convergence analysis of adaptive approximation of time-dependent problems is still in its infancy. In the study by Schwab & Stevenson (2009) optimal computational complexity of an adaptive wavelet method for parabolic problems is proved using a symmetric and coercive discretisation based on a least squares formulation. To the best of our knowledge there exist only two results (Chen & Feng, 2004; Kreuzer et al., 2012) concerned with a rigorous convergence analysis of time-space adaptive finite element methods. In the study by Chen & Feng (2004) it is proved for the dG(0) time-stepping scheme that each single time step terminates and that the error of the computed approximation is below a prescribed tolerance when the final time is reached. Reaching the final time, however, is not guaranteed, and thus, theoretically, the adaptively generated sequence of time instances {tn}n≥0 may be infinite with tn → t⋆ < T as $$n\to \infty$$. This drawback has been overcome in the study by Kreuzer et al. (2012) with the help of an a priori computed minimal time-step size in terms of the right-hand side f and the discrete initial value U0. 
However, the design of neither of the two methods heeds the principle of equally distributing the error. Let us shed some light on this fact with the help of the initial value problem   $${\partial_{t}} u + u = f\quad\textrm{in}\ (0,T) \qquad\textrm{and}\qquad u(0)=u_{0}.$$Let 0 = t0 < t1 < ⋯ < tN = T be some partition of (0, T). Using the dG(0) time-stepping scheme we obtain $$\{U_{n}\}_{n=0}^{N}$$ such that   $$\frac{U_{n}-U_{n-1}}{\tau_{n}}+U_{n}=f_{n}:=\frac 1{\tau_{n}}\int_{t_{n-1}}^{t_{n}}f\,\textrm{d}t,\quad n=1,\ldots, N,\ U_{0}=u_{0},$$where τn = tn − tn−1. Let $${\mathcal{U}}$$ be the piecewise affine interpolation of the nodal values $$\{U_{n}\}_{n=0}^{N}$$. Then we have with Young’s inequality that   \begin{align*} {\int_{0}^{T}}\frac12 {\partial_{t}}|u-{\mathcal{U}}|^{2}+|u-{\mathcal{U}}|^{2}\,\textrm{d}t&= \sum_{n=1}^{N}\int_{t_{n-1}}^{t_{n}} (f-f_{n})(u-{\mathcal{U}})+(U_{n}-{\mathcal{U}})( u-{\mathcal{U}})\,\textrm{d}t \\ &\leqslant \sum_{n=1}^{N}\int_{t_{n-1}}^{t_{n}} |\,f-f_{n}|^{2} + |U_{n}-{\mathcal{U}}|^{2}+ \frac12 |u-{\mathcal{U}}|^{2}\,\textrm{d}t. \end{align*} A simple computation reveals $$\int _{t_{n-1}}^{t_{n}}|U_{n}-{\mathcal{U}}|^{2}\,\textrm{d}t= \frac 13 \tau _{n} |U_{n}-U_{n-1}|^{2}$$. This term and $$\int _{t_{n-1}}^{t_{n}}|\,f-f_{n}|^{2}\,\textrm{d}t$$ are the so-called time and consistency a posteriori indicators. In order to illustrate the basic differences in the design of the adaptive schemes we shall concentrate on the time indicator. In the studies by Chen & Feng (2004) and Kreuzer et al. (2012) the partition is constructed such that   $$|U_{n}-U_{n-1}|^{2} \leqslant \frac{{\texttt{TOL}}^{2}}{T},\quad\textrm{which implies} \quad \sum_{n=1}^{N} \tau_{n} |U_{n}-U_{n-1}|^{2}\leqslant \sum_{n=1}^{N} \tau_{n} \frac{{\texttt{TOL}}^{2}}{T}={\texttt{TOL}}^{2},$$i.e., the accumulated indicator is below the prescribed tolerance $${\texttt{TOL}}$$. 
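The dG(0) scheme for this model problem, together with the time indicator above, fits in a few lines; a minimal Python sketch (function names and the sampling-based approximation of the mean f_n are ours):

```python
import numpy as np

def dg0_solve(f, u0, grid, nquad=101):
    """dG(0)/implicit Euler for  u' + u = f  with the averaged right-hand
    side f_n; returns the nodal values U_n and the local time indicators
    (tau_n/3)|U_n - U_{n-1}|^2 from the text."""
    U = [float(u0)]
    indicators = []
    for t0, t1 in zip(grid[:-1], grid[1:]):
        tau = t1 - t0
        # f_n ~ (1/tau) int_{I_n} f dt, approximated by sampling the interval
        fn = float(np.mean(f(np.linspace(t0, t1, nquad))))
        # solve (U_n - U_{n-1})/tau + U_n = f_n for U_n
        Un = (U[-1] + tau * fn) / (1.0 + tau)
        indicators.append(tau / 3.0 * (Un - U[-1]) ** 2)
        U.append(Un)
    return np.array(U), np.array(indicators)
```

For f ≡ 0 the scheme reduces to U_n = (1 + τ_n)^{-1} U_{n-1}, which makes it easy to check.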
We call this the $$L^{\infty }$$-strategy and remark that it does not aim to equally distribute the local indicators. In contrast to this we shall use the L2-strategy   $$\tau_{n}|U_{n}-U_{n-1}|^{2}\leqslant{\texttt{tol}}^{2}.$$Thanks to the uniform energy bound   \begin{align} \sum_{n=1}^{N}|U_{n}-U_{n-1}|^{2}\leqslant{\int_{0}^{T}}|\,f|^{2}\,\textrm{d}t +|U_{0}|^{2} \end{align} (1.2)(see Corollary 3.11 below) we then conclude that   \begin{align*} \sum_{n=1}^{N} \tau_{n} |U_{n}-U_{n-1}|^{2}&=\,\sum_{n:\,\tau_{n}\leqslant\delta} \tau_{n} |U_{n}-U_{n-1}|^{2}+\sum_{n:\,\tau_{n}>\delta} \tau_{n} |U_{n}-U_{n-1}|^{2} \\ &\leqslant\, \delta\, \left({\int_{0}^{T}}|\,f|^{2}\,\textrm{d}t +|U_{0}|^{2}\right)+\frac{T}{\delta}\, {\texttt{tol}}^{2} =2\,T^{1/2}\left({\int_{0}^{T}}|\,f|^{2}\,\textrm{d}t +|U_{0}|^{2}\right)^{\frac12}{\texttt{tol}}, \end{align*}where $$\delta ={\texttt{tol}}\,T^{1/2}/\left ({\int _{0}^{T}}|\,f|^{2}\,\textrm{d}t +|U_{0}|^{2}\right )^{1/2}$$; note that the second sum contains at most T/δ terms since the τn sum to T. Choosing $${\texttt{tol}}$$ proportional to $${\texttt{TOL}}^{2}$$, so that the right-hand side is at most $${\texttt{TOL}}^{2}$$, guarantees that the error is below the prescribed tolerance $${\texttt{TOL}}$$. These arguments directly generalise to semidiscretisations of (1.1) in time. In the case of a full space-time discretisation of (1.1) additional indicators regarding the space discretisation are involved. We shall now roughly explain the principal adaptive strategies of the algorithm TAFEM for (1.1). In a preprocessing step an L2-tolerance for handling the consistency error is computed. Then in each time step we first control the consistency error with an L2-strategy, since this does not require solving expensive linear systems. Second, we control the time indicator using an L2-strategy and (1.2). Unfortunately, for the spatial indicator a control similar to (1.2) is not available; therefore, we enforce in a third step that the spatial indicators are bounded by the time or the consistency indicator. 
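The splitting argument behind the L2-strategy can be checked numerically; a hedged Python sketch on synthetic data satisfying both the L2-strategy and the energy bound (all names are ours):

```python
def split_bound(taus, jumps_sq, tol, energy):
    """Evaluate both sides of the splitting argument: assuming the
    L2-strategy  tau_n |U_n - U_{n-1}|^2 <= tol^2  and the energy bound
    sum_n |U_n - U_{n-1}|^2 <= energy, the accumulated indicator satisfies
    sum_n tau_n |U_n - U_{n-1}|^2 <= delta*energy + (T/delta)*tol^2."""
    T = sum(taus)
    delta = tol * T ** 0.5 / energy ** 0.5   # the optimising choice of delta
    lhs = sum(t * e for t, e in zip(taus, jumps_sq))
    rhs = delta * energy + (T / delta) * tol ** 2  # = 2*sqrt(T*energy)*tol
    return lhs, rhs
```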
In the case where these indicators are equally distributed in time and no massive undershooting appears, this likely results also in an equal distribution of the spatial indicators. In order to handle the other case we include the $$L^{\infty }$$-strategy from the studies by Chen & Feng (2004) and Kreuzer et al. (2012) for the spatial indicators as a backup strategy. The detailed algorithm TAFEM is presented in Section 4 and its convergence analysis is given in Section 5. The advantage of our new approach over the algorithms in Chen & Feng (2004) and Kreuzer et al. (2012) is twofold. First, from the fact that TAFEM aims for an equal distribution of the error, we expect an improved performance. Second, we use an L2-strategy for the consistency error, which requires only L2-regularity of f in time instead of the H1-regularity needed for the $$L^{\infty }$$-strategy in the studies by Chen & Feng (2004) and Kreuzer et al. (2012). This makes the proposed method suitable for problems where the existing approaches may even fail completely. We conclude the paper in Section 6 with comments on the implementation in DUNE (Blatt et al., 2016) and some numerical experiments. The experiments confirm the expectations and show that the performance of our algorithm TAFEM is more than competitive. 2. The continuous and discrete problems In this section we state the weak formulation of the continuous problem together with the assumptions on data. Then the discretisation by adaptive finite elements in space combined with the dG(s) scheme in time is introduced. 2.1 The weak formulation For $$d\in{\mathbb{N}}$$, let $$\varOmega \subset{\mathbb{R}}^{d}$$ be a bounded, polyhedral domain that is meshed by some conforming simplicial mesh $${\mathcal{G}_{\textrm{init}}}$$. 
We denote by H1(Ω) the Sobolev space of functions in L2(Ω) whose first weak derivatives are in L2(Ω) and we let $${\mathbb{V}}:={H_{0}^{1}}(\varOmega )$$ be the space of functions in H1(Ω) with vanishing trace on ∂Ω. For any measurable set ω and $$k\in{\mathbb{N}}$$ we denote by $$\Vert{\cdot }\Vert _\omega$$ the $$L^{2}(\omega ;{\mathbb{R}}^{k})$$-norm, whence $$\Vert v \Vert ^2_{H^{1}(\varOmega )}=\Vert{v}\Vert ^2_\varOmega +\Vert \nabla v \Vert ^{2}_\varOmega .$$ We suppose that the data of (1.1) have the following properties: $${\mathbf{A}}\!:\!\varOmega \!\rightarrow\! {\mathbb{R}}^{d\times d}$$ is piecewise Lipschitz over $${\mathcal{G}_{\textrm{init}}}$$ and is symmetric positive definite with eigenvalues $$0< a_{*}\leqslant a^{\ast }<\infty$$, i.e.,   \begin{align} a_{*}|\boldsymbol{\xi}|^{2}\leqslant{\mathbf{A}}(x)\boldsymbol{\xi}\cdot\boldsymbol{\xi}\leqslant a^{\ast}|\boldsymbol{\xi}|^{2} \qquad \textrm{for all } \boldsymbol{\xi}\in{\mathbb{R}}^{d}\!,\; x\in\varOmega; \end{align} (2.1)$$c\in L^{\infty }(\varOmega )$$ is non-negative, i.e., c ⩾ 0 in Ω; $$f\in L^{2}\left ((0,T);L^{2}(\varOmega )\right )= L^{2}(\varOmega \times (0,T))$$ and u0 ∈ L2(Ω). We next turn to the weak formulation of (1.1); compare with Evans (2010, Chap. 7). We let $$\mathcal{B}\!:{\mathbb{V}}\times{\mathbb{V}}\!\rightarrow\! {\mathbb{R}}$$ be the symmetric bilinear form associated to the weak form of the elliptic operator $$\mathcal{L}$$, i.e.,   $${\mathcal{B}[w,\,v]} \mathrel{:=} \int_{\varOmega} {\mathbf{A}}\nabla v \cdot \nabla w +c\,vw \,\textrm{d}V \qquad\textrm{for all } v,\,w\in{\mathbb{V}}.$$Recalling the Poincaré–Friedrichs inequality $$\Vert{v}\Vert _{\varOmega }\leqslant{C_{F}}\Vert \nabla{v}\Vert _\varOmega$$ for all $$v\in{\mathbb{V}}$$ with CF = CF(d, Ω) (Gilbarg & Trudinger, 2001, p. 
158) we deduce from (2.1) that $$\mathcal{B}$$ is a scalar product on $${\mathbb{V}}$$ with induced norm   $$|||{v}|||^{2}:={\mathcal{B}[v,\,v]}= \int_{\varOmega} {\mathbf{A}} \nabla v\cdot\nabla v + cv^{2}\,\textrm{d}V \qquad \textrm{for all } v\in{H_{0}^{1}}(\varOmega).$$This energy norm is equivalent to the H1-norm $$\Vert \cdot \Vert _{H^{1}(\varOmega )}$$ and we shall use the energy norm in the subsequent analysis. We denote the restriction of the energy norm to some subset ω ⊂ Ω by $${|||}\cdot{|||}_\omega$$ and let $${\mathbb{V}}^{\ast }\mathrel{:=} H^{-1}(\varOmega )$$ be the dual space of $${H^{1}_{0}}(\varOmega )$$ equipped with the operator norm $${|||}{g}{|||}_{\ast }\mathrel{:=}\sup _{v\in{\mathbb{V}}} \frac{{\langle g,\,v\rangle }}{{|||}{v}{|||}},$$ where ⟨⋅ , ⋅⟩ denotes the usual duality bracket in $$H^{-1}(\varOmega )\times{H^{1}_{0}}(\varOmega )$$. The weak solution space   $${\mathbb{W}}(0,T)\mathrel{:=} \left\lbrace u\in L^{2}(0,T; {\mathbb{V}}) \mid{\partial_{t}} u\in L^{2}(0,T;{\mathbb{V}}^{\ast}) \right\rbrace$$is a Banach space endowed with the norm   $$\Vert{v}\Vert^2_{\mathbb{W}(0,T)}={\int_{0}^{T}}{|||}{\partial_{t}v}{|||}^2_\ast+ {|||}{v}{|||}^2_\varOmega \,\textrm{d}t+\Vert{v}(T)\Vert^2_\varOmega, \quad v\in{\mathbb{W}}(0,T).$$Moreover, it is continuously embedded into $$C^{0}\left ([0,T];L^{2}(\varOmega )\right )$$; see, e.g., Evans (2010, Chap. 5). After this preparation we are in a position to state the weak formulation of (1.1): a function $$u\in \mathbb{W}(0,T)$$ is a weak solution to (1.1) if it satisfies   $${\left\langle{\partial_{t}} u(t),\,v\right\rangle} + {\mathcal{B}\left[u(t),\, v\right]} = \langle\;{f(t),}\ {v}\rangle_\varOmega \quad\textrm{for all } v\in{\mathbb{V}},\; \textrm{a.e. } t \in(0,T),$$ (2.2a)  \begin{align} u(0) = u_{0}. \end{align} (2.2b)Hereafter, ⟨⋅, ⋅⟩Ω denotes the L2(Ω) scalar product. 
Since the operator $$\mathcal{L}$$ is elliptic, problem (2.2) admits for any $$f\in L^{2}\left (0,T;L^{2}(\varOmega )\right )$$ and u0 ∈ L2(Ω) a unique weak solution; compare, e.g., with Evans (2010, Chap. 7).

2.2 The discrete problem

For the discretisation of (2.2) we use adaptive finite elements in space and a dG(s) scheme with adaptive time-step size control. Adaptive grids and time steps. For the adaptive space discretisation we restrict ourselves to simplicial grids and local refinement by bisection; compare with the studies by Bänsch (1991), Kossaczký (1994), Maubach (1995) and Traxler (1997), as well as Schmidt & Siebert (2005) and Nochetto et al. (2009) and the references therein. To be more precise, refinement is based on the initial conforming triangulation $${\mathcal{G}_{\textrm{init}}}$$ of Ω and a procedure $$\texttt{REFINE}$$ with the following properties: given a conforming triangulation $$\mathcal{G}$$ and a subset $$\mathcal{M}\subset \mathcal{G}$$ of marked elements, then   $$\texttt{REFINE}(\mathcal{G},\mathcal{M})$$outputs a conforming refinement $$\mathcal{G}_{+}$$ of $$\mathcal{G}$$ such that all elements in $$\mathcal{M}$$ are bisected at least once. In general, additional elements are refined in order to ensure conformity. The input $$\mathcal{G}$$ can be either $${\mathcal{G}_{\textrm{init}}}$$ or the output of a previous application of $$\texttt{REFINE}$$. We denote by $${\mathbb{G}}$$ the class of all conforming triangulations that can be produced from $${\mathcal{G}_{\textrm{init}}}$$ by finitely many applications of $$\texttt{REFINE}$$. For $$\mathcal{G}\in{\mathbb{G}}$$ we call $$\mathcal{G}_{+}\in{\mathbb{G}}$$ a refinement of $$\mathcal{G}$$ if $$\mathcal{G}_{+}$$ is produced from $$\mathcal{G}$$ by a finite number of applications of $$\texttt{REFINE}$$ and we denote this by $$\mathcal{G}\leqslant \mathcal{G}_{+}$$ or $$\mathcal{G}_{+}\geqslant \mathcal{G}$$. 
Conversely, we call any $$\mathcal{G}_{-}\in{\mathbb{G}}$$ with $$\mathcal{G}_{-}\leqslant \mathcal{G}$$ a coarsening of $$\mathcal{G}$$. Throughout the discussion we deal only with conforming grids; this means that whenever we refer to some triangulations $$\mathcal{G}$$, $$\mathcal{G}_{+}$$ and $$\mathcal{G}_{-}$$ we tacitly assume $$\mathcal{G},\mathcal{G}_{+},\mathcal{G}_{-}\in{\mathbb{G}}$$. One key property of refinement by bisection is uniform shape regularity for any $$\mathcal{G}\in{\mathbb{G}}$$: all constants depending on the shape regularity are uniformly bounded in terms of $${\mathcal{G}_{\textrm{init}}}$$. For the discretisation in time we let $$0=t_{0}<t_{1}<\dots <t_{N}=T$$ be a partition of (0, T) into half-open subintervals In = (tn−1, tn] with corresponding local time-step sizes $$\tau _{n}:={\left |I_{n}\right |} = t_{n}-t_{n-1}$$, $$n=1,\dots ,N$$. Space-time discretisation. For the spatial discretisation we use Lagrange finite elements. For any $$\mathcal{G}\in{\mathbb{G}}$$ the finite element space $${\mathbb{V}}(\mathcal{G})$$ consists of all continuous piecewise polynomials of fixed degree ℓ ⩾ 1 over $$\mathcal{G}$$ that vanish on ∂Ω. This gives a conforming discretisation of $${\mathbb{V}}$$, i.e., $${\mathbb{V}}(\mathcal{G})\subset{\mathbb{V}}$$. Moreover, Lagrange finite elements give nested spaces, i.e., $${\mathbb{V}}(\mathcal{G})\subset{\mathbb{V}}(\mathcal{G}_{+})$$ whenever $$\mathcal{G}\leqslant \mathcal{G}_{+}$$. We denote by $$\mathcal{G}_0$$ the triangulation at t0 = 0, and for n ⩾ 1 we denote by $$\mathcal{G}_n$$ the grid in In and let $${\mathbb{V}}_{n}={\mathbb{V}}(\mathcal{G}_n)$$, n = 0, … , N, be the corresponding finite element spaces. For $$\mathcal{G}\in{\mathbb{G}}$$ we denote by $$\varPi_{\mathcal{G}}\colon L^{2}(\varOmega )\to{\mathbb{V}}(\mathcal{G})$$ the L2-projection onto $${\mathbb{V}}(\mathcal{G})$$ and set $$\varPi_{n}\mathrel{:=}\varPi_{\mathcal{G}_n}$$. 
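As a caricature of the REFINE contract, here is a 1-D sketch (ours, not from the paper): every marked element is bisected at least once; in 1-D no additional elements are needed for conformity, whereas in 2-D/3-D newest-vertex bisection also refines neighbouring elements to keep the mesh conforming.

```python
def refine(grid, marked):
    """REFINE for a 1-D partition into intervals: bisect every marked
    element at least once.  Elements are (a, b) endpoint pairs."""
    out = []
    for elem in grid:
        if elem in marked:
            a, b = elem
            m = 0.5 * (a + b)
            out.extend([(a, m), (m, b)])  # the two children of the bisection
        else:
            out.append(elem)              # unmarked elements are kept
    return out
```

Iterating the call reproduces the class of grids reachable from the initial partition, mirroring the definition of $${\mathbb{G}}$$.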
On each time interval the discrete approximation is polynomial in time over the corresponding spatial finite element space. Let $$s\in{\mathbb{N}}_{0}$$. For any real vector space $${\mathbb{U}}$$ and interval $$I\subset{\mathbb{R}}$$ we denote by   $$\mathbb{P}_s\left(I;{\mathbb{U}}\right):=\left\{t\mapsto \textstyle\sum\limits_{i=0}^{s} t^{i}V_{i}:V_{i}\in{\mathbb{U}}\right\}$$the space of all polynomials of degree at most s with coefficients in $${\mathbb{U}}$$. We write $$\mathbb{P}_s({\mathbb{U}}):=\mathbb{P}_s({\mathbb{R}};{\mathbb{U}})$$ and $$\mathbb{P}_s\mathrel{:=}\mathbb{P}_s({\mathbb{R}})$$. Furthermore, for an interval I ⊂ (0, T) we let   $$f_{I}\in \mathbb{P}_s\left(I;L^{2}(\varOmega)\right)$$be the best approximation of f|I in $$L^{2}\left (I;L^{2}(\varOmega )\right )$$. In particular we use $$f_{n}\mathrel{:=} f_{I_{n}}$$ as a time discretisation of f on In. For s = 0, $$f_{I}=\frac{1}{|I|}\int _{I} f\,\textrm{d}t$$ is the mean value of f on I. In the following we introduce the so-called discontinuous Galerkin time-stepping scheme dG(s) of degree s, where dG(0) is the well-known implicit Euler scheme. To this end we denote for n ⩾ 1 the actual grid on In by $$\mathcal{G}_n$$ and let $$\mathbb{V}_n={\mathbb{V}}(\mathcal{G}_n)$$ be the corresponding finite element space. We start with a suitable initial refinement $$\mathcal{G}_0$$ of $${\mathcal{G}_{\textrm{init}}}$$ and an approximation $$U_0=\varPi _{0} u_{0}=\varPi _{\mathcal{G}_0} u_{0}\in \mathbb{V}_0$$ of the initial value u0. Note that in principle any suitable interpolation operator can be used instead of Π0.
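As an illustration of the temporal data approximation, the following sketch (the helper name `best_approx` is hypothetical; plain NumPy) computes the L²(I)-best approximation $f_I\in\mathbb{P}_s$ by expanding in the Legendre polynomials transformed to (a, b), whose L²-orthogonality there decouples the coefficients; for s = 0 it returns the mean value, as stated above.

```python
import numpy as np
from numpy.polynomial import legendre as L

def best_approx(f, a, b, s, quad_deg=50):
    """L2(a,b)-orthogonal projection of f onto P_s, returned as a callable.

    Uses the Legendre polynomials transformed to (a,b), which are
    L2-orthogonal there, so the projection decouples coefficient-wise."""
    x, w = np.polynomial.legendre.leggauss(quad_deg)
    t = 0.5 * (b - a) * x + 0.5 * (a + b)     # Gauss nodes mapped to (a,b)
    wt = 0.5 * (b - a) * w
    coeffs = []
    for r in range(s + 1):
        Lr = L.Legendre.basis(r)(x)           # r-th Legendre poly at the nodes
        norm2 = (b - a) / (2 * r + 1)         # = int_a^b |L_r|^2 dt
        coeffs.append(np.sum(wt * f(t) * Lr) / norm2)
    def fI(tt):
        xx = 2.0 * (tt - a) / (b - a) - 1.0
        return sum(c * L.Legendre.basis(r)(xx) for r, c in enumerate(coeffs))
    return fI
```

For s ⩾ deg f the projection reproduces a polynomial f exactly, which is the situation exploited for f_n in the estimators below.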
We then inductively compute for n > 0 a solution $$U_{\mid{I}_n}\in \mathbb{P}_s(\mathbb{V}_n)$$ to the problem   \begin{align} \begin{split} \int_{I_{n}}\langle\partial_{t}U_{\mid{I}_n},V\rangle_\varOmega + \mathcal{B}\left[U_{\mid{I}_n},V\right]\textrm{d}t + \langle\left[\negthinspace\left[{U}\right]\negthinspace\right]_{n-1},V(t_{n-1})\rangle_\varOmega = \int_{I_{n}}\langle{\,f_n},V\rangle_\varOmega\,\textrm{d}t \end{split} \end{align} (2.3) for all $$V\in \mathbb{P}_s(\mathbb{V}_n)$$. Thereby, $$f_{n}\mathrel{:=} f_{I_{n}}$$ and $$\left [\negthinspace \left [{U}\right ]\negthinspace \right ]_{n-1}$$ denotes the jump   $$\left[\negthinspace\left[{U}\right]\negthinspace\right]_{n-1}:=U_{n-1}^{+}-U_{n-1}^{-}$$of U across tn−1, where we used $$U_{n-1}^{+}\mathrel{:=}\lim _{t\downarrow t_{n-1}}U_{\mid{I}_n}(t)$$, $$U_{n}^{-}\mathrel{:=}U_{\mid{I}_n}(t_{n})$$, n = 1, … , N and $$U_{0}^{-}\mathrel{:=}U_{0}$$. Note that with this definition we have $$U_{n-1}^{-}=U(t_{n-1})$$. The solution U is uniquely defined (Thomée, 2006) and we will see below that (2.3) is equivalent to an (s + 1)-dimensional second-order elliptic system. Note that U is allowed to be discontinuous across the nodal points t0, … , tN and hence in general $$U\not \in \mathbb{W}(0,T)$$. In order to construct from U a conforming function we shall now define the Radau reconstruction, which was first introduced and analysed by Makridakis & Nochetto (2006) and is motivated by the close relation of dG(s) to the Runge–Kutta Radau IIA collocation methods. In fact, let c1 < ⋯ < cs+1 and b1, … , bs+1 be the abscissae and weights of the Radau IIA quadrature formula, which is exact of degree 2s, i.e.,   \begin{align} \sum_{j=1}^{s+1}b_{j} P(c_{j}) ={\int_{0}^{1}}P(t)\,\textrm{d}t\qquad \textrm{for all}\ P\in\mathbb{P}_{2s}.
\end{align} (2.4) We define $${\mathcal{U}}\in \mathbb{W}$$, $${\mathcal{U}}_{|I_{n}}\in \mathbb{P}_{s+1}(I_{n};{\mathbb{V}})$$ as the piecewise interpolation of U at the local Radau IIA points $${t_{n}^{\,j}}:=t_{n-1}+c_{j}\tau _{n}$$, i.e.,   \begin{align} {\mathcal{U}}\left({t_{n}^{\, j}}\right) = U_{\mid{I}_n}\left({t_{n}^{\,j}}\right)\in\mathbb{V}_n, \qquad j =1,\ldots,s+1. \end{align} (2.5a) The continuous embedding of $$\mathbb{W}$$ in $$C^{0}\left ([0,T];L^{2}(\varOmega )\right )$$ additionally enforces   \begin{align} {\mathcal{U}}(t_{n-1}) = U_{n-1}^{-}\in\mathbb{V}_{n-1}, \end{align} (2.5b) which uniquely defines $${\mathcal{U}}\in \mathbb{W}$$ since $$0 =: c_{0} < c_{1} < \cdots < c_{s+1}$$ are pairwise distinct. For convenience, we shall adopt the representation of $${\mathcal{U}}$$ with Legendre polynomials from Davis & Rabinowitz (1984). We observe from Davis & Rabinowitz (1984) that the $${t_{n}^{\,j}}$$, j = 1, … , s + 1 are the zeros of   $$\frac{(-1)^{s}}{2}\left({L_{s}^{n}}-L_{s+1}^{n}\right)\in\mathbb{P}_{s+1},$$where for $$r\in{\mathbb{N}}_{0}$$, $${L_{r}^{n}}$$ denotes the affine transformation of the rth Legendre polynomial Lr on (−1, 1) to the interval In, i.e.,   $${L_{r}^{n}}(t):=L_{r}\left(2\frac{t-t_{n-1}}{\tau_{n}}-1\right)\!.$$Therefore, we have that the $$\{{L_{r}^{n}}\}_{r\in{\mathbb{N}}_{0}}$$ are L2-orthogonal on In and that $${L_{r}^{n}}(t_{n})=1$$, $${L_{r}^{n}}(t_{n-1})=(-1)^{r}$$, as well as $$\int _{I_{n}}|{L_{r}^{n}}|^{2}\,\textrm{d}t=\frac{\tau _{n}}{2r+1}$$. These properties readily imply the representation   \begin{align} {\mathcal{U}}_{|I_{n}}=U_{|I_{n}}+\frac{(-1)^{s}}{2}\left(L_{s+1}^{n}-{L_{s}^{n}}\right)\left(U_{n-1}^{+}-U_{n-1}^{-}\right)\!; \end{align} (2.6) compare with the studies by Ern et al. (2017) and Smears (2017).
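For s = 1 the Radau IIA abscissae on (0, 1] are c = (1/3, 1) with weights b = (3/4, 1/4); the sketch below (scalar-valued in place of 𝕍-valued, purely for illustration) checks the exactness (2.4) up to degree 2s and the two defining properties (2.5) of the reconstruction (2.6):

```python
import numpy as np

s = 1
c = np.array([1.0 / 3.0, 1.0])   # Radau IIA abscissae for s = 1
b = np.array([3.0 / 4.0, 1.0 / 4.0])   # corresponding weights

# (2.4): exactness on P_{2s}, tested on the monomials t^k, int_0^1 t^k = 1/(k+1)
exactness_err = max(abs(float(np.sum(b * c**k)) - 1.0 / (k + 1))
                    for k in range(2 * s + 1))

def Ln(r, t):
    """Legendre polynomial L_r transformed from (-1,1) to the interval (0,1)."""
    return np.polynomial.legendre.Legendre.basis(r)(2.0 * t - 1.0)

# Radau reconstruction (2.6) for a scalar dG(1) solution on I = (0,1]:
U = lambda t: 2.0 + 3.0 * t      # discrete solution on I, with U(0+) = 2
Uminus = 1.5                     # value U_{n-1}^- from the previous interval
jump = U(0.0) - Uminus           # the jump [[U]]_{n-1}
calU = lambda t: U(t) + 0.5 * (-1.0)**s * (Ln(s + 1, t) - Ln(s, t)) * jump
```

At the Radau points the Legendre difference vanishes, so `calU` coincides with `U` there (property (2.5a)), while at t = 0 the difference equals −1, so `calU(0)` recovers `Uminus` (property (2.5b)).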
Using integration by parts with respect to time, (2.4) and (2.5), we observe that (2.3) is equivalent to   \begin{align} \int_{I_{n}}\langle{{\partial_{t}} {\mathcal{U}}},{V}\rangle_\varOmega + {\mathcal{B}[U,\,V]}\,\textrm{d}t = \int_{I_{n}}\langle{f_{n}},{ V}\rangle_\varOmega\,\textrm{d}t \end{align} (2.7) for all n = 1, … , N and $$V\in \mathbb{P}_s(\mathbb{V}_n)$$. We emphasise that $${\mathcal{U}}(t)$$ is a finite element function, since for t ∈ In, we have $${\mathcal{U}}(t)\in{\mathbb{V}}\left (\mathcal{G}_{n-1}\oplus \mathcal{G}_n\right )\subset{\mathbb{V}}$$, where $$\mathcal{G}_{n-1}\oplus \mathcal{G}_n\in{\mathbb{G}}$$ is the smallest common refinement of $$\mathcal{G}_{n-1}$$ and $$\mathcal{G}_{n}$$, which we call the overlay. Continuity of $${\mathcal{U}}$$ in time in combination with $${\mathcal{U}}(t)\in{\mathbb{V}}$$ for all t ∈ I then implies $${\mathcal{U}}\in \mathbb{W}$$. Remark 2.1 For s = 0 we see from (2.3) that in each time step $$n\in{\mathbb{N}}$$ we need to solve for partial differential operators of the form − Δ + μ, with $$\mu =\frac 1{\tau _{n}}$$, in order to compute Un. Unfortunately, for s > 0, though still coercive, (2.3) becomes an (s + 1)-dimensional coupled nonsymmetric system. Recently, in the study by Smears (2017) a preconditioned conjugate gradient (PCG) method for a symmetrisation of (2.3) was proposed, which is fully robust with respect to the discretisation parameters s and τ, provided a solver for the operator − Δ + μ, μ ⩾ 0 is available. 3. A posteriori error estimation A basic ingredient of adaptive methods is an a posteriori error indicator that provides a reliable upper bound for the error in terms of the discrete solution and the given data. The dG(0) method corresponds to the implicit Euler scheme, and residual-based estimators for the heat equation can be found in the study by Verfürth (2003).
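As noted in Remark 2.1, for s = 0 each time step amounts to a single elliptic solve with operator −Δ + 1/τn. A minimal one-dimensional sketch, with finite differences standing in for finite elements purely to keep the example short:

```python
import numpy as np

def heat_step_dG0(u_prev, f_bar, tau, h):
    """One dG(0) (= implicit Euler) step for the 1D heat equation on (0,1)
    with homogeneous Dirichlet data: solve (1/tau - Laplacian) u = u_prev/tau + f_bar.

    Finite differences stand in for finite elements in this sketch;
    u_prev, f_bar hold interior nodal values on a uniform grid of size h."""
    m = len(u_prev)
    # tridiagonal matrix of the discrete -Laplacian plus the 1/tau shift
    A = (np.diag(np.full(m, 2.0)) - np.diag(np.ones(m - 1), 1)
         - np.diag(np.ones(m - 1), -1)) / h**2 + np.eye(m) / tau
    return np.linalg.solve(A, u_prev / tau + f_bar)

# one step applied to an eigenmode of the discrete Laplacian
m, tau = 99, 1e-3
x = np.linspace(0.0, 1.0, m + 2)[1:-1]
u1 = heat_step_dG0(np.sin(np.pi * x), np.zeros(m), tau, x[1] - x[0])
```

Since sin(πx) is an exact eigenvector of the discrete Laplacian with eigenvalue λ_h = (2 − 2cos(πh))/h², the step damps it by the factor 1/(1 + τλ_h), mirroring the unconditional stability of the implicit Euler scheme.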
In this section we generalise this result and prove reliable and efficient residual-based estimators for the dG(s) schemes (2.3) with arbitrary $$s\in{\mathbb{N}}_{0}$$. Some arguments in this section are straightforward generalisations of those in the study by Verfürth (2003) and we only sketch their proofs; others are based on new ideas and therefore we shall prove them in detail. 3.1 Equivalence of error and residual In order to prove residual-based error estimators one first has to relate the error to the residual. To this end we note that (2.2) can be taken as an operator equation in $$L^{2}(0,T;{\mathbb{V}}^{\ast })\times L^{2}(\varOmega )$$. Its residual $$\textrm{Res}({\mathcal{U}})$$ at $${\mathcal{U}}\in \mathbb{W}(0,T)$$ is given by   \begin{align} {\left\langle \textrm{Res}({\mathcal{U}}),\,v\right\rangle}=&\,{\left\langle{\partial_{t}}(u-{\mathcal{U}}),\,v\right\rangle}+{\mathcal{B}[u-{\mathcal{U}},\,v]}\nonumber\\ =&\,\langle{\,f-{\partial_{t}} {\mathcal{U}}},{v}\rangle-{\mathcal{B}[{\mathcal{U}},\,v]}\qquad\textrm{for all } v\in{\mathbb{V}}. \end{align} (3.1) From the study by Tantardini & Veeser (2016) we have the following identity between the residual and the error. Proposition 3.1 (Abstract error bound). Let $$u\in \mathbb{W}$$ be the solution of (2.2) and let $${\mathcal{U}}\in \mathbb{W}$$ be the approximation defined in (2.5) for time instances t0 = 0, … , tN = T and time-step sizes τn := tn − tn−1, n = 1, … , N. Then it holds that   \begin{align} \Vert{u}-\mathcal{U}\Vert^2_{\mathbb{W}(0,T)}=\Vert{u_0}-U_0\Vert^2_\varOmega+\Vert\textrm{Res}(\mathcal{U})\Vert^2_{{L^2}(0,T;\mathbb{V}^\ast)} \end{align} (3.2a) and  \begin{align} \Vert\textrm{Res}(\mathcal{U})\Vert^2_{{L^2}(I_{n},\mathbb{V}^\ast)}\leqslant\left\{2\Vert\partial_{t}(u-\mathcal{U})\Vert^2_{{L^2}(I_{n};\mathbb{V}^\ast)}+\Vert{u}-\mathcal{U}\Vert^2_{{L^2}(I_{n};\mathbb{V})}\right\}\!.
\end{align} (3.2b) The rest of this section concentrates on proving computable upper and lower bounds for the error. We note that the initial error $$\Vert{u_{0}-U_{0}}\Vert_\varOmega$$ in (3.2) is already a posteriori computable, whence it remains to estimate the dual norm of the residual. A further issue is to separate the influences of the temporal and the spatial discretisations on the error. In particular, defining the temporal residual $${\textrm{Res}_{\tau }}({\mathcal{U}}) \in L^{2}(0,T;{\mathbb{V}}^{\ast })$$ as   \begin{align} {\langle{\textrm{Res}_{\tau}}({\mathcal{U}}),\,v\rangle}\mathrel{:=}{\mathcal{B}[U-{\mathcal{U}},\,v]} \end{align} (3.3)and the spatial residual $${\textrm{Res}_{h}}({\mathcal{U}})\in L^{2}(0,T;{\mathbb{V}}^{\ast })$$ as   \begin{align} {\langle{\textrm{Res}_{h}}({\mathcal{U}}),\,v\rangle}\mathrel{:=}{\langle\, f_{n}-{\partial_{t}}{\mathcal{U}},\,v\rangle}-{\mathcal{B}[U,\,v]} \quad\textrm{on}\quad I_{n}, \end{align} (3.4)we obtain   \begin{align} \textrm{Res}({\mathcal{U}})= {\textrm{Res}_{\tau}}({\mathcal{U}})+{\textrm{Res}_{h}}({\mathcal{U}})+f-f_{n} \quad\textrm{on}\quad I_{n}. \end{align} (3.5)In what follows we use this decomposition to derive separate time and space error indicators, which together constitute a reliable and efficient bound for the error.
3.2 Temporal residual Recalling (2.6), we have   $${\mathcal{U}}(t)-U(t)=\frac{(-1)^{s}}{2}\left(L_{s+1}^{n} -{L_{s}^{n}}\right) \left(U_{n-1}^{+}-U_{n-1}^{-}\right)\quad\textrm{for}\ t\in I_{n},\ n=1,\ldots,N,$$and thanks to (2.5) and (2.4), we obtain   \begin{align} \int_{I_{n}}{\left\Vert{\textrm{Res}_{\tau}}({\mathcal{U}})\right\Vert}_{\mathbb{V}^{\ast}}^{2}\,\textrm{d}t=&\, \int_{I_{n}} {|||}{U- {\mathcal{U}}{|||}}_{\varOmega}^{2} \,\textrm{d}t ={{|||} U_{n-1}^{+}- U_{n-1}^{-}{|||}}_{\varOmega}^{2}\int_{I_{n}}\frac14\left|L_{s+1}^{n}-{L_{s}^{n}}\right|^{2}\,\textrm{d}t\nonumber\\ =&\, \tau_{n}\, C_{\tau}\,{{|||} U_{n-1}^{+}-U_{n-1}^{-}{|||}}_{\varOmega}^{2}, \end{align} (3.6)where $$C_{\tau }=C_{\tau }(s):=\frac 14\left (\frac{1}{2s+3}+\frac 1{2s+1}\right )$$; see also the study by Ern et al. (2017). 3.3 The spatial residual In this section we estimate the spatial residual. Lemma 3.2 Let U be the dG(s) approximation (2.3) of the solution u of (2.2) and let $${\mathcal{U}}$$ be its interpolation defined by (2.5). Then there exists a constant $$C_{{\mathbb{G}}}>0$$ such that   $$\int_{I_{n}}{\Vert{\textrm{Res}_{h}}({\mathcal{U}})\Vert}_{\mathbb{V}^{\ast}}^{2}\,\textrm{d}t\leqslant C_{{\mathbb{G}}} \sum_{{E} \in\mathcal{G}_{n}}\int_{I_{n}} h_{{E}} ^{2}{\Vert{\partial_{t}} {\mathcal{U}}+\mathcal{L} U-f_{n}\Vert}_{E}^{2}+h_{{E}}\Vert{J(U)\Vert}_{\partial E}^{2}\,\textrm{d}t$$for all 1 ⩽ n ⩽ N. Thereby, for $$V\in\mathbb{V}_n$$ we denote by J(V )|S for an interior side S the jump of the normal flux $${\mathbf{A}}\nabla V\cdot{\vec{n}}$$ across S and for boundary sides S we set J(V )|S ≡ 0. The mesh size of an element $${E} \in \mathcal{G}$$ is given by hE := |E|1/d. Proof.
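The constant C_τ(s) can be cross-checked numerically: by the L²-orthogonality of the transformed Legendre polynomials, the integral of ¼|L_{s+1} − L_s|² over the unit interval equals ¼(1/(2s+3) + 1/(2s+1)). A small sketch (helper names hypothetical):

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, Legendre

def C_tau(s):
    """The constant C_tau(s) = (1/(2s+3) + 1/(2s+1)) / 4 from (3.6)."""
    return 0.25 * (1.0 / (2 * s + 3) + 1.0 / (2 * s + 1))

def C_tau_by_quadrature(s, deg=40):
    """Recompute C_tau(s) as int_0^1 (1/4)|L_{s+1} - L_s|^2 dt with the
    Legendre polynomials transformed to (0,1), i.e. with tau_n = 1."""
    x, w = leggauss(deg)                   # Gauss nodes/weights on (-1,1)
    t, wt = 0.5 * (x + 1.0), 0.5 * w       # mapped to (0,1)
    diff = Legendre.basis(s + 1)(2 * t - 1) - Legendre.basis(s)(2 * t - 1)
    return float(np.sum(wt * 0.25 * diff**2))
```

The cross term in the square vanishes by orthogonality, so the two routines agree for every s.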
Recalling (3.4), we first observe that $${\Vert{\textrm{Res}_{h}}({\mathcal{U}})\Vert}_{\mathbb{V}^{\ast}}^{2}\in \mathbb{P}_{2s}$$, whence by (2.4) we have   $$\int_{I_{n}} \Vert{{\textrm{Res}_{h}}({\mathcal{U}})\Vert}_{\mathbb{V}^{\ast}}^{2}\,\textrm{d}t = \tau_{n} \sum_{j=1}^{s+1}b_{j} {\left\Vert{\textrm{Res}_{h}}({\mathcal{U}})\left({t_{n}^{\, j}}\right)\right\Vert}_{\mathbb{V}^{\ast}}^{2}\!.$$Therefore, it suffices to estimate $${\Vert{\textrm{Res}_{h}}({\mathcal{U}})\Vert}_{{\mathbb{V}}^{\ast }}^{2}$$ at the abscissae of the Radau IIA quadrature formula. For arbitrary $$V_{j}\in{\mathbb{V}}_{n}$$, j = 1, … , s + 1 choose $$V\in \mathbb{P}_s({\mathbb{V}}_{n})$$ in (2.7) such that $$V\left (t_{n-1}+c_{i}\tau _{n}\right )=V_{j}\delta _{ij}$$, 1 ⩽ i ⩽ s + 1. Then again exploiting (2.4) yields the Galerkin orthogonality   $${\left\langle{\textrm{Res}_{h}}({\mathcal{U}})\left({t_{n}^{\, j}}\right)\!,\,V_{j}\right\rangle}= 0, \qquad j=1,\ldots,s+1.$$Since $$V_{j}\in \mathbb{V}_n$$ was arbitrary we have for any $$v\in{\mathbb{V}}$$ that   $${\left\langle{\textrm{Res}_{h}}({\mathcal{U}})\left({t_{n}^{\, j}}\right)\!,\,v\right\rangle}={\left\langle{\textrm{Res}_{h}}({\mathcal{U}})\left({t^{\, j}_{n}}\right)\!,\,v-V\right\rangle}\qquad \textrm{for all } V\in{\mathbb{V}}_{n}.$$Using integration by parts with respect to the space variable, the Cauchy–Schwarz inequality and the scaled trace inequality, and choosing V as a suitable interpolation of v, we arrive at   $${\left\Vert{\textrm{Res}_{h}}({\mathcal{U}})\left({t^{\, j}_{n}}\right)\right\Vert}_{\mathbb{V}^{\ast}}^{2} \leqslant C_{{\mathbb{G}}} \sum_{{E} \in\mathcal{G}_{n}} \left \{ h_{{E}} ^{2}{\left\Vert({\partial_{t}} {\mathcal{U}}+\mathcal{L} U-f_{n}) \left({t^{\, j}_{n}}\right)\right\Vert}_{E}^{2} +h_{{E}} {\left\Vert J(U) \left({t^{\, j}_{n}}\right)\right\Vert}_{\partial E}^{2}\right\}.$$ The right-hand side is a pointwise evaluation of a polynomial of degree 2s in time and thus the claimed upper bound follows from (2.4).
The following result shows that the spatial indicators are locally efficient in time as well. Lemma 3.3 Under the conditions of Lemma 3.2 we have   \begin{align*} \sum_{{E} \in\mathcal{G}_{n}}\int_{I_{n}} h_{{E}} ^{2}{\Vert{\partial_{t}} {\mathcal{U}}+\mathcal{L} U-f_{n}\Vert}_{E}^{2}+h_{{E}} {\Vert J(U)\Vert}_{\partial E}^{2}\,\textrm{d}t \leqslant C \left\{ \int_{I_{n}}{\Vert{\textrm{Res}_{h}}({\mathcal{U}})\Vert}_{\mathbb{V}^{\ast}}^{2}+\textrm{osc}_{\mathcal{G}_{n}}^{2}(f_{n},{\mathcal{U}})\,\textrm{d}t\right\}\!, \end{align*}where   $$\textrm{osc}_{\mathcal{G}_{n}}^{2}(\,f_{n}, {\mathcal{U}}):=\sum_{{E} \in\mathcal{G}_{n}} h_{{E}} ^{2}{\Vert{\partial_{t}} {\mathcal{U}}+\mathcal{L} U-f_{n}-R_{{E}} \Vert}_{E}^{2} + h_{{E}} {\left\Vert J(U)-J_{E}\right\Vert}_{\partial E}^{2}$$with, at time t ∈ In, pointwise L2(Ω)-best approximations $$R_{{E}} (t)\in \mathbb{P}_{2\ell -2}(E)$$, respectively $$J_{E}(t)_{|{S} }\in \mathbb{P}_{2\ell -1}({S} )$$, for each side S ⊂ ∂E. The constant C > 0 depends solely on the shape regularity of $${\mathbb{G}}$$. Proof. With the same arguments as in the proof of Lemma 3.2, for each 1 ⩽ j ⩽ s + 1 it suffices to prove that   \begin{align*} &C_{{\mathbb{G}}} \sum_{E\in\mathcal{G}_{n}}h_{{E}} ^{2}{\left\Vert\left({\partial_{t}} {\mathcal{U}}+\mathcal{L} U-f_{n}\right)\left({t^{\, j}_{n}}\right)\right\Vert}_{E}^{2}+h_{{E}} {\left\Vert J(U)\left({t^{\, j}_{n}}\right)\right\Vert}_{\partial E}^{2} \\&\qquad\leqslant\! C\!\left\{{\left\Vert{\textrm{Res}_{h}}({\mathcal{U}})\left({t^{\, j}_{n}}\right)\right\Vert}_{\mathbb{V}^{\ast}}^{2}\!+\textrm{osc}_{\mathcal{G}_{n}}^{2}(\,f_{n},{\mathcal{U}})\left({t^{\, j}_{n}}\right)\right\}.\end{align*}This, however, follows with standard techniques from the a posteriori estimation of elliptic second-order problems; see, e.g., Verfürth (2013) and Mekchay & Nochetto (2005) and compare with the case of the implicit Euler scheme s = 0 in the study by Verfürth (2003).
3.4 Estimation of the error By means of the decomposition of the residual (3.5) we can combine the above results to obtain a reliable and efficient error estimator for (1.1). To this end, for brevity of presentation we introduce the following error indicators: for $$\mathcal{G}\in{\mathbb{G}}$$ and $$v\in{\mathbb{V}}$$ the estimator for the initial value is given by   \begin{align} \mathcal{E}_{0}^{2}(v,\mathcal{G}):={\Vert{v-{{\mathcal{I}}_{\mathcal{G}}} v}\Vert}_{\varOmega}^{2}. \end{align} (3.7a)For $$f\in L^{2}\left (0,T;L^{2}(\varOmega )\right )$$, t⋆ ∈ (0, T) and I = (t⋆, t⋆ + τ] ⊂ (t⋆, T], the so-called consistency error, inherited from the decomposition of the residual (3.5), is defined by   \begin{align} \mathcal{E}_{f}^{2}(\,f,t_{\star},\tau):=3\frac{{C_{F}^{2}}}{a_{*}}\int_{I}{\Vert\, f- f_{I}\Vert}_{\varOmega}^{2}\,\textrm{d}t. \end{align} (3.7b)For $$v^{-},v^{+}\in{\mathbb{V}}$$, $$\mathcal{G}\in{\mathbb{G}}$$, $$V\in \mathbb{P}_s(\mathbb{V}(\mathcal{G}))$$, $${E} \in \mathcal{G}$$ and $$g\in \mathbb{P}_s (L^{2}(\varOmega ))$$ the indicator   \begin{align} \mathcal{E}_{c\tau}^{2}(v^{+},v^{-},\tau):= 3\, C_{\tau}\,\tau\,{{|||} v^{-}-v^{+}{|||}}_{\varOmega}^{2} \end{align} (3.7c)is motivated by (3.6) and Lemma 3.2 suggests defining the spatial indicators by   \begin{align} \mathcal{E}_{\mathcal{G}}^{2}(V,v^{-},t_{\star},\tau,g,\mathcal{G},{E} ) :=&\,3 C_{{\mathbb{G}}} \int_{I} h_{{E}} ^{2}{\Vert{\partial_{t}} \mathcal{V}+\mathcal{L} V-g\Vert}_{E}^{2} +h_{{E}} {\Vert J(V)\Vert}_{\partial E}^{2}\,\textrm{d}t\\=&\,3 C_{{\mathbb{G}}} \,\tau\,\sum_{j=1}^{s+1}b_{j}\left\{ h_{{E}} ^{2}{\left\Vert\left({\partial_{t}} \mathcal{V}+\mathcal{L} V-g\right)\left(t_{\star}+c_{j}\tau\right)\right\Vert}_{E}^{2}+h_{{E}} {\left\Vert J(V)(t_{\star}+c_{j}\tau)\right\Vert}_{\partial E}^{2}\right\}\!.\nonumber \end{align} (3.7d)Here we have used, analogously to (2.6), that   \begin{align}
\mathcal{V}(t):=V(t)+\frac{(-1)^{s}}{2}\left(L_{s+1}^{I}(t)-{L_{s}^{I}}(t)\right)\left(V(t_{\star})-v^{-}\right)\!, \end{align} (3.8)where for $$r\in{\mathbb{N}}_{0}$$, $${L_{r}^{I}}$$ denotes the affine transformation of the Legendre polynomial Lr to I. Proposition 3.4 (Upper bound). Let $$u\in \mathbb{W}$$ be the solution of (2.2) and let $${\mathcal{U}}\in \mathbb{W}$$ be the approximation defined in (2.5) for time instances t0 = 0, … , tN = T and time-step sizes τn := tn − tn−1, n = 1, … , N. Then we have the estimate   \begin{align*} {\Vert u-{\mathcal{U}}\Vert}_{\mathbb{W}(0,T)}^{2}\leqslant&\, \mathcal{E}_{0}^{2}(u_{0},\mathcal{G}_{0})+\sum_{n=1}^{N}\left\{ \mathcal{E}_{c\tau}^{2}\left(U_{n-1}^{+},U_{n-1}^{-},\tau_{n}\right) + \mathcal{E}_{\mathcal{G}}^{2}\left(U,U_{n-1}^{-},t_{n-1},\tau_{n},f_{n},\mathcal{G}_{n}\right)\right. \\ &\qquad\qquad\qquad\qquad\left.+3 {\Vert \,f-f_{n}\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2}\right\}\!. \end{align*}In particular, this implies   \begin{align*} {\Vert u-{\mathcal{U}}\Vert}_{\mathbb{W}(0,T)}^{2}\leqslant \mathcal{E}_{0}^{2}(u_{0},\mathcal{G}_{0})\!+\!\sum_{n=1}^{N}\!\left\{ \mathcal{E}_{c\tau}^{2}\!\left(U_{n-1}^{+},U_{n-1}^{-},\tau_{n}\right) \!+ \mathcal{E}_{\mathcal{G}}^{2}\left(U,U_{n-1}^{-},t_{n-1},\tau_{n},f_{n},\mathcal{G}_{n}\right) \!+\mathcal{E}_{f}^{2}(\,f,t_{n-1},\tau_{n})\right\}\!. \end{align*} Proof. By the decomposition of the residual (3.5) and the triangle inequality we estimate on each interval In, n = 1, … , N,  \begin{align*} {\left\Vert \textrm{Res}({\mathcal{U}})\right\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2}&\leqslant 3 {\Vert{\textrm{Res}_{\tau}}({\mathcal{U}})\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2} +3{\left\Vert{\textrm{Res}_{h}}({\mathcal{U}})\right\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2}+3 {\Vert\, f-f_{n}\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2}. \end{align*}Now the first assertion follows with Proposition 3.1, (3.6) and Lemma 3.2.
The second bound follows from the Friedrichs inequality and the ellipticity (2.1) of $$\mathcal{B}$$. Indeed, we have   \begin{align} {\Vert\, f-f_{n}\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2}\ \leqslant\ \frac{{C_{F}^{2}}}{a_{*}}{\Vert\, f-f_{n}\Vert}_{L^{2}\left(I_{n};L^{2}(\varOmega)\right)}^{2}. \end{align} (3.9) Proposition 3.5 (Lower bound). Under the conditions of Proposition 3.4 we have   \begin{align*} &\mathcal{E}_{c\tau}^{2}\left(U_{n-1}^{+},U_{n-1}^{-},\tau_{n}\right)+ \mathcal{E}_{\mathcal{G}}^{2}\left(U,U_{n-1}^{-},t_{n-1},\tau_{n},f_{n},\mathcal{G}_{n}\right)\\ &\qquad\leq C\,\left\{{\left\Vert{\partial_{t}}(u-{\mathcal{U}})\right\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2}+{\Vert u-{\mathcal{U}}\Vert}_{L^{2}(I_{n};{\mathbb{V}})}^{2}+\int_{I_{n}}\textrm{osc}_{\mathcal{G}_{n}}^{2}(\,f_{n}, {\mathcal{U}})\,\textrm{d}t+{\Vert \,f-f_{n}\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2} \right\}\!, \end{align*}where the constant C depends solely on the shape regularity of $${\mathbb{G}}$$ and on ℓ. Proof. We first consider the spatial indicators. By Lemma 3.3 there exists C > 0 such that   $$\mathcal{E}_{\mathcal{G}}^{2}\left(U,U_{n-1}^{-},t_{n-1},\tau_{n},f_{n},\mathcal{G}_{n}\right)\leqslant C {\left\Vert{\textrm{Res}_{h}}({\mathcal{U}})\right\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2}+C\int_{I_{n}}\textrm{osc}_{\mathcal{G}_{n}}^{2}(\,f_{n}, {\mathcal{U}})\,\textrm{d}t.$$The first term on the right-hand side can be further estimated using the decomposition of the residual, the triangle inequality and (3.6) to obtain   \begin{align} \left\Vert{\textrm{Res}_{h}}({\mathcal{U}})\right\Vert_{L^{2}(I_{n};\mathbb{V}^{\ast})} \leqslant \Vert{\textrm{Res}({\mathcal{U}})}\Vert_{L^{2}(I_{n};{\mathbb{V}}^{\ast})}+ \Vert{\,f-f_{n}}\Vert_{L^{2}(I_{n};{\mathbb{V}}^{\ast})} + {\mathcal{E}_{c\tau}}\left(U_{n-1}^{+},U_{n-1}^{-},\tau_{n}\right)\!. \end{align} (3.10) The time/coarsening indicator $$\mathcal{E}_{c\tau }$$ can be bounded as in the study by Ern et al. (2017).
In order to keep the paper self-contained we shall reproduce the proof here. Recalling the properties of the transformed Legendre functions, we conclude with (2.6) that   \begin{align}\frac{1}{4}\frac{\tau_{n}}{2s+3}{{|||} U_{n-1}^{+}-U_{n-1}^{-}{|||}}_{\varOmega}^{2}=&\,\tfrac14 {{|||} U_{n-1}^{+}-U_{n-1}^{-}{|||}}_{\varOmega}^{2}\int_{I_{n}}\left(L_{s+1}^{n}\right)^{2}\,\textrm{d}t\nonumber \\ =&\, {{|||} U_{n-1}^{+}-U_{n-1}^{-}{|||}}_{\varOmega}^{2}\int_{I_{n}}\tfrac{(-1)^{s}}{2}\left(L_{s+1}^{n}-{L_{s}^{n}}\right) \tfrac{(-1)^{s}}{2} L_{s+1}^{n}\,\textrm{d}t \nonumber\\ =&\,\int_{I_{n}}{\left\langle\nabla\left({\mathcal{U}}-U\right),\nabla\left(U_{n-1}^{+}-U_{n-1}^{-}\right) \tfrac{(-1)^{s}}{2} L_{s+1}^{n}\right\rangle}_{\varOmega}\,\textrm{d}t \nonumber\\ =&\,\int_{I_{n}}{\left\langle \textrm{Res}({\mathcal{U}}),\,\left(U_{n-1}^{-}-U_{n-1}^{+}\right)\tfrac{(-1)^{s}}{2} L_{s+1}^{n}\right\rangle}\,\textrm{d}t \nonumber\\ &+\int_{I_{n}}{\left\langle\, f-f_{n},\left(U_{n-1}^{-}-U_{n-1}^{+}\right) \tfrac{(-1)^{s}}{2} L_{s+1}^{n}\right\rangle}_{\varOmega}\,\textrm{d}t. \end{align} (3.11)Here we have used in the last step that $$L_{s+1}^{n}$$ is L2-orthogonal on In to $$\mathbb{P}_{s}$$, whence for every fixed $$v\in{\mathbb{V}}$$ we have $$\int _{I_{n}}{\left \langle{\partial _{t}}{\mathcal{U}},\,v\right \rangle }L_{s+1}^{n}\,\textrm{d}t=\int _{I_{n}}{\langle\, f_{n}},{v\rangle}_{\varOmega}L_{s+1}^{n}\,\textrm{d}t=\int _{I_{n}}\mathcal{B}[U,\,v]\,L_{s+1}^{n}\,\textrm{d}t=0$$. The assertion then follows from the Cauchy–Schwarz inequality. Indeed, recalling (3.7c) and observing that   $$\frac{\frac14\frac{1}{2s+3}}{C_{\tau}}=\frac{2s+1}{4s+4}\in\left[\frac14,\frac12\right)\!,$$we have that the involved constants are robust with respect to s. Remark 3.6 (Consistency indicator $$\mathcal{E}_{f}$$).
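The robustness of the constants can be checked directly; the helper names below are hypothetical:

```python
# Robustness check for the proof of Proposition 3.5: the quotient of the
# lower-bound constant (1/4)/(2s+3) and C_tau(s) equals (2s+1)/(4s+4),
# which stays in [1/4, 1/2) uniformly in the polynomial degree s.

def C_tau(s):
    """C_tau(s) = (1/(2s+3) + 1/(2s+1)) / 4 from (3.6)."""
    return 0.25 * (1.0 / (2 * s + 3) + 1.0 / (2 * s + 1))

def ratio(s):
    """Quotient (1/4)/(2s+3) over C_tau(s)."""
    return (0.25 / (2 * s + 3)) / C_tau(s)
```

For s = 0 the quotient equals 1/4 and it increases monotonically towards the limit 1/2 as s grows, so no constant in the argument degenerates with the temporal degree.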
The upper and lower bounds in Propositions 3.4 and 3.5 in principle involve the consistency terms $${\Vert f-f_{n}\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2}$$, n = 1, … , N, which are not computable in general. Though these terms can be bounded by $$\mathcal{E}_{f}^{2}(f,t_{n-1},\tau _{n})$$ (compare with (3.9)), this still involves the best approximation $$f_{n}\in \mathbb{P} _{s}\left (I_{n};L^{2}(\varOmega )\right )$$ of $$f_{|I_{n}}$$ in $$L^{2}\left (I_{n};L^{2}(\varOmega )\right )$$. In a practical algorithm fn may therefore be replaced by some quasi-best approximation in $$\mathbb{P} _{s}\left (I_{n};L^{2}(\varOmega )\right )$$ like a Clément-type interpolation in time; see the study by Clément (1975). Remark 3.7 (Temporal residual). We emphasise that the techniques from the study by Ern et al. (2017) for bounding the indicator $$\mathcal{E}_{c\tau }$$ in the proof of Proposition 3.5 differ from those used by Verfürth (2003) for the dG(0) scheme, and it is noteworthy that this improves the constant even in this case. In fact, it follows from the proof of Proposition 3.5 that   $$\tfrac18\mathcal{E}_{c\tau}^{2}\left(U_{n-1}^{+}, U_{n-1}^{-},\tau_{n}\right)\leqslant{\Vert \textrm{Res}({\mathcal{U}})\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2}+ {\Vert \,f-f_{n}\Vert}_{L^{2}(I_{n};\mathbb{V}^{\ast})}^{2};$$compare also with the study by Ern et al. (2017). Remark 3.8 (Time-space efficiency). We note that the lower bound in Proposition 3.5 is local in time but global in space. In fact, though the bound for the indicators $$\mathcal{E}_{\mathcal{G}}$$ in Lemma 3.3 is in principle local also in space, the estimate of $$\mathcal{E}_{c\tau }$$ in (3.11) follows from a global relation in space. Ern et al.
(2017) presented an estimator, which is locally efficient in space and time for the modified error norm   \begin{align} \Vert{u-U}\Vert_{{\mathcal{E}_{Y}}}:= \left(\Vert{u-{\mathcal{U}}\Vert}_{\mathbb{W}(0,T)}^{2}+\sum_{n=1}^{N}\mathcal{E}_{c\tau}^{2}\left(U_{n-1}^{+},U_{n-1}^{-},\tau_{n}\right)\right)^{\!1/2}\!, \end{align} (3.12)which can be interpreted as a dG(s) norm controlling the temporal jumps of the discrete solution. With respect to this norm the local space–time efficiency of the indicator $$\mathcal{E}_{c\tau }$$ is then immediate. Moreover, by resorting to equilibrated flux estimators instead of the residual-based ones in (3.7d), the a posteriori bounds in the study by Ern et al. (2017) are also robust with respect to the polynomial degree ℓ in space. Remark 3.9 (Elliptic problem). In the case of the implicit Euler scheme dG(0) it is well known that in each time step $$1\leqslant n\leqslant N$$, $$U_{|I_{n}} \in \mathbb{P}_{0}(\mathbb{V}_n)=\mathbb{V}_n$$ is the Ritz approximation to a coercive elliptic problem. Moreover, the spatial estimators (3.7d) are the standard residual-based estimators for this elliptic problem. This observation transfers to the dG(s) scheme for $$s\geqslant 1$$. To see this we observe that (after transformation to the unit interval) (2.3) is a Galerkin approximation to the solution $$u_{\tau }\in \mathbb{P}_s ({\mathbb{V}})$$ of a problem of the kind   \begin{align} {\int_{0}^{1}}\frac{1}{\tau}\langle{{\partial_{t}} u_{\tau}},{v}\rangle_{\varOmega} + {\mathcal{B}[u_{\tau},\,v]}\,\textrm{d}t +\frac{1}{\tau}\langle{u_{\tau}(0)},{v(0)}\rangle_{\varOmega} = {\int_{0}^{1}}\left\langle{\,\bar f},{v}\right\rangle_{\varOmega} \,\textrm{d}t +\frac{1}{\tau}\left\langle{v^{-}},{v(0)}\right\rangle_{\varOmega} \end{align} (3.13)for all $$v\in \mathbb{P}_s ({\mathbb{V}})$$ and some data $$\bar f\in \mathbb{P}_s \left (L^{2}(\varOmega )\right )$$, v−∈ L2(Ω) and τ > 0.
The mappings v ↦ v(0) and v ↦ ∂tv are linear and continuous on $$\mathbb{P}_s ({\mathbb{V}})$$, whence this equation can be taken as a vector-valued linear variational problem of second order on $${\mathbb{V}}^{s+1}$$. Testing with v = uτ proves coercivity:   \begin{align*} {\int_{0}^{1}}\frac{1}{\tau}\langle{{\partial_{t}} u_{\tau}},{u_{\tau}}\rangle_{\varOmega} + {\mathcal{B}[u_{\tau},\,u_{\tau}]}\,\textrm{d}t +\!\frac{1}{\tau}\langle{u_{\tau}(0)},{u_{\tau}(0)}\rangle_{\varOmega} =\frac{1}{2\tau}{\Vert u_{\tau}(0)\Vert}_{\varOmega}^{2}\!+\!\frac{1}{2\tau}{\Vert u_{\tau}(1)\Vert}_{\varOmega}^{2}+ {\int_{0}^{1}}{{|||} u_{\tau}{|||}}_{\varOmega}^{2}\,\textrm{d}t.\end{align*}Its residual at $$V\in \mathbb{P}_s ({\mathbb{V}})$$ is given by   $${\langle{\textrm{Res}_{h}}(\mathcal{V}),\,v\rangle}={\langle\, \bar f-{\partial_{t}}\mathcal{V},\,v\rangle}-{\mathcal{B}[V,\,v]},\quad v\in\mathbb{P}_s({\mathbb{V}}),$$where $$\mathcal{V}\in \mathbb{P}_{s+1}({\mathbb{V}})$$ is such that $$\mathcal{V}(c_{j})=V(c_{j})$$, j = 1, … , s + 1 and $$\mathcal{V}(0)=v^{-}$$; compare with (2.5). Thanks to Lemmas 3.2 and 3.3, for $$V\in \mathbb{P}_s (\mathbb{V}(\mathcal{G}) )$$, $$\mathcal{G}\in{\mathbb{G}}$$, the standard residual-based estimator for this problem is given by $$\mathcal{E}_{\mathcal{G}}^{2}(V,v^{-},0,\tau,\bar f,\mathcal{G})$$. Energy estimation. We shall now generalise the energy estimate from the study by Kreuzer et al. (2012) to higher-order dG(s) schemes. Similar estimates have been obtained in the studies by Eriksson & Johnson (1991, 1995), although based on different techniques involving restrictions on the time-step and mesh sizes, as well as constants resulting from inverse inequalities. Proposition 3.10 (Uniform global energy estimate). Assume $$N\in{\mathbb{N}}\cup \{\infty \}$$ and arbitrary time instances $$0=t_{0}<\cdots <t_{N}\leqslant T$$ with time-step sizes τ1, … , τN > 0.
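The coercivity identity above is easy to verify for the temporal part of the system: the sketch below assembles K_ij = ∫₀¹ p_j′ p_i dt + p_j(0)p_i(0) over the monomial basis of P_s (a scalar stand-in for the time coupling in (3.13), ignoring the spatial form B; the helper name is hypothetical).

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def temporal_matrix(s, deg=30):
    """Time coupling of the transformed dG(s) system (3.13): with the monomial
    basis p_i(t) = t^i of P_s the entries are
    K_ij = int_0^1 p_j'(t) p_i(t) dt + p_j(0) p_i(0)."""
    x, w = leggauss(deg)
    t, wt = 0.5 * (x + 1.0), 0.5 * w        # Gauss rule mapped to (0,1)
    K = np.zeros((s + 1, s + 1))
    for i in range(s + 1):
        for j in range(s + 1):
            dpj = j * t**(j - 1) if j > 0 else np.zeros_like(t)
            K[i, j] = np.sum(wt * dpj * t**i)
    K[0, 0] += 1.0                          # the point term p_j(0) p_i(0)
    return K

# Coercivity identity from testing with v = u_tau:
# x^T K x = int_0^1 u' u dt + u(0)^2 = (u(0)^2 + u(1)^2) / 2 >= 0.
K3 = temporal_matrix(3)
```

In particular the symmetric part of K is positive semidefinite; adding the (positive definite) contribution of the form B then yields the coercivity of the full system.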
Let $$U_{0}= \varPi_{0}u_{0}$$ and for $$1\leqslant n\leqslant N$$ let $$U_{|I_{n}} \in \mathbb{P}_s (\mathbb{V}_n )$$ be the discrete solutions to (2.3) and let $${\mathcal{U}}\in \mathbb{W}$$ be defined as in (2.5). Then for any $$m=1,\dots ,N$$ we have   $$\sum_{n=1}^{m} \int_{I_{n}}{\Vert\varPi_{n}{\partial_{t}}{\mathcal{U}}\Vert}_{\varOmega}^{2}\,\textrm{d}t + {{|||} U_{n-1}^{+}-\varPi_{n} U_{n-1}^{-}{|||}}_{\varOmega}^{2} +{{|||} U_{n}^{-}{|||}}_{\varOmega}^{2} -{{|||} \varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2} \leqslant \sum_{n=1}^{m}\int_{I_{n}}{\Vert\, f_{n}\Vert}_{\varOmega}^{2}\,\textrm{d}t.$$ Proof. We choose $$V\mathrel{:=}\varPi_{n}{\partial _{t}}{\mathcal{U}}_{|I_{n}}\in \mathbb{P}_s (\mathbb{V}_n )$$ as a test function in (2.7), obtaining   \begin{align} \int_{I_{n}}{\Vert \varPi_{n}{\partial_{t}}{\mathcal{U}}\Vert}_{\varOmega}^{2} + {\mathcal{B}[U,\,\varPi_{n} {\partial_{t}}{\mathcal{U}}]}\,\textrm{d}t = \int_{I_{n}} \langle{\,f_{n}},{\varPi_{n}{\partial_{t}}{\mathcal{U}}}\rangle_{\varOmega}\,\textrm{d}t. \end{align} (3.14)In order to analyse the second term on the left-hand side we first observe that $$\varPi_{n}{\partial _{t}}{\mathcal{U}}_{|I_{n}}={\partial _{t}}\varPi_{n}{\mathcal{U}}_{|I_{n}}\in \mathbb{P}_s (\mathbb{V}_n )$$.
Recalling (2.5b) and that $$\mathcal{B}:{\mathbb{V}}\times{\mathbb{V}}\to{\mathbb{R}}$$ is constant in time, we obtain, integrating by parts, that   $$\int_{I_{n}} {\mathcal{B}\left[U,\,\varPi_{n}{\partial_{t}}{\mathcal{U}}\right]}\,\textrm{d}t = - \int_{I_{n}} {\mathcal{B}\left[{\partial_{t}} U,\,\varPi_{n}{\mathcal{U}}\right]}\,\textrm{d}t + {{|||} U_n^{-}{|||}}_{\varOmega}^{2}-{\mathcal{B}\left[U_{n-1}^{+},\,\varPi_{n}U_{n-1}^{-}\right]}.$$Since $${\mathcal{B}[{\partial _{t}} U,\,\varPi_{n}{\mathcal{U}}]}_{|I_{n}}\in \mathbb{P}_{2s}$$ we can apply (2.4) and conclude with (2.5a) that   \begin{align*} \int_{I_{n}} {\mathcal{B}\left[U,\,\varPi_{n}{\partial_{t}}{\mathcal{U}}\right]}\,\textrm{d}t&= - \int_{I_{n}} {\mathcal{B}\left[{\partial_{t}} U,\,U\right]}\,\textrm{d}t + {{|||} U_n^{-}{|||}}_{\varOmega}^{2}-{\mathcal{B}\left[U_{n-1}^{+},\,\varPi_{n}U_{n-1}^{-}\right]}\\ &= \frac12{{|||}U_{n-1}^{+}-\varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2} - \frac12{ {|||}\varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2}+ \frac12{{|||} U_n^{-}{|||}}_{\varOmega}^{2}, \end{align*}where we used that $${\mathcal{B}\left [{\partial _{t}} U_{|I_n} ,\,U_{|I_n} \right ]}= \frac 12{\partial _{t}} {{|||} U_{|I_n}{|||}}_{\varOmega}^{2}$$. Inserting this in (3.14) yields   $$\int_{I_{n}}{\Vert\varPi_{n}{\partial_{t}}{\mathcal{U}}\Vert}_{\varOmega}^{2}\,\textrm{d}t+ \frac12{{|||}U_{n-1}^{+}-\varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2} - \frac12{{|||} \varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2}+ \frac12{{|||} U_{n}^{-}{|||}}_{\varOmega}^{2} = \int_{I_{n}} \langle f_{n}, \varPi_{n}{\partial_{t}}{\mathcal{U}}\rangle_\varOmega\,\textrm{d}t.$$Estimating the right-hand side with the help of the Cauchy–Schwarz and Young inequalities proves the assertion.
Corollary 3.11 Under the conditions of Proposition 3.10 assume that   \begin{align} {{|||}U_{n-1}^{-}{|||}}_{\varOmega}^{2}-{{|||}\varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2} + \frac12\int_{I_{n}} {\left\Vert\varPi_{n}{\partial_{t}} {\mathcal{U}}\right\Vert}_{\varOmega}^{2}\,\textrm{d}t \geqslant 0\quad\textrm{ for}\quad n=1,\dots,N. \end{align} (3.15)Then we have the estimate   $$\sum_{n=1}^{m} \frac12\int_{I_{n}} {\left\Vert\varPi_{n}{\partial_{t}} {\mathcal{U}}\right\Vert}_{\varOmega}^{2}\,\textrm{d}t + {{|||}U_{n-1}^{+}-\varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2} \leqslant \Vert\,{f}\Vert_{\varOmega\times(0,t_{m})}^{2} + {|||} U_{0}{|||}_{\varOmega}^{2}-{{|||}U_{m}^{-}{|||}}_{\varOmega}^{2}.$$In particular, the sum $$\sum _{n=1}^{N} {{|||} U_{n-1}^{+}-\varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2}$$ is uniformly bounded irrespective of the sequence of time-step sizes used. Proof. Summing the non-negative terms in (3.15) yields   $$0\leqslant \sum_{n=1}^{m}{{|||}U_{n-1}^{-}{|||}}_{\varOmega}^{2}-{{|||}\varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2} + \frac12\int_{I_{n}} {\left\Vert\varPi_{n}{\partial_{t}} {\mathcal{U}}\right\Vert}_{\varOmega}^{2}\,\textrm{d}t,$$which is equivalent to   $${{|||}U_{m}^{-}{|||}}_{\varOmega}^{2} - {{|||}U_0{|||}}_{\varOmega}^{2}\leqslant\sum_{n=1}^{m}{{|||}U_{n}^{-}{|||}}_{\varOmega}^{2}-{{|||}\varPi_{n}U_{n-1}^{-}{|||}}_{\varOmega}^{2} + \frac12\int_{I_{n}} {\left\Vert\varPi_{n}{\partial_{t}} {\mathcal{U}}\right\Vert}_{\varOmega}^{2}\,\textrm{d}t.$$Using this in the estimate of Proposition 3.10 yields the desired estimate. Having a closer look at the indicator $${\mathcal{E}_{c\tau }}$$ we note that, since we allow for coarsening, it is not a pure temporal error indicator. Coarsening may cause a loss of information, and too little information may lead to wrong decisions within the adaptive method.
For this reason we use the triangle inequality to split   \begin{align} {\mathcal{E}}_{c\tau }^{2}(v^{-},v^{+},\tau,\mathcal{G})\leqslant \mathcal{E}^2_{c}(v^{-},\tau,\mathcal{G}) + \mathcal{E}^2_{\tau}(v^{+},v^{-},\tau,\mathcal{G}) \end{align} (3.16a)into a measure   \begin{align} \mathcal{E}^2_c(v^{-},\tau,\mathcal{G}) \mathrel{:=}\sum_{{E} \in\mathcal{G}} \mathcal{E}^2_c(v^{-},\tau,\mathcal{G},E) \quad\textrm{with}\quad \mathcal{E}^2_c(v^{-},\tau,\mathcal{G},E)\mathrel{:=} 6 C_{\tau}\,\tau{{|||}\varPi_{\mathcal{G}} v^{-} - v^{-}{|||}}_{E}^{2} \end{align} (3.16b)for the coarsening error and   \begin{align} \mathcal{E}^2_\tau(v^{+},v^{-},\tau,\mathcal{G}) \mathrel{:=} 6 C_{\tau}\,\tau{{|||} v^{+} - \varPi_{\mathcal{G}} v^{-}{|||}}_{\varOmega}^{2}, \end{align} (3.16c)which serves as an indicator for the temporal error. This allows us to control the coarsening error separately. Assuming that (3.15) holds, Corollary 3.11 provides control of the sum of the time error indicators $$\mathcal{E}_{\tau}^{2} \left({U_{n-1}^{+}},{U_{n-1}^{-}},\tau _{n},\mathcal{G}_{n}\right) = 6C_{\tau }\,\tau_{n}{{|||}U_{n-1}^{+}-\varPi_{n}{U_{n-1}^{-}}{|||}}_{\varOmega}^{2}$$. Assumption (3.15) would trivially be satisfied for the Ritz projection $$R_{n}U_{n-1}^{-}$$ of $$U_{n-1}^{-}$$ into $$\mathbb{V}_n$$, since $${{|||} R_{n}U_{n-1}^{-}{|||}}_{\varOmega}\leqslant {{|||} U_{n-1}^{-}{|||}}_{\varOmega}$$. The $$L^{2}$$-projection $$\varPi_n U_{n-1}^{-}$$, however, does not satisfy this monotonicity property in general and therefore coarsening may lead to an increase of energy. The algorithm presented below ensures that (3.15) is fulfilled at the end of every time step.
To this end, using the notation (3.8) we define for $$V\in \mathbb{P}_s \left ({\mathbb{V}}(\mathcal{G})\right )$$, $$v^{-}\in{\mathbb{V}}$$, t⋆ ∈ (0, T), I = (t⋆, t⋆ + τ] ⊂ (t⋆, T], $$\mathcal{G}\in{\mathbb{G}}$$ and $${E} \in \mathcal{G}$$ the indicators   $$\mathcal{E}^2_{\star}(V,v^{-},t_{\star},\tau,\mathcal{G},{E}) \mathrel{:=} {{|||}\varPi_{\mathcal{G}} v^{-}{|||}}_{E}^{2} - {{|||} v^{-}{|||}}_{E}^{2} - \frac12\int_{I} {\Vert \varPi_{\mathcal{G}}{\partial_{t}} \mathcal{V}\Vert}_{E}^{2}\,\textrm{d}t,$$as well as the convenient notation $$\mathcal{E}^2_{\star}(V,v^{-},t_{\star },\tau ,\mathcal{G}) \mathrel{:=} \sum _{E\in \mathcal{G}} \mathcal{E}_{\star}^2 (V,v^{-},t_{\star },\tau ,\mathcal{G},{E} )$$. Condition (3.15) is then equivalent to $$\mathcal{E}^2_{\star} (U,{U_{n-1}^{-}},t_{n-1},\tau _{n},\mathcal{G}_{n})\leqslant 0$$, $$n=1,\dots ,N$$. Note that the term $$- \frac12\int _{I_{n}} { \Vert\varPi_{n} {\partial _{t}} \mathcal{U}\Vert}_{E}^{2}\,\textrm{d}t$$ may compensate for $${{|||}\varPi_{n} U_{n-1}^{-}{|||}}_{E}^{2}> {{|||} U_{n-1}^{-}{|||}}_{E}^{2}$$. 4. The adaptive algorithm TAFEM Based on the observations of the previous section and on a new marking concept, we next describe the adaptive algorithm TAFEM. In contrast to the algorithms presented in the studies by Kreuzer et al. (2012) and Chen & Feng (2004), TAFEM is based on a different marking philosophy. In fact, all three algorithms mark according to the same indicators, (3.7b)–(3.7d) and (3.16a), but TAFEM uses an $$L^{2}$$- instead of an $$L^{\infty }$$-strategy. Philosophically, this aims at an $$L^{2}$$ rather than an $$L^{\infty }$$ equal distribution of the error in time; compare also with the introductory Section 1.
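The mechanism behind $$\mathcal{E}^2_{\star}$$, namely that the $$L^{2}$$-projection onto a coarsened space may increase the energy norm while the time-derivative term can compensate for it, can be illustrated with a two-cell toy computation. All weights, cell values and the size of the hypothetical compensating term below are invented for illustration only.

```python
# Toy illustration (not the paper's algorithm): two fine cells of size h are
# merged into one coarse cell; c is a variable reaction coefficient defining
# the energy norm |||v|||^2 = sum_E c_E * h * v_E^2 on piecewise constants
# (gradients vanish).  The L2 projection averages the cell values, which can
# increase this weighted norm.  All numbers are illustrative.
h = 0.5
c = [10.0, 0.1]          # energy weights on the two fine cells
v_minus = [0.0, 1.0]     # piecewise constant function on the fine grid

pi_v = 0.5 * (v_minus[0] + v_minus[1])   # L2 projection onto the merged cell

fine_energy = sum(ci * h * vi**2 for ci, vi in zip(c, v_minus))
coarse_energy = (c[0] + c[1]) * h * pi_v**2

# E_star^2 without the compensating term  -1/2 * int ||Pi d_t V||^2 dt:
e_star_raw = coarse_energy - fine_energy
# with a hypothetical compensating contribution of size 3.0:
e_star = e_star_raw - 0.5 * 3.0
```

Here `e_star_raw > 0` shows the energy increase under coarsening, while the (invented) time-derivative contribution drives the full indicator below zero, which is the situation the algorithm exploits.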
We follow a bottom-up approach, i.e., we first state basic properties of some rudimentary modules that are treated as black-box routines, then describe three core modules in detail and finally combine these procedures in the adaptive algorithm TAFEM . 4.1 Black-box modules As in the study by Kreuzer et al. (2012) we use standard modules ADAPT_INIT, COARSEN, MARK_REFINE and SOLVE as black-box routines, while adding the module ENLARGE as a black box as well. In particular, we use the subroutine MARK_REFINE in an object-oriented fashion, i.e., the functionality of MARK_REFINE changes according to its arguments. We next state the basic properties of these routines. Assumption 4.1 (Properties of modules). We suppose that all rudimentary modules terminate with an output having the following properties: (1) For a given initial datum u0 ∈ L2(Ω) and tolerance $${\texttt{TOL}_{0}}>0$$ the output   $$(U_{0}, \mathcal{G}_0) = \texttt{ADAPT_INIT}{{\left(u_{0},{\mathcal{G}_{\textrm{init}}},{\texttt{TOL}_{0}}\right)}}$$is a refinement $$\mathcal{G}_0\geqslant{\mathcal{G}_{\textrm{init}}}$$ and an approximation $$U_{0}\in{\mathbb{V}}(\mathcal{G}_0)$$ to u0 such that $$\mathcal{E}^2_0({u_{0},\mathcal{G}_0})\leqslant \texttt{TOL}_{0}^{2}$$. (2) For given g ∈ L2(Ω), $$\bar f\in \mathbb{P}_s \left (L^{2}(\varOmega )\right )$$, t ∈ (0, T), I = (t, t + τ] ⊂ (t, T] and $$\mathcal{G}\in{\mathbb{G}}$$, the output   $$U_{I} = \texttt{SOLVE}{{(g, \bar f, t, \tau, \mathcal{G})}}$$is the solution $$U_{I}\in \mathbb{P}_s (\mathbb{V}(\mathcal{G}) )$$ to the discrete problem   $$\int_{I}\langle{{\partial_{t}} U_{I}},{V}\rangle_\varOmega + {\mathcal{B}[U_{I},\,V]}\,\textrm{d}t +\langle{U_{I}(t^{+})},{V(t^{+})}\rangle_\varOmega= \langle{g},{V(t^{+})}\rangle_\varOmega + \int_{I}\langle{\bar f},{V}\rangle_\varOmega \,\textrm{d}t$$for all $$V\in \mathbb{P}_s \left (\mathbb{V}(\mathcal{G}) \right )$$, where $$V(t^{+})\mathrel{:=}\lim_{r\searrow t}V(r)$$; compare with (2.3). Hereby, we assume exact integration and exact linear algebra.
(3) For a given grid $$\mathcal{G}\in{\mathbb{G}}$$ and a discrete function $$V\in{\mathbb{V}}(\mathcal{G})$$, the output   $$\mathcal{G}_{*} = \texttt{COARSEN}{(V,\mathcal{G})}$$satisfies $$\mathcal{G}_{*}\leqslant \mathcal{G}$$. (4) For a given function $$f\in L^{2}\left (\varOmega \times (0,T]\right )$$, an initial time t ∈ (0, T) and an initial step size τ ∈ (0, T − t], the output   $$\tau_{*} = \texttt{ENLARGE}\left(f, t, \tau,{\texttt{tol}_{f}}\right)$$satisfies $$\tau _{*}\geqslant \tau$$ or $$\tau_{*} = T - t$$. The argument $${\texttt{tol}_{f}}$$ can be used to additionally restrict the enlargement; compare with Remark 4.3 below. (5) For a given grid $$\mathcal{G}$$ and a set of indicators $$\{{\mathcal{E}_{{E} }}\}_{{E} \in \mathcal{G}}$$, the output   $$\mathcal{G}_{*} = \texttt{MARK_REFINE}{{\left(\{{\mathcal{E}_{{E}}}\}_{{E}\in\mathcal{G}}, \mathcal{G}\right)}}\in{\mathbb{G}}$$is a conforming refinement of $$\mathcal{G}$$, where at least one element of the set $${\arg\max }\{{\mathcal{E}_{{E}}}: {E} \in \mathcal{G}\}\subset \mathcal{G}$$ has been refined. (6) For given grids $$\mathcal{G},\mathcal{G}_{\textrm{old}}\in{\mathbb{G}}$$ and a set of indicators $$\{{\mathcal{E}_{{E} }}\}_{{E} \in \mathcal{G}}$$, the output   $$\mathcal{G}_{*} = \texttt{MARK_REFINE}{{\left(\{{\mathcal{E}_{{E} }}\}_{{E} \in\mathcal{G}}, \mathcal{G},\mathcal{G}_{\textrm{old}}\right)}} \in{\mathbb{G}}$$is a conforming refinement of $$\mathcal{G}$$, where at least one element of the set $$\left \{{E} \in \mathcal{G}\colon h_{\mathcal{G}|{E} }>h_{\mathcal{G}_{\textrm{old}}|{E} }\right \}$$ of coarsened elements (with respect to $$\mathcal{G}_{\textrm{old}}$$) is refined. For a more detailed description of these modules see the study by Kreuzer et al. (2012, Section 3.3.1). 4.2 The core modules The first core module CONSISTENCY controls the consistency error $${\mathcal{E}_{f}}$$.
Recalling its definition in (3.7b) we see that the consistency error is solely influenced by the time-step size and can be computed without solving expensive discrete systems. Therefore, CONSISTENCY is used in the initialisation of each time step to adjust the time-step size such that the local consistency indicator $$\mathcal{E}^2_f(f,t,\tau)$$ is below the squared local tolerance $${\texttt{tol}}_{f}^{2}$$. It is important to notice that this module can be chosen to follow the classic thresholding algorithm, which ensures quasi-optimal order of convergence in terms of the degrees of freedom (DoFs); see Remark 4.3. We start with the termination of the module CONSISTENCY. Lemma 4.2 (Termination of CONSISTENCY). Assume $$f \in L^{2}\left ((0,T);L^{2}(\varOmega )\right )$$. Then for any t ∈ (0, T) and τin ∈ (0, T − t],   $$\left(\,\bar f,\tau\right)=\texttt{CONSISTENCY}\left(f,t,\tau^{\textrm{in}},{{\texttt{tol}}_{f}}\right)$$terminates and   \begin{align} \mathcal{E}^2_f(f,t,\tau) \leqslant {\texttt{tol}}_{f}^{2}. \end{align} (4.1) Proof. The proof is straightforward: $$\mathcal{E}^2_f(f,t,\tau)$$ is monotonically nondecreasing in $$\tau$$ with $$\mathcal{E}^2_f(f,t,\tau) \rightarrow 0$$ as $$\tau \rightarrow 0$$, so finitely many reductions of the time-step size achieve (4.1). Remark 4.3 We note that the enlargement step in line 2 of CONSISTENCY is not relevant for Lemma 4.2, i.e., it could for instance be implemented by Algorithm 2. In this case the module CONSISTENCY becomes a classical thresholding algorithm; compare, e.g., with the study by Binev et al. (2002). However, a too aggressive enlargement of the time-step size may need to be reduced later because of the time estimator $${\mathcal{E}_{\tau }}$$ (e.g., if $${\mathcal{E}_{f}}\equiv 0$$), which requires the solution of expensive linear systems.
Therefore, we decided to introduce the abstract enlargement routine $$\texttt{ENLARGE}$$, which can be adjusted individually to the characteristics of the considered problem; compare also with Section 6.2.3, where we choose κ2 > 1, i.e., $$\min \{\kappa _{2}\tau ,T-t\}=\texttt{ENLARGE}(f,t,\tau ,{{\texttt{tol}}_{f}})$$, for the rough initial data experiment. Obviously, a local control of the form (4.1) does not guarantee that the global consistency error is below some prescribed tolerance $${{\texttt{TOL}}_{f}}$$. For this reason we first precompute a local tolerance $${{\texttt{tol}}_{f}}$$ from the global tolerance $${{\texttt{TOL}}_{f}}$$ by the following module $$\texttt{TOLFIND}$$. The next result states that if all local consistency indicators are below the threshold $${{\texttt{tol}}_{f}}$$ then the accumulation of the consistency indicators indeed stays below the prescribed global tolerance $${{\texttt{TOL}}_{f}}$$. Lemma 4.4 (Termination of TOLFIND). Assume $$f \in L^{2}\left ((0,T);L^{2}(\varOmega )\right )$$. Then for any $${{\texttt{TOL}}_{f}}>0$$ the call   $${{\texttt{tol}}_{f}}=\texttt{TOLFIND}\left(\,f,T,{{\texttt{TOL}}_{f}}\right)$$terminates with $${{\texttt{tol}}_{f}}>0$$. Moreover, let 0 = t0 < t1 < ⋯< tN = T be arbitrary with $$\tau _{n}=t_n -t_{n-1}$$, n = 1, … , N; then   \begin{align} \mathcal{E}^2_f(f,{t_{n-1}},\tau_{n})\leqslant {\texttt{tol}}_{f}^{2},\quad n=1,\ldots, N \quad\Rightarrow\quad \sum_{n=1}^{N} \mathcal{E}^2_f(f,{t_{n-1}},\tau_{n}) \leqslant {\texttt{TOL}}_{f}^{2}. \end{align} (4.2) Proof. The proof is divided into three steps. (1) We show that the process from lines 4 to 8 terminates. To this end we recall the parameter κ1 ∈ (0, 1) from $$\texttt{CONSISTENCY}\left (\,f,{\tilde t_{n-1}},\tilde{\tau }_{n-1},{{\texttt{tol}}_{f}}\right )$$.
We argue by contradiction and assume that an infinite monotone sequence $$\left \{\tilde t_{n}\right \}_{n\geqslant 0} \subset [0,T]$$ is constructed by TOLFIND with $$\lim _{n\to \infty }\tilde t_{n}=t^{\ast }\in (0,T]$$. Defining the intervals $$I_{n}^{\ast }=\left (\tilde t_{n}, \tilde{t}_{n}^{\ast }\right ]$$, where $$\tilde{t}_{n}^{\ast }=\min \left \{ \kappa _{1}^{-1}(t^{\ast }-\tilde{t}_{n})+\tilde{t}_{n} ,T\right \}$$, we conclude from $$\tilde{t}_{n}\to t^{\ast }$$ that there exists $$n_{0}\in{\mathbb{N}}$$ such that   $$\Vert\,{f}\Vert_{\varOmega\times\left(\tilde{t}_{n}, \tilde{t}_{n}^{\ast}\right)}^{2}\leqslant {\texttt{tol}}_{f}^{2}$$for all n ⩾ n0. Let m > n0 be such that the condition in line 3 of the module $$\texttt{CONSISTENCY} \left (\,f,{\tilde{t}_{m-1}},\tilde{\tau }_{m-1},{{\texttt{tol}}_{f}}\right )$$ is activated at least once (the existence of such an m is a direct consequence of $$\tilde{t}_{n}\to t^{\ast }$$); then   \begin{align*} {\texttt{tol}}_{f}^{2} &< \mathcal{E}^2_f(\,f,{\tilde t_{m-1}},\kappa^{-1}_{1} \tilde{\tau}_{m} )\\ &\leqslant \Vert\,{f}\Vert_{\varOmega\times\left(\tilde t_{m-1},\tilde t_{m-1}+\kappa^{-1}_{1} \tilde{\tau}_{m}\right)}^{2} \\ &\leqslant \Vert\,{f}\Vert_{\varOmega\times\left(\tilde t_{m-1},\tilde t_{m-1}^{\ast}\right)}^{2}\leqslant {\texttt{tol}}_{f}^{2}. \end{align*}This contradiction implies that the sequence $$\left \{\tilde t_{n}\right \}_{n\geqslant 0}$$ is finite. (2) We next check that the condition of line 10 is violated after finitely many steps. Since the span of characteristic functions of dyadic intervals is dense in L2(0, T) we can choose M > 0 such that the squared consistency error on the grid of $$2^{M}$$ uniform intervals is below $$\frac 14 {\texttt{TOL}}_{f}^{2}$$.
We split the intervals generated in $$\texttt{TOLFIND}\left (\,f,T,{{\texttt{tol}}_{f}}\right )$$ into   $${\mathbb{I}}_{\textrm{in}}:=\left\{n: \left(\tilde t_{n-1},\tilde t_{n}\right]\subset \big(Tm2^{-M},T(m+1)2^{-M}\big]\ \textrm{for some}\ m\in\big\{0,\ldots,2^{M}-1\big\}\right\}$$and $${\mathbb{I}}_{\textrm{out}}:=\left \{1,\ldots ,N_{f}\right \}\setminus{\mathbb{I}}_{\textrm{in}}$$ according to whether or not they are included in one of the dyadic intervals. Writing $$\epsilon$$ for the accumulated squared consistency error of this partition, we therefore have, with the monotonicity of the consistency error and $$\#{\mathbb{I}}_{\textrm{out}}\leqslant 2^{M}$$, that   $$\epsilon = \sum_{n\in{\mathbb{I}}_{\textrm{in}}}\mathcal{E}^2_f(\,f,{\tilde t_{n-1}},\tilde{\tau}_{n})+\sum_{n\in{\mathbb{I}}_{\textrm{out}}}\mathcal{E}^2_f(\,f,{\tilde t_{n-1}},\tilde{\tau}_{n})\leqslant \frac14{\texttt{TOL}}_{f}^{2} + 2^{M}{\texttt{tol}}_{f}^{2}.$$Taking $${\texttt{tol}}_{f}^{2}< 2^{-(M+2)}{\texttt{TOL}}_{f}^{2}$$ we see that the condition of line 10 is violated, which proves the assertion. (3) Combining the above steps we conclude that TOLFIND terminates and it remains to prove (4.2). To this end we proceed similarly as in (2) and let   $${\mathbb{I}}_{\textrm{in}}:=\left\{n: \left(t_{n-1},t_{n}\right]\subset \left(\tilde t_{m-1},\tilde t_{m}\right]\ \textrm{for some}\ m\in\left\{1,\ldots,N_{f}\right\}\right\}$$and $${\mathbb{I}}_{\textrm{out}}:=\left \{1,\ldots , N\right \}\setminus{\mathbb{I}}_{\textrm{in}}$$.
By monotonicity we have $$\sum _{n\in{\mathbb{I}}_{\textrm{in}}}\mathcal{E}^2_f(\,f,{ t_{n-1}},\tau _{n})\leqslant \sum _{n=1}^{N_{f}}\mathcal{E}^2_f(\,f,{ \tilde t_{n-1}},\tilde \tau _{n})\leqslant {\texttt{TOL}}_{f}^{2}/2$$ and thus the assertion follows from   \begin{align*} \sum_{n=1}^{N}\mathcal{E}^2_f(\,f,{t_{n-1}},\tau_{n})&= \sum_{n\in{\mathbb{I}}_{\textrm{in}}}\mathcal{E}^2_f(\,f,{ t_{n-1}},\tau_{n})+\sum_{n\in{\mathbb{I}}_{\textrm{out}}}\mathcal{E}^2_f(f,{ t_{n-1}},\tau_{n}) \\ &\leqslant \frac{{\texttt{TOL}}_{f}^{2}}2 + N_{f}{\texttt{tol}}_{f}^{2} = \frac{{\texttt{TOL}}_{f}^{2}}2 + N_{f}\frac{{\texttt{TOL}}_{f}^{2}}{2N_{f}}\leqslant {\texttt{TOL}}_{f}^{2}. \end{align*} Remark 4.5 (Estimation of $${{\texttt{tol}}_{f}}$$ under regularity assumptions). Supposing the regularity assumption $$f \in H^{s}\left ((0,T);L^{2}(\varOmega )\right )$$, s ∈ (0, 1], the following idea may be used as an alternative to the estimation of $${{\texttt{tol}}_{f}}$$ with TOLFIND. Let δ > 0. Then using Lemma 4.2 together with Poincaré’s inequality in Hs and the fact that there are at most $$\frac{T}{\delta }$$ disjoint intervals of length δ in (0, T], we obtain   \begin{align*} \sum_{n=1}^{N}\mathcal{E}^2_f(f,{t_{n-1}},\tau_{n})&= \sum_{\tau_{n}>\delta}\mathcal{E}^2_f(f,{ t_{n-1}},\tau_{n})+\sum_{\tau_{n}\leqslant\delta}\mathcal{E}^2_f(f,{ t_{n-1}},\tau_{n}) \\ &\leqslant \frac{T}{\delta}{\texttt{tol}}_{f}^{2} +\sum_{\tau_{n}\leqslant\delta} \tau_{n}^{2s} \Vert\,{f}\Vert_{H^{s}(t_{n-1},t_{n},L^{2}(\varOmega))}^{2} \\ &\leqslant \frac{T}{\delta}{\texttt{tol}}_{f}^{2} + \delta^{2s} \Vert\,{f}\Vert_{H^{s}(0,T,L^{2}(\varOmega))}^{2}.
\end{align*}By choosing $$\delta = \left (\frac{T \, {\texttt{tol}}_{f}^{2}}{ \Vert\,{f}\Vert_{H^{s}\left (0,T,L^{2}(\varOmega )\right )}^{2}}\right )^{\frac{1}{2s+1}}$$ the previous estimate turns into   $$\sum_{n=1}^{N}\mathcal{E}^2_f(\,f,{t_{n-1}},\tau_{n})\leqslant 2 T^{\frac{2s}{2s+1}} \, \Vert\,{f}\Vert_{H^{s}\left(0,T,L^{2}(\varOmega)\right)}^{\frac{2}{2s+1}} {\texttt{tol}}_{f}^{\frac{4s}{2s+1}}.$$ In other words, if a priori knowledge of the regularity of the right-hand side is available, then TOLFIND can be replaced by the somewhat simpler choice   $${\texttt{tol}}_{f}^{2} = 2^{-\frac{2s+1}{2s}}\,T^{-1} \Vert{\, f}\Vert_{H^{s}\left(0,T,L^{2}(\varOmega)\right)}^{-\frac{1}{s}} \,{\texttt{TOL}}_{f}^{\frac{2s+1}{s}}.$$ We turn to the module ST_ADAPTATION, listed in Algorithm 4, which handles a single time step. The module adapts the grid and the time-step size according to the indicators involving the discrete solution of the current time step, namely the space indicator $${\mathcal{E}_{\mathcal{G}}}$$ and the separated coarsening and time indicators $${\mathcal{E}_{c}}$$ and $${\mathcal{E}_{\tau }}$$. The routine requires, right at the start of each iteration, the computation of the discrete solution on the current grid with the current time-step size; see line 5. Note that in ST_ADAPTATION only refinements are performed (both in space and in time). Recalling the discussion in the introductory Section 1, we aim to use a thresholding algorithm for the indicators $${\mathcal{E}_{\tau }}$$ in order to equally distribute the time error. To this end we need to guarantee $$\mathcal{E}^2_{\star}\leqslant 0$$ in order to control the global time error with the help of the uniform energy estimate from Corollary 3.11.
Since there is no similar control available for either the space or the coarsening errors, we relate the corresponding indicators to the time and consistency indicators, i.e., we adapt the spatial triangulation until   \begin{align} \mathcal{E}_{c}^{2},\mathcal{E}_{\mathcal{G}}^{2}\leqslant \mathcal{E}_{\tau}^{2}+\mathcal{E}_{f}^{2}. \end{align} (4.3)Here we have invoked the consistency indicator $${\mathcal{E}_{f}}$$ on the right-hand side although it is controlled by CONSISTENCY outside ST_ADAPTATION; note that $${\mathcal{E}_{f}}$$ does not depend on the discrete solution. In fact, from the uniform energy estimate of Corollary 3.11 we have that $${\mathcal{E}_{\tau }}$$ vanishes one order faster than $${\mathcal{E}_{f}}$$ when no additional regularity of f is assumed. Consequently, in this case the time-step size is dictated by $${\mathcal{E}_{f}}$$, which may lead to $${\mathcal{E}_{\tau }}\ll{{\texttt{tol}}_{\mathcal{G}\tau }}$$. Thanks to Lemma 4.4 we expect that (4.3) leads to an equal distribution of the errors in time in most cases. However, the case $$\max \left \{{\mathcal{E}_{\tau }},{\mathcal{E}_{f}}\right \}\ll \min \left \{{{\texttt{tol}}_{\mathcal{G}\tau }},{{\texttt{tol}}_{f}}\right \}$$ cannot be ruled out theoretically; hence, we have supplemented (4.3) with the safeguard $$L^{\infty }$$ marking tolerance $$\tau{{\texttt{tol}}_{\mathcal{G}\tau }}$$; compare with lines 8 and 15 of ST_ADAPTATION.
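One plausible reading of the acceptance test behind (4.3) with the additive safeguard is sketched below; this is not the paper's Algorithm 4, and all numerical values are invented for illustration.

```python
# Sketch (under stated assumptions, not the paper's listing) of the test that
# decides whether a spatial or coarsening indicator is acceptable: it must
# drop below the combined time/consistency threshold plus the additive
# L-infinity safeguard tau * tol_Gtau.
def below_threshold(E2, E_tau2, E_f2, tau, tol_Gtau):
    return E2 <= E_tau2 + E_f2 + tau * tol_Gtau

# illustrative numbers only: threshold = 1e-4 + 4e-5 + 0.05*1e-3 = 1.9e-4
E_tau2, E_f2, tau, tol_Gtau = 1e-4, 4e-5, 0.05, 1e-3
accept = below_threshold(1.5e-4, E_tau2, E_f2, tau, tol_Gtau)
refine = not below_threshold(5e-4, E_tau2, E_f2, tau, tol_Gtau)
```

The safeguard term keeps the threshold strictly positive even when both $$\mathcal{E}_{\tau}$$ and $$\mathcal{E}_{f}$$ are negligible, so the spatial adaptation cannot be driven to refine indefinitely.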
In particular, in order not to waste tolerances we have implemented the safeguard additively thanks to   $$\max\left\{\mathcal{E}_{\tau}^{2}+\mathcal{E}_{f}^{2}, \tau{{\texttt{tol}}_{\mathcal{G}\tau}}\right\}\leqslant \mathcal{E}_{\tau}^{2}+\mathcal{E}_{f}^{2}+ \tau{{\texttt{tol}}_{\mathcal{G}\tau}}\leqslant 2 \max\left\{\mathcal{E}_{\tau}^{2}+\mathcal{E}_{f}^{2}, \tau{{\texttt{tol}}_{\mathcal{G}\tau}}\right\}\!.$$ Note that in the above discussion we have concentrated on an equal distribution in time and have tacitly assumed that in each time step the local space indicators are optimally distributed, which is motivated by the optimal convergence analysis for elliptic problems; compare, e.g., with the studies by Cascon et al. (2008), Diening et al. (2016) and Stevenson (2007). Remark 4.6 We note that the if conditions in lines 8 and 15 of ST_ADAPTATION may involve additional parameters. For instance, line 8 may be replaced by   $$8:\mathbf{else}\ \mathbf{if} \ \mathcal{E}^2_c (U_{t}^{-},\tau ,\mathcal{G})\geqslant \gamma _{c}\, \mathcal{E}^2_\tau(U_{t}^{+},U_{t}^{-},\tau ,\mathcal{G}) + \rho _{c}\mathcal{E}^2_f(f,t,\tau) +\sigma _{c} \tau{{\texttt{tol}}_{\mathcal{G}\tau }}\ \mathbf{then}$$with γc, ρc, σc > 0, and similarly for the space indicator $$\mathcal{E}_{\mathcal{G}}$$ in line 15 with constants $$\gamma _{\mathcal{G}},\rho _{\mathcal{G}}, \sigma _{\mathcal{G}}>0$$. This requires some modifications of TAFEM, which would make the presentation more technical. For the sake of clarity we refrain from spelling out these customisation possibilities; compare also with Remark 5.3. 4.3 The main module TAFEM We are now in a position to formulate TAFEM in Algorithm 5 below. In the initialisation phase the given tolerance $${{\texttt{TOL}}}>0$$ is split into tolerances $${{\texttt{TOL}}_{0}},{{\texttt{TOL}}_{f}},{{\texttt{TOL}}_{\mathcal{G}\tau }}>0$$. Next ADAPT_INIT provides a sufficiently good approximation $$U_{0}$$ of the initial datum u0.
The constant $$C_{T}$$ is computed in order to determine the right scaling of the tolerances in the module ST_ADAPTATION. In particular, the last term $$2T$$ in $$C_{T}$$ accounts for the $$L^{\infty }$$ safeguard marking, whereas the other terms result from (4.3) in combination with the estimate for $$\mathcal{E}_{\tau }$$ from Corollary 3.11; compare also with the proof of Theorem 5.2. Then the time-step iteration is entered, where each single time step consists of the following main steps. We first initialise the time-step size by CONSISTENCY and then conduct one coarsening step with COARSEN. The adaptation of the grid and time-step size with respect to the indicators for the spatial, temporal and coarsening errors is done by ST_ADAPTATION. 5. Convergence In this section we first prove that the core modules and TAFEM terminate and then verify that the estimators, and thus the error, are below the given tolerance. Throughout the section we suppose that the black-box modules satisfy Assumption 4.1. Before turning to the main module ST_ADAPTATION, as an auxiliary result, we shall consider convergence of the adaptive finite element method for stationary elliptic problems of the kind (3.13), which have to be solved in each time step. Proposition 4.7 (Convergence for the elliptic problem). Suppose that $$v^- \in L^{2}(\varOmega )$$, $$\bar f\in \mathbb{P}_s \left (L^{2}(\varOmega )\right )$$ and τ > 0. Then starting from any grid $$\mathcal{G}^{0}\in{\mathbb{G}}$$ we have for the sequence $$\{\mathcal{G}^{k},U_{\tau }^{k}\}_{k\geqslant 0}\subset{\mathbb{G}} \times \mathbb{P}_s ({\mathbb{V}})$$ generated by AFEM$$(v^{-},\bar f,t,\tau ,\mathcal{G}^{0})$$ that   $$\mathcal{E}_{\mathcal{G}}^{2}\left(U_\tau^k,v^-,t,\tau,\bar f,\mathcal{G}^k\right) \rightarrow 0\quad\textrm{as}\ k\to\infty.$$ Proof.
Recalling Remark 3.9 we have that $$\mathcal{E}_{\mathcal{G}}^{2}(U_{\tau }^{k},v^{-},t,\tau ,\bar f,\mathcal{G}^{k})$$ are the standard residual-based a posteriori error estimators for the coercive problem (3.13). From Lemmas 3.2 and 3.3 and Assumption 4.1 on MARK_REFINE we have that the conditions of Siebert (2011) are satisfied. This yields the assertion. Lemma 4.8 (Termination of ST_ADAPTATION). For any t ∈ (0, T), τin ∈ (0, T − t], $$\mathcal{G}_{\textrm{in}},\mathcal{G}_{\textrm{old}}\in{\mathbb{G}}$$ and $$U_{t}^{-}\in{\mathbb{V}}(\mathcal{G}_{\textrm{old}})$$ we have that   $$\left(U_I,\tau,\bar f,\mathcal{G}\right)=\texttt{ST_ADAPTATION}{\left(U_t^-,f,t,\tau^{\textrm{in}},\mathcal{G}_{\textrm{in}},\mathcal{G}_{\textrm{old}}, {{\texttt{tol}}_{\mathcal{G}\tau}}\right)}$$terminates. Moreover, we have $$\mathcal{G}\geqslant \mathcal{G}_{\textrm{in}}$$, $$\mathcal{E}^2_\star ({U_{t}^{+}, U_{t}^{-},\tau ,\mathcal{G}})\leqslant 0$$,   $$\tau^{\textrm{in}}\geqslant \tau \geqslant \min\left\{\tau^{\textrm{in}},\frac{\kappa\,{\texttt{tol}}_{\mathcal{G}\tau}^{2}}{ 6 \left( \Vert\,{f}\Vert_{\varOmega\times(t,\, t+\tau)}^{2} + {{|||} U_{t}^{-}{|||}}_{\varOmega}^{2} \right)}\right\}$$and the indicators satisfy the tolerances   \begin{align*} \mathcal{E}^2_\tau(U_{t}^{+},U_{t}^{-},\tau,\mathcal{G}) &\leqslant {\texttt{tol}}_{\mathcal{G}\tau}^{2}, \\ \mathcal{E}_{\mathcal{G}}^{2}\left(U_{I},U_{t}^{-},t,\tau,\bar f,\mathcal{G}\right)&\leqslant \mathcal{E}^2_\tau(U_{t}^{+},U_{t}^{-},\tau,\mathcal{G}) +\mathcal{E}^2_f(f,t,\tau) + \tau{{\texttt{tol}}_{\mathcal{G}\tau}}, \\ \mathcal{E}^2_c(U_{t}^{-},\tau,\mathcal{G})&\leqslant \mathcal{E}^2_\tau(U_{t}^{+},U_{t}^{-},\tau,\mathcal{G})+ \mathcal{E}^2_f(f,t,\tau) + \tau{{\texttt{tol}}_{\mathcal{G}\tau}}, \end{align*}where $$U_{t}^{+}=\lim _{r\searrow t}U_{(t,t+\tau ]}(r)$$. Proof. In each iteration of the loop in ST_ADAPTATION a discrete solution $$U_I$$ is first computed on the current grid $$\mathcal{G}$$ with the current time-step size τ.
Then either the time-step size is reduced or the current grid is refined. More precisely, exactly one of the statements labelled $$\fbox{\textsf{A}}$$,…,$$\fbox{\textsf{D}}$$ in Algorithm 4 is executed, any of them terminating by Assumption 4.1. Whenever one of these statements is executed the corresponding indicator is positive. In statement $$\fbox{\textsf{A}}$$ the grid is refined due to the coarsening indicator $${\mathcal{E}_{c}}$$. Thanks to Assumption 4.1 (6), after a finite number of executions of $$\fbox{\textsf{A}}$$, a grid $$\mathcal{G}$$ is obtained with $$\mathcal{G}_{\textrm{old}}\leqslant \mathcal{G}$$ and thus $$\mathcal{E}^2_c (U_{t}^{-},\tau,\mathcal{G})=0$$, i.e., statement $$\fbox{\textsf{A}}$$ is not entered anymore. This happens irrespective of refinements in other statements. In statement $$\fbox{\textsf{B}}$$ the grid is refined with respect to the indicators $$\mathcal{E}_{\star}$$ controlling the energy gain due to coarsening. Therefore, it follows by the same reasoning as for statement $$\fbox{\textsf{A}}$$ that statement $$\fbox{\textsf{B}}$$ is also executed only finitely many times, namely at most until the coarsening is fully removed. Assume now that statement $$\fbox{\textsf{C}}$$ is executed infinitely many times. Then, according to the above considerations, after a finite number of iterations statement $$\fbox{\textsf{B}}$$ is not executed anymore, i.e., (3.15) is always satisfied according to line 10. Consequently, the condition in line 12 and Corollary 3.11 imply   $$\frac{1}{\tau} \leqslant \frac{1}{\tau} 6\,\tau{{|||} U_{t}^{+}-\varPi_{\mathcal{G}} U_{t}^{-}{|||}}_{\varOmega}^{2} \frac{1}{{\texttt{tol}}_{\mathcal{G}\tau}^{2}} \leqslant \frac{1}{{\texttt{tol}}_{\mathcal{G}\tau}^{2}} 6\, \left( \Vert\,{f}\Vert_{\varOmega\times(t,t+\tau)}^{2} + {{|||} U^{-}_{t}{|||}}_{\varOmega}^{2} \right)\!,$$which contradicts the assumption that statement $$\fbox{\textsf{C}}$$ is executed infinitely many times.
The same argument also proves the asserted lower bound on the finally accepted time-step size. Assuming that ST_ADAPTATION does not terminate, we infer from the fact that all other statements are executed only finitely many times that statement $$\fbox{\textsf{D}}$$ has to be executed infinitely many times. In other words, the loop reduces to the adaptive iteration AFEM with fixed data $$U_{t}^{-}$$, $$\bar f$$, t and τ. Therefore, Proposition 4.7 contradicts the condition in line 15. In summary, we deduce that ST_ADAPTATION terminates and the iteration is abandoned in line 18. This proves the assertion. We next address the termination of the main module TAFEM . Proposition 5.1 (Termination of TAFEM). The adaptive algorithm TAFEM terminates for any initial time-step size τ0 > 0 and produces a finite number of time instances $$0=t_{0}<\dots <t_{N}=T$$. Moreover, we have $$\mathcal{E}^2_0({u_{0},\mathcal{G}_0})\leqslant {\texttt{TOL}}_{0}^{2}$$ and that the consistency error complies with (4.2). For all n = 1, … , N we have that the estimates in Lemma 4.8 are satisfied with t = tn−1, τ = τn, $$U_{I}=U_{|I_{n}}$$, $$U_{t}^{\pm }=U_{n-1}^{\pm }$$, $$\mathcal{G}=\mathcal{G}_{n}$$ and $$\mathcal{G}_{\textrm{old}}=\mathcal{G}_{n-1}$$. Proof. Each loop starts with setting the time-step size such that $$\tau _{n}\leqslant T-t_{n-1}$$, $$n\in{\mathbb{N}}$$. Thanks to Assumption 4.1 for the black-box modules, Lemma 4.2 for CONSISTENCY and Lemma 4.8 for ST_ADAPTATION, all modules of TAFEM terminate and in each time step the asserted properties are satisfied.
Since we have $$\mathcal{E}^2_\star({U_{n-1}^{+}, U_{n-1}^{-},\tau _{n},\mathcal{G}_{n}})\leqslant 0$$ for all n we may conclude $${{|||} U_{n-1}^{-}{|||}}_{\varOmega}^{2}\leqslant \Vert\, {f}\Vert_{\varOmega \times (0,T)}^{2}+ {{|||} U_0 {|||}}_{\varOmega}^{2}$$ from Corollary 3.11 and thus it follows with Lemma 4.8 that   $$\tau^{\textrm{in}}_{n}\geqslant \tau_{n} \geqslant \min\left\{\tau^{\textrm{in}}_{n},\frac{\kappa\,{\texttt{tol}}_{\mathcal{G}\tau}^{2}}{12 \left( \Vert{f}\Vert_{\varOmega\times(0,T)}^{2} + { {|||} U_{0}{|||}}_{\varOmega}^{2} \right)}\right\}\!,$$where $$\tau ^{\textrm{in}}_{n}=\texttt{CONSISTENCY}(\,{f,t_{n-1},\tau _{n-1}}, {{\texttt{tol}}_{f}})$$. If the final time were not reached, then τn → 0 as $$n\to \infty$$ and therefore there exists $$n_{0}\in{\mathbb{N}}$$ such that τn = τnin for all n ⩾ n0. Now the contradiction follows as in step (1) of the proof of Lemma 4.4. Collecting the results derived above allows us to prove the main result. Theorem 5.2 (Convergence into tolerance). Algorithm TAFEM computes for any prescribed tolerance $${{\texttt{TOL}}}>0$$ and initial time-step size τ0 > 0 a partition $$0=t_{0}<t_{1}<\dots <t_{N}=T$$ with associated meshes $$\{\mathcal{G}_n\}_{n=0,\dots ,N}$$, such that we have for the corresponding approximation $${\mathcal{U}}\in \textrm{W}$$ from (2.7) that   $$\Vert{u-{\mathcal{U}}}\Vert_{\mathbb{W}(0,T)} \leqslant{{\texttt{TOL}}}.$$ Proof. Thanks to Proposition 5.1 we have that TAFEM terminates and it remains to prove the error bound.
For the sake of brevity of the presentation we shall use the abbreviations   \begin{alignat*}{2} \mathcal{E}^2_{\tau}(n)&:= \mathcal{E}^2_{\tau}({U_{n-1}^{+},U_{n-1}^{-}, \tau_{n},\mathcal{G}_{n}}),&\qquad \mathcal{E}^2_f(n)&:= \mathcal{E}^2_f(\,f,t_{n-1},\tau_{n}), \\ \mathcal{E}_{\mathcal{G}}^{2}(n)&:=\mathcal{E}_{\mathcal{G}}^{2}({U,U_{n-1}^{-},t_{n-1},\tau_{n}, f_{n},\mathcal{G}_n}) &\quad\textrm{and}\quad \mathcal{E}^2_c (n)&:= \mathcal{E}^2_c ({U_{n-1}^{-},\tau_{n}, \mathcal{G}_n}). \end{alignat*} The initial error satisfies $$\mathcal{E}^2_0({u_{0},\mathcal{G}_0})\leqslant {\texttt{TOL}}_{0}^{2}$$ by Assumption 4.1. Thanks to the choice of the precomputed local tolerance $${{\texttt{tol}}_{f}}$$ we know from Lemma 4.4 that the consistency error is bounded by $${{\texttt{TOL}}_{f}}$$, i.e., we have (4.2). When finalising a time step, we also have from Proposition 5.1 that   $$\mathcal{E}^2_{\tau}(n) \leqslant {\texttt{tol}}_{\mathcal{G}\tau}^{2} \qquad \textrm{and} \qquad \mathcal{E}_{\mathcal{G}}^{2}(n), \mathcal{E}^2_c (n)\leqslant \mathcal{E}^2_\tau(n)+\mathcal{E}^2_f(n) +\tau_{n} {{\texttt{tol}}_{\mathcal{G}\tau}},$$with $${{\texttt{tol}}_{\mathcal{G}\tau }}={\texttt{TOL}}_{\mathcal{G}\tau }^{2}/C_{T}$$. Combining this with (3.16a) and (4.2) we conclude that  \begin{align*} \sum_{n=1}^{N}\mathcal{E}_{\mathcal{G}}^{2}(n) +\,\mathcal{E}_{c\tau}^{2}\left(U_{n-1}^{-},U_{n-1}^{+},\tau_{n},\mathcal{G}_{n}\right) &\leqslant \sum_{n=1}^{N}\mathcal{E}_{\mathcal{G}}^{2}(n) +\mathcal{E}^2_c (n) +\mathcal{E}^2_{\tau}(n) \\ &\leqslant \sum_{n=1}^{N} 2\,\tau_{n} {{\texttt{tol}}_{\mathcal{G}\tau}}+2\,\mathcal{E}^2_f(n)+ 3\, \mathcal{E}^2_{\tau}(n) \\ &\leqslant 2 T \, {{\texttt{tol}}_{\mathcal{G}\tau}} + 2\,{\texttt{TOL}}_{f}^{2}+3 \sum_{n=1}^{N} \mathcal{E}^2_{\tau}(n).
\end{align*}Using Corollary 3.11 for the last term we get for any δ > 0 that   \begin{align*} \sum_{n=1}^{N} \mathcal{E}^2_{\tau}(n) &= \sum_{\tau_{n}>\delta} \mathcal{E}^2_{\tau}(n) + \sum_{\tau_{n}\leqslant\delta} \mathcal{E}^2_{\tau}(n) \\ & \leqslant \frac{T}{\delta} {\texttt{tol}}_{\mathcal{G}\tau}^{2} + \delta \sum_{n=1}^{N} 6\, C_{\tau}\,{{|||}U_{n-1}^{+} -\varPi_{\mathcal{G}_{n}}U_{n-1}^{-}{|||}}_{\varOmega}^{2} \\ &\leqslant \frac{T}{\delta} {\texttt{tol}}_{\mathcal{G}\tau}^{2} + 6 \,C_{\tau}\,\delta \left( \Vert\,{f}\Vert_{\varOmega\times(0,T)}^{2} + {{|||} U_{0}{|||}}_{\varOmega}^{2} \right) \end{align*} and by choosing   $$\delta = \left( \frac{T}{ 6 \,C_{\tau} \left( \Vert\,{f}\Vert_{\varOmega\times(0,T)}^{2} + {{|||} U_{0}{|||}}_{\varOmega}^{2} \right)}\right)^{\frac{1}{2}} {{\texttt{tol}}_{\mathcal{G}\tau}},$$ we obtain   $$\sum_{n=1}^{N} \mathcal{E}^2_{\tau}(n) \leqslant 2\left( 6\, C_{\tau} \,T \left( \Vert\,{f}\Vert_{\varOmega\times(0,T)}^{2} + {{|||} U_{0}{|||}}_{\varOmega}^{2} \right) \right)^{\frac12} {{\texttt{tol}}_{\mathcal{G}\tau}} .$$ Inserting this into the above estimate yields   \begin{align*} \sum_{n=1}^{N}\mathcal{E}_{\mathcal{G}}^{2}(n) +\mathcal{E}_{c\tau}^{2}\left({U_{n-1}^{+}},{U_{n-1}^{-},\tau_{n}}\right) &\leqslant \underbrace{\left( 6\,\sqrt{6 C_{\tau}\,T} \left( \Vert\,{f}\Vert_{\varOmega\times(0,T)}^{2} + {{|||} U_{0}{|||}}_{\varOmega}^{2} \right)^{\frac12}+ 2 T \right)}_{=C_{T}}\, {{\texttt{tol}}_{\mathcal{G}\tau}}+2\,{\texttt{TOL}}_{f}^{2} \\ &\leqslant{\texttt{TOL}}_{\mathcal{G}\tau}^{2}+2\,{\texttt{TOL}}_{f}^{2}. \end{align*}

Collecting the bounds for the indicators $${\mathcal{E}_{0}}$$, $${\mathcal{E}_{\mathcal{G}}}$$, $${\mathcal{E}_{c\tau }}$$ and $${\mathcal{E}_{f}}$$, recalling the splitting   $${\texttt{TOL}}_0^2 + 3\,{\texttt{TOL}}_f^2+ {\texttt{TOL}}_{\mathcal{G}\tau}^2 = {{\texttt{TOL}}}^2$$ and taking into account the upper bound of Proposition 3.4 proves the assertion.
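The choice of δ in the proof above is the standard balancing of a decreasing and an increasing term: for constants $$a,b>0$$ one has   $$\min_{\delta>0}\left(\frac{a}{\delta}+b\,\delta\right)=2\sqrt{ab},\qquad \textrm{attained at}\quad \delta=\sqrt{\frac{a}{b}}.$$ Applied with $$a=T\,{\texttt{tol}}_{\mathcal{G}\tau}^{2}$$ and $$b=6\,C_{\tau}\big(\Vert f\Vert_{\varOmega\times(0,T)}^{2}+{{|||}U_{0}{|||}}_{\varOmega}^{2}\big)$$, this yields exactly the stated value of δ and the bound $$2\sqrt{ab}=2\big(6\,C_{\tau}\,T\big(\Vert f\Vert_{\varOmega\times(0,T)}^{2}+{{|||}U_{0}{|||}}_{\varOmega}^{2}\big)\big)^{1/2}\,{\texttt{tol}}_{\mathcal{G}\tau}$$.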
Remark 5.3 In order to guarantee the main result (Theorem 5.2) also for the modifications of Remark 4.6, line 5 of TAFEM must be changed to: compute $$C_{T}:= \left (1+\gamma _{c}+\gamma _{\mathcal{G}}\right )\,2\,\sqrt{6 \,C_{\tau } \, T} \left ( \Vert\,{f}\Vert_{\varOmega \times (0,T)}^{2} + {{|||} U_{0}^{-}{|||}}_{\varOmega}^{2} \right )^{\frac 12} + \left (\sigma _{c}+\sigma _{\mathcal{G}}\right )\,T$$. Moreover, the splitting of the tolerances in line 2 must be changed to: split the tolerance $${{\texttt{TOL}}}>0$$ such that $${\texttt{TOL}}_{0}^{2}+(1+\rho _{\mathcal{G}}+\rho _{c}){\texttt{TOL}}_{f}^{2}+{\texttt{TOL}}_{\mathcal{G}\tau }^{2}={\texttt{TOL}}^{2}$$.

6. Numerical aspects and experiments

We conclude the article by illustrating some practical aspects of the implementation with three numerical experiments. We compare the presented algorithm TAFEM with the algorithm ASTFEM introduced in the study by Kreuzer et al. (2012).

6.1 The implementation

The experiments are implemented in DUNE (Blatt et al., 2016) using the DUNE-ACFEM module (https://www.dune-project.org/modules/dune-acfem/). The computations use linear finite elements in space and the dG(0) time-stepping scheme. All simulations were performed on an AMD Opteron 7274 processor with 128 GB RAM. Both algorithms TAFEM and ASTFEM start from exactly the same initial mesh $${\mathcal{G}_{\textrm{init}}}$$. The initial values are interpolated on the mesh and local refinements are performed in order to comply with the initial tolerance. On the resulting meshes the required constants are computed (the minimal time-step size τ* for ASTFEM and $${{\texttt{tol}}_{f}}$$ from TOLFIND for TAFEM). In order to control $$\mathcal{E}_{c}$$ and $$\mathcal{E}_{*}$$, the algorithms need to handle two meshes and corresponding finite element spaces at every new time step. This is realised by exploiting the tree structure of the refinements of macroelements as in the study by Kreuzer et al. (2012).
At every new time step the time-step size is adjusted by the module CONSISTENCY, and all elements of the mesh are marked to be coarsened up to two times and then adapted again if necessary. The estimators are computed only up to constants and used to drive the adaptive refinement. The spatial marking relies on the equidistribution strategy, which marks every element whose estimator is bigger than the arithmetic mean. The following remark lists the tolerance splitting used by ASTFEM.

Remark 6.1 In the study by Kreuzer et al. (2012), ASTFEM uses the tolerance splitting   $${{\texttt{TOL}}}^{2}={{{\texttt{TOL}}}}_{0}^{2}+T\widetilde{{{\texttt{TOL}}}}_{f}^{2}+T\widetilde{{{\texttt{TOL}}}}_{\mathcal{G}\tau}^{2}+\widetilde{{{\texttt{TOL}}}}_{*}^{2}.$$ Thereby, $$\widetilde{{{\texttt{TOL}}}}_{*}^{2}$$ is used to compute a minimal safeguard step size τ*. The method then computes an approximation $$\mathcal{U}\in \textrm{W}$$ to (2.2) such that   $$\mathcal{E}^2_0(u_{0},\mathcal{G}_{0})\leqslant{{\texttt{TOL}}}_{0}^{2},\qquad \sum_{n=1}^{N}\mathcal{E}^2_f(\,f,t_{n-1},\tau_{n})\leqslant T\widetilde{{{\texttt{TOL}}}}_{f}^{2}$$ and   $$\sum_{n=1}^{N}\left\{ \mathcal{E}^2_{\tau}(U_{n-1}^{+},U_{n-1}^{-},\tau_{n}) +\mathcal{E}^2_{\mathcal{G}}(U,U_{n-1}^{-},t_n,\tau_{n},f_{n},\mathcal{G}_{n})\right\} \leqslant T\widetilde{{{\texttt{TOL}}}}_{\mathcal{G}\tau}^{2}+\widetilde{{{\texttt{TOL}}}}_{*}^{2}.$$ This motivates the relations   $$T\widetilde{{{\texttt{TOL}}}}_{f}^{2}=3\,{{{\texttt{TOL}}}}_{f}^{2}\quad \textrm{and}\quad T\widetilde{{{\texttt{TOL}}}}_{\mathcal{G}\tau}^{2}+\widetilde{{{\texttt{TOL}}}}_{*}^{2} ={{\texttt{TOL}}}_{\mathcal{G}\tau}^{2}$$ in the examples below.
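The equidistribution marking strategy mentioned above can be sketched in a few lines. The function name `equidistribution_mark` and the plain-list interface are hypothetical, for illustration only; they are not taken from the paper's implementation:

```python
def equidistribution_mark(indicators):
    """Mark every element whose local error indicator exceeds the
    arithmetic mean of all indicators (equidistribution strategy)."""
    mean = sum(indicators) / len(indicators)
    return [i for i, eta in enumerate(indicators) if eta > mean]

# Elements with above-average indicators are selected for refinement.
marked = equidistribution_mark([0.1, 0.5, 0.2, 0.9, 0.05])  # -> [1, 3]
```

In contrast to Dörfler marking, this strategy needs no marking parameter; the set of marked elements shrinks automatically as the indicators become more evenly distributed.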
For the simulations presented below we have used the following comparable splittings for the two methods ASTFEM and TAFEM relative to the total tolerance $${{\texttt{TOL}}}$$: $${{\texttt{TOL}}}_{0}^{2}=\frac 1{10}{\texttt{TOL}}^{2}$$, $${{\texttt{TOL}}}_{f}^{2}=T\widetilde{{{\texttt{TOL}}}}^{2}_{f} =\frac 4{10} {{\texttt{TOL}}}^{2}$$, $${{\texttt{TOL}}}_{\mathcal{G}\tau }^{2}=T\widetilde{{{\texttt{TOL}}}}_{\mathcal{G}\tau }^{2}+\widetilde{{{\texttt{TOL}}}}_{*}^{2} =\frac 6{10} {{\texttt{TOL}}}^{2}$$ and $$\widetilde{{{\texttt{TOL}}}}_{*}^{2} =\frac 1{100} {{\texttt{TOL}}}^{2}$$.

6.2 The experiments

In this section we introduce the three numerical experiments in detail and discuss the numerical results.

6.2.1 Singularity in time

This numerical experiment is posed on the spatial domain $$\varOmega =(0,1)^{2}\subset{\mathbb{R}}^{2}$$ over the time interval (0, T) = (0, 2) with homogeneous Dirichlet boundary conditions and homogeneous initial data. The right-hand side f is chosen such that the exact solution is given by   $$u(\boldsymbol{x},t) = |t-\bar{t}|^{\alpha}\sin\big(\pi\big(x^{2}-x\big)t\big)\sin\big(\pi\big(y^{2}-y\big)t\big)$$ with parameters $$\bar{t}=\frac{\pi }{3}$$ and α = 0.7. The graph of u has a singularity in time at $$t=\bar{t}$$. Hence, the right-hand side contains the term $$\textrm{sgn}(t-\bar{t})\,\alpha |t-\bar{t}|^{\alpha -1}$$. A direct calculation shows that this term is square-integrable in time but does not belong to $$H^{1}$$ with respect to time. This particular example shows one main advantage of TAFEM: in contrast to ASTFEM, TAFEM does not require the right-hand side f to have a temporal derivative in L2 in order to control the consistency error $${\mathcal{E}_{f}}$$. As in the example of Section 6.2.2, the threshold-like version of ENLARGE from Remark 4.3 is used for the computations, with enlargement constant κ2 = 4.
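The regularity claim for the term $$\textrm{sgn}(t-\bar{t})\,\alpha|t-\bar{t}|^{\alpha-1}$$ can be checked directly from the exponents:   $$\int_{0}^{2}\big(\alpha\,|t-\bar{t}|^{\alpha-1}\big)^{2}\,\mathrm{d}t<\infty \iff 2(\alpha-1)>-1 \iff \alpha>\tfrac12,$$ which holds for α = 0.7. On the other hand, its formal time derivative scales like $$|t-\bar{t}|^{\alpha-2}$$, and $$2(\alpha-2)=-2.6<-1$$, so the derivative is not square-integrable near $$\bar{t}$$; hence the term lies in $$L^{2}$$ but not in $$H^{1}$$ with respect to time.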
However, test runs with the fixed enlargement, i.e., $$\min \{\kappa _{2}\tau ,T-t\}=\texttt{ENLARGE}(\,f,t,\tau ,{{\texttt{tol}}_{f}})$$, produce nearly the same results.

Fig. 1. DoFs and time-step sizes for the singularity in time problem.

Fig. 2. The local error estimators $${\mathcal{E}_{\tau }}^{2}+{\mathcal{E}_{\mathcal{G}}}^{2}+{\mathcal{E}_{c}}^{2}+{\mathcal{E}_{f}}^{2}$$ for TAFEM and ASTFEM, as well as the sum of local $$L^{\infty }$$ indicators $$\frac 1\tau \left ({\mathcal{E}_{\tau }}^{2}+{\mathcal{E}_{\mathcal{G}}}^{2}+{\mathcal{E}_{c}}^{2}+{\mathcal{E}_{f}}^{2}\right )$$ used by ASTFEM for the singularity in time problem.

ASTFEM was killed after time step 321, which used 27 513 668 DoFs and a time-step size of 7.3172e−10. As can be observed from Fig. 1, ASTFEM massively refines in time and space. It was killed before reaching the singularity at $$\bar{t}=\frac{\pi }{3}$$, having accumulated a total number of 1 070 889 132 DoFs. The reason for this behaviour lies in the $$L^{\infty }$$ marking. Indeed, Fig. 2 shows that ASTFEM equally distributes the $$L^{\infty }$$ indicators, thereby leading to very small local errors, which cause the strong spatial refinement.
Note that the minimal step size τ* in ASTFEM applies only when temporal refinement is performed due to the time indicator $${\mathcal{E}_{\tau }}$$; i.e., time-step sizes below the threshold τ* can be chosen when required by the consistency estimator $$\mathcal{E}_{f}$$, which is the case close to the singularity. Consequently, the behaviour of ASTFEM cannot essentially be improved by a different choice of $${{\texttt{TOL}}}_{*}$$. In contrast, the local estimators in TAFEM appear to be quite equally distributed. It uses slightly larger time steps and far fewer DoFs close to the singularity; compare Table 1. It completely outperforms ASTFEM and reaches the final time with a total of 2 481 808 DoFs in 575 time steps.

Table 1. Time steps and DoFs of ASTFEM and TAFEM for the singularity in time problem

| Time | Time step ASTFEM | DoFs ASTFEM | Time step TAFEM | DoFs TAFEM |
| --- | --- | --- | --- | --- |
| 1.0 | 0.00613614 | 97 | 0.00353591 | 221 |
| 1.02 | 0.00433893 | 85 | 0.00250028 | 221 |
| 1.03 | 0.0030681 | 97 | 0.0025003 | 105 |
| 1.04 | 0.00153406 | 97 | 0.00125015 | 181 |
| 1.042 | 0.00108475 | 97 | 0.00125016 | 128 |
| 1.044 | 0.00108475 | 97 | 0.000883996 | 128 |
| 1.047 | 3.3899e−05 | 713 | 0.0001105 | 478 |
| 1.0471 | 1.19852e−05 | 2073 | 3.90677e−05 | 1136 |
| 1.0472 | 9.36409e−08 | 27 513 668 | 4.76905e−09 | 224 914 |
| 1.0473 | — | — | 5.5251e−05 | 759 |

6.2.2 Jumping singularity

Inspired by Morin et al. (2000, example 5.3), we construct an experiment where the solution has a strong spatial singularity that changes its position in time. In the domain Ω × (0, 4], with Ω = (0, 3) × (0, 3), we define the elliptic operator $$\mathcal{L}u = -\textrm{div}\,\mathbf{A}\nabla u$$, where   $${\mathbf{A}}(t,x) = \begin{cases} a_{1}\mathbb{I}\qquad & \textrm{if } (x-x_{i})(y-y_{i})\geqslant0,\\ a_{2}\mathbb{I}\qquad & \textrm{if } (x-x_{i})(y-y_{i})<0, \end{cases}$$ with a1 = 161.4476387975881, a2 = 1, i = ⌈t⌉, (x1, y1) = (1, 2), (x2, y2) = (1, 1), (x3, y3) = (2, 1) and (x4, y4) = (2, 2). This operator will ‘move’ the singularity through the points (xi, yi).
Let u be the function   $$u(x,t) = \sum_{i=1}^{4} s_{i}(t)\ r_{i}^{\gamma}\ \mu(\theta_{i}),$$ where   $$s_{i}(t) = \begin{cases} (t-(i-1))^{2}(t-i)^{2}\quad &\textrm{if } i-1\leqslant t \leqslant i,\\ 0\qquad & \textrm{otherwise}, \end{cases}$$ and   $$\mu(\theta) = \begin{cases} \cos\left(\left(\frac{\pi}{2}-\sigma\right)\gamma\right)\cos\left(\left(\theta-\frac{\pi}{2}+\rho\right)\gamma\right) \quad& \textrm{if } 0\leqslant\theta<\frac{1}{2}\pi,\\ \cos(\rho\gamma)\cos((\theta-\pi+\sigma)\gamma) & \textrm{if } \frac{1}{2}\pi\leqslant\theta<\pi,\\ \cos(\sigma\gamma)\cos((\theta-\pi-\rho)\gamma) & \textrm{if } \pi\leqslant\theta<\frac{3}{2}\pi,\\ \cos\left(\left(\frac{\pi}{2}-\rho\right)\gamma\right)\cos\left(\left(\theta-\frac{3\pi}{2}-\sigma\right)\gamma\right) & \textrm{if } \frac{3}{2}\pi\leqslant\theta<2\pi, \end{cases}$$ with γ = 0.1, $$\rho =\frac{\pi }{4}$$, σ = −14.92256510455152, $$x-x_{i}=r_{i}\cos (\theta _{i})$$ and $$y-y_{i}=r_{i}\sin (\theta _{i})$$. It is easy to check that u satisfies   $$\partial_{t} u(x,t) +\mathcal{L} u(x,t) = \sum_{i=1}^{4} r_{i}^{\gamma}\, \mu(\theta_{i})\ \partial_{t} s_{i}(t).$$

Based on the ideas presented in Remark 6.1 we compare TAFEM and ASTFEM with the same tolerance $${{\texttt{TOL}}}=0.007$$. ASTFEM makes excessive use of the nonstandard exit, i.e., the time-step size equals the minimal time-step size τ* = 0.0123477 for 231 of a total of 302 time steps, and it uses a total of 893 771 DoFs. The $$L^{2}\left (0,4;H^{1}(\varOmega )\right )$$ error is 0.0546689, the $$L^{2}\left (0,4;L^{2}(\varOmega )\right )$$ error is 0.0355061 and the total computation time was 628.398 seconds. TAFEM uses a total of 508 352 DoFs in 228 time steps. The $$L^{2}\left (0,4;H^{1}(\varOmega )\right )$$ error is 0.0563979, the $$L^{2}\left (0,4;L^{2}(\varOmega )\right )$$ error is 0.0367746 and the total computation time was 646.126 seconds (including TOLFIND). Fig. 3 shows that TAFEM makes more use of the spatial and temporal adaptivity and achieves a similar result with slightly less effort.
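The temporal weights $$s_{i}$$ can be sketched as follows; the helper names `s` and `ds` are hypothetical and serve only to illustrate why the right-hand side is continuous in time:

```python
def s(i, t):
    """Temporal weight s_i(t) = (t - (i-1))^2 (t - i)^2 on [i-1, i], zero otherwise."""
    if i - 1 <= t <= i:
        return (t - (i - 1)) ** 2 * (t - i) ** 2
    return 0.0

def ds(i, t):
    """Time derivative of s_i on [i-1, i]; it vanishes at both endpoints
    t = i-1 and t = i, so the right-hand side sum over i of
    r_i^gamma * mu(theta_i) * s_i'(t) hands over continuously at integer times."""
    if i - 1 <= t <= i:
        return 2 * (t - (i - 1)) * (t - i) ** 2 + 2 * (t - (i - 1)) ** 2 * (t - i)
    return 0.0

print(s(2, 1.5))            # -> 0.0625, the maximum of s_2 at the midpoint
print(ds(1, 1.0), ds(2, 2.0))  # -> 0.0 0.0, the derivative vanishes at the knots
```

In particular, each weight is C1 across the knots, so the singularity moves from one point (xi, yi) to the next without a temporal jump in the data.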
The adaptive meshes generated by TAFEM are displayed in Fig. 4. We see that the spatial adaptivity captures the position of the singularity by local refinement and coarsens the region once the singularity has passed.

Fig. 3. DoFs and time-step sizes for the jumping singularity problem.

Fig. 4. Adaptive grids for the jumping singularity problem.

Fig. 5. The local time indicator $$\mathcal{E}_{\tau}^{2}$$ for TAFEM and ASTFEM, as well as the local $$L^{\infty }$$ indicators $$\frac 1\tau \mathcal{E}_{\tau}^{2}$$ used by ASTFEM for the rough initial data problem.

Fig. 6. DoFs and time-step sizes for the rough initial data problem.

Fig. 7. Adapted meshes generated with ASTFEM (a−d) and TAFEM (e−h) for the rough initial data problem.

The advantages of TAFEM come fully into their own in the presence of singularities in time (see Section 6.2.1). For problems that are regular in time, TAFEM is expected to perform similarly to ASTFEM, up to the disadvantage that, at the beginning, the module TOLFIND needs several adaptive iterations over the time span, whereas the computation of the minimal time-step size in ASTFEM iterates only once over the time interval.
This is reflected in the comparable computing times for the jumping singularity problem.

6.2.3 Rough initial data

We conclude with an example inspired by numerical experiment 5.3.2 in the study by Kreuzer et al. (2012), with homogeneous Dirichlet boundary conditions and homogeneous right-hand side f ≡ 0. As initial data we choose a checkerboard pattern over $$\varOmega=(0,1)^{2}$$, where u0 ≡ −1 on $$\varOmega _{1}=\left (\frac{1}{3},\frac{2}{3}\right )\times \left ( \left (0,\frac{1}{3}\right )\cup \left (\frac{2}{3},1\right ) \right )\ \cup \ \left ( \left (0,\frac{1}{3}\right )\cup \left (\frac{2}{3},1\right ) \right ) \times \left (\frac{1}{3},\frac{2}{3}\right )$$, u0 ≡ 1 on Ω ∖ Ω1 and u0 ≡ 0 on ∂Ω. Starting with an initial mesh with only 5 DoFs, the approximation of u0 uses Lagrange interpolation and refines the mesh until $$\|U_{0}-u_{0}\|_{\varOmega }^{2}\leqslant {\texttt{TOL}}_{0}^{2} = 10^{-2}$$ is fulfilled. For this example we choose $$\min \left \{\kappa _{2}\tau ,T-t\right \}=\texttt{ENLARGE}\left (f,t,\tau ,{{\texttt{tol}}_{f}}\right )$$. In fact, since f ≡ 0, using the threshold-like version of ENLARGE from Remark 4.3 would yield $$T-t=\texttt{ENLARGE}\left (f,t,\tau ,{{\texttt{tol}}_{f}}\right )$$, and then in ST_ADAPTATION the time step must be reduced according to $${\mathcal{E}_{\tau}}$$ again, requiring the solution of an expensive linear system in each iteration. This slows down TAFEM by a factor of about 10. Starting ASTFEM and TAFEM with a tolerance of $${{\texttt{TOL}}}=10^{-1}$$ and running to the final time T = 1 we get the following results: ASTFEM needs 811 time steps and a total of 436 199 377 DoFs, with an estimated total error of 0.0230905 and a total computation time of 144 114 seconds.
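The checkerboard initial data can be sketched as below. The function `u0` and the middle-third helper are hypothetical illustrations; the set Ω1 consists exactly of the open squares in which one coordinate lies in the middle third and the other does not, and the gridlines between the thirds (a null set) are here assigned to the +1 region:

```python
def u0(x, y):
    """Checkerboard initial data on (0,1)^2: -1 on Omega_1 (exactly one
    coordinate in the open middle third), +1 on the rest of the interior,
    0 on the boundary of the unit square."""
    if x == 0.0 or x == 1.0 or y == 0.0 or y == 1.0:
        return 0.0

    def middle(c):
        # True if the coordinate lies in the open middle third (1/3, 2/3)
        return 1.0 / 3.0 < c < 2.0 / 3.0

    return -1.0 if middle(x) != middle(y) else 1.0

print(u0(0.5, 0.1), u0(0.5, 0.5))  # -> -1.0 on Omega_1, 1.0 on the centre square
```

The jumps of u0 across the internal gridlines are what forces the strong initial refinement reported below.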
ASTFEM makes use of the nonstandard exit for the first 270 time steps, with a minimal time-step size of τ* = 7.77573e−7; the small time steps at the beginning are also accompanied by extreme spatial refinements, contributing to the large total number of DoFs. This is due to the $$L^{\infty }$$ strategy, which aims to equally distribute the time indicators $$\frac 1\tau \mathcal{E}_{\tau}^{2}$$ rather than $$\mathcal{E}_{\tau}^{2}$$. In order to highlight this effect close to the initial time we have used a log scale for the time axis in Figs. 5 and 6. TAFEM needs only 81 time steps and a total of 672 159 DoFs, resulting in an estimated total error of 0.0574711. It is about 725 times faster, with a total computation time of 198.682 seconds (including TOLFIND). TAFEM refines the mesh initially and then almost steadily coarsens in time and space (see Figs. 6 and 7 (e−h)). Figure 5 shows that the time indicators $$\mathcal{E}_{\tau}^{2}$$ are nearly equally distributed. Both algorithms reduce the spatial resolution once the singular behaviour of the solution has decayed; see Figs. 6 and 7.

Funding

Deutsche Forschungsgemeinschaft (DFG) project SI 814/7-1.

References

Bänsch, E. (1991) Local mesh refinement in 2 and 3 dimensions. IMPACT Comput. Sci. Eng., 3, 181–191.
Binev, P., Dahmen, W. & DeVore, R. A. (2004) Adaptive finite element methods with convergence rates. Numer. Math., 97, 219–268.
Binev, P., Dahmen, W., DeVore, R. A. & Petrushev, P. (2002) Approximation classes for adaptive methods. Serdica Math. J., 28, 391–416.
Blatt, M., Burchardt, A., Dedner, A., Engwer, C., Fahlke, J., Flemisch, B., Gersbacher, C., Gräser, C., Gruber, F., Grüninger, C., Kempf, D., Klöfkorn, R., Malkmus, T., Müthing, S., Nolte, M., Piatkowski, M. & Sander, O. (2016) The distributed and unified numerics environment (DUNE). Arch. Numerical Software, 100, 13–29.
Cascon, J. M., Kreuzer, C., Nochetto, R. H. & Siebert, K. G. (2008) Quasi-optimal convergence rate for an adaptive finite element method. SIAM J. Numer. Anal., 46, 2524–2550.
Chen, Z. & Feng, J. (2004) An adaptive finite element algorithm with reliable and efficient error control for linear parabolic problems. Math. Comp., 73, 1167–1193.
Clément, P. (1975) Approximation by finite element functions using local regularization. Rev. Française Automat. Informat. Recherche Opérationnelle Sér. RAIRO Analyse Numérique, 9, 77–84.
Davis, P. J. & Rabinowitz, P. (1984) Methods of Numerical Integration, 2nd edn. Computer Science and Applied Mathematics. Orlando, FL: Academic Press.
Diening, L. & Kreuzer, C. (2008) Convergence of an adaptive finite element method for the p-Laplacian equation. SIAM J. Numer. Anal., 46, 614–638.
Diening, L., Kreuzer, C. & Stevenson, R. (2016) Instance optimality of the adaptive maximum strategy. Found. Comput. Math., 16, 33–68.
Dörfler, W. (1996) A convergent adaptive algorithm for Poisson's equation. SIAM J. Numer. Anal., 33, 1106–1124.
Eriksson, K. & Johnson, C. (1991) Adaptive finite element methods for parabolic problems I: a linear model problem. SIAM J. Numer. Anal., 28, 43–77.
Eriksson, K. & Johnson, C. (1995) Adaptive finite element methods for parabolic problems II: optimal error estimates in $$L_\infty L_2$$ and $$L_\infty L_\infty$$. SIAM J. Numer. Anal., 32, 706–740.
Ern, A., Smears, I. & Vohralík, M. (2017) Guaranteed, locally space-time efficient, and polynomial-degree robust a posteriori error estimates for high-order discretizations of parabolic problems. SIAM J. Numer. Anal., 55, 2811–2834.
Evans, L. C. (2010) Partial Differential Equations, 2nd edn. Graduate Studies in Mathematics, vol. 19. Providence, RI: American Mathematical Society.
Gaspoz, F. D. & Morin, P. (2014) Approximation classes for adaptive higher order finite element approximation. Math. Comp., 83, 2127–2160.
Gilbarg, D. & Trudinger, N. S. (2001) Elliptic Partial Differential Equations of Second Order. Classics in Mathematics. Berlin: Springer.
Kossaczký, I. (1994) A recursive approach to local mesh refinement in two and three dimensions. J. Comput. Appl. Math., 55, 275–288.
Kreuzer, C., Möller, C. A., Schmidt, A. & Siebert, K. G. (2012) Design and convergence analysis for an adaptive discretization of the heat equation. IMA J. Numer. Anal., 32, 1375–1403.
Kreuzer, C. & Siebert, K. G. (2011) Decay rates of adaptive finite elements with Dörfler marking. Numer. Math., 117, 679–716.
Lakkis, O. & Makridakis, C. (2006) Elliptic reconstruction and a posteriori error estimates for fully discrete linear parabolic problems. Math. Comp., 75, 1627–1658.
Makridakis, C. & Nochetto, R. H. (2003) Elliptic reconstruction and a posteriori error estimates for parabolic problems. SIAM J. Numer. Anal., 41, 1585–1594.
Makridakis, C. & Nochetto, R. H. (2006) A posteriori error analysis for higher order dissipative methods for evolution problems. Numer. Math., 104, 489–514.
Maubach, J. M. (1995) Local bisection refinement for n-simplicial grids generated by reflection. SIAM J. Sci. Comput., 16, 210–227.
Mekchay, K. & Nochetto, R. H. (2005) Convergence of adaptive finite element methods for general second order linear elliptic PDEs. SIAM J. Numer. Anal., 43, 1803–1827.
Morin, P., Nochetto, R. H. & Siebert, K. G. (2000) Data oscillation and convergence of adaptive FEM. SIAM J. Numer. Anal., 38, 466–488.
Morin, P., Nochetto, R. H. & Siebert, K. G. (2002) Convergence of adaptive finite element methods. SIAM Rev., 44, 631–658.
Morin, P., Siebert, K. G. & Veeser, A. (2008) A basic convergence result for conforming adaptive finite elements. Math. Models Methods Appl. Sci., 18, 707–737.
Nochetto, R. H., Siebert, K. G. & Veeser, A. (2009) Theory of adaptive finite element methods: an introduction. Multiscale, Nonlinear and Adaptive Approximation. Berlin: Springer, pp. 409–542.
Picasso, M. (1998) Adaptive finite elements for a linear parabolic problem. Comput. Methods Appl. Mech. Eng., 167, 223–237.
Schmidt, A. & Siebert, K. G. (2005) Design of Adaptive Finite Element Software: The Finite Element Toolbox ALBERTA. Lecture Notes in Computational Science and Engineering, vol. 42. Berlin: Springer.
Schwab, C. & Stevenson, R. (2009) Space-time adaptive wavelet methods for parabolic evolution problems. Math. Comp., 78, 1293–1318.
Siebert, K. G. (2011) A convergence proof for adaptive finite elements without lower bound. IMA J. Numer. Anal., 31, 947–970.
Smears, I. (2017) Robust and efficient preconditioners for the discontinuous Galerkin time-stepping method. IMA J. Numer. Anal., 37, 1961–1985.
Stevenson, R. (2007) Optimality of a standard adaptive finite element method. Found. Comput. Math., 7, 245–269.
Tantardini, F. & Veeser, A. (2016) The $$L^2$$-projection and quasi-optimality of Galerkin methods for parabolic equations. SIAM J. Numer. Anal., 54, 317–340.
Thomée, V. (2006) Galerkin Finite Element Methods for Parabolic Problems, 2nd edn. Springer Series in Computational Mathematics, vol. 25. Berlin: Springer.
Traxler, C. T. (1997) An algorithm for adaptive mesh refinement in n dimensions. Computing, 59, 115–137.
Verfürth, R. (2003) A posteriori error estimates for finite element discretizations of the heat equation. Calcolo, 40, 195–212.
Verfürth, R. (2013) A Posteriori Error Estimation Techniques for Finite Element Methods. Numerical Mathematics and Scientific Computation. Oxford: Oxford University Press.

© The Author(s) 2018. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.

### Journal

IMA Journal of Numerical Analysis, Oxford University Press

Published: Apr 4, 2018
