Nonparametric C- and D-vine-based quantile regression

1 Introduction

As a robust alternative to ordinary least squares regression, which estimates the conditional mean, quantile regression [38] focuses on conditional quantiles. The method has been studied extensively in statistics, economics, and finance. The pioneering work of Koenker [35] investigated linear quantile regression systematically and presented properties of the estimators, including asymptotic normality and consistency, under various assumptions such as independence of the observations, independent and identically distributed (i.i.d.) errors with continuous distribution, and predictors with bounded second moment. Subsequent extensions of linear quantile regression have been studied intensively, for example quantile regression in the Bayesian framework [66], for longitudinal data [34], for time-series models [63], for high-dimensional models with an $l_1$-regularizer [7], and nonparametric estimation by kernel-weighted local linear fitting [65] or by additive models [37,20]. The theoretical analysis of these extensions rests on additional assumptions, such as i.i.d. samples (see for example Belloni and Chernozhukov [7], Yu and Jones [65]) or samples generated by a known additive function (see for example Koenker [37,34]). Such assumptions guarantee the performance of the proposed methods for certain data structures, but they raise concerns in applications because the structure of real-world data is uncertain. Bernard and Czado [8] addressed further potential concerns, such as quantile crossings and model misspecification, when the dependence structure of the response variable and the predictors does not follow a Gaussian copula. Flexible models that assume neither homoscedasticity nor a linear relationship between the response and the predictors are therefore of interest. Recent research addressing this issue includes quantile forests [2,42,44], inspired by the earlier work on random forests [9], and modeling conditional quantiles using copulas (see also Chen et al. [13], Noh et al. [50,51]).

Vine copulas in the context of conditional quantile prediction have been investigated by Kraus and Czado [41] using drawable vine copulas (D-vines), and by Chang and Joe [11] and, most recently, Zhu et al. [67] using restricted regular vines (R-vines). The approach of Chang and Joe [11] is based on first finding a locally optimal regular vine structure among all predictors and then adding the response to each selected tree in the vine structure as a leaf, as also followed by Bauer and Czado [5] in the context of non-Gaussian conditional independence testing. The procedure in Chang and Joe [11] allows a recursive determination of the response quantiles, but it is restricted by the prespecified dependence structure among the predictors, which need not be the structure maximizing the conditional response likelihood, the main focus in a regression setup. The approach of Kraus and Czado [41] is based on optimizing the conditional log-likelihood and selecting predictors sequentially until no improvement of the conditional log-likelihood is achieved. Being based on the conditional response likelihood, this approach is more appropriate for determining the associated response quantiles. Furthermore, the intensive simulation study in Kraus and Czado [41] showed the superior performance of D-vine-copula-based quantile regression compared to various quantile regression methods, namely
linear quantile regression [38], boosting additive quantile regression [35,37,20], nonparametric quantile regression [43], and semiparametric quantile regression [51]. In parallel to our work, Zhu et al. [67] proposed an extension of this D-vine-based forward regression to a restricted R-vine forward regression with performance comparable to the D-vine regression. Thus, the D-vine quantile regression will be our benchmark model.

We extend the method of Kraus and Czado [41] in two ways: (1) our approach is applicable to both C-vine and D-vine copulas; (2) a two-step ahead construction is introduced instead of the one-step ahead construction. Since the two-step ahead construction is the main difference between our method and Kraus and Czado [41], we explain the second point in more detail. Our proposed method proceeds by adding predictors to the model sequentially. However, in contrast to Kraus and Czado [41], who look only one variable ahead, our new approach looks two variables ahead when selecting the variable to be added in each step. The general idea of this two-step ahead algorithm is as follows: in each step of the algorithm, we study combinations of two variables to find the variable which, in combination with the other, improves the conditional log-likelihood the most. Combined with a forward selection method, this two-step ahead algorithm allows us to construct nonparametric quantile estimators that improve the conditional log-likelihood in each step and, most importantly, take possible future improvements into account. Our method is applicable to both low- and high-dimensional data. By construction, quantile crossings are avoided. All marginal densities and copulas are estimated nonparametrically, allowing more flexibility than parametric specifications; Kraus and Czado [41] addressed the necessity and possible benefit of the nonparametric estimation of bivariate copulas in the quantile regression framework. This construction permits a large variety of dependence structures, resulting in a well-performing conditional quantile estimator. Moreover, extending to the C-vine copula class, in addition to the D-vine copulas, provides greater flexibility.

The article is organized as follows. Section 2 introduces the general setup, the concept of C-vine and D-vine copulas, and the nonparametric approach for estimating copula densities. Section 3 describes the vine-based approach to quantile regression. The new two-step ahead forward selection algorithms are described in Section 4, and in Proposition 4.1 we investigate the consistency of the conditional quantile estimator for given variable orders. The finite sample performance of the vine-based conditional quantile estimator is evaluated in Section 5 by several quantile-related measures in various simulation settings. We apply the newly introduced algorithms to low- and high-dimensional real data in Section 6. In Section 7, we conclude and discuss possible directions of future research.

2 Theoretical background

Consider the random vector $\boldsymbol{X} = (X_1,\ldots,X_d)^T$ with observed values $\boldsymbol{x} = (x_1,\ldots,x_d)^T$, joint distribution and density functions $F$ and $f$, and marginal distribution and density functions $F_{X_j}$ and $f_{X_j}$ for $X_j$, $j=1,\ldots,d$. Sklar's theorem [57] allows us to represent any multivariate distribution in terms of its marginals $F_{X_j}$ and a copula $C$ encoding the dependence structure.
In the continuous case, $C$ is unique and satisfies $F(\boldsymbol{x}) = C(F_{X_1}(x_1),\ldots,F_{X_d}(x_d))$ and $f(\boldsymbol{x}) = c(F_{X_1}(x_1),\ldots,F_{X_d}(x_d)) \prod_{j=1}^{d} f_{X_j}(x_j)$, where $c$ is the density function of the copula $C$. To characterize the dependence structure of $\boldsymbol{X}$, we transform each $X_j$ to a uniform variable $U_j$ by applying the probability integral transform, i.e. $U_j := F_{X_j}(X_j)$, $j=1,\ldots,d$. Then the random vector $\boldsymbol{U} = (U_1,\ldots,U_d)^T$ with observed values $(u_1,\ldots,u_d)^T$ has a copula as joint distribution, denoted $C_{U_1,\ldots,U_d}$, with associated copula density $c_{U_1,\ldots,U_d}$.

While the catalogue of bivariate parametric copula families is large, this is not true for $d > 2$. Therefore, conditioning was applied to construct multivariate copulas using only bivariate copulas as building blocks. Joe [28] gave the first pair copula construction for $d$ dimensions in terms of distribution functions, whereas Bedford and Cooke [6] independently developed constructions in terms of densities, together with a graphical building plan called a regular vine tree structure. It consists of a set of linked trees $T_1,\ldots,T_{d-1}$ (edges in tree $T_j$ become nodes in tree $T_{j+1}$) satisfying a proximity condition, which allows us to identify all possible constructions. Each edge of the trees is associated with a pair copula $C_{U_i,U_j;\boldsymbol{U}_D}$, where $D$ is a subset of indices not containing $i,j$. The set $\{i,j\}$ is called the conditioned set, while $D$ is the conditioning set. A joint density using the class of vine copulas is then the product of all pair copulas identified by the tree structure, evaluated at appropriate conditional distribution functions $F_{X_j\mid \boldsymbol{X}_D}$, times the product of the marginal densities $f_{X_j}$, $j=1,\ldots,d$. A detailed treatment of vine copulas together with estimation methods and model choice approaches is given, for example, in Joe [30] and Czado [17].

Since we are interested in simple copula-based estimation methods for conditional quantiles, we restrict ourselves to two subclasses of the regular vine tree structure, namely the D-vine and C-vine structures. We show later that these structures allow us to express conditional distributions and quantiles in closed form. In the D-vine tree structure all trees are paths, i.e. all nodes have degree $\le 2$; nodes with degree 1 are called leaf nodes. A C-vine structure occurs when all trees are stars with a root node in the centre. The left and right panels of Figure 1 illustrate a C-vine and a D-vine tree sequence in four dimensions, respectively.

Figure 1: C-vine tree sequence (left panel) and D-vine tree sequence (right panel) in four dimensions.

For these subclasses we can easily give the corresponding vine density [17, Chapter 4].
For a D-vine density there exists a permutation $s_1,\ldots,s_d$ of $1,\ldots,d$ such that

(1)  $f(x_1,\ldots,x_d) = \prod_{j=1}^{d-1}\prod_{i=1}^{d-j} c_{U_{s_i},U_{s_{i+j}};U_{s_{i+1}},\ldots,U_{s_{i+j-1}}}\big(F_{X_{s_i}\mid X_{s_{i+1}},\ldots,X_{s_{i+j-1}}}(x_{s_i}\mid x_{s_{i+1}},\ldots,x_{s_{i+j-1}}),\ F_{X_{s_{i+j}}\mid X_{s_{i+1}},\ldots,X_{s_{i+j-1}}}(x_{s_{i+j}}\mid x_{s_{i+1}},\ldots,x_{s_{i+j-1}})\big)\cdot \prod_{k=1}^{d} f_{X_{s_k}}(x_{s_k}),$

while for a C-vine density the following representation holds:

(2)  $f(x_1,\ldots,x_d) = \prod_{j=1}^{d-1}\prod_{i=1}^{d-j} c_{U_{s_j},U_{s_{j+i}};U_{s_1},\ldots,U_{s_{j-1}}}\big(F_{X_{s_j}\mid X_{s_1},\ldots,X_{s_{j-1}}}(x_{s_j}\mid x_{s_1},\ldots,x_{s_{j-1}}),\ F_{X_{s_{j+i}}\mid X_{s_1},\ldots,X_{s_{j-1}}}(x_{s_{j+i}}\mid x_{s_1},\ldots,x_{s_{j-1}})\big)\cdot \prod_{k=1}^{d} f_{X_{s_k}}(x_{s_k}).$

To determine the conditional distributions $F_{X_j\mid \boldsymbol{X}_D}$ needed in Eqs. (1) and (2) for appropriate choices of $j$ and $D$, the recursion discussed in Joe [28] is available. Using $u_j = F_{X_j\mid \boldsymbol{X}_D}(x_j\mid \boldsymbol{x}_D)$ for $j=1,\ldots,d$, we can express them as compositions of h-functions, defined in general as $h_{U_i\mid U_j;\boldsymbol{U}_D}(u_i\mid u_j;\boldsymbol{u}_D) = \frac{\partial}{\partial u_j} C_{U_i,U_j;\boldsymbol{U}_D}(u_i,u_j;\boldsymbol{u}_D)$. Additionally, in Eqs. (1) and (2) we made the simplifying assumption ([17], Section 5.4), i.e. the copula $C_{U_i,U_j;\boldsymbol{U}_D}$ does not depend on the specific conditioning value $\boldsymbol{u}_D$, so that $C_{U_i,U_j;\boldsymbol{U}_D}(u_i,u_j;\boldsymbol{u}_D) = C_{U_i,U_j;\boldsymbol{U}_D}(u_i,u_j)$. The dependence on $\boldsymbol{u}_D$ in Eqs. (1) and (2) is then completely captured by the arguments of the pair copulas. This assumption is often made for tractability reasons in higher dimensions (Haff et al. [27] and Stoeber et al. [58]).
It further implies that the h-function satisfies $h_{U_i\mid U_j;\boldsymbol{U}_D}(u_i\mid u_j;\boldsymbol{u}_D) = \frac{\partial}{\partial u_j} C_{U_i,U_j;\boldsymbol{U}_D}(u_i,u_j) = C_{U_i\mid U_j;\boldsymbol{U}_D}(u_i\mid u_j)$ and is independent of $\boldsymbol{u}_D$.

2.1 Nonparametric estimators of the copula densities and h-functions

There are many methods to estimate a bivariate copula density $c_{U_i,U_j}$ nonparametrically. Examples are the transformation estimator [12], the transformation local likelihood estimator [22], the tapered transformation estimator [61], the beta kernel estimator [12], and the mirror-reflection estimator [24]. Among these kernel estimators, the transformation local likelihood estimator [22] was found by Nagler et al. [47] to have the best overall performance. The estimator is implemented in the R packages kdecopula [45] and rvinecopulib [49] using Gaussian kernels. We review its construction in Appendix A. To satisfy the copula definition, it is scaled to have uniform margins.

As mentioned above, the simplifying assumption implies that $h_{U_i\mid U_j;\boldsymbol{U}_D}(u_i\mid u_j;\boldsymbol{u}_D)$ is independent of the specific value of $\boldsymbol{u}_D$. Thus, it is sufficient to show how the h-function $h_{U_i\mid U_j} = C_{U_i\mid U_j}(u_i\mid u_j)$ can be estimated nonparametrically. For this, we use the estimator
$\hat{C}_{U_i\mid U_j}(u_i\mid u_j) = \int_0^{u_i} \hat{c}_{U_i,U_j}(\tilde{u}_i,u_j)\,\mathrm{d}\tilde{u}_i,$
where $\hat{c}_{U_i,U_j}$ is one of the aforementioned nonparametric estimators of the bivariate copula density of $(U_i,U_j)$, for which it holds that $\hat{c}_{U_i,U_j}$ integrates to 1 and has uniform margins.
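To illustrate how such a nonparametric pair copula and its h-function can be obtained in practice, the following minimal R sketch uses the transformation local likelihood ("tll") estimator from rvinecopulib. The simulated Gaussian-copula example data and all variable names are only illustrative, and the function signatures reflect our understanding of the rvinecopulib interface rather than code from the article.

```r
library(rvinecopulib)

set.seed(1)
# Illustrative copula data (u-scale): a bivariate Gaussian copula sample
u <- rbicop(500, family = "gaussian", parameters = 0.7)

# Nonparametric (transformation local likelihood) pair-copula fit
fit <- bicop(u, family_set = "tll")

# Estimated copula density and h-function C_{U1|U2}(u1 | u2) at one point
dbicop(cbind(0.3, 0.6), family = fit)
hbicop(cbind(0.3, 0.6), cond_var = 2, family = fit)
```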
3 Vine-based quantile regression

In the general regression framework, the predictive ability of a set of variables $\boldsymbol{X} = (X_1,\ldots,X_p)^T$ for the response $Y \in \mathbb{R}$ is studied. The main interest of vine-based quantile regression is to predict the $\alpha \in (0,1)$ quantile $q_\alpha(x_1,\ldots,x_p) = F^{-1}_{Y\mid X_1,\ldots,X_p}(\alpha\mid x_1,\ldots,x_p)$ of the response variable $Y$ given $\boldsymbol{X}$, using a copula-based model of $(Y,\boldsymbol{X})^T$. As shown in Kraus and Czado [41], this can be expressed as

(3)  $F^{-1}_{Y\mid X_1,\ldots,X_p}(\alpha\mid x_1,\ldots,x_p) = F_Y^{-1}\big(C^{-1}_{V\mid U_1,\ldots,U_p}(\alpha\mid F_{X_1}(x_1),\ldots,F_{X_p}(x_p))\big),$

where $C_{V\mid U_1,\ldots,U_p}$ is the conditional distribution function of $V = F_Y(Y)$ given $U_j = F_{X_j}(X_j) = u_j$ for $j=1,\ldots,p$, with corresponding density $c_{V\mid U_1,\ldots,U_p}$, and $C_{V,U_1,\ldots,U_p}$ denotes the $(p+1)$-dimensional copula associated with the joint distribution of $(Y,\boldsymbol{X})^T$. In view of Section 2, we have $d = p+1$.

An estimate of $q_\alpha(x_1,\ldots,x_p)$ can be obtained using estimated marginal quantile functions $\hat{F}_Y^{-1}$, $\hat{F}_{X_j}^{-1}$, $j=1,\ldots,p$, and the estimated conditional distribution function $\hat{C}^{-1}_{V\mid U_1,\ldots,U_p}$, giving
$\hat{q}_\alpha(x_1,\ldots,x_p) = \hat{F}_Y^{-1}\big(\hat{C}^{-1}_{V\mid U_1,\ldots,U_p}(\alpha\mid \hat{F}_{X_1}(x_1),\ldots,\hat{F}_{X_p}(x_p))\big).$

In general, $C_{V,U_1,\ldots,U_p}$ can be any $(p+1)$-dimensional multivariate copula; however, for certain vine structures the corresponding conditional distribution function $C_{V\mid U_1,\ldots,U_p}$ can be obtained in closed form, without numerical integration. For D-vine structures this is possible and has already been used in Kraus and Czado [41]. Tepegjozova [59] showed that this is also the case for certain C-vine structures. More precisely, a copula $C_{V,U_1,\ldots,U_p}$ with D-vine structure allows us to express $C_{V\mid U_1,\ldots,U_p}$ in closed form if and only if the response $V$ is a leaf node in the first tree of the tree sequence. For a C-vine structure we need that the node containing the response variable $V$ in the conditioned set is not a root node in any tree. Additional flexibility in using such D- and C-vine structures is achieved by allowing for nonparametric pair copulas as building blocks.
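For a small D-vine with order $(V, U_1, U_2)$, the closed form of Eq. (3) amounts to composing inverse h-functions, $\hat{C}^{-1}_{V\mid U_1,U_2}(\alpha\mid u_1,u_2) = \hat{h}^{-1}_{V\mid U_1}\big(\hat{h}^{-1}_{V\mid U_2;U_1}(\alpha \mid \hat{h}_{U_2\mid U_1}(u_2\mid u_1)) \mid u_1\big)$, followed by the marginal quantile function $\hat{F}_Y^{-1}$. The R sketch below illustrates this with nonparametric pair copulas and kernel marginals; the toy data, variable names, and the assumed behaviour of rvinecopulib's hbicop() (with inverse = TRUE) and kde1d's pkde1d()/qkde1d() are ours and not part of the article's implementation.

```r
library(rvinecopulib)
library(kde1d)

set.seed(2)
# Toy data: response y and two predictors (illustration only)
n  <- 500
x1 <- rnorm(n); x2 <- 0.5 * x1 + rnorm(n)
y  <- x1 + 0.5 * x2^2 + rnorm(n)

# Nonparametric margins and pseudo copula data (u-scale)
m_y <- kde1d(y); m_x1 <- kde1d(x1); m_x2 <- kde1d(x2)
v  <- pkde1d(y,  m_y)
u1 <- pkde1d(x1, m_x1)
u2 <- pkde1d(x2, m_x2)

# Pair copulas of the D-vine with order (V, U1, U2), all fitted with "tll"
c_v1   <- bicop(cbind(v,  u1), family_set = "tll")             # c_{V,U1}
c_12   <- bicop(cbind(u1, u2), family_set = "tll")             # c_{U1,U2}
pv_1   <- hbicop(cbind(v,  u1), cond_var = 2, family = c_v1)   # h_{V|U1}
p2_1   <- hbicop(cbind(u1, u2), cond_var = 1, family = c_12)   # h_{U2|U1}
c_v2_1 <- bicop(cbind(pv_1, p2_1), family_set = "tll")         # c_{V,U2;U1}

# Conditional quantile q_alpha(x1, x2) via Eq. (3)
q_alpha <- function(alpha, x1new, x2new) {
  u1n <- pkde1d(x1new, m_x1); u2n <- pkde1d(x2new, m_x2)
  w  <- hbicop(cbind(u1n, u2n), cond_var = 1, family = c_12)               # h_{U2|U1}
  a  <- hbicop(cbind(alpha, w), cond_var = 2, family = c_v2_1,
               inverse = TRUE)                                             # h^{-1}_{V|U2;U1}
  vq <- hbicop(cbind(a, u1n), cond_var = 2, family = c_v1, inverse = TRUE) # h^{-1}_{V|U1}
  qkde1d(vq, m_y)                                                          # F_Y^{-1}
}
q_alpha(0.95, x1new = 0, x2new = 1)
```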
The order of the predictors within the tree sequence is itself a free parameter with direct impact on the target function $C_{V\mid U_1,\ldots,U_p}$ and thus on the prediction performance for $q_\alpha(x_1,\ldots,x_p)$. For this, we recall the concept of a node order for C- and D-vine copulas introduced in Tepegjozova [59]. A D-vine copula $\mathcal{C}_D$ has order
$\mathcal{O}_D(\mathcal{C}_D) = (V, U_{i_1},\ldots,U_{i_p}),$
if the response $V$ is the first node of the first tree $T_1$ and $U_{i_k}$ is the $(k+1)$th node of $T_1$, for $k=1,\ldots,p$. A C-vine copula $\mathcal{C}_C$ has order
$\mathcal{O}_C(\mathcal{C}_C) = (V, U_{i_1},\ldots,U_{i_p}),$
if $U_{i_1}$ is the root node in the first tree $T_1$, $U_{i_2}U_{i_1}$ is the root node in the second tree $T_2$, and $U_{i_k}U_{i_{k-1}}; U_{i_1},\ldots,U_{i_{k-2}}$ is the root node in the $k$th tree $T_k$ for $k=3,\ldots,p-1$.

Our goal is now to find an optimal order of a C- or D-vine copula model with respect to a fit measure that quantifies the explanatory power of a model. One such measure is the estimated conditional copula log-likelihood. For $N$ i.i.d. observations $\boldsymbol{v} := (v^{(1)},\ldots,v^{(N)})^T$ and $\boldsymbol{u}_j := (u_j^{(1)},\ldots,u_j^{(N)})^T$, $j=1,\ldots,p$, of the random vector $(V,U_1,\ldots,U_p)^T$, we fit a C- or D-vine copula with order $\mathcal{O}(\hat{\mathcal{C}}) = (V,U_1,\ldots,U_p)$ using nonparametric pair copulas. Denoting this copula by $\hat{\mathcal{C}}$, the fitted conditional log-likelihood is

$cll(\hat{\mathcal{C}},\boldsymbol{v},(\boldsymbol{u}_1,\ldots,\boldsymbol{u}_p)) = \sum_{n=1}^N \ln \hat{c}_{V\mid U_1,\ldots,U_p}(v^{(n)}\mid u_1^{(n)},\ldots,u_p^{(n)}) = \sum_{n=1}^N \Big[ \ln \hat{c}_{V,U_1}(v^{(n)},u_1^{(n)}) + \sum_{j=2}^p \ln \hat{c}_{V,U_j\mid U_1,\ldots,U_{j-1}}\big(\hat{C}_{V\mid U_1,\ldots,U_{j-1}}(v^{(n)}\mid u_1^{(n)},\ldots,u_{j-1}^{(n)}),\ \hat{C}_{U_j\mid U_1,\ldots,U_{j-1}}(u_j^{(n)}\mid u_1^{(n)},\ldots,u_{j-1}^{(n)})\big) \Big].$

Penalizations for model complexity can be added when parametric pair copulas are used, as shown in Tepegjozova [59]. Defining an appropriate penalty for nonparametric pair copulas is an open research question (see also Section 7).

4 Forward selection algorithms

Given a set of $p$ predictors, there are $p!$ different orders, which uniquely determine $p!$ C-vines and $p!$ D-vines. Fitting and comparing all of them is computationally inefficient. Thus, the idea is to have an algorithm that sequentially chooses the elements of the order, so that at every step the resulting model for the prediction of the conditional quantiles has the highest conditional log-likelihood. Building upon the idea of Kraus and Czado [41] for the one-step ahead D-vine regression, we propose an algorithm which allows for more flexibility and which is less greedy, given the intention to obtain a globally optimal C- or D-vine fit. The algorithm builds the C- or D-vine step by step, starting with an order consisting of only the response variable $V$. Each step adds one of the predictors to the order based on the improvement of the conditional log-likelihood, while taking into account the possibility of future improvement, i.e. extending our view two steps ahead in the order. As discussed in Section 2.1, the pair copulas at each step are estimated nonparametrically, in contrast to the parametric approach of Kraus and Czado [41]. We present the implementation for both C-vine and D-vine-based quantile regression in a single algorithm, in which the user decides whether to fit a C-vine or a D-vine model based on background knowledge of the dependence structures in the data. Implementation for a large data set is computationally challenging; therefore, randomization is introduced to guarantee computational efficiency in high dimensions.

4.1 Two-step ahead forward selection algorithm for C- and D-vine-based quantile regression

Input and data preprocessing: Consider $N$ i.i.d. observations $\boldsymbol{y} := (y^{(1)},\ldots,y^{(N)})$ and $\boldsymbol{x}_j := (x_j^{(1)},\ldots,x_j^{(N)})$ for $j=1,\ldots,p$, from the random vector $(Y,X_1,\ldots,X_p)^T$. The input data are on the x-scale, but in order to fit bivariate copulas we need to transform them to the u-scale using the probability integral transform.
Since the marginal distributions are unknown, we estimate them: $F_Y$ and $F_{X_j}$, $j=1,\ldots,p$, are estimated using a univariate nonparametric kernel density estimator from the R package kde1d [48]. This results in the pseudo copula data $\hat{v}^{(n)} := \hat{F}_Y(y^{(n)})$ and $\hat{u}_j^{(n)} := \hat{F}_{X_j}(x_j^{(n)})$, for $n=1,\ldots,N$, $j=1,\ldots,p$. The normalized marginals (z-scale) are defined as $Z_j := \Phi^{-1}(U_j)$ for $j=1,\ldots,p$, and $Z_V := \Phi^{-1}(V)$, where $\Phi$ denotes the standard normal distribution function.

Step 1: To reduce computational complexity, we perform a pre-selection of the predictors based on Kendall's $\tau$. This is motivated by the fact that Kendall's $\tau$ is rank-based, hence invariant with respect to monotone transformations of the marginals, and can be expressed in terms of pair copulas. Using the pseudo copula data $(\hat{\boldsymbol{v}},\hat{\boldsymbol{u}}_j) = \{\hat{v}^{(n)},\hat{u}_j^{(n)} \mid n=1,\ldots,N\}$, estimates $\hat{\tau}_{VU_j}$ of the Kendall's $\tau$ values between the response $V$ and all possible predictors $U_j$, $j=1,\ldots,p$, are obtained. For a given $k \le p$, the $k$ largest estimates of $|\hat{\tau}_{VU_j}|$ are selected and the corresponding indices $q_1,\ldots,q_k$ are identified such that $|\hat{\tau}_{VU_{q_1}}| \ge |\hat{\tau}_{VU_{q_2}}| \ge \cdots \ge |\hat{\tau}_{VU_{q_k}}| \ge |\hat{\tau}_{VU_{q_{k+1}}}| \ge \cdots \ge |\hat{\tau}_{VU_{q_p}}|$. The parameter $k$ is a hyper-parameter and therefore subject to tuning; to obtain a parsimonious model, we suggest a $k$ corresponding to 5–20% of the total number of predictors (a sketch of this screening step is given below). The $k$ candidate predictors and the corresponding candidate index set of step 1 are defined as $U_{q_1},\ldots,U_{q_k}$ and $K_1 = \{q_1,\ldots,q_k\}$, respectively. For all $c \in K_1$ and $j \in \{1,\ldots,p\} \setminus \{c\}$, the candidate two-step ahead C- or D-vine copulas are defined as the three-dimensional copulas $\mathcal{C}^1_{c,j}$ with order $\mathcal{O}(\mathcal{C}^1_{c,j}) = (V, U_c, U_j)$.
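As an illustration of the Kendall's $\tau$ screening just described, the following R sketch ranks predictors by $|\hat{\tau}_{VU_j}|$ on pseudo copula data; the objects v_hat, u_hat, and the choice of k are placeholders for this example.

```r
# v_hat: length-N vector of pseudo-observations of the response (u-scale)
# u_hat: N x p matrix of pseudo-observations of the predictors (u-scale)
kendall_screen <- function(v_hat, u_hat, k) {
  tau <- apply(u_hat, 2, function(u) cor(v_hat, u, method = "kendall"))
  order(abs(tau), decreasing = TRUE)[seq_len(k)]   # candidate index set K_1
}

# Example with simulated placeholder data and k = 2 candidates
set.seed(3)
u_hat <- matrix(runif(200 * 5), ncol = 5)
v_hat <- runif(200)
K1 <- kendall_screen(v_hat, u_hat, k = 2)
```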
The first predictor is added to the order based on the conditional log-likelihood of the candidate two-step ahead C- or D-vine copulas, Cc,j1{{\mathcal{C}}}_{c,j}^{1}, given as cll(Cc,j1,vˆ,(uˆc,uˆj))=∑n=1N[logcˆV,Uc(vˆ(n),uˆc(n))+logcˆV,Uj∣Uc(hˆV∣Uc(vˆ(n)∣uˆc(n)),hˆUj∣Uc(uˆj(n)∣uˆc(n)))].cll({{\mathcal{C}}}_{c,j}^{1},\hat{{\boldsymbol{v}}},\left({\hat{{\boldsymbol{u}}}}_{c},{\hat{{\boldsymbol{u}}}}_{j}))=\mathop{\sum }\limits_{n=1}^{N}\left[\log {\hat{c}}_{V,{U}_{c}}\left({\hat{v}}^{\left(n)},{\hat{u}}_{c}^{\left(n)})+\log {\hat{c}}_{V,{U}_{j}| {U}_{c}}\left({\hat{h}}_{V| {U}_{c}}\left({\hat{v}}^{\left(n)}| {\hat{u}}_{c}^{\left(n)}),{\hat{h}}_{{U}_{j}| {U}_{c}}\left({\hat{u}}_{j}^{\left(n)}| {\hat{u}}_{c}^{\left(n)}))].For each candidate predictor Uc{U}_{c}, the maximal two-step ahead conditional log-likelihood at step 1, cllc1cl{l}_{c}^{1}, is defined as cllc1≔maxj∈{1,…,p}⧹{c}cll(Cc,j1,vˆ,(uˆc,uˆj)),∀c∈K1cl{l}_{c}^{1}:= {\max }_{j\in \left\{1,\ldots ,p\right\}\setminus \left\{c\right\}}cll({{\mathcal{C}}}_{c,j}^{1},\hat{{\boldsymbol{v}}},\left({\hat{{\boldsymbol{u}}}}_{c},{\hat{{\boldsymbol{u}}}}_{j})),\hspace{0.33em}\forall c\in {K}_{1}. Finally, based on the maximal two-step ahead conditional log-likelihood at step 1, cllc1cl{l}_{c}^{1}, the index t1{t}_{1}is chosen as t1≔argmaxc∈K1cllc1{t}_{1}:= \arg {\max }_{c\in {K}_{1}}cl{l}_{c}^{1}, and the corresponding candidate predictor Ut1{U}_{{t}_{1}}is selected as the first predictor added to the order. An illustration of the vine tree structure of the candidate two-step ahead copulas Cc,j1{{\mathcal{C}}}_{c,j}^{1}, in the case of fitting a D-vine model, with order OD(Cc,j1)=(V,Uc,Uj){{\mathcal{O}}}_{D}\left({{\mathcal{C}}}_{c,j}^{1})=\left(V,{U}_{c},{U}_{j})is given in Figure 2. Finally, the current optimal fit after the first step is the C-vine or D-vine copula, C1{{\mathcal{C}}}_{1}with order O(C1)=(V,Ut1){\mathcal{O}}\left({{\mathcal{C}}}_{1})=\left(V,{U}_{{t}_{1}}).Figure 2VVis fixed as the first node of T1{T}_{1}and the first candidate predictor to be included in the model, Uc{U}_{c}(grey), is chosen based on the conditional log-likelihood of the two-step ahead copula including the predictor Uj{U}_{j}(grey filled).Step r{\boldsymbol{r}}: After r−1r-1steps, the current optimal fit is the C- or D-vine copula Cr−1{{\mathcal{C}}}_{r-1}with order O(Cr−1)=(V,Ut1,…,Utr−1){\mathcal{O}}\left({{\mathcal{C}}}_{r-1})=\left(V,{U}_{{t}_{1}},\ldots ,{U}_{{t}_{r-1}}). At each previous step ii, the order of the current optimal fit is sequentially updated with the predictor Uti{U}_{{t}_{i}}for i=1,…,r−1i=1,\ldots ,r-1. At the rth step, the next predictor candidate is to be included. To do so, the set of potential candidates is narrowed based on a partial correlation measure. Defining a partial Kendall’s τ\tau is not straightforward and requires the notion of a partial copula, which is the average over the conditional copula given the values of the conditioning values (e.g. see Gijbels and Matterne [23] and references therein). In addition, the computation in the case of multivariate conditioning is very demanding and still an open research problem. Therefore, we took a pragmatic view and base our candidate selection on partial correlation. Due to the assumption of Gaussian margins inherited to the Pearson’s partial correlation, the estimates are computed on the zz-scale. 
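A simple way to obtain these z-scale partial correlations, shown here as a minimal sketch, is to correlate the residuals of linear regressions on the already selected (conditioning) variables; z_V, z, and the index sets are placeholders.

```r
# z_V: length-N vector, response on the z-scale
# z:   N x p matrix, predictors on the z-scale
# sel: indices t_1, ..., t_{r-1} of the predictors already in the order
partial_cor_screen <- function(z_V, z, sel, k) {
  rest  <- setdiff(seq_len(ncol(z)), sel)
  res_V <- resid(lm(z_V ~ z[, sel, drop = FALSE]))
  pcor  <- sapply(rest, function(j) {
    res_j <- resid(lm(z[, j] ~ z[, sel, drop = FALSE]))
    cor(res_V, res_j)                 # empirical partial correlation
  })
  rest[order(abs(pcor), decreasing = TRUE)[seq_len(min(k, length(rest)))]]
}
```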
Estimates of the empirical Pearson partial correlation $\hat{\rho}_{Z_V,Z_j; Z_{t_1},\ldots,Z_{t_{r-1}}}$ between the normalized response $Z_V$ and the available predictors $Z_j$, $j \in \{1,\ldots,p\} \setminus \{t_1,\ldots,t_{r-1}\}$, are obtained. Similar to the first step, a set of $k$ candidate predictors is selected based on the largest values of $|\hat{\rho}_{Z_V,Z_j; Z_{t_1},\ldots,Z_{t_{r-1}}}|$, with corresponding indices $q_1,\ldots,q_k$. The $k$ candidate predictors and the corresponding candidate index set of step $r$ are defined as $U_{q_1},\ldots,U_{q_k}$ and $K_r = \{q_1,\ldots,q_k\}$, respectively. For all $c \in K_r$ and $j \in \{1,\ldots,p\} \setminus \{t_1,\ldots,t_{r-1},c\}$, the candidate two-step ahead C- or D-vine copulas are defined as the copulas $\mathcal{C}^r_{c,j}$ with order $\mathcal{O}(\mathcal{C}^r_{c,j}) = (V,U_{t_1},\ldots,U_{t_{r-1}},U_c,U_j)$. There are $k(p-r)$ different candidate two-step ahead C- or D-vine copulas $\mathcal{C}^r_{c,j}$, since there are $k$ candidates for the one-step ahead extension $U_c$ and, for each, $p-(r-1)-1$ two-step ahead extensions $U_j$. Their corresponding conditional log-likelihoods are given as

$cll(\mathcal{C}^r_{c,j},\hat{\boldsymbol{v}},(\hat{\boldsymbol{u}}_{t_1},\ldots,\hat{\boldsymbol{u}}_{t_{r-1}},\hat{\boldsymbol{u}}_c,\hat{\boldsymbol{u}}_j)) = cll(\mathcal{C}_{r-1},\hat{\boldsymbol{v}},(\hat{\boldsymbol{u}}_{t_1},\ldots,\hat{\boldsymbol{u}}_{t_{r-1}}))$
$\quad + \sum_{n=1}^N \log \hat{c}_{V U_c; U_{t_1},\ldots,U_{t_{r-1}}}\big(\hat{C}_{V\mid U_{t_1},\ldots,U_{t_{r-1}}}(\hat{v}^{(n)}\mid \hat{u}_{t_1}^{(n)},\ldots,\hat{u}_{t_{r-1}}^{(n)}),\ \hat{C}_{U_c\mid U_{t_1},\ldots,U_{t_{r-1}}}(\hat{u}_c^{(n)}\mid \hat{u}_{t_1}^{(n)},\ldots,\hat{u}_{t_{r-1}}^{(n)})\big)$
$\quad + \sum_{n=1}^N \log \hat{c}_{V U_j; U_{t_1},\ldots,U_{t_{r-1}},U_c}\big(\hat{C}_{V\mid U_{t_1},\ldots,U_{t_{r-1}},U_c}(\hat{v}^{(n)}\mid \hat{u}_{t_1}^{(n)},\ldots,\hat{u}_{t_{r-1}}^{(n)},\hat{u}_c^{(n)}),\ \hat{C}_{U_j\mid U_{t_1},\ldots,U_{t_{r-1}},U_c}(\hat{u}_j^{(n)}\mid \hat{u}_{t_1}^{(n)},\ldots,\hat{u}_{t_{r-1}}^{(n)},\hat{u}_c^{(n)})\big).$

The $r$th predictor is then added to the order based on the maximal two-step ahead conditional log-likelihood at step $r$, $cll_c^r$, defined as

(4)  $cll_c^r := \max_{j \in \{1,\ldots,p\} \setminus \{t_1,\ldots,t_{r-1},c\}} cll(\mathcal{C}^r_{c,j},\hat{\boldsymbol{v}},(\hat{\boldsymbol{u}}_{t_1},\ldots,\hat{\boldsymbol{u}}_{t_{r-1}},\hat{\boldsymbol{u}}_c,\hat{\boldsymbol{u}}_j)), \quad \forall c \in K_r.$

The index $t_r$ is chosen as $t_r := \arg\max_{c \in K_r} cll_c^r$, and the predictor $U_{t_r}$ is selected as the $r$th predictor of the order. An illustration of the vine tree structure of the candidate two-step ahead copulas $\mathcal{C}^r_{c,j}$ for a D-vine model with order $\mathcal{O}_D(\mathcal{C}^r_{c,j}) = (V,U_{t_1},\ldots,U_{t_{r-1}},U_c,U_j)$ is given in Figure 3. At this step, the current optimal fit is the C-vine or D-vine copula $\mathcal{C}_r$ with order $\mathcal{O}(\mathcal{C}_r) = (V,U_{t_1},\ldots,U_{t_r})$. The iterative procedure is repeated until all predictors are included in the order of the C- or D-vine copula model.

Figure 3: In step $r$, the current optimal fit $\mathcal{C}_{r-1}$ (black) is extended by one more predictor, $U_c$ (grey), to obtain the new current optimal fit $\mathcal{C}_r$ (black and grey), based on the conditional log-likelihood of the two-step ahead copula $\mathcal{C}^r_{c,j}$, which also includes the predictor $U_j$ (grey filled). (In the figure, the shortened notation $U_{t_{1:r-1}}$ is used for $U_{t_1},\ldots,U_{t_{r-1}}$, and $U_{t_{1:r-1},c}$ for $U_{t_1},\ldots,U_{t_{r-1}},U_c$.)

4.1.1 Additional variable reduction in higher dimensions

The above search procedure requires calculating $p-r$ conditional log-likelihoods for each candidate predictor at a given step $r$, that is, a total of $(p-r)k$ conditional log-likelihoods, where $k$ is the number of candidates. For large $p$, this causes a heavy computational burden. Hence, the idea is to reduce the number of conditional log-likelihoods calculated for each candidate predictor. This is achieved by reducing the size of the set over which the maximal two-step ahead conditional log-likelihood $cll_c^r$ in Eq. (4) is computed. Instead of the set $\{1,\ldots,p\} \setminus \{t_1,\ldots,t_{r-1},c\}$, the maximum can be taken over an appropriate subset. This subset can be chosen based on the largest absolute Pearson partial correlations $|\hat{\rho}_{Z_V,Z_j; Z_{t_1},\ldots,Z_{t_{r-1}},Z_c}|$, by random selection, or by a combination of the two. The selection method and the size of the reduction are user-decided.
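The following pseudocode-style R sketch summarizes one step of the two-step ahead forward selection of Sections 4.1–4.1.1; screen_candidates() and cll_two_step() are hypothetical helpers standing in for the partial-correlation screening and the nonparametric conditional log-likelihood evaluation described above, so this is a schematic outline rather than the article's implementation.

```r
# One step of the two-step ahead forward selection (schematic sketch).
# order_sel: indices (t_1, ..., t_{r-1}) already selected
# screen_candidates(): hypothetical helper returning the candidate set K_r
# cll_two_step():      hypothetical helper returning cll(C^r_{c,j}, ...)
two_step_ahead_step <- function(order_sel, p, k, screen_candidates, cll_two_step) {
  K_r  <- screen_candidates(order_sel, k)              # k candidates for U_c
  best <- list(cand = NA, cll = -Inf)
  for (cand in K_r) {
    ahead <- setdiff(seq_len(p), c(order_sel, cand))   # two-step extensions U_j
    cll_c <- max(vapply(ahead, function(j) cll_two_step(order_sel, cand, j),
                        numeric(1)))
    if (cll_c > best$cll) best <- list(cand = cand, cll = cll_c)
  }
  c(order_sel, best$cand)                              # extended order (t_1, ..., t_r)
}
```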
4.2 Consistency of the conditional quantile estimator

The conditional quantile function on the original scale in Eq. (3) requires the inverse of the marginal distribution function of $Y$. Following Kraus and Czado [41] and Noh et al. [50], the marginal distribution functions $F_Y$ and $F_{X_j}$, $j=1,\ldots,p$, are estimated nonparametrically to reduce the bias caused by model misspecification. Examples of nonparametric estimators for the marginal distributions $F_Y$ and $F_{X_j}$ are the continuous kernel smoothing estimator [52] and the univariate transformed local likelihood estimator [21]. Using a Gaussian kernel, both estimators of the marginal distributions are uniformly strongly consistent. When all inverse h-functions are also estimated nonparametrically, we establish the consistency of the conditional quantile estimator $\hat{F}^{-1}_{Y\mid X_1,\ldots,X_p}$ in Proposition 4.1 for fixed variable orders. By showing uniform consistency, Proposition 4.1 gives an indication of the performance of the conditional quantile estimator $\hat{F}^{-1}_{Y\mid X_1,\ldots,X_p}$ for fixed variable orders when consistent estimators of $F_Y$, the $F_{X_j}$, and the bivariate copula densities are combined. Given this consistency guarantee, the numerical performance of $\hat{F}^{-1}_{Y\mid X_1,\ldots,X_p}$ is investigated by extensive simulation studies in Section 5.

Proposition 4.1. Let the inverses of the marginal distribution functions $F_Y$ and $F_{X_j}$, $j=1,\ldots,p$, be uniformly continuous and estimated nonparametrically, and let the inverse h-functions expressing the conditional quantile estimator $C^{-1}_{V\mid U_1,\ldots,U_p}$ be uniformly continuous and estimated nonparametrically in the interior of the support of the bivariate copulas, i.e. on $[\delta,1-\delta]^2$, $\delta \to 0_+$. If the estimators of the inverse marginal functions, $\hat{F}_Y^{-1}$ and $\hat{F}_{X_j}^{-1}$, $j=1,\ldots,p$, are uniformly strongly consistent on the support $[\delta,1-\delta]$, $\delta \to 0_+$, and the estimators of the inverse h-functions composing the conditional quantile estimator $C^{-1}_{V\mid U_1,\ldots,U_p}$ are uniformly strongly consistent, then the estimator $\hat{F}^{-1}_{Y\mid X_1,\ldots,X_p}(\alpha\mid x_1,\ldots,x_p)$ is also uniformly strongly consistent. If the estimators of the inverse marginal functions $\hat{F}_Y^{-1}$, $\hat{F}_{X_j}^{-1}$, $j=1,\ldots,p$, are at least weakly consistent, and the estimators of the inverse h-functions are also at least weakly consistent, then the estimator $\hat{F}^{-1}_{Y\mid X_1,\ldots,X_p}(\alpha\mid x_1,\ldots,x_p)$ is weakly consistent.

For more details on uniformly continuous functions, see Bartle and Sherbert ([4], Section 5.4) and Kolmogorov and Fomin ([39], p. 109, Def. 1). For a definition of strong uniform consistency, or convergence with probability one, see Ryzin [54], Silverman [56], and Durrett ([19], p. 16); for a definition of weak consistency, or convergence in probability, see Durrett ([19], p. 53). The strong uniform consistency result in Proposition 4.1 additionally requires that all estimators $\hat{F}_Y^{-1}$, $\hat{F}_{X_j}^{-1}$, $j=1,\ldots,p$, are strongly uniformly consistent on a truncated compact interval $[\delta,1-\delta]$, $\delta \to 0_+$. Although not directly used in the proof of Proposition 4.1 in Appendix B, the truncation is an essential condition for guaranteeing the strong uniform consistency of all estimators of the inverses of the marginal distributions
(i.e. estimators of quantile functions); see Cheng [15,14] and Van Keilegom and Veraverbeke [60].

5 Simulation study

The proposed two-step ahead forward selection algorithms for C- and D-vine-based quantile regression from Section 4.1 are implemented in the statistical language R [53]. The D-vine one-step ahead algorithm is implemented in the R package vinereg [46]. The simulation study in Kraus and Czado [41] showed that the D-vine one-step ahead forward selection algorithm performs better than or similar to other state-of-the-art quantile regression methods: boosting additive quantile regression [20,36], nonparametric quantile regression [43], semiparametric quantile regression [51], and linear quantile regression [38]. Thus, we use the one-step ahead algorithm as the benchmark method in the simulation study. We consider the simulation settings listed below, each replicated $R = 100$ times. In each replication, we randomly generate $N_{\mathrm{train}}$ samples used for fitting the nonparametric vine-based quantile regression models. Additionally, $N_{\mathrm{eval}} = \frac{1}{2}N_{\mathrm{train}}$ samples for settings (a)–(f) and $N_{\mathrm{eval}} = N_{\mathrm{train}}$ samples for settings (g) and (h) are generated for predicting conditional quantiles from the fitted models. Settings (a)–(f) are designed to test the quantile prediction accuracy of the nonparametric C- or D-vine quantile regression when $p \le N$; hence, we set $N_{\mathrm{train}} = 1{,}000$ or 300. Settings (g) and (h) test quantile prediction accuracy when $p > N$; hence, we set $N_{\mathrm{train}} = 100$.

(a) Simulation setting M5 from Kraus and Czado [41] (a data-generation sketch for this setting is given after the list):
$Y = \sqrt{|2X_1 - X_2 + 0.5|} + (-0.5X_3 + 1)(0.1X_4^3) + \sigma\varepsilon,$
with $\varepsilon \sim N(0,1)$, $\sigma \in \{0.1, 1\}$, $(X_1,X_2,X_3,X_4)^T \sim N_4(0,\Sigma)$, and the $(i,j)$th component of the covariance matrix given by $(\Sigma)_{i,j} = 0.5^{|i-j|}$.

(b) $(Y,X_1,\ldots,X_5)^T$ follows a mixture of two six-dimensional t copulas with 3 degrees of freedom and mixture probabilities 0.3 and 0.7.
The association matrices $R_1$, $R_2$ and the marginal distributions are recorded in Table 1.

Table 1: Association matrices of the multivariate t copula and marginal distributions for setting (b)

$R_1 = \begin{pmatrix} 1 & 0.6 & 0.5 & 0.6 & 0.7 & 0.1 \\ 0.6 & 1 & 0.5 & 0.5 & 0.5 & 0.5 \\ 0.5 & 0.5 & 1 & 0.5 & 0.5 & 0.5 \\ 0.6 & 0.5 & 0.5 & 1 & 0.5 & 0.5 \\ 0.7 & 0.5 & 0.5 & 0.5 & 1 & 0.5 \\ 0.1 & 0.5 & 0.5 & 0.5 & 0.5 & 1 \end{pmatrix}, \qquad R_2 = \begin{pmatrix} 1 & -0.3 & -0.5 & -0.4 & -0.5 & -0.1 \\ -0.3 & 1 & 0.5 & 0.5 & 0.5 & 0.5 \\ -0.5 & 0.5 & 1 & 0.5 & 0.5 & 0.5 \\ -0.4 & 0.5 & 0.5 & 1 & 0.5 & 0.5 \\ -0.5 & 0.5 & 0.5 & 0.5 & 1 & 0.5 \\ -0.1 & 0.5 & 0.5 & 0.5 & 0.5 & 1 \end{pmatrix}$

Marginal distributions: $Y \sim N(0,1)$, $X_1 \sim t_4$, $X_2 \sim N(1,4)$, $X_3 \sim t_4$, $X_4 \sim N(1,4)$, $X_5 \sim t_4$.

(c) Linear and heteroscedastic [11]: $Y = 5(X_1+X_2+X_3+X_4) + 10(U_1+U_2+U_3+U_4)\varepsilon$, where $(X_1,X_2,X_3,X_4)^T \sim N_4(0,\Sigma)$ with $\Sigma_{i,j} = 0.5^{I\{i \ne j\}}$, $\varepsilon \sim N(0,0.5)$, and $U_j$, $j=1,\ldots,4$, are obtained from the $X_j$'s by the probability integral transform.

(d) Nonlinear and heteroscedastic [11]: $Y = U_1 U_2 e^{1.8 U_3 U_4} + 0.5(U_1+U_2+U_3+U_4)\varepsilon$, where $U_j$, $j=1,\ldots,4$, are probability integral transformed from $N_4(0,\Sigma)$ with $\Sigma_{i,j} = 0.5^{I\{i\ne j\}}$, and $\varepsilon \sim N(0,0.5)$.

(e) R-vine copula [17]: $(V,U_1,\ldots,U_4)^T$ follows an R-vine distribution with pair copulas given in Table 2.

Table 2: Pair copulas of the R-vine $C_{V,U_1,U_2,U_3,U_4}$, with their family, parameter, and Kendall's $\tau$ for setting (e)

| Tree | Edge | Conditioned; Conditioning | Family | Parameter | Kendall's τ |
|---|---|---|---|---|---|
| 1 | 1 | U1,U3; | Gumbel | 3.9 | 0.74 |
| 1 | 2 | U2,U3; | Gauss | 0.9 | 0.71 |
| 1 | 3 | V,U3; | Gauss | 0.5 | 0.33 |
| 1 | 4 | V,U4; | Clayton | 4.8 | 0.71 |
| 2 | 1 | V,U1; U3 | Gumbel(90) | 6.5 | −0.85 |
| 2 | 2 | V,U2; U3 | Gumbel(90) | 2.6 | −0.62 |
| 2 | 3 | U3,U4; V | Gumbel | 1.9 | 0.48 |
| 3 | 1 | U1,U2; V,U3 | Clayton | 0.9 | 0.31 |
| 3 | 2 | U2,U4; V,U3 | Clayton(90) | 5.1 | −0.72 |
| 4 | 1 | U1,U4; V,U2,U3 | Gauss | 0.2 | 0.13 |

(f) D-vine copula [59]: $(V,U_1,\ldots,U_5)^T$ follows a D-vine distribution with pair copulas given in Table 3.

Table 3: Pair copulas of the D-vine $C_{V,U_1,U_2,U_3,U_4,U_5}$, with their family, parameter, and Kendall's $\tau$ for setting (f)

| Tree | Edge | Conditioned; Conditioning | Family | Parameter | Kendall's τ |
|---|---|---|---|---|---|
| 1 | 1 | V,U1; | Clayton | 3.00 | 0.60 |
| 1 | 2 | U1,U2; | Joe | 8.77 | 0.80 |
| 1 | 3 | U2,U3; | Gumbel | 2.00 | 0.50 |
| 1 | 4 | U3,U4; | Gauss | 0.20 | 0.13 |
| 1 | 5 | U4,U5; | Indep. | 0.00 | 0.00 |
| 2 | 1 | V,U2; U1 | Gumbel | 5.00 | 0.80 |
| 2 | 2 | U1,U3; U2 | Frank | 9.44 | 0.65 |
| 2 | 3 | U2,U4; U3 | Joe | 2.78 | 0.49 |
| 2 | 4 | U3,U5; U4 | Gauss | 0.20 | 0.13 |
| 3 | 1 | V,U3; U1,U2 | Joe | 3.83 | 0.60 |
| 3 | 2 | U1,U4; U2,U3 | Frank | 6.73 | 0.55 |
| 3 | 3 | U2,U5; U3,U4 | Gauss | 0.29 | 0.19 |
| 4 | 1 | V,U4; U1,U2,U3 | Clayton | 2.00 | 0.50 |
| 4 | 2 | U1,U5; U2,U3,U4 | Gauss | 0.09 | 0.06 |
| 5 | 1 | V,U5; U1,U2,U3,U4 | Indep. | 0.00 | 0.00 |

(g) Similar to setting (a),
$Y = \sqrt{|2X_1 - X_2 + 0.5|} + (-0.5X_3+1)(0.1X_4^3) + (X_5,\ldots,X_{110})(0,\ldots,0)^T + \sigma\varepsilon,$
where $(X_1,\ldots,X_{110})^T \sim N_{110}(0,\Sigma)$ with the $(i,j)$th component of the covariance matrix $(\Sigma)_{i,j} = 0.5^{|i-j|}$, $\varepsilon \sim N(0,1)$, and $\sigma \in \{0.1,1\}$.

(h) Similar to (g), $Y = (X_1^3,\ldots,X_{110}^3)\boldsymbol{\beta} + \varepsilon$, where $(X_1,\ldots,X_{10})^T \sim N_{10}(0,\Sigma_A)$ with $(\Sigma_A)_{i,j} = 0.8^{|i-j|}$, and $(X_{11},\ldots,X_{110})^T \sim N_{100}(0,\Sigma_B)$ with $(\Sigma_B)_{i,j} = 0.4^{|i-j|}$. The first ten entries of $\boldsymbol{\beta}$ form a descending sequence from 2 to 1.1 in steps of 0.1, and the remaining entries are equal to 0. We assume $\varepsilon \sim N(0,\sigma)$ with $\sigma \in \{0.1,1\}$.
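For concreteness, the following R sketch generates one training sample from setting (a); it assumes the mvtnorm package for the multivariate normal draw, and the sample size and $\sigma$ shown are chosen arbitrarily.

```r
library(mvtnorm)  # rmvnorm; MASS::mvrnorm would work equally well

simulate_setting_a <- function(n, sigma = 1) {
  Sigma <- 0.5^abs(outer(1:4, 1:4, "-"))          # (Sigma)_{i,j} = 0.5^{|i-j|}
  X   <- rmvnorm(n, mean = rep(0, 4), sigma = Sigma)
  eps <- rnorm(n)
  Y <- sqrt(abs(2 * X[, 1] - X[, 2] + 0.5)) +
       (-0.5 * X[, 3] + 1) * (0.1 * X[, 4]^3) + sigma * eps
  data.frame(Y = Y, X)
}

train <- simulate_setting_a(n = 300, sigma = 1)
```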
Since the true regression quantiles are difficult to obtain in most settings, we consider the averaged check loss [41,40] and the interval score [11,25], instead of the out-of-sample averaged mean squared error used in Kraus and Czado [41], to evaluate the performance of the estimation methods. For a chosen $\alpha \in (0,1)$, the averaged check loss is defined as

(5)  $\widehat{\mathrm{CL}}_\alpha = \frac{1}{R}\sum_{r=1}^{R}\left\{\frac{1}{N_{\mathrm{eval}}}\sum_{n=1}^{N_{\mathrm{eval}}} \gamma_\alpha\big(Y^{\mathrm{eval}}_{r,n} - \hat{q}_\alpha(X^{\mathrm{eval}}_{r,n})\big)\right\},$

where $\gamma_\alpha$ is the check loss function. The interval score for the $(1-\alpha)\times 100\%$ prediction interval is defined as

(6)  $\widehat{\mathrm{IS}}_\alpha = \frac{1}{R}\sum_{r=1}^{R}\left\{\frac{1}{N_{\mathrm{eval}}}\sum_{n=1}^{N_{\mathrm{eval}}}\Big[\big(\hat{q}_{1-\alpha/2}(X^{\mathrm{eval}}_{r,n}) - \hat{q}_{\alpha/2}(X^{\mathrm{eval}}_{r,n})\big) + \frac{2}{\alpha}\big(\hat{q}_{\alpha/2}(X^{\mathrm{eval}}_{r,n}) - Y^{\mathrm{eval}}_{r,n}\big)I\{Y^{\mathrm{eval}}_{r,n} \le \hat{q}_{\alpha/2}(X^{\mathrm{eval}}_{r,n})\} + \frac{2}{\alpha}\big(Y^{\mathrm{eval}}_{r,n} - \hat{q}_{1-\alpha/2}(X^{\mathrm{eval}}_{r,n})\big)I\{Y^{\mathrm{eval}}_{r,n} > \hat{q}_{1-\alpha/2}(X^{\mathrm{eval}}_{r,n})\}\Big]\right\},$

and smaller interval scores are better.
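Both evaluation measures are straightforward to compute for a single replication; a minimal sketch (vectorized over the evaluation sample) is given below, with the averaging over replications left to the surrounding simulation loop.

```r
# Check loss (pinball loss) at level alpha, averaged over the evaluation sample
check_loss <- function(y, q_hat, alpha) {
  u <- y - q_hat
  mean(u * (alpha - (u < 0)))
}

# Interval score for the (1 - alpha) * 100% prediction interval,
# with q_lower = q_{alpha/2} and q_upper = q_{1 - alpha/2}
interval_score <- function(y, q_lower, q_upper, alpha) {
  mean((q_upper - q_lower) +
       (2 / alpha) * (q_lower - y) * (y < q_lower) +
       (2 / alpha) * (y - q_upper) * (y > q_upper))
}
```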
For settings (a)–(f), the estimation procedure for the two-step ahead C- or D-vine quantile regression follows Section 4.1 exactly, where the candidate sets at each step include all remaining predictors; the additional variable reduction described in Section 4.1.1 is not applied, so all possible conditional log-likelihoods are calculated in each step. On the contrary, due to the computational burden in settings (g) and (h), we set the number of candidates to $k=5$ and apply the additional variable reduction from Section 4.1.1. The chosen subset contains 20% of all possible choices, where 10% are predictors having the highest Pearson partial correlation with the response and the remaining 10% are chosen randomly from the remaining predictors. The performance of the C- and D-vine two-step ahead quantile regression is compared with that of the C- and D-vine one-step ahead quantile regression. The performance of the competing methods, evaluated by the averaged check loss at the 5, 50, and 95% quantile levels and the interval score for the 95% prediction interval, is recorded in Tables 4 and 5. All densities are estimated nonparametrically for a fair comparison.

Table 4 shows that the C- and D-vine two-step ahead regression models outperform the C- and D-vine one-step ahead regression models in five out of seven settings, the exceptions being settings (b) and (e), in which all models perform very similarly. When comparing regression models within the same vine copula class, the C-vine two-step ahead regression models outperform the C-vine one-step ahead models in five out of seven settings. Similarly, the D-vine two-step ahead models outperform the D-vine one-step ahead models in six out of seven scenarios, the exception being setting (b). In scenarios where there is no significant improvement through the second step, the one-step and two-step ahead approaches perform very similarly. This implies that the two-step ahead vine-based quantile regression greatly improves on the performance of the one-step ahead quantile regression. Table 5 indicates that in the high-dimensional settings, where the two-step ahead quantile regression was used in combination with the additional variable reduction from Section 4.1.1, the two-step ahead models outperform the one-step ahead models in three out of four simulation settings. In setting (g), all models show similar performance: with standard deviation $\sigma=0.1$, the D-vine one-step ahead model outperforms the other models, while with $\sigma=1$, the D-vine two-step ahead model performs best. In setting (h), we see a significant improvement of the two-step ahead models over the one-step ahead models; for both $\sigma=0.1$ and $\sigma=1$, the best performing model is the C-vine two-step ahead model. These results indicate that the newly proposed method improves the accuracy of the one-step ahead quantile regression in high dimensions, even when the computational complexity of the two-step ahead model is eased by using a number of candidates that is small compared to the number of predictors.

Table 4: Out-of-sample predictions $\widehat{\mathrm{IS}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.5}$, $\widehat{\mathrm{CL}}_{0.95}$ for settings (a)–(f); the first four value columns refer to $N_{\mathrm{train}}=300$, the last four to $N_{\mathrm{train}}=1{,}000$

| Setting | Model | IS_0.05 (300) | CL_0.05 (300) | CL_0.5 (300) | CL_0.95 (300) | IS_0.05 (1,000) | CL_0.05 (1,000) | CL_0.5 (1,000) | CL_0.95 (1,000) |
|---|---|---|---|---|---|---|---|---|---|
| (a), σ=0.1 ** | D-vine One-step | 55.54 | 0.66 | 0.16 | 0.51 | 55.89 | 0.67 | 0.15 | 0.50 |
| | D-vine Two-step | 43.33 | 0.47 | 0.10 | 0.41 | 40.74 | 0.45 | 0.09 | 0.37 |
| | C-vine One-step | 53.51 | 0.64 | 0.16 | 0.49 | 54.52 | 0.66 | 0.15 | 0.49 |
| | C-vine Two-step | 42.01 | 0.45 | 0.10 | 0.40 | 40.04 | 0.44 | 0.09 | 0.37 |
| (a), σ=1 ** | D-vine One-step | 154.35 | 1.63 | 0.45 | 1.62 | 162.12 | 1.70 | 0.43 | 1.66 |
| | D-vine Two-step | 148.53 | 1.57 | 0.45 | 1.56 | 156.77 | 1.63 | 0.42 | 1.62 |
| | C-vine One-step | 151.60 | 1.61 | 0.45 | 1.60 | 160.78 | 1.68 | 0.43 | 1.65 |
| | C-vine Two-step | 148.41 | 1.56 | 0.45 | 1.56 | 156.79 | 1.63 | 0.42 | 1.62 |
| (b) * | D-vine One-step | 118.75 | 1.29 | 0.42 | 1.30 | 125.33 | 1.37 | 0.40 | 1.36 |
| | D-vine Two-step | 119.10 | 1.30 | 0.42 | 1.30 | 125.24 | 1.36 | 0.40 | 1.36 |
| | C-vine One-step | 119.08 | 1.30 | 0.41 | 1.30 | 125.12 | 1.36 | 0.40 | 1.36 |
| | C-vine Two-step | 118.90 | 1.30 | 0.42 | 1.30 | 125.30 | 1.36 | 0.40 | 1.36 |
| (c) ** | D-vine One-step | 2908.90 | 30.54 | 8.55 | 30.42 | 3064.78 | 31.69 | 8.15 | 31.47 |
| | D-vine Two-step | 2853.52 | 30.21 | 8.70 | 29.95 | 3041.95 | 31.61 | 8.20 | 31.26 |
| | C-vine One-step | 2859.23 | 30.24 | 8.59 | 29.95 | 3046.52 | 31.64 | 8.18 | 31.25 |
| | C-vine Two-step | 2850.10 | 30.19 | 8.64 | 29.84 | 3042.46 | 31.62 | 8.20 | 31.23 |
| (d) ** | D-vine One-step | 86.40 | 0.92 | 0.24 | 0.91 | 91.11 | 0.96 | 0.22 | 0.95 |
| | D-vine Two-step | 83.54 | 0.90 | 0.24 | 0.88 | 89.56 | 0.96 | 0.22 | 0.92 |
| | C-vine One-step | 84.99 | 0.91 | 0.24 | 0.90 | 90.40 | 0.96 | 0.22 | 0.94 |
| | C-vine Two-step | 83.33 | 0.90 | 0.24 | 0.87 | 89.47 | 0.96 | 0.22 | 0.92 |
| (e) * | D-vine One-step | 10.59 | 0.11 | 0.03 | 0.11 | 10.49 | 0.11 | 0.03 | 0.11 |
| | D-vine Two-step | 10.32 | 0.10 | 0.03 | 0.11 | 10.26 | 0.09 | 0.02 | 0.11 |
| | C-vine One-step | 10.23 | 0.11 | 0.03 | 0.10 | 10.02 | 0.10 | 0.02 | 0.10 |
| | C-vine Two-step | 10.35 | 0.10 | 0.03 | 0.11 | 10.33 | 0.10 | 0.02 | 0.11 |
| (f) ** | D-vine One-step | 13.79 | 0.16 | 0.04 | 0.14 | 13.70 | 0.16 | 0.04 | 0.14 |
| | D-vine Two-step | 8.44 | 0.09 | 0.02 | 0.08 | 8.28 | 0.09 | 0.02 | 0.08 |
| | C-vine One-step | 12.62 | 0.14 | 0.04 | 0.13 | 12.23 | 0.13 | 0.04 | 0.13 |
| | C-vine Two-step | 9.09 | 0.10 | 0.02 | 0.09 | 8.93 | 0.09 | 0.02 | 0.08 |

Lower values indicate better performance.
With ** we denote the scenarios in which there is an improvement through the second step, and with * the scenarios in which the models perform similarly.

Table 5: Out-of-sample predictions $\widehat{\mathrm{IS}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.5}$, $\widehat{\mathrm{CL}}_{0.95}$ for settings (g) and (h) with $N_{\mathrm{train}}=100$; the first four value columns refer to $\sigma=0.1$, the last four to $\sigma=1$

| Setting | Model | IS_0.05 (σ=0.1) | CL_0.05 (σ=0.1) | CL_0.5 (σ=0.1) | CL_0.95 (σ=0.1) | IS_0.05 (σ=1) | CL_0.05 (σ=1) | CL_0.5 (σ=1) | CL_0.95 (σ=1) |
|---|---|---|---|---|---|---|---|---|---|
| (g), σ=0.1 * / σ=1 ** | D-vine One-step | 19.63 | 0.26 | 0.25 | 0.23 | 53.38 | 0.69 | 0.67 | 0.65 |
| | D-vine Two-step | 20.48 | 0.26 | 0.26 | 0.25 | 52.17 | 0.68 | 0.65 | 0.63 |
| | C-vine One-step | 19.73 | 0.25 | 0.25 | 0.24 | 53.62 | 0.69 | 0.67 | 0.65 |
| | C-vine Two-step | 19.79 | 0.25 | 0.25 | 0.25 | 52.35 | 0.67 | 0.65 | 0.64 |
| (h), σ=0.1 ** / σ=1 ** | D-vine One-step | 558.36 | 6.92 | 6.98 | 7.04 | 554.18 | 6.87 | 6.93 | 6.99 |
| | D-vine Two-step | 529.51 | 6.46 | 6.62 | 6.78 | 531.30 | 6.64 | 6.64 | 6.64 |
| | C-vine One-step | 514.08 | 6.05 | 6.43 | 6.81 | 512.96 | 6.39 | 6.41 | 6.44 |
| | C-vine Two-step | 479.66 | 5.87 | 6.00 | 6.12 | 483.92 | 6.05 | 6.05 | 6.05 |

Lower values indicate better performance. With ** we denote the scenarios in which there is an improvement through the second step, and with * the scenarios in which the models perform similarly.

The proposed two-step algorithms are computationally more intensive than the one-step algorithms. We report the computation time, averaged over $R=100$ replications run on 100 parallel cores (Xeon Gold 6140 CPUs at 2.6 GHz), for the settings (g) and (h) in which $p > N_{\mathrm{train}}$. Since the computational intensity depends on the number of pair-copula estimations and on the number of candidates, which are the same for settings (g) and (h), these high-dimensional settings have similar computation times and we report a single average. The average computation time for the one-step ahead (C- and D-vine) approach is 83.01 minutes, compared to 200.28 minutes for the two-step ahead (C- and D-vine) approach. Hence, with the variable reduction from Section 4.1.1, the two-step algorithms roughly double the time consumption of the one-step algorithms in exchange for prediction accuracy.

6 Real data examples

We test the proposed methods on two real data sets: the concrete data set from Yeh [64], corresponding to $p \le N$, and the riboflavin data set from Bühlmann and van de Geer [10], corresponding to $p > N$. For both, the performance of the four competing algorithms is evaluated by the averaged check loss defined in Eq. (5) at the 5, 50, and 95% quantile levels and by the 95% prediction interval score defined in Eq. (6), randomly splitting the data set into training and evaluation sets 100 times.
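To make this evaluation protocol concrete, here is a minimal sketch of one train/evaluation split using the one-step ahead D-vine benchmark as implemented in the vinereg package (the two-step ahead algorithms of Section 4 are not part of vinereg). The data frame dat with response column y is a placeholder, and the helpers check_loss() and interval_score() are the ones sketched after Eq. (6).

```r
library(vinereg)

# dat: placeholder data frame with the response in column "y"
evaluate_split <- function(dat, n_train, alpha = 0.05) {
  idx   <- sample(nrow(dat), n_train)
  train <- dat[idx, , drop = FALSE]
  test  <- dat[-idx, , drop = FALSE]

  # One-step ahead D-vine quantile regression with nonparametric pair copulas
  fit <- vinereg(y ~ ., data = train, family_set = "nonparametric")
  q   <- predict(fit, newdata = test,
                 alpha = c(alpha / 2, 0.05, 0.5, 0.95, 1 - alpha / 2))

  c(CL_0.05 = check_loss(test$y, q[, 2], 0.05),
    CL_0.5  = check_loss(test$y, q[, 3], 0.5),
    CL_0.95 = check_loss(test$y, q[, 4], 0.95),
    IS_0.05 = interval_score(test$y, q[, 1], q[, 5], alpha))
}
```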
6.1 Concrete data set

The Concrete data set was initially used in Yeh [64] and is available at the UCI Machine Learning Repository [18]. It contains 1,030 samples in total. Our objective is quantile prediction of the concrete compressive strength, which is a highly nonlinear function of age and ingredients. The predictors are age (AgeDay, counted in days) and seven physical measurements of the concrete ingredients (given in kg per m³ of mixture): cement (CementComp), blast furnace slag (BlastFur), fly ash (FlyAsh), water (WaterComp), superplasticizer, coarse aggregate (CoarseAggre), and fine aggregate (FineAggre). We randomly split the data set into a training set of 830 samples and an evaluation set of 200 samples; the random splitting is repeated 100 times. The performance of the proposed C- and D-vine two-step ahead quantile regression, compared with the C- and D-vine one-step ahead quantile regression, is evaluated by the measures reported in Table 6, averaged over the 100 model fits. Given the small number of predictors, it is not unexpected that the results of the four algorithms differ more clearly here than in most simulation settings. The two-step ahead approach improves on the one-step ahead approach for both the C- and D-vine-based models, and the C-vine model appears more appropriate for modeling the dependence structure in this data set. Overall, the C-vine two-step ahead algorithm performs best on the Concrete data set in terms of the out-of-sample scores IS_0.05, CL_0.05, CL_0.5, and CL_0.95, as seen in Table 6.

Figure 4 shows the marginal effect plots, based on the fitted quantiles from the C-vine two-step model, for the three most influential predictors. The marginal effect of a predictor is its expected impact on the quantile estimator, where the expectation is taken over all other predictors; it is estimated using all fitted conditional quantiles and smoothed over the predictor considered (a short sketch of this computation is given after Figure 4).

Table 6: Concrete data set, out-of-sample prediction scores IS_0.05, CL_0.05, CL_0.5, and CL_0.95. Lower values indicate better performance.

Model             IS_0.05   CL_0.05   CL_0.5   CL_0.95
D-vine One-step   1032.32     10.75     2.76     10.52
D-vine Two-step    987.10     10.54     2.78      9.82
C-vine One-step    976.75     10.65     2.70      9.45
C-vine Two-step    967.00     10.52     2.64      9.45

Figure 4: Marginal effect plots for the three most influential predictors on the concrete compressive strength for α values of 0.05 (red), 0.5 (green), and 0.95 (blue).
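As an illustration of how such marginal effect curves can be obtained, the sketch below smooths the fitted conditional quantiles against a single predictor. It is a simplified version of the construction described above and reuses the hypothetical objects fit and train from the previous sketch; AgeDay is one of the predictor columns.

```r
# Sketch: marginal effect of one predictor on the fitted conditional quantiles.
alphas <- c(0.05, 0.5, 0.95)
q_fit  <- predict(fit, newdata = train, alpha = alphas)  # one column per alpha level

plot_marginal_effect <- function(x, q_fit, alphas, xlab = "predictor") {
  # lowess() returns the smoothed quantiles ordered by increasing x.
  smoothed <- sapply(q_fit, function(q) lowess(x, q)$y)
  matplot(sort(x), smoothed, type = "l", lty = 1, col = c("red", "green", "blue"),
          xlab = xlab, ylab = "fitted conditional quantile")
  legend("topleft", legend = paste("alpha =", alphas), lty = 1,
         col = c("red", "green", "blue"), bty = "n")
}

plot_marginal_effect(train$AgeDay, q_fit, alphas, xlab = "AgeDay")
```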
6.2 Riboflavin data set

The riboflavin data set, available in the R package hdi, concerns quantile prediction of the log-transformed production rate of Bacillus subtilis from the log-transformed expression levels of 4,088 genes. To reduce the computational burden, we pre-select the 100 genes with the highest variance [10], resulting in a subset with p = 100 log-transformed gene expressions and N = 71 samples. Random splitting of this subset into a training set of 61 samples and an evaluation set of 10 samples is repeated 100 times. For the C- and D-vine two-step ahead quantile regression, the number of candidates is set to k = 10. Additionally, to further reduce the computational burden, the additional variable selection from Section 4.1.1 is applied, with the chosen subset containing 25% of all possible choices: 15% are the predictors with the highest partial correlation with the log-transformed Bacillus subtilis production rate, and the remaining 10% are chosen at random from the remaining predictors. The performance of the competing quantile regression models is reported in Table 7. The proposed C-vine two-step ahead quantile regression is the best performing model and outperforms both the D-vine one-step ahead quantile regression of Kraus and Czado [41] and the C-vine one-step ahead quantile regression by a large margin. The second best performing method is the D-vine two-step ahead model which, while performing slightly worse than the C-vine two-step ahead model, also clearly outperforms both the C-vine and D-vine one-step ahead models.

Since the predictors enter the C- and D-vine models in descending order of their contribution to maximizing the conditional log-likelihood, the order indicates how influential each predictor is for the response variable. It is often of practical interest to know which gene expressions are most important for prediction. Since the random splitting of the subset is repeated R = 100 times, the gene expressions are ranked sequentially by frequency: for each position in the order, we select the gene expression most frequently appearing at that position, excluding the gene expressions chosen for earlier positions (a short sketch of this ranking procedure is given after Figure 5). For instance, the most important gene expression is the one most frequently ranked first; the second most important gene is the one most frequently appearing as the second element of the order, excluding the most important gene selected in the previous step. The ten most influential gene expressions according to the C- and D-vine one- and two-step ahead models are listed in Table 8. Figure 5 shows the marginal effect plots, based on the fitted quantiles from the C-vine two-step model, for the ten most influential predictors of the log-transformed Bacillus subtilis production rate.

Table 7: Riboflavin data set, out-of-sample prediction scores IS_0.05, CL_0.05, CL_0.5, and CL_0.95. Lower values indicate better performance.

Model             IS_0.05   CL_0.05   CL_0.5   CL_0.95
D-vine One-step     33.83      0.44     0.42      0.41
D-vine Two-step     30.57      0.44     0.38      0.33
C-vine One-step     34.52      0.49     0.43      0.38
C-vine Two-step     28.59      0.41     0.36      0.30

Table 8: Ten most influential gene expressions on the conditional quantile function, ranked by their position in the order.

Model / Position    1      2      3      4      5      6      7      8      9      10
D-vine One-step     GGT    YCIC   MTA    RPSE   YVAK   THIK   ANSB   SPOVB  YVZB   YQJB
D-vine Two-step     MTA    RPSE   THIK   YMFE   YCIC   sigM   PGM    YACC   YVQF   YKPB
C-vine One-step     GGT    YCIC   MTA    RPSE   HITB   FMBAB  PHRC   YBAE   PGM    YHEF
C-vine Two-step     MTA    RPSE   THIK   YCIC   YURU   PGM    sigM   YACC   YKRM   ASNB

Figure 5: Marginal effect plots for the ten most influential predictors of the log-transformed Bacillus subtilis production rate for α = 0.5.
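The frequency-based ranking described above can be sketched as follows, assuming orders is a hypothetical list with one element per replication, each giving the selected predictor order as a character vector (the model fitting itself is omitted).

```r
# Sketch of the frequency-based importance ranking over R = 100 replications.
# 'orders' is a list of character vectors, each giving one replication's predictor order.
rank_importance <- function(orders, top = 10) {
  chosen <- character(0)
  for (pos in seq_len(top)) {
    # Predictors appearing at position 'pos', excluding those already ranked.
    at_pos <- unlist(lapply(orders, function(o) o[pos]))
    at_pos <- at_pos[!is.na(at_pos) & !(at_pos %in% chosen)]
    chosen <- c(chosen, names(which.max(table(at_pos))))
  }
  chosen
}

# rank_importance(orders)  # returns the ten most influential predictors, as in Table 8
```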
7 Summary and discussion

In this article, we introduce a two-step ahead forward selection algorithm for nonparametric C- and D-vine copula-based quantile regression. The inclusion of future information, obtained by considering the next tree in the two-step ahead algorithm, yields a markedly less greedy sequential selection procedure than the existing one-step ahead algorithm for D-vine-based quantile regression of Kraus and Czado [41]. We extend the vine-based quantile regression framework to include C-vine copulas, providing an additional choice for the dependence structure. Furthermore, for the first time, nonparametric bivariate copulas are used to construct vine copula-based quantile regression models. Nonparametric estimation overcomes the problem of possible family misspecification in the parametric estimation of bivariate copulas and allows for even more flexibility in dependence estimation. Additionally, under mild regularity conditions, the nonparametric conditional quantile estimator is shown to be consistent.

The extensive simulation study, covering settings and data sets with different dimensions, strengths of dependence, and tail dependencies, shows that the two-step ahead algorithm outperforms the one-step ahead algorithm in most scenarios. The results for the Concrete and riboflavin data sets are especially interesting, as the C-vine two-step ahead algorithm shows a significant improvement over the other algorithms. These findings provide strong evidence of the need to model the dependence structure with a C-vine copula. In addition, the two-step ahead algorithm allows the computational intensity to be controlled independently of the data dimensions, through the number of candidate predictors and the additional variable selection discussed in Section 5. Thus, fitting vine-based quantile regression models in high dimensions becomes feasible. As seen in several simulation settings, there is a significant gain from allowing dependence structures other than the D-vine in quantile regression. A further research direction is the development of similar forward selection algorithms for general R-vine tree structures while optimizing the conditional log-likelihood.

At each step of the vine building stage, we compare equal-sized models with the same number of variables, and the conditional log-likelihood is well suited for such a comparison. For other questions, such as choosing between a C-vine, D-vine, or R-vine structure, information criteria might come in handy. When maximum likelihood estimation is employed at all stages, the Akaike information criterion (AIC) [1], the Bayesian information criterion (BIC) [55], and the focused information criterion (FIC) [16] might be used immediately. Ko et al. [33] studied FIC and AIC specifically for the selection of parametric copulas. The copula information criterion in the spirit of the AIC by Gronneberg and Hjort [26] can be used for selection among copula models with empirically estimated margins, while Ko and Hjort [32] studied such a criterion for parametric copula models. We plan a deeper investigation of the use of information criteria for nonparametrically estimated copulas, and for vines in particular. Such a study is beyond the scope of this article, but it would also be of interest for deriving stopping criteria when building the vines.

Nonparametrically estimated vines offer considerable flexibility, whereas their parametric counterparts enjoy simplicity.
An interesting route for further research is to combine parametric and nonparametric components in the construction of the vines in a way that draws the most benefit from each. This should be made tangible through criteria that provide guidance on which components should be modelled nonparametrically and which are best modelled parametrically. For some types of models, such a choice between a parametric and a nonparametric model has been investigated by Jullum and Hjort [31] via the focused information criterion. This and alternative methods that take the effective degrees of freedom into account are worth further investigation for vine copula models.
1IntroductionAs a robust alternative to the ordinary least squares regression, which estimates the conditional mean, quantile regression [38] focuses on the conditional quantiles. This method has been studied extensively in statistics, economics, and finance. The pioneer literature by Koenker [35] investigated linear quantile regression systematically. It presented properties of the estimators including asymptotic normality and consistency, under various assumptions such as independence of the observations, independent and identically distributed (i.i.d.) errors with continuous distribution, and predictors having bounded second moment. Subsequent extensions of linear quantile regression have been intensively studied, see for example adapting quantile regression in the Bayesian framework [66], for longitudinal data [34], time-series models [63], high-dimensional models with l1{l}_{1}-regularizer [7], nonparametric estimation by kernel weighted local linear fitting [65], by additive models [37,20], etc. The theoretical analysis of the above-mentioned extensions is based on imposing additional assumptions such as samples that are i.i.d. (see for example Belloni and Chernozhukov [7], Yu and Jones [65]), or that are generated by a known additive function (see for example Koenker [37,34]). Such assumptions, which guarantee the performance of the proposed methods for certain data structures, cause concerns in applications due to the uncertainty of the real-world data structures. Bernard and Czado [8] addressed other potential concerns such as quantile crossings and model-misspecification, when the dependence structure of the response variables and the predictors does not follow a Gaussian copula. Flexible models without assuming homoscedasticity, or a linear relationship between the response and the predictors are of interest. Recent research on dealing with this issue includes quantile forests [2,42,44] inspired by the earlier work of random forests [9] and modeling conditional quantiles using copulas (see also Chen et al. [13], Noh et al. [50,51]).Vine copulas in the context of conditional quantile prediction have been investigated by Kraus and Czado [41] using drawable vine copulas (D-vines) and by Chang and Joe [11] and, most recently, Zhu et al. [67] using restricted regular vines (R-vines). The approach of Kraus and Czado [11] is based on first finding the locally optimal regular vine structure among all predictors and then adding the response to each selected tree in the vine structure as a leaf, as also followed by Bauer and Czado [5] in the context of non-Gaussian conditional independence testing. The procedure in Chang and Joe [11] allows for a recursive determination of the response quantiles, which is restricted through the prespecified dependence structure among predictors. The latter might not be the one maximizing the conditional response likelihood, which is the main focus in regression setup. The approach of Kraus and Czado [41] is based on optimizing the conditional log-likelihood and selecting predictors sequentially until no improvement of the conditional log-likelihood is achieved. This approach based on the conditional response likelihood is more appropriate to determine the associated response quantiles. Furthermore, the intensive simulation study in Kraus and Czado [41] showed the superior performance of the D-vine copulas-based quantile regression compared to various quantile regression methods, i.e. 
linear quantile regression [38], boosting additive quantile regression [35,37,20], nonparametric quantile regression [43], and semiparametric quantile regression [51]. In parallel to our work, Zhu et al. [67] proposed an extension of this D-vine-based forward regression to a restricted R-vine forward regression with comparable performance to the D-vine regression. Thus, the D-vine quantile regression will be our benchmark model.We extend the method of Kraus and Czado [41] in two ways: (1) our approach is applicable to both C-vine and D-vine copulas; (2) a two-step ahead construction is introduced, instead of the one-step ahead construction. Since the two-step ahead construction is the main difference between our method and Kraus and Czado [41], we further explain the second point in more detail. Our proposed method proceeds by adding predictors to the model sequentially. However, in contrast to Kraus and Czado [41] with only one variable ahead, our new approach proposes to look up two variables ahead for selecting the variable to be added in each step. The general idea of this two-step ahead algorithm is as follows: in each step of the algorithm, we study combinations of two variables to find the variable, which in combination with the other improves the conditional log-likelihood the most. Thus, in combination with a forward selection method, this two-step ahead algorithm allows us to construct nonparametric quantile estimators that improve the conditional log-likelihood in each step and, most importantly, take possible future improvements into account. Our method is applicable to both low and high-dimensional data. By construction, quantile crossings are avoided. All marginal densities and copulas are estimated nonparametrically, allowing more flexibility than parametric specifications. Kraus and Czado [41] addressed the necessity and possible benefit of the nonparametric estimation of bivariate copulas in the quantile regression framework. This construction permits a large variety of dependence structures, resulting in a well-performing conditional quantile estimator. Moreover, extending to the C-vine copula class, in addition to the D-vine copulas, provides greater flexibility.The article is organized as follows. Section 2 introduces the general setup, the concept of C-vine and D-vine copulas, and the nonparametric approach for estimating copula densities. Section 3 describes the vine-based approach for quantile regression. The new two-step ahead forward selection algorithms are described in Section 4. We investigate in Proposition 4.1 the consistency of the conditional quantile estimator for given variable orders. The finite sample performance of the vine-based conditional quantile estimator is evaluated in Section 5 by several quantile-related measurements in various simulation settings. We apply the newly introduced algorithms to low- and high-dimensional real data in Section 6. In Section 7, we conclude and discuss possible directions of future research.2Theoretical backgroundConsider the random vector X=(X1,…,Xd)T{\boldsymbol{X}}={\left({X}_{1},\ldots ,{X}_{d})}^{T}with observed values x=(x1,…,xd)T{\boldsymbol{x}}={\left({x}_{1},\ldots ,{x}_{d})}^{T}, joint distribution and density function FFand ff, marginal distribution and density functions FXj{F}_{{X}_{j}}and fXj{f}_{{X}_{j}}for Xj,j=1,…,d{X}_{j},j=1,\ldots ,d. Sklar’s theorem [57] allows us to represent any multivariate distribution in terms of its marginals FXj{F}_{{X}_{j}}and a copula CCencoding the dependence structure. 
In the continuous case, CCis unique and satisfies F(x)=C(FX1(x1),…,FXd(xd))F\left({\boldsymbol{x}})=C\left({F}_{{X}_{1}}\left({x}_{1}),\ldots ,{F}_{{X}_{d}}\left({x}_{d})\hspace{-0.08em})and f(x)=c(FX1(x1),…,FXd(xd))[∏j=1dfXj(xj)]f\left({\boldsymbol{x}})=c\left({F}_{{X}_{1}}\left({x}_{1}),\ldots ,{F}_{{X}_{d}}\left({x}_{d}))\left[\mathop{\prod }\limits_{j=1}^{d}{f}_{{X}_{j}}\left({x}_{j})], where ccis the density function of the copula CC. To characterize the dependence structure of X{\boldsymbol{X}}, we transform each Xj{X}_{j}to a uniform variable Uj{U}_{j}by applying the probability integral transform, i.e. Uj≔FXj(Xj),j=1,…,d{U}_{j}:= {F}_{{X}_{j}}\left({X}_{j}),\hspace{0.33em}j=1,\ldots ,d. Then the random vector U=(U1,…,Ud)T{\boldsymbol{U}}={\left({U}_{1},\ldots ,{U}_{d})}^{T}with observed values (u1,…,ud)T{\left({u}_{1},\ldots ,{u}_{d})}^{T}has a copula as a joint distribution denoted as CU1,…,Ud{C}_{{U}_{1},\ldots ,{U}_{d}}with associated copula density function cU1,…,Ud{c}_{{U}_{1},\ldots ,{U}_{d}}. While the catalogue of bivariate parametric copula families is large, this is not true for d>2d\gt 2. Therefore, conditioning was applied to construct multivariate copulas using only bivariate copulas as building blocks. Joe [28] gave the first pair copula construction for dddimensions in terms of distribution functions, whereas Bedford and Cooke [6] independently developed constructions in terms of densities together with a graphical building plan, called a regular vine tree structure. It consists of a set of linked trees T1,…,Td{T}_{1},\ldots ,{T}_{d}(edges in tree Tj{T}_{j}become nodes in tree Tj+1{T}_{j+1}) satisfying a proximity condition, which allows us to identify all possible constructions. Each edge of the trees is associated with a pair copula CUi,Uj;UD{C}_{{U}_{i},{U}_{j};{{\boldsymbol{U}}}_{D}}, where DDis a subset of indices not containing i,ji,j. In this case, the set {i,j}\left\{i,j\right\}is called the conditioned set, while DDis the conditioning set. A joint density using the class of vine copulas is then the product of all pair copulas identified by the tree structure evaluated at appropriate conditional distribution functions FXj∣XD{F}_{{X}_{j}| {{\boldsymbol{X}}}_{D}}and the product of the marginal densities fXj,j=1,…,d{f}_{{X}_{j}},j=1,\ldots ,d. A detailed treatment of vine copulas together with estimation methods and model choice approaches are given, for example in the studies of Joe [30] and Czado [17].Since we are interested in simple copula-based estimation methods for conditional quantiles, we restrict to two subclasses of the regular vine tree structure, namely, the D- and C-vine structure. We show later that these structures allow us to express conditional distribution and quantiles in closed form. In the D-vine tree structure all trees are paths, i.e. all nodes have degree ≤2\le 2. Nodes with degree 1 are called leaf nodes. A C-vine structure occurs, when all trees are stars with a root node in the centre. The right and left panels of Figure 1 illustrate a D-vine and a C-vine tree sequence in four dimensions, respectively.Figure 1C-vine tree sequence (left panel) and a D-vine tree sequence (right panel) in four dimensions.For these sub classes we can easily give the corresponding vine density [17, Chapter 4]. 
For a D-vine density we have a permutation s1,…,sd{s}_{1},\ldots ,{s}_{d}of 1,…,d1,\ldots ,dsuch that (1)f(x1,…,xd)=∏j=1d−1∏i=1d−jcUsi,Usi+j;Usi+1,…,Usi+j−1(FXsi∣Xsi+1,…,Xsi+j−1(xsi∣xsi+1,…,xsi+j−1),FXsi+j∣Xsi+1,…,Xsi+j−1(xsi+j∣xsi+1,…,xsi+j−1))⋅∏k=1dfXsk(xsk),\begin{array}{rcl}f\left({x}_{1},\ldots ,{x}_{d})& =& \mathop{\displaystyle \prod }\limits_{j=1}^{d-1}\mathop{\displaystyle \prod }\limits_{i=1}^{d-j}{c}_{{U}_{{s}_{i}},{U}_{{s}_{i+j}};{U}_{{s}_{i+1}},\ldots ,{U}_{{s}_{i+j-1}}}({F}_{{X}_{{s}_{i}}| {X}_{{s}_{i+1}},\ldots ,{X}_{{s}_{i+j-1}}}\left({x}_{{s}_{i}}| {x}_{{s}_{i+1}},\ldots ,{x}_{{s}_{i+j-1}}),\\ & & {F}_{{X}_{{s}_{i+j}}| {X}_{{s}_{i+1}},\ldots ,{X}_{{s}_{i+j-1}}}\left({x}_{{s}_{i+j}}| {x}_{{s}_{i+1}},\ldots ,{x}_{{s}_{i+j-1}}))\cdot \mathop{\displaystyle \prod }\limits_{k=1}^{d}{f}_{{X}_{{s}_{k}}}\left({x}_{{s}_{k}}),\end{array}while for a C-vine density the following representation holds (2)f(x1,…,xd)=∏j=1d−1∏i=1d−jcUsj,Usj+i;Us1,…,Usj−1(FXsj∣Xs1,…,Xsj−1(xsj∣xs1,…,xsj−1),FXsj+i∣Xs1,…,Xsj−1(xsj+i∣xs1,…,xsj−1))⋅∏k=1dfXsk(xsk).\begin{array}{rcl}f\left({x}_{1},\ldots ,{x}_{d})& =& \mathop{\displaystyle \prod }\limits_{j=1}^{d-1}\mathop{\displaystyle \prod }\limits_{i=1}^{d-j}{c}_{{U}_{{s}_{j}},{U}_{{s}_{j+i}};{U}_{{s}_{1}},\ldots ,{U}_{{s}_{j-1}}}({F}_{{X}_{{s}_{j}}| {X}_{{s}_{1}},\ldots ,{X}_{{s}_{j-1}}}\left({x}_{{s}_{j}}| {x}_{{s}_{1}},\ldots ,{x}_{{s}_{j-1}}),\\ & & {F}_{{X}_{{s}_{j+i}}| {X}_{{s}_{1}},\ldots ,{X}_{{s}_{j-1}}}\left({x}_{{s}_{j+i}}| {x}_{{s}_{1}},\ldots ,{x}_{{s}_{j-1}}))\cdot \mathop{\displaystyle \prod }\limits_{k=1}^{d}{f}_{{X}_{{s}_{k}}}\left({x}_{{s}_{k}}).\end{array}To determine the needed conditional distribution FXj∣XD{F}_{{X}_{j}| {{\boldsymbol{X}}}_{D}}in Eqs. (1) and (2) for appropriate choices of jjand DD, the recursion discussed in Joe [28] is available. Using uj=FXj∣XD(xj∣xD){u}_{j}={F}_{{X}_{j}| {{\boldsymbol{X}}}_{D}}\left({x}_{j}| {{\boldsymbol{x}}}_{D})for j=1,…,dj=1,\ldots ,dwe can express them as compositions of h-functions. These are defined in general as hUi∣Uj;UD(ui∣uj;uD)=∂∂ujCUi,Uj;UD(ui,uj;uD){h}_{{U}_{i}| {U}_{j};{{\boldsymbol{U}}}_{D}}\left({u}_{i}| {u}_{j};{{\boldsymbol{u}}}_{D})=\frac{\partial }{\partial {u}_{j}}{C}_{{U}_{i},{U}_{j};{{\boldsymbol{U}}}_{D}}\left({u}_{i},{u}_{j};{{\boldsymbol{u}}}_{D}). Additionally, we made in Eqs. (1) and (2) the simplifying assumption ([17], Section 5.4) that is, the copula function CUi,Uj;UD{C}_{{U}_{i},{U}_{j};{{\boldsymbol{U}}}_{D}}does not depend on the specific conditioning value of uD{{\boldsymbol{u}}}_{D}, i.e. CUi,Uj;UD(ui,uj;uD)=CUi,Uj;UD(ui,uj){C}_{{U}_{i},{U}_{j};{{\boldsymbol{U}}}_{D}}\left({u}_{i},{u}_{j};{{\boldsymbol{u}}}_{D})={C}_{{U}_{i},{U}_{j};{{\boldsymbol{U}}}_{D}}\left({u}_{i},{u}_{j}). The dependence on uD{{\boldsymbol{u}}}_{D}in Eqs. (1) and (2) is completely captured by the arguments of the pair copulas. This assumption is often made for tractability reasons in higher dimensions (Haff et al. [27] and Stoeber et al. [58]). 
It implies further that the h-function satisfies hUi∣Uj;UD(ui∣uj;uD)=∂∂ujCUi,Uj;UD(ui,uj)=CUi∣Uj;UD(ui∣uj){h}_{{U}_{i}| {U}_{j};{{\boldsymbol{U}}}_{D}}\left({u}_{i}| {u}_{j};{{\boldsymbol{u}}}_{D})=\frac{\partial }{\partial {u}_{j}}{C}_{{U}_{i},{U}_{j};{{\boldsymbol{U}}}_{D}}\left({u}_{i},{u}_{j})={C}_{{U}_{i}| {U}_{j};{{\boldsymbol{U}}}_{D}}\left({u}_{i}| {u}_{j})and is independent of uD{{\boldsymbol{u}}}_{D}.2.1Nonparametric estimators of the copula densities and h-functionsThere are many methods to estimate a bivariate copula density cUi,Uj{c}_{{U}_{i},{U}_{j}}nonparametrically. Examples are the transformation estimator [12], the transformation local likelihood estimator [22], the tapered transformation estimator [61], the beta kernel estimator [12], and the mirror-reflection estimator [24]. Among the above-mentioned kernel estimators, the transformation local likelihood estimator [22] was found by Nagler et al. [47] to have an overall best performance. The estimator is implemented in the R packages kdecopula [45] and rvinecopulib [49] using Gaussian kernels. We review its construction in Appendix A. To satisfy the copula definition, it is scaled to have uniform margins.As mentioned above the simplifying assumption implies that hUi∣Uj;UD(ui∣uj;uD){h}_{{U}_{i}| {U}_{j};{{\boldsymbol{U}}}_{D}}\left({u}_{i}| {u}_{j};{{\boldsymbol{u}}}_{D})is independent of specific values of uD{{\boldsymbol{u}}}_{D}. Thus, it is sufficient to show how the h-function hUi∣Uj=CUi∣Uj(ui∣uj){h}_{{U}_{i}| {U}_{j}}={C}_{{U}_{i}| {U}_{j}}\left({u}_{i}| {u}_{j})can be estimated nonparametrically. For this, we use as estimator CˆUi∣Uj(ui∣uj)=∫0uicˆUi,Uj(u˜i,uj)du˜i,{\hat{C}}_{{U}_{i}| {U}_{j}}\left({u}_{i}| {u}_{j})=\underset{0}{\overset{{u}_{i}}{\int }}{\hat{c}}_{{U}_{i},{U}_{j}}\left({\tilde{u}}_{i},{u}_{j}){\rm{d}}{\tilde{u}}_{i},where cˆUi,Uj{\hat{c}}_{{U}_{i},{U}_{j}}is one of the aforementioned nonparametric estimators of the bivariate copula density of (Ui,Uj)\left({U}_{i},{U}_{j})for which it holds that cˆUi,Uj{\hat{c}}_{{U}_{i},{U}_{j}}integrates to 1 and has uniform margins.3Vine-based quantile regressionIn the general regression framework, the predictive ability of a set of variables X=(X1,…,Xp)T{\boldsymbol{X}}={\left({X}_{1},\ldots ,{X}_{p})}^{T}for the response Y∈RY\in {\mathbb{R}}is studied. The main interest of vine-based quantile regression is to predict the α∈(0,1)\alpha \in \left(0,1)quantile qα(x1,…,xp)=FY∣X1,…,Xp−1(α∣x1,…,xp){q}_{\alpha }\left({x}_{1},\ldots ,{x}_{p})={F}_{Y| {X}_{1},\ldots ,{X}_{p}}^{-1}\left(\alpha | {x}_{1},\ldots ,{x}_{p})of the response variable YYgiven X{\boldsymbol{X}}by using a copula-based model of (Y,X)T{\left(Y,{\boldsymbol{X}})}^{T}. As shown in Kraus and Czado [41], this can be expressed as (3)FY∣X1,…,Xp−1(α∣x1,…,xp)=FY−1(CV∣U1,…,Up−1(α∣FX1(x1),…,FXp(xp))),{F}_{Y| {X}_{1},\ldots ,{X}_{p}}^{-1}\left(\alpha | {x}_{1},\ldots ,{x}_{p})={F}_{Y}^{-1}\left({C}_{V| {U}_{1},\ldots ,{U}_{p}}^{-1}\left(\alpha | {F}_{{X}_{1}}\left({x}_{1}),\ldots ,{F}_{{X}_{p}}\left({x}_{p}))),where CV∣U1,…,Up{C}_{V| {U}_{1},\ldots ,{U}_{p}}is the conditional distribution function of V=FY(Y)V={F}_{Y}\left(Y)given Uj=FXj(Xj)=uj{U}_{j}={F}_{{X}_{j}}\left({X}_{j})={u}_{j}for j=1,…,pj=1,\ldots ,p, with corresponding density cV∣U1,…,Up{c}_{V| {U}_{1},\ldots ,{U}_{p}}, and CV,U1,…,Up{C}_{V,{U}_{1},\ldots ,{U}_{p}}denotes the (p+1)\left(p+1)-dimensional copula associated with the joint distribution of (Y,X)T{\left(Y,{\boldsymbol{X}})}^{T}. In view of Section 1, we have d=p+1d=p+1. 
An estimate of qα(x1,…,xp){q}_{\alpha }\left({x}_{1},\ldots ,{x}_{p})can be obtained using estimated marginal quantile functions FˆY−1{\hat{F}}_{Y}^{-1}, FˆXj−1,j=1,…,p{\hat{F}}_{{X}_{j}}^{-1},\hspace{0.33em}j=1,\ldots ,p, and the estimated conditional distribution function CˆV∣U1,…,Up−1{\hat{C}}_{V| {U}_{1},\ldots ,{U}_{p}}^{-1}giving qˆα(x1,…,xp)=FˆY−1(CˆV∣U1,…,Up−1(α∣FˆX1(x1),…FˆXp(xp))){\hat{q}}_{\alpha }\left({x}_{1},\ldots ,{x}_{p})={\hat{F}}_{Y}^{-1}\left({\hat{C}}_{V| {U}_{1},\ldots ,{U}_{p}}^{-1}\left(\alpha | {\hat{F}}_{{X}_{1}}\left({x}_{1}),\ldots {\hat{F}}_{{X}_{p}}\left({x}_{p}))).In general, CV,U1,…,Up{C}_{V,{U}_{1},\ldots ,{U}_{p}}can be any (p+1)\left(p+1)-dimensional multivariate copula, however, for certain vine structures the corresponding conditional distribution function CV∣U1,…,Up{C}_{V| {U}_{1},\ldots ,{U}_{p}}can be obtained in closed form not requiring numerical integration. For D-vine structures this is possible and has been already used in the studies of Kraus and Czado [41]. Tepegjozova [59] showed that this is also the case for certain C-vine structures. More precisely, the copula CV,U1,…,Up{C}_{V,{U}_{1},\ldots ,{U}_{p}}with D-vine structure allows us to express CV∣U1,…,Up{C}_{V| {U}_{1},\ldots ,{U}_{p}}in a closed form if and only if the response VVis a leaf node in the first tree of the tree sequence. For a C-vine structure we need, that the node containing the response variable VVin the conditioned set is not a root node in any tree. Additional flexibility in using such D- and C-vine structures is achieved by allowing for nonparametric pair-copulas as building blocks.The order of the predictors within the tree sequences itself is a free parameter with direct impact on the target function CV∣U1,…,Up{C}_{V| {U}_{1},\ldots ,{U}_{p}}and thus, on the corresponding prediction performance of qα(x1,…,xp){q}_{\alpha }\left({x}_{1},\ldots ,{x}_{p}). For this, we recall the concept of a node order for C- and D-vine copulas introduced in the study by Tepegjozova [59]. A D-vine copula denoted by CD{{\mathcal{C}}}_{D}has order OD(CD)=(V,Ui1,…,Uip),{{\mathcal{O}}}_{D}\left({{\mathcal{C}}}_{D})=\left(V,{U}_{{i}_{1}},\ldots ,{U}_{{i}_{p}}),if the response VVis the first node of the first tree T1{T}_{1}and Uik{U}_{{i}_{k}}is the (k+1k+1)th node of T1{T}_{1}, for k=1,…,pk=1,\ldots ,p. A C-vine copula CC{{\mathcal{C}}}_{C}has order OC(CC)=(V,Ui1,…,Uip),{{\mathcal{O}}}_{C}\left({{\mathcal{C}}}_{C})=\left(V,{U}_{{i}_{1}},\ldots ,{U}_{{i}_{p}}),if Ui1{U}_{{i}_{1}}is the root node in the first tree T1{T}_{1}, Ui2Ui1{U}_{{i}_{2}}{U}_{{i}_{1}}is the root node in the second tree T2{T}_{2}, and UikUik−1;Ui1,…,Uik−2{U}_{{i}_{k}}{U}_{{i}_{k-1}};{U}_{{i}_{1}},\ldots ,{U}_{{i}_{k-2}}is the root node in the kth tree Tk{T}_{k}for k=3,…,p−1k=3,\ldots ,p-1.Now our goal is to find an optimal order of D- or C-vine copula model with regard to a fit measure. This measure has to allow us to quantify the explanatory power of a model. One such measure is the estimated conditional copula log-likelihood function as a fit measure. For NNi.i.d. observations v≔(v(1),…,v(N))T{\boldsymbol{v}}:= {\left({v}^{\left(1)},\ldots ,{v}^{\left(N)})}^{T}and uj≔(uj(1),…,uj(N))T{{\boldsymbol{u}}}_{j}:= {\left({u}_{j}^{\left(1)},\ldots ,{u}_{j}^{\left(N)})}^{T}, for j=1,…,pj=1,\ldots ,pof the random vector (V,U1,…,Up)T{\left(V,{U}_{1},\ldots ,{U}_{p})}^{T}we fit a C- or D-vine copula with order O(Cˆ)=(V,U1,…,Up){\mathcal{O}}\left(\hat{{\mathcal{C}}})=\left(V,{U}_{1},\ldots ,{U}_{p})using nonparametric pair copulas. 
We denote this copula by Cˆ\hat{{\mathcal{C}}}, then the fitted conditional log-likelihood can be determined as cll(Cˆ,v,(u1,…,up))=∑n=1NlncˆV∣U1,…,Up(v(n)∣u1(n),…,up(n))=∑n=1NlncˆV,U1(v(n),u1(n))+∑j=2plncˆV,Uj∣U1,…,Uj−1(CˆV∣U1,…,Uj−1(v(n)∣u1(n),…,uj−1(n)),CˆUj∣U1,…,Uj−1(uj(n)∣u1(n),…,uj−1(n))).\begin{array}{rcl}cll\left(\hat{{\mathcal{C}}},{\boldsymbol{v}},\left({{\boldsymbol{u}}}_{1},\ldots ,{{\boldsymbol{u}}}_{p}))& =& \mathop{\displaystyle \sum }\limits_{n=1}^{N}\mathrm{ln}{\hat{c}}_{V| {U}_{1},\ldots ,{U}_{p}}\left({v}^{\left(n)}| {u}_{1}^{\left(n)},\ldots ,{u}_{p}^{\left(n)})\\ & =& \mathop{\displaystyle \sum }\limits_{n=1}^{N}\left[\mathrm{ln}{\hat{c}}_{V,{U}_{1}}\left({v}^{\left(n)},{u}_{1}^{\left(n)})+\mathop{\displaystyle \sum }\limits_{j=2}^{p}\mathrm{ln}{\hat{c}}_{V,{U}_{j}| {U}_{1},\ldots ,{U}_{j-1}}\left({\hat{C}}_{V| {U}_{1},\ldots ,{U}_{j-1}}\left({v}^{\left(n)}| {u}_{1}^{\left(n)},\ldots ,{u}_{j-1}^{\left(n)}),{\hat{C}}_{{U}_{j}| {U}_{1},\ldots ,{U}_{j-1}}\left({u}_{j}^{\left(n)}| {u}_{1}^{\left(n)},\ldots ,{u}_{j-1}^{\left(n)}))\right].\end{array}Penalizations for model complexity when parametric pair copulas are used can be added as shown in the study by Tepegjozova [59]. To define an appropriate penalty in the case of using nonparametric pair copulas is an open research question (see also Section 7).4Forward selection algorithmsHaving a set of pppredictors, there are p!p\!different orders that uniquely determine p!p\!C-vines and p!p\!D-vines. Fitting and comparing all of them are computationally inefficient. Thus, the idea is to have an algorithm that will sequentially choose the elements of the order, so that at every step the resulting model for the prediction of the conditional quantiles has the highest conditional log-likelihood. Building upon the idea of Kraus and Czado [41] for the one-step ahead D-vine regression, we propose an algorithm which allows for more flexibility and which is less greedy, given the intention to obtain a globally optimal C- or D-vine fit. The algorithm builds the C- or D-vine step by step, starting with an order consisting of only the response variable VV. Each step adds one of the predictors to the order based on the improvement of the conditional log-likelihood, while taking into account the possibility of future improvement, i.e. extending our view two steps ahead in the order. As discussed in Section 2.1, the pair copulas at each step are estimated nonparametrically in contrast to the parametric approach of Kraus and Czado [41]. We present the implementation for both C-vine and D-vine-based quantile regression in a single algorithm, in which the user decides whether to fit a C-vine or D-vine model based on the background knowledge of dependency structures in the data. Implementation for a large data set is computationally challenging; therefore, randomization is introduced to guarantee computational efficiency in high dimensions.4.1Two-step ahead forward selection algorithm for C- and D-vine-based quantile regressionInput and data preprocessing: Consider NNi.i.d observations y≔(y(1),…,y(N)){\boldsymbol{y}}:= ({y}^{\left(1)},\ldots ,{y}^{\left(N)})and xj≔(xj(1),…,xj(N)){{\boldsymbol{x}}}_{j}:= \left({x}_{j}^{\left(1)},\ldots ,{x}_{j}^{\left(N)})for j=1,…,pj=1,\ldots ,p, from the random vector (Y,X1,…,Xp)T{\left(Y,{X}_{1},\ldots ,{X}_{p})}^{T}. The input data are on the x-scale, but in order to fit bivariate copulas we need to transform it to the u-scale using the probability integral transform. 
Since the marginal distributions are unknown we estimate them, i.e. FY{F}_{Y}and FXj{F}_{{X}_{j}}, for j=1,…,pj=1,\ldots ,p, are estimated using a univariate nonparametric kernel density estimator with the R package kde1d [48]. This results in the pseudo copula data vˆ(n)≔FˆY(y(n)){\hat{v}}^{\left(n)}:= {\hat{F}}_{Y}({y}^{\left(n)})\hspace{0.33em}and uˆj(n)≔FˆXj(xj(n)),{\hat{u}}_{j}^{\left(n)}:= {\hat{F}}_{{X}_{j}}\left({x}_{j}^{\left(n)}),for n=1,…,N,j=1,…,pn=1,\ldots ,N,\hspace{0.33em}j=1,\ldots ,p. The normalized marginals (zz-scale) are defined as Zj≔Φ−1(Uj){Z}_{j}:= {\Phi }^{-1}\left({U}_{j})for j=1,…,p,j=1,\ldots ,p,and ZV≔Φ−1(V){Z}_{V}:= {\Phi }^{-1}\left(V), where Φ\Phi denotes the standard normal distribution function.Step 1: To reduce computational complexity, we perform a pre-selection of the predictors based on Kendall’s τ\tau . This is motivated by the fact that Kendall’s τ\tau is rank-based, therefore, invariant with respect to monotone transformations of the marginals and can be expressed in terms of pair copulas. Using the pseudo copula data (vˆ,uˆj)={vˆ(n),uˆj(n)∣n=1,…,N}\left(\hat{{\boldsymbol{v}}},{\hat{{\boldsymbol{u}}}}_{j})=\left\{{\hat{v}}^{\left(n)},{\hat{u}}_{j}^{\left(n)}| n=1,\ldots ,N\right\}, estimates τˆVUj{\hat{\tau }}_{V{U}_{j}}of the Kendall’s τ\tau values between the response VV, and all possible predictors Uj{U}_{j}for j=1,…,pj=1,\ldots ,p, are obtained. For a given k≤pk\le p, the kklargest estimates of ∣τˆVUj∣| {\hat{\tau }}_{V{U}_{j}}| are selected and the corresponding indices q1,…,qk{q}_{1},\ldots ,{q}_{k}are identified such that ∣τˆVUq1∣≥∣τˆVUq2∣≥⋯≥∣τˆVUqk∣≥∣τˆVUqk+1∣≥⋯≥∣τˆVUqp∣| {\hat{\tau }}_{V{U}_{{q}_{1}}}| \ge | {\hat{\tau }}_{V{U}_{{q}_{2}}}| \hspace{0.33em}\ge \cdots \ge \hspace{0.33em}| {\hat{\tau }}_{V{U}_{{q}_{k}}}| \ge | {\hat{\tau }}_{V{U}_{{q}_{k+1}}}| \hspace{0.33em}\ge \cdots \ge \hspace{0.33em}| {\hat{\tau }}_{V{U}_{{q}_{p}}}| . The parameter kkis a hyper-parameter and therefore, subject to tuning. To obtain a parsimonious model, we suggest a kkcorresponding to 5–20% of the total number of predictors. The kkcandidate predictors and the corresponding candidate index set of step 1 are defined as Uq1,…,Uqk{U}_{{q}_{1}},\ldots ,{U}_{{q}_{k}}and K1={q1,…,qk}{K}_{1}=\{{q}_{1},\ldots ,{q}_{k}\}, respectively. For all c∈K1c\in {K}_{1}and j∈{1,…,p}⧹{c}j\in \{1,\ldots ,p\}\setminus \{c\}the candidate two-step ahead C- or D-vine copulas are defined as the three-dimensional copulas Cc,j1{{\mathcal{C}}}_{c,j}^{1}with order O(Cc,j1)=(V,Uc,Uj){\mathcal{O}}\left({{\mathcal{C}}}_{c,j}^{1})=\left(V,{U}_{c},{U}_{j}). 
The first predictor is added to the order based on the conditional log-likelihood of the candidate two-step ahead C- or D-vine copulas, Cc,j1{{\mathcal{C}}}_{c,j}^{1}, given as cll(Cc,j1,vˆ,(uˆc,uˆj))=∑n=1N[logcˆV,Uc(vˆ(n),uˆc(n))+logcˆV,Uj∣Uc(hˆV∣Uc(vˆ(n)∣uˆc(n)),hˆUj∣Uc(uˆj(n)∣uˆc(n)))].cll({{\mathcal{C}}}_{c,j}^{1},\hat{{\boldsymbol{v}}},\left({\hat{{\boldsymbol{u}}}}_{c},{\hat{{\boldsymbol{u}}}}_{j}))=\mathop{\sum }\limits_{n=1}^{N}\left[\log {\hat{c}}_{V,{U}_{c}}\left({\hat{v}}^{\left(n)},{\hat{u}}_{c}^{\left(n)})+\log {\hat{c}}_{V,{U}_{j}| {U}_{c}}\left({\hat{h}}_{V| {U}_{c}}\left({\hat{v}}^{\left(n)}| {\hat{u}}_{c}^{\left(n)}),{\hat{h}}_{{U}_{j}| {U}_{c}}\left({\hat{u}}_{j}^{\left(n)}| {\hat{u}}_{c}^{\left(n)}))].For each candidate predictor Uc{U}_{c}, the maximal two-step ahead conditional log-likelihood at step 1, cllc1cl{l}_{c}^{1}, is defined as cllc1≔maxj∈{1,…,p}⧹{c}cll(Cc,j1,vˆ,(uˆc,uˆj)),∀c∈K1cl{l}_{c}^{1}:= {\max }_{j\in \left\{1,\ldots ,p\right\}\setminus \left\{c\right\}}cll({{\mathcal{C}}}_{c,j}^{1},\hat{{\boldsymbol{v}}},\left({\hat{{\boldsymbol{u}}}}_{c},{\hat{{\boldsymbol{u}}}}_{j})),\hspace{0.33em}\forall c\in {K}_{1}. Finally, based on the maximal two-step ahead conditional log-likelihood at step 1, cllc1cl{l}_{c}^{1}, the index t1{t}_{1}is chosen as t1≔argmaxc∈K1cllc1{t}_{1}:= \arg {\max }_{c\in {K}_{1}}cl{l}_{c}^{1}, and the corresponding candidate predictor Ut1{U}_{{t}_{1}}is selected as the first predictor added to the order. An illustration of the vine tree structure of the candidate two-step ahead copulas Cc,j1{{\mathcal{C}}}_{c,j}^{1}, in the case of fitting a D-vine model, with order OD(Cc,j1)=(V,Uc,Uj){{\mathcal{O}}}_{D}\left({{\mathcal{C}}}_{c,j}^{1})=\left(V,{U}_{c},{U}_{j})is given in Figure 2. Finally, the current optimal fit after the first step is the C-vine or D-vine copula, C1{{\mathcal{C}}}_{1}with order O(C1)=(V,Ut1){\mathcal{O}}\left({{\mathcal{C}}}_{1})=\left(V,{U}_{{t}_{1}}).Figure 2VVis fixed as the first node of T1{T}_{1}and the first candidate predictor to be included in the model, Uc{U}_{c}(grey), is chosen based on the conditional log-likelihood of the two-step ahead copula including the predictor Uj{U}_{j}(grey filled).Step r{\boldsymbol{r}}: After r−1r-1steps, the current optimal fit is the C- or D-vine copula Cr−1{{\mathcal{C}}}_{r-1}with order O(Cr−1)=(V,Ut1,…,Utr−1){\mathcal{O}}\left({{\mathcal{C}}}_{r-1})=\left(V,{U}_{{t}_{1}},\ldots ,{U}_{{t}_{r-1}}). At each previous step ii, the order of the current optimal fit is sequentially updated with the predictor Uti{U}_{{t}_{i}}for i=1,…,r−1i=1,\ldots ,r-1. At the rth step, the next predictor candidate is to be included. To do so, the set of potential candidates is narrowed based on a partial correlation measure. Defining a partial Kendall’s τ\tau is not straightforward and requires the notion of a partial copula, which is the average over the conditional copula given the values of the conditioning values (e.g. see Gijbels and Matterne [23] and references therein). In addition, the computation in the case of multivariate conditioning is very demanding and still an open research problem. Therefore, we took a pragmatic view and base our candidate selection on partial correlation. Due to the assumption of Gaussian margins inherited to the Pearson’s partial correlation, the estimates are computed on the zz-scale. 
Estimates of the empirical Pearson’s partial correlation, ρˆZV,Zj;Zt1,…,Ztr−1{\hat{\rho }}_{{Z}_{V},{Z}_{j};{Z}_{{t}_{1}},\ldots ,{Z}_{{t}_{r-1}}}, between the normalized response variable VVand available predictors Uj{U}_{j}for j∈{1,2,…,p}⧹{t1,…,tr−1}j\in \left\{1,2,\ldots ,p\right\}\setminus \left\{{t}_{1},\ldots ,{t}_{r-1}\right\}are obtained. Similar to the first step, a set of candidate predictors of size kkis selected based on the largest values of ∣ρˆZV,Zj;Zt1,…,Ztr−1∣| {\hat{\rho }}_{{Z}_{V},{Z}_{j};{Z}_{{t}_{1}},\ldots ,{Z}_{{t}_{r-1}}}| and the corresponding indices q1,…,qk{q}_{1},\ldots ,{q}_{k}. The kkcandidate predictors and the corresponding candidate index set of step rrare defined as Uq1,…,Uqk{U}_{{q}_{1}},\ldots ,{U}_{{q}_{k}}and the set Kr={q1,…,qk}{K}_{r}=\{{q}_{1},\ldots ,{q}_{k}\}, respectively. For all c∈Krc\in {K}_{r}and j∈{1,2,…,p}⧹{t1,…,tr−1,c}j\in \{1,2,\ldots ,p\}\setminus \{{t}_{1},\ldots ,{t}_{r-1},c\}, the candidate two-step ahead C- or D-vine copulas are defined as the copulas Cc,jr{{\mathcal{C}}}_{c,j}^{r}with order O(Cc,jr)=(V,Ut1,…,Utr−1,Uc,Uj){\mathcal{O}}\left({{\mathcal{C}}}_{c,j}^{r})=\left(V,{U}_{{t}_{1}},\ldots ,{U}_{{t}_{r-1}},{U}_{c},{U}_{j}). There are k(p−r)k\left(p-r)different candidate two-step ahead C- or D-vine copulas Cc,jr{{\mathcal{C}}}_{c,j}^{r}(since we have kkcandidates for the one-step ahead extension Uc{U}_{c}, and for each, p−(r−1)−1p-\left(r-1)-1two step ahead extensions Uj{U}_{j}). Their corresponding conditional log-likelihood functions are given as cll(Cc,jr,vˆ,(uˆt1…uˆtr−1,uˆc,uˆj))=cll(Cr−1,vˆ,(uˆt1…uˆtr−1))+∑n=1NlogcˆVUc;Ut1,…,Utr−1(CˆV∣Ut1,…,Utr−1(vˆ(n)∣uˆt1(n),…,uˆtr−1(n)),CˆUc∣Ut1,…,Utr−1(uˆc(n)∣uˆt1(n),…,uˆtr−1(n)))+∑n=1NlogcˆVUj;Ut1,…,Utr−1,Uc(CˆV∣Ut1,…,Utr−1,Uc(vˆ(n)∣uˆt1(n),…,uˆtr−1(n),uˆc(n)),CˆUj∣Ut1,…,Utr−1,Uc(uˆj(n)∣uˆt1(n),…,uˆtr−1(n),uˆc(n))).\begin{array}{l}cll({{\mathcal{C}}}_{c,j}^{r},\hat{{\boldsymbol{v}}},\left({\hat{{\boldsymbol{u}}}}_{{t}_{1}}\ldots {\hat{{\boldsymbol{u}}}}_{{t}_{r-1}},{\hat{{\boldsymbol{u}}}}_{c},{\hat{{\boldsymbol{u}}}}_{j}))\\ \hspace{1.0em}=cll({{\mathcal{C}}}_{r-1},\hat{{\boldsymbol{v}}},\left({\hat{{\boldsymbol{u}}}}_{{t}_{1}}\ldots {\hat{{\boldsymbol{u}}}}_{{t}_{r-1}}))\\ \hspace{2.0em}+\mathop{\displaystyle \sum }\limits_{n=1}^{N}\log {\hat{c}}_{V{U}_{c};{U}_{{t}_{1}},\ldots ,{U}_{{t}_{r-1}}}({\hat{C}}_{V| {U}_{{t}_{1}},\ldots ,{U}_{{t}_{r-1}}}\left({\hat{v}}^{(n)}| {\hat{u}}_{{t}_{1}}^{(n)},\ldots ,{\hat{u}}_{{t}_{r-1}}^{\left(n)}),{\hat{C}}_{{U}_{c}| {U}_{{t}_{1}},\ldots ,{U}_{{t}_{r-1}}}\left({\hat{u}}_{c}^{\left(n)}| {\hat{u}}_{{t}_{1}}^{\left(n)},\ldots ,{\hat{u}}_{{t}_{r-1}}^{\left(n)}))\\ \hspace{2.0em}+\mathop{\displaystyle \sum }\limits_{n=1}^{N}\log {\hat{c}}_{V{U}_{j};{U}_{{t}_{1}},\ldots ,{U}_{{t}_{r-1}},{U}_{c}}({\hat{C}}_{V| {U}_{{t}_{1}},\ldots ,{U}_{{t}_{r-1}},{U}_{c}}\left({\hat{v}}^{\left(n)}| {\hat{u}}_{{t}_{1}}^{\left(n)},\ldots ,{\hat{u}}_{{t}_{r-1}}^{\left(n)},{\hat{u}}_{c}^{\left(n)}),\\ \hspace{3.025em}{\hat{C}}_{{U}_{j}| {U}_{{t}_{1}},\ldots ,{U}_{{t}_{r-1}},{U}_{c}}\left({\hat{u}}_{j}^{\left(n)}| {\hat{u}}_{{t}_{1}}^{\left(n)},\ldots ,{\hat{u}}_{{t}_{r-1}}^{\left(n)},{\hat{u}}_{c}^{\left(n)})).\end{array}The rth predictor is then added to the order based on the maximal two-step ahead conditional log-likelihood at Step rr, cllcrcl{l}_{c}^{r}, defined as (4)cllcr≔maxj∈{1,2,…,p}⧹{t1,…,tr−1,c}cll(Cc,jr,vˆ,(uˆt1…uˆtr−1,uˆc,uˆj)),∀c∈Kr.cl{l}_{c}^{r}:= \mathop{\max }\limits_{j\in \{1,2,\ldots ,p\}\setminus \{{t}_{1},\ldots 
,{t}_{r-1},c\}}cll({{\mathcal{C}}}_{c,j}^{r},\hat{{\boldsymbol{v}}},\left({\hat{{\boldsymbol{u}}}}_{{t}_{1}}\ldots {\hat{{\boldsymbol{u}}}}_{{t}_{r-1}},{\hat{{\boldsymbol{u}}}}_{c},{\hat{{\boldsymbol{u}}}}_{j})),\hspace{0.33em}\forall c\in {K}_{r}.The index tr{t}_{r}is chosen as tr≔argmaxc∈Krcllcr{t}_{r}:= \arg {\max }_{c\in {K}_{r}}\hspace{0.33em}cl{l}_{c}^{r}, and the predictor Utr{U}_{{t}_{r}}is selected as the rth predictor of the order. An illustration of the vine tree structure of the candidate two-step ahead copulas Cc,jr{{\mathcal{C}}}_{c,j}^{r}for a D-vine model with order OD(Cc,jr)=(V,Ut1,…,Utr−1,Uc,Uj){{\mathcal{O}}}_{D}\left({{\mathcal{C}}}_{c,j}^{r})=\left(V,{U}_{{t}_{1}},\ldots ,{U}_{{t}_{r-1}},{U}_{c},{U}_{j})is given in Figure 3. At this step, the current optimal fit is the C-vine or D-vine copula Cr{{\mathcal{C}}}_{r}, with order O(Cr)=(V,Ut1,…Utr){\mathcal{O}}\left({{\mathcal{C}}}_{r})=\left(V,{U}_{{t}_{1}},\ldots {U}_{{t}_{r}}). The iterative procedure is repeated until all predictors are included in the order of the C- or D-vine copula model.Figure 3In step rr, the current optimal fit, Cr−1{{\mathcal{C}}}_{r-1}(black), is extended by one more predictor, Uc{U}_{c}(grey), to obtain the new current optimal fit Cr{{\mathcal{C}}}_{r}(black and grey), based on the conditional log-likelihood of the two-step ahead copula Cc,jr{{\mathcal{C}}}_{c,j}^{r}which also includes the predictor Uj{U}_{j}(grey filled) (In the figure, we use the shortened notation Ut1:r−1{U}_{{t}_{1:r-1}}instead of writing Ut1,…,Utr−1{U}_{{t}_{1}},\ldots ,{U}_{{t}_{r-1}}and we use Ut1:r−1,c{U}_{{t}_{1:r-1},c}instead of Ut1,…,Utr−1,Uc{U}_{{t}_{1}},\ldots ,{U}_{{t}_{r-1}},{U}_{c}.).4.1.1Additional variable reduction in higher dimensionsThe above search procedure requires calculating p−rp-rconditional log-likelihoods for each candidate predictor at a given step rr. This leads to calculating a total of (p−r)k\left(p-r)kconditional log-likelihoods, where kkis the number of candidates. For pplarge, this procedure would cause a heavy computational burden. Hence, the idea is to reduce the number of conditional log-likelihoods calculated for each candidate predictor. This is achieved by reducing the size of the set, over which the maximal two-step ahead conditional log-likelihood cllcrcl{l}_{c}^{r}in Eq. (4), is computed. Instead of over the set {1,2,…,p}⧹{t1,…,tr−1,c}\left\{1,2,\ldots ,p\right\}\setminus \left\{{t}_{1},\ldots ,{t}_{r-1},c\right\}, the maximum can be taken over an appropriate subset. This subset can then be chosen either based on the largest Pearson’s partial correlations in absolute value denoted as ∣ρˆZV,Zj;Zt1,…,Ztr−1,Zc∣| {\hat{\rho }}_{{Z}_{V},{Z}_{j};{Z}_{{t}_{1}},\ldots ,{Z}_{{t}_{r-1}},{Z}_{c}}| , by random selection, or a combination of the two. The selection method and the size of reduction are user-decided.4.2Consistency of the conditional quantile estimatorThe conditional quantile function on the original scale in Eq. (3) requires the inverse of the marginal distribution function of YY. Following Kraus and Czado [41] and Noh et al. [50], the marginal cumulative distribution functions FY{F}_{Y}and FXj,j=1,…p{F}_{{X}_{j}},j=1,\ldots p, are estimated nonparametrically to reduce the bias caused by model misspecification. Examples of nonparametric estimators for the marginal distributions FY{F}_{Y}and FXj{F}_{{X}_{j}}, are the continuous kernel smoothing estimator [52] and the transformed local likelihood estimator in the univariate case [21]. 
Using a Gaussian kernel, the above two estimators of the marginal distribution are uniformly strong consistent. When all inverses of the h-functions are also estimated nonparametrically, we establish the consistency of the conditional quantile estimator FˆY∣X1,…,Xp−1{\hat{F}}_{Y| {X}_{1},\ldots ,{X}_{p}}^{-1}in Proposition 4.1 for fixed variable orders. By showing the uniform consistency, Proposition 4.1 gives an indication of the performance of the conditional quantile estimator FˆY∣X1,…,Xp−1{\hat{F}}_{Y| {X}_{1},\ldots ,{X}_{p}}^{-1}for fixed variable orders, while combining the consistent estimators of FY{F}_{Y}, FXj{F}_{{X}_{j}}, and bivariate copula densities. Under the consistency guarantee, the numerical performance of FˆY∣X1,…,Xp−1{\hat{F}}_{Y| {X}_{1},\ldots ,{X}_{p}}^{-1}investigated by extensive simulation studies is presented in Section 5.Proposition 4.1Let the inverse of the marginal distribution functions FY{F}_{Y}and FXj{F}_{{X}_{j}}j=1,…,pj=1,\ldots ,pbe uniformly continuous and estimated nonparametrically, and let the inverse of the h-functions expressing the conditional quantile estimator CV∣U1,…,Up−1{C}_{V| {U}_{1},\ldots ,{U}_{p}}^{-1}be uniformly continuous and estimated nonparametrically in the interior of the support of bivariate copulas, i.e. [δ,1−δ]2,δ→0+{\left[\delta ,1-\delta ]}^{2},\delta \to {0}_{+}. If estimators of the inverse of marginal functions FˆY−1{\hat{F}}_{Y}^{-1}, FˆXj−1{\hat{F}}_{{X}_{j}}^{-1}, j=1,…,pj=1,\ldots ,p, are uniformly strong consistent on the support [δ,1−δ],δ→0+\left[\delta ,1-\delta ],\delta \to {0}_{+}, and the estimators of the inverse of h-functions composing the conditional quantile estimator CV∣U1,…,Up−1{C}_{V| {U}_{1},\ldots ,{U}_{p}}^{-1}are uniformly strong consistent, then the estimator FˆY∣X1,…,Xp−1(α∣x1,…,xp){\hat{F}}_{Y| {X}_{1},\ldots ,{X}_{p}}^{-1}\left(\alpha | {x}_{1},\ldots ,{x}_{p})is also uniformly strong consistent.If estimators of the inverse of marginal functions FˆY−1{\hat{F}}_{Y}^{-1}, FˆXj−1{\hat{F}}_{{X}_{j}}^{-1}, j=1,…,pj=1,\ldots ,p, are at least weak consistent, and the estimators of the inverse of h-functions are also at least weak consistent, then the estimator FˆY∣X1,…,Xp−1(α∣x1,…,xp){\hat{F}}_{Y| {X}_{1},\ldots ,{X}_{p}}^{-1}\left(\alpha | {x}_{1},\ldots ,{x}_{p})is weak consistent.For more details about uniform continuous functions see the studies by Bartle and Sherbert ([4] Section 5.4), Kolmogorov and Fomin ([39], p. 109, Def. 1). For a definition of strong uniform consistency or convergence with probability one, see the studies by Ryzin [54], Silverman [56], and Durrett ([19], p. 16), while for a definition for weak consistency or convergence in probability, see the studies by Durrett ([19], p. 53). The strong uniform consistency result in Proposition 1 requires additionally that all estimators of FˆY−1{\hat{F}}_{Y}^{-1}, FˆXj−1{\hat{F}}_{{X}_{j}}^{-1}, for j=1,…pj=1,\ldots p, are strong uniformly consistent on a truncated compact interval [δ,1−δ],δ→0+\left[\delta ,1-\delta ],\delta \to {0}_{+}. Although not directly used in the proof of Proposition 4.1 in Appendix B, the truncation is an essential condition for guaranteeing the strong uniform consistency of all estimators of the inverse of the marginal distributions (i.e. 
estimators of quantile functions), see the studies by Cheng [15,14], Van Keilegom and Veraverbeke [60].5Simulation studyThe proposed two-step ahead forward selection algorithms for C- and D-vine-based quantile regression, from Section 4.1, are implemented in the statistical language R [53]. The D-vine one-step ahead algorithm is implemented in the R package vinereg [46]. In the simulation study from R [41], it is shown that the D-vine one-step ahead forward selection algorithm performs better or similar, compared to other state-of-the-art quantile methods, boosting additive quantile regression [20,36], nonparametric quantile regression [43], semi-parametric quantile regression [51], and linear quantile regression [38]. Thus, we use the one-step ahead algorithm as the benchmark competitive method in the simulation study. We set up the following simulation settings given below. Each setting is replicated for R=100R=100times. In each simulation replication, we randomly generate Ntrain{N}_{{\rm{train}}}samples used for fitting the appropriate nonparametric vine-based quantile regression models. Additionally, another Neval=12Ntrain{N}_{{\rm{eval}}}=\frac{1}{2}{N}_{{\rm{train}}}sample for settings (a)–(f) and Neval=Ntrain{N}_{{\rm{eval}}}={N}_{{\rm{train}}}for settings (g) and (h) are generated for predicting conditional quantiles from the models. settings (a)–(f) are designed to test quantile prediction accuracy of nonparametric C- or D-vine quantile regression in cases where p≤Np\le N; hence, we set Ntrain=1,000{N}_{{\rm{train}}}=\hspace{0.1em}\text{1,000}\hspace{0.1em}or 300. settings (g) and (h) test quantile prediction accuracy in cases where p>Np\gt N; hence, we set Ntrain=100{N}_{{\rm{train}}}=100.(a)Simulation setting M5 from Kraus and Czado [41]: Y=∣2X1−X2+0.5∣+(−0.5X3+1)(0.1X43)+σε,Y=\sqrt{| 2{X}_{1}-{X}_{2}+0.5| }+\left(-0.5{X}_{3}+1)\left(0.1{X}_{4}^{3})+\sigma \varepsilon ,with ε∼N(0,1),σ∈{0.1,1}\varepsilon \hspace{0.33em} \sim \hspace{0.33em}N\left(0,1),\sigma \in \left\{0.1,1\right\}, (X1,X2,X3,X4)T∼N4(0,Σ){\left({X}_{1},{X}_{2},{X}_{3},{X}_{4})}^{T}\hspace{0.33em} \sim \hspace{0.33em}{N}_{4}\left(0,\Sigma ), and the (i,j)\left(i,j)th component of the covariance matrix given as (Σ)i,j=0.5∣i−j∣{\left(\Sigma )}_{i,j}=0.{5}^{| i-j| }.(b)(Y,X1,…,X5)T{\left(Y,{X}_{1},\ldots ,{X}_{5})}^{T}follows a mixture of two six-dimensional TTcopulas with degrees of freedom equal to 3 and mixture probabilities 0.3 and 0.7. 
Association matrices $R_1$, $R_2$ and marginal distributions are recorded in Table 1.

Table 1: Association matrices of the multivariate $t$ copula and marginal distributions for setting (b)

$$R_1 = \begin{pmatrix} 1 & 0.6 & 0.5 & 0.6 & 0.7 & 0.1 \\ 0.6 & 1 & 0.5 & 0.5 & 0.5 & 0.5 \\ 0.5 & 0.5 & 1 & 0.5 & 0.5 & 0.5 \\ 0.6 & 0.5 & 0.5 & 1 & 0.5 & 0.5 \\ 0.7 & 0.5 & 0.5 & 0.5 & 1 & 0.5 \\ 0.1 & 0.5 & 0.5 & 0.5 & 0.5 & 1 \end{pmatrix}, \qquad R_2 = \begin{pmatrix} 1 & -0.3 & -0.5 & -0.4 & -0.5 & -0.1 \\ -0.3 & 1 & 0.5 & 0.5 & 0.5 & 0.5 \\ -0.5 & 0.5 & 1 & 0.5 & 0.5 & 0.5 \\ -0.4 & 0.5 & 0.5 & 1 & 0.5 & 0.5 \\ -0.5 & 0.5 & 0.5 & 0.5 & 1 & 0.5 \\ -0.1 & 0.5 & 0.5 & 0.5 & 0.5 & 1 \end{pmatrix}$$

Marginal distributions: $Y \sim N(0,1)$, $X_1 \sim t_4$, $X_2 \sim N(1,4)$, $X_3 \sim t_4$, $X_4 \sim N(1,4)$, $X_5 \sim t_4$.

(c) Linear and heteroscedastic [11]:
$$Y = 5(X_1 + X_2 + X_3 + X_4) + 10(U_1 + U_2 + U_3 + U_4)\varepsilon,$$
where $(X_1, X_2, X_3, X_4)^T \sim N_4(0, \Sigma)$ with $\Sigma_{i,j} = 0.5^{I\{i \ne j\}}$, $\varepsilon \sim N(0, 0.5)$, and $U_j$, $j = 1, \ldots, 4$, are obtained from the $X_j$'s by the probability integral transform.

(d) Nonlinear and heteroscedastic [11]: $Y = U_1 U_2 \mathrm{e}^{1.8 U_3 U_4} + 0.5(U_1 + U_2 + U_3 + U_4)\varepsilon$, where $U_j$, $j = 1, \ldots, 4$, are probability integral transformed from $N_4(0, \Sigma)$ with $\Sigma_{i,j} = 0.5^{I\{i \ne j\}}$, and $\varepsilon \sim N(0, 0.5)$.

(e) R-vine copula [17]: $(V, U_1, \ldots, U_4)^T$ follows an R-vine distribution with pair copulas given in Table 2.

Table 2: Pair copulas of the R-vine $C_{V, U_1, U_2, U_3, U_4}$, with their family, parameter, and Kendall's $\tau$ for setting (e)

Tree | Edge | Conditioned; Conditioning | Family | Parameter | Kendall's $\tau$
1 | 1 | $U_1, U_3$; — | Gumbel | 3.9 | 0.74
1 | 2 | $U_2, U_3$; — | Gauss | 0.9 | 0.71
1 | 3 | $V, U_3$; — | Gauss | 0.5 | 0.33
1 | 4 | $V, U_4$; — | Clayton | 4.8 | 0.71
2 | 1 | $V, U_1$; $U_3$ | Gumbel(90) | 6.5 | −0.85
2 | 2 | $V, U_2$; $U_3$ | Gumbel(90) | 2.6 | −0.62
2 | 3 | $U_3, U_4$; $V$ | Gumbel | 1.9 | 0.48
3 | 1 | $U_1, U_2$; $V, U_3$ | Clayton | 0.9 | 0.31
3 | 2 | $U_2, U_4$; $V, U_3$ | Clayton(90) | 5.1 | −0.72
4 | 1 | $U_1, U_4$; $V, U_2, U_3$ | Gauss | 0.2 | 0.13

(f) D-vine copula [59]: $(V, U_1, \ldots, U_5)^T$ follows a D-vine distribution with pair copulas given in Table 3.

Table 3: Pair copulas of the D-vine $C_{V, U_1, U_2, U_3, U_4, U_5}$, with their family, parameter, and Kendall's $\tau$ for setting (f)

Tree | Edge | Conditioned; Conditioning | Family | Parameter | Kendall's $\tau$
1 | 1 | $V, U_1$; — | Clayton | 3.00 | 0.60
1 | 2 | $U_1, U_2$; — | Joe | 8.77 | 0.80
1 | 3 | $U_2, U_3$; — | Gumbel | 2.00 | 0.50
1 | 4 | $U_3, U_4$; — | Gauss | 0.20 | 0.13
1 | 5 | $U_4, U_5$; — | Indep. | 0.00 | 0.00
2 | 1 | $V, U_2$; $U_1$ | Gumbel | 5.00 | 0.80
2 | 2 | $U_1, U_3$; $U_2$ | Frank | 9.44 | 0.65
2 | 3 | $U_2, U_4$; $U_3$ | Joe | 2.78 | 0.49
2 | 4 | $U_3, U_5$; $U_4$ | Gauss | 0.20 | 0.13
3 | 1 | $V, U_3$; $U_1, U_2$ | Joe | 3.83 | 0.60
3 | 2 | $U_1, U_4$; $U_2, U_3$ | Frank | 6.73 | 0.55
3 | 3 | $U_2, U_5$; $U_3, U_4$ | Gauss | 0.29 | 0.19
4 | 1 | $V, U_4$; $U_1, U_2, U_3$ | Clayton | 2.00 | 0.50
4 | 2 | $U_1, U_5$; $U_2, U_3, U_4$ | Gauss | 0.09 | 0.06
5 | 1 | $V, U_5$; $U_1, U_2, U_3, U_4$ | Indep. | 0.00 | 0.00

(g) Similar to setting (a),
$$Y = \sqrt{|2X_1 - X_2 + 0.5|} + (-0.5X_3 + 1)(0.1X_4^3) + (X_5, \ldots, X_{110})(0, \ldots, 0)^T + \sigma\varepsilon,$$
where $(X_1, \ldots, X_{110})^T \sim N_{110}(0, \Sigma)$ with the $(i,j)$th component of the covariance matrix $(\Sigma)_{i,j} = 0.5^{|i-j|}$, $\varepsilon \sim N(0,1)$, and $\sigma \in \{0.1, 1\}$.

(h) Similar to (g), $Y = (X_1^3, \ldots, X_{110}^3)\boldsymbol{\beta} + \varepsilon$, where $(X_1, \ldots, X_{10})^T \sim N_{10}(0, \Sigma_A)$ with the $(i,j)$th component of the covariance matrix $(\Sigma_A)_{i,j} = 0.8^{|i-j|}$, and $(X_{11}, \ldots, X_{110})^T \sim N_{100}(0, \Sigma_B)$ with $(\Sigma_B)_{i,j} = 0.4^{|i-j|}$. The first ten entries of $\boldsymbol{\beta}$ form a descending sequence from 2 to 1.1 in steps of 0.1, and the remaining entries are equal to 0. We assume $\varepsilon \sim N(0, \sigma)$ with $\sigma \in \{0.1, 1\}$.
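To make the setup concrete, the following is a minimal R sketch of one replication under setting (a) with $\sigma = 1$, fitted with the nonparametric D-vine one-step ahead benchmark from the vinereg package [46]; the function and object names (sim_a, train_dat, eval_dat, q_hat) are ours, and the choice family_set = "nonparametric" is our way of mirroring the nonparametric pair-copula estimation used throughout, not the authors' exact code.

library(vinereg)   # D-vine one-step ahead quantile regression
set.seed(1)
sim_a <- function(n, sigma = 1) {
  # covariance (Sigma)_{i,j} = 0.5^|i-j| for (X1, ..., X4)
  Sigma <- outer(1:4, 1:4, function(i, j) 0.5^abs(i - j))
  X <- MASS::mvrnorm(n, mu = rep(0, 4), Sigma = Sigma)
  Y <- sqrt(abs(2 * X[, 1] - X[, 2] + 0.5)) +
    (-0.5 * X[, 3] + 1) * (0.1 * X[, 4]^3) + sigma * rnorm(n)
  data.frame(Y = Y, X1 = X[, 1], X2 = X[, 2], X3 = X[, 3], X4 = X[, 4])
}
train_dat <- sim_a(1000)   # N_train = 1,000
eval_dat  <- sim_a(500)    # N_eval = N_train / 2
fit   <- vinereg(Y ~ ., data = train_dat, family_set = "nonparametric")
q_hat <- predict(fit, newdata = eval_dat, alpha = c(0.05, 0.5, 0.95))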
Since the true regression quantiles are difficult to obtain in most settings, we consider the averaged check loss [41,40] and the interval score [11,25], instead of the out-of-sample mean averaged squared error used in Kraus and Czado [41], to evaluate the performance of the estimation methods. For a chosen $\alpha \in (0,1)$, the averaged check loss is defined as

(5) $\widehat{\mathrm{CL}}_{\alpha} = \frac{1}{R}\sum_{r=1}^{R}\left[\frac{1}{N_{\mathrm{eval}}}\sum_{n=1}^{N_{\mathrm{eval}}} \gamma_{\alpha}\!\left(Y_{r,n}^{\mathrm{eval}} - \hat{q}_{\alpha}(X_{r,n}^{\mathrm{eval}})\right)\right],$

where $\gamma_{\alpha}$ is the check loss function. The interval score, for the $(1-\alpha)\times 100\%$ prediction interval, is defined as

(6) $\widehat{\mathrm{IS}}_{\alpha} = \frac{1}{R}\sum_{r=1}^{R}\left[\frac{1}{N_{\mathrm{eval}}}\sum_{n=1}^{N_{\mathrm{eval}}} \left\{ \left(\hat{q}_{1-\alpha/2}(X_{r,n}^{\mathrm{eval}}) - \hat{q}_{\alpha/2}(X_{r,n}^{\mathrm{eval}})\right) + \frac{2}{\alpha}\left(\hat{q}_{\alpha/2}(X_{r,n}^{\mathrm{eval}}) - Y_{r,n}^{\mathrm{eval}}\right) I\{Y_{r,n}^{\mathrm{eval}} \le \hat{q}_{\alpha/2}(X_{r,n}^{\mathrm{eval}})\} + \frac{2}{\alpha}\left(Y_{r,n}^{\mathrm{eval}} - \hat{q}_{1-\alpha/2}(X_{r,n}^{\mathrm{eval}})\right) I\{Y_{r,n}^{\mathrm{eval}} > \hat{q}_{1-\alpha/2}(X_{r,n}^{\mathrm{eval}})\} \right\} \right],$

and smaller interval scores are better.
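Both criteria are simple functions of the predicted quantiles. Below is a minimal R sketch for a single replication (so the outer average over $r$ in (5) and (6) is omitted), assuming eval_dat and q_hat are the evaluation data and predicted quantiles from the sketch above; the helper names check_loss and interval_score are ours, and the use of the predicted 5% and 95% quantiles as interval bounds is purely illustrative.

# check loss: gamma_alpha(u) = u * (alpha - I{u < 0})
check_loss <- function(y, q, alpha) mean((y - q) * (alpha - (y < q)))
# interval score for a prediction interval [lower, upper] at level alpha
interval_score <- function(y, lower, upper, alpha) {
  mean((upper - lower) +
         (2 / alpha) * (lower - y) * (y <= lower) +
         (2 / alpha) * (y - upper) * (y > upper))
}
cl_05 <- check_loss(eval_dat$Y, q_hat[, 1], alpha = 0.05)
cl_50 <- check_loss(eval_dat$Y, q_hat[, 2], alpha = 0.50)
cl_95 <- check_loss(eval_dat$Y, q_hat[, 3], alpha = 0.95)
is_pi <- interval_score(eval_dat$Y, lower = q_hat[, 1], upper = q_hat[, 3], alpha = 0.05)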
For settings (a)–(f), the estimation procedure for the two-step ahead C- and D-vine quantile regression follows Section 4.1 exactly, with the candidate sets at each step containing all possible remaining predictors. The additional variable reduction described in Section 4.1.1 is not applied; thus, all possible conditional log-likelihoods are calculated in each step. In contrast, due to the computational burden in settings (g) and (h), we set the number of candidates to $k = 5$ and apply the additional variable reduction from Section 4.1.1. The chosen subset contains 20% of all possible choices: 10% are the predictors with the highest Pearson's partial correlation with the response, and the remaining 10% are chosen at random from the remaining predictors. The performance of the C- and D-vine two-step ahead quantile regression is compared with that of the C- and D-vine one-step ahead quantile regression, evaluated by the averaged check loss at the 5, 50, and 95% quantile levels and by the interval score for the 95% prediction interval, and recorded in Tables 4 and 5. All densities are estimated nonparametrically for a fair comparison.

Table 4 shows that the C- and D-vine two-step ahead regression models outperform the C- and D-vine one-step ahead regression models in five out of seven settings, the exceptions being settings (b) and (e), in which all models perform quite similarly. When comparing regression models within the same vine copula class, the C-vine two-step ahead models outperform the C-vine one-step ahead models in five out of seven settings, and the D-vine two-step ahead models outperform the D-vine one-step ahead models in six out of seven settings, the exception being setting (b). In scenarios where the second step brings no significant improvement, the one-step and two-step ahead approaches perform very similarly. Overall, this indicates that the two-step ahead vine-based quantile regression considerably improves on the one-step ahead quantile regression. Table 5 shows that in the high-dimensional settings, where the two-step ahead quantile regression is used in combination with the additional variable selection from Section 4.1.1, the two-step ahead models outperform the one-step ahead models in three out of four simulation settings. In setting (g), all models show broadly similar performance: with $\sigma = 0.1$, the D-vine one-step ahead model outperforms the other models, while with $\sigma = 1$, the D-vine two-step ahead model performs best. In setting (h), we see a significant improvement of the two-step ahead models over the one-step ahead models; for both $\sigma = 0.1$ and $\sigma = 1$, the best performing model is the C-vine two-step ahead model. These results indicate that the newly proposed method improves the accuracy of the one-step ahead quantile regression in high dimensions, even when the computational complexity of the two-step ahead model is reduced by using a small number of candidates relative to the number of predictors.

Table 4: Out-of-sample prediction criteria $\widehat{\mathrm{IS}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.5}$, $\widehat{\mathrm{CL}}_{0.95}$ for settings (a)–(f); the first block of four columns refers to $N_{\mathrm{train}} = 300$, the second to $N_{\mathrm{train}} = 1{,}000$

Setting | Model | $\widehat{\mathrm{IS}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.5}$ | $\widehat{\mathrm{CL}}_{0.95}$ | $\widehat{\mathrm{IS}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.5}$ | $\widehat{\mathrm{CL}}_{0.95}$
(a), σ = 0.1 ** | D-vine One-step | 55.54 | 0.66 | 0.16 | 0.51 | 55.89 | 0.67 | 0.15 | 0.50
 | D-vine Two-step | 43.33 | 0.47 | 0.10 | 0.41 | 40.74 | 0.45 | 0.09 | 0.37
 | C-vine One-step | 53.51 | 0.64 | 0.16 | 0.49 | 54.52 | 0.66 | 0.15 | 0.49
 | C-vine Two-step | 42.01 | 0.45 | 0.10 | 0.40 | 40.04 | 0.44 | 0.09 | 0.37
(a), σ = 1 ** | D-vine One-step | 154.35 | 1.63 | 0.45 | 1.62 | 162.12 | 1.70 | 0.43 | 1.66
 | D-vine Two-step | 148.53 | 1.57 | 0.45 | 1.56 | 156.77 | 1.63 | 0.42 | 1.62
 | C-vine One-step | 151.60 | 1.61 | 0.45 | 1.60 | 160.78 | 1.68 | 0.43 | 1.65
 | C-vine Two-step | 148.41 | 1.56 | 0.45 | 1.56 | 156.79 | 1.63 | 0.42 | 1.62
(b) * | D-vine One-step | 118.75 | 1.29 | 0.42 | 1.30 | 125.33 | 1.37 | 0.40 | 1.36
 | D-vine Two-step | 119.10 | 1.30 | 0.42 | 1.30 | 125.24 | 1.36 | 0.40 | 1.36
 | C-vine One-step | 119.08 | 1.30 | 0.41 | 1.30 | 125.12 | 1.36 | 0.40 | 1.36
 | C-vine Two-step | 118.90 | 1.30 | 0.42 | 1.30 | 125.30 | 1.36 | 0.40 | 1.36
(c) ** | D-vine One-step | 2908.90 | 30.54 | 8.55 | 30.42 | 3064.78 | 31.69 | 8.15 | 31.47
 | D-vine Two-step | 2853.52 | 30.21 | 8.70 | 29.95 | 3041.95 | 31.61 | 8.20 | 31.26
 | C-vine One-step | 2859.23 | 30.24 | 8.59 | 29.95 | 3046.52 | 31.64 | 8.18 | 31.25
 | C-vine Two-step | 2850.10 | 30.19 | 8.64 | 29.84 | 3042.46 | 31.62 | 8.20 | 31.23
(d) ** | D-vine One-step | 86.40 | 0.92 | 0.24 | 0.91 | 91.11 | 0.96 | 0.22 | 0.95
 | D-vine Two-step | 83.54 | 0.90 | 0.24 | 0.88 | 89.56 | 0.96 | 0.22 | 0.92
 | C-vine One-step | 84.99 | 0.91 | 0.24 | 0.90 | 90.40 | 0.96 | 0.22 | 0.94
 | C-vine Two-step | 83.33 | 0.90 | 0.24 | 0.87 | 89.47 | 0.96 | 0.22 | 0.92
(e) * | D-vine One-step | 10.59 | 0.11 | 0.03 | 0.11 | 10.49 | 0.11 | 0.03 | 0.11
 | D-vine Two-step | 10.32 | 0.10 | 0.03 | 0.11 | 10.26 | 0.09 | 0.02 | 0.11
 | C-vine One-step | 10.23 | 0.11 | 0.03 | 0.10 | 10.02 | 0.10 | 0.02 | 0.10
 | C-vine Two-step | 10.35 | 0.10 | 0.03 | 0.11 | 10.33 | 0.10 | 0.02 | 0.11
(f) ** | D-vine One-step | 13.79 | 0.16 | 0.04 | 0.14 | 13.70 | 0.16 | 0.04 | 0.14
 | D-vine Two-step | 8.44 | 0.09 | 0.02 | 0.08 | 8.28 | 0.09 | 0.02 | 0.08
 | C-vine One-step | 12.62 | 0.14 | 0.04 | 0.13 | 12.23 | 0.13 | 0.04 | 0.13
 | C-vine Two-step | 9.09 | 0.10 | 0.02 | 0.09 | 8.93 | 0.09 | 0.02 | 0.08

Lower values, indicating better performance, are highlighted in bold. With ** we denote scenarios in which there is an improvement through the second step, and with * scenarios in which the models perform similarly.
Table 5: Out-of-sample prediction criteria $\widehat{\mathrm{IS}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.5}$, $\widehat{\mathrm{CL}}_{0.95}$ for settings (g) and (h) with $N_{\mathrm{train}} = 100$; within each sub-table, the first block of four columns refers to $\sigma = 0.1$, the second to $\sigma = 1$

(g), σ = 0.1 * and σ = 1 **
Model | $\widehat{\mathrm{IS}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.5}$ | $\widehat{\mathrm{CL}}_{0.95}$ | $\widehat{\mathrm{IS}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.5}$ | $\widehat{\mathrm{CL}}_{0.95}$
D-vine One-step | 19.63 | 0.26 | 0.25 | 0.23 | 53.38 | 0.69 | 0.67 | 0.65
D-vine Two-step | 20.48 | 0.26 | 0.26 | 0.25 | 52.17 | 0.68 | 0.65 | 0.63
C-vine One-step | 19.73 | 0.25 | 0.25 | 0.24 | 53.62 | 0.69 | 0.67 | 0.65
C-vine Two-step | 19.79 | 0.25 | 0.25 | 0.25 | 52.35 | 0.67 | 0.65 | 0.64

(h), σ = 0.1 ** and σ = 1 **
Model | $\widehat{\mathrm{IS}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.5}$ | $\widehat{\mathrm{CL}}_{0.95}$ | $\widehat{\mathrm{IS}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.5}$ | $\widehat{\mathrm{CL}}_{0.95}$
D-vine One-step | 558.36 | 6.92 | 6.98 | 7.04 | 554.18 | 6.87 | 6.93 | 6.99
D-vine Two-step | 529.51 | 6.46 | 6.62 | 6.78 | 531.30 | 6.64 | 6.64 | 6.64
C-vine One-step | 514.08 | 6.05 | 6.43 | 6.81 | 512.96 | 6.39 | 6.41 | 6.44
C-vine Two-step | 479.66 | 5.87 | 6.00 | 6.12 | 483.92 | 6.05 | 6.05 | 6.05

Lower values, indicating better performance, are highlighted in bold. With ** we denote scenarios in which there is an improvement through the second step, and with * scenarios in which the models perform similarly.

The proposed two-step ahead algorithms are computationally more intensive than the one-step ahead algorithms. We report the computation time, averaged over the $R = 100$ replications and run on 100 parallel cores (Xeon Gold 6140 CPUs @ 2.6 GHz), for settings (g) and (h), where $p > N_{\mathrm{train}}$. The two high-dimensional settings have similar computation times, since the computational intensity depends on the number of pair-copula estimations and the number of candidates, which are the same for settings (g) and (h); hence, we report a single averaged computation time for both. The average computation time for the one-step ahead (C- and D-vine) approach is 83.01 minutes, compared with 200.28 minutes for the two-step ahead (C- and D-vine) approach. Thus, with the variable reduction from Section 4.1.1, the two-step ahead algorithms roughly double the computation time of the one-step ahead algorithms in exchange for improved prediction accuracy.

6 Real data examples

We test the proposed methods on two real data sets: the concrete data set from Yeh [64], corresponding to $p \le N$, and the riboflavin data set from Bühlmann and van de Geer [10], corresponding to $p > N$. For both, the performance of the four competing algorithms is evaluated by the averaged check loss, defined in Eq. (5), at the 5, 50, and 95% quantile levels and by the 95% prediction interval score, defined in Eq. (6), based on 100 random splits of the data into training and evaluation sets.

6.1 Concrete data set

The concrete data set was initially used in Yeh [64] and is available at the UCI Machine Learning Repository [18]. The data set contains 1,030 samples in total. Our objective is quantile prediction of the concrete compressive strength, which is a highly nonlinear function of age and ingredients.
The predictors are age (AgeDay, counted in days) and seven physical measurements of the concrete ingredients (given in kg per $\mathrm{m}^3$ of mixture): cement (CementComp), blast furnace slag (BlastFur), fly ash (FlyAsh), water (WaterComp), superplasticizer, coarse aggregate (CoarseAggre), and fine aggregate (FineAggre). We randomly split the data set into a training set with 830 samples and an evaluation set with 200 samples; the random splitting is repeated 100 times. The performance of the proposed C- and D-vine two-step ahead quantile regression, compared to the C- and D-vine one-step ahead quantile regression, is evaluated by the measures reported in Table 6 over the 100 repetitions of fitting the models. Given the small number of predictors, it is not unexpected that the results of the four algorithms are more distinct than in most simulation settings. There is an improvement in the performance of the two-step ahead approach compared to the one-step ahead approach for both C- and D-vine-based models, and the C-vine model appears more appropriate for modeling the dependence structure in this data set. Out of all models, the C-vine two-step ahead algorithm is the best performing algorithm in terms of the out-of-sample criteria $\widehat{\mathrm{IS}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.5}$, and $\widehat{\mathrm{CL}}_{0.95}$ on the concrete data set, as seen in Table 6. Figure 4 shows the marginal effect plots, based on the fitted quantiles from the C-vine two-step model, for the three most influential predictors. The marginal effect of a predictor is its expected impact on the quantile estimator, where the expectation is taken over all other predictors; it is estimated from all fitted conditional quantiles and smoothed over the predictor considered.

Table 6: Concrete data set: out-of-sample prediction criteria $\widehat{\mathrm{IS}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.5}$, $\widehat{\mathrm{CL}}_{0.95}$

Model | $\widehat{\mathrm{IS}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.5}$ | $\widehat{\mathrm{CL}}_{0.95}$
D-vine One-step | 1032.32 | 10.75 | 2.76 | 10.52
D-vine Two-step | 987.10 | 10.54 | 2.78 | 9.82
C-vine One-step | 976.75 | 10.65 | 2.70 | 9.45
C-vine Two-step | 967.00 | 10.52 | 2.64 | 9.45

The best performing model is highlighted in bold.

Figure 4: Marginal effect plots for the three most influential predictors of the concrete compressive strength for $\alpha$ values of 0.05 (red), 0.5 (green), and 0.95 (blue).

6.2 Riboflavin data set

The riboflavin data set, available in the R package hdi, concerns quantile prediction of the log-transformed production rate of Bacillus subtilis from the log-transformed expression levels of 4,088 genes. To reduce the computational burden, we pre-select the 100 genes with the highest variance [10], resulting in a subset with $p = 100$ log-transformed gene expressions and $N = 71$ samples (a minimal sketch of this pre-processing is given below). Random splitting of the subset into a training set with 61 samples and an evaluation set with 10 samples is repeated 100 times. For the C- and D-vine two-step ahead quantile regression, the number of candidates is set to $k = 10$.
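As an illustration, a minimal R sketch of this pre-selection and one random split, assuming the riboflavin object in the hdi package has its usual components y (response) and x (gene expression matrix); the object names gene_var, dat, train_rib, and eval_rib are ours:

library(hdi)                 # provides the riboflavin data set
data(riboflavin)             # y: log-transformed production rate (71), x: 71 x 4088 gene expressions
gene_var <- apply(riboflavin$x, 2, var)
top100 <- order(gene_var, decreasing = TRUE)[1:100]   # 100 genes with the highest variance
dat <- data.frame(y = riboflavin$y, as.matrix(riboflavin$x)[, top100])
idx <- sample(nrow(dat), 61)                          # 61 training, 10 evaluation samples
train_rib <- dat[idx, ]
eval_rib  <- dat[-idx, ]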
Additionally, to further reduce the computational burden, the additional variable selection from Section 4.1.1 is applied with the chosen subset containing 25% of all possible choices: 15% are the predictors with the highest partial correlation with the log-transformed Bacillus subtilis production rate, and the remaining 10% are chosen at random from the remaining predictors. The performance of the competing quantile regression models is reported in Table 7: the proposed C-vine two-step ahead quantile regression is the best performing model and outperforms both the D-vine one-step ahead quantile regression of Kraus and Czado [41] and the C-vine one-step ahead quantile regression to a large extent. The second best performing method is the D-vine two-step ahead model, which, while performing slightly worse than the C-vine two-step ahead model, also significantly outperforms both the C- and D-vine one-step ahead models. Since the predictors enter the C- and D-vine models in descending order of their contribution to maximizing the conditional log-likelihood, the order of entry indicates the influence of the predictors on the response variable. It is often of practical interest to know which gene expressions are most important for prediction. Since the random splitting of the subset is repeated $R = 100$ times, the gene expressions are ranked sequentially: at each position of the order, we choose the gene expression that most frequently occupies that position across the replications, excluding the gene expressions chosen in the previous steps. For instance, the most important gene expression is the one most frequently ranked first; the second most important gene is the one most frequently ranked second, excluding the most important gene selected in the previous step, and so on (a sketch of this ranking procedure is given after the tables). The ten most influential gene expressions according to the C- and D-vine one- and two-step ahead models are recorded in Table 8. Figure 5 shows the marginal effect plots, based on the fitted quantiles from the C-vine two-step model, for the ten most influential predictors of the log-transformed Bacillus subtilis production rate.

Table 7: Out-of-sample prediction criteria $\widehat{\mathrm{IS}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.05}$, $\widehat{\mathrm{CL}}_{0.5}$, $\widehat{\mathrm{CL}}_{0.95}$

Model | $\widehat{\mathrm{IS}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.05}$ | $\widehat{\mathrm{CL}}_{0.5}$ | $\widehat{\mathrm{CL}}_{0.95}$
D-vine One-step | 33.83 | 0.44 | 0.42 | 0.41
D-vine Two-step | 30.57 | 0.44 | 0.38 | 0.33
C-vine One-step | 34.52 | 0.49 | 0.43 | 0.38
C-vine Two-step | 28.59 | 0.41 | 0.36 | 0.30

The best performing model is highlighted in bold.

Table 8: Ten most influential gene expressions on the conditional quantile function, ranked by their position in the order

Model / Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
D-vine One-step | GGT | YCIC | MTA | RPSE | YVAK | THIK | ANSB | SPOVB | YVZB | YQJB
D-vine Two-step | MTA | RPSE | THIK | YMFE | YCIC | sigM | PGM | YACC | YVQF | YKPB
C-vine One-step | GGT | YCIC | MTA | RPSE | HIT | BFMBAB | PHRC | YBAE | PGM | YHEF
C-vine Two-step | MTA | RPSE | THIK | YCIC | YURU | PGM | sigM | YACC | YKRM | ASNB

Figure 5: Marginal effect plots for the ten most influential predictors of the log-transformed Bacillus subtilis production rate for $\alpha = 0.5$.
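The sequential frequency ranking described above can be sketched in R as follows; here orders is assumed to be a list with one character vector per replication, containing the selected gene names in their order of entry, and rank_genes is our own helper name:

# orders: list of length R = 100; orders[[r]] gives the genes selected in replication r, in order of entry
rank_genes <- function(orders, top = 10) {
  ranked <- character(0)
  for (pos in seq_len(top)) {
    # genes occupying position `pos` across replications, excluding genes already ranked
    at_pos <- vapply(orders, function(o) if (length(o) >= pos) o[pos] else NA_character_, character(1))
    at_pos <- at_pos[!is.na(at_pos) & !(at_pos %in% ranked)]
    ranked <- c(ranked, names(which.max(table(at_pos))))  # most frequent gene at this position
  }
  ranked
}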
7 Summary and discussion

In this article, we introduce a two-step ahead forward selection algorithm for nonparametric C- and D-vine copula-based quantile regression. The inclusion of future information, obtained by considering the next tree in the two-step ahead algorithm, yields a significantly less greedy sequential selection procedure than the existing one-step ahead algorithm for D-vine-based quantile regression of Kraus and Czado [41]. We extend the vine-based quantile regression framework to C-vine copulas, providing an additional choice for the dependence structure. Furthermore, for the first time, nonparametric bivariate copulas are used to construct vine copula-based quantile regression models. The nonparametric estimation overcomes the problem of possible family misspecification in the parametric estimation of bivariate copulas and allows for even more flexibility in dependence estimation. Additionally, under mild regularity conditions, the nonparametric conditional quantile estimator is shown to be consistent.

The extensive simulation study, covering settings and data sets with different dimensions, strengths of dependence, and tail dependencies, shows that the two-step ahead algorithm outperforms the one-step ahead algorithm in most scenarios. The results for the concrete and riboflavin data sets are especially interesting, as the C-vine two-step ahead algorithm shows a significant improvement over the other algorithms. These findings provide strong evidence of the need to allow the dependence structure to follow a C-vine copula. In addition, the two-step ahead algorithm allows the computational intensity to be controlled independently of the data dimensions, through the number of candidate predictors and the additional variable selection discussed in Section 5. Thus, fitting vine-based quantile regression models in high dimensions becomes feasible. As seen in several simulation settings, there is a significant gain from introducing dependence structures other than the D-vine. A further research direction is the development of similar forward selection algorithms for R-vine tree structures that optimize the conditional log-likelihood.

At each step of the vine building stage, we compare models of equal size, with the same number of variables; the conditional log-likelihood is well suited for such a comparison. For other questions, such as choosing between C-vine, D-vine, or R-vine structures, information criteria might come in handy. When maximum likelihood estimation is employed at all stages, the Akaike information criterion (AIC) [1], the Bayesian information criterion (BIC) [55], and the focused information criterion (FIC) [16] can be used immediately. Ko et al. [33] studied FIC and AIC specifically for the selection of parametric copulas. The copula information criterion in the spirit of the AIC by Gronneberg and Hjort [26] can be used for selection among copula models with empirically estimated margins, while Ko and Hjort [32] studied such a criterion for parametric copula models. We plan a deeper investigation of the use of information criteria for nonparametrically estimated copulas, and for vines in particular. Such a study is beyond the scope of this article, but it would also be of interest for studying stopping criteria for building vines.

Nonparametrically estimated vines offer considerable flexibility. Their parametric counterparts, on the other hand, enjoy simplicity.
An interesting route for further research is to combine parametric and nonparametric components in the construction of the vines in a way that brings the most benefit. This benefit should be made tangible through suitable criteria, so that guidance can be provided on which components should be modelled nonparametrically and which are best modelled parametrically. For some types of models, such a choice between a parametric and a nonparametric model has been investigated by Jullum and Hjort [31] via the focused information criterion. This and alternative methods that take the effective degrees of freedom into account are worth investigating further for vine copula models.

Journal: Dependence Modeling, De Gruyter. Published: Jan 1, 2022.
Keywords: vine copulas; conditional quantile function; nonparametric pair-copulas; 62H05; 62G08; 62G05
