Estimating Systematic Risk under Extremely Adverse Market Conditions

Abstract

This paper considers the problem of estimating a linear model between two heavy-tailed variables if the explanatory variable has an extremely low (or high) value. We propose an estimator for the model coefficient by exploiting the tail dependence between the two variables and prove its asymptotic properties. Simulations show that our estimation method yields a lower mean-squared error than regressions conditional on tail observations. In an empirical application, we illustrate the better performance of our approach relative to the conditional regression approach in projecting the losses of industry-specific stock portfolios in the event of a market crash.

In financial management, the risk of stock portfolios is often assessed by estimating their return sensitivity to key risk factors. The coefficient in a single-factor model, the “market beta,” is commonly given a prominent position in such an assessment. Nevertheless, there is wide consensus that the relationship between asset returns and market risk depends on market conditions. For example, equity returns exhibit stronger correlation during volatile periods, especially in the case of extreme market downturns; see, for example, King and Wadhwani (1990), Longin and Solnik (1995, 2001), and Ang and Chen (2002). Thus, risk managers who are concerned about possible extreme losses in distress events may need to analyze systematic risk only under extremely adverse market conditions. In this paper, we develop a new method for evaluating the consequences of such extreme events. The goal of this paper is to estimate a linear model between two heavy-tailed variables conditional on the explanatory variable having an extremely low (or high) value.
Consider the following model on the relation between two continuous random variables X and Y, conditional upon an extremely low value of X:

$$Y = \beta^T X + \varepsilon, \quad \text{for } X < Q_x(\bar{p}), \tag{1}$$

where $\bar{p}$ denotes a very small probability, $\varepsilon$ is the error term that is assumed to be independent of X under the condition $X < Q_x(\bar{p})$, and where $Q_x(\bar{p})$ denotes the quantile function of X defined as $Q_x(\bar{p}) = \sup\{c : \Pr(X \le c) \le \bar{p}\}$. Since the relation holds for extremely low values of X only, we use an index T to distinguish the coefficient $\beta^T$ from the coefficient in a global linear model. We intend to estimate $\beta^T$ using observations on (X, Y). With Y and X as the returns of, respectively, a stock portfolio and the market portfolio, the coefficient $\beta^T$ is regarded as a measure of systematic risk under extremely adverse market conditions. Estimating $\beta^T$ can be useful to assess the extreme loss on the stock portfolio in the event of a market crash.

The coefficient $\beta^T$ in the linear tail model in Equation (1) can be regarded as a regression coefficient. Consequently, a direct approach to estimating $\beta^T$ is to apply a “conditional regression,” that is, to estimate a least squares regression coefficient based on observations corresponding to extremely low values of X only. This method has been used, for example, to evaluate $\beta^T$s of financial returns on commodities, currencies (Atanasov and Nitschka, 2014; Lettau, Maggiori, and Weber, 2014), stocks (Post and Versijp, 2007), and active trading strategies (Mitchell and Pulvino, 2001). Two potential drawbacks of this conditional regression approach apply. First, because the conditional regression is based on a small number of observations, the approach may produce a relatively large variance of the estimator. Second, when applied to financial market data, the heavy-tailedness of financial returns may further increase the estimation error. We propose an alternative estimator for $\beta^T$ by exploiting the tail dependence imposed by the heavy-tailedness of X and Y.
Under mild conditions, we show that the proposed estimator possesses consistency and asymptotic normality. Simulations show that our estimation method yields a lower mean-squared error than estimating conditional regressions on tail observations. In an empirical application, we illustrate the better performance of our approach relative to the conditional regression approach in projecting the losses of industry-specific portfolios in major stock market crashes over the past 80 years. However, the method can also be applied to other variables known to be heavy-tailed, such as exchange rates, trading volume, insurance claims, or the severity of natural disasters. Theoretically, our estimator has a structure similar to a regression coefficient. The estimator in a standard univariate regression analysis consists of a dependence measure given by the correlation, and the marginal risk measures given by the standard deviations. In the estimator of βT, the dependence measure is replaced by a tail dependence measure and the marginal risk measures are replaced by quantiles obtained from tail observations. Our study builds on the literature on multivariate extreme value theory (EVT) that provides measures to evaluate the dependence among extreme observations; see, for example, Embrechts, De Haan, and Huang (2000) and De Haan and Ferreira (2006, Chapters 6 and 7). Poon, Rockinger, and Tawn (2004) and Hartmann, Straetmans, and de Vries (2004) have applied such tail dependence measures to analyze linkages between asset returns during crises. Bollerslev, Todorov, and Li (2013) analyze the tail dependence between jumps in stock returns using high-frequency data. Zhang, Li, and Wang (2013) develop a model to estimate the level of tail dependence between two variables Y and X as a function of other explanatory variables Z. 
While their model helps to predict the probability that Y is extremely low given that X is extremely low, the model cannot be directly applied to solve the problem of predicting the expected level of Y given an extremely low observed value of X. Some studies focus on deriving the level of tail dependence in global linear models. Malevergne and Sornette (2004) derive the level of tail dependence, assuming a global linear relation with a single risk factor. Others study tail dependence assuming global linear relations with multiple risk factors; see, for example, De Vries (2005) and Hartmann, Straetmans, and de Vries (2010). In contrast, our approach focuses on estimating βT by exploiting the tail dependence structure while assuming the linear model only in the tail. Our study should be distinguished from the literature on estimating global linear regression models in the presence of heavy tails. Mikosch and De Vries (2013) show that the finite sample distribution of a regression coefficient is heavy-tailed if the error term follows a heavy-tailed distribution. This may be improved by applying least (tail-)trimmed squares, which ensures asymptotic normality even if the error terms have infinite variance; see Rousseeuw (1985) and Hill (2013). Our study differs from this existing literature in the sense that our purpose is to estimate the linear relation only for extremely low X, rather than estimating a global linear relation. Similarly, our study is distinct from quantile regressions; see, for example, Koenker and Bassett (1978) and Chernozhukov and Fernández-Val (2011). Quantile regression analysis seeks to estimate the quantile of the conditional distribution of Y as a function of the observed value of X. In contrast, we focus on predicting the expectation of Y using a linear relation with X conditional on an extreme value of X. 
Finally, our study should also be differentiated from continuous-time models in which the dependent variable reacts differently to diffusive and jump components in the independent variable; see Todorov and Bollerslev (2010) for the definition of such a model, and Li, Todorov, and Tauchen (2017) for an efficient estimator of the coefficient on the jump component. The linear coefficient on the jump component is different from $\beta^T$ in discrete time. Given an extremely low realization of the independent variable, this realization could potentially be attributed mainly to the jump component but, nevertheless, also involves a diffusive component. Therefore, $\beta^T$ is a combination of the two linear coefficients on the diffusive and jump components, albeit mainly loaded on the latter. Different from Todorov and Bollerslev (2010), the linear tail model in Equation (1) does not require a symmetric relation to extremely low or high values of the independent variable: by applying our methodology to $-Y$ and $-X$, one obtains an estimate of the (potentially different) relation when X has an extremely high value.

The present study proposes the methodology to estimate $\beta^T$ based on EVT and demonstrates its validity by deriving the asymptotic properties of the estimator and performing simulations. Beyond the scope of the current paper, the level of $\beta^T$ can be applied in various contexts. First, it may have asset pricing implications as a measure of systematic tail risk. In Van Oordt and Zhou (2016), we apply the methodology developed in the present study to estimate the level of $\beta^T$ of individual stocks and test whether stocks with a higher $\beta^T$ earn a higher risk premium. In this asset pricing study, we show that historical estimates of $\beta^T$ contain information about the future performance of stocks in market crashes over and above the information captured by regular market betas. Nevertheless, we find no evidence of a positive risk premium in the cross-section of expected returns.
Moreover, the methodology can be applied to measure the sensitivity of banks to large shocks in the financial system. An application in this direction can be found in Van Oordt and Zhou (2014). Other potential applications include other return processes that potentially exhibit nonlinear relations with the market or other risk factors, such as the returns of hedge funds, safe-haven currencies, safe-haven commodities, and portfolios with complex derivatives or options.

The remainder of the paper is organized as follows. Section 1 describes our estimation method. Section 2 reports simulation results. Section 3 provides an empirical illustration of our approach in which we estimate the losses on industry-specific stock portfolios during market crashes. Section 4 concludes.

1 Methodology

1.1 Theory

We start by assuming heavy-tailedness of X and Y. The definition of heavy-tailedness is as follows. The tail distributions of X and Y are heavy-tailed if they can be expressed as

$$\Pr(X < -u) = u^{-\alpha_x} l_x(u) \quad \text{and} \quad \Pr(Y < -u) = u^{-\alpha_y} l_y(u), \tag{2}$$

where $l_x$ and $l_y$ are slowly varying functions as $u \to \infty$. That is,

$$\lim_{u \to \infty} \frac{l_x(tu)}{l_x(u)} = \lim_{u \to \infty} \frac{l_y(tu)}{l_y(u)} = 1,$$

for any $t > 0$. Parameters $\alpha_x$ and $\alpha_y$ are called the tail indices. Equivalently, the heavy-tailedness of X can be expressed as

$$\lim_{u \to \infty} \frac{\Pr(X < -ux)}{\Pr(X < -u)} = x^{-\alpha_x},$$

for any fixed $x > 0$. We assume the usual second-order condition (see, e.g., De Haan and Stadtmüller, 1996), which quantifies the speed of convergence in this relation as

$$\lim_{u \to \infty} \frac{\dfrac{\Pr(X < -ux)}{\Pr(X < -u)} - x^{-\alpha_x}}{\eta(u)} = x^{-\alpha_x}\, \frac{x^{-\gamma} - 1}{-\gamma}, \tag{3}$$

with an eventually positive or negative function $\eta(u)$ such that $\eta(u) \to 0$ as $u \to \infty$, and $\gamma > 0$.¹

The idea behind our approach to estimating $\beta^T$ is as follows. The relation in Equation (1) is specified only for the region $X < Q_x(\bar{p})$, while the model specifies no assumptions on the relation for the region $X \ge Q_x(\bar{p})$. The relation brings about a dependence structure between X and Y in the case of extremely low values of X, that is, if $X < Q_x(\bar{p})$.
This structure determines the dependence between the left tails of the distributions of X and Y. Our approach relies on analyzing this tail dependence structure to infer the level of $\beta^T$. We consider the following tail dependence measure from multivariate EVT,

$$\tau := \lim_{p \to 0} \tau(p) := \lim_{p \to 0} \frac{1}{p} \Pr\big(Y < Q_y(p),\, X < Q_x(p)\big), \tag{4}$$

where $Q_y(p)$ denotes the quantile function of Y defined as $Q_y(p) = \sup\{c : \Pr(Y \le c) \le p\}$.² The tail dependence measure can be rewritten as $\tau = \lim_{p \to 0} \Pr(Y < Q_y(p) \mid X < Q_x(p))$, which is the probability of observing an extremely low value of Y conditional on an extremely low value of X. Since it is the limit of a conditional probability, the τ-measure is by definition bounded by $0 \le \tau \le 1$. The case $\tau = 0$ is regarded as tail independence, while the case $\tau = 1$ corresponds to complete tail dependence. Also, the tail dependence measure is invariant to positive linear transformations on X and Y. These features of the τ-measure indicate that its role in our approach will resemble that of a correlation coefficient, except that the τ-measure focuses on dependence in the tails only.

The following theorem shows how the τ-measure relates to the coefficient $\beta^T$ in the linear tail model in Equation (1).

Theorem 1 Under the linear tail model in Equation (1) and the heavy-tail set-up of the downside distributions in Equations (2) and (3), with $\alpha_y > \tfrac{1}{2}\alpha_x$ and $\beta^T \ge 0$, we have that

$$\lim_{p \to 0} \big(\tau(p)\big)^{1/\alpha_x}\, \frac{Q_y(p)}{Q_x(p)} = \beta^T. \tag{5}$$

Proof See the Appendix.

Theorem 1 does not depend on assuming the heavy-tailedness of the unobservable error term $\varepsilon$. The theorem holds also if $\varepsilon$ exhibits a thin-tailed distribution, such as the normal distribution. The condition $\alpha_y > \tfrac{1}{2}\alpha_x$ basically requires Y not to be “too heavy-tailed” in comparison with X. The intuition is that, otherwise, the error terms $\varepsilon$ would have a much heavier tail than X. The impact of extreme realizations of $\varepsilon$ on extreme realizations of Y would overshadow the impact of the relation between X and Y. As a consequence, it is not possible to infer the level of $\beta^T$.
Nevertheless, this condition is not very restrictive in the context of stock market returns. For example, if X represents the returns on a general market index with an $\alpha_x$ of 4 (see, e.g., Jansen and De Vries, 1991), the condition is satisfied if the firm’s stock returns Y have finite variance. Moreover, conditional upon a sufficiently low $\alpha_x$, Theorem 1 also holds if Y has infinite variance or mean.

The relation in Theorem 1 provides the basis for the estimation of coefficient $\beta^T$. Consider independent and identically distributed (i.i.d.) observations $(X_1, Y_1), \ldots, (X_n, Y_n)$ with the i.i.d. unobserved error terms $\varepsilon_1, \ldots, \varepsilon_n$. Later we will also consider the presence of temporal dependence. To estimate $\beta^T$, we estimate each component in Equation (5). As in usual extreme value analysis, we mimic the limit procedure $p \to 0$ by considering only the lowest k observations in the tail region, such that $k := k(n) \to \infty$ and $k/n \to 0$ as $n \to \infty$. In other words, for statistical estimation, the probability p is set at some low level $p = k/n$. Hence, we obtain the estimator of $\beta^T$ as

$$\hat{\beta}^T := \hat{\tau}(k/n)^{1/\hat{\alpha}_x}\, \frac{\hat{Q}_y(k/n)}{\hat{Q}_x(k/n)}. \tag{6}$$

We remark that the estimator $\hat{\beta}^T$ in Equation (6) shows similarities with a standard regression analysis. Considering a standard linear regression between random variables U and V, the estimator of the slope coefficient is $\hat{\rho}\,\hat{\sigma}_u / \hat{\sigma}_v$, where $\hat{\rho}$ is the correlation coefficient between U and V, and where $\hat{\sigma}_u$ and $\hat{\sigma}_v$ are the standard deviations of U and V, respectively. Similarly, the estimator $\hat{\beta}^T$ consists of the tail dependence measure $\hat{\tau}$, and two tail risk measures, that is, the tail quantiles of X and Y. In addition, it combines these components in a similar way as in standard regression analysis.

1.2 Estimation

For our procedure, we rely on relatively simple and widely used estimators to obtain estimates of each of the components in Equation (6). These estimators rely exclusively on observations far in the tail of the distributions of X and Y.
Nevertheless, the development of better estimators for the building blocks in Equation (6) has been the subject of an extensive literature. Hence, our procedure to estimate $\beta^T$ via Equation (6) may well stand to be further improved by choosing other estimators for the components. Throughout this paper, we will refer to our estimation of $\beta^T$ with the estimators of the components below as the EVT approach.

The estimate of the tail index $\alpha_x$ is obtained from the $k_1$ lowest observations of X with the estimator proposed in Hill (1975). Here, $k_1$ is another intermediate sequence such that $k_1 := k_1(n) \to \infty$ and $k_1/n \to 0$ as $n \to \infty$. Suppose the observations of (X, Y) are $(X_1, Y_1), \ldots, (X_n, Y_n)$. By ranking the observations of $X_t$ as $X_{n,1} \le X_{n,2} \le \cdots \le X_{n,n}$, the Hill estimator is defined as

$$\frac{1}{\hat{\alpha}_x} := \frac{1}{k_1} \sum_{i=1}^{k_1} \log\left(\frac{X_{n,i}}{X_{n,k_1+1}}\right). \tag{7}$$

For the τ-measure, multivariate EVT provides a nonparametric estimate; see Embrechts, De Haan, and Huang (2000). The estimator is given as

$$\hat{\tau}(k/n) := \frac{1}{k} \sum_{t=1}^{n} \mathbf{1}\{Y_t < Y_{n,k+1},\ X_t < X_{n,k+1}\}, \tag{8}$$

where $Y_{n,k+1}$ is the (k+1)-th lowest order statistic of $Y_t$. Finally, the quantiles of X and Y at the probability level k/n, $\hat{Q}_x(k/n)$ and $\hat{Q}_y(k/n)$, are estimated by their (k+1)-th lowest order statistics, that is, $X_{n,k+1}$ and $Y_{n,k+1}$.

Notice that, in Equation (5), the same tail probability p appears in the term $\tau(p)$ and the two quantiles $Q_y(p)$ and $Q_x(p)$. Correspondingly, the same intermediate sequence k is used for the estimators of the τ-measure and the quantiles of X and Y. By contrast, there is no theoretical requirement that $k = k_1$, though this will be the most complicated case when dealing with the asymptotic normality below.

In general, the estimator of $\beta^T$ via Equation (6) inherits its consistency and asymptotic normality from the consistency and asymptotic normality of the estimators of its subcomponents. In addition, the estimator of $\beta^T$ is consistent even if $\lim_{p \to 0} \tau(p) = 0$, even though the statistical properties of the estimator of the τ-measure are less well known in this case.
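The way Equations (6) to (8) fit together can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `evt_tail_beta`, its arguments, and the use of NumPy are assumptions made here, and the sketch presumes that the $k_1 + 1$ lowest observations of X are negative so that the logarithm in the Hill estimator is well defined.

```python
import numpy as np


def evt_tail_beta(x, y, k, k1=None):
    """Illustrative lower-tail estimate following Equations (6)-(8).

    Hypothetical helper: returns (beta_hat, tau_hat, alpha_hat).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    k1 = k if k1 is None else k1  # the text uses k1 = k in its applications
    if not (0 < k < n and 0 < k1 < n):
        raise ValueError("k and k1 must lie strictly between 0 and n")

    x_sorted = np.sort(x)
    y_sorted = np.sort(y)

    # Hill estimator, Equation (7): average log-ratio of the k1 lowest X
    # to the (k1+1)-th lowest order statistic (zero-based index k1).
    x_ref = x_sorted[k1]
    if x_ref >= 0:
        raise ValueError("the k1+1 lowest observations of x must be negative")
    inv_alpha = np.mean(np.log(x_sorted[:k1] / x_ref))

    # Tail quantiles at level k/n: the (k+1)-th lowest order statistics.
    q_x = x_sorted[k]
    q_y = y_sorted[k]

    # Tail dependence, Equation (8): share of joint exceedances among k.
    tau = np.sum((y < q_y) & (x < q_x)) / k

    # Equation (6).
    beta = tau ** inv_alpha * q_y / q_x
    return beta, tau, 1.0 / inv_alpha
```

A typical call would be `beta_hat, tau_hat, alpha_hat = evt_tail_beta(market_returns, portfolio_returns, k=25)`, where the two return arrays are placeholders for the user's data. Applying the same function to `-x` and `-y` targets the upper tail instead.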
To prove the consistency of $\hat{\beta}^T$ also for the case $\lim_{p \to 0} \tau(p) = 0$, we require some additional conditions to ensure the asymptotic normality of the Hill estimator. These conditions are as follows. First, we require a condition on $k_1$ ensuring that $k_1$ is not too high:

$$\lim_{n \to \infty} \sqrt{k_1}\, \eta\!\left(-Q_x\!\left(\frac{k_1}{n}\right)\right) = \lambda < \infty. \tag{9}$$

Conditions (3) and (9) are usually assumed to obtain the asymptotic normality of the Hill estimator; see, e.g., De Haan and Ferreira (2006), conditions (3.2.5) and (3.2.6). Second, an additional restriction ensures that $k_1$ is not too low. As $n \to \infty$,

$$\frac{k_1}{\log n} \to +\infty. \tag{10}$$

The following theorem states the consistency of $\hat{\beta}^T$.

Theorem 2 Assume that the conditions in Theorem 1 hold and $k \to \infty$, $k_1 \to \infty$, $k/n \to 0$, $k_1/n \to 0$, as $n \to \infty$. In addition, only in the case $\lim_{p \to 0} \tau(p) = 0$ do we further assume Conditions (9) and (10). Then we have that, as $n \to \infty$, $\hat{\beta}^T \xrightarrow{P} \beta^T$.

Next, we deal with asymptotic normality. For that purpose, we assume the second-order condition for the distribution of Y and the joint distribution of (X, Y). This is in line with the usual asymptotic normality results in multivariate extreme value statistics; see, for example, Einmahl, de Haan, and Li (2006). First, assume that

$$\lim_{u \to \infty} \frac{\dfrac{\Pr(Y < -uy)}{\Pr(Y < -u)} - y^{-\alpha_y}}{\eta'(u)} = y^{-\alpha_y}\, \frac{y^{-\gamma'} - 1}{-\gamma'}, \tag{11}$$

with an eventually positive or negative function $\eta'(u)$ such that $\eta'(u) \to 0$ as $u \to \infty$, and $\gamma' < 0$. Second, denoting the distribution functions of X and Y by $F_x$ and $F_y$, we define

$$R(x, y, p) := \frac{1}{p} \Pr\big(F_x(X) < px,\ F_y(Y) < py\big).$$

For the dependence structure, we assume that $R(x, y, p) \to R(x, y)$ as $p \to 0$ for some positive function $R(x, y)$, with a speed of convergence as follows: there exists a $\theta > 0$ for which, as $p \to 0$,

$$R(x, y, p) - R(x, y) = O(p^{\theta}), \tag{12}$$

for all $(x, y) \in [0, 1]^2 \setminus \{(0, 0)\}$.³

The following theorem states the asymptotic normality of $\hat{\beta}^T$.

Theorem 3 Assume that the conditions in Theorem 1 hold. Suppose $\lim_{p \to 0} \tau(p) = \tau \in (0, 1)$ and $\Pr(Y > u) = O(\Pr(Y < -u))$ as $u \to \infty$. Further assume that the second-order conditions (11) and (12) hold.
Suppose $k_1 = k = O(n^{\zeta})$, where $\zeta < \min\big(2\theta/(1 + 2\theta),\ 2\gamma/(2\gamma + \alpha_x),\ 2\gamma'/(2\gamma' + \alpha_y),\ 3/(\alpha_y + 2)\big)$. We then have that, as $n \to \infty$,

$$\sqrt{k}\,\big(\hat{\beta}^T - \beta^T\big) \xrightarrow{d} N\!\left(0,\ \frac{(\beta^T)^2}{\alpha_x^2}\left(\frac{1}{\tau} - 1 - (\log \tau)^2\right)\right).$$

In this theorem we choose $k = k_1$ because this is what we use in the simulation and empirical illustration. We note, however, that this choice is not necessarily the only one. One may also choose k and $k_1$ such that, as $n \to \infty$, $k/k_1$ converges to zero, a finite positive number, or infinity. If $k/k_1 \to 0$ as $n \to \infty$, the asymptotic limit of $\hat{\alpha}_x$ will play a dominant role in that of $\hat{\beta}^T$. Conversely, if $k/k_1 \to \infty$ as $n \to \infty$, the asymptotic limit of $\hat{\tau}(k/n)$ and the two quantiles will play a dominant role in that of $\hat{\beta}^T$. Therefore, the asymptotic normality result turns out to be simpler in these two cases. If $k/k_1$ converges to a finite value other than 1, the asymptotic normality of $\hat{\beta}^T$ can be derived in a similar way, with a slightly more complicated structure for the asymptotic variance.

The consistency and asymptotic normality results are obtained when $\{(X_t, Y_t)\}$ forms an i.i.d. sample. In the context of stock returns, it is likely that $\{(X_t, Y_t)\}$ is a time series with temporal dependence. In general, under weak conditions, EVT analysis can be applied without modification to temporally dependent data; see Drees (2008) for a general discussion. More specifically, if $\{(X_t, Y_t)\}$ exhibits weak temporal dependence such as autocorrelation or GARCH-type volatility clustering, then the consistency will not be affected. This follows from the consistency results for each component in $\hat{\beta}^T$. For the Hill estimator, see, for example, Hsing (1991); for the quantiles and the tail dependence measure, see, for example, Hill (2009). Temporal dependence does, however, affect the asymptotic normality result in the sense that it may lead to a different structure of the asymptotic variance of the estimates. Therefore, in financial applications, it may be better to rely on a block bootstrap procedure to obtain adjusted standard errors.
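As an illustration of how the limit in Theorem 3 translates into a standard error, the sketch below plugs estimates into the asymptotic variance. It is only a plug-in approximation under the theorem's assumptions (i.i.d. observations, $k = k_1$, and $\tau$ strictly between 0 and 1); the function name `asymptotic_se` and its arguments are hypothetical, and with temporally dependent data the block bootstrap mentioned above may be the safer choice.

```python
import math


def asymptotic_se(beta_t, alpha_x, tau, k):
    """Plug-in standard error implied by Theorem 3 (hypothetical helper).

    beta_t, alpha_x and tau are estimates; k is the number of tail
    observations, assumed equal to k1.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("Theorem 3 requires tau strictly between 0 and 1")
    if k <= 0 or alpha_x <= 0:
        raise ValueError("k and alpha_x must be positive")
    # Variance term 1/tau - 1 - (log tau)^2 from the limiting distribution.
    variance_term = 1.0 / tau - 1.0 - math.log(tau) ** 2
    # sqrt(k) * (beta_hat - beta) has variance beta^2 / alpha^2 * variance_term.
    return abs(beta_t) / alpha_x * math.sqrt(variance_term / k)
```

For example, `asymptotic_se(beta_hat, alpha_hat, tau_hat, k=25)` could be combined with a normal quantile to form an approximate confidence interval around `beta_hat`.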
2 Simulations

We run two sets of simulations to compare the performance of the proposed procedure to estimate $\beta^T$ and the performance of a regression conditional on tail observations.⁴ In each set of simulations, the generated samples consist of 1,250 random observations for $(X_t, Y_t)$, which corresponds approximately to the length of our estimation window in the empirical exercise.

The first set of simulations evaluates the estimation accuracy of the two approaches if the data-generating process is in line with the linear tail model in Equation (1). In this set of simulations, the observations for Y are constructed by aggregating the simulated X and $\varepsilon$ according to different global linear models and segmented linear models. In these simulations, we compare the estimated and true values for $\beta^T$.

The second set of simulations compares the predictive power of the two approaches when the data-generating process does not follow a linear model in the tail. In these simulations, the observations of Y and X are drawn from different copula models. The purpose of these simulations is to verify which approach exhibits a better performance if the linear tail model is used as an approximation. The performance of the two approaches is compared by assessing their ability to predict $Y_t$ from an extremely low $X_t$.

2.1 Linear Models

In the first set of simulations, we consider three global linear models in which the relation is unaffected by the observation of X, that is, $\beta = \beta^T = 0.5, 1, 1.5$. Moreover, we consider two segmented linear models. If the observation of X is larger than the third percentile of X, then the observation of Y is generated from a linear model with $\beta = 1$.⁵ Otherwise, it is generated from a linear model with $\beta^T = 0.5$ and $\beta^T = 1.5$, respectively.

Several data-generating processes are considered for X and $\varepsilon$. The Student’s t-distribution is known to be heavy-tailed with the tail index equal to the degrees of freedom.
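A data-generating process of the segmented kind described above, together with a conditional regression on the k lowest observations of X, might look as follows. This is a sketch under stated assumptions and not the simulation code behind the reported results: the function names, the seed handling, the use of the population third percentile from SciPy's Student's t quantile function, and the inclusion of an intercept in the tail regression are all choices made here for illustration.

```python
import numpy as np
from scipy.stats import t as student_t


def simulate_segmented(n, beta_tail, beta_rest=1.0, df=4, tail_prob=0.03, seed=None):
    """Draw (x, y) from a segmented linear model with Student's t inputs.

    The slope is beta_tail when x lies below the population tail_prob
    quantile of the t-distribution, and beta_rest otherwise. Setting
    beta_tail == beta_rest gives a global linear model.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_t(df, size=n)
    eps = rng.standard_t(df, size=n)
    threshold = student_t.ppf(tail_prob, df)  # population third percentile
    slope = np.where(x < threshold, beta_tail, beta_rest)
    return x, slope * x + eps


def conditional_regression_beta(x, y, k):
    """Least squares slope using only the k lowest observations of x.

    Assumption: the tail regression includes an intercept.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lowest = np.argsort(x)[:k]
    slope, _intercept = np.polyfit(x[lowest], y[lowest], 1)
    return slope
```

For instance, `x, y = simulate_segmented(1250, beta_tail=0.5, seed=1)` followed by `conditional_regression_beta(x, y, k=25)` gives one draw of the conditional regression estimate; repeating this over many seeds yields its simulated distribution.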
We perform simulations of X and $\varepsilon$ based on random draws from a Student’s t-distribution with three, four, and five degrees of freedom, which implies X and $\varepsilon$ are heavy-tailed with tail indices of three, four, and five, respectively. The choice of the parameter $\alpha$ is similar to the estimates in the empirical analysis. Moreover, we perform simulations where X and $\varepsilon$ exhibit temporal dependence and are each generated from a GARCH(1,1) process, that is, $Z_t = \sigma_{Z,t}\,\zeta_t$, where $\sigma_{Z,t}^2 = \psi_0 + \psi_1 Z_{t-1}^2 + \psi_2 \sigma_{Z,t-1}^2$, for $Z = X, \varepsilon$. The parameter choices for the simulation with normally distributed innovations $\zeta_t$ are $(\psi_0, \psi_1, \psi_2) = (0.5, 0.11, 0.88)$, which implies X and $\varepsilon$ are heavy-tailed with a tail index of 3.68; see Sun and Zhou (2014, Table 3). The parameter choices in another simulation based on innovations $\zeta_t$ from a standardized Student’s t-distribution with eight degrees of freedom are $(\psi_0, \psi_1, \psi_2) = (0.5, 0.08, 0.91)$, which implies X and $\varepsilon$ are heavy-tailed with a tail index of 3.82.

For each of the five models and data-generating processes, we generate 10,000 samples and estimate $\beta^T$ in each sample, using both the conditional regression approach and the EVT approach. Then, by comparing the estimates with the true value of $\beta^T$, we calculate the mean squared error (MSE), the estimation bias, and the estimation variance for the two approaches. For brevity, we report the simulations based on the Student’s t-distribution with four degrees of freedom, because the pattern across the simulations is very similar.

Figures 1 and 2 show the simulation results for different choices of the number of observations in the tail, k. The first column of Figures 1 and 2 compares the MSE between the EVT approach and the conditional regression. Under the heavy-tailed set-up, we observe a better performance with the EVT approach relative to the conditional regression, if $\beta^T$ is estimated based on a few observations in the tail, that is, for low levels of k.
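The GARCH(1,1) recursion used for the temporally dependent simulations can be sketched as below, here with normally distributed innovations and the parameter values quoted in the text as defaults. The function name, the burn-in length, and the choice to start the recursion at the unconditional variance $\psi_0/(1 - \psi_1 - \psi_2)$ are assumptions of this sketch; the source does not state how the recursion was initialized.

```python
import numpy as np


def simulate_garch11(n, psi0=0.5, psi1=0.11, psi2=0.88, burn_in=500, seed=None):
    """Simulate Z_t = sigma_t * zeta_t with GARCH(1,1) conditional variance.

    Hypothetical helper: returns (z, sigma2), both of length n, after
    discarding burn_in initial observations.
    """
    if psi1 + psi2 >= 1.0:
        raise ValueError("psi1 + psi2 must be below 1 for a finite variance")
    rng = np.random.default_rng(seed)
    total = n + burn_in
    zeta = rng.standard_normal(total)

    z = np.empty(total)
    sigma2 = np.empty(total)
    # Assumption: initialize at the unconditional variance.
    sigma2[0] = psi0 / (1.0 - psi1 - psi2)
    z[0] = np.sqrt(sigma2[0]) * zeta[0]
    for step in range(1, total):
        sigma2[step] = psi0 + psi1 * z[step - 1] ** 2 + psi2 * sigma2[step - 1]
        z[step] = np.sqrt(sigma2[step]) * zeta[step]
    return z[burn_in:], sigma2[burn_in:]
```

Two independent calls with different seeds would give one series for X and one for the error term, which can then be combined through the linear model of interest.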
However, the conditional regression may perform better if more observations from the moderate level are included, that is, for high levels of k. Nevertheless, the MSE of the EVT approach is not very sensitive to including more observations from the moderate level. The second and third columns of Figures 1 and 2 show the decomposition of the MSE into squared bias and variance. We observe that the estimates from the conditional regression bear a larger variance, while the estimation error from the EVT approach is mainly due to positive bias.⁶

Figure 1. Simulations with a global linear model. Notes: The solid lines report the simulation results for the EVT approach; the dashed lines report those for the conditional regression approach. The simulations are based on m = 10,000 samples with n = 1,250 observations each. The observations of Y are constructed from the simulated observations of X and $\varepsilon$ following different linear relations. The observations of X and $\varepsilon$ are randomly drawn from the Student’s t-distribution with four degrees of freedom. The observations of Y are constructed from a global linear model ($\beta^T = \beta$). The estimates from the simulations, $\hat{\beta}^T$, are compared with the true value, $\beta^T$. The MSE is calculated as $m^{-1} \sum_i (\beta^T - \hat{\beta}_i^T)^2$, where i refers to the i-th simulated sample. The squared bias is calculated as $(\beta^T - \bar{\beta}^T)^2$ and the variance is calculated as $m^{-1} \sum_i (\bar{\beta}^T - \hat{\beta}_i^T)^2$, where $\bar{\beta}^T = m^{-1} \sum_i \hat{\beta}_i^T$.

Figure 2. Simulations with a segmented linear model. Notes: The solid lines report the simulation results for the EVT approach; the dashed lines report those for the conditional regression approach. The simulations are based on m = 10,000 samples with n = 1,250 observations each. The observations of Y are constructed from the simulated observations of X and $\varepsilon$ following different linear relations. The observations of X and $\varepsilon$ are randomly drawn from the Student’s t-distribution with four degrees of freedom. The observations of Y are constructed from a segmented linear model, where the slope equals $\beta^T$ if the value of $X_t$ is below its third percentile (which occurs in expectation for 37.5 observations in each sample), and $\beta$ otherwise. The estimates from the simulations, $\hat{\beta}^T$, are compared with the true value, $\beta^T$. The MSE is calculated as $m^{-1} \sum_i (\beta^T - \hat{\beta}_i^T)^2$, where i refers to the i-th simulated sample. The squared bias is calculated as $(\beta^T - \bar{\beta}^T)^2$ and the variance is calculated as $m^{-1} \sum_i (\bar{\beta}^T - \hat{\beta}_i^T)^2$, where $\bar{\beta}^T = m^{-1} \sum_i \hat{\beta}_i^T$.

2.2 Copula Models

In the following simulations we compare the ability of the EVT approach and the conditional regression approach to predict $Y_t$ from $X_t$ when the data have a nonlinear structure. For each data-generating process, we use two Student’s t-distributions with four degrees of freedom for the marginal distributions of Y and X, but we use six different copulas for the dependence structure. Following the simulations in Støve, Tjøstheim, and Hufthammer (2014), these different copulas are a Clayton copula with parameter θ = 1, a Clayton copula with parameter θ = 2, a Gaussian copula with parameter ρ = 0.3, a Gaussian copula with parameter ρ = 0.8, a Gumbel copula with parameter θ = 2, and a Gumbel copula with parameter θ = 3.

For each of these data-generating processes, we generate 10,000 samples. From each sample i, we hold out the observation $(Y_{i,t}, X_{i,t})$ corresponding to the lowest $X_{i,t}$ to generate an extreme out-of-sample observation. This observation is denoted as $(Y_{i,t^*}, X_{i,t^*})$. Then we estimate $\beta_i^T$ by applying the conditional regression approach and the EVT approach to all other observations in that sample. The estimates are denoted as $\hat{\beta}_{OLS,i}^T$ and $\hat{\beta}_{EVT,i}^T$, respectively. Following the approximation in the linear tail model, the expected value of $Y_{i,t^*}$ is the product of the estimated coefficient $\beta_i^T$ and the observed value of $X_{i,t^*}$. We project $Y_{i,t^*}$ from $X_{i,t^*}$ as $\hat{Y}_{i,t^*}^{EVT} = \hat{\beta}_{EVT,i}^T X_{i,t^*}$ and $\hat{Y}_{i,t^*}^{OLS} = \hat{\beta}_{OLS,i}^T X_{i,t^*}$.
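The hold-out projection just described can be written generically for any estimator of the tail coefficient. The sketch below is illustrative only: `holdout_projection_error`, `projection_mse`, and the `estimator` callable (a function taking the remaining x and y arrays and returning a slope) are hypothetical names introduced here, not part of the original study.

```python
import numpy as np


def holdout_projection_error(x, y, estimator):
    """Squared error of projecting Y at the lowest X, estimated out of sample.

    estimator: callable (x_rest, y_rest) -> estimated tail coefficient.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i_star = int(np.argmin(x))  # observation with the lowest X
    keep = np.ones(x.size, dtype=bool)
    keep[i_star] = False  # exclude it from estimation
    beta_hat = estimator(x[keep], y[keep])
    projection = beta_hat * x[i_star]
    return float((projection - y[i_star]) ** 2)


def projection_mse(samples, estimator):
    """Average squared projection error over an iterable of (x, y) samples."""
    errors = [holdout_projection_error(x, y, estimator) for x, y in samples]
    return float(np.mean(errors))
```

Passing two different estimators to `projection_mse` with the same list of simulated samples gives the kind of MSE comparison reported for the copula models.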
In this second set of simulations, we compare the performance of the conditional regression approach and the EVT approach based on the MSE of those projections. Figure 3 shows the results of this second set of simulations for different levels of k. For the Clayton copula, which exhibits lower tail dependence, the EVT approach clearly shows a better performance than the conditional regression approach for a wide range of choices for k. The EVT approach also outperforms the conditional regression approach for the Gaussian copula and the Gumbel copula, which are copulas that exhibit tail independence for the lower tails. If the dependence is weak for those copulas, then the EVT approach results in a smaller MSE than the conditional regression approach only for relatively low choices of k. However, if the dependence is stronger, then the better performance of the EVT approach relative to the conditional regression approach holds for a wider range of choices of k.

Figure 3. Simulations based on copula models. Notes: The solid lines report the simulation results for the EVT approach; the dashed lines report those for the conditional regression approach. The simulations are based on m = 10,000 samples with n = 1,250 observations each. The marginal distributions of Y and X are Student’s t-distributions with four degrees of freedom in the simulations in each chart, but different copulas are used for the dependence structure as indicated in the subtitle of the individual charts. From each sample i, we hold out the observation corresponding to the lowest $X_{i,t}$. This observation is denoted as $(Y_{i,t^*}, X_{i,t^*})$. Out-of-sample projections are calculated as $\hat{Y}_{i,t^*}^{EVT} = \hat{\beta}_{EVT,i}^T X_{i,t^*}$ and $\hat{Y}_{i,t^*}^{OLS} = \hat{\beta}_{OLS,i}^T X_{i,t^*}$, where $\hat{\beta}_{EVT,i}^T$ and $\hat{\beta}_{OLS,i}^T$ are estimated from all observations in sample i except $(Y_{i,t^*}, X_{i,t^*})$. The MSE in each of the charts is calculated as $m^{-1} \sum_i (\hat{Y}_{i,t^*} - Y_{i,t^*})^2$.

The challenge with real data is that it is unknown below which threshold the linear tail model can be used as a good approximation, because the true data-generating process is unobserved. Therefore, it is difficult to justify the right choice of k. Clearly, if the data-generating process is a global linear model, it is better to use all observations in a global linear regression model, since this results in a smaller variance of the estimates. However, such a method would suffer from a bias if the underlying dependence structure in the tail deviates from the global linear model, as, for example, in the segmented linear model. This bias can be severe if the deviation is large. To mitigate this bias, one may rely on observations in the tail only. The simulations show that, when estimates are based on a small number of tail observations, it is generally better to rely on the EVT approach rather than on the conditional regression approach. Choosing a very small k results in a high estimation variance.
To strike a balance between bias and variance, a common approach in extreme value analysis is to calculate the estimator for various levels of k, and then choose a low level of k in a range of k that is associated with a relatively stable level of the estimate; see, for example, Drees, De Haan, and Resnick (2000). Moreover, in empirical applications, it is advisable to verify whether the conclusions are robust for various levels of k; see, for example, Loretan and Phillips (1994).

3 Illustration

We compare the performance of the EVT approach and the conditional regression approach in an empirical illustration. We employ data on value-weighted returns of 48 industry-specific stock portfolios and a general market index in the United States. The return series run from 1931 until 2010.7 We divide the data into 16 five-year subperiods. We assess the performance of both approaches in projecting the losses of industry portfolios on the day of the largest market loss within each subperiod.

Within each five-year period, we estimate the coefficient $\beta_j^T$ in the linear tail model with the returns on industry portfolio j as the dependent variable and the excess market returns as the right-hand-side variable. In the estimation procedure, we exclude the day on which the market portfolio suffered its largest loss in order to obtain an "out of sample" estimate for the subsequent comparison. The coefficients are estimated with both the conditional regression approach and the EVT approach. The number of observations in each subperiod is on average 1315 and we estimate the coefficient $\beta_j^T$ with k = 25, or, $k/n\approx2\%$. We denote these estimates as $\hat\beta^T_{OLS,j}$ and $\hat\beta^T_{EVT,j}$, respectively. In line with the condition $\alpha_y>\tfrac{1}{2}\alpha_x$ in Theorem 1, we exclude portfolios with $\hat\alpha_j\le\tfrac{1}{2}\hat\alpha_m$ in each subperiod.8 We denote the number of portfolios excluded for this reason by S. In most subperiods, no portfolios are excluded (S = 0).
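The exclusion rule based on estimated tail indices can be illustrated with a short sketch. It assumes that the tail index of the lower tail is estimated with the standard Hill estimator applied to losses (negative returns); the section does not restate which estimator or which number of tail observations is used here, so both the function names and the choice of k below are hypothetical.

```python
import math


def hill_tail_index(returns, k):
    """Hill estimate of the lower-tail index, using the k largest losses
    relative to the (k+1)-th largest loss (assumed estimator)."""
    losses = sorted((-r for r in returns), reverse=True)
    threshold = losses[k]                 # (k+1)-th largest loss
    if threshold <= 0:
        raise ValueError("the threshold loss must be positive")
    gamma = sum(math.log(losses[i] / threshold) for i in range(k)) / k
    return 1.0 / gamma


def keep_portfolio(alpha_j, alpha_m):
    """Keep portfolio j only if its estimated tail index exceeds half the
    estimated tail index of the market; otherwise it is excluded."""
    return alpha_j > 0.5 * alpha_m


# Toy return series (purely illustrative numbers).
alpha_demo = hill_tail_index([-8.0, -4.0, -2.0, -1.0, 0.5], k=2)
```

With daily data one would call `hill_tail_index` once for the market returns and once per industry portfolio in each subperiod, and count the portfolios for which `keep_portfolio` returns `False` to obtain S.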
Table 1 reports the average $\hat\beta^T_{EVT,j}$ of the remaining portfolios for each subperiod (the number of remaining portfolios is denoted as N). Table 1 also reports the minimum and maximum $\hat\beta^T_{EVT,j}$ and the corresponding industry name to give an indication of the range of the $\hat\beta^T_{EVT,j}$s. For each subperiod, the average estimate from the EVT approach is slightly above 1. Most $\hat\beta^T_{EVT,j}$s fall in the range between 0.5 and 2.0. These estimates imply that most portfolios are expected to lose between half and twice as much as the market portfolio in a market crash.

Table 1. Estimates

Period | Av. $\hat\beta^T_{EVT,j}$ | N | S | Minimum $\hat\beta^T_{EVT,j}$ | Industry (minimum) | Maximum $\hat\beta^T_{EVT,j}$ | Industry (maximum)
1931–1935 | 1.15 | 39 | 3 | 0.61 | Tobacco Prdcts | 1.79 | Recreation
1936–1940 | 1.06 | 41 | 1 | 0.43 | Tobacco Prdcts | 2.08 | Recreation
1941–1945 | 1.16 | 42 | 0 | 0.63 | Communication | 2.47 | Real Estate
1946–1950 | 1.08 | 43 | 0 | 0.37 | Communication | 1.69 | Construction
1951–1955 | 1.00 | 43 | 0 | 0.38 | Communication | 1.61 | Aircraft
1956–1960 | 1.09 | 43 | 0 | 0.53 | Food Products | 1.71 | Electronic Eq.
1961–1965 | 1.15 | 43 | 0 | 0.58 | Utilities | 1.93 | Recreation
1966–1970 | 1.25 | 47 | 0 | 0.55 | Utilities | 1.90 | Recreation
1971–1975 | 1.15 | 48 | 0 | 0.65 | Utilities | 1.78 | Entertainment
1976–1980 | 1.09 | 45 | 3 | 0.61 | Utilities | 1.62 | Healthcare
1981–1985 | 1.11 | 48 | 0 | 0.62 | Utilities | 2.31 | Precious Metals
1986–1990 | 1.00 | 48 | 0 | 0.54 | Utilities | 1.32 | Candy & Soda
1991–1995 | 1.16 | 48 | 0 | 0.64 | Utilities | 1.86 | Shipbldng & Railrd Eq.
1996–2000 | 1.01 | 48 | 0 | 0.44 | Utilities | 1.86 | Coal
2001–2005 | 1.02 | 48 | 0 | 0.56 | Real Estate | 1.70 | Electronic Eq.
2006–2010 | 1.12 | 48 | 0 | 0.56 | Beer & Liquor | 2.19 | Coal

Notes: Within each period, we estimate coefficient $\beta^T$ in Equation (1) for the industry portfolios with non-missing observations using the EVT approach with k = 25, or, $k/n\approx2\%$. For each period, we estimate $\beta^T$s based on daily excess returns from that period while excluding the observation on the day the market suffered its largest loss. The column labeled "Av. $\hat\beta^T_{EVT,j}$" reports the average of $\hat\beta^T_{EVT,j}$ of N portfolios. The column labeled "S" reports the number of (excluded) portfolios with $\hat\alpha_j\le\tfrac{1}{2}\hat\alpha_m$. The last columns report the minimum and maximum $\hat\beta^T_{EVT,j}$, and the industry name from the data documentation.

Based on the $\beta_j^T$ estimates, we make a projection of the losses on each portfolio j on the day that the market suffered its largest loss. For each subperiod, we report the largest loss on the market portfolio, defined as $L_m=-\min\{R^e_{m,1},\dots,R^e_{m,t}\}$, and the corresponding date in Table 2.
We denote the actual loss on a specific industry portfolio on that day as $L_j=-R^e_{j,t^*}$, where $t^*$ refers to the day of the largest loss on the market portfolio. Following the linear tail model, the projections under the two approaches are $\hat L_{EVT,j}=L_m\hat\beta^T_{EVT,j}$ and $\hat L_{OLS,j}=L_m\hat\beta^T_{OLS,j}$, respectively.9 We compare the performance of the two approaches by their root mean-squared error (RMSE) calculated as $\sqrt{N^{-1}\sum_j e_{EVT,j}^2}$ and $\sqrt{N^{-1}\sum_j e_{OLS,j}^2}$, where $e_{EVT,j}=L_j-\hat L_{EVT,j}$ and $e_{OLS,j}=L_j-\hat L_{OLS,j}$. The better-performing method reports the lower RMSE.

Table 2. Performance evaluation

Period | Worst day | Market loss (%) | RMSE OLS | RMSE EVT | t-Stat | p-Value
1931–1935 | July 21, 1933 | 9.33 | 6.69 | 4.04 | 2.60 | 0.007
1936–1940 | October 18, 1937 | 8.10 | 4.93 | 2.59 | 3.18 | 0.001
1941–1945 | December 8, 1941 | 4.09 | 2.37 | 1.95 | 1.64 | 0.055
1946–1950 | September 3, 1946 | 6.82 | 2.39 | 1.45 | 1.68 | 0.050
1951–1955 | September 26, 1955 | 6.49 | 2.86 | 1.47 | 3.53 | 0.001
1956–1960 | October 21, 1957 | 3.04 | 1.96 | 1.20 | 2.98 | 0.002
1961–1965 | May 28, 1962 | 6.99 | 3.22 | 2.73 | 1.35 | 0.093
1966–1970 | May 25, 1970 | 3.21 | 2.23 | 1.42 | 2.63 | 0.006
1971–1975 | November 18, 1974 | 3.57 | 2.46 | 1.01 | 3.16 | 0.001
1976–1980 | October 9, 1979 | 3.44 | 2.07 | 0.85 | 5.42 | 0.000
1981–1985 | October 25, 1982 | 3.62 | 2.33 | 1.07 | 3.73 | 0.000
1986–1990 | October 19, 1987 | 17.44 | 6.73 | 3.95 | 1.34 | 0.094
1991–1995 | November 15, 1991 | 3.55 | 2.83 | 2.01 | 1.76 | 0.042
1996–2000 | April 14, 2000 | 6.73 | 4.35 | 3.37 | 1.11 | 0.136
2001–2005 | September 17, 2001 | 5.03 | 5.76 | 5.18 | 0.99 | 0.164
2006–2010 | December 1, 2008 | 8.95 | 3.37 | 1.40 | 3.59 | 0.000

Notes: Within each period, we estimate the coefficient $\beta^T$ in Equation (1) for 48 industry portfolios using the EVT approach and the conditional regression approach with k = 25, or, $k/n\approx2\%$. For each period, we estimate $\beta^T$s based on daily excess returns during that period while excluding the observation on the day the market suffered its largest loss. Within each period, we project the loss on the day of the largest market loss for each industry portfolio j as the product of $\beta_j^T$ and the actual market loss. From the difference between the projected losses on the industry portfolios and the actual losses, we calculate the root mean-squared error (RMSE) for each approach. The last columns report the t-statistics calculated from Equation (13) and the corresponding p-values for testing against the null hypothesis that the EVT approach produces a higher RMSE than the conditional regression (OLS) approach.

Moreover, to provide a formal assessment of whether the differences in RMSE within each subperiod are statistically significant or due to sampling variability, we implement a Diebold and Mariano (1995) type test with the small-sample correction proposed by Harvey, Leybourne, and Newbold (1997). In particular, with the paired differences $d_j=e_{EVT,j}^2-e_{OLS,j}^2$, we calculate the following test statistic for each subperiod:

$$\text{t-stat}=\frac{\bar d}{\sqrt{\hat V(\bar d)/(N-1)}}, \tag{13}$$

where $\bar d=N^{-1}\sum_j d_j$, and $\hat V(\bar d)=\frac{\sum_j(d_j-\bar d)^2}{N}$. Following Harvey, Leybourne, and Newbold (1997), the t-statistic in Equation (13) can be compared against the critical values of the Student's t-distribution with degrees of freedom $(N-1)$.10 This test relies on assuming that the differences $d_j$ follow a normal distribution. Although the error terms $e_{EVT,j}$ and $e_{OLS,j}$ may inherit heavy tails from the distribution of stock returns, Harvey, Leybourne, and Newbold (1997) affirm that the test results are in general not very sensitive to the presence of heavy tails.

Table 2 reports the RMSE of the projected portfolio losses for both approaches in each subperiod.
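The RMSE and the test statistic in Equation (13) are straightforward to compute from the two vectors of projection errors. The sketch below follows the definitions in the text literally; the function names are hypothetical, and the p-value step is omitted because it requires the Student's t-distribution, which is not part of the Python standard library.

```python
import math


def rmse(errors):
    """Root mean-squared error of a list of projection errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def paired_t_stat(e_evt, e_ols):
    """t-statistic of Equation (13) from paired squared-error differences.

    With d_j = e_evt_j^2 - e_ols_j^2 as defined in the text, a negative
    mean difference means smaller squared errors for the EVT approach.
    """
    n = len(e_evt)
    d = [a * a - b * b for a, b in zip(e_evt, e_ols)]
    d_bar = sum(d) / n
    v_hat = sum((dj - d_bar) ** 2 for dj in d) / n
    return d_bar / math.sqrt(v_hat / (n - 1))


# Toy projection errors for three portfolios (purely illustrative).
e_evt_demo = [1.0, 2.0, 1.0]
e_ols_demo = [2.0, 3.0, 2.0]
t_demo = paired_t_stat(e_evt_demo, e_ols_demo)
```

The resulting statistic would then be compared with the critical values of a Student's t-distribution with N − 1 degrees of freedom.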
In all subperiods, the EVT approach reports a lower RMSE than the conditional regression approach.11 The average reduction in the RMSE is approximately 40%. We report t-statistics calculated from Equation (13) and the corresponding p-values in the last two columns to test against the null hypothesis of a larger RMSE from the EVT approach. In 11 out of 16 subperiods, the null is rejected at the 5% significance level.

We zoom in on two subperiods with relatively big and relatively small reductions in the RMSE. A substantial improvement in the projections in terms of RMSE occurs for the stock market plunge of 8.1% on October 18, 1937. This stock market crash occurred during a period of high uncertainty about the US economy: During the 9-month economic decline from September 1937 to June 1938, national income fell by 12% and firm profits fell by 78%; see, for example, Roose (1948). The errors in the projected losses with the EVT approach are significantly smaller at the 1% significance level.

The improvement in the projections is limited for the stock market plunge of 5.0% on September 17, 2001. This was the day that the NYSE resumed trading after the terrorist attacks 6 days earlier. For this stock market crash, the RMSE of the regression approach is 5.76, while the RMSE for the EVT approach is 5.18 (no significant difference). A relatively high proportion of the forecasting errors on this day are due to the reaction of a few industries because of the nature of this particular event. The industry portfolios "Defense" and "Shipbuilding" would usually move in the same direction as the stock market index. However, during this crash, they reported gains of 15% and 7%, respectively. In contrast, because of the nature of the terrorist attacks, the industry portfolios "Aircraft" and "Transportation" (which includes "Air transportation") reacted much more strongly. These portfolios lost 18% and 14%, respectively.
The linear tail model does not anticipate the profits and losses for these four portfolios in this case. If the projection errors for these four portfolios are excluded, the RMSE of the conditional regression approach and the EVT approach decline substantially to a level of 4.40 and 3.48, respectively. Moreover, the difference between the MSE of the EVT approach and that of the conditional regression approach in this subperiod becomes (weakly) significant at the 10% level if these four portfolios are excluded.

To summarize, the EVT approach shows a better overall performance than the conditional regression approach in projecting portfolio losses on the worst market day. We interpret this better performance as resulting from improved accuracy in estimating $\beta^T$ based on a small number of extreme observations.

3.1 Robustness Checks

We perform several robustness checks on the performance of the EVT approach. First, the simulation results in Section 2 suggest that the performance of the EVT approach relative to the conditional regression approach is stronger if estimates are based on a small number of tail observations. To verify this in our empirical illustration, we vary the level of k over a range of values. With k fixed at 20, 30, 35, 40, 45, and 50, the null is rejected at the 5% significance level for 11, 11, 9, 8, 5, and 5 subperiods, respectively. This confirms that the better performance of the EVT approach relative to the conditional regression approach is stronger for lower levels of k.

Second, we also estimate $\beta^T$ using a modified EVT approach where the tail index is estimated using the modified Hill estimator of Huisman et al. (2001) for small samples using 25% of the observations, while all other components are estimated with k = 25. When comparing this modified EVT approach with the conditional regression approach using k = 25, we reject the null at the 5% significance level for nine subperiods.
Third, we compare the EVT approach with an alternative benchmark that minimizes the mean absolute error to estimate the coefficient in the linear tail model. This is equivalent to estimating a quantile regression based on observations corresponding to the k = 25 lowest X to predict the median of Y given an extremely low level of X. With this benchmark, the null is rejected at the 5% significance level for 12 subperiods. To summarize, the better performance of the EVT approach is robust with respect to several methodological variations. We also compare the performance of the EVT approach with k = 25 with that of a global linear regression model using all daily observations in the subperiod (except the observation corresponding to the largest market loss). In this setting, the difference in performance not only depends on the properties of the estimators, but also on the severity of nonlinearities in the dependence structure of Y and X in the empirical application. Stronger nonlinearities increase the bias in the projections from the global linear regression model, which, in the absence of these nonlinearities, is expected to result in a lower RMSE. In the context of industry portfolios, the EVT approach does not perform better than a global linear model as it results in an RMSE that is empirically lower in only 6 out of the 16 subperiods. This suggests that the nonlinearities in the dependence structure between the market returns and the returns on industry portfolios are, in general, not sufficiently severe to warrant better projections from the EVT approach. Such nonlinearities can be expected to be more prevalent in the returns of currencies (Lettau, Maggiori, and Weber, 2014), safe-haven commodities (Baur and McDermott, 2010), hedge funds (Patton, 2009), portfolios with options-like payoffs (Fung and Hsieh, 2001), and even the returns of individual stocks (Van Oordt and Zhou, 2016). 
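The trade-off between a global regression and a tail-only regression under a nonlinear dependence structure can be illustrated with a small simulation in the spirit of the segmented linear model of Section 2. This is a sketch under stated assumptions, not a replication: the noise scale, the slopes, the use of the sample (rather than population) third percentile as the breakpoint, and the regressions without intercept are all choices made here for illustration.

```python
import math
import random


def student_t(rng, df):
    """Draw from a Student's t-distribution as normal / sqrt(chi2 / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)   # chi-square with df degrees
    return z / math.sqrt(chi2 / df)


def simulate_segmented(rng, n, beta, beta_tail, tail_share=0.03, noise=0.1):
    """Slope beta_tail for the lowest 3% of X in the sample, beta otherwise."""
    x = [student_t(rng, 4) for _ in range(n)]
    threshold = sorted(x)[int(tail_share * n)]
    y = [(beta_tail if xt < threshold else beta) * xt
         + noise * student_t(rng, 4) for xt in x]
    return x, y


def slope_through_origin(x, y):
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def tail_slope(x, y, k):
    """Conditional regression on the k observations with the lowest x."""
    tail = sorted(zip(x, y))[:k]
    return slope_through_origin([a for a, _ in tail], [b for _, b in tail])


def compare_mse(m=50, n=1250, k=25, beta=1.0, beta_tail=2.0, seed=0):
    """MSE of the global and tail-only slope relative to the tail slope."""
    rng = random.Random(seed)
    global_est, tail_est = [], []
    for _ in range(m):
        x, y = simulate_segmented(rng, n, beta, beta_tail)
        global_est.append(slope_through_origin(x, y))
        tail_est.append(tail_slope(x, y, k))

    def mse(estimates):
        return sum((beta_tail - b) ** 2 for b in estimates) / len(estimates)

    return mse(global_est), mse(tail_est)


mse_global, mse_tail = compare_mse()
```

In this stylized setting the global slope is pulled toward the slope that holds outside the tail, so its error as an estimate of the tail slope is dominated by bias. When the two slopes are set equal, the global regression is no longer biased and benefits from using all observations, which is the situation the empirical comparison above suggests for industry portfolios.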
4 Concluding Remarks

In this paper, we propose an EVT approach for estimating $\beta^T$ in the linear tail model based on only a small number of extreme observations. Simulations show that our EVT approach yields a lower MSE than conditional regressions on tail observations. We demonstrate one application of the EVT approach: projecting large losses on industry portfolios in extremely adverse market conditions.

The estimator of $\beta^T$ might be further improved by considering more sophisticated EVT techniques. For instance, the estimator on the tail index, $\hat\alpha_x$, suffers from an asymptotic bias issue. Many studies propose different bias corrected estimators; see, for example, Peng (1998), Feuerverger and Hall (1999), and Gomes, de Haan, and Rodrigues (2008). In addition, Fougères, de Haan, and Mercadier (2015) propose a bias-corrected estimator for the τ-measure. Whether the use of such sophisticated techniques will improve the current simple estimation procedure for $\beta^T$ offers an interesting perspective for future research.

Appendix: Proofs

Throughout the proof, we need the following lemma as an auxiliary result, which approximates the probability of joint extreme events by that of a marginal extreme event.

Lemma 1 Suppose $\beta^T>0$ and $\alpha_y>\tfrac{1}{2}\alpha_x$. Under the linear tail model in Equation (1) and the heavy-tailedness of the downside distributions in Equation (2), we have that

$$\lim_{p\to0}\frac{\Pr\left(Y<Q_y(py),\,X<Q_x(px)\right)}{\Pr\left(X<\min\left(\frac{Q_y(py)}{\beta^T},Q_x(px)\right)\right)}=1, \tag{14}$$

uniformly for $(x,y)\in(0,3/2]^2$.

Proof of Lemma 1 Denote the two sets in the numerator and denominator of Equation (14) as

$$C:=\{Y<Q_y(py),\,X<Q_x(px)\}=\{\beta^TX+\varepsilon<Q_y(py),\,X<Q_x(px)\},\qquad C_0:=\left\{X<\min\left(\frac{Q_y(py)}{\beta^T},Q_x(px)\right)\right\}.$$

The goal is to prove that $\Pr(C)\sim\Pr(C_0)$ as $p\to0$ uniformly for all $(x,y)\in(0,3/2]^2$. To achieve this goal, we first find suitable upper and lower bounds for $\Pr(C)$ using set manipulations. Denote the following sets

$$C_1:=\{\beta^TX<Q_y(py)(1+\delta),\,X<Q_x(px),\,\varepsilon<-\delta Q_y(py)\},$$
$$C_{21}:=\{\beta^TX<Q_y(py)(1-\delta),\,X<Q_x(px)\},$$
$$C_{22}:=\{\varepsilon<\delta Q_y(py),\,X<Q_x(px)\},$$

with $\delta:=\delta(p)>0$ to be specified later.
It is clear that, for any $0<\delta<1$, $C_1\subset C\subset C_{21}\cup C_{22}$. We prove the lemma by showing that, for some proper choice of δ, the limit relations

$$\frac{\Pr(C_1)}{\Pr(C_0)}\to1,\quad\frac{\Pr(C_{21})}{\Pr(C_0)}\to1\quad\text{and}\quad\frac{\Pr(C_{22})}{\Pr(C_0)}\to0, \tag{15}$$

hold uniformly as $p\to0$ for all $(x,y)\in(0,3/2]^2$.

We first prove the limit relation for $\Pr(C_1)/\Pr(C_0)$. Since X and ε are independent, we have that

$$\Pr(C_1)=\Pr\left(X<\min\left(\frac{Q_y(py)}{\beta^T}(1+\delta(p)),Q_x(px)\right)\right)\Pr\left(\varepsilon<-\delta(p)Q_y(py)\right). \tag{16}$$

From the heavy-tailed property of the distribution function of Y in Equation (2), we obtain that $Q_y(p)=-p^{-1/\alpha_y}\tilde l_y(p)$, with $\tilde l_y(p)$ denoting a slowly varying function as $p\to0$. By taking $\delta(p)=p^c$ with $0<c<\frac{1}{\alpha_y}$, we have that $-\delta(p)Q_y(py)=p^{c-1/\alpha_y}y^{-1/\alpha_y}\tilde l_y(py)\to+\infty$ as $p\to0$, which ensures that the second term $\Pr(\varepsilon<-\delta(p)Q_y(py))\to1$ as $p\to0$ holds uniformly for all $0<y\le3/2$.

Next, for the first term of $\Pr(C_1)$ in Equation (16), we show that uniformly for all $(x,y)\in(0,3/2]^2$,

$$\lim_{p\to0}\frac{\Pr\left(X<\min\left(\frac{Q_y(py)}{\beta^T}(1+\delta(p)),Q_x(px)\right)\right)}{\Pr\left(X<\min\left(\frac{Q_y(py)}{\beta^T},Q_x(px)\right)\right)}=1. \tag{17}$$

This relation follows from two observations. The denominator of Equation (17) provides an upper bound for the numerator. Moreover, the following inequality provides a lower bound for the numerator:

$$(1+\delta(p))\min\left(\frac{Q_y(py)}{\beta^T},Q_x(px)\right)\le\min\left(\frac{Q_y(py)}{\beta^T}(1+\delta(p)),Q_x(px)\right).$$

From the heavy-tailedness of the distribution of X, we get that, uniformly for all $(x,y)\in(0,3/2]^2$, the lower bound satisfies

$$\frac{\Pr\left(X<(1+\delta(p))\min\left(\frac{Q_y(py)}{\beta^T},Q_x(px)\right)\right)}{\Pr\left(X<\min\left(\frac{Q_y(py)}{\beta^T},Q_x(px)\right)\right)}\sim(1+\delta(p))^{-\alpha_x}\to1,$$

as $p\to0$. Here, in the last step we use the fact that $\delta(p)=p^c\to0$ as $p\to0$. Equation (17) is thus proved.

Combining the results regarding the two terms of $\Pr(C_1)$ in Equation (16), we get that, as $p\to0$, $\frac{\Pr(C_1)}{\Pr(C_0)}\to1$ uniformly for all $(x,y)\in(0,3/2]^2$. The proof of the limit result that $\frac{\Pr(C_{21})}{\Pr(C_0)}\to1$ as $p\to0$ holds uniformly for all $(x,y)\in(0,3/2]^2$ follows similar lines.

Finally, we deal with the limit relation for $\frac{\Pr(C_{22})}{\Pr(C_0)}$. For this purpose, we first prove $\limsup_{p\to0}\frac{\Pr(C_{22})}{p^{2-c\alpha_y-\upsilon}}=0$ for any given $\upsilon>0$. Due to the independence between X and ε, we have $\Pr(C_{22})=\Pr(\varepsilon<\delta Q_y(py))\,px$.
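Moreover, we have the following inequality:

$$\begin{aligned}\Pr\left(\varepsilon<\delta(p)Q_y(py)\right)\Pr\left(\beta^TX<-\tfrac{\delta(p)}{2}Q_y(py)\right)&=\Pr\left(\varepsilon<\delta(p)Q_y(py),\,\beta^TX<-\tfrac{\delta(p)}{2}Q_y(py)\right)\\&\le\Pr\left(Y<\tfrac{\delta(p)}{2}Q_y(py)\right).\end{aligned}$$

This implies that, for any given $\upsilon>0$,

$$\begin{aligned}\limsup_{p\to0}\frac{\Pr(C_{22})}{p^{2-c\alpha_y-\upsilon}}&=\limsup_{p\to0}\frac{\Pr\left(\varepsilon<\delta(p)Q_y(py)\right)}{p^{1-c\alpha_y-\upsilon}}\,x\\&\le\limsup_{p\to0}\frac{\Pr\left(Y<\tfrac{\delta(p)}{2}Q_y(py)\right)}{p^{1-c\alpha_y-\upsilon}\Pr\left(\beta^TX<-\tfrac{\delta(p)}{2}Q_y(py)\right)}\,x\\&=\limsup_{p\to0}\frac{\left(-\tfrac{\delta(p)}{2}Q_y(py)\right)^{-\alpha_y}l_y\left(\tfrac{\delta(p)}{2}Q_y(py)\right)}{p^{1-c\alpha_y-\upsilon}\Pr\left(\beta^TX<-\tfrac{\delta(p)}{2}Q_y(py)\right)}\,x\\&=\limsup_{p\to0}\frac{p^{\upsilon}2^{\alpha_y}y\left(\tilde l_y(py)\right)^{-\alpha_y}l_y\left(\tfrac{\delta(p)}{2}Q_y(py)\right)}{\Pr\left(\beta^TX<-\tfrac{\delta(p)}{2}Q_y(py)\right)}\,x=0.\end{aligned}$$

Here, the last step follows from the facts that $\left(\tilde l_y(py)\right)^{-\alpha_y}l_y\left(\tfrac{\delta(p)}{2}Q_y(py)\right)$ is a slowly varying function as $p\to0$, $\Pr\left(\beta^TX<-\tfrac{\delta(p)}{2}Q_y(py)\right)\to1$ as $p\to0$, and $c\alpha_y<1$.

Next, we show that for some small $\upsilon>0$, $\lim_{p\to0}\frac{\Pr(C_0)}{p^{2-c\alpha_y-\upsilon}}=+\infty$. Notice that

$$\begin{aligned}\Pr(C_0)&=\min\left(\Pr\left(X<\frac{Q_y(py)}{\beta^T}\right),px\right)\\&=\min\left(\left(\frac{(py)^{-1/\alpha_y}\tilde l_y(py)}{\beta^T}\right)^{-\alpha_x}l_x\left(-\frac{Q_y(py)}{\beta^T}\right),px\right)\\&=\min\left(p^{\alpha_x/\alpha_y}\left(\frac{y^{-1/\alpha_y}\tilde l_y(py)}{\beta^T}\right)^{-\alpha_x}l_x\left(-\frac{Q_y(py)}{\beta^T}\right),px\right).\end{aligned}$$

In order to obtain $\lim_{p\to0}\frac{\Pr(C_0)}{p^{2-c\alpha_y-\upsilon}}=+\infty$, we need to choose c and υ such that $2-c\alpha_y-\upsilon>\max\left(\frac{\alpha_x}{\alpha_y},1\right)$. Since it is assumed that $\alpha_y>\tfrac{1}{2}\alpha_x$, this can be achieved by first choosing $c<\min\left(\frac{1}{\alpha_y},\frac{1}{\alpha_y}\left(2-\frac{\alpha_x}{\alpha_y}\right)\right)$, which implies that $2-c\alpha_y>\max\left(\frac{\alpha_x}{\alpha_y},1\right)$, and then choosing a small $\upsilon>0$. With these choices, $\lim_{p\to0}\frac{\Pr(C_0)}{p^{2-c\alpha_y-\upsilon}}=+\infty$. Together with $\limsup_{p\to0}\frac{\Pr(C_{22})}{p^{2-c\alpha_y-\upsilon}}=0$, we get that $\lim_{p\to0}\frac{\Pr(C_{22})}{\Pr(C_0)}=0$.

By proving all limit relations in Equation (15), we proved the lemma.

Proof of Theorem 1 If $\beta^T=0$, then $\tau(p)=p$. From the heavy-tailed property of the distribution function of X and Y, we get that $Q_x(p)=-p^{-1/\alpha_x}\tilde l_x(p)$ and $Q_y(p)=-p^{-1/\alpha_y}\tilde l_y(p)$, where $\tilde l_x$ and $\tilde l_y$ are two slowly varying functions as $p\to0$. Consequently,

$$\lim_{p\to0}(\tau(p))^{1/\alpha_x}\frac{Q_y(p)}{Q_x(p)}=\lim_{p\to0}p^{2/\alpha_x-1/\alpha_y}\frac{\tilde l_y(p)}{\tilde l_x(p)}=0=\beta^T,$$

where in the last step we used the fact that the assumption $\alpha_y>\tfrac{1}{2}\alpha_x$ implies $2/\alpha_x-1/\alpha_y>0$ and the fact that $\frac{\tilde l_y(p)}{\tilde l_x(p)}$ is also a slowly varying function as $p\to0$.

If $\beta^T>0$, the proof of the theorem relies on the following relation:

$$\liminf_{p\to0}\frac{Q_y(p)}{\beta^TQ_x(p)}\ge1, \tag{18}$$

which we prove by contradiction as follows. Suppose there would exist a sequence $p_n\to0$ as $n\to\infty$ such that, for all $p_n$, $\frac{Q_y(p_n)}{\beta^TQ_x(p_n)}<1-\delta$ for some $\delta>0$.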
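This would imply that

$$p_n=\Pr\left(Y<Q_y(p_n)\right)\ge\Pr\left(Y<(1-\delta)\beta^TQ_x(p_n)\right). \tag{19}$$

For sufficiently large n, we have $p_n<\bar p$, which implies that the linear model in Equation (1) applies for sufficiently large n. Hence, we would have, for sufficiently large n, that

$$\begin{aligned}\Pr\left(Y<(1-\delta)\beta^TQ_x(p_n)\right)&\ge\Pr\left(\beta^TX<\left(1-\tfrac{\delta}{2}\right)\beta^TQ_x(p_n),\,\varepsilon<-\tfrac{\delta}{2}\beta^TQ_x(p_n)\right)\\&=\Pr\left(X<\left(1-\tfrac{\delta}{2}\right)Q_x(p_n)\right)\Pr\left(\varepsilon<-\tfrac{\delta}{2}\beta^TQ_x(p_n)\right),\end{aligned} \tag{20}$$

where the last step is due to the independence between X and ε.

Regarding the second term in Equation (20), notice that, as $p\to0$, $\Pr\left(\varepsilon<-\tfrac{\delta}{2}\beta^TQ_x(p)\right)\to1$, since $-\tfrac{\delta}{2}\beta^TQ_x(p)\to+\infty$. In other words, given any $\kappa>0$, we would have $\Pr\left(\varepsilon<-\tfrac{\delta}{2}\beta^TQ_x(p_n)\right)>1-\kappa$ for sufficiently large n.

Regarding the first term in Equation (20), notice that

$$\lim_{p\to0}\frac{\Pr\left(X<\left(1-\tfrac{\delta}{2}\right)Q_x(p)\right)}{\Pr\left(X<Q_x(p)\right)}=\left(1-\tfrac{\delta}{2}\right)^{-\alpha_x}.$$

Thus for any $\kappa>0$, for sufficiently large n, $\Pr\left(X<\left(1-\tfrac{\delta}{2}\right)Q_x(p_n)\right)>p_n\left(1-\tfrac{\delta}{2}\right)^{-\alpha_x}(1-\kappa)$.

Combining those results on the two terms in Equation (20) gives that, for any $\kappa>0$ and sufficiently large n,

$$\Pr\left(Y<(1-\delta)\beta^TQ_x(p_n)\right)>p_n\left(1-\tfrac{\delta}{2}\right)^{-\alpha_x}(1-\kappa)^2.$$

By choosing sufficiently small κ, we would have $\left(1-\tfrac{\delta}{2}\right)^{-\alpha_x}(1-\kappa)^2>1$, which contradicts Equation (19). Therefore, we conclude that Equation (18) must hold true.

Next, we turn to the proof of the theorem. By applying Lemma 1, with $x=y=1$, we obtain

$$\lim_{p\to0}\frac{p\,\tau(p)}{\Pr\left(X<\min\left(\frac{Q_y(p)}{\beta^T},Q_x(p)\right)\right)}=1.$$

After substituting $\Pr(X<Q_x(p))$ for p, we need to handle the quotient $\frac{\Pr\left(X<\min\left(\frac{Q_y(p)}{\beta^T},Q_x(p)\right)\right)}{\Pr\left(X<Q_x(p)\right)}$ as $p\to0$. To handle this quotient we use Lemma 2.1 in Drees (1998): The second-order condition (3) implies that for any given $\delta>0$, there exists a level of $u_0=u_0(\delta)$, such that

$$\left|\frac{x^{\alpha_x}\frac{\Pr(X<-ux)}{\Pr(X<-u)}-1}{\eta(u)}\right|\le\delta x^{-\alpha_x+\delta},$$

for all $u>u_0$ and $ux>u_0$. Consequently, if $x=x(u)>x_0$ for some given $x_0>0$, we have that

$$\lim_{u\to\infty}x^{\alpha_x}\frac{\Pr(X<-ux)}{\Pr(X<-u)}=1. \tag{21}$$

We apply the limit relation (21) by substituting ux and u by $-\min\left(\frac{Q_y(p)}{\beta^T},Q_x(p)\right)$ and $-Q_x(p)$, respectively. This is feasible because it follows from the inequality in Equation (18) that

$$\frac{\min\left(\frac{Q_y(p)}{\beta^T},Q_x(p)\right)}{Q_x(p)}=\max\left(\frac{Q_y(p)}{\beta^TQ_x(p)},1\right)\ge1.$$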
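With the planned substitution, we get that

$$\lim_{p\to0}\frac{\Pr\left(X<\min\left(\frac{Q_y(p)}{\beta^T},Q_x(p)\right)\right)}{\Pr\left(X<Q_x(p)\right)}\cdot\left(\max\left(\frac{Q_y(p)}{\beta^TQ_x(p)},1\right)\right)^{\alpha_x}=1,$$

which implies that

$$\lim_{p\to0}\tau(p)\cdot\max\left(\left(\frac{Q_y(p)}{\beta^TQ_x(p)}\right)^{\alpha_x},1\right)=1. \tag{22}$$

The theorem follows directly from combining Equations (18) and (22).

Proof of Theorem 2 Write

$$\hat\beta^T=\left(\frac{\hat\tau(k/n)}{\tau(k/n)}\right)^{1/\hat\alpha_x}\cdot(\tau(k/n))^{1/\hat\alpha_x-1/\alpha_x}\cdot\frac{\hat Q_y(k/n)}{Q_y(k/n)}\cdot\frac{Q_x(k/n)}{\hat Q_x(k/n)}\cdot\left((\tau(k/n))^{1/\alpha_x}\frac{Q_y(k/n)}{Q_x(k/n)}\right)=:I_1\cdot I_2\cdot I_3\cdot I_4\cdot I_5.$$

The classic consistency results in extreme value statistics ensure that $\hat\alpha_x\xrightarrow{P}\alpha_x$, $\frac{\hat Q_x(k/n)}{Q_x(k/n)}\xrightarrow{P}1$, and $\frac{\hat Q_y(k/n)}{Q_y(k/n)}\xrightarrow{P}1$ as $n\to\infty$; see Theorem 3.2.2 and Corollary 4.3.9 in De Haan and Ferreira (2006). Hence, $I_3,I_4\xrightarrow{P}1$ as $n\to\infty$. Theorem 1 ensures that $I_5\to\beta^T$ as $n\to\infty$. Therefore, the only issues left to prove are $I_1,I_2\xrightarrow{P}1$ as $n\to\infty$.

We first deal with $I_1$, which is equivalent to proving the consistency of $\hat\tau(k/n)$. Denote

$$\tilde\tau(x,y)=\frac{1}{k}\sum_{t=1}^{n}1\left\{X_t<Q_x\left(\tfrac{k}{n}x\right)\text{ and }Y_t<Q_y\left(\tfrac{k}{n}y\right)\right\}, \tag{23}$$

for (x, y) in the neighborhood of (1, 1). Then, $\hat\tau(k/n)$ can be written as

$$\hat\tau(k/n)=\tilde\tau\left(\tfrac{n}{k}F_x(X_{n,k+1}),\tfrac{n}{k}F_y(Y_{n,k+1})\right).$$

Here, $\left(\tfrac{n}{k}F_x(X_{n,k+1}),\tfrac{n}{k}F_y(Y_{n,k+1})\right)$ is in the neighborhood of (1, 1) in the following sense. According to Corollary 2.2.2 in De Haan and Ferreira (2006), as $n\to\infty$,

$$\sqrt{k}\left(\tfrac{n}{k}F_x(X_{n,k+1})-1\right)\xrightarrow{d}N(0,1).$$

Hence, for any $\delta>0$, as $n\to\infty$, $\Pr\left(\left|\tfrac{n}{k}F_x(X_{n,k+1})-1\right|>k^{-1/2+\delta}\right)\to0$. A similar relation for $Y_{n,k+1}$ holds. Therefore, in order to prove that $I_1\xrightarrow{P}1$ as $n\to\infty$, we will prove a more general result that $\tilde\tau(x,y)/\tau(k/n)\xrightarrow{P}1$ uniformly for all $(x,y)\in[1-k^{-1/2+\delta},1+k^{-1/2+\delta}]^2$ for some $0<\delta<1/2$.

By applying the law of large numbers, we get that as $n\to\infty$,

$$\frac{\tilde\tau(x,y)}{R(x,y,k/n)}\xrightarrow{P}1, \tag{24}$$

where $R(x,y,k/n)=\tfrac{n}{k}\Pr\left(X<Q_x\left(\tfrac{k}{n}x\right),Y<Q_y\left(\tfrac{k}{n}y\right)\right)$. Notice that $\tau(k/n)=R(1,1,k/n)$. Hence, what remains to be proved is that the denominator in Equation (24) can be replaced by $\tau(k/n)$, that is,

$$\lim_{n\to\infty}\frac{R(x,y,k/n)}{\tau(k/n)}=1 \tag{25}$$

holds uniformly for all $(x,y)\in[1-k^{-1/2+\delta},1+k^{-1/2+\delta}]^2$.

If $\beta^T=0$, as $n\to\infty$, for all $(x,y)\in[1-k^{-1/2+\delta},1+k^{-1/2+\delta}]^2$, we have uniformly

$$\lim_{n\to\infty}\frac{R(x,y,k/n)}{\tau(k/n)}=\lim_{n\to\infty}\frac{\tfrac{n}{k}\cdot\tfrac{k}{n}x\cdot\tfrac{k}{n}y}{\tfrac{n}{k}\left(\tfrac{k}{n}\right)^2}=\lim_{n\to\infty}xy=1,$$

where the last equality uses $x\to1$ and $y\to1$ as $n\to\infty$, since $k\to\infty$ as $n\to\infty$.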
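If $\beta^T > 0$, applying Lemma 1 with $p = k/n$ directly gives that
$$\lim_{n\to\infty}\frac{R(x,y,k/n)}{\tfrac{n}{k}\Pr\left(X < \min\left(Q_y\left(\tfrac{k}{n}y\right)/\beta^T,\ Q_x\left(\tfrac{k}{n}x\right)\right)\right)} = 1$$
holds uniformly for $(x,y)\in[1-k^{-1/2+\delta},\ 1+k^{-1/2+\delta}]^2\subset(0,3/2]^2$. We further simplify the denominator using the limit relation in Equation (21) as follows: as $n\to\infty$,
$$\tfrac{n}{k}\Pr\left(X < \min\left(\tfrac{Q_y(\frac{k}{n}y)}{\beta^T},\ Q_x\left(\tfrac{k}{n}x\right)\right)\right) = \frac{\Pr\left(X < \min\left(\tfrac{Q_y(\frac{k}{n}y)}{\beta^T},\ Q_x\left(\tfrac{k}{n}x\right)\right)\right)}{\Pr\left(X < Q_x\left(\tfrac{k}{n}\right)\right)} \sim \left(\frac{\min\left(\tfrac{Q_y(\frac{k}{n}y)}{\beta^T},\ Q_x\left(\tfrac{k}{n}x\right)\right)}{Q_x\left(\tfrac{k}{n}\right)}\right)^{-\alpha_x} = \min\left(\left(\frac{\beta^T Q_x(\frac{k}{n})}{Q_y(\frac{k}{n}y)}\right)^{\alpha_x},\ \left(\frac{Q_x(\frac{k}{n})}{Q_x(\frac{k}{n}x)}\right)^{\alpha_x}\right).$$
From Theorem 1, we get that $\left(\beta^T Q_x(\tfrac{k}{n})/Q_y(\tfrac{k}{n}y)\right)^{\alpha_x} \sim \tau(k/n)$ as $n\to\infty$ holds uniformly for $|y-1|\le k^{-1/2+\delta}$. In addition, $\left(Q_x(\tfrac{k}{n})/Q_x(\tfrac{k}{n}x)\right)^{\alpha_x} \to 1$ as $n\to\infty$ holds uniformly for $|x-1|\le k^{-1/2+\delta}$. Together with $\tau(k/n)\le 1$, we get that
$$\tfrac{n}{k}\Pr\left(X < \min\left(Q_y\left(\tfrac{k}{n}y\right)/\beta^T,\ Q_x\left(\tfrac{k}{n}x\right)\right)\right) \sim \tau(k/n)$$
holds uniformly for $(x,y)\in[1-k^{-1/2+\delta},\ 1+k^{-1/2+\delta}]^2$ as $n\to\infty$. Hence, we have proved Equation (25) and consequently handled the term $I_1$. We remark that our proof allows for $\lim_{p\to 0}\tau(p) = 0$, which goes beyond the typical consistency results in bivariate extreme value statistics.

Finally, we deal with $I_2$. If $\limsup_{p\to 0}\tau(p) > 0$, then the consistency of $\hat\alpha_x$ leads to $I_2 \xrightarrow{P} 1$ as $n\to\infty$. In this case, the theorem is proved without using Conditions (3), (9), and (10). If $\lim_{p\to 0}\tau(p) = 0$, then to prove $I_2 \xrightarrow{P} 1$ we need to prove that, as $n\to\infty$,
$$\log\tau(k/n)\left(\frac{1}{\hat\alpha_x} - \frac{1}{\alpha_x}\right) \xrightarrow{P} 0. \tag{26}$$
The conditions in Equations (3) and (9) imply the asymptotic normality of $\hat\alpha_x$: as $n\to\infty$, $\sqrt{k_1}\left(\frac{1}{\hat\alpha_x} - \frac{1}{\alpha_x}\right) = O_p(1)$; see, for example, Theorem 3.2.5 in De Haan and Ferreira (2006). Therefore, it only remains to prove that $\log\tau(k/n) = o(\sqrt{k_1})$ as $n\to\infty$. If $\beta^T = 0$, then $\tau(k/n) = k/n > 1/n$. Hence, as $n\to\infty$, $\log\tau(k/n) = O(\log n)$. If $\beta^T > 0$, following Theorem 1, we get that for sufficiently large $n$,
$$\tau(k/n) \sim \left(\frac{\beta^T Q_x(\frac{k}{n})}{Q_y(\frac{k}{n})}\right)^{\alpha_x} = (\beta^T)^{\alpha_x}\left(\frac{k}{n}\right)^{\alpha_x/\alpha_y - 1}\left(\frac{\tilde l_x(\frac{k}{n})}{\tilde l_y(\frac{k}{n})}\right)^{\alpha_x} > D\left(\frac{k}{n}\right)^{\alpha_x/\alpha_y - 1 + \delta},$$
for some $D > 0$ and $\delta > 0$. The last step comes from the Potter inequality; see, for example, Inequality (B.1.19) in De Haan and Ferreira (2006). Therefore, $\log\tau(k/n) = O(\log n)$ as $n\to\infty$. Combining this with Condition (10), we have that $\log\tau(k/n) = o(\sqrt{k_1})$ as $n\to\infty$, which implies that $I_2 \xrightarrow{P} 1$.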
▪

Proof of Theorem 3
We start by deriving the explicit form of $R(x, y)$ and its partial derivatives at $(1, 1)$, because these quantities play an important role in the calculation of the asymptotic variance. Notice that $R$ is a homogeneous function of degree 1. Thus, it is only necessary to derive $R(x,1)$ for $x > 0$. This is given in the following lemma.

Lemma 2
Under the conditions in Theorem 3, we have $R(x,1) = \min(x,\tau)$ for $x > 0$.

Proof
Theorem 1 implies that
$$\lim_{p\to 0}\frac{Q_y(p)}{Q_x(\tau p)} = \lim_{p\to 0}\tau^{1/\alpha_x}\frac{Q_y(p)}{Q_x(p)} = \beta^T.$$
Hence, for any $\tau < x < 1$, we have that for sufficiently small $p$, $Q_y(p) \le \beta^T Q_x(xp)$. On the other hand, for any $0 < x < \tau$, for sufficiently small $p$, $Q_y(p) \ge \beta^T Q_x(xp)$. By applying Lemma 1 with $y = 1$, we get that
$$\lim_{p\to 0}\frac{R(x,1)}{\frac{1}{p}\Pr\big(X < \min(Q_y(p)/\beta^T,\ Q_x(px))\big)} = 1.$$
In particular, for $\tau < x < 1$, since $Q_y(p) \le \beta^T Q_x(xp)$,
$$R(x,1) = \lim_{p\to 0}\frac{\Pr(\beta^T X < Q_y(p))}{\Pr(X < Q_x(p))} = \lim_{p\to 0}\left(\frac{Q_y(p)}{\beta^T Q_x(p)}\right)^{-\alpha_x} = \tau.$$
On the other hand, for $0 < x < \tau$,
$$R(x,1) = \lim_{p\to 0}\frac{\Pr(X < Q_x(xp))}{p} = x.$$
Finally, for the point $x = \tau$, we use the continuity of the function $R(x,1)$ to get that $R(\tau,1) = \tau$. The lemma is thus proved. ▪

As a direct consequence of Lemma 2, we get that for $x/y > \tau$,
$$R(x,y) = y\,R\left(\tfrac{x}{y},1\right) = \tau y.$$
Hence, the partial derivatives of $R$ in the neighborhood of $(1, 1)$ exist, with $R_1(1,1) = 0$ and $R_2(1,1) = \tau$, where $R_1$ and $R_2$ denote the partial derivatives of $R$ with respect to $x$ and $y$, respectively. With the expression of the $R$ function and its partial derivatives, we apply Theorem 7.2.2 in De Haan and Ferreira (2006) to obtain the asymptotic normality of $\hat\tau(k/n)$ as
$$\sqrt{k}\big(\hat\tau(k/n) - \tau\big) \xrightarrow{P} W(1,1) - \tau W(+\infty,1), \tag{27}$$
where $W(x, y)$ is a continuous mean-zero Gaussian process with the covariance structure
$$E\,W(x_1,y_1)W(x_2,y_2) = R\big(\min(x_1,x_2),\ \min(y_1,y_2)\big).$$
In addition, the Gaussian process $W$ also governs the asymptotic normality of the marginals as
$$\sup_{0<x\le T}\frac{1}{x^{\lambda}}\left|\sqrt{k}\left(\frac{1}{k}\sum_{t=1}^{n}1\left\{X_t < Q_x\left(\tfrac{k}{n}x\right)\right\} - x\right) - W(x,+\infty)\right| \xrightarrow{P} 0, \tag{28}$$
and
$$\sup_{0<y\le T}\frac{1}{y^{\lambda}}\left|\sqrt{k}\left(\frac{1}{k}\sum_{t=1}^{n}1\left\{Y_t < Q_y\left(\tfrac{k}{n}y\right)\right\} - y\right) - W(+\infty,y)\right| \xrightarrow{P} 0, \tag{29}$$
for given $T > 0$ and $0\le\lambda<1/2$.
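Next, from the asymptotic properties in Equations (28) and (29), we get that, as $n\to\infty$,
$$\sqrt{k}\left(\frac{1}{\hat\alpha_x} - \frac{1}{\alpha_x}\right) \xrightarrow{P} \frac{1}{\alpha_x}\left(\int_0^1 W(s,+\infty)\frac{ds}{s} - W(1,+\infty)\right), \tag{30}$$
$$\sqrt{k}\left(\frac{\hat Q_x(k/n)}{Q_x(k/n)} - 1\right) \xrightarrow{P} \frac{1}{\alpha_x}W(1,+\infty), \tag{31}$$
$$\sqrt{k}\left(\frac{\hat Q_y(k/n)}{Q_y(k/n)} - 1\right) \xrightarrow{P} \frac{1}{\alpha_y}W(+\infty,1). \tag{32}$$
Here we use the condition $k = o(n^{\zeta})$ with $\zeta < \min\left(\frac{2\gamma}{\alpha_x+2\gamma},\ \frac{2\gamma'}{\alpha_y+2\gamma'}\right)$. For the derivation of the three relations, see Example 5.1.5 and Equation (5.1.19) in De Haan and Ferreira (2006).

Now we can deal with the asymptotic normality of the estimator $\hat\beta^T$ by combining the asymptotic normality results for the four elements. Using Cramér's delta method, we can use the asymptotic relations in Equations (27) and (30)–(32) to obtain that
$$\sqrt{k}\left(\frac{\hat\beta^T}{\tau^{1/\alpha_x}\frac{Q_y(k/n)}{Q_x(k/n)}} - 1\right) \xrightarrow{P} \Gamma,$$
where
$$\Gamma = \frac{1}{\alpha_x}\left(\frac{1}{\tau}W(1,1) - W(+\infty,1)\right) + (\log\tau)\frac{1}{\alpha_x}\left(\int_0^1 W(s,+\infty)\frac{ds}{s} - W(1,+\infty)\right) + \frac{1}{\alpha_y}W(+\infty,1) - \frac{1}{\alpha_x}W(1,+\infty)$$
$$\phantom{\Gamma} = \frac{1}{\alpha_x}\left(\frac{1}{\tau}W(1,1) + (\log\tau)\int_0^1 W(s,+\infty)\frac{ds}{s} - (1+\log\tau)W(1,+\infty)\right).$$
Using the expression of $R(x,1)$ in Lemma 2, one can calculate that
$$\mathrm{Var}(\Gamma) = \frac{1}{\alpha_x^2}\left(\frac{1}{\tau} - 1 - (\log\tau)^2\right).$$
Therefore, what remains to be proved is the following deterministic relation:
$$\lim_{n\to\infty}\sqrt{k}\left(\frac{\tau^{1/\alpha_x}Q_y(k/n)}{Q_x(k/n)\beta^T} - 1\right) = 0.$$
Knowing that $\lim_{n\to\infty} Q_y(k/n)/Q_x(k/n) = \beta^T\tau^{-1/\alpha_x} > 0$, the above relation is equivalent to
$$\lim_{n\to\infty}\sqrt{k}\left(\tau - \left(\frac{\beta^T Q_x(k/n)}{Q_y(k/n)}\right)^{\alpha_x}\right) = 0.$$
Next, from Condition (12) and $k = o(n^{2\theta/(2\theta+1)})$, we get that $\lim_{n\to\infty}\sqrt{k}\big(\tau - \tau(k/n)\big) = 0$. Hence, what remains to be proved is
$$\lim_{n\to\infty}\sqrt{k}\left(\tau(k/n) - \left(\frac{\beta^T Q_x(k/n)}{Q_y(k/n)}\right)^{\alpha_x}\right) = 0. \tag{33}$$
Notice that in Lemma 1, by considering $p = k/n$ and $x = y = 1$, the denominator simplifies to $\Pr(X < Q_y(k/n)/\beta^T)$ because, for sufficiently large $n$, $Q_x(k/n) > Q_x(zk/n) > Q_y(k/n)/\beta^T$ for some $z\in(\tau,1)$. Hence, Equation (14) implies that $\tau(k/n) - \left(\beta^T Q_x(k/n)/Q_y(k/n)\right)^{\alpha_x} \to 0$ as $n\to\infty$. To prove Equation (33), we therefore aim to re-prove this limit relation with the additional "speed of convergence" $\sqrt{k}$. Consequently, we revisit the proof of Lemma 1, adding the factor $\sqrt{k}$ to each limit relation in Equation (15). Recall the sets $C$, $C_0$, $C_1$, $C_{21}$, and $C_{22}$ defined in the proof of Lemma 1.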
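Without loss of generality, we use the same notation to indicate the sets when taking $p = k/n$, $x = y = 1$, and $\delta = \delta_n = k^{-1/2-\kappa}$ with $\kappa > 0$ such that $(1/2+\kappa)\alpha_y + 3/2 < 1/\zeta$. Notice that, since $\zeta < \frac{2}{\alpha_y+3}$, such a choice of $\kappa$ is feasible. With this choice, we have that $\lim_{n\to\infty}\sqrt{k}\,\delta_n = 0$ and $\lim_{n\to\infty}\sqrt{k}\,\delta_n^{-\alpha_y}\frac{k}{n} = 0$. Next, we prove the following limit relations that are used for handling the probabilities of the sets. As $n\to\infty$,
$$\sqrt{k}\Pr\big(\varepsilon < \delta_n Q_y(k/n)\big) \to 0 \quad\text{and}\quad \sqrt{k}\Pr\big(\varepsilon \ge -\delta_n Q_y(k/n)\big) \to 0, \tag{34}$$
$$\lim_{n\to\infty}\sqrt{k}\left(\frac{n}{k}\Pr\big(\beta^T X < Q_y(k/n)(1\pm\delta_n)\big) - \left(\frac{\beta^T Q_x(k/n)}{Q_y(k/n)}\right)^{\alpha_x}\right) = 0. \tag{35}$$

Proof of Equation (34)
We start with the first half. Notice that
$$\Pr\big(\varepsilon < \delta_n Q_y(k/n)\big) = \frac{\Pr\big(\varepsilon < \delta_n Q_y(k/n),\ X < Q_x(\bar p)\big)}{\Pr(X < Q_x(\bar p))} \le \frac{\Pr\big(Y < \delta_n Q_y(k/n) + \beta^T Q_x(\bar p)\big)}{\bar p} \sim \frac{\delta_n^{-\alpha_y}\,k/n}{\bar p}.$$
In the last step, we use a limit relation similar to Equation (21), based on the second-order condition (11) and the facts that $1/\delta_n > 1$ and $\delta_n Q_y(k/n) \to -\infty$ as $n\to\infty$. Since $\sqrt{k}\,\delta_n^{-\alpha_y}\frac{k}{n} \to 0$ as $n\to\infty$, the first half of Equation (34) is proved. For the second half, we write
$$\Pr\big(\varepsilon \ge -\delta_n Q_y(k/n)\big) = \frac{\Pr\left(\varepsilon \ge -\delta_n Q_y(k/n),\ \frac{1}{\beta^T+1}\delta_n Q_y(k/n) \le X < Q_x(\bar p)\right)}{\Pr\left(\frac{1}{\beta^T+1}\delta_n Q_y(k/n) \le X < Q_x(\bar p)\right)} \le \frac{\Pr\left(Y \ge -\frac{1}{\beta^T+1}\delta_n Q_y(k/n)\right)}{\bar p - \Pr\left(X < \frac{1}{\beta^T+1}\delta_n Q_y(k/n)\right)} \le \frac{D\,\Pr\left(Y < \frac{1}{\beta^T+1}\delta_n Q_y(k/n)\right)}{\bar p - \Pr\left(X < \frac{1}{\beta^T+1}\delta_n Q_y(k/n)\right)},$$
for some constant $D > 0$. Here, the last step uses the condition that $\Pr(Y > u) = O(\Pr(Y < -u))$. Notice that the denominator converges to $\bar p$, which is positive and finite. The second half of Equation (34) is thus proved in the same way as the first half. ▪

Proof of Equation (35)
Recall the second-order condition (3). The condition that $k = O(n^{\zeta})$ with $\zeta < \frac{2\gamma}{2\gamma+\alpha_x}$ implies that $\sqrt{k}\,\eta(Q_x(k/n)) \to 0$. Together with the fact that $\frac{Q_y(k/n)(1\pm\delta_n)}{\beta^T Q_x(k/n)} \to \tau^{-1/\alpha_x}$, we get that
$$\lim_{n\to\infty}\sqrt{k}\left(\frac{n}{k}\Pr\big(\beta^T X < Q_y(k/n)(1\pm\delta_n)\big) - \left(\frac{Q_y(k/n)(1\pm\delta_n)}{\beta^T Q_x(k/n)}\right)^{-\alpha_x}\right) = 0.$$
Hence, Equation (35) is proved, since $\lim_{n\to\infty}\sqrt{k}\,\delta_n = 0$. ▪

Now, we return to the proof of Equation (33) by dealing with the three sets $C_1$, $C_{21}$, and $C_{22}$. First, the limit relation in Equation (35) implies that
$$\lim_{n\to\infty}\sqrt{k}\left(\frac{n}{k}\Pr(C_{21}) - \left(\frac{\beta^T Q_x(k/n)}{Q_y(k/n)}\right)^{\alpha_x}\right) = 0.$$
The first half of the limit relation (34) implies that $\lim_{n\to\infty}\sqrt{k}\,\frac{n}{k}\Pr(C_{22}) = 0$.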
Next, for $C_1$, due to independence, we have that
$$\Pr(C_1) = \Pr\big(\beta^T X < Q_y(k/n)(1+\delta_n)\big)\cdot\Pr\big(\varepsilon < -\delta_n Q_y(k/n)\big).$$
The second half of the limit relation (34) implies that $\lim_{n\to\infty}\sqrt{k}\big(\Pr(\varepsilon < -\delta_n Q_y(k/n)) - 1\big) = 0$. Together with Equation (35), we get that
$$\lim_{n\to\infty}\sqrt{k}\left(\frac{n}{k}\Pr(C_1) - \left(\frac{\beta^T Q_x(k/n)}{Q_y(k/n)}\right)^{\alpha_x}\right) = 0.$$
By combining $\Pr(C_1)$, $\Pr(C_{21})$, and $\Pr(C_{22})$, which give the lower and upper bounds of $\Pr(C)$, we have proved Equation (33) and thus the theorem. ▪

Footnotes
1 The second-order condition for $X$ is assumed in order to generalize Theorems 1 and 2 so that they also hold for $\alpha_x/2 < \alpha_y < \alpha_x$.
2 The $\tau$-measure in Equation (4) is closely related to the measure $E(\kappa\,|\,\kappa\ge 1)$ introduced by Huang (1992) and applied by Hartmann, Straetmans, and de Vries (2004). There, $\kappa$ is the number of events occurring with probability $p$, and $E(\kappa\,|\,\kappa\ge 1)$ is the expected number of tail events given that there is at least one. In the bivariate case, the two measures are connected by $E(\kappa\,|\,\kappa\ge 1) = \frac{2}{2-\tau}$.
3 An example demonstrating the compatibility of the three second-order conditions (3), (11), and (12) with the linear tail model in Equation (1) is $Y = \beta X + \varepsilon$, where $X$ and $\varepsilon$ are independently Cauchy distributed. In this case, $(X, Y)$ follows a bivariate Cauchy distribution satisfying these conditions with $\gamma = \gamma' = -2$ and $\theta = 1$.
4 Formally, the estimator in the conditional regression approach is
$$\hat\beta^T_{OLS} = \frac{\sum_{\{t:\,X_t<X_{n,k+1}\}}(Y_t - \bar Y^T)(X_t - \bar X^T)}{\sum_{\{t:\,X_t<X_{n,k+1}\}}(X_t - \bar X^T)^2},$$
where $\bar X^T = (1/k)\sum_{\{t:\,X_t<X_{n,k+1}\}}X_t$ and $\bar Y^T = (1/k)\sum_{\{t:\,X_t<X_{n,k+1}\}}Y_t$. The estimator in the conditional regression approach is theoretically unbiased if $\varepsilon$ has a zero conditional mean. A direct theoretical comparison of the asymptotic variances of the EVT approach and the conditional regression approach is difficult because their levels depend on different statistical parameters. For example, the asymptotic variance of $\hat\beta^T_{OLS}$ depends on the variance of $\varepsilon$, while the EVT approach does not assume finite variances.
5 In a sample of 1,250 observations, about $1{,}250 \times 3.0\% = 37.5$ observations are expected to be generated from the linear tail model.
6 To illustrate the decomposition of the MSE, we report the squared bias in the figures, which do not show the sign of the bias. Numerical results show that the bias of the EVT approach is consistently positive for all simulated models.
7 Data and documentation are available from the personal website of Kenneth French: http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html. Our results are based on data accessed on January 8, 2014. The secondary data are based on the returns of stocks listed on NYSE, AMEX, and NASDAQ in the CRSP database. The definition of the industry portfolios is based on SIC codes. Industry portfolios with missing returns in a subperiod are excluded from the analysis for that specific subperiod. Five industry portfolios report missing returns at the start of the sample. From July 1963 onward, one industry portfolio remains unavailable (“Healthcare,” SIC codes 8000-8099). After July 1969, all 48 portfolios are available.
8 Our results remain qualitatively unchanged when these portfolios are included in the analysis.
9 Our conclusions remain the same if the projected losses under the conditional regression approach are calculated as $\hat L_{OLS,j} = C^T_{OLS,j} + L_m\hat\beta^T_{OLS,j}$, where $C^T_{OLS,j} = \bar Y^T - \hat\beta^T_{OLS,j}\bar X^T$. In this case, the EVT approach provides a lower RMSE at the 5% significance level in 6 out of 16 cases (8 out of 16 cases at the 10% significance level).
10 The test statistic in Equation (13) follows directly from the modified Diebold–Mariano test statistic of Harvey, Leybourne, and Newbold [1997, Equation (9)] with $h = 1$, which corresponds to assuming that the correlations across all $d_j$ are zero.
11 In all subperiods, the mean absolute error from the EVT approach is also below that of the conditional regression approach.

References
Ang A., Chen J. 2002. Asymmetric Correlations of Equity Portfolios. Journal of Financial Economics 63: 443–494.
Atanasov V., Nitschka T. 2014. Currency Excess Returns and Global Downside Market Risk. Journal of International Money and Finance 47: 268–285.
Baur D. G., McDermott T. K. 2010. Is Gold a Safe Haven? International Evidence. Journal of Banking & Finance 34: 1886–1898.
Bollerslev T., Todorov V., Li S. Z. 2013. Jump Tails, Extreme Dependencies, and the Distribution of Stock Returns. Journal of Econometrics 172: 307–324.
Chernozhukov V., Fernández-Val I. 2011. Inference for Extremal Conditional Quantile Models, with an Application to Market and Birthweight Risks. Review of Economic Studies 78: 559–589.
De Haan L., Ferreira A. 2006. Extreme Value Theory: An Introduction. New York: Springer.
De Haan L., Stadtmüller U. 1996. Generalized Regular Variation of Second Order. Journal of the Australian Mathematical Society (Series A) 61: 381–395.
De Vries C. G. 2005. The Simple Economics of Bank Fragility. Journal of Banking & Finance 29: 803–825.
Diebold F. X., Mariano R. S. 1995. Comparing Predictive Accuracy. Journal of Business & Economic Statistics 13: 253–263.
Drees H. 1998. On Smooth Statistical Tail Functionals. Scandinavian Journal of Statistics 25: 187–210.
Drees H. 2008. Some Aspects of Extreme Value Statistics under Serial Dependence. Extremes 11: 35–53.
Drees H., De Haan L., Resnick S. 2000. How to Make a Hill Plot. Annals of Statistics 28: 254–274.
Einmahl J. H. J., Haan L. de, Li D. 2006. Weighted Approximations of Tail Copula Processes with Application to Testing the Bivariate Extreme Value Condition. Annals of Statistics 34: 1987–2014.
Embrechts P., De Haan L., Huang X. 2000. “Modelling Multivariate Extremes.” In Embrechts P. (ed.), Extremes and Integrated Risk Management, pp. 59–67. London: RISK Books.
Feuerverger A., Hall P. 1999. Estimating a Tail Exponent by Modelling Departure from a Pareto Distribution. Annals of Statistics 27: 760–781.
Fougères A.-L., Haan L. de, Mercadier C. 2015. Bias Correction in Multivariate Extremes. Annals of Statistics 43: 903–934.
Fung W., Hsieh D. A. 2001. The Risk in Hedge Fund Strategies: Theory and Evidence from Trend Followers. Review of Financial Studies 14: 313–341.
Gomes M. I., Haan L. de, Rodrigues L. H. 2008. Tail Index Estimation for Heavy-Tailed Models: Accommodation of Bias in Weighted Log-Excesses. Journal of the Royal Statistical Society: Series B 70: 31–52.
Hartmann P., Straetmans S., de Vries C. G. 2004. Asset Market Linkages in Crisis Periods. Review of Economics and Statistics 86: 313–326.
Hartmann P., Straetmans S., de Vries C. G. 2010. Heavy Tails and Currency Crises. Journal of Empirical Finance 17: 241–254.
Harvey D., Leybourne S., Newbold P. 1997. Testing the Equality of Prediction Mean Squared Errors. International Journal of Forecasting 13: 281–291.
Hill B. M. 1975. A Simple General Approach to Inference About the Tail of a Distribution. Annals of Statistics 3: 1163–1174.
Hill J. B. 2009. On Functional Central Limit Theorems for Dependent, Heterogeneous Arrays with Applications to Tail Index and Tail Dependence Estimation. Journal of Statistical Planning and Inference 139: 2091–2110.
Hill J. B. 2013. Least Tail-Trimmed Squares for Infinite Variance Autoregressions. Journal of Time Series Analysis 34: 168–186.
Hsing T. 1991. On Tail Index Estimation Using Dependent Data. Annals of Statistics 19: 1547–1569.
Huang X. 1992. Statistics of Bivariate Extreme Values. PhD thesis, Erasmus University Rotterdam (PhD Thesis No. 22, Tinbergen Institute Research Series).
Huisman R., Koedijk K. G., Kool C. J. M., Palm F. 2001. Tail-Index Estimates in Small Samples. Journal of Business & Economic Statistics 19: 208–216.
Jansen D. W., De Vries C. G. 1991. On the Frequency of Large Stock Returns: Putting Booms and Busts into Perspective. Review of Economics and Statistics 73: 18–24.
King M. A., Wadhwani S. 1990. Transmission of Volatility Between Stock Markets. Review of Financial Studies 3: 5–33.
Koenker R., Bassett G. 1978. Regression Quantiles. Econometrica 46: 33–50.
Lettau M., Maggiori M., Weber M. 2014. Conditional Risk Premia in Currency Markets and Other Asset Classes. Journal of Financial Economics 114: 197–225.
Li J., Todorov V., Tauchen G. 2017. Jump Regressions. Econometrica 85: 173–195.
Longin F., Solnik B. 1995. Is the Correlation in International Equity Returns Constant: 1960–1990? Journal of International Money and Finance 14: 3–26.
Longin F., Solnik B. 2001. Extreme Correlation of International Equity Markets. Journal of Finance 56: 649–676.
Loretan M., Phillips P. C. B. 1994. Testing the Covariance Stationarity of Heavy-Tailed Time Series: An Overview of the Theory with Applications to Several Financial Datasets. Journal of Empirical Finance 1: 211–248.
Malevergne Y., Sornette D. 2004. How to Account for Extreme Co-Movements between Individual Stocks and the Market. Journal of Risk 6: 71–116.
Mikosch T., De Vries C. G. 2013. Heavy Tails of OLS. Journal of Econometrics 172: 205–221.
Mitchell M., Pulvino T. 2001. Characteristics of Risk and Return in Risk Arbitrage. Journal of Finance 56: 2135–2175.
Patton A. J. 2009. Are “Market Neutral” Hedge Funds really Market Neutral? Review of Financial Studies 22: 2495–2530.
Peng L. 1998. Asymptotically Unbiased Estimators for the Extreme-Value Index. Statistics & Probability Letters 38: 107–115.
Poon S. H., Rockinger M., Tawn J. A. 2004. Extreme Value Dependence in Financial Markets: Diagnostics, Models, and Financial Implications. Review of Financial Studies 17: 581–610.
Post T., Versijp P. 2007. Multivariate Tests for Stochastic Dominance Efficiency of a Given Portfolio. Journal of Financial and Quantitative Analysis 42: 489–515.
Roose K. D. 1948. The Recession of 1937–38. Journal of Political Economy 56: 239–248.
Rousseeuw P. J. 1985. “Multivariate Estimation with High Breakdown Point.” In Grossman W., Pflug G., Vincze I., Wertz W. (eds.), Mathematical Statistics and Applications, pp. 283–297. Dordrecht: Reidel Publishing Company.
Støve B., Tjøstheim D., Hufthammer K. O. 2014. Using Local Gaussian Correlation in a Nonlinear Re-examination of Financial Contagion. Journal of Empirical Finance 25: 62–82.
Sun P., Zhou C. 2014. Diagnosing the Distribution of GARCH Innovations. Journal of Empirical Finance 29: 287–303.
Todorov V., Bollerslev T. 2010. Jumps and Betas: A New Framework for Disentangling and Estimating Systematic Risks. Journal of Econometrics 157: 220–235.
Van Oordt M. R. C., Zhou C. 2014. “Systemic Risk and Bank Business Models.” De Nederlandsche Bank Working paper, 442.
Van Oordt M. R. C., Zhou C. 2016. Systematic Tail Risk. Journal of Financial and Quantitative Analysis 51: 685–705.
Zhang Q., Li D., Wang H. 2013. A Note on Tail Dependence Regression. Journal of Multivariate Analysis 120: 163–172.

© The Author, 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Journal of Financial Econometrics, Oxford University Press

Estimating Systematic Risk under Extremely Adverse Market Conditions

Publisher
Oxford University Press
Copyright
© The Author, 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
ISSN
1479-8409
eISSN
1479-8417
D.O.I.
10.1093/jjfinec/nbx033

Abstract

This paper considers the problem of estimating a linear model between two heavy-tailed variables if the explanatory variable has an extremely low (or high) value. We propose an estimator for the model coefficient by exploiting the tail dependence between the two variables and prove its asymptotic properties. Simulations show that our estimation method yields a lower mean-squared error than regressions conditional on tail observations. In an empirical application, we illustrate the better performance of our approach relative to the conditional regression approach in projecting the losses of industry-specific stock portfolios in the event of a market crash.

In financial management, the risk of stock portfolios is often assessed by estimating their return sensitivity to key risk factors. The coefficient in a single-factor model, the “market beta,” is commonly given a prominent position in such an assessment. Nevertheless, there is wide consensus that the relationship between asset returns and market risk depends on market conditions. For example, equity returns exhibit stronger correlation during volatile periods, especially in the case of extreme market downturns; see, for example, King and Wadhwani (1990), Longin and Solnik (1995, 2001), and Ang and Chen (2002). Thus, risk managers who are concerned about possible extreme losses in distress events may need to analyze systematic risk only under extremely adverse market conditions. In this paper, we develop a new method for evaluating the consequences of such extreme events. The goal of this paper is to estimate a linear model between two heavy-tailed variables conditional on the explanatory variable having an extremely low (or high) value.
Consider the following model on the relation between two continuous random variables X and Y, conditional upon an extremely low value of X:
$$Y = \beta^T X + \varepsilon, \quad \text{for } X < Q_x(\bar p), \tag{1}$$
where $\bar p$ denotes a very small probability, $\varepsilon$ is the error term that is assumed to be independent of X under the condition $X < Q_x(\bar p)$, and $Q_x(\bar p)$ denotes the quantile function of X defined as $Q_x(\bar p) = \sup\{c : \Pr(X \le c) \le \bar p\}$. Since the relation holds for extremely low values of X only, we use an index T to distinguish the coefficient $\beta^T$ from the coefficient in a global linear model. We intend to estimate $\beta^T$ using observations on (X, Y). With Y and X as the returns of, respectively, a stock portfolio and the market portfolio, the coefficient $\beta^T$ is regarded as a measure of systematic risk under extremely adverse market conditions. Estimating $\beta^T$ can be useful to assess the extreme loss on the stock portfolio in the event of a market crash.

The coefficient $\beta^T$ in the linear tail model in Equation (1) can be regarded as a regression coefficient. Consequently, a direct approach to estimating $\beta^T$ is to apply a “conditional regression,” that is, to estimate a least squares regression coefficient based on observations corresponding to extremely low values of X only. This method has been used, for example, to evaluate $\beta^T$s of financial returns on commodities, currencies (Atanasov and Nitschka, 2014; Lettau, Maggiori, and Weber, 2014), stocks (Post and Versijp, 2007), and active trading strategies (Mitchell and Pulvino, 2001). Two potential drawbacks of this conditional regression approach apply. First, because the conditional regression is based on a small number of observations, the approach may potentially produce a relatively large variance of the estimator. Second, when applied to financial market data, the heavy-tailedness of financial returns may further increase the estimation error. We propose an alternative estimator for $\beta^T$ by exploiting the tail dependence imposed by the heavy-tailedness of X and Y.
Under mild conditions, we show that the proposed estimator possesses consistency and asymptotic normality. Simulations show that our estimation method yields a lower mean-squared error than estimating conditional regressions on tail observations. In an empirical application, we illustrate the better performance of our approach relative to the conditional regression approach in projecting the losses of industry-specific portfolios in major stock market crashes over the past 80 years. However, the method can also be applied to other variables known to be heavy-tailed, such as exchange rates, trading volume, insurance claims, or the severity of natural disasters.

Theoretically, our estimator has a structure similar to a regression coefficient. The estimator in a standard univariate regression analysis consists of a dependence measure given by the correlation, and the marginal risk measures given by the standard deviations. In the estimator of $\beta^T$, the dependence measure is replaced by a tail dependence measure and the marginal risk measures are replaced by quantiles obtained from tail observations.

Our study builds on the literature on multivariate extreme value theory (EVT) that provides measures to evaluate the dependence among extreme observations; see, for example, Embrechts, De Haan, and Huang (2000) and De Haan and Ferreira (2006, Chapters 6 and 7). Poon, Rockinger, and Tawn (2004) and Hartmann, Straetmans, and de Vries (2004) have applied such tail dependence measures to analyze linkages between asset returns during crises. Bollerslev, Todorov, and Li (2013) analyze the tail dependence between jumps in stock returns using high-frequency data. Zhang, Li, and Wang (2013) develop a model to estimate the level of tail dependence between two variables Y and X as a function of other explanatory variables Z.
While their model helps to predict the probability that Y is extremely low given that X is extremely low, the model cannot be directly applied to solve the problem of predicting the expected level of Y given an extremely low observed value of X. Some studies focus on deriving the level of tail dependence in global linear models. Malevergne and Sornette (2004) derive the level of tail dependence, assuming a global linear relation with a single risk factor. Others study tail dependence assuming global linear relations with multiple risk factors; see, for example, De Vries (2005) and Hartmann, Straetmans, and de Vries (2010). In contrast, our approach focuses on estimating $\beta^T$ by exploiting the tail dependence structure while assuming the linear model only in the tail.

Our study should be distinguished from the literature on estimating global linear regression models in the presence of heavy tails. Mikosch and De Vries (2013) show that the finite sample distribution of a regression coefficient is heavy-tailed if the error term follows a heavy-tailed distribution. This may be improved by applying least (tail-)trimmed squares, which ensures asymptotic normality even if the error terms have infinite variance; see Rousseeuw (1985) and Hill (2013). Our study differs from this existing literature in the sense that our purpose is to estimate the linear relation only for extremely low X, rather than estimating a global linear relation. Similarly, our study is distinct from quantile regressions; see, for example, Koenker and Bassett (1978) and Chernozhukov and Fernández-Val (2011). Quantile regression analysis seeks to estimate the quantile of the conditional distribution of Y as a function of the observed value of X. In contrast, we focus on predicting the expectation of Y using a linear relation with X conditional on an extreme value of X.
Finally, our study should also be differentiated from continuous time models in which the dependent variable reacts differently to diffusive and jump components in the independent variable; see Todorov and Bollerslev (2010) for defining such a model, and Li, Todorov, and Tauchen (2017) for an efficient estimator of the coefficient on the jump component. The linear coefficient on the jump component is different from $\beta^T$ in discrete time. Given an extremely low realization of the independent variable, this realization could potentially be attributed mainly to the jump component but, nevertheless, also involves a diffusive component. Therefore, $\beta^T$ is a combination of the two linear coefficients on the diffusive and jump components, albeit mainly loaded on the latter. Different from Todorov and Bollerslev (2010), the linear tail model in Equation (1) does not require a symmetric relation to extremely low or high values of the independent variable: by applying our methodology to $-Y$ and $-X$, one obtains an estimate of the (potentially different) relation when X has an extremely high value.

The present study proposes the methodology to estimate $\beta^T$ based on EVT and demonstrates its validity by deriving the asymptotic properties of the estimator and performing simulations. Beyond the scope of the current paper, the level of $\beta^T$ can be applied in various contexts. First, it may have asset pricing implications as a measure of systematic tail risk. In Van Oordt and Zhou (2016), we apply the methodology developed in the present study to estimate the level of $\beta^T$ of individual stocks and test whether stocks with higher $\beta^T$s earn a higher risk premium. In this asset pricing study, we show that historical estimates of $\beta^T$ contain information about the future performance of stocks in market crashes over and above the information captured by regular market betas. Nevertheless, we find no evidence of a positive risk premium in the cross-section of expected returns.
Moreover, the methodology can be applied to measure the sensitivity of banks to large shocks in the financial system. An application in this direction can be found in Van Oordt and Zhou (2014). Other potential applications include other return processes that potentially exhibit nonlinear relations with the market or other risk factors, such as the returns of hedge funds, safe-haven currencies, safe-haven commodities, and portfolios with complex derivatives or options.

The remainder of the paper is organized as follows. Section 1 describes our estimation method. Section 2 reports simulation results. Section 3 provides an empirical illustration of our approach in which we estimate the losses on industry-specific stock portfolios during market crashes. Section 4 concludes.

1 Methodology

1.1 Theory

We start by assuming heavy-tailedness of X and Y. The definition of heavy-tailedness is as follows. The tail distributions of X and Y are heavy-tailed if they can be expressed as
$$\Pr(X < -u) = u^{-\alpha_x} l_x(u) \quad\text{and}\quad \Pr(Y < -u) = u^{-\alpha_y} l_y(u), \tag{2}$$
where $l_x$ and $l_y$ are slowly varying functions as $u\to\infty$. That is,
$$\lim_{u\to\infty}\frac{l_x(tu)}{l_x(u)} = \lim_{u\to\infty}\frac{l_y(tu)}{l_y(u)} = 1,$$
for any $t > 0$. Parameters $\alpha_x$ and $\alpha_y$ are called the tail indices. Equivalently, the heavy-tailedness of X can be expressed as
$$\lim_{u\to\infty}\frac{\Pr(X < -ux)}{\Pr(X < -u)} = x^{-\alpha_x},$$
for any fixed $x > 0$. We assume the usual second-order condition (see, e.g., De Haan and Stadtmüller, 1996), which quantifies the speed of convergence in this relation as
$$\lim_{u\to\infty}\frac{\frac{\Pr(X < -ux)}{\Pr(X < -u)} - x^{-\alpha_x}}{\eta(u)} = x^{-\alpha_x}\frac{x^{-\gamma}-1}{-\gamma}, \tag{3}$$
with an eventually positive or negative function $\eta(u)$ such that $\eta(u)\to 0$ as $u\to\infty$, and $\gamma > 0$.¹

The idea behind our approach to estimating $\beta^T$ is as follows. The relation in Equation (1) is specified only for the region $X < Q_x(\bar p)$, while the model specifies no assumptions on the relation for the region $X \ge Q_x(\bar p)$. The relation brings about a dependence structure between X and Y in the case of extremely low values of X, that is, if $X < Q_x(\bar p)$.
This structure determines the dependence between the left tails of the distributions of $X$ and $Y$. Our approach relies on analyzing this tail dependence structure to infer the level of $\beta^T$.

We consider the following tail dependence measure from multivariate EVT,

$$\tau:=\lim_{p\to 0}\tau(p):=\lim_{p\to 0}\frac{1}{p}\Pr\bigl(Y<Q_y(p),\,X<Q_x(p)\bigr), \tag{4}$$

where $Q_y(p)$ denotes the quantile function of $Y$ defined as $Q_y(p)=\sup\{c:\Pr(Y\leq c)\leq p\}$.² The tail dependence measure can be rewritten as $\tau=\lim_{p\to 0}\Pr(Y<Q_y(p)\mid X<Q_x(p))$, which is the probability of observing an extremely low value of $Y$ conditional on an extremely low value of $X$. Since it is the limit of a conditional probability, the $\tau$-measure is by definition bounded by $0\leq\tau\leq 1$. The case $\tau=0$ is regarded as tail independence, while the case $\tau=1$ corresponds to complete tail dependence. Also, the tail dependence measure is invariant to positive linear transformations on $X$ and $Y$. These features of the $\tau$-measure indicate that its role in our approach will resemble that of a correlation coefficient, except that the $\tau$-measure focuses on dependence in the tails only.

The following theorem shows how the $\tau$-measure relates to the coefficient $\beta^T$ in the linear tail model in Equation (1).

Theorem 1. Under the linear tail model in Equation (1) and the heavy-tail set-up of the downside distributions in Equations (2) and (3), with $\alpha_y>\tfrac{1}{2}\alpha_x$ and $\beta^T\geq 0$, we have that

$$\lim_{p\to 0}\bigl(\tau(p)\bigr)^{1/\alpha_x}\,\frac{Q_y(p)}{Q_x(p)}=\beta^T. \tag{5}$$

Proof. See the Appendix.

Theorem 1 does not depend on assuming the heavy-tailedness of the unobservable error term $\varepsilon$. The theorem holds also if $\varepsilon$ exhibits a thin-tailed distribution, such as the normal distribution. The condition $\alpha_y>\tfrac{1}{2}\alpha_x$ basically requires $Y$ not to be "too heavy-tailed" in comparison with $X$. The intuition is that, otherwise, the error terms $\varepsilon$ would have a much heavier tail than $X$. The impact of extreme realizations of $\varepsilon$ on extreme realizations of $Y$ would overshadow the impact of the relation between $X$ and $Y$. As a consequence, it is not possible to infer the level of $\beta^T$.
Nevertheless, this condition is not very restrictive in the context of stock market returns. For example, if $X$ represents the returns on a general market index with an $\alpha_x$ of 4 (see, e.g., Jansen and De Vries, 1991), the condition is satisfied if the firm's stock returns $Y$ have finite variance. Moreover, provided that $\alpha_x$ is sufficiently low, Theorem 1 also holds if $Y$ has infinite variance or mean.

The relation in Theorem 1 provides the basis for the estimation of the coefficient $\beta^T$. Consider independent and identically distributed (i.i.d.) observations $(X_1,Y_1),\ldots,(X_n,Y_n)$ with the i.i.d. unobserved error terms $\varepsilon_1,\ldots,\varepsilon_n$. Later we will also consider the presence of temporal dependence. To estimate $\beta^T$, we estimate each component in Equation (5). As in usual extreme value analysis, we mimic the limit procedure $p\to 0$ by considering only the lowest $k$ observations in the tail region, such that $k:=k(n)\to\infty$ and $k/n\to 0$ as $n\to\infty$. In other words, for statistical estimation, the probability $p$ is set at some low level $p=k/n$. Hence, we obtain the estimator of $\beta^T$ as

$$\hat\beta^T:=\hat\tau(k/n)^{1/\hat\alpha_x}\,\frac{\hat Q_y(k/n)}{\hat Q_x(k/n)}. \tag{6}$$

We remark that the estimator $\hat\beta^T$ in Equation (6) shows similarities with a standard regression analysis. In a standard linear regression of a random variable $U$ on a random variable $V$, the estimator of the slope coefficient is $\hat\rho\,\hat\sigma_u/\hat\sigma_v$, where $\hat\rho$ is the correlation coefficient between $U$ and $V$, and where $\hat\sigma_u$ and $\hat\sigma_v$ are the standard deviations of $U$ and $V$, respectively. Similarly, the estimator $\hat\beta^T$ consists of the tail dependence measure $\hat\tau$ and two tail risk measures, that is, the tail quantiles of $X$ and $Y$. In addition, it combines these components in a similar way as in standard regression analysis.

1.2 Estimation

For our procedure, we rely on relatively simple and widely used estimators to obtain estimates of each of the components in Equation (6). These estimators rely exclusively on observations far in the tail of the distributions of $X$ and $Y$.
Nevertheless, the development of better estimators for the building blocks in Equation (6) has been the subject of an extensive literature. Hence, our procedure to estimate $\beta^T$ via Equation (6) may well stand to be further improved by choosing other estimators for the components. Throughout this paper, we will refer to our estimation of $\beta^T$ with the estimators of the components below as the EVT approach.

The estimate of the tail index $\alpha_x$ is obtained from the $k_1$ lowest observations of $X$ with the estimator proposed in Hill (1975). Here, $k_1$ is another intermediate sequence such that $k_1:=k_1(n)\to\infty$ and $k_1/n\to 0$ as $n\to\infty$. Suppose the observations of $(X,Y)$ are $(X_1,Y_1),\ldots,(X_n,Y_n)$. By ranking the observations of $X_t$ as $X_{n,1}\leq X_{n,2}\leq\cdots\leq X_{n,n}$, the Hill estimator is defined as

$$\frac{1}{\hat\alpha_x}:=\frac{1}{k_1}\sum_{i=1}^{k_1}\log\left(\frac{X_{n,i}}{X_{n,k_1+1}}\right). \tag{7}$$

For the $\tau$-measure, multivariate EVT provides a nonparametric estimate; see Embrechts, De Haan, and Huang (2000). The estimator is given as

$$\hat\tau(k/n):=\frac{1}{k}\sum_{t=1}^{n}\mathbf{1}\{Y_t<Y_{n,k+1},\,X_t<X_{n,k+1}\}, \tag{8}$$

where $Y_{n,k+1}$ is the $(k+1)$-th lowest order statistic of $Y_t$. Finally, the quantiles of $X$ and $Y$ at the probability level $k/n$, that is, $\hat Q_x(k/n)$ and $\hat Q_y(k/n)$, are estimated by their $(k+1)$-th lowest order statistics, that is, $X_{n,k+1}$ and $Y_{n,k+1}$.

Notice that, in Equation (5), the same tail probability $p$ appears in the term $\tau(p)$ and the two quantiles $Q_y(p)$ and $Q_x(p)$. Correspondingly, the same intermediate sequence $k$ is used for the estimators of the $\tau$-measure and the quantiles of $X$ and $Y$. By contrast, there is no theoretical requirement that $k=k_1$, although this turns out to be the most complicated case when dealing with the asymptotic normality below.

In general, the estimator of $\beta^T$ via Equation (6) inherits its consistency and asymptotic normality from the consistency and asymptotic normality of the estimators of its subcomponents. In addition, the estimator of $\beta^T$ is consistent even if $\lim_{p\to 0}\tau(p)=0$, even though the statistical properties of the estimator of the $\tau$-measure are less well known in this case.
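The three building blocks described above (the Hill estimator, the counting estimator of the $\tau$-measure, and the order-statistic quantiles) can be combined as in the following minimal sketch. It is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, NumPy is assumed to be available, and the Hill step assumes that the $k_1+1$ lowest observations of $X$ are negative so that the log-ratios are well defined.

```python
import numpy as np


def hill_tail_index(x, k1):
    """Hill estimate of the left-tail index from the k1 lowest observations."""
    xs = np.sort(np.asarray(x, dtype=float))  # X_{n,1} <= ... <= X_{n,n}
    if xs[k1] >= 0:
        raise ValueError("the k1+1 lowest observations must be negative")
    # xs[:k1] are the k1 lowest values; xs[k1] is the (k1+1)-th lowest.
    inv_alpha = np.mean(np.log(xs[:k1] / xs[k1]))
    return 1.0 / inv_alpha


def tail_dependence(x, y, k):
    """Share of joint exceedances below the (k+1)-th lowest order statistics."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_thr = np.sort(x)[k]
    y_thr = np.sort(y)[k]
    return np.sum((x < x_thr) & (y < y_thr)) / k


def evt_tail_beta(x, y, k, k1=None):
    """Combine tail dependence, tail index, and tail quantiles into beta^T."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k1 = k if k1 is None else k1  # assumption: default to k1 = k
    alpha_x = hill_tail_index(x, k1)
    tau = tail_dependence(x, y, k)
    q_x = np.sort(x)[k]
    q_y = np.sort(y)[k]
    return tau ** (1.0 / alpha_x) * q_y / q_x
```

Calling `evt_tail_beta(x, y, k=25)` on two return series of equal length returns a single number; setting `k1` separately allows a different number of tail observations for the tail index.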
To prove the consistency of $\hat\beta^T$ also for the case $\lim_{p\to 0}\tau(p)=0$, we require some additional conditions to ensure the asymptotic normality of the Hill estimator. These conditions are as follows. First, we require a condition on $k_1$ ensuring that $k_1$ is not too high:

$$\lim_{n\to\infty}\sqrt{k_1}\,\eta\!\left(-Q_x\!\left(\frac{k_1}{n}\right)\right)=\lambda<\infty. \tag{9}$$

Conditions (3) and (9) are usually assumed to obtain the asymptotic normality of the Hill estimator; see, e.g., De Haan and Ferreira (2006), conditions (3.2.5) and (3.2.6). Second, an additional restriction ensures that $k_1$ is not too low. As $n\to\infty$,

$$\frac{k_1}{\log n}\to+\infty. \tag{10}$$

The following theorem states the consistency of $\hat\beta^T$.

Theorem 2. Assume that the conditions in Theorem 1 hold and $k\to\infty$, $k_1\to\infty$, $k/n\to 0$, $k_1/n\to 0$, as $n\to\infty$. In addition, only in the case $\lim_{p\to 0}\tau(p)=0$, we further assume Conditions (9) and (10). Then we have that, as $n\to\infty$, $\hat\beta^T\xrightarrow{P}\beta^T$.

Next, we deal with asymptotic normality. For that purpose, we assume the second-order condition for the distribution of $Y$ and the joint distribution of $(X,Y)$. This is in line with the usual asymptotic normality results in multivariate extreme value statistics; see, for example, Einmahl, de Haan, and Li (2006). First, assume that

$$\lim_{u\to\infty}\frac{\dfrac{\Pr(Y<-uy)}{\Pr(Y<-u)}-y^{-\alpha_y}}{\eta'(u)}=y^{-\alpha_y}\,\frac{y^{-\gamma'}-1}{-\gamma'}, \tag{11}$$

with an eventually positive or negative function $\eta'(u)$ such that $\eta'(u)\to 0$ as $u\to\infty$, and $\gamma'<0$. Second, denoting the distribution functions of $X$ and $Y$ as $F_x$ and $F_y$, we define

$$R(x,y,p):=\frac{1}{p}\Pr\bigl(F_x(X)<px,\,F_y(Y)<py\bigr).$$

For the dependence structure, we assume that $R(x,y,p)\to R(x,y)$ as $p\to 0$ for some positive function $R(x,y)$, with a speed of convergence as follows: there exists a $\theta>0$ for which, as $p\to 0$,

$$R(x,y,p)-R(x,y)=O(p^{\theta}), \tag{12}$$

for all $(x,y)\in[0,1]^2\setminus\{(0,0)\}$.³

The following theorem states the asymptotic normality of $\hat\beta^T$.

Theorem 3. Assume that the conditions in Theorem 1 hold. Suppose $\lim_{p\to 0}\tau(p)=\tau\in(0,1)$ and $\Pr(Y>u)=O(\Pr(Y<-u))$ as $u\to\infty$. Further assume that the second-order conditions (11) and (12) hold.
Suppose $k_1=k=O(n^{\zeta})$, where $\zeta<\min\bigl(2\theta/(1+2\theta),\,2\gamma/(2\gamma+\alpha_x),\,2\gamma'/(2\gamma'+\alpha_y),\,3/(\alpha_y+2)\bigr)$. We then have that, as $n\to\infty$,

$$\sqrt{k}\,\bigl(\hat\beta^T-\beta^T\bigr)\xrightarrow{d}N\!\left(0,\;\frac{(\beta^T)^2}{\alpha_x^2}\left(\frac{1}{\tau}-1-(\log\tau)^2\right)\right).$$

In this theorem we choose $k=k_1$ because this is what we use in the simulation and empirical illustration. We note, however, that this choice is not necessarily the only one. One may also choose $k$ and $k_1$ such that, as $n\to\infty$, $k/k_1$ converges to zero, a finite positive number, or infinity. If $k/k_1\to 0$ as $n\to\infty$, the asymptotic limit of $\hat\alpha_x$ will play a dominant role in that of $\hat\beta^T$. Conversely, if $k/k_1\to\infty$ as $n\to\infty$, the asymptotic limit of $\hat\tau(k/n)$ and the two quantiles will play a dominant role in that of $\hat\beta^T$. Therefore, the asymptotic normality result turns out to be simpler in these two cases. If $k/k_1$ converges to a finite value other than 1, the asymptotic normality of $\hat\beta^T$ can be derived in a similar way, with a slightly more complicated structure for the asymptotic variance.

The consistency and asymptotic normality results are obtained when $\{(X_t,Y_t)\}$ forms an i.i.d. sample. In the context of stock returns, it is likely that $\{(X_t,Y_t)\}$ is a time series with temporal dependence. In general, under weak conditions, EVT analysis can be applied without modification to temporally dependent data; see Drees (2008) for a general discussion. More specifically, if $\{(X_t,Y_t)\}$ exhibits weak temporal dependence such as autocorrelation or GARCH-type volatility clustering, then the consistency will not be affected. This follows from the consistency results for each component in $\hat\beta^T$. For the Hill estimator, see, for example, Hsing (1991); for the quantiles and the tail dependence measure, see, for example, Hill (2009). Temporal dependence does, however, affect the asymptotic normality result in the sense that it may lead to a different structure of the asymptotic variance of the estimates. Therefore, in financial applications, it may be better to rely on a block bootstrap procedure to obtain adjusted standard errors.
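As a rough illustration of how Theorem 3 might be used in the i.i.d. case, the sketch below plugs estimates into the asymptotic variance, read here as $(\beta^T)^2/\alpha_x^2\cdot(1/\tau-1-(\log\tau)^2)$ with a $\sqrt{k}$ rate of convergence. The function name and the plug-in use of estimates are assumptions made for this example, and, as noted above, the result would not be appropriate under temporal dependence, where a block bootstrap is suggested instead.

```python
import math


def evt_beta_asymptotic_se(beta_hat, alpha_x_hat, tau_hat, k):
    """Plug-in standard error of beta^T for i.i.d. data with k = k1."""
    if not 0.0 < tau_hat < 1.0:
        raise ValueError("the limit result requires tau strictly inside (0, 1)")
    # Asymptotic variance of sqrt(k) * (beta_hat - beta).
    avar = (beta_hat ** 2 / alpha_x_hat ** 2) * (
        1.0 / tau_hat - 1.0 - math.log(tau_hat) ** 2
    )
    return math.sqrt(avar / k)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the call signature.
    se = evt_beta_asymptotic_se(beta_hat=1.0, alpha_x_hat=4.0, tau_hat=0.5, k=25)
    print(f"approximate 95% interval half-width: {1.96 * se:.3f}")
```

The term in parentheses is strictly positive for every $\tau\in(0,1)$ and shrinks to zero as $\tau\to 1$, so the plug-in standard error is always well defined on that range.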
2 Simulations

We run two sets of simulations to compare the performance of the proposed procedure to estimate $\beta^T$ and the performance of a regression conditional on tail observations.⁴ In each set of simulations, the generated samples consist of 1,250 random observations for $(X_t,Y_t)$, which corresponds approximately to the length of our estimation window in the empirical exercise.

The first set of simulations evaluates the estimation accuracy of the two approaches if the data-generating process is in line with the linear tail model in Equation (1). In this set of simulations, the observations for $Y$ are constructed by aggregating the simulated $X$ and $\varepsilon$ according to different global linear models and segmented linear models. In these simulations, we compare the estimated and true values for $\beta^T$.

The second set of simulations compares the predictive power of the two approaches when the data-generating process does not follow a linear model in the tail. In these simulations, the observations of $Y$ and $X$ are drawn from different copula models. The purpose of these simulations is to verify which approach exhibits a better performance if the linear tail model is used as an approximation. The performance of the two approaches is compared by assessing their ability to predict $Y_t$ from an extremely low $X_t$.

2.1 Linear Models

In the first set of simulations, we consider three global linear models in which the relation is unaffected by the observation of $X$, that is, $\beta=\beta^T=0.5$, $1$, or $1.5$. Moreover, we consider two segmented linear models. If the observation of $X$ is larger than the third percentile of $X$, then the observation of $Y$ is generated from a linear model with $\beta=1$.⁵ Otherwise, it is generated from a linear model with $\beta^T=0.5$ or $\beta^T=1.5$, respectively.

Several data-generating processes are considered for $X$ and $\varepsilon$. The Student's t-distribution is known to be heavy-tailed with the tail index equal to the degrees of freedom.
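The segmented linear design described above can be sketched as follows. This is an illustrative assumption-laden example, not the authors' simulation code: the function name is hypothetical, the regime switch uses the sample third percentile of the simulated $X$ (the population percentile could be used instead), and $X$ and $\varepsilon$ are drawn independently from a Student's t-distribution via NumPy.

```python
import numpy as np


def simulate_segmented_linear(n=1250, beta=1.0, beta_tail=1.5,
                              tail_prob=0.03, df=4, seed=None):
    """Draw (x, y, eps) where the slope is beta_tail in the left tail of x."""
    rng = np.random.default_rng(seed)
    x = rng.standard_t(df, size=n)    # heavy-tailed explanatory variable
    eps = rng.standard_t(df, size=n)  # heavy-tailed, independent error term
    # Assumption: the regime threshold is the sample quantile of x.
    threshold = np.quantile(x, tail_prob)
    slope = np.where(x < threshold, beta_tail, beta)
    y = slope * x + eps
    return x, y, eps
```

Setting `beta_tail` equal to `beta` reproduces a global linear model, so the same helper covers both designs in this subsection.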
We perform simulations of $X$ and $\varepsilon$ based on random draws from a Student's t-distribution with three, four, and five degrees of freedom, which implies that $X$ and $\varepsilon$ are heavy-tailed with tail indices of three, four, and five, respectively. The choice of the parameter $\alpha$ is similar to the estimates in the empirical analysis. Moreover, we perform simulations where $X$ and $\varepsilon$ exhibit temporal dependence and are each generated from a GARCH(1,1) process, that is,

$$Z_t=\sigma_{Z,t}\,\zeta_t, \quad\text{where}\quad \sigma_{Z,t}^2=\psi_0+\psi_1 Z_{t-1}^2+\psi_2\,\sigma_{Z,t-1}^2, \quad\text{for } Z=X,\varepsilon.$$

The parameter choices for the simulation with normally distributed innovations $\zeta_t$ are $(\psi_0,\psi_1,\psi_2)=(0.5,0.11,0.88)$, which implies that $X$ and $\varepsilon$ are heavy-tailed with a tail index of 3.68; see Sun and Zhou (2014, Table 3). The parameter choices in another simulation based on innovations $\zeta_t$ from a standardized Student's t-distribution with eight degrees of freedom are $(\psi_0,\psi_1,\psi_2)=(0.5,0.08,0.91)$, which implies that $X$ and $\varepsilon$ are heavy-tailed with a tail index of 3.82.

For each of the five models and data-generating processes, we generate 10,000 samples and estimate $\beta^T$ in each sample, using both the conditional regression approach and the EVT approach. Then, by comparing the estimates with the true value of $\beta^T$, we calculate the mean squared error (MSE), the estimation bias, and the estimation variance for the two approaches. For brevity, we report only the simulations based on the Student's t-distribution with four degrees of freedom, because the pattern across the simulations is very similar.

Figures 1 and 2 show the simulation results for different choices of the number of observations in the tail, $k$. The first column of Figures 1 and 2 compares the MSE between the EVT approach and the conditional regression. Under the heavy-tailed set-up, we observe a better performance with the EVT approach relative to the conditional regression if $\beta^T$ is estimated based on a few observations in the tail, that is, for low levels of $k$.
However, the conditional regression may perform better if more observations from the moderate level are included, that is, for high levels of $k$. Nevertheless, the MSE of the EVT approach is not very sensitive to including more observations from the moderate level. The second and third columns of Figures 1 and 2 show the decomposition of the MSE into squared bias and variance. We observe that the estimates from the conditional regression bear a larger variance, while the estimation error from the EVT approach is mainly due to positive bias.⁶

Figure 1. Simulations with a global linear model. Notes: The solid lines report the simulation results for the EVT approach; the dashed lines report those for the conditional regression approach. The simulations are based on $m=10{,}000$ samples with $n=1{,}250$ observations each. The observations of $Y$ are constructed from the simulated observations of $X$ and $\varepsilon$ following different linear relations. The observations of $X$ and $\varepsilon$ are randomly drawn from the Student's t-distribution with four degrees of freedom. The observations of $Y$ are constructed from a global linear model ($\beta^T=\beta$). The estimates from the simulations, $\hat\beta^T$, are compared with the true value, $\beta^T$. The MSE is calculated as $m^{-1}\sum_i(\beta^T-\hat\beta_i^T)^2$, where $i$ refers to the $i$-th simulated sample. The squared bias is calculated as $(\beta^T-\bar\beta^T)^2$ and the variance is calculated as $m^{-1}\sum_i(\bar\beta^T-\hat\beta_i^T)^2$, where $\bar\beta^T=m^{-1}\sum_i\hat\beta_i^T$.

Figure 2. Simulations with a segmented linear model. Notes: The solid lines report the simulation results for the EVT approach; the dashed lines report those for the conditional regression approach. The simulations are based on $m=10{,}000$ samples with $n=1{,}250$ observations each. The observations of $Y$ are constructed from the simulated observations of $X$ and $\varepsilon$ following different linear relations. The observations of $X$ and $\varepsilon$ are randomly drawn from the Student's t-distribution with four degrees of freedom. The observations of $Y$ are constructed from a segmented linear model, where the slope equals $\beta^T$ if the value of $X_t$ is below its third percentile (which occurs in expectation for 37.5 observations in each sample), and equals $\beta$ otherwise. The estimates from the simulations, $\hat\beta^T$, are compared with the true value, $\beta^T$. The MSE is calculated as $m^{-1}\sum_i(\beta^T-\hat\beta_i^T)^2$, where $i$ refers to the $i$-th simulated sample. The squared bias is calculated as $(\beta^T-\bar\beta^T)^2$ and the variance is calculated as $m^{-1}\sum_i(\bar\beta^T-\hat\beta_i^T)^2$, where $\bar\beta^T=m^{-1}\sum_i\hat\beta_i^T$.

2.2 Copula Models

In the following simulations we compare the ability of the EVT approach and the conditional regression approach to predict $Y_t$ from $X_t$ when the data have a nonlinear structure. For each data-generating process, we use two Student's t-distributions with four degrees of freedom for the marginal distributions of $Y$ and $X$, but we use six different copulas for the dependence structure. Following the simulations in Støve, Tjøstheim, and Hufthammer (2014), these copulas are a Clayton copula with parameter $\theta=1$, a Clayton copula with parameter $\theta=2$, a Gaussian copula with parameter $\rho=0.3$, a Gaussian copula with parameter $\rho=0.8$, a Gumbel copula with parameter $\theta=2$, and a Gumbel copula with parameter $\theta=3$.

For each of these data-generating processes, we generate 10,000 samples. From each sample $i$, we hold out the observation $(Y_{i,t},X_{i,t})$ corresponding to the lowest $X_{i,t}$ to generate an extreme out-of-sample observation. This observation is denoted as $(Y_{i,t^*},X_{i,t^*})$. Then we estimate $\beta_i^T$ by applying the conditional regression approach and the EVT approach to all other observations in that sample. The estimates are denoted as $\hat\beta^T_{OLS,i}$ and $\hat\beta^T_{EVT,i}$, respectively. Following the approximation in the linear tail model, the expected value of $Y_{i,t^*}$ is the product of the coefficient $\beta_i^T$ and the observed value of $X_{i,t^*}$. We project $Y_{i,t^*}$ from $X_{i,t^*}$ as $\hat Y^{EVT}_{i,t^*}=\hat\beta^T_{EVT,i}X_{i,t^*}$ and $\hat Y^{OLS}_{i,t^*}=\hat\beta^T_{OLS,i}X_{i,t^*}$.
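The hold-out projection described above can be sketched for a single sample as follows. The helper names are hypothetical, and the conditional regression is written as a least squares fit through the origin on the $k$ lowest observations of $X$, which matches the form of Equation (1); whether an intercept is included in the conditional regression is not stated here, so that choice is an assumption of this example. Any estimator with the same call signature, such as an EVT-based one, can be passed in.

```python
import numpy as np


def conditional_regression_beta(x, y, k):
    """Least squares slope through the origin on the k lowest values of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.argsort(x)[:k]  # indices of the k lowest observations of x
    return float(np.dot(x[idx], y[idx]) / np.dot(x[idx], x[idx]))


def holdout_projection_error(x, y, k, estimator):
    """Project y at the lowest x from a slope fitted on all other points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i_star = int(np.argmin(x))        # the held-out extreme observation
    keep = np.ones(len(x), dtype=bool)
    keep[i_star] = False
    beta_hat = estimator(x[keep], y[keep], k)
    y_hat = beta_hat * x[i_star]      # projection under the linear tail model
    return y_hat - y[i_star]
```

Averaging the squared return value over many simulated samples gives the projection MSE used to compare the two approaches.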
In this second set of simulations, we compare the performance of the conditional regression approach and the EVT approach based on the MSE of those projections.

Figure 3 shows the results of this second set of simulations for different levels of $k$. For the Clayton copula, which exhibits lower tail dependence, the EVT approach clearly shows a better performance than the conditional regression approach for a wide range of choices for $k$. The EVT approach also outperforms the conditional regression approach for the Gaussian copula and the Gumbel copula, which are copulas that exhibit tail independence for the lower tails. If the dependence is weak for those copulas, then the EVT approach results in a smaller MSE than the conditional regression approach only for relatively low choices of $k$. However, if the dependence is stronger, then the better performance of the EVT approach relative to the conditional regression approach holds for a wider range of choices of $k$.

Figure 3. Simulations based on copula models. Notes: The solid lines report the simulation results for the EVT approach; the dashed lines report those for the conditional regression approach. The simulations are based on $m=10{,}000$ samples with $n=1{,}250$ observations each. The marginal distributions of $Y$ and $X$ are Student's t-distributions with four degrees of freedom in the simulations in each chart, but different copulas are used for the dependence structure as indicated in the subtitle of the individual charts. From each sample $i$, we hold out the observation corresponding to the lowest $X_{i,t}$. This observation is denoted as $(Y_{i,t^*},X_{i,t^*})$. Out-of-sample projections are calculated as $\hat Y^{EVT}_{i,t^*}=\hat\beta^T_{EVT,i}X_{i,t^*}$ and $\hat Y^{OLS}_{i,t^*}=\hat\beta^T_{OLS,i}X_{i,t^*}$, where $\hat\beta^T_{EVT,i}$ and $\hat\beta^T_{OLS,i}$ are estimated from all observations in sample $i$ except $(Y_{i,t^*},X_{i,t^*})$. The MSE in each of the charts is calculated as $m^{-1}\sum_i(\hat Y_{i,t^*}-Y_{i,t^*})^2$.

The challenge with real data is that it is unknown below which threshold the linear tail model can be used as a good approximation, because the true data-generating process is unobserved. Therefore, it is difficult to justify the right choice of $k$. Clearly, if the data-generating process is a global linear model, it is better to use all observations in a global linear regression model, since this results in a smaller variance of the estimates. However, such a method would suffer from a bias if the underlying dependence structure in the tail deviates from the global linear model, as, for example, in the segmented linear model. This bias can be severe if the deviation is large. To mitigate this bias, one may rely on observations in the tail only. The simulations show that, when estimates are based on a small number of tail observations, it is generally better to rely on the EVT approach rather than on the conditional regression approach. Choosing a very small $k$, however, results in a high estimation variance.
To strike a balance between bias and variance, a common approach in extreme value analysis is to calculate the estimator for various levels of $k$, and then choose a low level of $k$ in a range of $k$ that is associated with a relatively stable level of the estimate; see, for example, Drees, De Haan, and Resnick (2000). Moreover, in empirical applications, it is advisable to verify whether the conclusions are robust for various levels of $k$; see, for example, Loretan and Phillips (1994).

3 Illustration

We compare the performance of the EVT approach and the conditional regression approach in an empirical illustration. We employ data on value-weighted returns of 48 industry-specific stock portfolios and a general market index in the United States. The return series run from 1931 until 2010.⁷ We divide the data into 16 five-year subperiods. We assess the performance of both approaches in projecting the losses of industry portfolios on the day of the largest market loss within each subperiod.

Within each five-year period, we estimate the coefficient $\beta_j^T$ in the linear tail model with the returns on industry portfolio $j$ as the dependent variable and the excess market returns as the right-hand-side variable. In the estimation procedure, we exclude the day on which the market portfolio suffered its largest loss in order to obtain an "out of sample" estimate for the subsequent comparison. The coefficients are estimated with both the conditional regression approach and the EVT approach. The number of observations in each subperiod is on average 1,315, and we estimate the coefficient $\beta_j^T$ with $k=25$, that is, $k/n\approx 2\%$. We denote these estimates as $\hat\beta^T_{OLS,j}$ and $\hat\beta^T_{EVT,j}$, respectively. In line with the condition $\alpha_y>\tfrac{1}{2}\alpha_x$ in Theorem 1, we exclude portfolios with $\hat\alpha_j\leq\tfrac{1}{2}\hat\alpha_m$ in each subperiod.⁸ We denote the number of portfolios excluded for this reason by $S$. In most subperiods, no portfolios are excluded, that is, $S=0$.
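The exclusion rule above can be sketched as a simple filter on Hill estimates. This is only an illustration: the function names and the dictionary layout are hypothetical, and using the same number of tail observations for the Hill estimates as for $\hat\beta^T$ is an assumption, since the tail index estimation for this step is not detailed in the text above.

```python
import numpy as np


def hill_tail_index(x, k):
    """Hill estimate of the left-tail index from the k lowest observations."""
    xs = np.sort(np.asarray(x, dtype=float))
    if xs[k] >= 0:
        raise ValueError("the k+1 lowest observations must be negative")
    return 1.0 / np.mean(np.log(xs[:k] / xs[k]))


def eligible_portfolios(market, portfolios, k=25):
    """Keep portfolios whose tail index exceeds half the market tail index."""
    alpha_m = hill_tail_index(market, k)
    kept = {}
    for name, returns in portfolios.items():
        alpha_j = hill_tail_index(returns, k)
        if alpha_j > 0.5 * alpha_m:  # mirrors alpha_y > alpha_x / 2
            kept[name] = alpha_j
    return alpha_m, kept
```

The number of excluded portfolios, denoted $S$ in the text, is then `len(portfolios) - len(kept)`.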
Table 1 reports the average $\hat\beta^T_{EVT,j}$ of the remaining portfolios for each subperiod (the number of remaining portfolios is denoted as $N$). Table 1 also reports the minimum and maximum $\hat\beta^T_{EVT,j}$ and the corresponding industry name to give an indication of the range of the $\hat\beta^T_{EVT,j}$s. For each subperiod, the average estimate from the EVT approach is equal to or slightly above 1. Most $\hat\beta^T_{EVT,j}$s fall in the range between 0.5 and 2.0. These estimates imply that most portfolios are expected to lose between half and twice as much as the market portfolio in a market crash.

Table 1. Estimates

| Period | Av. $\hat\beta^T_{EVT,j}$ | N | S | Minimum $\hat\beta^T_{EVT,j}$ | Industry (minimum) | Maximum $\hat\beta^T_{EVT,j}$ | Industry (maximum) |
|---|---|---|---|---|---|---|---|
| 1931–1935 | 1.15 | 39 | 3 | 0.61 | Tobacco Prdcts | 1.79 | Recreation |
| 1936–1940 | 1.06 | 41 | 1 | 0.43 | Tobacco Prdcts | 2.08 | Recreation |
| 1941–1945 | 1.16 | 42 | 0 | 0.63 | Communication | 2.47 | Real Estate |
| 1946–1950 | 1.08 | 43 | 0 | 0.37 | Communication | 1.69 | Construction |
| 1951–1955 | 1.00 | 43 | 0 | 0.38 | Communication | 1.61 | Aircraft |
| 1956–1960 | 1.09 | 43 | 0 | 0.53 | Food Products | 1.71 | Electronic Eq. |
| 1961–1965 | 1.15 | 43 | 0 | 0.58 | Utilities | 1.93 | Recreation |
| 1966–1970 | 1.25 | 47 | 0 | 0.55 | Utilities | 1.90 | Recreation |
| 1971–1975 | 1.15 | 48 | 0 | 0.65 | Utilities | 1.78 | Entertainment |
| 1976–1980 | 1.09 | 45 | 3 | 0.61 | Utilities | 1.62 | Healthcare |
| 1981–1985 | 1.11 | 48 | 0 | 0.62 | Utilities | 2.31 | Precious Metals |
| 1986–1990 | 1.00 | 48 | 0 | 0.54 | Utilities | 1.32 | Candy & Soda |
| 1991–1995 | 1.16 | 48 | 0 | 0.64 | Utilities | 1.86 | Shipbldng & Railrd Eq. |
| 1996–2000 | 1.01 | 48 | 0 | 0.44 | Utilities | 1.86 | Coal |
| 2001–2005 | 1.02 | 48 | 0 | 0.56 | Real Estate | 1.70 | Electronic Eq. |
| 2006–2010 | 1.12 | 48 | 0 | 0.56 | Beer & Liquor | 2.19 | Coal |

Notes: Within each period, we estimate coefficient $\beta^T$ in Equation (1) for the industry portfolios with non-missing observations using the EVT approach with $k=25$, that is, $k/n\approx 2\%$. For each period, we estimate $\beta^T$s based on daily excess returns from that period while excluding the observation on the day the market suffered its largest loss. The column labeled "Av. $\hat\beta^T_{EVT,j}$" reports the average of $\hat\beta^T_{EVT,j}$ of $N$ portfolios. The column labeled "S" reports the number of (excluded) portfolios with $\hat\alpha_j\leq\tfrac{1}{2}\hat\alpha_m$. The last columns report the minimum and maximum $\hat\beta^T_{EVT,j}$, and the industry name from the data documentation.

Based on the $\beta_j^T$ estimates, we make a projection of the losses on each portfolio $j$ on the day that the market suffered its largest loss. For each subperiod, we report the largest loss on the market portfolio, defined as $L_m=-\min\{R^e_{m,1},\ldots,R^e_{m,t}\}$, and the corresponding date in Table 2.
We denote the actual loss on a specific industry portfolio on that day as $L_j = -R^e_{j,t^*}$, where $t^*$ refers to the day of the largest loss on the market portfolio. Following the linear tail model, the projections under the two approaches are $\hat{L}_{EVT,j} = L_m \hat{\beta}^T_{EVT,j}$ and $\hat{L}_{OLS,j} = L_m \hat{\beta}^T_{OLS,j}$, respectively.9 We compare the performance of the two approaches by their root mean-squared error (RMSE), calculated as $\sqrt{N^{-1}\sum_j e_{EVT,j}^2}$ and $\sqrt{N^{-1}\sum_j e_{OLS,j}^2}$, where $e_{EVT,j} = L_j - \hat{L}_{EVT,j}$ and $e_{OLS,j} = L_j - \hat{L}_{OLS,j}$. The better-performing method should report a lower RMSE.

Table 2. Performance evaluation

| Period | Worst day | Market loss | RMSE OLS | RMSE EVT | t-Stat | p-Value |
|---|---|---|---|---|---|---|
| 1931–1935 | July 21, 1933 | 9.33 | 6.69 | 4.04 | 2.60 | 0.007 |
| 1936–1940 | October 18, 1937 | 8.10 | 4.93 | 2.59 | 3.18 | 0.001 |
| 1941–1945 | December 8, 1941 | 4.09 | 2.37 | 1.95 | 1.64 | 0.055 |
| 1946–1950 | September 3, 1946 | 6.82 | 2.39 | 1.45 | 1.68 | 0.050 |
| 1951–1955 | September 26, 1955 | 6.49 | 2.86 | 1.47 | 3.53 | 0.001 |
| 1956–1960 | October 21, 1957 | 3.04 | 1.96 | 1.20 | 2.98 | 0.002 |
| 1961–1965 | May 28, 1962 | 6.99 | 3.22 | 2.73 | 1.35 | 0.093 |
| 1966–1970 | May 25, 1970 | 3.21 | 2.23 | 1.42 | 2.63 | 0.006 |
| 1971–1975 | November 18, 1974 | 3.57 | 2.46 | 1.01 | 3.16 | 0.001 |
| 1976–1980 | October 9, 1979 | 3.44 | 2.07 | 0.85 | 5.42 | 0.000 |
| 1981–1985 | October 25, 1982 | 3.62 | 2.33 | 1.07 | 3.73 | 0.000 |
| 1986–1990 | October 19, 1987 | 17.44 | 6.73 | 3.95 | 1.34 | 0.094 |
| 1991–1995 | November 15, 1991 | 3.55 | 2.83 | 2.01 | 1.76 | 0.042 |
| 1996–2000 | April 14, 2000 | 6.73 | 4.35 | 3.37 | 1.11 | 0.136 |
| 2001–2005 | September 17, 2001 | 5.03 | 5.76 | 5.18 | 0.99 | 0.164 |
| 2006–2010 | December 1, 2008 | 8.95 | 3.37 | 1.40 | 3.59 | 0.000 |

Notes: Within each period, we estimate the coefficient $\beta^T$ in Equation (1) for 48 industry portfolios using the EVT approach and the conditional regression approach with $k = 25$, or $k/n \approx 2\%$. For each period, we estimate the $\beta^T$s based on daily excess returns during that period while excluding the observation on the day the market suffered its largest loss. Within each period, we project the loss on the day of the largest market loss for each industry portfolio $j$ as the product of $\beta^T_j$ and the actual market loss. From the difference between the projected losses on the industry portfolios and the actual losses, we calculate the root mean-squared error (RMSE) for each approach. The last columns report the t-statistics calculated from Equation (13) and the corresponding p-values for testing against the null hypothesis that the EVT approach produces a higher RMSE than the conditional regression (OLS) approach.

Moreover, to provide a formal assessment of whether the differences in RMSE within each subperiod are statistically significant or due to sampling variability, we implement a Diebold and Mariano (1995) type test with the small-sample correction proposed by Harvey, Leybourne, and Newbold (1997). In particular, with the paired differences $d_j = e_{EVT,j}^2 - e_{OLS,j}^2$, we calculate the following test statistic for each subperiod:

$$t\text{-stat} = \frac{\bar{d}}{\sqrt{\hat{V}(\bar{d})/(N-1)}}, \tag{13}$$

where $\bar{d} = N^{-1}\sum_j d_j$ and $\hat{V}(\bar{d}) = \sum_j (d_j - \bar{d})^2/N$. Following Harvey, Leybourne, and Newbold (1997), the t-statistic in Equation (13) can be compared against the critical values of the Student's t-distribution with $N-1$ degrees of freedom.10 This test relies on assuming that the differences $d_j$ follow a normal distribution. Although the error terms $e_{EVT,j}$ and $e_{OLS,j}$ may inherit heavy tails from the distribution of stock returns, Harvey, Leybourne, and Newbold (1997) affirm that the test results are in general not very sensitive to the presence of heavy tails. Table 2 reports the RMSE of the projected portfolio losses for both approaches in each subperiod.
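The projection errors, the RMSE, and the test statistic in Equation (13) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `rmse` and `dm_t_stat` and all input numbers are hypothetical, and the p-value step (a Student's t-distribution with $N-1$ degrees of freedom) is left out because it needs a statistics library.

```python
import math

def rmse(errors):
    """Root mean-squared error: sqrt(N^-1 * sum_j e_j^2)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def dm_t_stat(e_evt, e_ols):
    """t-statistic of Equation (13), built from d_j = e_EVT,j^2 - e_OLS,j^2."""
    n = len(e_evt)
    d = [a * a - b * b for a, b in zip(e_evt, e_ols)]
    d_bar = sum(d) / n
    v_hat = sum((dj - d_bar) ** 2 for dj in d) / n  # V-hat divides by N
    return d_bar / math.sqrt(v_hat / (n - 1))

# Hypothetical inputs (not from the paper): one market loss, four portfolios.
market_loss = 5.0
actual_losses = [4.0, 6.5, 3.0, 7.0]
beta_evt = [0.9, 1.2, 0.7, 1.3]   # made-up EVT estimates
beta_ols = [1.1, 1.0, 0.9, 1.0]   # made-up conditional regression estimates

# Projection errors e_j = L_j - L_m * beta_j for each approach.
e_evt = [l - market_loss * b for l, b in zip(actual_losses, beta_evt)]
e_ols = [l - market_loss * b for l, b in zip(actual_losses, beta_ols)]

print("RMSE EVT:", rmse(e_evt))
print("RMSE OLS:", rmse(e_ols))
print("t-stat  :", dm_t_stat(e_evt, e_ols))
```

With $d_j$ defined as in the text, a negative statistic in this sketch corresponds to smaller squared errors for the EVT projections.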
In all subperiods, the EVT approach reports a lower RMSE than the conditional regression approach.11 The average reduction in the RMSE is approximately 40%. We report t-statistics calculated from Equation (13) and the corresponding p-values in the last two columns to test against the null hypothesis of a larger RMSE from the EVT approach. In 11 out of 16 subperiods, the null is rejected at the 5% significance level. We zoom in on two subperiods with relatively big and relatively small reductions in the RMSE. A substantial improvement in the projections in terms of RMSE is during the stock market plunge of 8.1% on October 18, 1937. This stock market crash occurred during a period of high uncertainty about the US economy: During the 9-month economic decline from September 1937 to June 1938, national income fell by 12% and firm profits fell by 78%; see, for example, Roose (1948). The errors in the projected losses with the EVT approach are significantly smaller at the 1% significance level. The improvement in the projections is limited for the stock market plunge of 5.0% on September 17, 2001. This was the day that the NYSE resumed trading after the terrorist attacks 6 days earlier. For this stock market crash, the RMSE of the regression approach is 5.76, while the RMSE for the EVT approach is 5.18 (no significant difference). A relatively high proportion of the forecasting errors on this day are due to the reaction of a few industries because of the nature of this particular event. The industry portfolios “Defense” and “Shipbuilding” would usually move in the same direction as the stock market index. However, during this crash, they reported a gain of 15% and 7%, respectively. In contrast, because of the nature of the terrorist attacks, the industry portfolios “Aircraft” and “Transportation” (which includes “Air transportation”) reacted much more strongly. These portfolios lost 18% and 14%, respectively. 
The linear tail model does not anticipate the profits and losses for these four portfolios in this case. If the projection errors for these four portfolios are excluded, the RMSE of the EVT approach and conditional approach decline substantially to a level of 4.40 and 3.48, respectively. Moreover, the difference between the MSE of the EVT approach and that of the conditional regression approach in this subperiod becomes (weakly) significant at the 10% level if these four portfolios are excluded.

To summarize, the EVT approach shows a better overall performance than the conditional regression approach in projecting portfolio losses on the worst market day. We interpret this better performance as resulting from improved accuracy in estimating $\beta^T$ based on a small number of extreme observations.

3.1 Robustness Checks

We perform several robustness checks on the performance of the EVT approach. First, the simulation results in Section 2 suggest that the performance of the EVT approach relative to the conditional regression approach is stronger if estimates are based on a small number of tail observations. To verify this in our empirical illustration, we vary the level of $k$ over a range of values. With $k$ fixed at 20, 30, 35, 40, 45, and 50, the null is rejected at the 5% significance level for 11, 11, 9, 8, 5, and 5 subperiods, respectively. This confirms that the better performance of the EVT approach relative to the conditional regression approach is stronger for lower levels of $k$.

Second, we also estimate $\beta^T$ using a modified EVT approach in which the tail index is estimated using the modified Hill estimator of Huisman et al. (2001) for small samples using 25% of the observations, while all other components are estimated with $k = 25$. When comparing this modified EVT approach with the conditional regression approach using $k = 25$, we reject the null at the 5% significance level for nine subperiods.
Third, we compare the EVT approach with an alternative benchmark that minimizes the mean absolute error to estimate the coefficient in the linear tail model. This is equivalent to estimating a quantile regression based on observations corresponding to the k = 25 lowest X to predict the median of Y given an extremely low level of X. With this benchmark, the null is rejected at the 5% significance level for 12 subperiods. To summarize, the better performance of the EVT approach is robust with respect to several methodological variations. We also compare the performance of the EVT approach with k = 25 with that of a global linear regression model using all daily observations in the subperiod (except the observation corresponding to the largest market loss). In this setting, the difference in performance not only depends on the properties of the estimators, but also on the severity of nonlinearities in the dependence structure of Y and X in the empirical application. Stronger nonlinearities increase the bias in the projections from the global linear regression model, which, in the absence of these nonlinearities, is expected to result in a lower RMSE. In the context of industry portfolios, the EVT approach does not perform better than a global linear model as it results in an RMSE that is empirically lower in only 6 out of the 16 subperiods. This suggests that the nonlinearities in the dependence structure between the market returns and the returns on industry portfolios are, in general, not sufficiently severe to warrant better projections from the EVT approach. Such nonlinearities can be expected to be more prevalent in the returns of currencies (Lettau, Maggiori, and Weber, 2014), safe-haven commodities (Baur and McDermott, 2010), hedge funds (Patton, 2009), portfolios with options-like payoffs (Fung and Hsieh, 2001), and even the returns of individual stocks (Van Oordt and Zhou, 2016). 
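For concreteness, the two estimators compared in this section can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation. It assumes the EVT estimator takes the form $\hat{\beta}^T = \hat{\tau}(k/n)^{1/\hat{\alpha}_x}\,\hat{Q}_y(k/n)/\hat{Q}_x(k/n)$ implied by the decomposition in the proof of Theorem 2, that the quantiles are estimated by the $(k+1)$-th lowest order statistics, and that $\alpha_x$ is estimated by a standard Hill estimator on the $k$ lowest observations of $X$ with the same $k$. The conditional regression follows the formula in footnote 4. The function names and the noise-free toy data are hypothetical.

```python
import math

def evt_beta(x, y, k):
    """EVT-type estimate: tau_hat**(1/alpha_hat) * Qy_hat / Qx_hat (assumed form)."""
    xs, ys = sorted(x), sorted(y)
    qx, qy = xs[k], ys[k]  # (k+1)-th lowest order statistics X_{n,k+1}, Y_{n,k+1}
    # Hill estimator of 1/alpha_x on the lower tail; assumes these values are negative.
    inv_alpha = sum(math.log(xs[i] / qx) for i in range(k)) / k
    # Share of joint tail events among the k tail observations of X.
    tau_hat = sum(1 for a, b in zip(x, y) if a < qx and b < qy) / k
    return tau_hat ** inv_alpha * qy / qx

def conditional_ols_beta(x, y, k):
    """Least squares slope using only observations with X below X_{n,k+1}."""
    qx = sorted(x)[k]
    tail = [(a, b) for a, b in zip(x, y) if a < qx]
    mx = sum(a for a, _ in tail) / len(tail)
    my = sum(b for _, b in tail) / len(tail)
    cov = sum((a - mx) * (b - my) for a, b in tail)
    var = sum((a - mx) ** 2 for a, _ in tail)
    return cov / var

# Hypothetical noise-free check: Y = 1.5 * X with a heavy lower tail in X.
x = [-100.0 / i for i in range(1, 201)]
y = [1.5 * v for v in x]
print(evt_beta(x, y, 25), conditional_ols_beta(x, y, 25))
```

In this noise-free toy case both functions recover the slope of 1.5 up to rounding, so the sketch only checks the mechanics. It says nothing about the relative accuracy of the two approaches, which depends on the error term and the sample size as discussed above.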
4 Concluding Remarks

In this paper, we propose an EVT approach for estimating $\beta^T$ in the linear tail model based on only a small number of extreme observations. Simulations show that our EVT approach yields a lower MSE than conditional regressions on tail observations. We demonstrate one application of the EVT approach: projecting large losses on industry portfolios in extremely adverse market conditions.

The estimator of $\beta^T$ might be further improved by considering more sophisticated EVT techniques. For instance, the estimator of the tail index, $\hat{\alpha}_x$, suffers from an asymptotic bias. Many studies propose bias-corrected estimators; see, for example, Peng (1998), Feuerverger and Hall (1999), and Gomes, de Haan, and Rodrigues (2008). In addition, Fougères, de Haan, and Mercadier (2015) propose a bias-corrected estimator for the $\tau$-measure. Whether the use of such sophisticated techniques will improve the current simple estimation procedure for $\beta^T$ is an interesting question for future research.

Appendix: Proofs

Throughout the proof, we need the following lemma as an auxiliary result, which approximates the probability of joint extreme events by that of a marginal extreme event.

Lemma 1 Suppose $\beta^T>0$ and $\alpha_y>\tfrac{1}{2}\alpha_x$. Under the linear tail model in Equation (1) and the heavy-tailedness of the downside distributions in Equation (2), we have that

$$\lim_{p\to0}\frac{\Pr\left(Y<Q_y(py),\,X<Q_x(px)\right)}{\Pr\left(X<\min\left(\frac{Q_y(py)}{\beta^T},\,Q_x(px)\right)\right)}=1, \tag{14}$$

uniformly for $(x,y)\in(0,3/2]^2$.

Proof of Lemma 1 Denote the two sets in the numerator and denominator of Equation (14) as

$$C:=\{Y<Q_y(py),\,X<Q_x(px)\}=\{\beta^TX+\varepsilon<Q_y(py),\,X<Q_x(px)\},\qquad C_0:=\left\{X<\min\left(\frac{Q_y(py)}{\beta^T},\,Q_x(px)\right)\right\}.$$

The goal is to prove that $\Pr(C)\sim\Pr(C_0)$ as $p\to0$ uniformly for all $(x,y)\in(0,3/2]^2$. To achieve this goal, we first find suitable upper and lower bounds for $\Pr(C)$ using set manipulations. Denote the following sets:

$$\begin{aligned}
C_1&:=\{\beta^TX<Q_y(py)(1+\delta),\,X<Q_x(px),\,\varepsilon<-\delta Q_y(py)\},\\
C_{21}&:=\{\beta^TX<Q_y(py)(1-\delta),\,X<Q_x(px)\},\\
C_{22}&:=\{\varepsilon<\delta Q_y(py),\,X<Q_x(px)\},
\end{aligned}$$

with $\delta:=\delta(p)>0$ to be specified later.
It is clear that, for any $0<\delta<1$, $C_1\subset C\subset C_{21}\cup C_{22}$. We prove the lemma by showing that, for some proper choice of $\delta$, the limit relations

$$\frac{\Pr(C_1)}{\Pr(C_0)}\to1,\qquad\frac{\Pr(C_{21})}{\Pr(C_0)}\to1\qquad\text{and}\qquad\frac{\Pr(C_{22})}{\Pr(C_0)}\to0 \tag{15}$$

hold uniformly as $p\to0$ for all $(x,y)\in(0,3/2]^2$.

We first prove the limit relation for $\Pr(C_1)/\Pr(C_0)$. Since $X$ and $\varepsilon$ are independent, we have that

$$\Pr(C_1)=\Pr\left(X<\min\left(\frac{Q_y(py)}{\beta^T}(1+\delta(p)),\,Q_x(px)\right)\right)\Pr\left(\varepsilon<-\delta(p)Q_y(py)\right). \tag{16}$$

From the heavy-tailed property of the distribution function of $Y$ in Equation (2), we obtain that $Q_y(p)=-p^{-1/\alpha_y}\tilde{l}_y(p)$, with $\tilde{l}_y(p)$ denoting a slowly varying function as $p\to0$. By taking $\delta(p)=p^c$ with $0<c<1/\alpha_y$, we have that $-\delta(p)Q_y(py)=p^{c-1/\alpha_y}y^{-1/\alpha_y}\tilde{l}_y(py)\to+\infty$ as $p\to0$, which ensures that the second term $\Pr(\varepsilon<-\delta(p)Q_y(py))\to1$ as $p\to0$ holds uniformly for all $0<y\le3/2$.

Next, for the first term of $\Pr(C_1)$ in Equation (16), we show that, uniformly for all $(x,y)\in(0,3/2]^2$,

$$\lim_{p\to0}\frac{\Pr\left(X<\min\left(\frac{Q_y(py)}{\beta^T}(1+\delta(p)),\,Q_x(px)\right)\right)}{\Pr\left(X<\min\left(\frac{Q_y(py)}{\beta^T},\,Q_x(px)\right)\right)}=1. \tag{17}$$

This relation follows from two observations. The denominator of Equation (17) provides an upper bound for the numerator. Moreover, the following inequality provides a lower bound for the numerator:

$$(1+\delta(p))\min\left(\frac{Q_y(py)}{\beta^T},\,Q_x(px)\right)\le\min\left(\frac{Q_y(py)}{\beta^T}(1+\delta(p)),\,Q_x(px)\right).$$

From the heavy-tailedness of the distribution of $X$, we get that, uniformly for all $(x,y)\in(0,3/2]^2$, the lower bound satisfies

$$\frac{\Pr\left(X<(1+\delta(p))\min\left(\frac{Q_y(py)}{\beta^T},\,Q_x(px)\right)\right)}{\Pr\left(X<\min\left(\frac{Q_y(py)}{\beta^T},\,Q_x(px)\right)\right)}\sim(1+\delta(p))^{-\alpha_x}\to1,$$

as $p\to0$. Here, in the last step we use the fact that $\delta(p)=p^c\to0$ as $p\to0$. Equation (17) is thus proved.

Combining the results regarding the two terms of $\Pr(C_1)$ in Equation (16), we get that, as $p\to0$, $\Pr(C_1)/\Pr(C_0)\to1$ uniformly for all $(x,y)\in(0,3/2]^2$. The proof of the limit result that $\Pr(C_{21})/\Pr(C_0)\to1$ as $p\to0$ holds uniformly for all $(x,y)\in(0,3/2]^2$ follows similar lines.

Finally, we deal with the limit relation for $\Pr(C_{22})/\Pr(C_0)$. For this purpose, we first prove that

$$\limsup_{p\to0}\frac{\Pr(C_{22})}{p^{2-c\alpha_y-\upsilon}}=0$$

for any given $\upsilon>0$. Due to the independence between $X$ and $\varepsilon$, we have $\Pr(C_{22})=\Pr(\varepsilon<\delta Q_y(py))\,px$.
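Moreover, we have the following inequality:

$$\begin{aligned}
\Pr\left(\varepsilon<\delta(p)Q_y(py)\right)\Pr\left(\beta^TX<-\tfrac{\delta(p)}{2}Q_y(py)\right)
&=\Pr\left(\varepsilon<\delta(p)Q_y(py),\,\beta^TX<-\tfrac{\delta(p)}{2}Q_y(py)\right)\\
&\le\Pr\left(Y<\tfrac{\delta(p)}{2}Q_y(py)\right).
\end{aligned}$$

This implies that, for any given $\upsilon>0$,

$$\begin{aligned}
\limsup_{p\to0}\frac{\Pr(C_{22})}{p^{2-c\alpha_y-\upsilon}}
&=\limsup_{p\to0}\frac{\Pr\left(\varepsilon<\delta(p)Q_y(py)\right)}{p^{1-c\alpha_y-\upsilon}}\,x\\
&\le\limsup_{p\to0}\frac{\Pr\left(Y<\tfrac{\delta(p)}{2}Q_y(py)\right)}{p^{1-c\alpha_y-\upsilon}\Pr\left(\beta^TX<-\tfrac{\delta(p)}{2}Q_y(py)\right)}\,x\\
&=\limsup_{p\to0}\frac{\left(-\tfrac{\delta(p)}{2}Q_y(py)\right)^{-\alpha_y}l_y\left(\tfrac{\delta(p)}{2}Q_y(py)\right)}{p^{1-c\alpha_y-\upsilon}\Pr\left(\beta^TX<-\tfrac{\delta(p)}{2}Q_y(py)\right)}\,x\\
&=\limsup_{p\to0}\frac{p^{\upsilon}2^{\alpha_y}y\left(\tilde{l}_y(py)\right)^{-\alpha_y}l_y\left(\tfrac{\delta(p)}{2}Q_y(py)\right)}{\Pr\left(\beta^TX<-\tfrac{\delta(p)}{2}Q_y(py)\right)}\,x=0.
\end{aligned}$$

Here, the last step follows from the facts that $(\tilde{l}_y(py))^{-\alpha_y}l_y\left(\tfrac{\delta(p)}{2}Q_y(py)\right)$ is a slowly varying function as $p\to0$, that $\Pr\left(\beta^TX<-\tfrac{\delta(p)}{2}Q_y(py)\right)\to1$ as $p\to0$, and that $c\alpha_y<1$.

Next, we show that for some small $\upsilon>0$, $\lim_{p\to0}\Pr(C_0)/p^{2-c\alpha_y-\upsilon}=+\infty$. Note that

$$\begin{aligned}
\Pr(C_0)&=\min\left(\Pr\left(X<\frac{Q_y(py)}{\beta^T}\right),\,px\right)\\
&=\min\left(\left(\frac{(py)^{-1/\alpha_y}\tilde{l}_y(py)}{\beta^T}\right)^{-\alpha_x}l_x\left(-\frac{Q_y(py)}{\beta^T}\right),\,px\right)\\
&=\min\left(p^{\alpha_x/\alpha_y}\left(\frac{y^{-1/\alpha_y}\tilde{l}_y(py)}{\beta^T}\right)^{-\alpha_x}l_x\left(-\frac{Q_y(py)}{\beta^T}\right),\,px\right).
\end{aligned}$$

In order to obtain $\lim_{p\to0}\Pr(C_0)/p^{2-c\alpha_y-\upsilon}=+\infty$, we need to choose $c$ and $\upsilon$ such that $2-c\alpha_y-\upsilon>\max\left(\frac{\alpha_x}{\alpha_y},1\right)$. Since it is assumed that $\alpha_y>\tfrac{1}{2}\alpha_x$, this can be achieved by first choosing $c<\min\left(\frac{1}{\alpha_y},\,\frac{1}{\alpha_y}\left(2-\frac{\alpha_x}{\alpha_y}\right)\right)$, which implies that $2-c\alpha_y>\max\left(\frac{\alpha_x}{\alpha_y},1\right)$, and then choosing a small $\upsilon>0$. With these choices, $\lim_{p\to0}\Pr(C_0)/p^{2-c\alpha_y-\upsilon}=+\infty$. Together with $\limsup_{p\to0}\Pr(C_{22})/p^{2-c\alpha_y-\upsilon}=0$, we get that $\lim_{p\to0}\Pr(C_{22})/\Pr(C_0)=0$.

By proving all limit relations in Equation (15), we have proved the lemma.

Proof of Theorem 1 If $\beta^T=0$, then $\tau(p)=p$. From the heavy-tailed property of the distribution functions of $X$ and $Y$, we get that $Q_x(p)=-p^{-1/\alpha_x}\tilde{l}_x(p)$ and $Q_y(p)=-p^{-1/\alpha_y}\tilde{l}_y(p)$, where $\tilde{l}_x$ and $\tilde{l}_y$ are two slowly varying functions as $p\to0$. Consequently,

$$\lim_{p\to0}(\tau(p))^{1/\alpha_x}\frac{Q_y(p)}{Q_x(p)}=\lim_{p\to0}p^{2/\alpha_x-1/\alpha_y}\frac{\tilde{l}_y(p)}{\tilde{l}_x(p)}=0=\beta^T,$$

where in the last step we used the fact that the assumption $\alpha_y>\tfrac{1}{2}\alpha_x$ implies $2/\alpha_x-1/\alpha_y>0$ and the fact that $\tilde{l}_y(p)/\tilde{l}_x(p)$ is also a slowly varying function as $p\to0$.

If $\beta^T>0$, the proof of the theorem relies on the following relation:

$$\liminf_{p\to0}\frac{Q_y(p)}{\beta^TQ_x(p)}\ge1, \tag{18}$$

which we prove by contradiction as follows. Suppose there would exist a sequence $p_n\to0$ as $n\to\infty$ such that, for all $p_n$, $\frac{Q_y(p_n)}{\beta^TQ_x(p_n)}<1-\delta$ for some $\delta>0$.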
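This would imply that

$$p_n=\Pr(Y<Q_y(p_n))\ge\Pr\left(Y<(1-\delta)\beta^TQ_x(p_n)\right). \tag{19}$$

For sufficiently large $n$, we have $p_n<\bar{p}$, which implies that the linear model in Equation (1) applies for sufficiently large $n$. Hence, we would have, for sufficiently large $n$, that

$$\begin{aligned}
\Pr\left(Y<(1-\delta)\beta^TQ_x(p_n)\right)
&\ge\Pr\left(\beta^TX<\left(1-\tfrac{\delta}{2}\right)\beta^TQ_x(p_n),\,\varepsilon<-\tfrac{\delta}{2}\beta^TQ_x(p_n)\right)\\
&=\Pr\left(X<\left(1-\tfrac{\delta}{2}\right)Q_x(p_n)\right)\Pr\left(\varepsilon<-\tfrac{\delta}{2}\beta^TQ_x(p_n)\right),
\end{aligned} \tag{20}$$

where the last step is due to the independence between $X$ and $\varepsilon$.

Regarding the second term in Equation (20), notice that, as $p\to0$, $\Pr\left(\varepsilon<-\tfrac{\delta}{2}\beta^TQ_x(p)\right)\to1$, since $-\tfrac{\delta}{2}\beta^TQ_x(p)\to+\infty$. In other words, given any $\kappa>0$, we would have $\Pr\left(\varepsilon<-\tfrac{\delta}{2}\beta^TQ_x(p_n)\right)>1-\kappa$ for sufficiently large $n$.

Regarding the first term in Equation (20), notice that

$$\lim_{p\to0}\frac{\Pr\left(X<\left(1-\tfrac{\delta}{2}\right)Q_x(p)\right)}{\Pr(X<Q_x(p))}=\left(1-\tfrac{\delta}{2}\right)^{-\alpha_x}.$$

Thus, for any $\kappa>0$ and sufficiently large $n$, $\Pr\left(X<\left(1-\tfrac{\delta}{2}\right)Q_x(p_n)\right)>p_n\left(1-\tfrac{\delta}{2}\right)^{-\alpha_x}(1-\kappa)$.

Combining the results on the two terms in Equation (20) gives that, for any $\kappa>0$ and sufficiently large $n$,

$$\Pr\left(Y<(1-\delta)\beta^TQ_x(p_n)\right)>p_n\left(1-\tfrac{\delta}{2}\right)^{-\alpha_x}(1-\kappa)^2.$$

By choosing $\kappa$ sufficiently small, we would have $\left(1-\tfrac{\delta}{2}\right)^{-\alpha_x}(1-\kappa)^2>1$, which contradicts Equation (19). Therefore, we conclude that Equation (18) must hold true.

Next, we turn to the proof of the theorem. By applying Lemma 1 with $x=y=1$, we obtain

$$\lim_{p\to0}\frac{p\,\tau(p)}{\Pr\left(X<\min\left(\frac{Q_y(p)}{\beta^T},\,Q_x(p)\right)\right)}=1.$$

After substituting $\Pr(X<Q_x(p))$ for $p$, we need to handle the quotient

$$\frac{\Pr\left(X<\min\left(\frac{Q_y(p)}{\beta^T},\,Q_x(p)\right)\right)}{\Pr(X<Q_x(p))}$$

as $p\to0$. To handle this quotient we use Lemma 2.1 in Drees (1998): the second-order condition (3) implies that for any given $\delta>0$, there exists a level $u_0=u_0(\delta)$ such that

$$\left|\frac{x^{\alpha_x}\frac{\Pr(X<-ux)}{\Pr(X<-u)}-1}{\eta(u)}\right|\le\delta x^{-\alpha_x+\delta},$$

for all $u>u_0$ and $ux>u_0$. Consequently, if $x=x(u)>x_0$ for some given $x_0>0$, we have that

$$\lim_{u\to\infty}x^{\alpha_x}\frac{\Pr(X<-ux)}{\Pr(X<-u)}=1. \tag{21}$$

We apply the limit relation (21) by substituting $ux$ and $u$ by $-\min\left(\frac{Q_y(p)}{\beta^T},\,Q_x(p)\right)$ and $-Q_x(p)$, respectively. This is feasible because it follows from the inequality in Equation (18) that

$$\frac{\min\left(\frac{Q_y(p)}{\beta^T},\,Q_x(p)\right)}{Q_x(p)}=\max\left(\frac{Q_y(p)}{\beta^TQ_x(p)},\,1\right)\ge1.$$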
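With the planned substitution, we get that

$$\lim_{p\to0}\frac{\Pr\left(X<\min\left(\frac{Q_y(p)}{\beta^T},\,Q_x(p)\right)\right)}{\Pr(X<Q_x(p))}\cdot\left(\max\left(\frac{Q_y(p)}{\beta^TQ_x(p)},\,1\right)\right)^{\alpha_x}=1,$$

which implies that

$$\lim_{p\to0}\tau(p)\cdot\max\left(\left(\frac{Q_y(p)}{\beta^TQ_x(p)}\right)^{\alpha_x},\,1\right)=1. \tag{22}$$

The theorem follows directly from combining Equations (18) and (22).

Proof of Theorem 2 Write

$$\hat{\beta}^T=\left(\frac{\hat{\tau}(k/n)}{\tau(k/n)}\right)^{1/\hat{\alpha}_x}\cdot(\tau(k/n))^{1/\hat{\alpha}_x-1/\alpha_x}\cdot\frac{\hat{Q}_y(k/n)}{Q_y(k/n)}\cdot\frac{Q_x(k/n)}{\hat{Q}_x(k/n)}\cdot\left((\tau(k/n))^{1/\alpha_x}\frac{Q_y(k/n)}{Q_x(k/n)}\right)=:I_1\cdot I_2\cdot I_3\cdot I_4\cdot I_5.$$

The classic consistency results in extreme value statistics ensure that $\hat{\alpha}_x\xrightarrow{P}\alpha_x$, $\hat{Q}_x(k/n)/Q_x(k/n)\xrightarrow{P}1$, and $\hat{Q}_y(k/n)/Q_y(k/n)\xrightarrow{P}1$ as $n\to\infty$; see Theorem 3.2.2 and Corollary 4.3.9 in De Haan and Ferreira (2006). Hence, $I_3,I_4\xrightarrow{P}1$ as $n\to\infty$. Theorem 1 ensures that $I_5\to\beta^T$ as $n\to\infty$. Therefore, it only remains to prove that $I_1,I_2\xrightarrow{P}1$ as $n\to\infty$.

We first deal with $I_1$, which is equivalent to proving the consistency of $\hat{\tau}(k/n)$. Denote

$$\tilde{\tau}(x,y)=\frac{1}{k}\sum_{t=1}^{n}\mathbf{1}\left\{X_t<Q_x\left(\tfrac{k}{n}x\right)\text{ and }Y_t<Q_y\left(\tfrac{k}{n}y\right)\right\}, \tag{23}$$

for $(x,y)$ in the neighborhood of $(1,1)$. Then, $\hat{\tau}(k/n)$ can be written as

$$\hat{\tau}(k/n)=\tilde{\tau}\left(\tfrac{n}{k}F_x(X_{n,k+1}),\,\tfrac{n}{k}F_y(Y_{n,k+1})\right).$$

Here, $\left(\tfrac{n}{k}F_x(X_{n,k+1}),\,\tfrac{n}{k}F_y(Y_{n,k+1})\right)$ is in the neighborhood of $(1,1)$ in the following sense. According to Corollary 2.2.2 in De Haan and Ferreira (2006), as $n\to\infty$, $\sqrt{k}\left(\tfrac{n}{k}F_x(X_{n,k+1})-1\right)\xrightarrow{d}N(0,1)$. Hence, for any $\delta>0$, as $n\to\infty$,

$$\Pr\left(\left|\tfrac{n}{k}F_x(X_{n,k+1})-1\right|>k^{-1/2+\delta}\right)\to0.$$

A similar relation holds for $Y_{n,k+1}$. Therefore, in order to prove that $I_1\xrightarrow{P}1$ as $n\to\infty$, we will prove the more general result that $\tilde{\tau}(x,y)/\tau(k/n)\xrightarrow{P}1$ uniformly for all $(x,y)\in[1-k^{-1/2+\delta},\,1+k^{-1/2+\delta}]^2$ for some $0<\delta<1/2$.

By applying the law of large numbers, we get that, as $n\to\infty$,

$$\frac{\tilde{\tau}(x,y)}{R(x,y,k/n)}\xrightarrow{P}1, \tag{24}$$

where $R(x,y,k/n)=\tfrac{n}{k}\Pr\left(X<Q_x\left(\tfrac{k}{n}x\right),\,Y<Q_y\left(\tfrac{k}{n}y\right)\right)$. Notice that $\tau(k/n)=R(1,1,k/n)$. Hence, what remains to be proved is that the denominator in Equation (24) can be replaced by $\tau(k/n)$, that is,

$$\lim_{n\to\infty}\frac{R(x,y,k/n)}{\tau(k/n)}=1 \tag{25}$$

holds uniformly for all $(x,y)\in[1-k^{-1/2+\delta},\,1+k^{-1/2+\delta}]^2$.

If $\beta^T=0$, as $n\to\infty$, for all $(x,y)\in[1-k^{-1/2+\delta},\,1+k^{-1/2+\delta}]^2$, we have uniformly

$$\lim_{n\to\infty}\frac{R(x,y,k/n)}{\tau(k/n)}=\lim_{n\to\infty}\frac{\tfrac{n}{k}\cdot\tfrac{k}{n}x\cdot\tfrac{k}{n}y}{\tfrac{n}{k}\left(\tfrac{k}{n}\right)^2}=\lim_{n\to\infty}xy=1,$$

where the last equality uses $x\to1$ and $y\to1$ as $n\to\infty$, since $k\to\infty$ as $n\to\infty$.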
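If $\beta^T>0$, applying Lemma 1 with $p=k/n$ directly gives that

$$\lim_{n\to\infty}\frac{R(x,y,k/n)}{\tfrac{n}{k}\Pr\left(X<\min\left(\frac{Q_y\left(\frac{k}{n}y\right)}{\beta^T},\,Q_x\left(\tfrac{k}{n}x\right)\right)\right)}=1$$

holds uniformly for $(x,y)\in[1-k^{-1/2+\delta},\,1+k^{-1/2+\delta}]^2\subset(0,3/2]^2$. We further simplify the denominator using the limit relation in Equation (21) as follows: as $n\to\infty$,

$$\begin{aligned}
\tfrac{n}{k}\Pr\left(X<\min\left(\frac{Q_y\left(\frac{k}{n}y\right)}{\beta^T},\,Q_x\left(\tfrac{k}{n}x\right)\right)\right)
&=\frac{\Pr\left(X<\min\left(\frac{Q_y\left(\frac{k}{n}y\right)}{\beta^T},\,Q_x\left(\tfrac{k}{n}x\right)\right)\right)}{\Pr\left(X<Q_x\left(\tfrac{k}{n}\right)\right)}\\
&\sim\left(\frac{\min\left(\frac{Q_y\left(\frac{k}{n}y\right)}{\beta^T},\,Q_x\left(\tfrac{k}{n}x\right)\right)}{Q_x\left(\tfrac{k}{n}\right)}\right)^{-\alpha_x}\\
&=\min\left(\left(\frac{\beta^TQ_x\left(\frac{k}{n}\right)}{Q_y\left(\frac{k}{n}y\right)}\right)^{\alpha_x},\,\left(\frac{Q_x\left(\frac{k}{n}\right)}{Q_x\left(\frac{k}{n}x\right)}\right)^{\alpha_x}\right).
\end{aligned}$$

From Theorem 1, we get that $\left(\frac{\beta^TQ_x(k/n)}{Q_y\left(\frac{k}{n}y\right)}\right)^{\alpha_x}\sim\tau(k/n)$ as $n\to\infty$ holds uniformly for $|y-1|\le k^{-1/2+\delta}$. In addition, $\left(\frac{Q_x(k/n)}{Q_x\left(\frac{k}{n}x\right)}\right)^{\alpha_x}\to1$ as $n\to\infty$ holds uniformly for $|x-1|\le k^{-1/2+\delta}$. Together with $\tau(k/n)\le1$, we get that

$$\tfrac{n}{k}\Pr\left(X<\min\left(\frac{Q_y\left(\frac{k}{n}y\right)}{\beta^T},\,Q_x\left(\tfrac{k}{n}x\right)\right)\right)\sim\tau(k/n)$$

holds uniformly for $(x,y)\in[1-k^{-1/2+\delta},\,1+k^{-1/2+\delta}]^2$, as $n\to\infty$. Hence, we have proved Equation (25) and consequently handled the term $I_1$. We remark that our proof allows for $\lim_{p\to0}\tau(p)=0$, which goes beyond the typical consistency results in bivariate extreme value statistics.

Finally, we deal with $I_2$. If $\limsup_{p\to0}\tau(p)>0$, then the consistency of $\hat{\alpha}_x$ leads to $I_2\xrightarrow{P}1$ as $n\to\infty$. In this case, the theorem is proved without using Conditions (3), (9), and (10). If $\lim_{p\to0}\tau(p)=0$, to prove $I_2\xrightarrow{P}1$, we need to prove that, as $n\to\infty$,

$$\log\tau(k/n)\left(\frac{1}{\hat{\alpha}_x}-\frac{1}{\alpha_x}\right)\xrightarrow{P}0. \tag{26}$$

The conditions in Equations (3) and (9) imply the asymptotic normality of $\hat{\alpha}_x$: as $n\to\infty$, $\sqrt{k_1}\left(\frac{1}{\hat{\alpha}_x}-\frac{1}{\alpha_x}\right)=O_p(1)$; see, for example, Theorem 3.2.5 in De Haan and Ferreira (2006). Therefore, it only remains to prove that $\log\tau(k/n)=o(\sqrt{k_1})$ as $n\to\infty$.

If $\beta^T=0$, then $\tau(k/n)=k/n>1/n$. Hence, as $n\to\infty$, $\log\tau(k/n)=O(\log n)$. If $\beta^T>0$, following Theorem 1, we get that, for sufficiently large $n$,

$$\tau(k/n)\sim\left(\frac{\beta^TQ_x\left(\frac{k}{n}\right)}{Q_y\left(\frac{k}{n}\right)}\right)^{\alpha_x}=(\beta^T)^{\alpha_x}\left(\tfrac{k}{n}\right)^{\alpha_x/\alpha_y-1}\left(\frac{\tilde{l}_x\left(\frac{k}{n}\right)}{\tilde{l}_y\left(\frac{k}{n}\right)}\right)^{\alpha_x}>D\left(\tfrac{k}{n}\right)^{\alpha_x/\alpha_y-1+\delta},$$

for some $D>0$ and $\delta>0$. The last step comes from the Potter inequality; see, for example, Inequality (B.1.19) in De Haan and Ferreira (2006). Therefore, $\log\tau(k/n)=O(\log n)$ as $n\to\infty$.

Combining $\log\tau(k/n)=O(\log n)$ as $n\to\infty$ with Condition (10), we have that $\log\tau(k/n)=o(\sqrt{k_1})$ as $n\to\infty$, which implies that $I_2\xrightarrow{P}1$.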
▪

Proof of Theorem 3 We start by deriving the explicit form of $R(x,y)$ and its partial derivatives at $(1,1)$ because these quantities play an important role in the calculation of the asymptotic variance. Notice that $R$ is a homogeneous function of degree 1. Thus, it is only necessary to derive $R(x,1)$ for $x>0$. This is given in the following lemma.

Lemma 2 Under the conditions in Theorem 3, we have $R(x,1)=\min(x,\tau)$ for $x>0$.

Proof Theorem 1 implies that

$$\lim_{p\to0}\frac{Q_y(p)}{Q_x(\tau p)}=\lim_{p\to0}\tau^{1/\alpha_x}\frac{Q_y(p)}{Q_x(p)}=\beta^T.$$

Hence, for any $\tau<x<1$, we have that, for sufficiently small $p$, $Q_y(p)\le\beta^TQ_x(xp)$. On the other hand, for any $0<x<\tau$, for sufficiently small $p$, $Q_y(p)\ge\beta^TQ_x(xp)$.

By applying Lemma 1 with $y=1$, we get that

$$\lim_{p\to0}\frac{R(x,1)}{\frac{1}{p}\Pr\left(X<\min\left(\frac{Q_y(p)}{\beta^T},\,Q_x(px)\right)\right)}=1.$$

In particular, for $\tau<x<1$, since $Q_y(p)\le\beta^TQ_x(xp)$,

$$R(x,1)=\lim_{p\to0}\frac{\Pr\left(\beta^TX<Q_y(p)\right)}{\Pr(X<Q_x(p))}=\lim_{p\to0}\left(\frac{Q_y(p)}{\beta^TQ_x(p)}\right)^{-\alpha_x}=\tau.$$

On the other hand, for $0<x<\tau$,

$$R(x,1)=\lim_{p\to0}\frac{\Pr(X<Q_x(xp))}{p}=x.$$

Finally, for the point $x=\tau$, we use the continuity of the function $R(x,1)$ to get that $R(\tau,1)=\tau$. The lemma is thus proved. ▪

As a direct consequence of Lemma 2, we get that, for $x/y>\tau$,

$$R(x,y)=yR\left(\tfrac{x}{y},1\right)=\tau y.$$

Hence, the partial derivatives of $R$ in the neighborhood of $(1,1)$ exist as $R_1(1,1)=0$ and $R_2(1,1)=\tau$, where $R_1$ and $R_2$ denote the partial derivatives of $R$ with respect to $x$ and $y$, respectively.

With the expression of the $R$ function and its partial derivatives, we apply Theorem 7.2.2 in De Haan and Ferreira (2006) to obtain the asymptotic normality of $\hat{\tau}(k/n)$ as

$$\sqrt{k}\left(\hat{\tau}(k/n)-\tau\right)\xrightarrow{P}W(1,1)-\tau W(+\infty,1), \tag{27}$$

where $W(x,y)$ is a continuous mean-zero Gaussian process with the following covariance structure:

$$\mathrm{E}\,W(x_1,y_1)W(x_2,y_2)=R\left(\min(x_1,x_2),\,\min(y_1,y_2)\right).$$

In addition, the Gaussian process $W$ also governs the asymptotic normality of the marginals as

$$\sup_{0<x\le T}\frac{1}{x^{\lambda}}\left|\sqrt{k}\left(\frac{1}{k}\sum_{t=1}^{n}\mathbf{1}\left\{X_t<Q_x\left(\tfrac{k}{n}x\right)\right\}-x\right)-W(x,+\infty)\right|\xrightarrow{P}0, \tag{28}$$

and

$$\sup_{0<y\le T}\frac{1}{y^{\lambda}}\left|\sqrt{k}\left(\frac{1}{k}\sum_{t=1}^{n}\mathbf{1}\left\{Y_t<Q_y\left(\tfrac{k}{n}y\right)\right\}-y\right)-W(+\infty,y)\right|\xrightarrow{P}0, \tag{29}$$

for given $T>0$ and $0\le\lambda<1/2$.
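Next, from the asymptotic properties in Equations (28) and (29), we get that, as $n\to\infty$,

$$\sqrt{k}\left(\frac{1}{\hat{\alpha}_x}-\frac{1}{\alpha_x}\right)\xrightarrow{P}\frac{1}{\alpha_x}\left(\int_0^1W(s,+\infty)\frac{ds}{s}-W(1,+\infty)\right), \tag{30}$$

$$\sqrt{k}\left(\frac{\hat{Q}_x(k/n)}{Q_x(k/n)}-1\right)\xrightarrow{P}\frac{1}{\alpha_x}W(1,+\infty), \tag{31}$$

$$\sqrt{k}\left(\frac{\hat{Q}_y(k/n)}{Q_y(k/n)}-1\right)\xrightarrow{P}\frac{1}{\alpha_y}W(+\infty,1). \tag{32}$$

Here we use the condition $k=o(n^{\zeta})$ with $\zeta<\min\left(\frac{2\gamma}{\alpha_x+2\gamma},\,\frac{2\gamma'}{\alpha_y+2\gamma'}\right)$. For the derivation of the three relations, see Example 5.1.5 and Equation (5.1.19) in De Haan and Ferreira (2006).

Now we can deal with the asymptotic normality of the estimator $\hat{\beta}^T$ by combining the asymptotic normality results for the four elements. Using Cramér's delta method, we can use the asymptotic relations in Equations (27) and (30)–(32) to obtain that

$$\sqrt{k}\left(\frac{\hat{\beta}^T}{\tau^{1/\alpha_x}\frac{Q_y(k/n)}{Q_x(k/n)}}-1\right)\xrightarrow{P}\Gamma,$$

where

$$\begin{aligned}
\Gamma&=\frac{1}{\alpha_x}\left(\frac{1}{\tau}W(1,1)-W(+\infty,1)\right)+(\log\tau)\frac{1}{\alpha_x}\left(\int_0^1W(s,+\infty)\frac{ds}{s}-W(1,+\infty)\right)\\
&\quad+\frac{1}{\alpha_y}W(+\infty,1)-\frac{1}{\alpha_x}W(1,+\infty)\\
&=\frac{1}{\alpha_x}\left(\frac{1}{\tau}W(1,1)+(\log\tau)\int_0^1W(s,+\infty)\frac{ds}{s}-(1+\log\tau)W(1,+\infty)\right).
\end{aligned}$$

Using the expression of $R(x,1)$ in Lemma 2, one can calculate that

$$\mathrm{Var}(\Gamma)=\frac{1}{\alpha_x^2}\left(\frac{1}{\tau}-1-(\log\tau)^2\right).$$

Therefore, what remains to be proved is the following deterministic relation:

$$\lim_{n\to\infty}\sqrt{k}\left(\frac{\tau^{1/\alpha_x}\frac{Q_y(k/n)}{Q_x(k/n)}}{\beta^T}-1\right)=0.$$

Knowing that $\lim_{n\to\infty}\frac{Q_y(k/n)}{Q_x(k/n)}=\beta^T\tau^{-1/\alpha_x}>0$, the above relation is equivalent to

$$\lim_{n\to\infty}\sqrt{k}\left(\tau-\left(\frac{\beta^TQ_x(k/n)}{Q_y(k/n)}\right)^{\alpha_x}\right)=0.$$

Next, from Condition (12) and $k=o\left(n^{2\theta/(2\theta+1)}\right)$, we get that $\lim_{n\to\infty}\sqrt{k}\left(\tau-\tau(k/n)\right)=0$. Hence, what remains to be proved is

$$\lim_{n\to\infty}\sqrt{k}\left(\tau(k/n)-\left(\frac{\beta^TQ_x(k/n)}{Q_y(k/n)}\right)^{\alpha_x}\right)=0. \tag{33}$$

Notice that in Lemma 1, by considering $p=k/n$ and $x=y=1$, the denominator is simplified to $\Pr\left(X<Q_y(k/n)/\beta^T\right)$ because, for sufficiently large $n$, $Q_x(k/n)>Q_x(zk/n)>Q_y(k/n)/\beta^T$ for some $z\in(\tau,1)$. Hence, Equation (14) implies that $\tau(k/n)-\left(\frac{\beta^TQ_x(k/n)}{Q_y(k/n)}\right)^{\alpha_x}\to0$ as $n\to\infty$. Hence, to prove Equation (33), we aim at reproving this limit relation with an additional "speed of convergence" $\sqrt{k}$. Consequently, we revisit the proof of Lemma 1, adding the $\sqrt{k}$ factor to each limit relation in Equation (15). Recall the sets $C$, $C_0$, $C_1$, $C_{21}$, and $C_{22}$ defined in the proof of Lemma 1.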
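Without loss of generality, we use the same notation to indicate the sets when taking $p=k/n$, $x=y=1$, and $\delta=\delta_n=k^{-1/2-\kappa}$ with $\kappa>0$ such that $(1/2+\kappa)\alpha_y+3/2<1/\zeta$. Notice that, since $\zeta<\frac{2}{\alpha_y+3}$, the choice of $\kappa$ is feasible. With such a choice, we have that $\lim_{n\to\infty}\sqrt{k}\,\delta_n=0$ and $\lim_{n\to\infty}\sqrt{k}\,\delta_n^{-\alpha_y}\frac{k}{n}=0$.

Next, we prove the following limit relations that are used for handling the probability of the sets. As $n\to\infty$,

$$\sqrt{k}\Pr\left(\varepsilon<\delta_nQ_y(k/n)\right)\to0\quad\text{and}\quad\sqrt{k}\Pr\left(\varepsilon\ge-\delta_nQ_y(k/n)\right)\to0, \tag{34}$$

$$\lim_{n\to\infty}\sqrt{k}\left(\tfrac{n}{k}\Pr\left(\beta^TX<Q_y(k/n)(1\pm\delta_n)\right)-\left(\frac{\beta^TQ_x(k/n)}{Q_y(k/n)}\right)^{\alpha_x}\right)=0. \tag{35}$$

Proof of Equation (34) We start with the first half. Notice that

$$\begin{aligned}
\Pr\left(\varepsilon<\delta_nQ_y(k/n)\right)&=\frac{\Pr\left(\varepsilon<\delta_nQ_y(k/n),\,X<Q_x(\bar{p})\right)}{\Pr\left(X<Q_x(\bar{p})\right)}\\
&\le\frac{\Pr\left(Y<\delta_nQ_y(k/n)+\beta^TQ_x(\bar{p})\right)}{\bar{p}}\sim\frac{\delta_n^{-\alpha_y}k/n}{\bar{p}}.
\end{aligned}$$

In the last step, we use a limit relation similar to Equation (21) based on the second-order condition (11) and the facts that $1/\delta_n>1$ and $\delta_nQ_y(k/n)\to-\infty$ as $n\to\infty$. Since $\sqrt{k}\,\delta_n^{-\alpha_y}\frac{k}{n}\to0$ as $n\to\infty$, the first half of Equation (34) is proved.

For the second half, we write

$$\begin{aligned}
\Pr\left(\varepsilon\ge-\delta_nQ_y(k/n)\right)&=\frac{\Pr\left(\varepsilon\ge-\delta_nQ_y(k/n),\,\frac{1}{\beta^T+1}\delta_nQ_y(k/n)\le X<Q_x(\bar{p})\right)}{\Pr\left(\frac{1}{\beta^T+1}\delta_nQ_y(k/n)\le X<Q_x(\bar{p})\right)}\\
&\le\frac{\Pr\left(Y\ge-\frac{1}{\beta^T+1}\delta_nQ_y(k/n)\right)}{\bar{p}-\Pr\left(X<\frac{1}{\beta^T+1}\delta_nQ_y(k/n)\right)}\\
&\le\frac{D\Pr\left(Y<\frac{1}{\beta^T+1}\delta_nQ_y(k/n)\right)}{\bar{p}-\Pr\left(X<\frac{1}{\beta^T+1}\delta_nQ_y(k/n)\right)},
\end{aligned}$$

for some constant $D>0$. Here, the last step uses the condition that $\Pr(Y>u)=O(\Pr(Y<-u))$. Notice that the denominator converges to $\bar{p}$, which is positive and finite. The second half of Equation (34) is thus proved in the same way as the first half. ▪

Proof of Equation (35) Recall the second-order condition (3). The condition that $k=O(n^{\zeta})$ with $\zeta<\frac{2\gamma}{2\gamma+\alpha_x}$ implies that $\sqrt{k}\,\eta(Q_x(k/n))\to0$. Together with the fact that $\frac{Q_y(k/n)(1\pm\delta_n)}{\beta^TQ_x(k/n)}\to\tau^{-1/\alpha_x}$, we get that

$$\lim_{n\to\infty}\sqrt{k}\left(\tfrac{n}{k}\Pr\left(\beta^TX<Q_y(k/n)(1\pm\delta_n)\right)-\left(\frac{Q_y(k/n)(1\pm\delta_n)}{\beta^TQ_x(k/n)}\right)^{-\alpha_x}\right)=0.$$

Hence, Equation (35) is proved since $\lim_{n\to\infty}\sqrt{k}\,\delta_n=0$. ▪

Now, we return to the proof of Equation (33) by dealing with the three sets $C_1$, $C_{21}$, and $C_{22}$. First, the limit relation in Equation (35) implies that

$$\lim_{n\to\infty}\sqrt{k}\left(\tfrac{n}{k}\Pr(C_{21})-\left(\frac{\beta^TQ_x(k/n)}{Q_y(k/n)}\right)^{\alpha_x}\right)=0.$$

The first half of the limit relation (34) implies that $\lim_{n\to\infty}\sqrt{k}\,\tfrac{n}{k}\Pr(C_{22})=0$.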
Next, for $C_1$, due to independence, we have that

$$\Pr(C_1)=\Pr\left(\beta^TX<Q_y(k/n)(1+\delta_n)\right)\cdot\Pr\left(\varepsilon<-\delta_nQ_y(k/n)\right).$$

The second half of the limit relation (34) implies that $\lim_{n\to\infty}\sqrt{k}\left(\Pr\left(\varepsilon<-\delta_nQ_y(k/n)\right)-1\right)=0$. Together with Equation (35), we get that

$$\lim_{n\to\infty}\sqrt{k}\left(\tfrac{n}{k}\Pr(C_1)-\left(\frac{\beta^TQ_x(k/n)}{Q_y(k/n)}\right)^{\alpha_x}\right)=0.$$

By combining the results for $\Pr(C_1)$, $\Pr(C_{21})$, and $\Pr(C_{22})$, which give the lower and upper bounds of $\Pr(C)$, we have proved Equation (33) and thus the theorem. ▪

Footnotes

1 The second-order condition for $X$ is assumed to generalize Theorems 1 and 2 to also hold for $\alpha_x/2<\alpha_y<\alpha_x$.

2 The $\tau$-measure in Equation (4) is closely related to the measure $E(\kappa|\kappa\ge1)$ introduced by Huang (1992) and applied by Hartmann, Straetmans, and de Vries (2004). There, $\kappa$ is the number of events occurring with probability $p$ and $E(\kappa|\kappa\ge1)$ is the expected number of tail events given that there is at least one. In the bivariate case, the two measures are connected by $E(\kappa|\kappa\ge1)=\frac{2}{2-\tau}$.

3 An example demonstrating the compatibility of the three second-order conditions (3), (11), and (12) with the linear tail model in Equation (1) is $Y=\beta X+\varepsilon$, where $X$ and $\varepsilon$ are independently Cauchy distributed. In this case, we have that $(X,Y)$ follows a bivariate Cauchy distribution satisfying these conditions with $\gamma=\gamma'=-2$ and $\theta=1$.

4 Formally, the estimator in the conditional regression approach is

$$\hat{\beta}^T_{OLS}=\frac{\sum_{\{t:X_t<X_{n,k+1}\}}(Y_t-\bar{Y}^T)(X_t-\bar{X}^T)}{\sum_{\{t:X_t<X_{n,k+1}\}}(X_t-\bar{X}^T)^2},$$

where $\bar{X}^T=(1/k)\sum_{\{t:X_t<X_{n,k+1}\}}X_t$ and $\bar{Y}^T=(1/k)\sum_{\{t:X_t<X_{n,k+1}\}}Y_t$. The estimator in the conditional regression approach is theoretically unbiased if $\varepsilon$ has a zero conditional mean. A direct theoretical comparison of the asymptotic variances of the EVT approach and the conditional regression approach is difficult because their levels depend on different statistical parameters. For example, the asymptotic variance of $\hat{\beta}^T_{OLS}$ depends on the variance of $\varepsilon$, while the EVT approach does not assume finite variances.
5 In a sample of 1,250 observations, about 1,250 × 3.0% = 37.5 observations are expected to be generated from the linear tail model.

6 To illustrate the decomposition of the MSE, we report the squared bias in the figures, which therefore do not show the sign of the bias. Numerical results show that the bias of the EVT approach is consistently positive for all simulated models.

7 Data and documentation are available from the personal website of Kenneth French: http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html. Our results are based on data accessed on January 8, 2014. The secondary data are based on the returns of stocks listed on NYSE, AMEX, and NASDAQ in the CRSP database. The definition of the industry portfolios is based on SIC codes. Industry portfolios with missing returns in a subperiod are excluded from the analysis for that specific subperiod. Five industry portfolios report missing returns at the start of the sample. From July 1963 onward, one industry portfolio remains unavailable (“Healthcare,” SIC codes 8000–8099). After July 1969, all 48 portfolios are available.

8 Our results remain qualitatively unchanged when these portfolios are included in the analysis.

9 Our conclusions remain the same if the projected losses under the conditional regression approach are calculated as $\hat{L}_{OLS,j}=C^{T}_{OLS,j}+L_m\hat{\beta}^{T}_{OLS,j}$, where $C^{T}_{OLS,j}=\bar{Y}^{T}-\hat{\beta}^{T}_{OLS,j}\bar{X}^{T}$. In this case, the EVT approach provides a lower RMSE at the 5% significance level in 6 out of 16 cases (8 out of 16 cases at the 10% significance level).

10 The test statistic in Equation (13) follows directly from the modified Diebold–Mariano test statistic of Harvey, Leybourne, and Newbold (1997, Equation (9)) with h = 1, which corresponds to assuming that the correlations across all $d_j$ are zero.

11 In all subperiods, the mean absolute error from the EVT approach is also below that of the conditional regression approach.

References

Ang A., Chen J. 2002. Asymmetric Correlations of Equity Portfolios. Journal of Financial Economics 63: 443–494.
Atanasov V., Nitschka T. 2014. Currency Excess Returns and Global Downside Market Risk. Journal of International Money and Finance 47: 268–285.
Baur D. G., McDermott T. K. 2010. Is Gold a Safe Haven? International Evidence. Journal of Banking & Finance 34: 1886–1898.
Bollerslev T., Todorov V., Li S. Z. 2013. Jump Tails, Extreme Dependencies, and the Distribution of Stock Returns. Journal of Econometrics 172: 307–324.
Chernozhukov V., Fernández-Val I. 2011. Inference for Extremal Conditional Quantile Models, with an Application to Market and Birthweight Risks. Review of Economic Studies 78: 559–589.
De Haan L., Ferreira A. 2006. Extreme Value Theory: An Introduction. New York: Springer.
De Haan L., Stadtmüller U. 1996. Generalized Regular Variation of Second Order. Journal of the Australian Mathematical Society (Series A) 61: 381–395.
De Vries C. G. 2005. The Simple Economics of Bank Fragility. Journal of Banking & Finance 29: 803–825.
Diebold F. X., Mariano R. S. 1995. Comparing Predictive Accuracy. Journal of Business & Economic Statistics 13: 253–263.
Drees H. 1998. On Smooth Statistical Tail Functionals. Scandinavian Journal of Statistics 25: 187–210.
Drees H. 2008. Some Aspects of Extreme Value Statistics under Serial Dependence. Extremes 11: 35–53.
Drees H., De Haan L., Resnick S. 2000. How to Make a Hill Plot. Annals of Statistics 28: 254–274.
Einmahl J. H. J., De Haan L., Li D. 2006. Weighted Approximations of Tail Copula Processes with Application to Testing the Bivariate Extreme Value Condition. Annals of Statistics 34: 1987–2014.
Embrechts P., De Haan L., Huang X. 2000. “Modelling Multivariate Extremes.” In Embrechts P. (ed.), Extremes and Integrated Risk Management, pp. 59–67. London: RISK Books.
Feuerverger A., Hall P. 1999. Estimating a Tail Exponent by Modelling Departure from a Pareto Distribution. Annals of Statistics 27: 760–781.
Fougères A.-L., De Haan L., Mercadier C. 2015. Bias Correction in Multivariate Extremes. Annals of Statistics 43: 903–934.
Fung W., Hsieh D. A. 2001. The Risk in Hedge Fund Strategies: Theory and Evidence from Trend Followers. Review of Financial Studies 14: 313–341.
Gomes M. I., De Haan L., Rodrigues L. H. 2008. Tail Index Estimation for Heavy-Tailed Models: Accommodation of Bias in Weighted Log-Excesses. Journal of the Royal Statistical Society: Series B 70: 31–52.
Hartmann P., Straetmans S., de Vries C. G. 2004. Asset Market Linkages in Crisis Periods. Review of Economics and Statistics 86: 313–326.
Hartmann P., Straetmans S., de Vries C. G. 2010. Heavy Tails and Currency Crises. Journal of Empirical Finance 17: 241–254.
Harvey D., Leybourne S., Newbold P. 1997. Testing the Equality of Prediction Mean Squared Errors. International Journal of Forecasting 13: 281–291.
Hill B. M. 1975. A Simple General Approach to Inference About the Tail of a Distribution. Annals of Statistics 3: 1163–1174.
Hill J. B. 2009. On Functional Central Limit Theorems for Dependent, Heterogeneous Arrays with Applications to Tail Index and Tail Dependence Estimation. Journal of Statistical Planning and Inference 139: 2091–2110.
Hill J. B. 2013. Least Tail-Trimmed Squares for Infinite Variance Autoregressions. Journal of Time Series Analysis 34: 168–186.
Hsing T. 1991. On Tail Index Estimation Using Dependent Data. Annals of Statistics 19: 1547–1569.
Huang X. 1992. Statistics of Bivariate Extreme Values. PhD thesis, Erasmus University Rotterdam (PhD Thesis No. 22, Tinbergen Institute Research Series).
Huisman R., Koedijk K. G., Kool C. J. M., Palm F. 2001. Tail-Index Estimates in Small Samples. Journal of Business & Economic Statistics 19: 208–216.
Jansen D. W., De Vries C. G. 1991. On the Frequency of Large Stock Returns: Putting Booms and Busts into Perspective. Review of Economics and Statistics 73: 18–24.
King M. A., Wadhwani S. 1990. Transmission of Volatility Between Stock Markets. Review of Financial Studies 3: 5–33.
Koenker R., Bassett G. 1978. Regression Quantiles. Econometrica 46: 33–50.
Lettau M., Maggiori M., Weber M. 2014. Conditional Risk Premia in Currency Markets and Other Asset Classes. Journal of Financial Economics 114: 197–225.
Li J., Todorov V., Tauchen G. 2017. Jump Regressions. Econometrica 85: 173–195.
Longin F., Solnik B. 1995. Is the Correlation in International Equity Returns Constant: 1960–1990? Journal of International Money and Finance 14: 3–26.
Longin F., Solnik B. 2001. Extreme Correlation of International Equity Markets. Journal of Finance 56: 649–676.
Loretan M., Phillips P. C. B. 1994. Testing the Covariance Stationarity of Heavy-Tailed Time Series: An Overview of the Theory with Applications to Several Financial Datasets. Journal of Empirical Finance 1: 211–248.
Malevergne Y., Sornette D. 2004. How to Account for Extreme Co-Movements between Individual Stocks and the Market. Journal of Risk 6: 71–116.
Mikosch T., De Vries C. G. 2013. Heavy Tails of OLS. Journal of Econometrics 172: 205–221.
Mitchell M., Pulvino T. 2001. Characteristics of Risk and Return in Risk Arbitrage. Journal of Finance 56: 2135–2175.
Patton A. J. 2009. Are “Market Neutral” Hedge Funds really Market Neutral? Review of Financial Studies 22: 2495–2530.
Peng L. 1998. Asymptotically Unbiased Estimators for the Extreme-Value Index. Statistics & Probability Letters 38: 107–115.
Poon S. H., Rockinger M., Tawn J. A. 2004. Extreme Value Dependence in Financial Markets: Diagnostics, Models, and Financial Implications. Review of Financial Studies 17: 581–610.
Post T., Versijp P. 2007. Multivariate Tests for Stochastic Dominance Efficiency of a Given Portfolio. Journal of Financial and Quantitative Analysis 42: 489–515.
Roose K. D. 1948. The Recession of 1937–38. Journal of Political Economy 56: 239–248.
Rousseeuw P. J. 1985. “Multivariate Estimation with High Breakdown Point.” In Grossman W., Pflug G., Vincze I., Wertz W. (eds.), Mathematical Statistics and Applications, pp. 283–297. Dordrecht: Reidel Publishing Company.
Støve B., Tjøstheim D., Hufthammer K. O. 2014. Using Local Gaussian Correlation in a Nonlinear Re-examination of Financial Contagion. Journal of Empirical Finance 25: 62–82.
Sun P., Zhou C. 2014. Diagnosing the Distribution of GARCH Innovations. Journal of Empirical Finance 29: 287–303.
Todorov V., Bollerslev T. 2010. Jumps and Betas: A New Framework for Disentangling and Estimating Systematic Risks. Journal of Econometrics 157: 220–235.
Van Oordt M. R. C., Zhou C. 2014. “Systemic Risk and Bank Business Models.” De Nederlandsche Bank Working paper, 442.
Van Oordt M. R. C., Zhou C. 2016. Systematic Tail Risk. Journal of Financial and Quantitative Analysis 51: 685–705.
Zhang Q., Li D., Wang H. 2013. A Note on Tail Dependence Regression. Journal of Multivariate Analysis 120: 163–172.

© The Author, 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Journal

Journal of Financial Econometrics, Oxford University Press

Published: Oct 26, 2017
