TY - JOUR
AU - , Van Bever, G
AB - Summary In many problems from multivariate analysis, the parameter of interest is a shape matrix: a normalized version of the corresponding scatter or dispersion matrix. In this article we propose a notion of depth for shape matrices that involves data points only through their directions from the centre of the distribution. We refer to this concept as Tyler shape depth since the resulting estimator of shape, namely the deepest shape matrix, is the median-based counterpart of the M-estimator of shape due to Tyler (1987). Besides estimation, shape depth, like its Tyler antecedent, also allows hypothesis testing on shape. Its main benefit, however, lies in the ranking of the shape matrices it provides, the practical relevance of which is illustrated by applications to principal component analysis and shape-based outlier detection. We study the invariance, quasi-concavity and continuity properties of Tyler shape depth, the topological and boundedness properties of the corresponding depth regions, and the existence of a deepest shape matrix, and we prove Fisher consistency in the elliptical case. Finally, we derive a Glivenko–Cantelli-type result and establish almost sure consistency of the deepest shape matrix estimator. 1. Introduction Location depths measure the centrality of an arbitrary |$k$|-vector |$\theta$| with respect to a probability measure |$P=P^X$| over |${{\mathbb{R}}}^k$|⁠. Letting |$\mathcal{S}^{k-1}=\{x\in{{\mathbb{R}}}^k:\|x\|^2=x^{ \mathrm{\scriptscriptstyle T} } x =1\}$| denote the unit sphere in |${{\mathbb{R}}}^k$|⁠, the most famous example is the Tukey (1975) half-space depth $$\begin{equation*}\label{definHD} D(\theta,P)=\inf_{u\in \mathcal{S}^{k-1}}{\rm pr}\{u^{ \mathrm{\scriptscriptstyle T} }(X-\theta)\geqslant 0\};\end{equation*}$$ throughout, |$\rm pr$| refers to probability under the probability measure |$P$| at hand. 
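The empirical version of the half-space depth displayed above can be approximated in the plane by scanning a grid of directions. The following is a minimal sketch under stated assumptions, not an exact algorithm: the function name `halfspace_depth_2d` and the parameter `n_dirs` are hypothetical, and replacing the infimum over the unit circle by a minimum over finitely many directions can only overestimate the exact depth.

```python
import math

def halfspace_depth_2d(theta, points, n_dirs=3600):
    """Approximate empirical Tukey half-space depth of theta in the plane.

    Assumption: the infimum over unit vectors u is replaced by a minimum
    over n_dirs equispaced directions, so the returned value is an upper
    bound on the exact empirical depth.
    """
    n = len(points)
    best = 1.0
    for j in range(n_dirs):
        angle = 2.0 * math.pi * j / n_dirs
        u = (math.cos(angle), math.sin(angle))
        # Number of observations in the closed half-plane u^T (x - theta) >= 0.
        count = sum(
            1 for x in points
            if u[0] * (x[0] - theta[0]) + u[1] * (x[1] - theta[1]) >= 0.0
        )
        best = min(best, count / n)
    return best

# Four points at the corners of a rotated square.
sample = [(2.0, 1.0), (-1.0, 2.0), (-2.0, -1.0), (1.0, -2.0)]
centre_depth = halfspace_depth_2d((0.0, 0.0), sample)    # theta at the centre
outside_depth = halfspace_depth_2d((10.0, 0.0), sample)  # theta far outside
```

A point outside the convex hull of the sample receives depth zero, which is the mechanism behind depth-based outlier flagging.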
The half-space depth regions |$\{\theta\in{{\mathbb{R}}}^k: D(\theta,P)\geqslant \alpha\}$| form a family of nested convex subsets of |${{\mathbb{R}}}^k$|⁠. The Tukey median |$\theta_P$|⁠, defined as the barycentre of the innermost region |$M_P=\{\theta\in{{\mathbb{R}}}^k:D(\theta,P)=\max_{\xi\in{{\mathbb{R}}}^k} D(\xi,P)\}$|⁠, extends the univariate median to the multivariate case and is a robust alternative to the expectation |$E(X)$|⁠. Besides location estimation, many inference problems can be tackled in a robust and nonparametric way by using the centre-outward order resulting from depth (Liu et al., 1999). Adopting the parametric depth approach of Mizera (2002), |$D(\theta,P)$| can also be read as a measure of how well the location parameter value |$\theta$| fits the probability measure |$P$|⁠. In this spirit, possible outliers in a dataset |$X_1,\ldots,X_n$| will be flagged by low depth values |$D(X_i,P_n)$|⁠, where |$P_n$| denotes the corresponding empirical probability measure. In this paper, the focus is on multivariate dispersion parameters known as shape matrices. For simplicity, in this section we restrict our attention to elliptical distributions. Let |$\mathcal{P}_k$| be the collection of |$k\times k$| symmetric positive-definite matrices, and write |$A^{1/2}$|⁠, with |$A\in\mathcal{P}_k$|⁠, for the unique square root of |$A$| in |$\mathcal{P}_k$|⁠. We will say that |$P=P^X$| is elliptical with location |$\theta\in{{\mathbb{R}}}^k$|⁠, scatter |$\Sigma\in\mathcal{P}_k$| and generating variate |$R$| if |$X$| has the same distribution as |$\theta+R\Sigma^{1/2}U$|⁠, where |$U$| is uniformly distributed over |$\mathcal{S}^{k-1}$| and is independent of the nonnegative scalar random variable |$R$|⁠, which has unit median. This median constraint makes |$\Sigma$| identifiable without moment conditions. Under finite second-order moments, the resulting covariance matrix is |$\Sigma_P=\{E(R^2)/k\}\Sigma$|⁠. 
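The stochastic representation above can be simulated directly. The sketch below is illustrative only: the helper name `sample_elliptical_2d` and the particular generating variate are assumptions, chosen so that $R^2\log 2$ is standard exponential, which gives $R$ unit median and $E(R^2)=1/\log 2$. With $k=2$, the sample covariance should then be close to $\{E(R^2)/k\}\Sigma=\Sigma/(2\log 2)$.

```python
import math
import random

def sample_elliptical_2d(n, theta, sigma_sqrt, seed=0):
    """Draw n points theta + R * sigma_sqrt @ U in the plane.

    sigma_sqrt is the symmetric square root of the scatter matrix Sigma.
    Assumed generating variate: R = sqrt(E / log 2) with E standard
    exponential, so that R has median 1 and E(R^2) = 1 / log 2.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        phi = 2.0 * math.pi * rng.random()
        u = (math.cos(phi), math.sin(phi))  # U uniform on the unit circle
        r = math.sqrt(-math.log(1.0 - rng.random()) / math.log(2.0))
        out.append((
            theta[0] + r * (sigma_sqrt[0][0] * u[0] + sigma_sqrt[0][1] * u[1]),
            theta[1] + r * (sigma_sqrt[1][0] * u[0] + sigma_sqrt[1][1] * u[1]),
        ))
    return out

def sample_covariance(points):
    """Return (var_1, cov_12, var_2) with divisor n."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    v11 = sum((p[0] - m1) ** 2 for p in points) / n
    v12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / n
    v22 = sum((p[1] - m2) ** 2 for p in points) / n
    return v11, v12, v22

# Scatter Sigma = diag(4, 1), whose symmetric square root is diag(2, 1).
data = sample_elliptical_2d(20000, (0.0, 0.0), ((2.0, 0.0), (0.0, 1.0)))
v11, v12, v22 = sample_covariance(data)
expected_scale = 1.0 / (2.0 * math.log(2.0))  # E(R^2) / k with k = 2
```

The ratio `v11 / v22` estimates $\Sigma_{11}/\Sigma_{22}$ regardless of the distribution of $R$, since the generating variate only rescales the covariance matrix.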
Inference problems such as constructing confidence regions for |$\theta$| require one to estimate the full scatter matrix |$\Sigma$| or the full covariance matrix |$\Sigma_P$|⁠. However, in many other problems, it is sufficient to estimate the shape matrix, that is, the normalized scatter matrix $$V=\frac{k}{{\rm tr}(\Sigma)}\,\Sigma=\frac{k}{{\rm tr}(\Sigma_P)}\,\Sigma_P$$ (this shape matrix |$V$| could be normalized, as in Paindaveine (2008), to have determinant 1 or upper-left entry 1, which would not affect the results of the present paper). For instance, principal components may equivalently be computed from |$V$|⁠, from |$\Sigma$| or, when it exists, from |$\Sigma_P$|⁠, since proportional matrices have the same eigenvectors. When it comes to fixing the number of principal components on which to base further analysis, one typically looks at the proportions of explained variances |$p_m(\Sigma_P) = \sum_{\ell=1}^m \lambda_\ell(\Sigma_P)/\sum_{\ell=1}^k \lambda_\ell(\Sigma_P)$| (⁠|$m=1,\ldots,k$|⁠), where |$\lambda_\ell(A)$| denotes the |$\ell$|th largest eigenvalue of |$A$|⁠. Similarly to eigenvectors, these proportions remain unchanged if they are computed from |$V$| rather than from |$\Sigma$| or |$\Sigma_P$|⁠. Thus, in principal component analysis it is sufficient to estimate, or know the value of, |$V$|⁠. There is a large literature on inference for shape. The main contribution of this work is to provide a depth concept for shape, measuring how well a given shape matrix |$V$| fits the probability measure |$P$|⁠. While the proposed depth will lead to estimators and tests for shape, its main added value is the ordering of shape matrices resulting from depth. Here, we mention only two possible applications. The first is in principal component analysis, where a suitable estimator |$\hat{V}$| is to be chosen. 
When outliers are suspected, one might for instance consider the minimum covariance determinant estimates |$\hat{V}_\gamma$|, |$\gamma\in[0.5,1]$|, which trim a proportion |$1-\gamma$| of the data; see § 5. The choice of |$\gamma$| should typically be based on the proportion of outliers, which is usually unknown. We will show that the shape depth of |$\hat{V}_{\gamma}$| allows an informed choice of |$\gamma$|. The second application concerns outlier detection in multivariate financial time series. Since volatility is key in finance, one might flag as atypical those days whose returns assign a low depth to a shape estimator |$\hat V_{\rm full}$| computed from the full series. Depth for a generic parameter has been discussed in Mizera (2002). Depth for scatter matrices, however, has only been considered by Zhang (2002), Chen et al. (2018) and Paindaveine & Van Bever (2018), and only the last reference considers depth for shape matrices. 2. Shape depth Tyler (1987) introduced a notion of shape that extends the concept of shape beyond the elliptical set-up. Consider the multivariate sign |$U_{\theta,\,V}$| defined as |$V^{-1/2}(X-\theta)/\|V^{-1/2}(X-\theta)\|$| if |$X\neq \theta$| and |$0$| otherwise, where |$V^{-1/2}$| is the inverse of |$V^{1/2}$|. Let |$W_{\theta,\,V}={\rm vec}\{ U_{\theta,\,V} U^{ \mathrm{\scriptscriptstyle T} }_{\theta,\,V} - (1/k) I_k\}$|, where |${\rm vec}\,A$| stacks the columns of |$A$| on top of each other and |$I_k$| is the |$k\times k$| identity matrix. 
The Tyler shape of |$P=P^X$|⁠, |$V_T$| say, is then the matrix |$V\in\mathcal{P}_{k,\,{\rm tr}}=\{V\in\mathcal{P}_k: {\rm tr}(V)=k \}$| satisfying $$\begin{equation}\label{tyler}E (W_{\theta,\,V})=0.\end{equation}$$ (1) If |$P$| is smooth at |$\theta$|⁠, in the sense that no hyperplane containing |$\theta$| has a strictly positive |$P$|-probability mass, then (1) admits a unique solution |$V\in\mathcal{P}_{k,\,{\rm tr}}$| that agrees with the true shape if |$P$| is elliptical with location |$\theta$| (Tyler, 1987; Kent & Tyler, 1988; Dümbgen, 1998). In essence, (1) identifies the shape |$V$| that makes the origin of |${{\mathbb{R}}}^{k^2}$| most central in an |$L_2$| sense for the distribution |$P^{W_{\theta,\,V}}$| of |$W_{\theta,\,V}$|⁠; that is, it defines |$V_T$| as the solution of $$\begin{equation}\label{heodsh}0=\mathop{\arg\min}_{m\in{{\mathbb{R}}}^{k^2}}E\bigl(\|W_{\theta,\,V}-m\|^2\bigr).\end{equation}$$ (2) The inspiration for the present work is the idea that one may define the shape of |$P$| as the matrix |$V\in\mathcal{P}_{k,\,{\rm tr}}$| that makes the origin of |${{\mathbb{R}}}^{k^2}$| most central for the distribution of |$W_{\theta,\, V}$|⁠, in the half-space depth sense, i.e., as the value of |$V$| that maximizes the following depth. Definition 1 (Tyler shape depth). Let |$P=P^X$| be a probability measure over |${{\mathbb{R}}}^k$|⁠, and fix |$V\in\mathcal{P}_{k,\,{\rm tr}}$|⁠. (i) For any |$\theta\in{{\mathbb{R}}}^k$|⁠, the fixed-|$\theta$| shape depth of |$V$| with respect to |$P$| is |$D_\theta(V,P) = D(0,P^{W_{\theta,\,V}}) = \inf_{u\in\mathcal{S}^{k^2-1}} {\rm pr}( u^{ \mathrm{\scriptscriptstyle T} } W_{\theta,\,V} \geqslant 0 )$|⁠. (ii) The shape depth of |$\hspace{.2mm}V$| with respect to |$P$| is |$D(V,P)=D_{\theta_P}(V,P)$|⁠, where |$\theta_P$| is the Tukey median of |$P$|⁠. We will use the notation |$D(\cdot\, ,P)$| for both half-space and Tyler shape depths, as the vector or matrix nature of the argument will remove any ambiguity. 
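For |$k=2$|, the sample version of Definition 1(i) can be sketched in a few lines. The derivation used here is ours rather than the paper's: writing |$U=(U_1,U_2)^{ \mathrm{\scriptscriptstyle T} }$|, the quantity |$u^{ \mathrm{\scriptscriptstyle T} } W_{\theta,\,V}$| depends on a nonzero sign |$U$| only through |$(U_1^2-U_2^2,\,2U_1U_2)$|, so the depth becomes a half-space depth of the origin for points on the unit circle. The function names and the direction grid `n_dirs` are hypothetical, and the grid makes the result an approximation that can only overestimate the exact sample depth.

```python
import math

def inv_sqrt_spd_2x2(v):
    """Inverse of the symmetric square root of a 2x2 positive-definite matrix."""
    (a, b), (_, d) = v
    s = math.sqrt(a * d - b * b)      # sqrt(det V), also det of V^{1/2}
    t = math.sqrt(a + d + 2.0 * s)    # sqrt(tr V + 2 sqrt(det V))
    # Closed form for 2x2 matrices: V^{1/2} = (V + s I) / t.
    r11, r12, r22 = (a + s) / t, b / t, (d + s) / t
    return ((r22 / s, -r12 / s), (-r12 / s, r11 / s))

def tyler_shape_depth_2d(points, theta, v, n_dirs=3600):
    """Approximate fixed-theta Tyler shape depth of v for a bivariate sample."""
    a_inv = inv_sqrt_spd_2x2(v)
    feats = []
    for x in points:
        dx, dy = x[0] - theta[0], x[1] - theta[1]
        z1 = a_inv[0][0] * dx + a_inv[0][1] * dy
        z2 = a_inv[1][0] * dx + a_inv[1][1] * dy
        norm2 = z1 * z1 + z2 * z2
        if norm2 == 0.0:
            # Observation at theta: W = vec(-I/2), and adding a positive
            # multiple of vec(I) to u excludes it without affecting the
            # other observations, so it never counts towards the infimum.
            continue
        feats.append(((z1 * z1 - z2 * z2) / norm2, 2.0 * z1 * z2 / norm2))
    n = len(points)
    best = 1.0
    for j in range(n_dirs):
        psi = 2.0 * math.pi * j / n_dirs
        a, b = math.cos(psi), math.sin(psi)
        # Traceless symmetric M = [[a, b], [b, -a]] gives U^T M U = a*c + b*s.
        count = sum(1 for c, s in feats if a * c + b * s >= 0.0)
        best = min(best, count / n)
    return best

# Eight evenly spread signs, stretched by the square root of diag(1.6, 0.4).
signs = [(math.cos(0.01 + i * math.pi / 8), math.sin(0.01 + i * math.pi / 8))
         for i in range(8)]
stretched = [(math.sqrt(1.6) * c, math.sqrt(0.4) * s) for c, s in signs]
depth_true = tyler_shape_depth_2d(stretched, (0.0, 0.0), ((1.6, 0.0), (0.0, 0.4)))
depth_wrong = tyler_shape_depth_2d(stretched, (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0)))
```

In this toy configuration the shape that standardizes the signs to an even spread on the circle is deeper than the identity shape.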
The fixed-|$\theta$| shape depth can equivalently be defined as |$D_\theta(V,P) = \inf_{M} {\rm pr}\{ U_{\theta,\,V}^{ \mathrm{\scriptscriptstyle T} } M U_{\theta,\,V} - {{\rm tr}(M)/k} \geqslant 0 \} ,$| where the infimum is over all |$k\times k$| symmetric matrices |$M$|⁠; see the Supplementary Material. Although, in view of (2), |$V_T$| can be seen as a sign-based mean concept for shape, the maximizer of Tyler shape depth is of a median nature. The main benefit of the proposed depth does not come from the deepest shape itself, but rather from the ranking of shapes it provides; see § 5. Definition 1(ii) calls for some comments. Two approaches were considered in the literature for Tyler shape in the case of an unspecified centre: the Tyler (1987) plug-in approach, which replaces the unknown |$\theta$| with some location functional, and the Hettmansperger & Randles (2002) approach, which jointly solves |$E ( U_{\theta,\,V} ) = 0$| and |$E ( W_{\theta,\,V} ) = 0$| (existence of a unique solution to joint location and scatter M-estimating equations was studied in Maronna (1976) under ellipticity and in Tatsuoka & Tyler (2000) for nonelliptic cases). Both approaches provide two distinct shapes outside the elliptical set-up. In contrast, for the proposed depth, the plug-in and joint maximization approaches always lead to the same shape: irrespective of |$\lambda$|⁠, the objective function |$(\theta,V)\mapsto D(0,P^{U_{\theta,\,V}})+\lambda D(0,P^{W_{\theta,\,V}})$| is indeed maximized at |$\theta=\theta_P$| and |$V=\arg\max_V D(0,P^{W_{\theta_P,\,V}})$|⁠, since |$D(0,P^{U_{\theta,\,V}})=D(0,P^{V^{-1/2}(X-\theta)})=D(\theta,P^X)$| is, for any |$V$|⁠, maximized at |$\theta=\theta_P$|⁠. An alternative way of obtaining an unspecified location version of Tyler shape is to construct it on pairwise differences (Dümbgen, 1998). We will not investigate this for our shape depth, since the sample version of the resulting depth would lead to a much heavier computational burden. 3. 
Main properties In this section, we study the main properties of the shape depth |$D_\theta(V,P)$| and the corresponding depth regions |$R_{\theta}(\alpha,P)=\{ V\in \mathcal{P}_{k,\,{\rm tr}}: D_\theta(V,P)\geqslant \alpha\}$|⁠. Topological statements for subsets of |$\mathcal{P}_{k,\,{\rm tr}}$| and for functions defined on |$\mathcal{P}_{k,\,{\rm tr}}$| will refer to the topology whose open sets are generated by balls of the form |$B(V_0,r)=\{ V\in\mathcal{P}_{k,\,{\rm tr}} : d(V,V_0)<r\}$|⁠, where |$d$| is the usual geodesic distance on |$\mathcal{P}_k$|⁠: with the classical log mapping on |$\mathcal{P}_k$|⁠, this distance is such that |$d(V_a,V_b)=\|\!\log( V_a^{-1/2} V_b V_a^{-1/2} ) \|_{\rm F}$|⁠, where |$\|A\|_{\rm F}=\{{\rm tr}(AA^{ \mathrm{\scriptscriptstyle T} })\}^{1/2}$| is the Frobenius norm of |$A$| (Bhatia, 2007). We start with the following continuity result. Theorem 1. Let |$P$| be a probability measure over |${{\mathbb{R}}}^k$|⁠, and fix |$\theta\in{{\mathbb{R}}}^k$|⁠. Then the following properties hold: (i) |$V\mapsto D_\theta(V,P)$| is upper semicontinuous on |$\mathcal{P}_{k,\,{\rm tr}}$|⁠; (ii) the depth region |$R_{\theta}(\alpha,P)$| is closed for any |$\alpha\geqslant 0$|⁠; (iii) if |$P$| is absolutely continuous with respect to the Lebesgue measure, then |$V\mapsto D_\theta(V,P)$| is also lower semicontinuous, and hence continuous, on |$\mathcal{P}_{k,\,{\rm tr}}$|⁠. We will say that a subset |$R$| of |$\mathcal{P}_{k,\,{\rm tr}}$| is bounded if and only if |$R\subset B(I_k,r)$| for some |$r>0$|⁠; since |$d$| satisfies the triangle inequality, we need only consider balls centred at |$I_k$|⁠. Moreover, we will say that |$P$| is smooth at |$\theta$| if and only if |$t_{\theta,\,P}=0$|⁠, where |$t_{\theta,\,P}=\sup_{u\in\mathcal{S}^{k-1}} {\rm pr}\{u^{ \mathrm{\scriptscriptstyle T} }(X-\theta)=0\}$|⁠. We then have the following result. Theorem 2. 
Let |$P$| be a probability measure over |${{\mathbb{R}}}^k$|⁠, and fix |$\theta\in{{\mathbb{R}}}^k$|⁠. Then the depth region |$R_{\theta}(\alpha,P)$| is bounded and compact for any |$\alpha>t_{\theta,\,P}$|⁠. The main reason for working with geodesic distance rather than Frobenius distance |$d_{\rm F}(V_1,V_2)=\|V_2-V_1\|_{\rm F}$| is that, unlike |$(\mathcal{P}_{k,\,{\rm tr}},d_{\rm F})$|⁠, the metric space |$(\mathcal{P}_{k,\,{\rm tr}},d)$| is complete; see, for instance, Bhatia & Holbrook (2006, Proposition 10). This is what allows us to establish compactness in Theorem 2, which is the main ingredient needed for the following result. Theorem 3. Let |$P$| be a probability measure over |${{\mathbb{R}}}^k$|⁠, and fix |$\theta\in{{\mathbb{R}}}^k$|⁠. (i) If |$R_{\theta}(t_{\theta,\,P},P)$| is nonempty, then there exists a shape |$V_*\in\mathcal{P}_{k,\,{\rm tr}}$| that maximizes |$D_\theta(V,P)$|⁠. (ii) In particular, if |$P$| is smooth at |$\theta$|⁠, then such a deepest shape |$V_*$| exists. While the previous result guarantees existence of a deepest shape for absolutely continuous probability measures, uniqueness is not guaranteed in general. Parallel to what is done for the Tukey median, we then define the fixed-|$\theta$| shape matrix of |$P$| as the barycentre of the deepest shape region of |$P$|⁠, i.e., as the shape matrix |$V_{\theta,\,P}$| satisfying $$\begin{equation}\label{barydef} {\rm vec}\,V_{\theta,\,P} = {\int_{{\rm vec}\,R_\theta(\alpha_*,\,P)} v \, {{\rm d}} v} \Big/ {\int_{{\rm vec}\,R_\theta(\alpha_*,\,P)} {{\rm d}} v} ,\end{equation}$$ (3) with |$\alpha_*=\max_V D_\theta(V,P)$|⁠. Two remarks are in order. First, the integrals in (3) exist and are finite because |${\rm vec}\,\mathcal{P}_{k,\,{\rm tr}}$| is a bounded subset of |${{\mathbb{R}}}^{k^2}$|⁠: |$0\leqslant V^2_{ij} < V_{ii} V_{jj}\leqslant k^2$| for any |$V \in\mathcal{P}_{k,\,{\rm tr}}$|⁠. Second, the following convexity result implies that |$V_{\theta,\,P}$| has maximal depth. 
Theorem 4. Let |$P$| be a probability measure over |${{\mathbb{R}}}^k$|⁠, and fix |$\theta\in{{\mathbb{R}}}^k$|⁠. Then: (i) |$V\mapsto D_\theta(V,P)$| is quasi-concave, i.e., |$D_\theta(V_t,P) \geqslant \min\{D_\theta(V_a,P),D_\theta(V_b,P)\}$| for |$V_t=(1-t)V_a+t V_b$| with |$V_a,V_b\in \mathcal{P}_{k,\,{\rm tr}}$| and |$t\in[0,1]$|⁠; (ii) the region |$R_{\theta}(\alpha,P)$| is convex for any |$\alpha\geqslant 0$|⁠. This defines the fixed-|$\theta$| shape of a probability measure |$P$| under the very mild condition that |$R_{\theta}(t_{\theta,\,P},P)$| is nonempty, hence in particular when |$P$| is smooth at |$\theta$|⁠. Of course, it is important that, under ellipticity, this agrees with the elliptical concept of shape provided in § 1. The following Fisher consistency result confirms that this is the case. Theorem 5. Let |$P$| be an elliptical probability measure over |${{\mathbb{R}}}^k$| with location |$\theta_0$| and shape |$V_0$|⁠. Then |$D_{\theta_0}(V_0,P) \geqslant D_{\theta_0}(V,P)$| for any |$V\in\mathcal{P}_{k,\,{\rm tr}}$| and, provided that |${\rm pr}(\{\theta_0\})<1$|⁠, the equality holds if and only if |$V=V_0$|⁠. Letting |$Y_k$| be beta-distributed with parameters |$1/2$| and |$(k-1)/2$|⁠, the maximal depth is |$D_{\theta_0}(V_0,P)=[1-{\rm pr}(\{\theta_0\})] {\rm pr}(Y_k>1/k)$|⁠. In this result, |${\rm pr}(\{\theta_0\})$| is the probability that the generating variate |$R$| associated with |$P$| is equal to zero. Lemma 2 in Paindaveine & Van Bever (2017) implies that the maximal depth in Theorem 5 is monotone decreasing in |$k$| if |${\rm pr}(\{\theta_0\})$| does not depend on |$k$|⁠, in which case the maximal depth is convergent as |$k$| goes to infinity. Since |$Y_k$| has the same distribution as |$Z_1^2/(\sum_{\ell=1}^k Z_\ell^2)$|⁠, where |$Z=(Z_1,\ldots,Z_k)^{ \mathrm{\scriptscriptstyle T} }$| is |$k$|-variate standard normal, the limit is equal to |${\rm pr}(Z_1^2>1)\approx 0.317$|⁠. The proof of Theorem 5 requires the following result. 
Theorem 6. Let |$P=P^X$| be a probability measure over |${{\mathbb{R}}}^k$|⁠, and fix |$\theta\in{{\mathbb{R}}}^k$|⁠. Then for any shape matrix |$V$|⁠, any invertible |$k\times k$| matrix |$A$| and any |$k$|-vector |$b$|⁠, $$D_{A\theta+b}(V_A,P^{A X+{b}})= D_\theta(V,P^X) , \quad R_{A\theta+b}(\alpha,P^{AX+b}) = \{ V_A : V\in R_{\theta}(\alpha,P) \},$$where |$V_A=k A V\! A^{ \mathrm{\scriptscriptstyle T} } / {\rm tr}(A V\!A^{ \mathrm{\scriptscriptstyle T} })$| is the shape matrix proportional to |$A V\!A^{ \mathrm{\scriptscriptstyle T} }$|⁠. This shows that the fixed-|$\theta$| shape depth and the corresponding regions behave well under affine transformations, and in particular under changes of the measurement units. Affine invariance is a classical requirement in location depth (Zuo & Serfling, 2000). Tyler shape depth is a sign concept in the sense that it depends on the underlying random vector |$X$| only through its multivariate sign |$U_{\theta,\,V}$|⁠. In the elliptical case, it follows that if the distribution does not charge the centre of the distribution, this depth does not depend on the distribution of the underlying generating variate |$R$|⁠. More precisely, we have the following result. Theorem 7. Let |$P$| be an elliptical probability measure over |${{\mathbb{R}}}^k$| with location |$\theta_0$| and shape |$V_0$|⁠. Then: (i) for some |$h:\mathcal{P}_{k,\,{\rm tr}}\to [0,1]$| that does not depend on |$V$| or on |$P$|⁠, $$\begin{equation}\label{theconj}D_{\theta_0}(V,P) = [1-{\rm pr}(\{\theta_0\})] \, h\biggl\{ \frac{k (V_0^{-1/2} V V_0^{-1/2})}{{\rm tr}(V_0^{-1}V)}\biggr\} ;\end{equation}$$ (4) (ii) for |$k=2$|⁠, $$\begin{equation}\label{explicik2} D_{\theta_0}(V,P) = [1-{\rm pr}(\{\theta_0\})] \, {\rm pr}\bigg( Y_2 \geqslant \frac{1}{2} + \frac{1}{2} \bigg[ 1- \det \bigg\{ \frac{2 V_0^{-1} V}{{\rm tr}(V_0^{-1}V)}\bigg\} \bigg]^{1/2} \, \bigg) ,\end{equation}$$ (5)where |$Y_2$| is beta-distributed with parameters |$1/2$| and |$1/2$|⁠. 
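The closed forms in Theorem 5 and Theorem 7(ii) are straightforward to evaluate numerically. The sketch below rests on two facts that are standard but not stated in the text, so they should be read as assumptions of the example: the beta distribution with parameters |$1/2$| and |$1/2$| has distribution function |$y\mapsto(2/\pi)\arcsin(y^{1/2})$|, and |$Y_k$| is distributed as the squared first coordinate of a uniform vector on |$\mathcal{S}^{k-1}$|, which after the substitution |$u=\cos\varphi$| turns |${\rm pr}(Y_k>1/k)$| into a ratio of integrals of |$\sin^{k-2}\varphi$|. The names `max_depth`, `bivariate_depth` and `p_center` are hypothetical.

```python
import math

def max_depth(k, p_center=0.0, n_steps=2000):
    """Maximal depth [1 - pr({theta_0})] * pr(Y_k > 1/k) of Theorem 5."""
    def simpson(f, lo, hi):
        # Composite Simpson rule; n_steps must be even.
        h = (hi - lo) / n_steps
        total = f(lo) + f(hi)
        for i in range(1, n_steps):
            total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
        return total * h / 3.0

    def integrand(phi):
        return math.sin(phi) ** (k - 2)

    upper = math.acos(1.0 / math.sqrt(k))
    prob = simpson(integrand, 0.0, upper) / simpson(integrand, 0.0, math.pi / 2)
    return (1.0 - p_center) * prob

def bivariate_depth(v, v0, p_center=0.0):
    """Depth (5) of shape v under a bivariate elliptical law with shape v0."""
    (a, b), (_, d) = v
    (a0, b0), (_, d0) = v0
    det0 = a0 * d0 - b0 * b0
    # tr(V0^{-1} V) and det(V0^{-1} V) without forming the inverse.
    trace = (d0 * a - 2.0 * b0 * b + a0 * d) / det0
    det = (a * d - b * b) / det0
    # det{2 V0^{-1} V / tr(V0^{-1} V)} = 4 det / trace^2, clamped for rounding.
    gap = max(0.0, 1.0 - 4.0 * det / (trace * trace))
    cutoff = min(1.0, 0.5 + 0.5 * math.sqrt(gap))
    # pr(Y_2 >= cutoff) for the Beta(1/2, 1/2), i.e. arcsine, distribution.
    return (1.0 - p_center) * (1.0 - (2.0 / math.pi) * math.asin(math.sqrt(cutoff)))

identity = ((1.0, 0.0), (0.0, 1.0))
v_b = ((1.6, 0.0), (0.0, 0.4))
depth_at_truth = bivariate_depth(v_b, v_b)
depth_off_truth = bivariate_depth(v_b, identity)
limit_value = math.erfc(1.0 / math.sqrt(2.0))  # pr(Z_1^2 > 1)
```

At |$V=V_0$| the determinant in (5) equals 1, so `bivariate_depth` returns the same value as `max_depth(2)`, and `max_depth(k)` approaches `limit_value` as `k` grows.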
The function |$h$| in this result does not depend on |$P$|, so depth, under ellipticity, depends on |$P$| through |$V_0$| and |${\rm pr}(\{\theta_0\})$| only, with the dependence on |${\rm pr}(\{\theta_0\})$| not affecting the induced ranking of shape matrices. It is easy to check that the explicit bivariate elliptical depth in (5) is compatible with the general results obtained above: for instance, at |$V=V_0$| the determinant in (5) equals 1, so (5) reduces to the maximal depth of Theorem 5 with |$k=2$|. While it seems very challenging to obtain an explicit expression for the function |$h$| in (4), numerical experiments lead us to conjecture that, irrespective of the dimension |$k$|, the mapping |$h$| is of the form |$h(M)=g(\det M)$| for some function |$g:{{\mathbb{R}}}^+\to[0,1]$|. The results of this section extend to the unspecified-location shape depth |$D(V,P)=D_{\theta_P}(V,P)$| and the corresponding regions |$R(\alpha,P)=\{ V\in \mathcal{P}_{k,\,{\rm tr}}: D(V,P)\geqslant \alpha\}$|. Theorems 1–4 hold for any fixed |$\theta$|, and their unspecified-|$\theta$| versions are simply obtained by substituting |$\theta_P$| for |$\theta$| throughout. In particular, the existence of an unspecified-location deepest shape matrix is guaranteed if |$P$| is smooth at |$\theta_P$| or, more generally, if |$R(t_{\theta_P,\,P},P)$| is nonempty. The shape |$V_P$| of |$P$| is then defined as the barycentre of the set of shape matrices maximizing |$D(\cdot\, ,P)$|. In view of the affine equivariance of |$\theta_P$|, i.e., |$\theta_{P^{AX+b}}=A\theta_{P^X}+b$|, the affine-invariance/equivariance properties $$D(V_A,P^{A X+{b}})= D(V,P^X) , \quad R(\alpha,P^{AX+b}) = \{ V_A : V\in R(\alpha,P)\}$$ follow directly from Theorem 6, to which we refer for the definition of |$V_A$|. Finally, Theorems 5 and 7 also readily extend to the unspecified-location case, since |$\theta_P=\theta_0$| for any elliptical probability measure |$P$| with location |$\theta_0$|. 
In particular, if |$P$| is elliptical with shape |$V_0$|, then the unspecified-|$\theta$| shape depth |$D(V,P)$| is uniquely maximized at |$V=V_0$|, as long as the distribution is not degenerate at a single point. 4. Consistency When |$k$|-variate observations |$X_1,\ldots,X_n$| are available, we define the sample fixed-|$\theta$| depth of a shape matrix |$V$| as |$D_\theta(V,P_n)$|, where |$P_n$| is the empirical probability measure associated with |$X_1,\ldots,X_n$|, and we define its unspecified-location version as |$D(V,P_n)$|. In this section, we state a Glivenko–Cantelli-type result for these sample depths and investigate consistency of maximum-depth shape estimators. Theorem 8. Let |$P$| be a probability measure over |${{\mathbb{R}}}^k$|, and let |$P_n$| denote the empirical probability measure associated with a random sample of size |$n$| from |$P$|. Then: (i) for any |$\theta\in{{\mathbb{R}}}^k$|, |$\sup_{V\in\mathcal{P}_{k,\,{\rm tr}}}|D_\theta(V,P_n)-D_\theta(V,P)| \to 0$| almost surely as |$n\to\infty$|; (ii) if |$P$| is absolutely continuous with respect to Lebesgue measure, then |$\sup_{V\in\mathcal{P}_{k,\,{\rm tr}}} |D(V,P_n)-D(V,P)| \to 0$| almost surely as |$n\to\infty$|. We illustrate this result in the bivariate elliptical case associated with Theorem 7(ii). Figure 1 shows contour plots of |$D_\theta(V,P)$| in terms of |$V_{12}/(V_{11}V_{22})^{1/2}$| and |$V_{22}/V_{11}$| for several arbitrary bivariate elliptical probability measures. The sign nature of shape depth ensures that these contours, along with their empirical counterparts, are distribution-free in the class of elliptical distributions that do not charge the centre of symmetry. Figure 1 also displays the empirical contour plots obtained from random samples of size |$n=800$| drawn from the corresponding bivariate normal distributions. These results support the consistency result in Theorem 8(i). Fig. 1. 
The top row shows contour plots of |$D_\theta(V,P)$| in terms of |$V_{12}/(V_{11}V_{22})^{1/2}$| and |$V_{22}/V_{11}$|, where |$P$| refers to an arbitrary bivariate elliptical probability measure with location |$\theta$| and shape |$V_A={\rm diag}(1,1)$| (left), |$V_B= {\rm diag}(1.6,0.4)$| (middle) or |$V_C$| with diagonal vector |$(1.5,0.5)^{ \mathrm{\scriptscriptstyle T} }$| and off-diagonal elements |$0.5$| (right), and with |${\rm pr}[\{\theta\}]=0$|. The bottom row displays the corresponding contour plots of |$D_0(V,P_n)$|, where |$P_n$| is the empirical probability measure associated with a random sample of size |$n=800$| from the centred bivariate normal with shape |$V_A$| (left), |$V_B$| (middle) or |$V_C$| (right). The true shapes |$V_{0,\,P}$| and sample deepest shapes |$V_{0,\,P_n}$| are marked in red and blue, respectively. In § 3 the shape |$V_{\theta,\,P}$| of |$P$| was defined as the barycentre of the collection of |$P$|-deepest shape matrices. 
In the empirical case, a natural estimator is the corresponding shape matrix |$V_{\theta,\,P_n}$| computed from the empirical probability measure |$P_n$| associated with the sample at hand; existence here follows from the fact that |$D_\theta(V,P_n)$| may only take values |$\ell/n$| (⁠|$\ell=0,1,\ldots,n$|⁠). The same argument ensures the existence of the sample deepest shape |$V_{P_n}$| in the unspecified-location case. The sample Tukey median |$\theta_{P_n}$| was one of the first affine-equivariant location estimators with a high breakdown point. It would therefore be interesting to investigate whether the affine-equivariant shape estimator |$V_{P_n}$|⁠, parallel to the Maronna–Stahel–Yohai P-estimators of scatter, also has a high breakdown point (Tyler, 1994). Since this is beyond the scope of the present work, we focus on consistency of sample deepest shapes. Theorem 9. Let |$P$| be a probability measure over |${{\mathbb{R}}}^k$|⁠, and let |$P_n$| denote the empirical probability measure associated with a random sample of size |$n$| from |$P$|⁠. (i) Fix |$\theta\in{{\mathbb{R}}}^k$| and assume that |$R_{\theta}(t_{\theta,\,P},P)$| is nonempty. Then |$V_{\theta,\,P_n}\to V_{\theta,\,P}$| almost surely as |$n\to\infty$|⁠. (ii) If |$P$| is absolutely continuous with respect to Lebesgue measure, then |$V_{P_n}\to V_{P}$| almost surely as |$n\to\infty$|⁠. The specified-|$\theta$| result in Theorem 9(i) holds in particular when |$P$| is smooth at |$\theta$|⁠. The unspecified-|$\theta$| result requires a more stringent smoothness assumption, namely absolute continuity of |$P$|⁠. This assumption, which is already present in Theorem 8(ii), is only needed to control the effect of replacing |$\theta$| by |$\theta_{P_n}$| in |$D_\theta(V,P_n)$| and |$V_{\theta,\,P_n}$|⁠. Figure 1 also supports Theorem 9(i), since in each sample considered, the sample deepest shape is close to its population counterpart. 5. Two applications 5.1. 
Choosing a shape matrix estimator in principal component analysis There is a vast literature on scatter or shape estimation. Among the most famous estimators are the minimum covariance determinant scatters |$S_\gamma$|⁠. Recall that in the empirical case, |$S_\gamma$| is the covariance matrix with the smallest determinant among covariance matrices computed using only a proportion |$\gamma$| of the observations. The choice of the trimming proportion |$1-\gamma$| is crucial, as the loss in efficiency can be very large if the trimming is excessive; see, for example, Croux & Haesbroeck (1999) or Paindaveine & Van Bever (2014). Choosing |$\gamma$| is therefore difficult, as it should be large, but not so large as to incorporate outliers. In this section, we consider robust principal component analysis based on the shape estimators |$\hat{V}_\gamma=kS_\gamma/{\rm tr}(S_\gamma)$| and show that Tyler shape depth enables the making of an informed choice of |$\gamma$|⁠. For several contamination proportions |$\eta$|⁠, we independently generated |$R=500$| bivariate samples of |$n=800$| independent observations, each comprising |$(1-\eta)n$| clean observations and |$\eta n$| outliers. With |$X$| bivariate normal with mean zero and covariance matrix |${\rm diag}(4,1)$|⁠, and |$Y$| bivariate normal with mean |$(0,\delta)^{ \mathrm{\scriptscriptstyle T} }$| and identity covariance matrix, the clean observations are equal to |$X$| in distribution, whereas the outliers are distributed, in equal proportions, as |$Y$| or |$-Y$|⁠. Two simulations were conducted, one for |$\delta=4$| and one for |$\delta=5$|⁠; clearly, the former simulation represents a harder robustness problem than the latter. We consider estimating the first principal direction |$e_1=(1,0)^{ \mathrm{\scriptscriptstyle T} }$| of the uncontaminated distribution. For any |$\gamma\in[0.5,1]$|⁠, a natural estimator is, up to sign, the first eigenvector |$\hat{v}_{\gamma}$| of |$\hat{V}_\gamma$|⁠. 
Denoting by |$\hat{v}_{r,\,\gamma}$| this estimate in replication |$r=1,\ldots,R$|, estimation performance can be measured through the mean squared error $${\small{\text{MSE}}}_\gamma= \frac{1}{R} \sum_{r=1}^R \, (\Delta\alpha_{r,\,\gamma})^2,$$ where |$\Delta\alpha_{r,\,\gamma} = \arccos ( |e_1^{ \mathrm{\scriptscriptstyle T} } \hat{v}_{r,\,\gamma}| )$| is the angle between the population first eigendirection |$e_1$| and its estimate |$\hat{v}_{r,\,\gamma}$|. Figure 2 plots |${\small{\text{MSE}}}_\gamma$| as a function of |$\gamma$|; the Monte Carlo exercise was performed for every value of |$\gamma\in\{0.5,0.51,\ldots,0.99,1\}$|. The results confirm that for any contamination proportion |$\eta$|, a suitable value of |$\gamma$| should be identified. The optimal value |$\gamma_0=\arg\min_\gamma {\small{\text{MSE}}}_\gamma$| basically coincides with |$1-\eta$| in the easy case of |$\delta=5$|, whereas in the harder case of |$\delta=4$|, |$\gamma_0$| is slightly smaller than |$1-\eta$| for large contaminations. This is no surprise: when outliers are hard to identify, the estimators |$\hat{V}_\gamma$|, with |$\gamma\approx 1-\eta$|, are likely to be based on some outliers, which will strongly affect the estimation performance. Fig. 2. The first row shows plots of the mapping |$\gamma\mapsto {\small{\text{MSE}}}_\gamma$| in the easy case of |$\delta=5$| for contamination proportions |$\eta\in\{0,0.1,0.2,0.3\}$|. The second row shows plots of |$50$| random curves |$\mathcal{C}=[\{\gamma, D(\hat{V}_\gamma,P_{\gamma,\,n})\}:\gamma\in (0.5,1)]$|, still for |$\delta=5$| and for the same contamination proportions. The third and fourth rows display the corresponding plots for the harder case associated with |$\delta=4$|; see § 5 for details. Each panel shows a vertical line at |$\gamma_0=\arg\min_\gamma {\small{\text{MSE}}}_\gamma$|. 
In this framework, Tyler shape depth, as presented, may be very useful in selecting a suitable value of |$\gamma$|. We suggest choosing |$\gamma$| based on visual inspection of the curve |$\mathcal{C}=[\{\gamma, D(\hat{V}_\gamma,P_{\gamma,\,n})\}: \gamma\in (0.5,1)]$|, where |$P_{\gamma,\,n}$| denotes the empirical measure associated with the optimal subsample leading to |$\hat{V}_\gamma$|. The rationale is the following: for |$\gamma$| small, |$D(\hat{V}_\gamma,P_{\gamma,\,n})$| will remain relatively high as long as no outlier is added to the optimal subsample. As |$\gamma$| increases and outliers are added in the computation of |$\hat{V}_\gamma$|, the depth |$D(\hat{V}_\gamma,P_{\gamma,\,n})$| will decrease sharply, thereby forming a kink in |$\mathcal{C}$|. The selected |$\gamma$| for a given dataset, |$\hat\gamma$|, should therefore be the largest value for which |$\mathcal{C}$| exhibits stable behaviour. Figure 2 plots the curve |$\mathcal{C}$| for the values of |$\delta$| and |$\eta$| considered above and clearly illustrates the behaviour of the depth curves just described. When the outliers are easily identifiable, the kinks occur at |$\gamma_0$|, which coincides with |$1-\eta$|. In the harder case, where outliers and clean data tend to be mixed, the selected value |$\hat\gamma$| is still remarkably close to |$\gamma_0$|. 
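The angular error |$\Delta\alpha_{r,\,\gamma}$| and the criterion |${\small{\text{MSE}}}_\gamma$| used in this experiment can be computed as in the following sketch for the bivariate case. It is illustrative only: the function names are hypothetical, the first eigenvector is obtained from the closed-form principal-axis angle of a symmetric |$2\times 2$| matrix, and the shape estimates passed to `mse` stand in for the minimum covariance determinant shapes |$\hat{V}_\gamma$|, whose computation is not shown.

```python
import math

def first_eigenvector_2x2(v):
    """Unit eigenvector for the largest eigenvalue of a symmetric 2x2 matrix."""
    (a, b), (_, d) = v
    # Principal-axis angle of the quadratic form with matrix v.
    angle = 0.5 * math.atan2(2.0 * b, a - d)
    return (math.cos(angle), math.sin(angle))

def angular_error(v_hat, e1=(1.0, 0.0)):
    """Angle arccos(|e1^T v_hat|) between e1 and v_hat, defined up to sign."""
    dot = abs(e1[0] * v_hat[0] + e1[1] * v_hat[1])
    return math.acos(min(1.0, dot))  # clamp guards against rounding above 1

def mse(shape_estimates, e1=(1.0, 0.0)):
    """Mean squared angular error over a list of 2x2 shape estimates."""
    errors = [angular_error(first_eigenvector_2x2(v), e1) for v in shape_estimates]
    return sum(err * err for err in errors) / len(errors)

# Hypothetical estimates: one aligned with e1, one rotated by 45 degrees.
aligned = ((1.6, 0.0), (0.0, 0.4))
rotated = ((1.0, 0.5), (0.5, 1.0))
err_aligned = angular_error(first_eigenvector_2x2(aligned))
err_rotated = angular_error(first_eigenvector_2x2(rotated))
mse_value = mse([aligned, rotated])
```

Taking the absolute value of the inner product makes the error insensitive to the sign of the estimated eigenvector, which matches the "up to sign" qualification in the text.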
In conclusion, Tyler shape depth and the ranking of shape matrices that it provides yield an effective visual tool that allows the selection of a sensible trimming proportion |$1-\gamma$| in a data-driven way when conducting, for example, principal component analysis. 5.2. Outlier detection For each trading day between 1 February 2015 and 1 February 2017, we collected the Nasdaq Composite and S&P500 stock indices every five minutes and computed their returns, i.e., the differences between the logarithms of consecutive index values. The returns on a given day form a bivariate dataset of usually 78 observations, though the number of observations varies due to missing values; days with fewer than 70 bivariate returns were discarded. The resulting dataset comprises |$n=38\,489$| observations on |$D=478$| trading days. Our analysis examines the joint behaviour of the bivariate returns in order to determine which trading days are atypical. An important source of atypicality is associated with the overall scale of the bivariate returns, which alternate between periods of high and low volatility. Such deviations can easily be detected by comparing the trace of any scatter measure on intraday data with that on the whole dataset, so we focus instead on detecting atypical joint volatility, i.e., days on which the ratios of the marginal volatilities or the correlations between the returns deviate greatly from their global behaviour. Let |$\hat V_{\rm full}=\hat{V}_{\hat\gamma}$| denote the minimum covariance determinant shape estimator computed from the full collection of |$n$| returns with maximal shape depth. More precisely, denoting by |$P_{\rm full}$| the empirical distribution of the full collection of returns, let |$\hat\gamma=\arg \max_{\gamma\in\Gamma} D(\hat V_{\gamma},P_{\rm full})$| for |$\Gamma=\{0.5,0.505,0.51,\dots, 0.995,1\}$|⁠. The value obtained is |$\hat\gamma=0.825$|⁠, with corresponding depth |$D(\hat{V}_{\hat\gamma},P_{\rm full})=0.497$|⁠. 
This high depth value ensures that |$\hat V_{\rm full}$| is an excellent proxy for the deepest shape matrix |$\hat{V}=\arg\max_V D(V,P_{\rm full})$|⁠, so the computation of |$\hat{V}$| is unnecessary. Returns at the beginning of each trading period are known to be more volatile and should be discarded in shape estimation, so the robustness of |$\hat V_{\rm full}$| is an obvious asset: the value of |$\hat\gamma$| allows us to adaptively discard days on which the volatility deviates from its global pattern. The procedure discarded more than half of the corresponding intraday returns for 17 days, and, remarkably, 13 of these days lie within the two atypical periods mentioned in the next paragraph. For each day |$d=1,\dots,D$|⁠, we evaluated the depth |$D(\hat V_{\rm full},P_d)$| of the global shape estimate with respect to the empirical distribution |$P_d$| of the bivariate returns on day |$d$|⁠. Figure 3(a) presents the depth values |$D(\hat V_{\rm full},P_d)$|⁠. Vertical lines mark major events affecting the shape of the volatility, while the two grey-shaded rectangles cover two periods during which the markets notoriously yielded atypical returns: the first period follows the devaluation of the Chinese yuan on 11 August 2015, which led to rapid changes in the stock markets, including large devaluations on 24 August, labelled event (a). The second period covers the beginning of 2016, when a slump in oil prices made stocks relying on oil very volatile compared to others. This resulted in atypical shape behaviour during the period 22 January–9 February; this last day, event (b), exhibited the sharpest loss for the S&P500 index. 
The other events are: (c) the decision of the European Central Bank on 10 March 2016 to extend quantitative easing, thereby slashing interest rates, which had a significant positive impact on both the Nasdaq and S&P500, an impact that was more pronounced for the latter; (d) the positive effect on financial stocks following comments made by Federal Reserve officials on 27 May 2016 about the possibility of a rate hike; and (e) the aftermath of Donald Trump’s election to the U.S. presidency on 9 November 2016. Atypical days were detected by flagging those whose depth is so low that it falls outside the whiskers of the box-and-whisker plot of the depth values. This resulted in 12 flagged days, each either being one of the events described above or lying in one of the grey-shaded regions.
Fig. 3. (a) Plot of |$D(\hat V_{\rm full},P_d)$| as a function of |$d$|; the events labelled (a) to (e) are described in § 5.2. (b) Plot of |$D(\hat V_{\rm full},P_{d})$| versus |${\small{\text{HD}}}(\hat V_{\rm full},P_d)$| for each trading day |$d$|; events from panel (a) are marked with the same labels.
We also computed the half-space shape depth |${\small{\text{HD}}} (\hat V_{\rm full},P_d)$| of the global estimate for each day |$d$| (Paindaveine & Van Bever, 2018). Figure 3(b), a plot of |$D(\hat V_{\rm full},P_{d})$| versus |${\small{\text{HD}}}(\hat V_{\rm full},P_d)$|, shows a clear positive association. However, values of half-space shape depth seem to be more concentrated than those of Tyler shape depth, because the former maximizes a concept of scatter depth in scale and may therefore be able to find scatter estimates better suited to the data.
Indeed, a decrease in volatility in one of the marginals could be balanced by considering a scatter with a smaller scale, which would have a large depth value. A by-product of this is the fact that when evaluating half-space shape depth, the difficult maximization step in scale seems to be crucial for correctly computing the depth ranking of the data, which can be affected by small deviations. More importantly, while events (a) and (b) have low depth with respect to both concepts, only Tyler shape depth succeeds in flagging days associated with events (c)–(e) as outlying. 6. Hypothesis testing for shape In the previous section we presented two specific applications of shape depth. The concept also allows us to tackle more standard inference problems for shape, such as point estimation and hypothesis testing. Here we consider testing |$\mathcal{H}_{0}:V=V_0$| against |$\mathcal{H}_{1}:V\neq V_0$| at level |$\alpha\in(0,1)$|⁠, where |$V_0\in\mathcal{P}_{k,\,{\rm tr}}$| is fixed, based on a random sample |$X_1,\ldots,X_n$| from a |$k$|-variate elliptical distribution with known location |$\theta$| and unknown shape |$V$|⁠. In view of Theorem 5, a natural depth-based test, |$\phi_D$| say, rejects the null for small values of |$T_{\theta,\, n}=D_\theta(V_0,P_n)$|⁠, where |$P_n$| is the empirical distribution of |$X_1,\ldots,X_n$|⁠. Since |$T_{\theta,\, n}$| is discrete, achieving null size |$\alpha$| in general requires randomization. The resulting test thus rejects the null hypothesis if |$T_{\theta,\, n} < t_{\alpha,\, n}$|⁠, rejects the null hypothesis with probability |$\gamma_{\alpha,\, n}$| if |$T_{\theta, \, n} = t_{\alpha,\, n}$|⁠, and does not reject the null hypothesis if |$T_{\theta, \, n} > t_{\alpha,\, n}$|⁠, where |$t_{\alpha,\, n}$| is the null |$\alpha$|-quantile of |$T_{\theta,\, n}$| and |$\gamma_{\alpha,\, n}$| is the amount of randomization. 
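The randomized decision rule just described, together with the calibration of its two constants from simulated null values of the statistic, can be sketched as follows. The function names `calibrate` and `randomized_test` are hypothetical, and the sketch assumes that the statistic takes values on a discrete grid that is computed identically for the simulated and observed samples, so that exact equality comparisons are meaningful.

```python
import numpy as np

def calibrate(null_stats, alpha):
    """Return (t, gamma) with P(T < t) + gamma * P(T = t) = alpha,
    where probabilities are empirical frequencies in null_stats."""
    null_stats = np.asarray(null_stats, dtype=float)
    for t in np.unique(null_stats):          # sorted distinct values
        p_below = np.mean(null_stats < t)
        p_at = np.mean(null_stats == t)
        if p_below + p_at >= alpha:          # t is the empirical alpha-quantile
            return float(t), float((alpha - p_below) / p_at)
    raise ValueError("alpha must lie in (0, 1]")

def randomized_test(stat, t, gamma, rng):
    """Reject for small depth; randomize on the boundary value t."""
    if stat < t:
        return True
    if stat == t:
        return bool(rng.random() < gamma)
    return False
```

For instance, `calibrate` could be applied to the depth statistic evaluated on many simulated null samples, after which `randomized_test` is applied to the statistic of the observed sample with `rng = np.random.default_rng()`.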
Under the assumption that |$P$| does not charge the centre of the distribution, |$T_{\theta,\, n}$| is distribution-free under the null hypothesis, which allows |$t_{\alpha,\, n}$| and |$\gamma_{\alpha,\, n}$| to be estimated arbitrarily well through simulations. Prior to applying the test below for |$k=2$| at level |$5\%$| with sample sizes |$n=200$| and |$500$|, these were estimated from 500 000 mutually independent standard normal samples for each sample size, yielding |$\hat{t}_{0.05,\,200}=0.40$|, |$\hat{\gamma}_{0.05,\, 200} = 0.61$|, |$\hat{t}_{0.05,\, 500} = 0.43$| and |$\hat{\gamma}_{0.05,\, 500} = 0.25$|. Distribution-freeness of |$T_{\theta,\, n}$| under the null hypothesis actually extends to the class of distributions with elliptical directions (Randles, 2000). We performed two simulations in the bivariate case. The first involves the problem of testing the null hypothesis of sphericity, |$\mathcal{H}_0: V=I_2$|, about |$\theta=0$| and compares the finite-sample powers of |$\phi_D$| with those of some competitors. For each value of |$\ell=0,1,\ldots,6$| we generated |$M=3000$| independent random samples |$X_1,\ldots,X_n$| of size |$n=500$| from the normal distribution with location |$\theta=0$| and shape $$V_{\ell,\, \xi}=I_2+\ell\xi\begin{pmatrix}1\; & \;0.5\\ 0.5\; &\; -1\end{pmatrix}$$ and from the corresponding elliptical Cauchy distribution. The value |$\ell=0$| corresponds to the null hypothesis, whereas |$\ell=1,\ldots,6$| provide increasingly severe alternatives. We took |$\xi=0.035$| and |$0.045$| for the normal and Cauchy samples, respectively, in order to obtain roughly the same rejection frequencies in both cases. 
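As a quick check on the alternatives used in this first simulation, the shape matrices can be built and inspected numerically. The sketch below is illustrative only, and the function name `alternative_shape` is an assumption. Because the perturbation matrix has zero trace and eigenvalues equal to plus or minus the square root of 1.25, each matrix has trace 2 and eigenvalues equal to 1 plus or minus that square root times the product of the two parameters.

```python
import numpy as np

def alternative_shape(ell, xi):
    """Shape matrix I_2 + ell * xi * [[1, 0.5], [0.5, -1]]."""
    perturbation = np.array([[1.0, 0.5], [0.5, -1.0]])
    return np.eye(2) + ell * xi * perturbation

# xi = 0.035 (normal samples) and xi = 0.045 (Cauchy samples), ell = 0, ..., 6
for xi in (0.035, 0.045):
    for ell in range(7):
        V = alternative_shape(ell, xi)
        smallest, largest = np.linalg.eigvalsh(V)  # ascending order
        print(f"xi={xi}, ell={ell}: trace={np.trace(V):.3f}, "
              f"eigenvalues=({smallest:.4f}, {largest:.4f})")
```

All matrices in this design stay positive definite, since the largest perturbation considered, with the product of the two parameters equal to 0.27, leaves the smallest eigenvalue well above zero.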
For each sample, we carried out six tests at nominal level |$5\%$|: (i) the test |$\phi_D$| described above; (ii) the Gaussian test from John (1972) or, more precisely, its extension to elliptical distributions with finite fourth-order moments from Hallin & Paindaveine (2006); (iii) the sign test from Hallin & Paindaveine (2006); (iv) the Wald test based on the Tyler (1987) scatter matrix; (v) and (vi) the tests from Paindaveine & Van Bever (2014) based on the shape estimator |$\hat{V}_\gamma$| in § 5, with |$\gamma=0.5$| and |$\gamma=0.8$|, respectively. The tests (ii)–(vi) were performed based on their asymptotic null distributions. The rejection frequencies in Fig. 4 reveal that |$\phi_D$| performs very similarly to, though it may be slightly dominated by, the sign-based tests in (iii)–(iv), but performs very well in the case of heavy tails, where it is superior to all the other tests. As expected, the Gaussian test collapses under heavy tails and the minimum covariance determinant tests show low empirical power.
Fig. 4. Rejection frequencies under (a) bivariate normal and (b) elliptical Cauchy densities of six tests of sphericity: the Gaussian test (dot-dashed), the sign test (dotted), the test based on Tyler’s scatter matrix (dashed), the depth-based test (solid), and two minimum covariance determinant-based tests based on different trimming proportions (long dashed and short-long dashed for trimming proportions |$0.2$| and |$0.5$|, respectively). Results are based on 3000 replications and the sample size is |$n=500$|; see § 6 for details.
The second simulation tests |$\mathcal{H}_{0}:V=V_0$|, with |$V_0={\rm diag}(2,1/2)$| and specified location |$\theta=0$|, and compares the tests above in terms of level robustness (He et al., 1990). We considered mixture distributions |$P^{X_{(\eta)}}=(1-\eta)P^{X}+\eta P^{Y}$| with several contamination levels |$\eta$|. Here |$X$| is a bivariate random vector, normal or elliptical Cauchy, distributed according to the null hypothesis. The contamination random vector |$Y$| was chosen as follows: (a) |$Y$| has the same distribution as the vector obtained by rotating |$X$| about the origin by |$45^\circ$|; (b) |$Y$| has the same elliptical distribution as |$X$|, but with shape |$V=I_2$|; (c) |$Y$| is obtained by multiplying the vector |$Y$| in case (b) by 4. The uncontaminated distribution |$P^X$| puts more mass along the horizontal axis. In case (a), the contamination typically shows along the main bisector, whereas the contamination in case (b) has directions that are uniformly distributed over the unit circle. As for case (c), the contamination combines the directional feature of case (b) with radial outlyingness. For each combination of distribution, normal or Cauchy, contamination pattern, (a), (b) or (c), and contamination level, |$\eta=0,0.025,0.05,0.1,0.2,0.25,0.3$|, we generated 3000 independent random samples |$X_{(\eta)1},\ldots,X_{(\eta)n}$| of size |$n=200$|. 
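One way to generate samples from the contaminated mixtures described above is sketched below. It is not the authors' simulation code: the function names are hypothetical, the rotation is taken to be counterclockwise, and the elliptical Cauchy law is drawn as a bivariate normal vector divided by the absolute value of an independent standard normal variable, a standard construction of the multivariate t distribution with one degree of freedom.

```python
import numpy as np

V0 = np.diag([2.0, 0.5])  # null shape, as in the text

def draw_elliptical(n, shape, rng, cauchy=False):
    """n centred bivariate draws, normal or elliptical Cauchy, with given shape."""
    root = np.linalg.cholesky(shape)
    z = rng.standard_normal((n, 2)) @ root.T
    if cauchy:
        z = z / np.abs(rng.standard_normal((n, 1)))
    return z

def contaminated_sample(n, eta, pattern, rng, cauchy=False):
    """Sample of size n from (1 - eta) * P^X + eta * P^Y, patterns 'a', 'b', 'c'."""
    x = draw_elliptical(n, V0, rng, cauchy)
    if pattern == "a":    # X rotated by 45 degrees (counterclockwise assumed)
        c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
        y = draw_elliptical(n, V0, rng, cauchy) @ np.array([[c, -s], [s, c]]).T
    elif pattern == "b":  # same family, spherical shape
        y = draw_elliptical(n, np.eye(2), rng, cauchy)
    elif pattern == "c":  # pattern (b) multiplied by 4
        y = 4.0 * draw_elliptical(n, np.eye(2), rng, cauchy)
    else:
        raise ValueError("pattern must be 'a', 'b' or 'c'")
    from_y = rng.random(n) < eta  # each point is contaminated with probability eta
    return np.where(from_y[:, None], y, x)
```

A call such as `contaminated_sample(200, 0.1, "c", np.random.default_rng(1), cauchy=True)` would give one Cauchy sample of size 200 under pattern (c) at contamination level 0.1.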
Figure 5 plots the resulting rejection frequencies and reveals the very good robustness of the depth-based test |$\phi_D$|; recall that, irrespective of |$\eta$|, the target rejection frequency is |$5\%$| here. In particular, |$\phi_D$| always dominates its sign-based competitors (iii) and (iv). The minimum covariance determinant tests (v) and (vi) dominate |$\phi_D$| in terms of robustness, but exhibit poor finite-sample power. Radial outliers strongly affect the Gaussian test.
Fig. 5. Null rejection frequencies, plotted against the contamination level |$\eta$|, of the same six tests with the same line types as in Fig. 4, under bivariate normal (left) and elliptical Cauchy (right) densities. The labels (a), (b) and (c) for the top, middle and bottom rows refer to the three contamination patterns considered; see § 6 for their descriptions. Results are based on 3000 replications and the sample size is |$n=200$|.
Summing up, the test associated with the proposed shape depth provides a good balance between efficiency and robustness. The improved robustness compared with its sign-based competitors is obtained at the cost of a very slight loss of power. Depth-based procedures can thus be defined for standard inference problems on shape, and will tend to perform as well as sign-based procedures. As shown in § 5, however, shape depth provides a whole ranking of shape matrices that allows one to address less standard applications. 7. 
Future research The present work offers rich perspectives for future research. The asymptotic distributions of the sample depths |$D_\theta(V,P_n)$| and |$D(V,P_n)$|, as well as those of the corresponding deepest shape estimators, could be studied. Investigating the robustness properties of these shape estimators would also be of interest, in particular to see whether the estimators have a high breakdown point. Regarding hypothesis testing, it would be desirable to define depth-based tests for other shape problems, such as testing the null hypothesis that two populations share the same shape. Another key point is related to computational aspects. Since Tyler shape depth was defined through half-space depth, it can in principle be evaluated using the numerous packages dedicated to half-space depth. The definition of Tyler shape depth suggests that evaluation of this depth in dimension |$k$| requires the computation of half-space depth in dimension |$k^2$|. Fortunately, redundancies in the random vector |$W_{\theta,\, V}$| reduce the dimension from |$k^2$| to |$d_k=k(k+1)/2-1$|, as shown by the following result. Theorem 10. Let |$P=P^X$| be a probability measure over |${{\mathbb{R}}}^k$|, and fix |$\theta\in{{\mathbb{R}}}^k$|. Let |${\rm vech}(A)$| be the vector that stacks the entries of |$A=(A_{ij})$| lying on and below the diagonal on top of each other, and let |${\rm vech}_0\,A$| be |${\rm vech}(A)$| deprived of its first component. Then |$D_\theta(V,P) = D(0,P^{\tilde{W}_{\theta,\, V}}) = \inf_{u\in\mathcal{S}^{d_k-1}} {\rm pr}( u^{ \mathrm{\scriptscriptstyle T} } \tilde{W}_{\theta,\,V} \geqslant 0 )$|, with |$\tilde{W}_{\theta,\,V} = {\rm vech}_0\{ U_{\theta,\,V} U^{ \mathrm{\scriptscriptstyle T} }_{\theta,\,V} - (1/k) I_k \}$|. It follows that for |$k=2$| and |$3$|, where |$d_2=2$| and |$d_3=5$|, Tyler shape depth dominates its half-space counterpart in Paindaveine & Van Bever (2018) from a computational point of view. 
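For the bivariate case, where the reduced dimension equals 2, the representation in Theorem 10 can be illustrated by the following sketch. Several ingredients are assumptions rather than statements from this section: the unit vector is taken to be the standardized direction obtained by multiplying the centred observation by the inverse symmetric square root of the shape matrix and normalizing, no observation is assumed to coincide with the centre, and the infimum over directions is replaced by a minimum over a finite grid, which can only overestimate the depth. The function names are hypothetical.

```python
import numpy as np

def vech0(a):
    """Stack the on-and-below-diagonal entries column by column, drop the first."""
    rows, cols = np.triu_indices(a.shape[0])
    return a[cols, rows][1:]  # swapping indices gives column-wise lower triangle

def tyler_shape_depth_2d(x, v, theta, n_dirs=3600):
    """Grid approximation (from above) of the depth of shape v, for k = 2."""
    x = np.asarray(x, dtype=float)
    if x.shape[1] != 2:
        raise ValueError("this sketch covers k = 2 only")
    evals, evecs = np.linalg.eigh(v)
    v_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    z = (x - theta) @ v_inv_sqrt               # standardized observations
    u = z / np.linalg.norm(z, axis=1, keepdims=True)
    w = np.array([vech0(np.outer(ui, ui) - np.eye(2) / 2) for ui in u])
    angles = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    dirs = np.column_stack([np.cos(angles), np.sin(angles)])
    # Fraction of points in each closed half-plane {w : dir' w >= 0}
    fractions = (w @ dirs.T >= 0).mean(axis=0)
    return float(fractions.min())
```

On a large normal sample with covariance diag(9, 1), this sketch should return a value close to one half at the true shape and a much smaller value at the identity matrix. The direction grid only bounds the infimum from above, so an exact planar half-space depth routine would be preferable in practice.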
Nevertheless, there is probably room to develop ad hoc algorithms for computing Tyler shape depth more efficiently. It would also be desirable to design iterative algorithms for the computation of deepest shape matrices. Acknowledgement Paindaveine was supported by the Francqui Foundation and by the Program of Concerted Research Actions at the Université libre de Bruxelles. Paindaveine is also affiliated with the Université Toulouse Capitole. The authors would like to thank the editor, associate editor and two referees for their comments. Supplementary material Supplementary material available at Biometrika online includes proofs of all the theorems. References
Bhatia R. (2007). Positive Definite Matrices. Princeton, New Jersey: Princeton University Press.
Bhatia R. & Holbrook J. (2006). Riemannian geometry and matrix geometric means. Lin. Alg. Appl. 413, 594–618.
Chen M., Gao C. & Ren Z. (2018). Robust covariance and scatter matrix estimation under Huber’s contamination model. Ann. Statist. 46, 1932–60.
Croux C. & Haesbroeck G. (1999). Influence function and efficiency of the minimum covariance determinant scatter matrix estimator. J. Mult. Anal. 71, 161–90.
Dümbgen L. (1998). On Tyler’s |$M$|-functional of scatter in high dimension. Ann. Inst. Statist. Math. 50, 471–91.
Hallin M. & Paindaveine D. (2006). Semiparametrically efficient rank-based inference for shape. I. Optimal rank-based tests for sphericity. Ann. Statist. 34, 2707–56.
He X., Simpson D. & Portnoy S. (1990). Breakdown robustness of tests. J. Am. Statist. Assoc. 85, 446–52.
Hettmansperger T. P. & Randles R. H. (2002). A practical affine equivariant multivariate median. Biometrika 89, 851–60.
John S. (1972). The distribution of a statistic used for testing sphericity of normal distributions. Biometrika 59, 169–73.
Kent J. & Tyler D. E. (1988). Maximum likelihood estimation for the wrapped Cauchy distribution. J. Appl. Statist. 15, 247–54.
Liu R. Y., Parelius J. M. & Singh K. (1999). Multivariate analysis by data depth: Descriptive statistics, graphics and inference (with Discussion). Ann. Statist. 27, 783–858.
Maronna R. A. (1976). Robust M-estimators of multivariate location and scatter. Ann. Statist. 4, 51–67.
Mizera I. (2002). On depth and deep points: A calculus. Ann. Statist. 30, 1681–736.
Paindaveine D. (2008). A canonical definition of shape. Statist. Prob. Lett. 78, 2240–7.
Paindaveine D. & Van Bever G. (2014). Inference on the shape of elliptical distributions based on the MCD. J. Mult. Anal. 129, 125–44.
Paindaveine D. & Van Bever G. (2017). On the maximal halfspace depth of permutation-invariant distributions on the simplex. Statist. Prob. Lett. 129, 335–9.
Paindaveine D. & Van Bever G. (2018). Halfspace depth for scatter, concentration and shape matrices. Ann. Statist. 46, 3276–307.
Randles R. H. (2000). A simpler, affine-invariant, multivariate, distribution-free sign test. J. Am. Statist. Assoc. 95, 1263–8.
Tatsuoka K. S. & Tyler D. E. (2000). On the uniqueness of |$S$|-functionals and |$M$|-functionals under nonelliptical distributions. Ann. Statist. 28, 1219–43.
Tukey J. W. (1975). Mathematics and the picturing of data. In Proc. Int. Congress Mathematicians (Vancouver, BC, 1974), vol. 2. Montreal: Canadian Mathematical Congress, pp. 523–31.
Tyler D. E. (1987). A distribution-free |$M$|-estimator of multivariate scatter. Ann. Statist. 15, 234–51.
Tyler D. E. (1994). Finite sample breakdown points of projection based multivariate location and scatter statistics. Ann. Statist. 22, 1024–44.
Zhang J. (2002). Some extensions of Tukey’s depth function. J. Mult. Anal. 82, 134–65.
Zuo Y. & Serfling R. (2000). General notions of statistical depth function. Ann. Statist. 28, 461–82.
© 2019 Biometrika Trust. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
TI - Tyler shape depth
JO - Biometrika
DO - 10.1093/biomet/asz039
DA - 2019-12-01
UR - https://www.deepdyve.com/lp/oxford-university-press/tyler-shape-depth-iJFDsajKX6
SP - 913
VL - 106
IS - 4
DP - DeepDyve
ER -