Mapping morphological shape as a high-dimensional functional curve

Mapping morphological shape as a high-dimensional functional curve Abstract Detecting how genes regulate biological shape has become a multidisciplinary research interest because of its wide application in many disciplines. Despite its fundamental importance, the challenges of accurately extracting information from an image, statistically modeling the high-dimensional shape and meticulously locating shape quantitative trait loci (QTL) affect the progress of this research. In this article, we propose a novel integrated framework that incorporates shape analysis, statistical curve modeling and genetic mapping to detect significant QTLs regulating variation of biological shape traits. After quantifying morphological shape via a radius centroid contour approach, each shape, as a phenotype, was characterized as a high-dimensional curve, varying as angle θ runs clockwise with the first point starting from angle zero. We then modeled the dynamic trajectories of three mean curves and variation patterns as functions of θ. Our framework led to the detection of a few significant QTLs regulating the variation of leaf shape collected from a natural population of poplar, Populus szechuanica var tibetica. This population, distributed at altitudes 2000–4500 m above sea level, is an evolutionarily important plant species. This is the first work in the quantitative genetic shape mapping area that emphasizes a sense of ‘function’ instead of decomposing the shape into a few discrete principal components, as the majority of shape studies do. morphological shape, quantitative shape genetic mapping, functional data analysis, Populus szechuanica var tibetica, leaf shape Introduction Tremendous variation in morphology has provided conspicuous evidence for environmental adaptation, biological evolution, natural selection and genetic involvement [1–3]. It has been reported that genes have an important role in controlling phenotypic variation in shape [4–6], and leaf shape has a high heritability [7–9]. Shape (also called form or morphology) is formally defined as the geometrical information that remains when location, scale and rotational effects are filtered out from an object [10]. Three main challenges in the quantitative genetic shape mapping research call for better approaches: (1) how to describe shape accurately, (2) how to model high-dimensional shape curves better without losing too much information and (3) how to detect shape quantitative trait loci (QTL) [7, 11–18]. In current literature, the majority of shape traits have been modeled by multivariate analysis. A few discrete landmark points are first extracted to describe a shape on a coarse scale, and then multivariate analysis approaches are implemented to model these discrete multiple points. Specifically, principal component analysis (PCA) has been the most popular approach in shape studies that projects the multidimensional space into a much lower dimension [7–9, 17, 19–31]; multivariate regression models have been used for analyzing allometry or evolutionary change in shape over time [22, 32]. Although these approaches achieved significant results for genetic shape mapping research, there still is room for improvement. A small amount of sparse landmark points neglect many subtle details of a shape, particularly irregular outlines. Arguments also exist regarding the functional interpretations of PCA [33]. Additionally, multivariate analysis does not use the dynamic trajectory of a trait and hence cannot capture the overall trends of a shape in a functional sense. As a continuous dynamic pattern can be observed underneath discrete landmark points [34], a shape curve is more biologically meaningful when treated as a function or trajectory [35] rather than multiple discrete points. This article introduces an alternative approach that improves on existing quantitative genetic shape mapping research by describing shape information using high-dimensional curves and capturing smooth trajectories underlying shape curves using functional data analysis (FDA) strategies. While FDA is a relatively mature approach in the field of statistics [36–41], it has seldom been used in shape research. As far as we know, in current quantitative shape literature, there are only two articles implementing FDA to shape analysis [42, 43]. However, these two articles had dramatically different aims and foci than our article. The aforementioned articles simply studied shape curves with the user-friendly ‘FDA’ library without involving any genetic association. The genetic association that we considered here not only involved the shape trait (as the response) and genetic variables (including a hidden variable) but also doubled the difficulties of statistical modeling because no available package or model could accurately solve our problems. This article was motivated by leaf shape data collected from a natural population of poplars, Populus Szechuanica var. tibetica. The radius centroid contour (RCC) approach extracted shape information accurately and worked well for any two-dimensional convex outlines. Fu et al. [28] described the details of how shape information was extracted from an image and how RCC transformed a shape into a high-dimensional curve [44, 45]. The main focus of this article is addressing the second and third aforementioned challenges of statistically modeling high-dimensional shape curves, and detecting significant QTLs regulating variation of shape curves. After the shape analysis step, each input image was output in the form of high-dimensional curve varying with the angle θ running clockwise. We modeled the dynamic trends of means and covariance structures of shape curves as functions of θ via nonparametric kernel regression and functional principal component analysis (FPCA) [38–41]. Unlike traditional parametric approaches that directly coped with high-dimensional covariance matrices, FPCA describes random trajectories in terms of eigenfunctions that are interpreted as modes of variation of shape curves. The framework proposed in this study was data adaptive, without assuming any prespecified parametric structures. Additionally, it incorporated shape analysis, FDA and a linkage disequilibrium (LD)-based QTL detection scheme into a novel framework to improve the methodology of quantitative shape mapping research, particularly when modeling high-dimensional shape curves. Besides shape, our approach is applicable to any other data set with curves as response and categorical variables as predictors. Mixture FDA model Let Yij, i=1,…,n; j=1,…,Ni denote the RCC value (i.e. response in statistical terminology and phenotype in genetic terminology) of the ith subject at the jth angular degree. Here, Ni was the total number of angles measured for the ith subject. To make comparisons of different shapes meaningful, we set the number of measurements consistent for each leaf (i.e. Ni = 180 for each i=1,…,n). However, we kept the subject-related notation Ni to make it easy to extend to any irregular or sparse case. As the measurement scales were also allowed to vary for different subjects, we considered θ as a random variable, denoted as θij, i=1,…,n, j=1,…,Ni, lying in a compact interval T. Then for each shape i, the original measurements were (θ⃗i,Y⃗i), i=1,…,n. We denoted two alleles of a marker as M (with allele frequency p) and m (with allele frequency 1−p) and two alleles of a QTL as A (with allele frequency q) and a (with allele frequency 1−q). The two alleles of a QTL formed three genotypes, expressed as AA (coded as c = 3), Aa (coded as c = 2) and aa (coded as c = 1). The genotype of a marker was a categorical predictor, and the genotype of a QTL was a hidden variable. We assumed that the QTL and marker were associated through a LD parameter D. The marker and QTL formed four haplotypes MA, Ma, mA and ma, with the frequencies p11=pq+D,p10=p(1−q)−D,p01=(1−p)q−D and p00=(1−p)(1−q)+D, respectively. The haplotypes from maternal and paternal parents unite randomly to generate nine marker-QTL genotype combinations. The LD-based QTL mapping model detected the hidden QTL by way of the conditional probability for a QTL genotype c given an observable marker genotype for subject i, expressed as πc|i. For a natural population satisfying the Hardy–Weinberg equilibrium, πc|i was computed from a 3 × 3 contingency table in terms of p, q and D [46]. The basic statistical idea behind QTL mapping was a mixture model, in which each observed shape Y⃗i,i=1,…,n was assumed to be attributed to one of the three QTL genotypes, and each genotypic effect on phenotype was modeled from a separate density function (assuming that each quantitative shape curve follows a Gaussian process). Specifically, the effects of a QTL responsible for variation of shape traits could be modeled as [47, 48] Yij=∑c=13ξicXic(θij)+εij, i=1,…,n; j=1,…,Ni; θij∈T, (1) where ξic is an indicator variable describing a possible QTL genotype c for subject i. That is, ξic was coded as 1 if the corresponding QTL genotype c had an effect on ith shape, and 0 otherwise. Xic(θ) is the underlying stochastic process of Y⃗i defined in L2(T) for a particular genotype (c = 1, 2, 3). The trajectories of Xic(θ) are also smooth functions of θ. εij is the experimental error assumed to be independent and identically distributed as N(0,σ2). For a fixed genotype c, we denoted the mean function of Xic(θ) as μc(θ) and the covariance function of it as Gc(θ1,θ2)=cov{Xic(θ1),Xic(θ2)}, θ1, θ2∈T. The mean function μc(θ) was interpreted as the cth genotypic effect of a QTL on the shape trait. Throughout this article, it was assumed that μc(θ) was a smooth function for θ∈T, and Gc(θ1,θ2) was a positive definite and bivariate smooth function for θ1,θ2∈T. Being ‘smooth’ means that the function is twice continuously differentiable. The idea of model (1) was that the originally observed messy shape curves could be modeled as a mixture of three smooth functions accounting for genotypic effects and additive noise. Nonparametric functional PCA model The main idea of FPCA is to interpret Gc(θ1,θ2) as the kernel of a linear mapping on the space of square-integrable functions L2(T), mapping f∈L2(T) to AGf∈L2(T), defined by Hall et al. [49] as (AGf)(θ2)=∫Tf(θ1)Gc(θ1,θ2)dθ1. An eigenfunction v(θ) of the operator AG is a solution of the equation AGv(θ) =λv(θ), with corresponding eigenvalue λ. For a fixed c, we assumed that the operator AG had a sequence of smooth orthonormal eigenfunctions vlc(θ) satisfying ∫Tvkc(θ)vlc(θ)dθ=δklc; k,l=1,…,∞ (here, δklc was the Kronecker symbol), with ordered eigenvalues λ1c≥λ2c≥…≥0. By Mercer’s theorem [50], a spectral decomposition on the function Gc(θ1,θ2), Hilbert–Schmidt kernel, yielded Gc(θ1,θ2)=∑l=1∞λlcvlc(θ1)vlc(θ2), θ1,θ2∈T. (2) Applying the generalized Fourier expansion (Karhunen–Loeve theorem proposed by Karhunen [51]), the stochastic process Xic(θ) can be decomposed into Xic(θ)=μc(θ)+∑l=1∞ζilcvlc(θ), i=1,…,n, (3) where the sum is defined in the sense of L2 convergence, with uniform convergence, and ζilc=⟨Xic−μc,vlc⟩=∫T(Xic(θ)−μc(θ))vlc(θ)dθ. (4)ζilc are uncorrelated random variables with E(ζilc)=0, and var(ζilc)=λlc, subject to L2 convergence, i.e. Σlλlc<∞. ζilc is frequently referred to as the lth functional principal component scores or the lth dominant modes of variation structure. Combining Equations (1) and (3), we obtained Yij=∑c=13ξic[μc(θij)+∑l=1∞ζilcvlc(θij)]+εij, i=1,…,n; j=1,…,Ni; θ∈T. (5) Parameter estimate From the original observed data set D={(θ⃗i,Y⃗i),i=1,…n}, we obtained estimators for all unknown parameters μ^c(θ), ζ^ilc, v^lc(θ), p^,q^,D^ and σ^2 contained in model (5). If the estimator μ^c(θ) was obtained, we could compute a rough estimator of covariance matrix by G¯ij1j2c=(Yij1−μ^c(θj1))(Yij2−μ^c(θj2)). The local linear smoother estimator G^c(θ1,θ2) for Gc(θ1,θ2) was obtained by minimizing ∑i=1nπc|i∑1≤j1≠j2≤NiK2(θij1−θ1hG,θij2−θ2hG){G¯ij1j2c−β}2, (6) with respect to β=β0 [40, 41, 47]. K2(·,·) is the bivariate nonnegative compactly supported kernel function used as weights for locally weighted least squares smoothing [47, 52]. In this article, we used the Epanechnikov function as the kernel functions [53]. Also, hG is the bandwidth corresponding to the kernel function K2. The minimization of Equation (6) yielded G^c(θ1,θ2)=β0^(θ1,θ2) [41], which could be solved in a closed form [47] as follows: G^c(θ1,θ2)=∑i=1nπc|i∑1≤j1≠j2≤NiG¯ij1j2cK2(θij1−θ1hG,θij2−θ2hG)∑i=1nπc|i∑1≤j1≠j2≤NiK2(θij1−θ1hG,θij2−θ2hG). (7) Once the smoothed covariance function G^c(θ1,θ2) was computed from Equation (7), it was then discretized on a suitable finite grid and represented as a covariance matrix. The estimators of the eigenfunctions were obtained by a corresponding spectral decomposition on G^c(θ1,θ2) [54]. To be more specific, λ^lc are the eigenvalues of G^c(θ1,θ2), given by ∫TG^c(θ1,θ2)v^lc(θ1)dθ1=λ^lcv^lc(θ2). Furthermore, the v^lc are the eigenfunctions corresponding to λ^lc, satisfying ∫Tv^lc2(θ)dθ=1 and ∫Tv^kc(θ)v^lc(θ)dθ=0 if k≠l;k,l=1,…,∞. Here, G^c also agreed with an empirical version of the expansion (2) G^c(θ1,θ2)=∑l=1∞I(λ^lc>0)λ^lcv^lc(θ1)v^lc(θ2), (8) where I is the indicator function used to keep only the terms with positive eigenvalues. The positive definiteness of the estimated covariance matrix G^c(θ1,θ2) is not always guaranteed, which might cause problems in practical applications. To avoid potential problems, Yao et al. [38] proposed a solution by only keeping positive eigenvalues [34]. If some λ^lc were negative, then we dropped these negative eigenvalues and their corresponding eigenfunctions, and reconstituted the estimate from the remaining eigenvalue and eigenfunction estimates until only positive eigenvalues were retained. Once v^lc and λ^lc were obtained, fitting trajectories required the estimation of functional principal component scores. By discretizing on Equation (4), plugging μ^c and v^lc into a Riemann sum approximation of the integral, and setting θ0=0, we obtained ζ^ilc=∑j=1Ni(Yij−μ^(θj))v^lc(θj)(θj−θj−1) (9) [34]. This approximation by sum worked well for the regular case. If the data were sparse or irregular, another approach called Principal Analysis through Conditional Expectation could be considered [39]. We then estimated the mean function μ^c(θ) by removing the covariance portion to explore the remaining terms. We defined Yij*=Yij−∑c=13ξic∑l=1∞ζ^ilcv^lc(θij), i=1,…,n,j=1,…,Ni; θij∈T. (10) Then, Equation (5) yielded Yij*=∑c=13ξicμc(θij)+εij, i=1,…,n;j=1,…,Ni; θij∈T. (11) The only unknown parameters in Equation (11) were μc(θ) and σ2. We integrated the Expectation–maximization (EM) algorithm and local polynomial kernel smoothing to estimate these parameters. Let W={ω1,…,ωm}∈T be m grid points at which the values of mean functions will be estimated. For any ωa∈W, we approximated μ^c(θj) from μ^c(ωa) for θj located within a bandwidth hu of ωa, a∈{1,…,m} and j∈{1,…,max⁡(Ni)}. The unknown parameters in Equation (11) could be estimated by maximizing the following local log likelihood ∑i=1n log ⁡[∑c=13πc|i∏j=1Nifc{Yij*|μc(θj),σ2}]K1(θj−ωahu). (12) Here, K1(·) is a nonnegative univariate compactly supported kernel function that was used as weights for local polynomial smoothing. The EM algorithm based on W={ω1,…,ωm}∈T was derived in the following steps. The E step was designed to calculate the posterior probability that subject i had a QTL genotype c given the marker genotype and phenotype data via Πic=πc|i∏j=1Nifc{Yij*|μc(θj),σ2}∑c=13πc|i∏j=1Nifc{Yij*|μc(θj),σ2}. Using these posterior probabilities, the M step was derived to estimate the haplotype frequencies [55] and all other unknown parameters via μc(ωa)=∑i=1nΠic∑j=1NiYij*K1(θj−ωahu)∑i=1nΠic∑j=1NiK1(θj−ωahu), and we then approximated  μc(θj)  from  μc(ωa), (13) σ^2=1∑i=1nNi∑i=1n∑c=13Πic∑j=1Ni[Yij*−μc(θj)]2, p^11=12N[∑i=1N1(2Πi1+Πi2)+∑i=1N2(Πi1+RΠi2)], p^10=12N[∑i=1N1(Πi2+2Πi3)+∑i=1N2(Πi3+(1−R)Πi2)], p^01=12N[∑i=1N3(2Πi1+Πi2)+∑i=1N2(Πi1+(1−R)Πi2)], p^00=12N[∑i=1N3(Πi2+2Πi1)+∑i=1N2(Πi3+RΠi2)], where R=p^11p^00/(p^11p^00+p^10p^01), and the observation numbers of three marker genotypes were denoted as N1 for MM, N2 for Mm and N3 for mm. We summarized the estimating procedures in the following steps: Set initial values for all unknown parameters. Computed the initial value of p^ from the observed marker genotype. As QTL was not observable and q was an important parameter, we extensively searched the initial values of q^, scanning from 0.1 to 0.9 with a scale of 0.1. The optimal initial value of q^ was determined by the maximum value of the log likelihood (12). Doing so allowed us to avoid the sensitivity of initial values and improved the accuracy of the estimator. The iteration is ready to start after selecting initial values for D^, μ^c(θ), ξ^ic and σ^2. The initial values of π^c|i, p^11,p^10,p^01 and p^00 are derived from the initial values of p^, q^ and D^. Computed a rough estimator of covariance matrix G¯ij1j2c. Estimated smooth covariance function G^c(θ1,θ2) by bivariate local linear smoothing Equations (6 and 7). Computed eigenfunctions v^lc(θ) and eigenvalues λ^lc of discretized G^c(θ1,θ2). Only kept the terms with positive eigenvalues. Used Equation (9) to compute functional principal component scores ζ^ilc. Computed Y^ij* from Equation (10). Applied EM algorithm to update μ^c(θ), ξ^ic, σ^2, p^11,p^10,p^01,p^00, p^,q^,D^ and π^c|i. Here, p^,q^, and D^ could be derived from p^11,p^10,p^01 and p^00 via p^=p^11+p^10, q^=p^11+p^01 and D^=p^11p^00−p^10p^01. Repeated step (2) through step (7) until all estimators converged. Tuning parameter selection The number of eigenfunctions used to approximate infinite-dimensional longitudinal process, as well as degree of smoothness, was two tuning parameters that critically affected the performance of the model. Degree of smoothness was determined by the number of grid points W={ω1,…,ωm} and bandwidths hu and hG. The density of the grid determines the accuracy of the analysis. In other words, the larger the number of grid points, the more accurate the estimation will be. Therefore, we set W={0,0.01,0.02,…,1} (100 points) as a ‘dense enough’ choice of the grid. Additionally, following the empirical suggestions of Huang et al. [47] and Wang et al. [48], we chose the number of eigenfunctions required to explain at least 80% of the total variation. The tuning parameters that needed to be selected were bandwidth hu and bandwidth hG. One-subject-leave-out cross-validation has been a popular approach to select tuning parameters. However, because of the large number of computations because of multiple markers, several initial scans of q and bandwidth selection, this would be computationally expensive. Therefore, we used Akaike information criterion (AIC) instead, as Yao et al. [39] used it to obtain similar results in a more computationally efficient manner. The AIC has the form −2L+2×df, where L is the maximum value of the log-likelihood function (12), and df is the effective degrees of freedom that measure the complexity of the model. As models (5) and (12) involve μc(θ), Gc(θ1,θ2), ζilc, vlc(θ) and local polynomial kernel smoothing based on grid points, it is not trivial to determine the degrees of freedom. Inspired by Fan et al. [56] and Wang et al. [48], we adjusted their degrees of freedom to define the shape models presented in this article as df=3+1+3*dfu+3*dfG. For the one-dimensional mean functions, the effective degree of freedom can be computed from [48, 56] dfu=τKhu−1|T|ψK, where T is the support of θ, ψK=K(0)−12∫K2(θ)dθ, and τK=K(0)−12∫K2(t)dt∫{K(t)−12(K*K)(t)}2dt. Similarly, the effective degree of freedom for the two-dimensional covariance function can be defined as [48, 56] dfG=τKhG−1|T|ψK(τKhG−1|T|ψK+1)/2. Hypothesis tests The effects of a QTL on shape variation were mainly manifested by three smooth mean functions μ1(θ),μ2(θ), and μ3(θ) in Equation (5). To draw a scientific conclusion about whether a QTL was significantly associated with shape variation, a hypothesis test was needed: H0:μ1(θ)=μ2(θ)=μ3(θ), H1: At least  one  of  the  equalities  above  did  not hold . (14)H0 corresponded to the reduced model, in which only a single mean function was involved, which indicated that a QTL had no controls for the phenotypic variation. H1 corresponded to the full model, in which three mean functions were necessary for three different genotypes. The test statistic was calculated as the log-likelihood ratio of the reduced to full model. In addition to the above hypothesis test related to the QTL’s existence, we also tested the strength of LD between the QTL and the marker. The basic assumption behind LD-based QTL model was the existence of LD between the hidden QTL and the observed marker. The significance of LD could be tested by H0:D=0 vs. H1:D≠0, (15) where H0 corresponded to the case that a marker and a QTL were at linkage equilibrium (i.e. no correlation). The test statistic for this hypothesis test was χ2=2nD2/(p(1−p)q(1−q)), which was reported to follow a χ2 distribution with one degree of freedom for a univariate trait [28, 57]. However, for the high-dimensional functional shape trait considered in this article, the assumed distribution may not hold. A significant shape-related QTL is not detected unless tests (14) and (15) are simultaneously rejected for each marker, where (14) tests for an association between QTL and shape, and (15) tests for LD between the observable marker and the detected (yet unobservable) QTL. Rejecting both tests simultaneously indicates that the QTL exists and is located around the observed marker, as opposed to the QTL being both randomly segregated and located far away from the marker. In this context, the intersection union test (IUT) approach should be used. However, as the distributions defined for a univariate simple trait may not hold for a high-dimensional complex trait, we permuted 1000 times to determine the significance for these two tests [58]. Real leaf shape data analysis We worked on a real leaf data set collected from poplars, Populus Szechuanica var. tibetica. A sample of n = 106 independent trees was randomly selected from a natural population distributed throughout the Tibet plateau. Twenty-nine microsatellite markers (16 primers) were genotyped. Meanwhile, the leaf images were collected by photographing one randomly picked representative leaf per tree. Each image was then processed using the shape analysis skills and quantitatively represented by a RCC curve of 180 dimensions. Figure 1 illustrates the detailed object and background separation, contour extraction and RCC transformation procedures [28]. Figure 1 View largeDownload slide Diagrammatic illustration of shape analysis procedure. (A) Original photo with leaf object put on top of a cloth; (B) segmentation with object represented as white and background represented as black; (C) extracted contour of leaf from the background and removed leaf stem; and (D) RCC curve by computing the radius between centroid and contour points. Figure 1 View largeDownload slide Diagrammatic illustration of shape analysis procedure. (A) Original photo with leaf object put on top of a cloth; (B) segmentation with object represented as white and background represented as black; (C) extracted contour of leaf from the background and removed leaf stem; and (D) RCC curve by computing the radius between centroid and contour points. We chose K = 4 as the optimal number of eigenfunctions because they explained 83.7% of the total variation for all leaves observed, and both the magnitudes of eigenvalues and the percentage of variation explained dropped off dramatically after the fourth order and the changes resulting from remaining eigenfunctions were trivial. The choice of grid points was W={0,0.01,0.02,…,1}. Moreover, we chose the optimal bandwidths as hu=0.01 and hG=0.05 after scanning 25 candidates via AIC from the set of hu∈{0.01,0.05,0.08,0.1,0.5} and hG∈{0.01,0.05,0.08,0.1,0.5}. Figure 2 illustrates the AIC values of all 25 combinations for these two tuning parameters. Figure 2 View largeDownload slide AIC values for 25 combinations of fine scanning on bandwidths hu and hG. Figure 2 View largeDownload slide AIC values for 25 combinations of fine scanning on bandwidths hu and hG. We performed the proposed method for each of the candidate markers. The markers determined significant by permutation are GCPM_1063 with p^=0.514, q^=0.669, D^=0.063, P-value of 0.004 for the QTL test and P-value of 0.004 for the LD test; and GCPM_1034-1 with p^=0.132, q^=0.669, D^=0.039, P-value of 0.003 for the QTL test and P-value of 0.004 for the LD test. Note that the number of permutations affects the P-values, and the reported P-values mean that only three or four among 1000 permuted test statistic values are more extreme than the observed test statistic values on real data. We demonstrate the result for the most significant finding GCPM_1063 in the following figures and paragraphs. The estimated smoothing mean functions μ^c(θ) for three genotypes of the QTL detected by marker GCPM_1063 are demonstrated in Figure 3A. The light blue curves were the RCC values of the originally observed noisy leaf shapes, from which we easily noticed that underlying dynamic patterns existed. As θ increased clockwise from 0 to 2π ( 2π≈6.28), three mean curves μ^c(θ), c = 1, 2, 3, corresponding to three different genotypic effects, showed different functional trajectories. The biggest variations were observed at θ=π/2 and θ=3π/2 where genotype aa (in black) had the highest peak. As the leaf blade was roughly bilateral symmetric, we expected to observe symmetric patterns over the left and right side of θ=π/2 and θ=3π/2, respectively. Figure 3B was the transformed version of Figure 3A from polar coordinates to xy coordinates. We computed xy coordinates by x=r cos ⁡(θ) and y=r sin ⁡(θ), with 0≤θ≤2π. The polar coordinates demonstrated the results of FDA modeling better, and the xy coordinate demonstrated the structures of shape better. The light blue circles were original noisy leaf shapes. The three colored mean contours overlaying on top of the observed leaf contours illustrated the three genotypic effects of a QTL regulating leaf shape variation detected by marker GCPM_1063. The genotype aa (in black) showed a narrow and elongated pattern (an elliptical leaf shape), Aa (in orange) showed a slightly wider and shorter pattern (an ovate leaf shape) and AA (in red) showed the shortest and widest pattern (a cordate leaf shape). The most striking shape variation regulated by the three genotypic effects of the QTL detected by marker GCPM_1063 could be observed near the leaf base, tip and length/width ratio. Figure 3 View largeDownload slide (A) Estimated mean functions μ^c(θ) of RCC curves for three genotypes of the QTL detected by marker GCPM_1063. AA in red, Aa in orange and aa in black. The light blue curves represent all original leaf shapes in the data sets. (B) Transformed μ^c(θ) from polar coordinate in Figure 3A to xy coordinate, i.e. from vector space to image domain. AA in red, Aa in orange and aa in black. Figure 3 View largeDownload slide (A) Estimated mean functions μ^c(θ) of RCC curves for three genotypes of the QTL detected by marker GCPM_1063. AA in red, Aa in orange and aa in black. The light blue curves represent all original leaf shapes in the data sets. (B) Transformed μ^c(θ) from polar coordinate in Figure 3A to xy coordinate, i.e. from vector space to image domain. AA in red, Aa in orange and aa in black. Figure 4 View largeDownload slide Four estimated eigenfunctions v^lc(θ),l=1,…,4 of RCC curves for three genotypes of the QTL detected by marker GCPM_1063. AA in red, Aa in orange and aa in black. Figure 4 View largeDownload slide Four estimated eigenfunctions v^lc(θ),l=1,…,4 of RCC curves for three genotypes of the QTL detected by marker GCPM_1063. AA in red, Aa in orange and aa in black. Four eigenfunctions vlc(θ), l=1,…,4, shown in Figure 4, were used to approximate the infinite-dimensional decomposition of the estimated smooth covariance function G^c(θ1,θ2) in Equation (8). These eigenfunctions considered not only between subject variation but also within subject variation. The first eigenfunction had trends similar to the mean function, which indicated that the largest variation was achieved near θ=π/2,π and 3π/2. The three different genotypes of the QTL detected by marker GCPM_1063 explained different percentages of total variation. The first eigenfunction v13(θ) for AA (in red) explained 44.66% percentage of total variation, v12(θ) for Aa (in orange) explained 62.96% percentage of total variation and v11(θ) for aa (in black) explained 53.86% of total variation. We also noticed that the first eigenfunction v^1c had negative values over the intervals (0,π4), (3π4,5π4), and (7π4,2π), for all c = 1, 2, 3. It indicated that a RCC curve with a positive (or negative) functional principal component score ζ^i1c along the direction of v^1c tends to have smaller (or larger) values in these intervals of θ than the overall population average. Unlike the first eigenfunction, the remaining three eigenfunctions explained small percentages of total variation from different directions. The largest absolute magnitudes of the second eigenfunction v^2c for the three genotypes happened around θ=3π/2. The third eigenfunction v^3c divided variation into smaller segments and varied every π/4 segment. The orange line of the fourth eigenfunction v^42 had negative values around the interval θ=[5π/8,10π/8], whereas red v^43 and black v^41 lines had positive signs on the same interval. Different signs indicated that the variation of genotype Aa changed in the opposite direction as those of aa and AA. In summary, the four eigenfunctions of genotype AA (in red) explained 86% of total variation, Aa (in orange) 94% and aa (in black) 90%, respectively. Discussion In this article, we integrated a shape feature extraction skill, a mixture FDA model, a LD-based QTL detection scheme and an IUT hypothesis testing approach into one framework to detect significant QTLs regulating shape variation. After quantifying shapes by the RCC approach, each image was in the format of a high-dimensional curve. Then, we used a local polynomial kernel smoothing approach to model the smooth mean function and FPCA approach to model the smooth covariance function [38–41]. The original shape curves were modeled as samples of one of the three mean functions corresponding to three genotypic effects of a QTL plus additive noise. The three genotypic effects were estimated by incorporating the EM algorithm into a mixture FDA model. Besides the white noise caused by experimental error, the sum of infinite-dimensional smooth eigenfunctions explained between- and within-subject variation [41]. Finally, the equality of mean functions for three genotypes was tested on each candidate marker to determine the significance of the effect of a QTL, and the LD parameter was tested to evaluate the strength of the LD between the detected QTL and each candidate marker. The IUT and permutation approaches were used to guarantee the simultaneous rejection of these two tests before a significant conclusion could be made. We used MATLAB to computationally implement the modeling proposed in this article. The code is available per e-mail request. We worked on a poplar leaf shape data set and detected two significant markers, GCPM_1063 and GCPM_1034-1. We observed great disparities in both mean and variance structures among three genotypes of the QTL in Figures 3 and 4, which visually confirmed the existence of the QTL. Additionally, the output (both mean and variance structures) of this approach is in the form of high-dimensional curves or functions with corresponding angles. This output makes it possible to demonstrate the association between shape changes and QTL, as well as variations in the nature and strength of the QTL effect. This data set only has limited biallelic markers. These markers neither provide full genome coverage nor the concomitant ability to pinpoint specific loci affecting the trait. Therefore, this article was not intended for making any report about important biological or genetic findings and does not restrict the number of candidate genes, gene names or locations. Despite the data set limitations, it represents the only data currently available for poplar species. Additionally, a data set simultaneously containing leaf shape photos (phenotype) and a large-scale genetic markers is indeed rare for all species. The proposed methods are applicable to other shape data with a higher density of genetic markers. If parallel could be incorporated, it would also work for the GWAS shape data (with a high-dimensional phenotype in curves, as well as a high-dimensional genotype data covering the entire genome). We overcame two challenges in current quantitative genetic shape mapping field. First, we accurately described shape quantitatively without much loss of information. Second, we efficiently modeled high-dimensional shape curves. The majority of the current shape studies have used PCA to decrease the dimension of the high-dimensional shape curves and have extracted shape feature from principal component scores as the modeling target [7–9, 17, 19–31]. However, the limitations of PCA have been noticed and reported in recent literature [33]. Specifically, the modeling of PCs (especially those PCs explaining a small amount of total variation) is sensitive to outliers, neglects the overall dynamic trends of shape, skips the intercorrelations among different shape components, yields spurious significance associations and produces false discovery genetic findings [59]. As a solution, we provided an alternative approach so that the entire shape curve (instead of the isolating PCs) could be modeled functionally. The isolating PC approach only uses partial information and therefore is less accurate and harder to interpret. Additionally, the mixture functional framework handles the heterogeneity common in genetic data [60, 61], and serves the data-driven demands better than traditional FDA. Shape analysis has been widely studied in a large span of disciplines, including agriculture [62], plant genetics [8], ecology and evolution [63], developmental biology [18, 64], public health [65] and so on. Specifically, it has been used to analyze a variety of plant characteristics, including the root shape of Japanese radishes [66], as well as the leaves of tomatoes [29], fossils [67], grapevines [9, 31], temperate rainforest trees [68] and flowers [69], to name a few. Shape analysis has also been used to model several human health-related characteristics including bones [70], teeth mandibles [71], brains [72], hearts [65], human skulls [73] and many other traits. Leaf shape has been found to be correlated with important fruit attributes such as the sugar level of fruit [8], flavor and seedlessness of grapes [9]. The shape of plant anatomy (root, grain, vegetable, leaf and fruit) plays a crucial role in improving quantity and quality of agricultural products and crop improvement, which might be attributable to the indirect impact of leaf shape on photosynthesis or the shared pathways between plant organs [66]. Additionally, the shape of agricultural products is also important for marketing, grading and classifying products in terms of commercial quality [62]. Paleontologists use the size and shape of fossilized plants to reconstruct ancient climate [67]. The analyses of butterflies suggest a correlation between flight performance and wing shape [74]. The shape and contour of a person’s teeth and mandibles have large implications for their smile, self-expression and function in society [75]. Some human diseases are manifested in abnormal shape of organs. For example, the shape analysis of the human heart may help diagnose hypertrophic cardiomyopathy and hypertensive heart disease [65]. Shape analysis of the human skull has also been used to study the evolutionary process [73]. For these reasons, this article will contribute to the improvement in methodology in the quantitative shape mapping field and promote multidisciplinary collaborations. The future applications of shape mapping in the aforementioned disciplines will extend the dimension of shape descriptors from two to three, incorporate multiple high-dimensional shapes per subject as the modeling target, study the development trajectories of shape (longitudinal shape) and consider more complex factors associated with the shape variation, such as the gene–environment interactions, gene–gene interactions, polygenic effects and so on. All of these future directions call for advanced statistical models. In addition to the general benefits of FDA models in shape mapping studies that are presented here, our framework could also handle other mixture Gaussian processes with a hidden variable and uncertain classifications, and associations between curve response (other than shape) and categorical predictors, including situations where the level of categorical predictors is unknown or is higher than three. Additionally, the proposed approach could be easily extended to sparse or irregular cases. Key Points Morphological shape is an important trait that responds to environmental adaptation, evolution, natural selection and genetic involvement. Shape has higher heritability than other univariate biological traits. The majority of shape research has implemented PCA to reduce dimensionality and extracted the shape feature after a shape is described with a few discrete landmark points on a coarse scale. FDA could be an alternative approach to directly model a high-dimensional curve after a shape is transformed into such a curve via a fine-scale shape analysis approach. Guifang Fu is an assistant professor of Statistics at Utah State University. Her research interests are shape analysis, statistical genetics and computational biology. Mian Huang is an associate professor of Statistics at Shanghai University of Finance and Economics. His research interests are nonparametric modeling and functional data analysis. Wenhao Bo is a Postdoc of plant genetics at Beijing Forestry University. His research interests are molecular genetics and plant genetics. Han Hao is a graduate student of Statistics at Pennsylvania State University. Her research interests are statistical genetics and system biology. Rongling Wu is a Changjiang Scholars Professor of Genetics at Beijing Forestry University and Distinguished Professor of Statistics at Pennsylvania State University. His research interests are shape analysis, statistical genetics, systems biology and computational biology. Acknowledgements The authors thank Xiaotian Dai for making the figures, and thank Randall Reese and Brennan Bean for the English revisions. Funding The National Science Foundation (grant number DMS1413366 to G.F.); the National Institutes of Health (grant numbers U01 HL119178 and UL1 TR000127 to R.W.); and the National Science Foundation China (grant number 11301324 and Shanghai Chenguang Program to M.H.) and the Fundamental Research Funds for the Central Universities (grant number BLX2014-22 to B.W.). References 1 Ricklefs RE , Miles DB. Ecological and evolutionary inferences from morphology: an ecological perspective. In: Ecological Morphology: Integrative Organismal Biology ,Vol. 101 . Chicago : University of Chicago Press , 1994 , 13 – 41 . 2 Reich PB. Body size, geometry, longevity and metabolism: do plant leaves behave like animal bodies? Trends Ecol Evol 2001 ; 16 ( 12 ): 674 – 80 . Google Scholar CrossRef Search ADS 3 Tsukaya H. Leaf shape: genetic controls and environmental factors . Int J Dev Biol 2005 ; 49 ( 5–6 ): 547 – 55 . Google Scholar CrossRef Search ADS PubMed 4 Van der Knaap E , Lippman Z , Tanksley S. Extremely elongated tomato fruit controlled by four quantitative trait loci with epistatic interactions . Theor Appl Genet 2002 ; 104 ( 2–3 ): 241 – 7 . Google Scholar CrossRef Search ADS PubMed 5 Tanksley SD. The genetic, developmental, and molecular bases of fruit size and shape variation in tomato . Plant Cell 2004 ; 16 ( suppl 1 ): S181 – 9 . Google Scholar CrossRef Search ADS PubMed 6 Scarpella E , Barkoulas M , Tsiantis M. Control of leaf and vein development by auxin . Cold Spring Harb Perspect Biol 2010 ; 2 ( 1 ): a001511. Google Scholar CrossRef Search ADS PubMed 7 Langlade NB , Feng X , Dransfield T , et al. Evolution through genetically controlled allometry space . Proc Natl Acad Sci USA 2005 ; 102 ( 29 ): 10221 – 6 . Google Scholar CrossRef Search ADS PubMed 8 Chitwood DH , Kumar R , Headland LR , et al. A quantitative genetic basis for leaf morphology in a set of precisely defined tomato introgression lines . Plant Cell 2013 ; 25 ( 7 ): 2465 – 81 . Google Scholar CrossRef Search ADS PubMed 9 Chitwood DH , Ranjan A , Martinez CC , et al. A modern ampelography: a genetic basis for leaf shape and venation patterning in grape . Plant Physiol 2014 ; 164 ( 1 ): 259 – 72 . Google Scholar CrossRef Search ADS PubMed 10 Kendall DG. Shape manifolds, procrustean metrics, and complex projective spaces . Bull Lond Math Soc 1984 ; 16 ( 2 ): 81 – 121 . Google Scholar CrossRef Search ADS 11 Cootes TF , Taylor CJ , Cooper DH , et al. Active shape models-their training and application . Comput Vis Image Underst 1995 ; 61 ( 1 ): 38 – 59 . Google Scholar CrossRef Search ADS 12 Dryden IL , Mardia KV. Statistical Shape Analysis . J Wiley Chichester , 1998 . 13 Abbasi S , Mokhtarian F , Kittler J. Enhancing CSS-based shape retrieval for objects with shallow concavities . Image Vis Comput 2000 ; 18 ( 3 ): 199 – 211 . Google Scholar CrossRef Search ADS 14 Davies RH , Cootes TF , Taylor CJ. A minimum description length approach to statistical shape modeling . Inf Process Med Imaging 2001 ; 2082 : 50 – 63 . Google Scholar CrossRef Search ADS 15 Zhang J , Zhang X , Krim H , et al. Object representation and recognition in shape spaces . Pattern Recognit 2003 ; 36 ( 5 ): 1143 – 54 . Google Scholar CrossRef Search ADS 16 Super BJ. Fast correspondence-based system for shape retrieval . Pattern Recognit Lett 2004 ; 25 ( 2 ): 217 – 25 . Google Scholar CrossRef Search ADS 17 Renaud S , Auffray JC , De la Porte S. Epigenetic effects on the mouse mandible: common features and discrepancies in remodeling due to muscular dystrophy and response to food consistency . BMC Evol Biol 2010 ; 10 ( 1 ): 28. Google Scholar CrossRef Search ADS PubMed 18 Klingenberg CP. Evolution and development of shape: integrating quantitative approaches . Nat Rev Genet 2010 ; 11 ( 9 ): 623 – 35 . Google Scholar CrossRef Search ADS PubMed 19 Myers EM , Janzen FJ , Adams DC , et al. Quantitative genetics of plastron shape in slider turtles (Trachemys scripta) . Evolution 2006 ; 60 ( 3 ): 563 – 72 . Google Scholar CrossRef Search ADS PubMed 20 Albert AY , Sawaya S , Vines TH , et al. The genetics of adaptive shape shift in stickleback: pleiotropy and effect size . Evolution 2008 ; 62 ( 1 ): 76 – 85 . Google Scholar PubMed 21 Bensmihen S , Hanna AI , Langlade NB , et al. Mutational spaces for leaf shape and size . HFSP J 2008 ; 2 ( 2 ): 110 – 20 . Google Scholar CrossRef Search ADS PubMed 22 Drake AG , Klingenberg CP. The pace of morphological change: historical transformation of skull shape in St Bernard dogs . Proc R Soc Lond B Biol Sci 2008 ; 275 ( 1630 ): 71 – 6 . Google Scholar CrossRef Search ADS 23 Feng X , Wilson Y , Bowers J , et al. Evolution of allometry in Antirrhinum . Plant Cell 2009 ; 21 ( 10 ): 2999 – 3007 . Google Scholar CrossRef Search ADS PubMed 24 Drake AG , Klingenberg CP. Large-scale diversification of skull shape in domestic dogs: disparity and modularity . Am Naturalist 2010 ; 175 ( 3 ): 289 – 301 . Google Scholar CrossRef Search ADS 25 Chitwood DH , Headland LR , Ranjan A , et al. Leaf asymmetry as a developmental constraint imposed by auxin-dependent phyllotactic patterning . Plant Cell 2012 ; 24 ( 6 ): 2318 – 27 . Google Scholar CrossRef Search ADS PubMed 26 Chitwood DH , Headland LR , Kumar R , et al. The developmental trajectory of leaflet morphology in wild tomato species . Plant Physiol 2012 ; 158 ( 3 ): 1230 – 40 . Google Scholar CrossRef Search ADS PubMed 27 Chitwood DH , Headland LR , Filiault DL , et al. Native environment modulates leaf size and response to simulated foliar shade across wild tomato species . PLoS One 2012 ; 7 ( 1 ): e29570. Google Scholar CrossRef Search ADS PubMed 28 Fu G , Bo W , Pang X , et al. Mapping shape quantitative trait loci using a radius-centroid-contour model . Heredity 2013 ; 110 ( 6 ): 511 – 9 . Google Scholar CrossRef Search ADS PubMed 29 Chitwood DH , Ranjan A , Kumar R , et al. Resolving distinct genetic regulators of tomato leaf shape within a heteroblastic and ontogenetic context . Plant Cell 2014 ; 26 ( 9 ): 3616 – 29 . Google Scholar CrossRef Search ADS PubMed 30 Chitwood DH , Topp CN. Revealing plant cryptotypes: defining meaningful phenotypes among infinite traits . Curr Opin Plant Biol 2015 ; 24 : 54 – 60 . Google Scholar CrossRef Search ADS PubMed 31 Chitwood DH , Klein LL , O’Hanlon R , et al. Latent developmental and evolutionary shapes embedded within the grapevine leaf . New Phytol 2015 ; 210 ( 1 ): 343 – 55 . Google Scholar CrossRef Search ADS PubMed 32 Monteiro LR. Multivariate regression models and geometric morphometrics: the search for causal factors in the analysis of shape . Syst Biol 1999 ; 48 ( 1 ): 192 – 9 . Google Scholar CrossRef Search ADS PubMed 33 Bookstein FL. The relation between geometric morphometrics and functional morphology, as explored by Procrustes interpretation of individual shape measures pertinent to function . Anat Rec 2015 ; 298 ( 1 ): 314 – 27 . Google Scholar CrossRef Search ADS 34 Müller HG. Functional modelling and classification of longitudinal data . Scand J Stat 2005 ; 32 ( 2 ): 223 – 40 . Google Scholar CrossRef Search ADS 35 Soille P. Morphological Image Analysis: Principles and Applications . Springer Science & Business Media, Springer-Verlag Berlin Heidelberg , 2013 . 36 Ramsay J , Silverman B. Functional Data Analysis . New York : Springer , 1997 . Google Scholar CrossRef Search ADS 37 Ramsay JO , Silverman BW. Applied Functional Data Analysis: Methods and Case Studies . Citeseer, Springer-Verlag, NY , 2002 . Google Scholar CrossRef Search ADS 38 Yao F , Müller HG , Clifford AJ , et al. Shrinkage estimation for functional principal component scores with application to the population kinetics of plasma folate . Biometrics 2003 ; 59 ( 3 ): 676 – 85 . Google Scholar CrossRef Search ADS PubMed 39 Yao F , Müller HG , Wang JL. Functional data analysis for sparse longitudinal data . J Am Stat Assoc 2005 ; 100 ( 470 ): 577 – 90 . Google Scholar CrossRef Search ADS 40 Yao F , Lee T. Penalized spline models for functional principal component analysis . J R Stat Soc Series B Stat Methodol 2006 ; 68 ( 1 ): 3 – 25 . Google Scholar CrossRef Search ADS 41 Müller HG , Stadtmüller U , Yao F. Functional variance processes . J Am Stat Assoc 2006 ; 101 ( 475 ): 1007 – 18 . Google Scholar CrossRef Search ADS 42 Nettel-Aguirre A. Nuclei shape analysis, a statistical approach . Image Anal Stereol 2008 ; 27 ( 1 ): 1 – 10 . Google Scholar CrossRef Search ADS 43 Epifanio I , Ventura-Campos N. Functional data analysis in shape analysis . Comput Stat Data Anal 2011 ; 55 ( 9 ): 2758 – 73 . Google Scholar CrossRef Search ADS 44 Tan KL , Ooi BC , Thiang LF. Indexing shapes in image databases using the centroid–radii model . Data Knowl Eng 2000 ; 32 ( 3 ): 271 – 89 . Google Scholar CrossRef Search ADS 45 Kong X , Luo Q , Zeng G , et al. A new shape descriptor based on centroid–radii model and wavelet transform . Opt Commun 2007 ; 273 ( 2 ): 362 – 6 . Google Scholar CrossRef Search ADS 46 Saunders G , Fu G , Stevens JR. A Graphical Weighted Power Improving Multiplicity Correction Approach for SNP Selections . Curr Genomics 2014 ; 15 ( 5 ): 380. Google Scholar CrossRef Search ADS PubMed 47 Huang M , Li R , Wang H , et al. Estimating mixture of Gaussian processes by kernel smoothing . J Bus Econ Stat 2014 ; 32 ( 2 ): 259 – 70 . Google Scholar CrossRef Search ADS PubMed 48 Wang S , Huang M , Wu X , et al. Mixture of functional linear models and its application to CO2-GDP functional data . Comput Stat Data Anal 2016 ; 97 : 1 – 15 . Google Scholar CrossRef Search ADS 49 Hall P , Müller HG , Wang JL. Properties of principal component methods for functional and longitudinal data analysis . Annal Stat 2006 ; 34 ( 3 ): 1493 – 517 . Google Scholar CrossRef Search ADS 50 Indritz J. Methods in Analysis . Macmillan , 1963 . 51 Karhunen K , Zur spektraltheorie stochastischer prozess . Annaleas Academiae Scientiarum Fennicae , 1946 ; 34 . 52 Fan J , Gijbels I. Variable bandwidth and local linear regression smoothers . Ann Stat 1992 ; 20 : 2008 – 36 . Google Scholar CrossRef Search ADS 53 Epanechnikov VA. Non-parametric estimation of a multivariate probability density . Theory Probab Appl 1969 ; 14 ( 1 ): 153 – 8 . Google Scholar CrossRef Search ADS 54 Rice JA , Silverman BW. Estimating the mean and covariance structure nonparametrically when the data are curves . J R Stat Soc Series B Methodol 1991 ; 53 : 233 – 43 . 55 Wang Z , Wu R. A statistical model for high-resolution mapping of quantitative trait loci determining HIV dynamics . Stat Med 2004 ; 23 ( 19 ): 3033 – 51 . Google Scholar CrossRef Search ADS PubMed 56 Fan J , Zhang C , Zhang J. Generalized likelihood ratio statistics and Wilks phenomenon . Ann Stat 2001 ; 29 : 153 – 93 . Google Scholar CrossRef Search ADS 57 Wu R , Ma C , Casella G. Statistical Genetics of Quantitative Traits: Linkage, Maps and Qtl . Springer Science & Business Media, Springer-Verlag, NY , 2007 . 58 Churchill GA , Doerge RW. Empirical threshold values for quantitative trait mapping . Genetics 1994 ; 138 ( 3 ): 963 – 71 . Google Scholar PubMed 59 Aschard H , Vilhjálmsson BJ , Greliche N , et al. Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies . Am J Hum Genet 2014 ; 94 ( 5 ): 662 – 76 . Google Scholar CrossRef Search ADS PubMed 60 Turnpenny PD , Ellard S. Emery’s Elements of Medical Genetics . Elsevier Health Sciences, Churchill Livingstone , 2011 . 61 McClellan J , King MC. Genetic heterogeneity in human disease . Cell 2010 ; 141 ( 2 ): 210 – 7 . Google Scholar CrossRef Search ADS PubMed 62 Costa C , Antonucci F , Pallottino F , et al. Shape analysis of agricultural products: a review of recent research advances and potential application to computer vision . Food Bioproc Tech 2011 ; 4 ( 5 ): 673 – 92 . Google Scholar CrossRef Search ADS 63 Hollander J , Collyer M , Adams D , et al. Phenotypic plasticity in two marine snails: constraints superseding life history . J Evol Biol 2006 ; 19 ( 6 ): 1861 – 72 . Google Scholar CrossRef Search ADS PubMed 64 Zelditch ML , Swiderski DL , Sheets HD. Geometric Morphometrics for Biologists: A Primer . Academic Press , 2012 . 65 Ardekani S , Jain S , Sanzi A , et al. Shape analysis of hypertrophic and hypertensive heart disease using MRI-based 3D surface models of left ventricular geometry . Med Image Anal 2016 ; 29 : 12 – 23 . Google Scholar CrossRef Search ADS PubMed 66 Iwata H , Niikura S , Matsuura S , et al. Evaluation of variation of root shape of Japanese radish (Raphanus sativus L.) based on image analysis using elliptic Fourier descriptors . Euphytica 1998 ; 102 ( 2 ): 143 – 9 . Google Scholar CrossRef Search ADS 67 Royer DL , Wilf P , Janesko DA , et al. Correlations of climate and plant ecology to leaf size and shape: potential proxies for the fossil record . Am J Botany 2005 ; 92 ( 7 ): 1141 – 51 . Google Scholar CrossRef Search ADS 68 Ostria-Gallardo E , Ranjan A , Chitwood DH , et al. Transcriptomic analysis suggests a key role for squamosa promoter binding protein like, NAC and YUCCA genes in the heteroblastic development of the temperate rainforest tree Gevuina avellana (Proteaceae) . New Phytol 2016 ; 210 ( 2 ): 694 – 708 . Google Scholar CrossRef Search ADS PubMed 69 Miao Z , Gandelin MH , Yuan B. A new image shape analysis approach and its application to flower shape analysis . Image Vis Comput 2006 ; 24 ( 10 ): 1115 – 22 . Google Scholar CrossRef Search ADS 70 Lu YC , Untaroiu CD. Statistical shape analysis of clavicular cortical bone with applications to the development of mean and boundary shape models . Comput Methods Programs Biomed 2013 ; 111 ( 3 ): 613 – 28 . Google Scholar CrossRef Search ADS PubMed 71 Polychronis G , Christou P , Mavragani M , et al. Geometric morphometric 3D shape analysis and covariation of human mandibular and maxillary first molars . Am J Phys Anthropol 2013 ; 152 ( 2 ): 186 – 96 . Google Scholar PubMed 72 Styner M , Oguz I , Xu S , et al. Framework for the statistical shape analysis of brain structures using SPHARM-PDM . Insight J 2006 ; 1071 : 242 – 50 . 73 Martínez-Abadías N , Esparza M , Sjøvold T , et al. Pervasive genetic integration directs the evolution of human skull shape . Evolution 2012 ; 66 ( 4 ): 1010 – 23 . Google Scholar CrossRef Search ADS PubMed 74 Betts C , Wootton R. Wing shape and flight behaviour in butterflies (Lepidoptera: Papilionoidea and Hesperioidea): a preliminary analysis . J Exp Biol 1988 ; 138 ( 1 ): 271 – 88 . 75 Sarver DM. Principles of cosmetic dentistry in orthodontics: part 1. Shape and proportionality of anterior teeth . Am J Orthod Dentofacial Orthop 2004 ; 126 ( 6 ): 749 – 53 . Google Scholar CrossRef Search ADS PubMed © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices) http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Briefings in Bioinformatics Oxford University Press

Mapping morphological shape as a high-dimensional functional curve

Loading next page...
 
/lp/ou_press/mapping-morphological-shape-as-a-high-dimensional-functional-curve-X62e0zrtoc
Publisher
Oxford University Press
Copyright
© The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
ISSN
1467-5463
eISSN
1477-4054
D.O.I.
10.1093/bib/bbw111
Publisher site
See Article on Publisher Site

Abstract

Abstract Detecting how genes regulate biological shape has become a multidisciplinary research interest because of its wide application in many disciplines. Despite its fundamental importance, the challenges of accurately extracting information from an image, statistically modeling the high-dimensional shape and meticulously locating shape quantitative trait loci (QTL) affect the progress of this research. In this article, we propose a novel integrated framework that incorporates shape analysis, statistical curve modeling and genetic mapping to detect significant QTLs regulating variation of biological shape traits. After quantifying morphological shape via a radius centroid contour approach, each shape, as a phenotype, was characterized as a high-dimensional curve, varying as angle θ runs clockwise with the first point starting from angle zero. We then modeled the dynamic trajectories of three mean curves and variation patterns as functions of θ. Our framework led to the detection of a few significant QTLs regulating the variation of leaf shape collected from a natural population of poplar, Populus szechuanica var tibetica. This population, distributed at altitudes 2000–4500 m above sea level, is an evolutionarily important plant species. This is the first work in the quantitative genetic shape mapping area that emphasizes a sense of ‘function’ instead of decomposing the shape into a few discrete principal components, as the majority of shape studies do. morphological shape, quantitative shape genetic mapping, functional data analysis, Populus szechuanica var tibetica, leaf shape Introduction Tremendous variation in morphology has provided conspicuous evidence for environmental adaptation, biological evolution, natural selection and genetic involvement [1–3]. It has been reported that genes have an important role in controlling phenotypic variation in shape [4–6], and leaf shape has a high heritability [7–9]. Shape (also called form or morphology) is formally defined as the geometrical information that remains when location, scale and rotational effects are filtered out from an object [10]. Three main challenges in the quantitative genetic shape mapping research call for better approaches: (1) how to describe shape accurately, (2) how to model high-dimensional shape curves better without losing too much information and (3) how to detect shape quantitative trait loci (QTL) [7, 11–18]. In current literature, the majority of shape traits have been modeled by multivariate analysis. A few discrete landmark points are first extracted to describe a shape on a coarse scale, and then multivariate analysis approaches are implemented to model these discrete multiple points. Specifically, principal component analysis (PCA) has been the most popular approach in shape studies that projects the multidimensional space into a much lower dimension [7–9, 17, 19–31]; multivariate regression models have been used for analyzing allometry or evolutionary change in shape over time [22, 32]. Although these approaches achieved significant results for genetic shape mapping research, there still is room for improvement. A small amount of sparse landmark points neglect many subtle details of a shape, particularly irregular outlines. Arguments also exist regarding the functional interpretations of PCA [33]. Additionally, multivariate analysis does not use the dynamic trajectory of a trait and hence cannot capture the overall trends of a shape in a functional sense. As a continuous dynamic pattern can be observed underneath discrete landmark points [34], a shape curve is more biologically meaningful when treated as a function or trajectory [35] rather than multiple discrete points. This article introduces an alternative approach that improves on existing quantitative genetic shape mapping research by describing shape information using high-dimensional curves and capturing smooth trajectories underlying shape curves using functional data analysis (FDA) strategies. While FDA is a relatively mature approach in the field of statistics [36–41], it has seldom been used in shape research. As far as we know, in current quantitative shape literature, there are only two articles implementing FDA to shape analysis [42, 43]. However, these two articles had dramatically different aims and foci than our article. The aforementioned articles simply studied shape curves with the user-friendly ‘FDA’ library without involving any genetic association. The genetic association that we considered here not only involved the shape trait (as the response) and genetic variables (including a hidden variable) but also doubled the difficulties of statistical modeling because no available package or model could accurately solve our problems. This article was motivated by leaf shape data collected from a natural population of poplars, Populus Szechuanica var. tibetica. The radius centroid contour (RCC) approach extracted shape information accurately and worked well for any two-dimensional convex outlines. Fu et al. [28] described the details of how shape information was extracted from an image and how RCC transformed a shape into a high-dimensional curve [44, 45]. The main focus of this article is addressing the second and third aforementioned challenges of statistically modeling high-dimensional shape curves, and detecting significant QTLs regulating variation of shape curves. After the shape analysis step, each input image was output in the form of high-dimensional curve varying with the angle θ running clockwise. We modeled the dynamic trends of means and covariance structures of shape curves as functions of θ via nonparametric kernel regression and functional principal component analysis (FPCA) [38–41]. Unlike traditional parametric approaches that directly coped with high-dimensional covariance matrices, FPCA describes random trajectories in terms of eigenfunctions that are interpreted as modes of variation of shape curves. The framework proposed in this study was data adaptive, without assuming any prespecified parametric structures. Additionally, it incorporated shape analysis, FDA and a linkage disequilibrium (LD)-based QTL detection scheme into a novel framework to improve the methodology of quantitative shape mapping research, particularly when modeling high-dimensional shape curves. Besides shape, our approach is applicable to any other data set with curves as response and categorical variables as predictors. Mixture FDA model Let Yij, i=1,…,n; j=1,…,Ni denote the RCC value (i.e. response in statistical terminology and phenotype in genetic terminology) of the ith subject at the jth angular degree. Here, Ni was the total number of angles measured for the ith subject. To make comparisons of different shapes meaningful, we set the number of measurements consistent for each leaf (i.e. Ni = 180 for each i=1,…,n). However, we kept the subject-related notation Ni to make it easy to extend to any irregular or sparse case. As the measurement scales were also allowed to vary for different subjects, we considered θ as a random variable, denoted as θij, i=1,…,n, j=1,…,Ni, lying in a compact interval T. Then for each shape i, the original measurements were (θ⃗i,Y⃗i), i=1,…,n. We denoted two alleles of a marker as M (with allele frequency p) and m (with allele frequency 1−p) and two alleles of a QTL as A (with allele frequency q) and a (with allele frequency 1−q). The two alleles of a QTL formed three genotypes, expressed as AA (coded as c = 3), Aa (coded as c = 2) and aa (coded as c = 1). The genotype of a marker was a categorical predictor, and the genotype of a QTL was a hidden variable. We assumed that the QTL and marker were associated through a LD parameter D. The marker and QTL formed four haplotypes MA, Ma, mA and ma, with the frequencies p11=pq+D,p10=p(1−q)−D,p01=(1−p)q−D and p00=(1−p)(1−q)+D, respectively. The haplotypes from maternal and paternal parents unite randomly to generate nine marker-QTL genotype combinations. The LD-based QTL mapping model detected the hidden QTL by way of the conditional probability for a QTL genotype c given an observable marker genotype for subject i, expressed as πc|i. For a natural population satisfying the Hardy–Weinberg equilibrium, πc|i was computed from a 3 × 3 contingency table in terms of p, q and D [46]. The basic statistical idea behind QTL mapping was a mixture model, in which each observed shape Y⃗i,i=1,…,n was assumed to be attributed to one of the three QTL genotypes, and each genotypic effect on phenotype was modeled from a separate density function (assuming that each quantitative shape curve follows a Gaussian process). Specifically, the effects of a QTL responsible for variation of shape traits could be modeled as [47, 48] Yij=∑c=13ξicXic(θij)+εij, i=1,…,n; j=1,…,Ni; θij∈T, (1) where ξic is an indicator variable describing a possible QTL genotype c for subject i. That is, ξic was coded as 1 if the corresponding QTL genotype c had an effect on ith shape, and 0 otherwise. Xic(θ) is the underlying stochastic process of Y⃗i defined in L2(T) for a particular genotype (c = 1, 2, 3). The trajectories of Xic(θ) are also smooth functions of θ. εij is the experimental error assumed to be independent and identically distributed as N(0,σ2). For a fixed genotype c, we denoted the mean function of Xic(θ) as μc(θ) and the covariance function of it as Gc(θ1,θ2)=cov{Xic(θ1),Xic(θ2)}, θ1, θ2∈T. The mean function μc(θ) was interpreted as the cth genotypic effect of a QTL on the shape trait. Throughout this article, it was assumed that μc(θ) was a smooth function for θ∈T, and Gc(θ1,θ2) was a positive definite and bivariate smooth function for θ1,θ2∈T. Being ‘smooth’ means that the function is twice continuously differentiable. The idea of model (1) was that the originally observed messy shape curves could be modeled as a mixture of three smooth functions accounting for genotypic effects and additive noise. Nonparametric functional PCA model The main idea of FPCA is to interpret Gc(θ1,θ2) as the kernel of a linear mapping on the space of square-integrable functions L2(T), mapping f∈L2(T) to AGf∈L2(T), defined by Hall et al. [49] as (AGf)(θ2)=∫Tf(θ1)Gc(θ1,θ2)dθ1. An eigenfunction v(θ) of the operator AG is a solution of the equation AGv(θ) =λv(θ), with corresponding eigenvalue λ. For a fixed c, we assumed that the operator AG had a sequence of smooth orthonormal eigenfunctions vlc(θ) satisfying ∫Tvkc(θ)vlc(θ)dθ=δklc; k,l=1,…,∞ (here, δklc was the Kronecker symbol), with ordered eigenvalues λ1c≥λ2c≥…≥0. By Mercer’s theorem [50], a spectral decomposition on the function Gc(θ1,θ2), Hilbert–Schmidt kernel, yielded Gc(θ1,θ2)=∑l=1∞λlcvlc(θ1)vlc(θ2), θ1,θ2∈T. (2) Applying the generalized Fourier expansion (Karhunen–Loeve theorem proposed by Karhunen [51]), the stochastic process Xic(θ) can be decomposed into Xic(θ)=μc(θ)+∑l=1∞ζilcvlc(θ), i=1,…,n, (3) where the sum is defined in the sense of L2 convergence, with uniform convergence, and ζilc=⟨Xic−μc,vlc⟩=∫T(Xic(θ)−μc(θ))vlc(θ)dθ. (4)ζilc are uncorrelated random variables with E(ζilc)=0, and var(ζilc)=λlc, subject to L2 convergence, i.e. Σlλlc<∞. ζilc is frequently referred to as the lth functional principal component scores or the lth dominant modes of variation structure. Combining Equations (1) and (3), we obtained Yij=∑c=13ξic[μc(θij)+∑l=1∞ζilcvlc(θij)]+εij, i=1,…,n; j=1,…,Ni; θ∈T. (5) Parameter estimate From the original observed data set D={(θ⃗i,Y⃗i),i=1,…n}, we obtained estimators for all unknown parameters μ^c(θ), ζ^ilc, v^lc(θ), p^,q^,D^ and σ^2 contained in model (5). If the estimator μ^c(θ) was obtained, we could compute a rough estimator of covariance matrix by G¯ij1j2c=(Yij1−μ^c(θj1))(Yij2−μ^c(θj2)). The local linear smoother estimator G^c(θ1,θ2) for Gc(θ1,θ2) was obtained by minimizing ∑i=1nπc|i∑1≤j1≠j2≤NiK2(θij1−θ1hG,θij2−θ2hG){G¯ij1j2c−β}2, (6) with respect to β=β0 [40, 41, 47]. K2(·,·) is the bivariate nonnegative compactly supported kernel function used as weights for locally weighted least squares smoothing [47, 52]. In this article, we used the Epanechnikov function as the kernel functions [53]. Also, hG is the bandwidth corresponding to the kernel function K2. The minimization of Equation (6) yielded G^c(θ1,θ2)=β0^(θ1,θ2) [41], which could be solved in a closed form [47] as follows: G^c(θ1,θ2)=∑i=1nπc|i∑1≤j1≠j2≤NiG¯ij1j2cK2(θij1−θ1hG,θij2−θ2hG)∑i=1nπc|i∑1≤j1≠j2≤NiK2(θij1−θ1hG,θij2−θ2hG). (7) Once the smoothed covariance function G^c(θ1,θ2) was computed from Equation (7), it was then discretized on a suitable finite grid and represented as a covariance matrix. The estimators of the eigenfunctions were obtained by a corresponding spectral decomposition on G^c(θ1,θ2) [54]. To be more specific, λ^lc are the eigenvalues of G^c(θ1,θ2), given by ∫TG^c(θ1,θ2)v^lc(θ1)dθ1=λ^lcv^lc(θ2). Furthermore, the v^lc are the eigenfunctions corresponding to λ^lc, satisfying ∫Tv^lc2(θ)dθ=1 and ∫Tv^kc(θ)v^lc(θ)dθ=0 if k≠l;k,l=1,…,∞. Here, G^c also agreed with an empirical version of the expansion (2) G^c(θ1,θ2)=∑l=1∞I(λ^lc>0)λ^lcv^lc(θ1)v^lc(θ2), (8) where I is the indicator function used to keep only the terms with positive eigenvalues. The positive definiteness of the estimated covariance matrix G^c(θ1,θ2) is not always guaranteed, which might cause problems in practical applications. To avoid potential problems, Yao et al. [38] proposed a solution by only keeping positive eigenvalues [34]. If some λ^lc were negative, then we dropped these negative eigenvalues and their corresponding eigenfunctions, and reconstituted the estimate from the remaining eigenvalue and eigenfunction estimates until only positive eigenvalues were retained. Once v^lc and λ^lc were obtained, fitting trajectories required the estimation of functional principal component scores. By discretizing on Equation (4), plugging μ^c and v^lc into a Riemann sum approximation of the integral, and setting θ0=0, we obtained ζ^ilc=∑j=1Ni(Yij−μ^(θj))v^lc(θj)(θj−θj−1) (9) [34]. This approximation by sum worked well for the regular case. If the data were sparse or irregular, another approach called Principal Analysis through Conditional Expectation could be considered [39]. We then estimated the mean function μ^c(θ) by removing the covariance portion to explore the remaining terms. We defined Yij*=Yij−∑c=13ξic∑l=1∞ζ^ilcv^lc(θij), i=1,…,n,j=1,…,Ni; θij∈T. (10) Then, Equation (5) yielded Yij*=∑c=13ξicμc(θij)+εij, i=1,…,n;j=1,…,Ni; θij∈T. (11) The only unknown parameters in Equation (11) were μc(θ) and σ2. We integrated the Expectation–maximization (EM) algorithm and local polynomial kernel smoothing to estimate these parameters. Let W={ω1,…,ωm}∈T be m grid points at which the values of mean functions will be estimated. For any ωa∈W, we approximated μ^c(θj) from μ^c(ωa) for θj located within a bandwidth hu of ωa, a∈{1,…,m} and j∈{1,…,max⁡(Ni)}. The unknown parameters in Equation (11) could be estimated by maximizing the following local log likelihood ∑i=1n log ⁡[∑c=13πc|i∏j=1Nifc{Yij*|μc(θj),σ2}]K1(θj−ωahu). (12) Here, K1(·) is a nonnegative univariate compactly supported kernel function that was used as weights for local polynomial smoothing. The EM algorithm based on W={ω1,…,ωm}∈T was derived in the following steps. The E step was designed to calculate the posterior probability that subject i had a QTL genotype c given the marker genotype and phenotype data via Πic=πc|i∏j=1Nifc{Yij*|μc(θj),σ2}∑c=13πc|i∏j=1Nifc{Yij*|μc(θj),σ2}. Using these posterior probabilities, the M step was derived to estimate the haplotype frequencies [55] and all other unknown parameters via μc(ωa)=∑i=1nΠic∑j=1NiYij*K1(θj−ωahu)∑i=1nΠic∑j=1NiK1(θj−ωahu), and we then approximated  μc(θj)  from  μc(ωa), (13) σ^2=1∑i=1nNi∑i=1n∑c=13Πic∑j=1Ni[Yij*−μc(θj)]2, p^11=12N[∑i=1N1(2Πi1+Πi2)+∑i=1N2(Πi1+RΠi2)], p^10=12N[∑i=1N1(Πi2+2Πi3)+∑i=1N2(Πi3+(1−R)Πi2)], p^01=12N[∑i=1N3(2Πi1+Πi2)+∑i=1N2(Πi1+(1−R)Πi2)], p^00=12N[∑i=1N3(Πi2+2Πi1)+∑i=1N2(Πi3+RΠi2)], where R=p^11p^00/(p^11p^00+p^10p^01), and the observation numbers of three marker genotypes were denoted as N1 for MM, N2 for Mm and N3 for mm. We summarized the estimating procedures in the following steps: Set initial values for all unknown parameters. Computed the initial value of p^ from the observed marker genotype. As QTL was not observable and q was an important parameter, we extensively searched the initial values of q^, scanning from 0.1 to 0.9 with a scale of 0.1. The optimal initial value of q^ was determined by the maximum value of the log likelihood (12). Doing so allowed us to avoid the sensitivity of initial values and improved the accuracy of the estimator. The iteration is ready to start after selecting initial values for D^, μ^c(θ), ξ^ic and σ^2. The initial values of π^c|i, p^11,p^10,p^01 and p^00 are derived from the initial values of p^, q^ and D^. Computed a rough estimator of covariance matrix G¯ij1j2c. Estimated smooth covariance function G^c(θ1,θ2) by bivariate local linear smoothing Equations (6 and 7). Computed eigenfunctions v^lc(θ) and eigenvalues λ^lc of discretized G^c(θ1,θ2). Only kept the terms with positive eigenvalues. Used Equation (9) to compute functional principal component scores ζ^ilc. Computed Y^ij* from Equation (10). Applied EM algorithm to update μ^c(θ), ξ^ic, σ^2, p^11,p^10,p^01,p^00, p^,q^,D^ and π^c|i. Here, p^,q^, and D^ could be derived from p^11,p^10,p^01 and p^00 via p^=p^11+p^10, q^=p^11+p^01 and D^=p^11p^00−p^10p^01. Repeated step (2) through step (7) until all estimators converged. Tuning parameter selection The number of eigenfunctions used to approximate infinite-dimensional longitudinal process, as well as degree of smoothness, was two tuning parameters that critically affected the performance of the model. Degree of smoothness was determined by the number of grid points W={ω1,…,ωm} and bandwidths hu and hG. The density of the grid determines the accuracy of the analysis. In other words, the larger the number of grid points, the more accurate the estimation will be. Therefore, we set W={0,0.01,0.02,…,1} (100 points) as a ‘dense enough’ choice of the grid. Additionally, following the empirical suggestions of Huang et al. [47] and Wang et al. [48], we chose the number of eigenfunctions required to explain at least 80% of the total variation. The tuning parameters that needed to be selected were bandwidth hu and bandwidth hG. One-subject-leave-out cross-validation has been a popular approach to select tuning parameters. However, because of the large number of computations because of multiple markers, several initial scans of q and bandwidth selection, this would be computationally expensive. Therefore, we used Akaike information criterion (AIC) instead, as Yao et al. [39] used it to obtain similar results in a more computationally efficient manner. The AIC has the form −2L+2×df, where L is the maximum value of the log-likelihood function (12), and df is the effective degrees of freedom that measure the complexity of the model. As models (5) and (12) involve μc(θ), Gc(θ1,θ2), ζilc, vlc(θ) and local polynomial kernel smoothing based on grid points, it is not trivial to determine the degrees of freedom. Inspired by Fan et al. [56] and Wang et al. [48], we adjusted their degrees of freedom to define the shape models presented in this article as df=3+1+3*dfu+3*dfG. For the one-dimensional mean functions, the effective degree of freedom can be computed from [48, 56] dfu=τKhu−1|T|ψK, where T is the support of θ, ψK=K(0)−12∫K2(θ)dθ, and τK=K(0)−12∫K2(t)dt∫{K(t)−12(K*K)(t)}2dt. Similarly, the effective degree of freedom for the two-dimensional covariance function can be defined as [48, 56] dfG=τKhG−1|T|ψK(τKhG−1|T|ψK+1)/2. Hypothesis tests The effects of a QTL on shape variation were mainly manifested by three smooth mean functions μ1(θ),μ2(θ), and μ3(θ) in Equation (5). To draw a scientific conclusion about whether a QTL was significantly associated with shape variation, a hypothesis test was needed: H0:μ1(θ)=μ2(θ)=μ3(θ), H1: At least  one  of  the  equalities  above  did  not hold . (14)H0 corresponded to the reduced model, in which only a single mean function was involved, which indicated that a QTL had no controls for the phenotypic variation. H1 corresponded to the full model, in which three mean functions were necessary for three different genotypes. The test statistic was calculated as the log-likelihood ratio of the reduced to full model. In addition to the above hypothesis test related to the QTL’s existence, we also tested the strength of LD between the QTL and the marker. The basic assumption behind LD-based QTL model was the existence of LD between the hidden QTL and the observed marker. The significance of LD could be tested by H0:D=0 vs. H1:D≠0, (15) where H0 corresponded to the case that a marker and a QTL were at linkage equilibrium (i.e. no correlation). The test statistic for this hypothesis test was χ2=2nD2/(p(1−p)q(1−q)), which was reported to follow a χ2 distribution with one degree of freedom for a univariate trait [28, 57]. However, for the high-dimensional functional shape trait considered in this article, the assumed distribution may not hold. A significant shape-related QTL is not detected unless tests (14) and (15) are simultaneously rejected for each marker, where (14) tests for an association between QTL and shape, and (15) tests for LD between the observable marker and the detected (yet unobservable) QTL. Rejecting both tests simultaneously indicates that the QTL exists and is located around the observed marker, as opposed to the QTL being both randomly segregated and located far away from the marker. In this context, the intersection union test (IUT) approach should be used. However, as the distributions defined for a univariate simple trait may not hold for a high-dimensional complex trait, we permuted 1000 times to determine the significance for these two tests [58]. Real leaf shape data analysis We worked on a real leaf data set collected from poplars, Populus Szechuanica var. tibetica. A sample of n = 106 independent trees was randomly selected from a natural population distributed throughout the Tibet plateau. Twenty-nine microsatellite markers (16 primers) were genotyped. Meanwhile, the leaf images were collected by photographing one randomly picked representative leaf per tree. Each image was then processed using the shape analysis skills and quantitatively represented by a RCC curve of 180 dimensions. Figure 1 illustrates the detailed object and background separation, contour extraction and RCC transformation procedures [28]. Figure 1 View largeDownload slide Diagrammatic illustration of shape analysis procedure. (A) Original photo with leaf object put on top of a cloth; (B) segmentation with object represented as white and background represented as black; (C) extracted contour of leaf from the background and removed leaf stem; and (D) RCC curve by computing the radius between centroid and contour points. Figure 1 View largeDownload slide Diagrammatic illustration of shape analysis procedure. (A) Original photo with leaf object put on top of a cloth; (B) segmentation with object represented as white and background represented as black; (C) extracted contour of leaf from the background and removed leaf stem; and (D) RCC curve by computing the radius between centroid and contour points. We chose K = 4 as the optimal number of eigenfunctions because they explained 83.7% of the total variation for all leaves observed, and both the magnitudes of eigenvalues and the percentage of variation explained dropped off dramatically after the fourth order and the changes resulting from remaining eigenfunctions were trivial. The choice of grid points was W={0,0.01,0.02,…,1}. Moreover, we chose the optimal bandwidths as hu=0.01 and hG=0.05 after scanning 25 candidates via AIC from the set of hu∈{0.01,0.05,0.08,0.1,0.5} and hG∈{0.01,0.05,0.08,0.1,0.5}. Figure 2 illustrates the AIC values of all 25 combinations for these two tuning parameters. Figure 2 View largeDownload slide AIC values for 25 combinations of fine scanning on bandwidths hu and hG. Figure 2 View largeDownload slide AIC values for 25 combinations of fine scanning on bandwidths hu and hG. We performed the proposed method for each of the candidate markers. The markers determined significant by permutation are GCPM_1063 with p^=0.514, q^=0.669, D^=0.063, P-value of 0.004 for the QTL test and P-value of 0.004 for the LD test; and GCPM_1034-1 with p^=0.132, q^=0.669, D^=0.039, P-value of 0.003 for the QTL test and P-value of 0.004 for the LD test. Note that the number of permutations affects the P-values, and the reported P-values mean that only three or four among 1000 permuted test statistic values are more extreme than the observed test statistic values on real data. We demonstrate the result for the most significant finding GCPM_1063 in the following figures and paragraphs. The estimated smoothing mean functions μ^c(θ) for three genotypes of the QTL detected by marker GCPM_1063 are demonstrated in Figure 3A. The light blue curves were the RCC values of the originally observed noisy leaf shapes, from which we easily noticed that underlying dynamic patterns existed. As θ increased clockwise from 0 to 2π ( 2π≈6.28), three mean curves μ^c(θ), c = 1, 2, 3, corresponding to three different genotypic effects, showed different functional trajectories. The biggest variations were observed at θ=π/2 and θ=3π/2 where genotype aa (in black) had the highest peak. As the leaf blade was roughly bilateral symmetric, we expected to observe symmetric patterns over the left and right side of θ=π/2 and θ=3π/2, respectively. Figure 3B was the transformed version of Figure 3A from polar coordinates to xy coordinates. We computed xy coordinates by x=r cos ⁡(θ) and y=r sin ⁡(θ), with 0≤θ≤2π. The polar coordinates demonstrated the results of FDA modeling better, and the xy coordinate demonstrated the structures of shape better. The light blue circles were original noisy leaf shapes. The three colored mean contours overlaying on top of the observed leaf contours illustrated the three genotypic effects of a QTL regulating leaf shape variation detected by marker GCPM_1063. The genotype aa (in black) showed a narrow and elongated pattern (an elliptical leaf shape), Aa (in orange) showed a slightly wider and shorter pattern (an ovate leaf shape) and AA (in red) showed the shortest and widest pattern (a cordate leaf shape). The most striking shape variation regulated by the three genotypic effects of the QTL detected by marker GCPM_1063 could be observed near the leaf base, tip and length/width ratio. Figure 3 View largeDownload slide (A) Estimated mean functions μ^c(θ) of RCC curves for three genotypes of the QTL detected by marker GCPM_1063. AA in red, Aa in orange and aa in black. The light blue curves represent all original leaf shapes in the data sets. (B) Transformed μ^c(θ) from polar coordinate in Figure 3A to xy coordinate, i.e. from vector space to image domain. AA in red, Aa in orange and aa in black. Figure 3 View largeDownload slide (A) Estimated mean functions μ^c(θ) of RCC curves for three genotypes of the QTL detected by marker GCPM_1063. AA in red, Aa in orange and aa in black. The light blue curves represent all original leaf shapes in the data sets. (B) Transformed μ^c(θ) from polar coordinate in Figure 3A to xy coordinate, i.e. from vector space to image domain. AA in red, Aa in orange and aa in black. Figure 4 View largeDownload slide Four estimated eigenfunctions v^lc(θ),l=1,…,4 of RCC curves for three genotypes of the QTL detected by marker GCPM_1063. AA in red, Aa in orange and aa in black. Figure 4 View largeDownload slide Four estimated eigenfunctions v^lc(θ),l=1,…,4 of RCC curves for three genotypes of the QTL detected by marker GCPM_1063. AA in red, Aa in orange and aa in black. Four eigenfunctions vlc(θ), l=1,…,4, shown in Figure 4, were used to approximate the infinite-dimensional decomposition of the estimated smooth covariance function G^c(θ1,θ2) in Equation (8). These eigenfunctions considered not only between subject variation but also within subject variation. The first eigenfunction had trends similar to the mean function, which indicated that the largest variation was achieved near θ=π/2,π and 3π/2. The three different genotypes of the QTL detected by marker GCPM_1063 explained different percentages of total variation. The first eigenfunction v13(θ) for AA (in red) explained 44.66% percentage of total variation, v12(θ) for Aa (in orange) explained 62.96% percentage of total variation and v11(θ) for aa (in black) explained 53.86% of total variation. We also noticed that the first eigenfunction v^1c had negative values over the intervals (0,π4), (3π4,5π4), and (7π4,2π), for all c = 1, 2, 3. It indicated that a RCC curve with a positive (or negative) functional principal component score ζ^i1c along the direction of v^1c tends to have smaller (or larger) values in these intervals of θ than the overall population average. Unlike the first eigenfunction, the remaining three eigenfunctions explained small percentages of total variation from different directions. The largest absolute magnitudes of the second eigenfunction v^2c for the three genotypes happened around θ=3π/2. The third eigenfunction v^3c divided variation into smaller segments and varied every π/4 segment. The orange line of the fourth eigenfunction v^42 had negative values around the interval θ=[5π/8,10π/8], whereas red v^43 and black v^41 lines had positive signs on the same interval. Different signs indicated that the variation of genotype Aa changed in the opposite direction as those of aa and AA. In summary, the four eigenfunctions of genotype AA (in red) explained 86% of total variation, Aa (in orange) 94% and aa (in black) 90%, respectively. Discussion In this article, we integrated a shape feature extraction skill, a mixture FDA model, a LD-based QTL detection scheme and an IUT hypothesis testing approach into one framework to detect significant QTLs regulating shape variation. After quantifying shapes by the RCC approach, each image was in the format of a high-dimensional curve. Then, we used a local polynomial kernel smoothing approach to model the smooth mean function and FPCA approach to model the smooth covariance function [38–41]. The original shape curves were modeled as samples of one of the three mean functions corresponding to three genotypic effects of a QTL plus additive noise. The three genotypic effects were estimated by incorporating the EM algorithm into a mixture FDA model. Besides the white noise caused by experimental error, the sum of infinite-dimensional smooth eigenfunctions explained between- and within-subject variation [41]. Finally, the equality of mean functions for three genotypes was tested on each candidate marker to determine the significance of the effect of a QTL, and the LD parameter was tested to evaluate the strength of the LD between the detected QTL and each candidate marker. The IUT and permutation approaches were used to guarantee the simultaneous rejection of these two tests before a significant conclusion could be made. We used MATLAB to computationally implement the modeling proposed in this article. The code is available per e-mail request. We worked on a poplar leaf shape data set and detected two significant markers, GCPM_1063 and GCPM_1034-1. We observed great disparities in both mean and variance structures among three genotypes of the QTL in Figures 3 and 4, which visually confirmed the existence of the QTL. Additionally, the output (both mean and variance structures) of this approach is in the form of high-dimensional curves or functions with corresponding angles. This output makes it possible to demonstrate the association between shape changes and QTL, as well as variations in the nature and strength of the QTL effect. This data set only has limited biallelic markers. These markers neither provide full genome coverage nor the concomitant ability to pinpoint specific loci affecting the trait. Therefore, this article was not intended for making any report about important biological or genetic findings and does not restrict the number of candidate genes, gene names or locations. Despite the data set limitations, it represents the only data currently available for poplar species. Additionally, a data set simultaneously containing leaf shape photos (phenotype) and a large-scale genetic markers is indeed rare for all species. The proposed methods are applicable to other shape data with a higher density of genetic markers. If parallel could be incorporated, it would also work for the GWAS shape data (with a high-dimensional phenotype in curves, as well as a high-dimensional genotype data covering the entire genome). We overcame two challenges in current quantitative genetic shape mapping field. First, we accurately described shape quantitatively without much loss of information. Second, we efficiently modeled high-dimensional shape curves. The majority of the current shape studies have used PCA to decrease the dimension of the high-dimensional shape curves and have extracted shape feature from principal component scores as the modeling target [7–9, 17, 19–31]. However, the limitations of PCA have been noticed and reported in recent literature [33]. Specifically, the modeling of PCs (especially those PCs explaining a small amount of total variation) is sensitive to outliers, neglects the overall dynamic trends of shape, skips the intercorrelations among different shape components, yields spurious significance associations and produces false discovery genetic findings [59]. As a solution, we provided an alternative approach so that the entire shape curve (instead of the isolating PCs) could be modeled functionally. The isolating PC approach only uses partial information and therefore is less accurate and harder to interpret. Additionally, the mixture functional framework handles the heterogeneity common in genetic data [60, 61], and serves the data-driven demands better than traditional FDA. Shape analysis has been widely studied in a large span of disciplines, including agriculture [62], plant genetics [8], ecology and evolution [63], developmental biology [18, 64], public health [65] and so on. Specifically, it has been used to analyze a variety of plant characteristics, including the root shape of Japanese radishes [66], as well as the leaves of tomatoes [29], fossils [67], grapevines [9, 31], temperate rainforest trees [68] and flowers [69], to name a few. Shape analysis has also been used to model several human health-related characteristics including bones [70], teeth mandibles [71], brains [72], hearts [65], human skulls [73] and many other traits. Leaf shape has been found to be correlated with important fruit attributes such as the sugar level of fruit [8], flavor and seedlessness of grapes [9]. The shape of plant anatomy (root, grain, vegetable, leaf and fruit) plays a crucial role in improving quantity and quality of agricultural products and crop improvement, which might be attributable to the indirect impact of leaf shape on photosynthesis or the shared pathways between plant organs [66]. Additionally, the shape of agricultural products is also important for marketing, grading and classifying products in terms of commercial quality [62]. Paleontologists use the size and shape of fossilized plants to reconstruct ancient climate [67]. The analyses of butterflies suggest a correlation between flight performance and wing shape [74]. The shape and contour of a person’s teeth and mandibles have large implications for their smile, self-expression and function in society [75]. Some human diseases are manifested in abnormal shape of organs. For example, the shape analysis of the human heart may help diagnose hypertrophic cardiomyopathy and hypertensive heart disease [65]. Shape analysis of the human skull has also been used to study the evolutionary process [73]. For these reasons, this article will contribute to the improvement in methodology in the quantitative shape mapping field and promote multidisciplinary collaborations. The future applications of shape mapping in the aforementioned disciplines will extend the dimension of shape descriptors from two to three, incorporate multiple high-dimensional shapes per subject as the modeling target, study the development trajectories of shape (longitudinal shape) and consider more complex factors associated with the shape variation, such as the gene–environment interactions, gene–gene interactions, polygenic effects and so on. All of these future directions call for advanced statistical models. In addition to the general benefits of FDA models in shape mapping studies that are presented here, our framework could also handle other mixture Gaussian processes with a hidden variable and uncertain classifications, and associations between curve response (other than shape) and categorical predictors, including situations where the level of categorical predictors is unknown or is higher than three. Additionally, the proposed approach could be easily extended to sparse or irregular cases. Key Points Morphological shape is an important trait that responds to environmental adaptation, evolution, natural selection and genetic involvement. Shape has higher heritability than other univariate biological traits. The majority of shape research has implemented PCA to reduce dimensionality and extracted the shape feature after a shape is described with a few discrete landmark points on a coarse scale. FDA could be an alternative approach to directly model a high-dimensional curve after a shape is transformed into such a curve via a fine-scale shape analysis approach. Guifang Fu is an assistant professor of Statistics at Utah State University. Her research interests are shape analysis, statistical genetics and computational biology. Mian Huang is an associate professor of Statistics at Shanghai University of Finance and Economics. His research interests are nonparametric modeling and functional data analysis. Wenhao Bo is a Postdoc of plant genetics at Beijing Forestry University. His research interests are molecular genetics and plant genetics. Han Hao is a graduate student of Statistics at Pennsylvania State University. Her research interests are statistical genetics and system biology. Rongling Wu is a Changjiang Scholars Professor of Genetics at Beijing Forestry University and Distinguished Professor of Statistics at Pennsylvania State University. His research interests are shape analysis, statistical genetics, systems biology and computational biology. Acknowledgements The authors thank Xiaotian Dai for making the figures, and thank Randall Reese and Brennan Bean for the English revisions. Funding The National Science Foundation (grant number DMS1413366 to G.F.); the National Institutes of Health (grant numbers U01 HL119178 and UL1 TR000127 to R.W.); and the National Science Foundation China (grant number 11301324 and Shanghai Chenguang Program to M.H.) and the Fundamental Research Funds for the Central Universities (grant number BLX2014-22 to B.W.). References 1 Ricklefs RE , Miles DB. Ecological and evolutionary inferences from morphology: an ecological perspective. In: Ecological Morphology: Integrative Organismal Biology ,Vol. 101 . Chicago : University of Chicago Press , 1994 , 13 – 41 . 2 Reich PB. Body size, geometry, longevity and metabolism: do plant leaves behave like animal bodies? Trends Ecol Evol 2001 ; 16 ( 12 ): 674 – 80 . Google Scholar CrossRef Search ADS 3 Tsukaya H. Leaf shape: genetic controls and environmental factors . Int J Dev Biol 2005 ; 49 ( 5–6 ): 547 – 55 . Google Scholar CrossRef Search ADS PubMed 4 Van der Knaap E , Lippman Z , Tanksley S. Extremely elongated tomato fruit controlled by four quantitative trait loci with epistatic interactions . Theor Appl Genet 2002 ; 104 ( 2–3 ): 241 – 7 . Google Scholar CrossRef Search ADS PubMed 5 Tanksley SD. The genetic, developmental, and molecular bases of fruit size and shape variation in tomato . Plant Cell 2004 ; 16 ( suppl 1 ): S181 – 9 . Google Scholar CrossRef Search ADS PubMed 6 Scarpella E , Barkoulas M , Tsiantis M. Control of leaf and vein development by auxin . Cold Spring Harb Perspect Biol 2010 ; 2 ( 1 ): a001511. Google Scholar CrossRef Search ADS PubMed 7 Langlade NB , Feng X , Dransfield T , et al. Evolution through genetically controlled allometry space . Proc Natl Acad Sci USA 2005 ; 102 ( 29 ): 10221 – 6 . Google Scholar CrossRef Search ADS PubMed 8 Chitwood DH , Kumar R , Headland LR , et al. A quantitative genetic basis for leaf morphology in a set of precisely defined tomato introgression lines . Plant Cell 2013 ; 25 ( 7 ): 2465 – 81 . Google Scholar CrossRef Search ADS PubMed 9 Chitwood DH , Ranjan A , Martinez CC , et al. A modern ampelography: a genetic basis for leaf shape and venation patterning in grape . Plant Physiol 2014 ; 164 ( 1 ): 259 – 72 . Google Scholar CrossRef Search ADS PubMed 10 Kendall DG. Shape manifolds, procrustean metrics, and complex projective spaces . Bull Lond Math Soc 1984 ; 16 ( 2 ): 81 – 121 . Google Scholar CrossRef Search ADS 11 Cootes TF , Taylor CJ , Cooper DH , et al. Active shape models-their training and application . Comput Vis Image Underst 1995 ; 61 ( 1 ): 38 – 59 . Google Scholar CrossRef Search ADS 12 Dryden IL , Mardia KV. Statistical Shape Analysis . J Wiley Chichester , 1998 . 13 Abbasi S , Mokhtarian F , Kittler J. Enhancing CSS-based shape retrieval for objects with shallow concavities . Image Vis Comput 2000 ; 18 ( 3 ): 199 – 211 . Google Scholar CrossRef Search ADS 14 Davies RH , Cootes TF , Taylor CJ. A minimum description length approach to statistical shape modeling . Inf Process Med Imaging 2001 ; 2082 : 50 – 63 . Google Scholar CrossRef Search ADS 15 Zhang J , Zhang X , Krim H , et al. Object representation and recognition in shape spaces . Pattern Recognit 2003 ; 36 ( 5 ): 1143 – 54 . Google Scholar CrossRef Search ADS 16 Super BJ. Fast correspondence-based system for shape retrieval . Pattern Recognit Lett 2004 ; 25 ( 2 ): 217 – 25 . Google Scholar CrossRef Search ADS 17 Renaud S , Auffray JC , De la Porte S. Epigenetic effects on the mouse mandible: common features and discrepancies in remodeling due to muscular dystrophy and response to food consistency . BMC Evol Biol 2010 ; 10 ( 1 ): 28. Google Scholar CrossRef Search ADS PubMed 18 Klingenberg CP. Evolution and development of shape: integrating quantitative approaches . Nat Rev Genet 2010 ; 11 ( 9 ): 623 – 35 . Google Scholar CrossRef Search ADS PubMed 19 Myers EM , Janzen FJ , Adams DC , et al. Quantitative genetics of plastron shape in slider turtles (Trachemys scripta) . Evolution 2006 ; 60 ( 3 ): 563 – 72 . Google Scholar CrossRef Search ADS PubMed 20 Albert AY , Sawaya S , Vines TH , et al. The genetics of adaptive shape shift in stickleback: pleiotropy and effect size . Evolution 2008 ; 62 ( 1 ): 76 – 85 . Google Scholar PubMed 21 Bensmihen S , Hanna AI , Langlade NB , et al. Mutational spaces for leaf shape and size . HFSP J 2008 ; 2 ( 2 ): 110 – 20 . Google Scholar CrossRef Search ADS PubMed 22 Drake AG , Klingenberg CP. The pace of morphological change: historical transformation of skull shape in St Bernard dogs . Proc R Soc Lond B Biol Sci 2008 ; 275 ( 1630 ): 71 – 6 . Google Scholar CrossRef Search ADS 23 Feng X , Wilson Y , Bowers J , et al. Evolution of allometry in Antirrhinum . Plant Cell 2009 ; 21 ( 10 ): 2999 – 3007 . Google Scholar CrossRef Search ADS PubMed 24 Drake AG , Klingenberg CP. Large-scale diversification of skull shape in domestic dogs: disparity and modularity . Am Naturalist 2010 ; 175 ( 3 ): 289 – 301 . Google Scholar CrossRef Search ADS 25 Chitwood DH , Headland LR , Ranjan A , et al. Leaf asymmetry as a developmental constraint imposed by auxin-dependent phyllotactic patterning . Plant Cell 2012 ; 24 ( 6 ): 2318 – 27 . Google Scholar CrossRef Search ADS PubMed 26 Chitwood DH , Headland LR , Kumar R , et al. The developmental trajectory of leaflet morphology in wild tomato species . Plant Physiol 2012 ; 158 ( 3 ): 1230 – 40 . Google Scholar CrossRef Search ADS PubMed 27 Chitwood DH , Headland LR , Filiault DL , et al. Native environment modulates leaf size and response to simulated foliar shade across wild tomato species . PLoS One 2012 ; 7 ( 1 ): e29570. Google Scholar CrossRef Search ADS PubMed 28 Fu G , Bo W , Pang X , et al. Mapping shape quantitative trait loci using a radius-centroid-contour model . Heredity 2013 ; 110 ( 6 ): 511 – 9 . Google Scholar CrossRef Search ADS PubMed 29 Chitwood DH , Ranjan A , Kumar R , et al. Resolving distinct genetic regulators of tomato leaf shape within a heteroblastic and ontogenetic context . Plant Cell 2014 ; 26 ( 9 ): 3616 – 29 . Google Scholar CrossRef Search ADS PubMed 30 Chitwood DH , Topp CN. Revealing plant cryptotypes: defining meaningful phenotypes among infinite traits . Curr Opin Plant Biol 2015 ; 24 : 54 – 60 . Google Scholar CrossRef Search ADS PubMed 31 Chitwood DH , Klein LL , O’Hanlon R , et al. Latent developmental and evolutionary shapes embedded within the grapevine leaf . New Phytol 2015 ; 210 ( 1 ): 343 – 55 . Google Scholar CrossRef Search ADS PubMed 32 Monteiro LR. Multivariate regression models and geometric morphometrics: the search for causal factors in the analysis of shape . Syst Biol 1999 ; 48 ( 1 ): 192 – 9 . Google Scholar CrossRef Search ADS PubMed 33 Bookstein FL. The relation between geometric morphometrics and functional morphology, as explored by Procrustes interpretation of individual shape measures pertinent to function . Anat Rec 2015 ; 298 ( 1 ): 314 – 27 . Google Scholar CrossRef Search ADS 34 Müller HG. Functional modelling and classification of longitudinal data . Scand J Stat 2005 ; 32 ( 2 ): 223 – 40 . Google Scholar CrossRef Search ADS 35 Soille P. Morphological Image Analysis: Principles and Applications . Springer Science & Business Media, Springer-Verlag Berlin Heidelberg , 2013 . 36 Ramsay J , Silverman B. Functional Data Analysis . New York : Springer , 1997 . Google Scholar CrossRef Search ADS 37 Ramsay JO , Silverman BW. Applied Functional Data Analysis: Methods and Case Studies . Citeseer, Springer-Verlag, NY , 2002 . Google Scholar CrossRef Search ADS 38 Yao F , Müller HG , Clifford AJ , et al. Shrinkage estimation for functional principal component scores with application to the population kinetics of plasma folate . Biometrics 2003 ; 59 ( 3 ): 676 – 85 . Google Scholar CrossRef Search ADS PubMed 39 Yao F , Müller HG , Wang JL. Functional data analysis for sparse longitudinal data . J Am Stat Assoc 2005 ; 100 ( 470 ): 577 – 90 . Google Scholar CrossRef Search ADS 40 Yao F , Lee T. Penalized spline models for functional principal component analysis . J R Stat Soc Series B Stat Methodol 2006 ; 68 ( 1 ): 3 – 25 . Google Scholar CrossRef Search ADS 41 Müller HG , Stadtmüller U , Yao F. Functional variance processes . J Am Stat Assoc 2006 ; 101 ( 475 ): 1007 – 18 . Google Scholar CrossRef Search ADS 42 Nettel-Aguirre A. Nuclei shape analysis, a statistical approach . Image Anal Stereol 2008 ; 27 ( 1 ): 1 – 10 . Google Scholar CrossRef Search ADS 43 Epifanio I , Ventura-Campos N. Functional data analysis in shape analysis . Comput Stat Data Anal 2011 ; 55 ( 9 ): 2758 – 73 . Google Scholar CrossRef Search ADS 44 Tan KL , Ooi BC , Thiang LF. Indexing shapes in image databases using the centroid–radii model . Data Knowl Eng 2000 ; 32 ( 3 ): 271 – 89 . Google Scholar CrossRef Search ADS 45 Kong X , Luo Q , Zeng G , et al. A new shape descriptor based on centroid–radii model and wavelet transform . Opt Commun 2007 ; 273 ( 2 ): 362 – 6 . Google Scholar CrossRef Search ADS 46 Saunders G , Fu G , Stevens JR. A Graphical Weighted Power Improving Multiplicity Correction Approach for SNP Selections . Curr Genomics 2014 ; 15 ( 5 ): 380. Google Scholar CrossRef Search ADS PubMed 47 Huang M , Li R , Wang H , et al. Estimating mixture of Gaussian processes by kernel smoothing . J Bus Econ Stat 2014 ; 32 ( 2 ): 259 – 70 . Google Scholar CrossRef Search ADS PubMed 48 Wang S , Huang M , Wu X , et al. Mixture of functional linear models and its application to CO2-GDP functional data . Comput Stat Data Anal 2016 ; 97 : 1 – 15 . Google Scholar CrossRef Search ADS 49 Hall P , Müller HG , Wang JL. Properties of principal component methods for functional and longitudinal data analysis . Annal Stat 2006 ; 34 ( 3 ): 1493 – 517 . Google Scholar CrossRef Search ADS 50 Indritz J. Methods in Analysis . Macmillan , 1963 . 51 Karhunen K , Zur spektraltheorie stochastischer prozess . Annaleas Academiae Scientiarum Fennicae , 1946 ; 34 . 52 Fan J , Gijbels I. Variable bandwidth and local linear regression smoothers . Ann Stat 1992 ; 20 : 2008 – 36 . Google Scholar CrossRef Search ADS 53 Epanechnikov VA. Non-parametric estimation of a multivariate probability density . Theory Probab Appl 1969 ; 14 ( 1 ): 153 – 8 . Google Scholar CrossRef Search ADS 54 Rice JA , Silverman BW. Estimating the mean and covariance structure nonparametrically when the data are curves . J R Stat Soc Series B Methodol 1991 ; 53 : 233 – 43 . 55 Wang Z , Wu R. A statistical model for high-resolution mapping of quantitative trait loci determining HIV dynamics . Stat Med 2004 ; 23 ( 19 ): 3033 – 51 . Google Scholar CrossRef Search ADS PubMed 56 Fan J , Zhang C , Zhang J. Generalized likelihood ratio statistics and Wilks phenomenon . Ann Stat 2001 ; 29 : 153 – 93 . Google Scholar CrossRef Search ADS 57 Wu R , Ma C , Casella G. Statistical Genetics of Quantitative Traits: Linkage, Maps and Qtl . Springer Science & Business Media, Springer-Verlag, NY , 2007 . 58 Churchill GA , Doerge RW. Empirical threshold values for quantitative trait mapping . Genetics 1994 ; 138 ( 3 ): 963 – 71 . Google Scholar PubMed 59 Aschard H , Vilhjálmsson BJ , Greliche N , et al. Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies . Am J Hum Genet 2014 ; 94 ( 5 ): 662 – 76 . Google Scholar CrossRef Search ADS PubMed 60 Turnpenny PD , Ellard S. Emery’s Elements of Medical Genetics . Elsevier Health Sciences, Churchill Livingstone , 2011 . 61 McClellan J , King MC. Genetic heterogeneity in human disease . Cell 2010 ; 141 ( 2 ): 210 – 7 . Google Scholar CrossRef Search ADS PubMed 62 Costa C , Antonucci F , Pallottino F , et al. Shape analysis of agricultural products: a review of recent research advances and potential application to computer vision . Food Bioproc Tech 2011 ; 4 ( 5 ): 673 – 92 . Google Scholar CrossRef Search ADS 63 Hollander J , Collyer M , Adams D , et al. Phenotypic plasticity in two marine snails: constraints superseding life history . J Evol Biol 2006 ; 19 ( 6 ): 1861 – 72 . Google Scholar CrossRef Search ADS PubMed 64 Zelditch ML , Swiderski DL , Sheets HD. Geometric Morphometrics for Biologists: A Primer . Academic Press , 2012 . 65 Ardekani S , Jain S , Sanzi A , et al. Shape analysis of hypertrophic and hypertensive heart disease using MRI-based 3D surface models of left ventricular geometry . Med Image Anal 2016 ; 29 : 12 – 23 . Google Scholar CrossRef Search ADS PubMed 66 Iwata H , Niikura S , Matsuura S , et al. Evaluation of variation of root shape of Japanese radish (Raphanus sativus L.) based on image analysis using elliptic Fourier descriptors . Euphytica 1998 ; 102 ( 2 ): 143 – 9 . Google Scholar CrossRef Search ADS 67 Royer DL , Wilf P , Janesko DA , et al. Correlations of climate and plant ecology to leaf size and shape: potential proxies for the fossil record . Am J Botany 2005 ; 92 ( 7 ): 1141 – 51 . Google Scholar CrossRef Search ADS 68 Ostria-Gallardo E , Ranjan A , Chitwood DH , et al. Transcriptomic analysis suggests a key role for squamosa promoter binding protein like, NAC and YUCCA genes in the heteroblastic development of the temperate rainforest tree Gevuina avellana (Proteaceae) . New Phytol 2016 ; 210 ( 2 ): 694 – 708 . Google Scholar CrossRef Search ADS PubMed 69 Miao Z , Gandelin MH , Yuan B. A new image shape analysis approach and its application to flower shape analysis . Image Vis Comput 2006 ; 24 ( 10 ): 1115 – 22 . Google Scholar CrossRef Search ADS 70 Lu YC , Untaroiu CD. Statistical shape analysis of clavicular cortical bone with applications to the development of mean and boundary shape models . Comput Methods Programs Biomed 2013 ; 111 ( 3 ): 613 – 28 . Google Scholar CrossRef Search ADS PubMed 71 Polychronis G , Christou P , Mavragani M , et al. Geometric morphometric 3D shape analysis and covariation of human mandibular and maxillary first molars . Am J Phys Anthropol 2013 ; 152 ( 2 ): 186 – 96 . Google Scholar PubMed 72 Styner M , Oguz I , Xu S , et al. Framework for the statistical shape analysis of brain structures using SPHARM-PDM . Insight J 2006 ; 1071 : 242 – 50 . 73 Martínez-Abadías N , Esparza M , Sjøvold T , et al. Pervasive genetic integration directs the evolution of human skull shape . Evolution 2012 ; 66 ( 4 ): 1010 – 23 . Google Scholar CrossRef Search ADS PubMed 74 Betts C , Wootton R. Wing shape and flight behaviour in butterflies (Lepidoptera: Papilionoidea and Hesperioidea): a preliminary analysis . J Exp Biol 1988 ; 138 ( 1 ): 271 – 88 . 75 Sarver DM. Principles of cosmetic dentistry in orthodontics: part 1. Shape and proportionality of anterior teeth . Am J Orthod Dentofacial Orthop 2004 ; 126 ( 6 ): 749 – 53 . Google Scholar CrossRef Search ADS PubMed © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Journal

Briefings in BioinformaticsOxford University Press

Published: Jan 6, 2017

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off