Background: Maize (Zea mays ssp. mays) is 1 of 3 crops, along with rice and wheat, responsible for more than one-half of all calories consumed around the world. Increasing the yield and stress tolerance of these crops is essential to meet the growing need for food. The cost and speed of plant phenotyping are currently the largest constraints on plant breeding efforts. Datasets linking new types of high-throughput phenotyping data collected from plants to the performance of the same genotypes under agronomic conditions across a wide range of environments are essential for developing new statistical approaches and computer vision–based tools. Findings A set of maize inbreds—primarily recently off patent lines—were phenotyped using a high-throughput platform at University of Nebraska-Lincoln. These lines have been previously subjected to high-density genotyping and scored for a core set of 13 phenotypes in field trials across 13 North American states in 2 years by the Genomes 2 Fields Consortium. A total of 485 GB of image data including RGB, hyperspectral, fluorescence, and thermal infrared photos has been released. Conclusions Correlations between image-based measurements and manual measurements demonstrated the feasibility of quantifying variation in plant architecture using image data. However, naive approaches to measuring traits such as biomass can introduce nonrandom measurement errors confounded with genotype variation. Analysis of hyperspectral image data demonstrated unique signatures from stem tissue. Integrating heritable phenotypes from high-throughput phenotyping data with field data from different environments can reveal previously unknown factors that influence yield plasticity. Keywords: maize; image; phenomics; field-phenotype Data description higher yield potentially. Since the green revolution, the need for food has continued to increase, and a great deal of effort in the Background public and private sectors is devoted to developing crop varieties with higher yield potential. However, as the low-hanging fruit The green revolution created a significant increase in the yields for increased yield vanish, each new increase in yield requires of several major crops in the 1960s and 1970s, dramatically re- more time and resources. Recent studies have demonstrated ducing the prevalence of hunger and famine around the world, that yield increases may have slowed or stopped for some major even as population growth continued. One of the major compo- nents of the green revolution was new varieties of major grain grain crops in large regions of the world . New approaches to plant breeding must be developed if crop production continues crops produced through conventional phenotypic selection with Received: 26 July 2017; Revised: 21 September 2017; Accepted: 23 November 2017 The Author(s) 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, Downloaded from https://academic.oup.com/gigascience/article-abstract/7/2/1/4656251 provided the original work is properly cited. by Ed 'DeepDyve' Gillespie user on 16 March 2018 1 2 Liang et al. to grow to meet the needs of an increasing population around gle nucleotide polymorphism (SNP) markers using genotype by the world. sequencing (GBS) . Toward these existing data, we added The major bottleneck in modern plant breeding is phenotyp- RGB, thermal infrared, fluorescent and hyperspectral images ing. Phenotyping can be used in 2 ways. First, by phenotyping a collected once per day per plant, as well as detailed water use large set of lines, a plant breeder can identify those lines with information (single-day, single-plant resolution). At the end of the highest yield potential and/or greatest stress tolerance in the experiment, 12 different types of ground truth phenotypes a given environment. Second, sufficiently detailed phenotyping were measured for individual plants, including destructive mea- measurements from enough different plants can be combined surements. A second experiment focused on interactions be- with genotypic data to identify regions of the genome of a par- tween genotype and environmental stress, collecting the same ticular plant species that carry beneficial or deleterious alleles. types of data described above from 2 maize genotypes under The breeder can then develop new crop varieties that incorpo- well–watered and water-stressed conditions . We are releas- rate as many beneficial alleles and exclude as many deleterious ing this curated dataset of high-throughput plant phenotyping alleles as possible. Phenotyping tends to be expensive and low images from accessions where data on both genotypic variation throughput, yet as breeders seek to identify larger numbers of and agronomic performance under field conditions are already alleles, each with individually smaller effects, the amount of available. All data were generated using a Lemnatec-designed phenotyping required to achieve a given increase in yield po- high-throughput greenhouse-based phenotyping system con- tential is growing. High-throughput computer vision–based ap- structed at the University of Nebraska-Lincoln. This system is proaches to plant phenotyping have the potential to ameliorate distinguished from existing public sector phenotyping systems this bottleneck. These tools can be used to precisely quantify in North America by both the ability to grow plants to a height even subtle traits in plants and will tend to decrease in unit cost of 2.5 meters and the incorporation of a hyperspectral camera with scale, while conventional phenotyping, which remains a . Given the unique properties described above, this compre- human labor–intensive process, does not. hensive dataset should lower the barriers to the development Several recent pilot studies have applied a range of im- of new computer vision approaches or statistical methodologies age processing techniques to extract phenotypic measurements by independent researchers who do not have the funding or in- from crop plants. RGB (R: Red channel; G: Green channel; B: Blue frastructure to generate the wide range of different types of data channel) camera technology, widely used in the consumer sec- needed. tor, has also been the most widely used tool in these initial ef- forts at computer vision–based plant phenotyping [2–5]. Other Methods types of cameras including fluoresence [ 6,7] and near-infrared Greenhouse management (NIR) [6,8,9] have also been employed in high-throughput plant All imaged plants were grown in the greenhouse facility of the phenotyping efforts, primarily in studies of the response of University of Nebraska-Lincoln’s Greenhouse Innovation Center plants to different abiotic stresses. (Latitude: 40.83, Longitude: −96.69) between 2 October 2015 to However, the utility of current studies is limited in 2 ways. 10 November 2015. Kernels were sown in 1.5 gallon pots with First, current analysis tools can extract only a small number of Fafard germination mix supplemented with 1 cup (236 mL) of different phenotypic measurements from images of crop plants. Osmocote plus 15 September 2012 and 1 tablespoon (15 mL) of Approximately 150 tools for analyzing plant image data are Micromax Micronutrients per 2.8 cubic feet (80 L) of soil. The tar- listed in a field-specific database; however, the majority of these get photoperiod was 14:10 with supplementary light provided by either are developed specifically for Arabidopsis thaliana,which is light-emitting diode (LED) growth lamps from 07:00 to 21:00 each a model plant, or are designed specifically to analyze images of day. The target temperature of the growth facility was between roots . Second, a great deal of image data is generated in con- 24−26 C. Pots were weighed once per day and watered back to trolled environments; however, there have been comparatively a target weight of 5400 grams from 10 September 2015 to 11 July few attempts to link phenotypic measurements in the green- 2015 and a target weight of 5500 grams from 11 August 2015 to house to performance in the field. However, one recent report the termination of the experiment. in maize suggested that more than 50% of the total variation in yield under field conditions could be predicted using traits mea- sured under controlled environments . Experimental design A total of 156 plants, representing the 32 genotypes listed in Advances in computational tools for extracting phenotypic measurements of plants from image data and statistical models Table 1, were grown and imaged, as well as 4 pots with soil but for predicting yield under different field conditions from such no plant, which serve as controls for the amount of water lost from the soil as a result of nontranspiration mechanisms (e.g., measurements require suitable training datasets. Here, we gen- erate and validate such a dataset consisting of high-throughput evaporation). The 156 plants plus control pots were arranged in a 10-row by 16-column grid, with 0.235-meter spacing between phenotyping data from 32 distinct maize (Zea mays) accessions, drawn primarily from recently off-patent lines developed by ma- plants in the same row and 1.5-meter spacing between rows (Table 2). Sequential pairs of 2 rows consisted of a complete repli- jor plant breeding companies. These accessions were selected specifically because paired data from the same lines exist for a cate with either 31 genotypes and 1 empty control pot or 32 genotypes. Within each pair of rows, genotypes were blocked in wide range of plant phenotypes collected in 54 distinct field tri- als at locations spanning 13 North American states or provinces groups of 8 (one half row), with order randomized within blocks between replicates in order to maximize statistical power to an- over 2 years . This extremely broad set of field sites captures much of the environmental variation among areas in which alyze within-greenhouse variation. maize is cultivated, with total rainfall during the growing sea- son ranging from 133.604 mm to 960.628 mm (excluding sites Plant imaging with supplemental irrigation) and peak temperatures during the The plants were imaged daily using 4 different cameras in sep- ◦ ◦ growing season ranging from 23.5 C to 34.9 C. In addition, the arate imaging chambers. The 4 types of cameras were thermal same lines have been genotyped for approximately 200 000 sin- infrared, fluorescence, conventional RGB, and hyperspectral [ 12]. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/2/1/4656251 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Phenotyping of exPVP maize 3 Table 1: 32 genotypes in maize phenotype map mainder of the experiment. The fluorescence camera captured images with a resolution of 1038 × 1390 pixels and measured Genotype ID Genotype Source Released year emission intensity at wavelengths between 500–750 nm based on excitation, with light at 400–500 nm. Plants were imaged us- ZL1 740 Novartis Seeds 1998 ing the same 3 perspectives employed for the thermal infrared ZL2 2369 Cargill 1989 camera. The RGB camera captured images with a resolution of ZL3 A619 Public sector 1992 2454 × 2056 pixels. Initially the zoom of the RGB camera in side ZL4 A632 Public sector 1992 views was set such that each pixel corresponded to 0.746 mm at ZL5 A634 Public sector 1992 the distance of the pot from the camera. Between 5 November ZL6 B14 Public sector 1968 2015 and 10 November 2015, the zoom level of the RGB camera ZL7 B37 Public sector 1971 was reduced to keep the entire plant in the frame of the image. ZL8 B73 Public sector 1972 As a result of a system error, this same decreased zoom level ZL9 C103 Public sector 1991 ZL10 CM105 Public sector 1992 was also applied to all RGB images taken on 20 October 2015. ZL11 LH123HT Holden’s Foundation 1984 At this reduced zoom level, each pixel corresponds to 1.507 mm ZL12 LH145 Holden’s Foundation 1983 at the distance of the pot from the camera, an approximate ×2 ZL13 LH162 Holden’s Foundation 1990 change. Plants were also imaged using the same 3 perspectives ZL14 LH195 Holden’s Foundation 1989 employed for the thermal infrared camera. The hyperspectral ZL15 LH198 Holden’s Foundation 1991 camera captured images with a resolution of 320 horizontal pix- ZL16 LH74 Holden’s Foundation 1983 els. As a result of the scanning technology employed, vertical ZL17 LH82 Holden’s Foundation 1985 resolution ranged from 494 to 499 pixels. Hyperspectral imaging ZL18 Mo17 Public sector 1964 was conducted using illumination from halogen bulbs (Manu- ZL19 DKPB80 DEKALB Genetics ? facturer Sylvania, model # ES50 HM UK 240V 35W 25 GU10). ZL20 PH207 Pioneer Hi-Bred 1983 A total of 243 separate intensity values were captured for each ZL21 PHB47 Pioneer Hi-Bred 1983 pixel spanning a range of light wavelengths between 546 nm– ZL22 PHG35 Pioneer Hi-Bred 1983 1700 nm. Data from each wavelength was stored as a separate ZL23 PHG39 Pioneer Hi-Bred 1983 grayscale image. ZL24 PHG47 Pioneer Hi-Bred 1986 ZL25 PHG83 Pioneer Hi-Bred 1985 ZL26 PHJ40 Pioneer Hi-Bred 1986 Ground truth measurement ZL27 PHN82 Pioneer Hi-Bred 1989 Ground truth measurements were collected at the termina- ZL28 PHV63 Pioneer Hi-Bred 1988 tion of data collection on 11-12 November 2015. Manually ZL29 PHW52 Pioneer Hi-Bred 1988 collected phenotypes included plant height, total number of vis- ZL30 PHZ51 Pioneer Hi-Bred 1986 ible leaves, number of total fully extended leaves, stem diame- ZL31 W117HT Public sector 1982 ter at the base of the plant, stem diameter at the collar of the ZL32 Wf9 Public sector 1991 top fully extended leaf, length and width of top fully extended a leaf, and presence/absence of visible anthocyanin production in Not currently available for order. the stem. After these measurements, total above-ground fresh Genotype represented by only a single plant in the dataset. weight biomass was measured for 4 out of 5 replicates, result- ing in the destruction of the plants. Ground truth data for the Images were collected in the order that the camera types are drought-stressed subset of this dataset were collected following listed in the previous sentence. On each day, plants were imaged the procedure previously described in Ge et al. . sequentially by row, starting with row 1 column 1 and conclud- ing with row 10, column 16 (Table 2). Plants were imaged from the side at 2 angles offset 90 degrees RGB image processing from each other, as well as a top down view. On the first day of Pixels covering portions of the plant were segmented out of RGB imaging or when plants reached the 2-leaf stage of development, images using a green index ((2×G)/(R+B)). Pixels with an index the pot was rotated so that the major axis of leaf phylotaxy was value greater than 1.15  were considered plant pixels. This parallel to the camera in the PA0 orientation and perpendicular method produced some false-positive plant pixels within the re- to the camera in the PA90 orientation. This orientation is con- flective metal columns at the edge of the image. To reduce the sistent for all cameras and was not adjusted again for the re- impact of false positives, these areas were excluded from the Table 2: Experimental layout (ID: ZL1-ZL32) 9 7 310 23 25 26 19 13529 212 418 20 UP UP UP UP 11 16 1 321727 6 22243114301528 8 12 UP UP UP UP 29 31 15 13 1 17 25 9 21 30 3 5 19 14 6 UP UP UP UP 12 23 32 16 7 28 2 18 10 11 8 26 27 4 20 24 UP UP UP UP 25921 27 28 12511 156 7 4 23 31 20 UP UP UP UP 19 32 29 24 16 13 3 8 17 14 18 30 10 26 1 2 UP UP UP UP 8 1 17 23 21 5 7 24 27 18 3 11 31 15 19 2 NA NA NA NA 25 30 4 9 16 32 14 20 10 6 2928122613NANANANA 15 10 532 31 21 16 26218 9256 824 NA NA NA NA 29 13 23 14 27 7 11 30 12 1 28 4 3 20 17 19 NA NA NA NA At the time this experiment was conducted, the total size of the UNL greenhouse system was 10 rows by 20 columns. Positions marked with UP indicate pots fi lled with plants from an unrelated experiment, while positions marked with NA indicate pots that had no plants. The first complete replicate is shown in colo r, and the 4 incomplete blocks within the first replicate are marked in different colors. Marks empty pots within the experimental design. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/2/1/4656251 by Ed 'DeepDyve' Gillespie user on 16 March 2018 4 Liang et al. analysis. Therefore, when plant leaves cross the reflective metal on the difference in reflectance between stem and leaves at frame, some true plant pixels were excluded. If no plant pixels wavelengths of 1056 nm and 1151 nm, the stem was seg- were identified in the image-often the case in the first several mented from other parts of the plant by selecting pixels where days when the plant had either not germinated or had not risen (R /R ) produced a value greater than 1.2. Leaf pixels 1056nm 1151nm above the edge of the pot-the value was recorded as “NA” in the were defined as pixels identified as plant pixels based on NDVI output file. but not classified as stem pixels. In addition to the biological variation between individual plants, overall intensity variation Heritability analysis existed both between different plants imaged on the same day A linear regression model was used to analyze the genotype and the same plant on different days as a result of changes in effect (excluding genotype ZL22, which lacked replication) and the performance of the lighting used in the hyperspectral imag- greenhouse position effect on plant traits. The responses were ing chamber. To calibrate each individual image and make the modeled independently for each day as results comparable, a python script (hosted on Github; see the Code availability section) was used to normalize the intensity values of each plant pixel using data from the nonplant pixels y = μ + α + γ + , (1) h,ij,t h,t h,i,t h,ν(i, j),t h,ij,t in the same image. In order to visualize variation across 243 separate wavelength where the subscript h = 1, ..., 6 denotes the 3 responses ex- measurements across multiple plant images, we used a prin- tracted from the images: plant height, width, and size for the cipal component analysis (PCA)–based approach. After the nor- 2 views 0 and 90 degrees. The subscripts i, j,and t denote the malization described above, PCA analysis of intensity values for jth plant in the ith row and day t, respectively, and ν(i, j)stands individual pixels was conducted. PCA values of each individ- for the genotype at this pot. The parameters α and γ denote row ual plant pixel per analyzed plant were translated to intensity effect and genotype effect, respectively. The error term is . h,ij,t values using the formula [x-min(x)]/[max(x)-min(x)]. False color Let SS ,SS ,and SS be the sum of squares of the regression α,t γ ,t ,t RGB images were constructed with the values for the first prin- model (1) for the row effect, genotype effect, and the error at cipal component stored in the red channel, the second principal time t, respectively. Let SS = SS + SS + SS be the total sum t α,t γ ,t ,t component in the green channel, and the third principal com- of squares at time t. The heritability HR (2) of a given trait within ponent in the blue channel. this population was defined as the ratio of the genotype sum of squares over the sum of genotype and error sum of squares. Fluorescence image processing For the estimate of the heritability of measurement error, the A consistent area of interest was defined for each zoom level row effect term was replaced by a replicate effect (each replicate to exclude the pot and nonuniform areas of the imaging cham- consisted of 2 sequential rows), with exclusion of ZL22 as only 1 ber backdrop. Within that area, pixels with an intensity value plant of this genotype was grown. greater than 70 in the red channel were considered to be plant pixels. The aggregate fluorescence intensity was defined as the SS γ,t HR = . (2) t sum of the red channel intensity values for all pixels classi- SS + SS ,t γ,t fied as plant pixels within the region of interest, and the mean fluorescence intensity as the aggregate fluorescence intensity As the heritability index may change over the growth of the value divided by the number of plant pixels within the region of plant, a nonparametric smoothing method was provided for an- interest. alyzing the time-varying heritability of plants. The definition in (3) excludes the variation brought by the greenhouse row effect, Plant biomass prediction which can be considered the percentage of the variation in plant Two methods were used to predict plant biomass. The first was response that can be explained by the genotype effect after ad- a single variable model based on the number of zoom level– justing the environmental effect. To compare with this defini- adjusted plant pixels identified in the 2 RGB side view images tion of heritability (2), the response in the model without con- on a given day. The second was a multivariate model based upon sidering the row effect was constructed as the sum of plant pixels identified in the 2 RGB side views, sum of plant pixels identified in the 2 RGB side views plus the RGB top y = μ + γ + , (3) h,ij,t h,t h,ν(i, j),t h,ij,t view, aggregate fluorescence intensity in the 2 side views, aggre- gate fluorescence intensity in the 2 side views plus the top view, where, similarly as (1), ν(i, j) is the genotype of the jth plant in number of plant stem pixels identified in the hyperspectral im- the ith row. Let SS and SS be the genotype sum of squares γ,t t age, and number of plant leaf pixels identified in the hyperspec- and total sum of squares under (4). The classical heritability is tral image. Traits were selected to overlap with those employed defined as by Chen et al.  where possible. This multivariate dataset was used to predict plant biomass using linear modeling as well as SS γ,t Multivariate Adaptive Regression Splines (MARS), Random For- HR = . (4) SS t est, and Support Vector Machines (SVM) . MARS analysis was performed using the R package earth , Random Forest with the R package randomForest , and SVM with the R package Hyperspectral image processing e1071 . Two methods and thresholds were used to extract plant re- gions of interest from hyperspectral images. First, the com- Data validation and quality control monly used Normalized Difference Vegetation Index (NDVI) formula was applied to all pixels using the formula (R - Validation against ground truth measurements 750nm R )/(R +R ), and pixels with a value greater than 0.25 A total of approximately 500 GB of image data was ini- 705nm 750nm 705nm were classified as originating from the plant [ 13]. Second, based tially generated by the system during the course of this Downloaded from https://academic.oup.com/gigascience/article-abstract/7/2/1/4656251 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Phenotyping of exPVP maize 5 Figure 1: Segmentation of images into plant and not-plant pixels for 1 representative plant (path to this image in the released dataset: Genotype˙ZL019 −> Plant˙008-19 −> Image˙Type −> Day˙32). The area enclosed by the green border is composed of pixels scored as “plant,” the area outside the green border is composed of pixels scored as “not-plant.” Minimum bounding rectangle of plant pixels is shown in red. (A) Side view, angle 1. (B) Side view, 90 degree rotation relative to A. (C) Top view. experiment, consisting of RGB images (51.1%), fluorescence im- of error that is controlled by genetic factors rather than ran- ages (4.3%), and hyperspectral images (44.6%). A subset of the dom error can be ascertained. We determined that 58% of the RGB images within this dataset were previously analyzed in total error in biomass estimate was controlled by genetic vari- Choudhury et al. , and were made available for download ation between different maize lines. As such, this error is sys- from http://plantvision.unl.edu/dataset under the terms of the tematic rather than random and thus more likely to produce Toronto Agreement. To validate the dataset and ensure that misleading downstream results when used in quantitative ge- plants had been properly tracked through both the automated netic analysis. As mentioned above, biomass and plant size are imaging system and ground truth measurements, a simple imperfectly correlated, as different plants can exhibit different script was written to segment images into plant and not-plant densities, for example, as a result of different leaf-to-stem ra- pixels (Fig. 1). Source codes for all validation analyses are posted tios. Recent reports have suggested that estimates of biomass online . incorporating multiple traits extracted from image data can in- Based on the segmentation of the image into plant and non- crease accuracy . We tested the accuracy of biomass predic- plant pixels, plant height was scored as the y-axis dimension tion of 4 multivariate estimation techniques on this dataset (see of the minimum bounding box. Plant area was scored as the to- the Methods). The correlations of the estimated biomass mea- tal number of plant pixels observed in both side view images sures with ground truth data were 0.949, 0.958, 0.925, and 0.951 after correcting for the area of each pixel at each zoom em- for multivariate linear model, MARS, Random Forest, and SVM, ployed (see the Methods). Similar approaches to estimate plant respectively. However, even when employing the most accurate biomass have been widely employed across a range of grain of these 4 methods (MARS), 63% of the error in biomass estima- crop species including rice , wheat , barley [21,22], maize tion could be explained by genetic factors. This source of error, , sorghum , and seteria . Calculated values were com- with the biomass of some lines systematically underestimated pared with manual measurements of plant height and plant and the biomass of other lines systematically overestimated, fresh biomass, which were quantified using destructive meth- presents a significant challenge to downstream quantitative ge- ods on the last day of the experiment. In both cases, manual netic analysis, given the prevalence of plant pixel counts as a measurements and image-derived estimates were highly corre- proxy for biomass [9,12,20–23]. lated, although the correlation between manual and estimated height was greater than the correlation between manually mea- Patterns of change over time sured and estimated biomass (Fig. 2A, B). Using the PlantCV soft- One of the desirable aspects of image-based plant phenotyp- ware package , equivalent correlations between estimated ing is that, unlike destructively measured phenotypes, the same and ground truth biomass were obtained (r = 0.91). Estimates plant can be imaged repeatedly. Instead of providing a snapshot of biomass using both software packages were more correlated in time, this allows researchers to quantify rates of change in with each other (r = 0.96) than either was with ground truth phenotypic values over time, providing an additional set of de- measurements. This suggests that a significant fraction of the rived trait values. Given the issues with biomass quantification remaining error is the result of the expected imperfect corre- presented above, measurements of plant height were selected lation between plant size and plant mass, rather than inaccu- to validate patterns of change in phenotypic values over time. racies in estimating plant size using individual software pack- As expected, height increases over time, and the patterns of in- ages. Recent reports have suggested that estimates of biomass crease tended to cluster together by genotype (Fig. 3A). Increases incorporating multiple traits extracted from image data can in- in height followed by declines, as observed for ZL26, were de- crease accuracy . We tested the accuracy of biomass predic- termined to be caused by a change in the angle of the main tion of 4 multivariate estimation techniques on this dataset (see stalk. While the accuracy of height estimates was assessed by the Methods). The correlation coefficients ( r values) of the es- comparison with physical ground truth measurements only on timated biomass measures with ground truth data were 0.949, the last day, the heights of 3 randomly selected plants (Plant 0.958, 0.925, and 0.951 for multivariate linear model, MARS, Ran- 007-26, Plant 002-7, and Plant 041-29) were manually measured dom Forest, and SVM, respectively. from image data and compared with software-based height es- The residual value-difference between the destructively timates, and no significant differences were observed between measured biomass value and the predicted biomass value based the manual and automated measurements (Fig. 3B; Supplemen- on image data and the linear regression line equation—was cal- tary Table 1). To perform a similar test of the accuracy of biomass culated for each individual plant (Fig. 2C).Usingdatafromthe estimation at different stages in the maize life cycle, a set of multiple replicates of each individual accession, the proportion existing ground truth measurements for 2 genotypes under 2 Downloaded from https://academic.oup.com/gigascience/article-abstract/7/2/1/4656251 by Ed 'DeepDyve' Gillespie user on 16 March 2018 6 Liang et al. Figure 2: Correlation between image-based and manual measurements of individual plants. (A) Plant height. (B) Plant fresh biomass. (C) Variation in the residual between estimated biomass and ground truth measurement of biomass across inbreds. stress treatments  were combined with additional later grow all broad sense heritability (Fig. 4A,B). This result suggests that stage data (Supplementary Table 2). The correlation between to- even within controlled environments such as greenhouses, sig- tal plant pixels observed in the 2 side views and plant biomass nificant micro-environmental variation exists, and that proper was actually substantially higher in this dataset (r = 0.97) than statistically based experimental design remains critically impor- the primary dataset, likely as a result of the smaller amount of tant in even controlled environment phenotyping efforts. genetic variability among these plants (Supplementary Fig. 1). If the absolute size of measurement error was constant in this experiment, as the measured values for a given trait be- Heritability of phenotypes came larger, the total proportion of variation explained by the The proportion of total phenotypic variation for a trait con- error term should decrease and, as a result, heritability should trolled by genetic variation is referred to as the heritability of increase as observed (Fig. 4A). This trend was indeed observed that trait and is a good indicator of how easy or difficult it across 6 different phenotypic measurements (three traits calcu- will be to either identify the genes that control variation in lated from each of 2 viewing angles (Fig. 4B). Plant height also ex- a given trait or to breed new crop varieties in which a given hibited significantly greater heritability than plant area or plant trait is significantly altered. Broad-sense heritability can be es- width and greater heritability when calculated solely from the timated without the need to first link specific genes to variation 90-degree side angle photo than when calculated solely from the in specific traits [ 25]. Variation in a trait that is not controlled 0-degree angle photo. by genotype can result from environmental effects, interactions In previous studies, fluorescence intensity has been treated between genotype and environment, random variance, and as an indicator for plant abiotic stress status [7,26–28]or measurement error. Controlling for estimated row effects on dif- chlorophyll content level [29,30]. Using the fluorescence images ferent phenotypic measurements significantly increased over- collected as part of this experiment, the mean fluorescence Downloaded from https://academic.oup.com/gigascience/article-abstract/7/2/1/4656251 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Phenotyping of exPVP maize 7 Figure 3: (A) Plant growth curves of each of 5 replicates of 8 selected genotypes. (B) Comparison of manual measurements of plant height from image data with automated measurements for 3 randomly selected plants on each day of the experiment. DAP: days after planting. intensity value for each plant image was calculated (see the Many uses of hyperspectral data reduce the data from a Methods). We found that this trait exhibited moderate heritabil- whole plant or whole plot of genetically identical plants to a ity, with the proportion of variation controlled by genetic factors single aggregate measurement. While these approaches can in- increasing over time and reaching approximately 60% by the last crease the precision of intensity measurements for individual day of the experiment (Fig. 4B). wavelengths, these approaches also sacrifice spatial resolution and can in some cases produce apparent changes in reflectiv- ity between plants that result from variation in the ratios of Hyperspectral image validation the sizes of different organs with different reflective properties. Hyperspectral imaging of crop plants has been employed pre- To assess the extent of variation in the reflectance properties viously in field settings using airborne cameras [ 31–33]. As of individual plants, a principal component analysis of varia- a result of the architecture of grain crops such as maize, tion in intensity values for individual pixels was conducted. Af- aerial imagery will largely capture leaf tissue during veg- ter nonplant pixels were removed from the hyperspectral data etative growth, and either tassels (maize) or seed heads cube (Methods), (Fig. 5A), false color images were generated (sorghum, millet, rice, oats, etc.) during reproductive growth. encoding the intensity values of the first 3 principal compo- The dataset described here includes hyperspectral imagery nents of variation as the intensity of the red, green, and blue taken from the side of individual plants, enabling quantifica- channels, respectively (Fig. 5B, C, and D). The second princi- tion of the reflectance properties of plant stems in addition to pal component (green channel) marked boundary pixels where leaf tissue. intensity values likely represent a mixture of reflectance data Downloaded from https://academic.oup.com/gigascience/article-abstract/7/2/1/4656251 by Ed 'DeepDyve' Gillespie user on 16 March 2018 8 Liang et al. Figure 4: (A) The time course broad sense heritability of PH90. The heritability in the G model was calculated using a linear model that only considers the effect of genotype with residual values in the error term while heritability in the G + E model was calculated using a linear model that considers the effect of both genotype and environment (row effect) with residual values in the error term. (B) The time course broad sense heritability of PA90 before and after controlling for the row effect. (B) Variation in broad-sense heritability (H ) after controlling row effects for 6 trait measurements every second day across the phenotyping cycle. PA0: Plant Area in 0 degrees (the major axis of leaf phylotaxy was parallel to the camera at 0 degree). PA90: Plant Area in 90 degrees (the major axis of leaf phylotaxy was perpendicular to the camera at 90 degrees). PH0: Plant Height in 0 degrees. PH90: Plant Height in 90 degrees. PW0: Plant Width in 0 degrees. PW90: Plant Width in 90 degrees. PF0: Average of plant fluorescence intensity in 0 degrees. PF90: Average of plant fluorescence intensity in 90 degrees. from the plant and from the background. The first principal techniques have great potential to accelerate efforts to link component (red channel) appeared to indicate distinctions be- genotype to phenotype through ameliorating the current bot- tween pixels within the stem of the plant and pixels within the tleneck of plant phenotypic data collection, it will be important leaves. to balance the development of new image analysis tools with Based on this observation, an index was defined that accu- the awareness of the potential for systematic error resulting rately separated plant pixels into leaf and stem (see the Meth- from genetic variation between different lines of the same crop ods). Stem pixels were segmented from the rest of the plant species. using an index value derived from the difference in intensity val- ues observed in the 1056-nm and 1151-nm hyperspectral bands. Availability of source code and requirements This methodology was previously described . The reflectance pattern of individual plant stems is quite dissimilar from the Project name: Maize Phenotype Map data observed from leaves and exhibits significantly different Project home page: https://github.com/shanwai1234/Maize˙ reflective properties in some areas of the near-infrared (Fig. 6). Phenotype˙Map Characteristics of the stem are important breeding targets for Operating system(s): Linux both agronomic traits (lodging resistance, yield for biomass Programming language: Python 2.7 crops) and value-added traits (biofuel conversion potential for Other requirements: OpenCV module 2.4.8, Numpy >1.5, bioenergy crops, yield for sugarcane and sweet sorghum). Hyper- CMake > 2.6, GCC > 4.4.x, Scipy 0.13 spectral imaging of the stem has the potential to provide nonde- License: BSD 3-Clause License structive measurements of these traits. The calculated patterns of leaf reflectance for the data presented here are comparable with those observed in field-based hyperspectral studies [ 34–36], Availability of supporting data and materials both providing external validation and suggesting that the data The image data sets from 4 types of cameras, pot weight records presented here may be of use in developing new indices for use per day, and ground truth measurements with corresponding under field conditions. documentation for 32 maize inbreds and same types of image In conclusion, while the results presented above highlight data for 2 maize inbreds under 2 stress treatments were de- some of the simplest traits that can be extracted from plant posited in the CyVerse data commons under a CC0 license . image data, these represent a small fraction of the total set All image data were stored in the following data structure: Geno- of phenotypes for which image analysis algorithms currently type −> Plant −> Camera type −> Day. For the hyperspectral exist, and those in turn represent a small fraction of the to- camera, each photo is stored as 243 sub-images, each image tal set of phenotypes that can potentially be scored from im- representing intensity values for a given wavelength, so these age data. Software packages already exist to measure a range require 1 additional level of nesting in the data structure Day of plant architectural traits such as leaf length, angle, and −> wavelength. The grayscale images from the IR camera and curvature from RGB images [6,37]. Tools are also being devel- the hyperspectral imaging system are stored as 3-channel im- oped to extract phenotypic information on abiotic stress re- ages with all 3 channels in a given pixel set to identical val- sponse patterns from fluorescence imaging [ 6,7]. The analysis ues. The fluorescence images contain almost all information in of plant traits from hyperspectral image data, while common- the red channel, with the blue and green channels having in- place in the remote sensing realm, where an entire field may tensities equal to or very close to 0, but data for all 3 channels represent a single data point, is just beginning for single-plant exist. Genotype data of 32 inbreds were generated as part of a imaging. Recent work has highlighted the potential of hyper- separate project, and SNP calls for individual inbred lines were spectral imaging to quantify changes in plant composition and made available either through the Publicly Released Genomes 2 nutrient content throughout development [12,38]. While these Downloaded from https://academic.oup.com/gigascience/article-abstract/7/2/1/4656251 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Phenotyping of exPVP maize 9 Figure 5: Segmentation and visualization of variation in hyperspectral signatures of representative maize plant images. (A) RGB photo of Plant 013-2 (ZL02) collected on DAP 37. (B) False color image constructed of the same corn plant from a hyperspectral photo taken on the same day. For each plant pixel, the values for each of the first 3 principal components of variation across 243 specific wavelength intensity values are encoded as 1 of the 3 color channels in the false image. (C) Equivalent visualization for Plant 048-9 (ZL09). (D) Equivalent visualization for Plant 008-19 (ZL19). Fields 2014 Field Trial dataset  or the ZeaGBSv2.7 GBS SNP morphism; SVM: Support Vector Machines; UNL: University of dataset stored in Panzea. Measurements for 13 core phenotypes Nebraska-Lincoln; PA0: Plant Area calculated from a 0-degree at each field trial as well as local weather data can be retrieved image (plants were initially orientated; then leaves would be ar- from publicly released Genomes 2 Fields datasets released on ranged parallel to the camera at 0 degrees); PA90: Plant Area CyVerse [11,40]. Data from the 2014 G2F field trials is posted [ 40], calculated from a 90-degree image (plants were initially orien- and data from the 2015 G2F field trials is posted [ 11]. Genetically tated; then leaves would be arranged perpendicular to the cam- identical seeds from the majority of the accessions used in creat- era at 90 degrees); PCA: principal component analysis; PH0: Plant ing both this dataset and the Genomes 2 Fields field trial data can Height calculated from a 0-degree image; PH90: Plant Height be ordered from public domain sources (e.g., USDA GRIN) and are calculated from a 90-degree image; PW0: Plant Width calcu- listed in Table 1. Further supporting metadata and snapshots of lated from a 0-degree image; PW90: Plant Width calculated from the Maize Phenotype Map code are available in the GigaScience a 90-degree image; PF0: average of plant fluorescence inten- database, GigaDB . sity in 0-degree; PF90: average of plant fluorescence intensity in 90 degrees. Additional file Competing interests Figure S1. Correlation of fresh weight biomass with total number of plant pixels identified in 2 side view images for maize plants The authors declare that they have no competing interests. destructively sampled at 8 different time points between 13 days and 39 DAP. Funding Abbreviations This research was supported by the Nebraska Corn Board (Award #88-R-1617-03), the Iowa Corn Board, the National Science Foun- DAP: days after planting; GBS: genotyping by sequencing; LED: dation under Grant No. OIA-1557417, and Internal University of light-emitting diode; MARS: Multivariate Adaptive Regression Nebraska funding to J.C.S. The sources of funding had no role in Splines; NDVI: Normalized Difference Vegetation Index; NIR: the design of the study; the collection, analysis, or interpretation near-infrared; RGB: an image with separate intensity values for of data; or in writing the manuscript. the red, blue and green channels; SNP: single nucleotide poly- Downloaded from https://academic.oup.com/gigascience/article-abstract/7/2/1/4656251 by Ed 'DeepDyve' Gillespie user on 16 March 2018 10 Liang et al. 3. Topp CN, Iyer-Pascuzzi AS, Anderson JT et al. 3D phenotyp- ing and quantitative trait locus mapping identify core re- gions of the rice genome controlling root architecture. Proc Natl Acad Sci U S A 2013;110(18):E1695–704. 4. Das A, Schneider H, Burridge J et al. Digital imaging of root traits (DIRT): a high-throughput computing and collabora- tion platform for field-based root phenomics. Plant Methods 2015;11(1):51. 5. Zhang X, Huang C, Wu D et al. High-throughput phenotyping and QTL mapping reveals the genetic architecture of maize plant growth. Plant Physiol 2017;173:1554–64. 6. Chen D, Neumann K, Friedel S et al. Dissecting the phe- notypic components of crop plant growth and drought re- sponses based on high-throughput image analysis. Plant Cell 2014;26(12):4636–55. 7. Campbell MT, Knecht AC, Berger B et al. Integrating image- based phenomics and association analysis to dissect the genetic architecture of temporal salinity responses in rice. Plant Physiol 2015;168(4):1476–89. 8. Munns R, James RA, Sirault XR et al. New phenotyping meth- ods for screening wheat and barley for beneficial responses to water deficit. J Exp Bot 2010; 61(13):3499–507. 9. Fahlgren N, Feldman M, Gehan MA et al. A versatile pheno- typing system and analytics platform reveals diverse tem- poral responses to water availability in Setaria. Mol Plant 2015;8(10):1520–35. 10. Lobet G, Draye X, Perilleux ´ C. An online database for plant image analysis software tools. Plant Methods 2013;9(1): 11. Lawrence-Dill C. Genomes To Fields (2015 growing sea- son data release). CyVerse Data Commons. 2017; doi: 10.7946/P24S31. Figure 6: Reflectance values for 3 plants: Plant 090-6 (ZL06), Plant 002-7 (ZL07), and Plant 145-16 (ZL16) on 3 days across development. (A) Reflectance values for 12. Ge Y, Bai G, Stoerger V et al. Temporal dynamics of maize nonstem plant pixels (i.e., leaves). (B) Reflectance values for pixels within the plant growth, water use, and leaf water content using au- plant stem. tomated high throughput RGB and hyperspectral imaging. Comput Electron Agricult 2016;127:625–32. 13. Gamon J, Surfus J. Assessing leaf pigment content and activ- Author contributions ity with a reflectometer. New Phytologist 1999; 143(1):105–17. J.C.S., and Y.Q. designed the experiment; V.S., J.C.S., and Z.L. per- 14. Chen D, Shi R, Pape JM et al. Predicting plant biomass formed data acquisition; Z.L., P.P., Y.Q., Y.X., Y.G., and J.C.S. ana- accumulation from image-derived parameters. bioRxiv lyzed and interpreted the data; Z.L. and J.C.S. produced and cu- 2016;046656. rated the metadata; Z.L. and J.C.S. implemented software; Z.L. 15. Milborrow S. earth: Multivariate Adaptive Regression and J.C.S. prepared the initial draft. All authors reviewed the Splines. R package 4.5.1. 2014, https://CRAN.R-project.org/ manuscript. package=earth. 16. Breiman L, Cutler A, Liaw A et al. randomForest: Breiman and Cutler’s Random Forests for Classification and Regression. Acknowledgments 2002, https://CRAN.R-project.org/package=randomForest. The authors are grateful to Yang Zhang, Xianjun Lai, and Daniel 17. Meyer D, Dimitriadou E, Hornik K et al. e1071: Misc Func- W.C. Ngu for help in collecting manual measurements of plants, tions of the Department of Statistics, Probability Theory Thomas Hoban for manually counting pixels of selected plant Group (Formerly: E1071), TU Wien. R package 1.6.8. 2005, images, Kent M. Eskridge for valuable discussions on experimen- https://CRAN.R-project.org/package=e1071. tal design, Addie Thompson and Jinliang Yang for assistance on 18. Choudhury SD, Stoerger V, Samal A et al. Automated heritability analysis, and the members of the Genomes 2 Fields vegetative stage phenotyping analysis of maize plants Consortium for sharing both seed and datasets prior to publica- using visible light images. In: KDD: Data Science for Food, tion. CyVerse is supported by the US National Science Founda- Energy and Water workshop; San Francisco; August 2016, tion under award numbers DBI-0735191 and DBI-1265383. https://sites.google.com/site/2016dsfew/workshop-papers. Accessed 13 August, 2016. 19. Liang Z. Github. 2017, https://github.com/shanwai1234/Maize References Phenotype Map, Accessed 24 November, 2017. 1. Grassini P, Eskridge KM, Cassman KG. Distinguishing be- 20. Al-Tamimi N, Brien C, Oakey H et al. Salinity tolerance loci tween yield advances and yield plateaus in historical crop revealed in rice using high-throughput non-invasive pheno- production trends. Nat Commun 2013;4:2918. typing. Nat Commun 2016;7. 2. Hartmann A, Czauderna T, Hoffmann R et al. HTPheno: an 21. Golzarian MR, Frick RA, Rajendran K et al. Accurate infer- image analysis pipeline for high-throughput plant pheno- ence of shoot biomass from high-throughput images of ce- typing. BMC Bioinformatics 2011;12(1):148. real plants. Plant Methods 2011;7(1):2. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/2/1/4656251 by Ed 'DeepDyve' Gillespie user on 16 March 2018 Phenotyping of exPVP maize 11 22. Honsdorf N, March TJ, Berger B et al. High-throughput phe- 32. Zaman-Allah M, Vergara O, Araus J et al. Unmanned notyping to detect drought tolerance QTL in wild barley in- aerial platform-based multi-spectral imaging for field trogression lines. PLoS One 2014;9(5):e97047. phenotyping of maize. Plant Methods 2015;11(1): 23. Neilson EH, Edwards A, Blomstedt C et al. Utilization of 35. a high-throughput shoot imaging system to examine the 33. Yendrek C, Tomaz T, Montes CM et al. High-throughput phe- dynamic phenotypic responses of a C4 cereal crop plant notyping of maize leaf physiological and biochemical traits to nitrogen and water deficiency over time. J Exp Bot using hyperspectral reflectance. Plant Physiol 2017; 173:614– 2015;66(7):1817–32. 26. 24. Gehan MA, Fahlgren N, Abbasi A et al. PlantCV v2.0: image 34. Smith K, Steven M, Colls J. Use of hyperspectral derivative analysis software for high-throughput plant phenotyping. ratios in the red-edge region to identify plant stress re- PeerJ 2017;5:e4088. sponses to gas leaks. Remote Sens Environ 2004;92(2):207– 25. Holland JB, Nyquist WE, Cervantes-Mart´ınez CT. Estimating 17. and interpreting heritability for plant breeding: an update. 35. Zhao D, Reddy KR, Kakani VG et al. Nitrogen deficiency Plant Breed Rev 2003;22:9–112. effects on plant growth, leaf photosynthesis, and hyper- 26. Van Kooten O, Snel JF. The use of chlorophyll fluorescence spectral reflectance properties of sorghum. Eur J Agron nomenclature in plant stress physiology. Photosynth Res 2005;22(4):391–403. 1990;25(3):147–50. 36. Baranowski P, Jedryczka M, Mazurek W et al. Hyperspec- 27. Fracheboud Y, Haldimann P, Leipner J et al. Chlorophyll flu- tral and thermal imaging of oilseed rape (Brassica napus)re- orescence as a selection tool for cold tolerance of photosyn- sponse to fungal species of the genus Alternaria. PloS One thesis in maize (Zea mays L.). J Exp Bot 1999;50(338):1533– 2015;10(3):e0122913. 1540. 37. Klukas C, Chen D, Pape JM. Integrated analysis platform: an 28. Kalaji HM, Jajoo A, Oukarroum A et al. Chlorophyll a fluo- open-source information system for high-throughput plant rescence as a tool to monitor physiological status of plants Physiol. Plant physiol 2014;165(2):506–18. under abiotic stress conditions. Acta Physiologiae Plantarum 38. Pandey P, Ge Y, Stoerger V et al. High throughput in vivo 2016;38(4):102. analysis of plant leaf chemical properties using hyperspec- 29. Murchie EH, Lawson T. Chlorophyll fluorescence analysis: a tral imaging. Front Plant Sci 2017;8. guide to good practice and understanding some new appli- 39. Liang Z, Schnable JC. Maize Diversity Phenotype Map. Cy- cations. J Exp Bot 2013;64(13):3983–98. Verse Data Commons 2017, doi: 10.7946/P22K7V. 30. Guanter L, Zhang Y, Jung M et al. Global and time-resolved 40. Lawrence-Dill C. Genomes To Fields (2014 growing sea- monitoring of crop photosynthesis with chlorophyll fluores- son data release). CyVerse Data Commons. 2016, doi: cence. Proc Natl Acad Sci 2014;111(14):E1327–33. 10.7946/P2V888. 31. Zarco-Tejada P, Catalina A, Gonzalez ´ M et al. Relationships 41. Liang Z, Pandey P, Stoerger V et al. Supporting data for “Con- between net photosynthesis and steady-state chlorophyll ventional and hyperspectral time-series imaging of maize fluorescence retrieved from airborne hyperspectral imagery. lines widely used in field trials.” GigaScience Database 2017. Remote Sens Environ 2013;136:247–58. http://dx.doi.org/10.5524/100371. Downloaded from https://academic.oup.com/gigascience/article-abstract/7/2/1/4656251 by Ed 'DeepDyve' Gillespie user on 16 March 2018
GigaScience – Oxford University Press
Published: Feb 1, 2018
It’s your single place to instantly
discover and read the research
that matters to you.
Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.
All for just $49/month
Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly
Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.
All the latest content is available, no embargo periods.
“Whoa! It’s like Spotify but for academic articles.”@Phil_Robichaud