Quantitative nuclear histomorphometry predicts oncotype DX risk categories for early stage ER+ breast cancer

Quantitative nuclear histomorphometry predicts oncotype DX risk categories for early stage ER+... Background: Gene-expression companion diagnostic tests, such as the Oncotype DX test, assess the risk of early stage Estrogen receptor (ER) positive (+) breast cancers, and guide clinicians in the decision of whether or not to use chemotherapy. However, these tests are typically expensive, time consuming, and tissue-destructive. Methods: In this paper, we evaluate the ability of computer-extracted nuclear morphology features from routine hematoxylin and eosin (H&E) stained images of 178 early stage ER+ breast cancer patients to predict corresponding risk categories derived using the Oncotype DX test. A total of 216 features corresponding to the nuclear shape and architecture categories from each of the pathologic images were extracted and four feature selection schemes: Ranksum, Principal Component Analysis with Variable Importance on Projection (PCA-VIP), Maximum-Relevance, Minimum Redundancy Mutual Information Difference (MRMR MID), and Maximum-Relevance, Minimum Redundancy - Mutual Information Quotient (MRMR MIQ), were employed to identify the most discriminating features. These features were employed to train 4 machine learning classifiers: Random Forest, Neural Network, Support Vector Machine, and Linear Discriminant Analysis, via 3-fold cross validation. Results: The four sets of risk categories, and the top Area Under the receiver operating characteristic Curve (AUC) machine classifier performances were: 1) Low ODx and Low mBR grade vs. High ODx and High mBR grade (Low-Low vs. High-High) (AUC = 0.83), 2) Low ODx vs. High ODx (AUC = 0.72), 3) Low ODx vs. Intermediate and High ODx (AUC = 0.58), and 4) Low and Intermediate ODx vs. High ODx (AUC = 0.65). Trained models were tested independent validation set of 53 cases which comprised of Low and High ODx risk, and demonstrated per-patient accuracies ranging from 75 to 86%. Conclusion: Our results suggest that computerized image analysis of digitized H&E pathology images of early stage ER+ breast cancer mightbeablepredict thecorresponding Oncotype DX risk categories. Background effects including loss of hair, taste, cognitive function, Estrogen Receptor positive (ER+) breast cancers are a and additional extensive medical care [2]. As such, it is common subtype of breast cancer that can frequently be critical to be able to determine the level of recurrence effectively treated using hormonal therapy if deemed to risk to plan treatment effectively so that the toxic side have a low risk of recurrence. However, early stage ER+ effects of chemotherapy can be avoided in low-risk breast cancers that are at high risk of recurrence are patients. typically treated with adjuvant chemotherapy in addition Several methods of assessing tumor risk have been to hormonal therapy. While chemotherapy increases developed, including gene assays such as the Oncotype survival rates by reducing rates of recurrence in these DX (ODx) Recurrence score, that stratify patients based high risk subgroups [1], there may be significant side on their risk of cancer recurrence [3]. The ODx test is a 21 gene assay that is currently employed for separating * Correspondence: Jon.whitney@case.edu breast cancer patients into low and high risk of recurrence Department of Biomedical Engineering, Case Western Reserve University, categories to help a clinician decide whether or not to pre- 2071 Martin Luther King Drive, Cleveland, OH 44106-7207, USA scribe adjuvant chemotherapy for early stage ER+ breast Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Whitney et al. BMC Cancer (2018) 18:610 Page 2 of 15 cancers [4]. The recurrence score is derived from the cells [21]. In addition, there is evidence that stromal cells expression levels of multiple cancer-related genes, and react to tumor growth over time, and stromal phenotype ranges from 0 to 100 [4]. Patients with an ODx score of can reflect a given cancer’s genetic profile [22, 23]. For 17 or below are in the low-risk category, patients with instance in [20], Beck et al. showed the importance of ODx scores between 18 and 30 were considered intermedi- stromal morphology in predicting overall breast cancer ate risk, and scores 31 and above are in the high ODx risk survival. It is therefore useful to consider the behavior category [5]. Unfortunately, Oncotype DX and similar com- of epithelial and stromal cells as distinct groups when panion diagnostic tests (e.g. Mammaprint [6], PAM50 [7]) profiling breast cancer. tend to be expensive and time consuming due to the need In this paper we evaluate the nuclear morphologic features for physical shipping of tissue samples to proprietary testing to distinguish digitized images of H&E sections from early facilities. They are also tissue-destructive, making additional stage ER+ breast cancers into ODx risk categories using evaluation of other biomarkers or genes difficult. supervised machine learning classifiers. ODx risk categories The modified Bloom Richardson (mBR) grading scale is are comprised of three groups to reflect distinctions based based on measuring nuclear grade (variation in nuclear off 5 year survival: low, intermediate, and high risk [5, 24]. shape and size), mitotic count, and tubule density. Each of However, there is both a high degree of correlation between these individual histologic primitives are assigned a score ODx risk categories and mBR grade [8], as well as overlap from 1 to 3 and then added to generate the cumulative between the intermediate and low and intermediate and mBR grade. Mina et al. [8] showed that mBR grade was high risk categories, making accurate separation of inter- also highly correlated the expression of proliferation genes mediate cases from other risk categories difficult [25]. We used in the determination of ODx risk categories, and have therefore selected four categories to distinguish using Flanagan et al. [9] identified a positive correlation between computer extracted nuclear morphology features: 1) Low ODx risk category and nuclear grade when creating a ODx and Low mBR grade vs. High ODx and High mBR predictive model of ODx based off clinical variables. grade (Low-Low vs. High-High) to evaluate whether nuclear Unfortunately, pathologic assessments of tumor grade are morphology features were able to predict risk category known to suffer from inter-observer variability [10]. when both the difficult to classify intermediate cases Quantitative histomorphometry (QH) refers to the use and differences between mBR grade and ODx risk category of computer-aided image analysis of digitized pathology are removed. 2) Low ODx vs. High ODx to evaluate the images to “unlock” more revealing sub-visual attributes predictive ability of the nuclear morphology features when about tumor morphology, which can possibly be corre- difficult to classify intermediate cases are removed. 3) Low lated with disease recurrence independent of other clinical ODx vs. Intermediate and High ODx to evaluate the ability and pathologic features. These features might also poten- of the nuclear morphology features to identify the low tially reveal the underlying biology or molecular phenotype ODx cases specifically. 4) Low and Intermediate ODx of the tumor. For example, Buchelli et al. showed that the vs. High ODx to evaluate the ability of the nuclear number of mitoses identified via a deep learning algorithm morphology features to identify high ODx cases specifically. was predictive of the ODx risk categories [11]. The approach presented in this paper comprises the Nuclear architecture is another image attribute that following main steps (Fig. 1). First, H&E slides of surgical has been implicated in the prediction of overall cancer or biopsy specimens of breast tissue are scanned and digi- grade and cancer aggressiveness [12, 13]. Additionally, tized (Fig. 1.1). Second, nuclear segmentation is performed variations in nuclear shape could reflect genetic instability using deep learning models trained on manual breast [14] and may impact the ability of cancer cells to travel nuclei annotations, followed by watershed separation to through tissue and create metastases that lead to recur- resolve overlapping nuclei (Fig. 1.2). Third, a deep learning rence [15]. A number of recent studies have shown the model was used to separate epithelial from stromal association of QH features of nuclear architecture and regions, helping us identify which nuclei were stromal and morphology with disease progression in oropharyngeal which were epithelial (Fig. 1.3). Fourth, we extracted nuclear cancers [16], cancer recurrence in lung cancers [17], architectural and shape features from the epithelial and biochemical recurrence in prostate cancers [18, 19]and stromal regions separately (Fig. 1.4). Fifth, we perform overall breast cancer survival [20]. feature selection on the resulting features using four There is also evidence that the performance of QH different feature ranking schemes - Ranksum, PCA-VIP, analysis improves when done separately on different cell MRMR MID, and MRMR MIQ. The predictive perform- types [20]. In the context of distinguishing breast cancers ance of these features was evaluated using four different with different degrees of risk, it is likely that these cancers supervised machine learning classifiers - random forest, are characterized by different phenotypical changes in support vector machine (SVM), linear discriminant different cell types. Breast cancers are predominantly analysis (LDA), and a neural network – via a 3-fold carcinomas –cancers which are derived from epithelial cross validation scheme (Fig. 1.5). The classifiers were Whitney et al. BMC Cancer (2018) 18:610 Page 3 of 15 Fig. 1 Illustration of the methodology used to classify whole slide images into ODx risk categories. 1) Image patches are extracted at 40× from regions within whole slides identified by pathologists as containing invasive cancer. 2) Nuclei detection is performed on these image patches and 3) combined with a Deep Learning epithelial/stromal separation model. 4) Nuclear architecture and shape features are extracted from the detected epithelial and stromal nuclei separately. These features are combined with (5) a trained classification model in order predict the ODx risk category for each patch. Classification results from the image patches for each patient are (6) combined in a patch-based-voting method to (7) yield the final risk prediction on a patient level evaluated by their ability to distinguish between the by pathologists at each of the participating institutions. 9 four different classification tasks presented above using cases in which the mBR score and ODx risk category were the area (AUC) under the Receiver Operating Charac- at opposite extremes (4 low mBR and High ODx, and 5 teristic (ROC) curve, which plots the true positive rate high mBR and low ODx) were excluded from this study. against the false positive rate. Finally, classifiers are trained to create per-patch risk category predictions, Nuclei segmentation identifying the optimal threshold of what percentage of We employed the approach described in [26] by Janowczyk positively classified patches should result in a positive et al. for segmenting individual nuclei. Two Deep Learning prediction based on training data, and then applied and (DL) models were employed. The first model identified the evaluated on testing folds to create a final prediction of likelihood that a given pixel was part of a nucleus and the the ODx risk category for each patient (Fig. 1.6, 1.7). second model identified the likelihood that a pixel was part of the epithelium or stroma. Both models were trained Methods using manual segmentations of the tissue primitives of Dataset description interest (i.e. nucleus or stroma or epithelium). DL was exe- Our study comprised of 178 H&E stained whole tissue cuted using Caffe, a popular open-source DL framework slides of ER+ Lymph node negative breast cancer patients [27]. The DL models were trained using 32 × 32 sized (Table 1). These whole slide breast cancer samples dataset image patches on a Titan XGPU running CUDA 7.5, and a was selected to include 1) early stage ER+ breast cancers, 9-layer convolutional neural network framework. 2) surgically resected tissue specimens, and 3) the avail- The nuclear segmentation model was trained on a ability of a corresponding Oncotype DX risk score. These dataset of 141 manually annotated ER+ breast cancer slides were obtained from patients treated between 2004 tissue images, each patch sized 2000 × 2000 pixels and at and 2009 at the Cancer Institute of New Jersey and the 40× magnification. The epithelium/stroma separation University of Pennsylvania, and between 2008 and 2013 at model was trained on a dataset of 236 ER+ breast cancer Case Western Reserve University. Slides were locally digi- tissue image patches, each sized at 1000 × 1000 pixels tized at their originating institutions using Aperio, Leica, and at 10× magnification. Lower magnification in the and Philips scanners. The Modified Bloom-Richardson epithelial/stromal separation model allowed for more con- Grade for each of the pathologic specimens was determined textual information to be included in the image patches Whitney et al. BMC Cancer (2018) 18:610 Page 4 of 15 Table 1 Dataset characteristics – demographic and cancer subtype distribution in each risk category for the cases from the 3 different institutions considered in this study Parameters Oncotype DX Risk Category Low (< 18) Intermediate (> 18, ≤30) High(> 30) No. of Patients (N = 125) 66 (53%) 44 (35%) 15 (12%) Age 20–77 25–70 45–70 Sex Female 66 (53%) 43 (34%) 15 (12%) Male 0 (0%) 1 (1%) 0 (0%) Patient Ethnicity White 33 (26%) 23 (18%) 5 (5%) African American 3 (2%) 2 (2%) 2 (2%) Asian 1 (1%) 2 (2%) 1 (1%) Unknown 22 (18%) 17 (14%) 7(6%) PR Status Positive 64 (51%) 39 (31%) 10 (8%) Negative 2 (2%) 3 (2%) 5 (4%) Unknown 0 (0%) 2 (2%) 0 (0%) HER2 Status Positive 0 (0%) 1 (1%) 0 (0%) Negative 66 (53%) 42 (34%) 15 (12%) Unknown 0 (0%) 1 (1%) 0 (0%) Histologic Tumor Grade Low (4, 5) 10 (8%) 14 (11%) 0 (0%) Moderate (6, 7) 48 (38%) 24 (19%) 4 (3%) High (8, 9) 8 (6%) 6 (5%) 11 (9%) Tumor Type Ductal 53 (42%) 37 (30%) 14 (11%) Ductal With Lobular Features 9 (7%) 3 (2%) 1 (1%) Ductal with Mucinous Features 1 (1%) 2 (2%) 0 (0%) Mixed 3 (2%) 2 (2%) 0 (0%) during model training, improving accuracy and speed. This nuclei as the vertices of the graph. The choice of vertex patch-based approach allowed for multiple identically- connectivity determines the type of nuclear graph (i.e. sized image patches to be used, increasing the size of the Delaunay, Voronoi, MST, CCG) constructed. Features training set. In addition, the patch size was selected to use extracted from the graphs included changes in the lengths the field of view identified as being optimal for extracting of edges and distance between nearest vertices. Cellular nuclear architecture features of the tumor [28]. disorder can be measured using features derived from Cell Orientation Graphs [19]. Shape features included Invariant Feature extraction Moment, Fourier Descriptor, and Length/Width ratios. A A total of 216 nuclear features were extracted from epi- comprehensive enumeration of all the image features thelial and stromal nuclei separately, resulting in a total extracted is presented in the Additional file 1. of 432 features per patch. These features consisted of architecture and shape features. Feature ranking Architectural features were obtained by performing Feature ranking was used to identify the most relevant quantitative analysis of nuclear graphs, such as Delaunay image features for predicting the corresponding ODx Triangles, Voronoi Diagrams, Minimum Spanning Trees risk category. Features were ranked in order of highest (MST), and Cell Cluster Graphs (CCG) [29] (Fig. 2). relevance to the classification problem. The most relevant These nuclear graphs were constructed using the individual features identified were subsequently used in conjunction Whitney et al. BMC Cancer (2018) 18:610 Page 5 of 15 Fig. 2 Nuclear graphs used to calculate features relating to spatial arrangement of nuclei. Left to right: Original images at 1×, 4×, and 40×, Voronoi Diagram, Minimum Spanning Tree, and Cell Cluster Graph, reflecting local nuclear architecture. Comparison between graph appearance for a low ODx example (top) and a high ODx example (bottom) with machine learning classifiers. A number of popular fea- simultaneously present in the training and hold-out ture ranking methods were evaluated including Wilcoxon groups. Two of these groups were used for model training, Ranksum [30], PCA-VIP [31], and Maximum-Relevance while the third group was used to test the trained Minimum-Redundancy (MRMR) [32] with two variants – model. Machine learning classifiers were trained on a per- Mutual Information Difference, and Mutual Information patch basis. This allowed for a simple patch-based voting Quotient (MRMR-MID and MRMR-MIQ) [33]. Each of method, in which the classification of the patient as being these feature ranking methods takes a slightly different in the low or high-risk category was based on if the number approach to identifying the most relevant features, and of class labels predicted for a given class surpassed a patch simultaneously suppressing features that are highly percentage threshold. The optimal threshold was deter- correlated with each other. The Ranksum method identi- mined from the training data in each iteration. This fies feature relevance to classification without explicitly method can also be used to classify individual patches considering the correlation between highly-ranked features spatially in an H&E slide, providing a spatially distributed [30]. PCA-VIP uses a combination criteria of both how assessment of cancer aggression across a given sample each of the principle component vectors relate to the (Fig. 3). outcome to be predicted, and which features most highly contribute to those principle component vectors (effect- Experiments ively measuring to what extent a given feature provides The four experiments were as follows unique information in a dataset) [31]. MRMR-MID and MRMR-MIQ both use maximal relevance criteria which 1) Low ODx and Low mBR grade vs. High ODx and use the mean mutual information values between features High mBR grade (Low-Low vs. High-High). This and the relevant output class, while minimizing the redun- experiment was used to look at the cases reflecting dancy (mutual information between any feature and the the extremes in terms of tumor morphology and other features in the dataset) [32]. ODx risk. While grade and ODx risk scores are correlated for the most part [8], in this experiment we Classifier construction chose to ignore conflicting cases (i.e. cases with a low A total of four different classifiers was tested in conjunction mBR grade but a high ODx score and vice-versa). with each of the four different feature selection methods. 2) Low ODx vs. High ODx. This experiment looks at The classifiers employed included a bagged C4.5 Random cases of high distinction in terms of ODx risk Forest [34], a ten-node four-layer Neural Network [35], a 3 category, but does not exclude cases with kernel Support Vector Machine [36], and a pseudolinear conflicting grade categories. discriminant Linear Discriminant Analysis [37]. Machine 3) Low ODx vs. Intermediate and High ODx. This is learning classifiers were trained using 100 iterations of the hypothesis that is closest to the question a randomly initialized 3-fold cross-validation. 3-fold cross- clinician is interested in answering: identifying cases validation was employed to divide the entire dataset of that are low ODx risk score from all others so that image patches into three equal groups by patient ID, thus low ODx risk patients can avoid aggressive ensuring that patches from each patient were not chemotherapies. Whitney et al. BMC Cancer (2018) 18:610 Page 6 of 15 Fig. 3 Example of the Low-Low vs. High-High random forest classifier using ranksum feature selection applied to patches from whole slide image. Machine classification uses the top ranked epithelial and stromal features. Green squares indicate patches that are predicted to be Low ODx while Blue squares are predicted to be High ODx 4) Low and Intermediate ODx vs. High ODx. This the true positive rate as a function of the false positive rate experiment considers the possibility that high ODx at varying confidence thresholds. The higher the area risk patients are histologically distinct from both under the curve (indicated by the curve extending into the other ODx risk categories. upper left quadrant), the more frequently the classifier is able to correctly identify the class, and the less frequently We also quantitatively assessed the performance of it is to falsely classify a case as positive. For comparison, a each of four different feature ranking methods over stromal diagonal line extending from the bottom left to the upper and epithelial features in conjunction with four different right corner would indicate an AUC of 0.5, which is machine learning classification schemes to determine considered to be the equivalent of guessing. which combination of classification and feature ranking In order to demonstrate the significance of epithelial/ approaches resulted in the highest per-patient patch voting stromal separation, we ran two sets of features using the accuracy for each of the four experiments. Per-patient optimized machine learning classifier and feature ranking patch voting simply means that the classifier was applied algorithm. The two feature sets were: 1) nuclei features to each patch extracted from a patient, thus generating an extracted from all nuclei, 2) nuclei features extracted from ODx risk category prediction for each patch. A simple epithelial and stromal nuclei separately. The utility of majority of the per-patch risk category predictions for each separating epithelial and stromal nuclei prior to feature patient is then used to determine the predicted patient extraction was measured by comparing the AUCs between ODx risk category. The per-patient patch voting accuracy models trained from features with no epithelial/stromal is defined as the percentage of patients whose ODx risk separation, and epithelial stromal separation prior to fea- category was correctly predicted using this method. ture extraction. Feature evaluation via supervised classification Evaluation of models on external validation set For each of the 4 classification experiments described above, In order to fully assess the effectiveness of the models we identified 1) the most highly ranked and predictive generated, the models with the highest performance were epithelial and stromal nuclear morphologic features which used on an external validation set. Models were trained were evaluated via violin plots (Figs. 4), and 2) classification over theentireprimarycohort before being applied without accuracy for the machine learning classifiers in conjunction any retraining to the external validation set. with the top ranked features in the form of AUC. Violin plots illustrate the distribution of normalized Results feature values for the top performing features between The results for the four primary experiments are as follows the two risk categories. Thus, high degrees of separation between the two distributions indicate a high level of 1) Low ODx and Low mBR grade vs. High ODx and discrimination from that feature. AUC curves indicate High mBR grade (Low-Low vs. High-High) (Fig. 5, Whitney et al. BMC Cancer (2018) 18:610 Page 7 of 15 Fig. 4 Feature Distributions for the top ranked epithelial (left) and stromal (right) features using PCA-VIP feature ranking for each experiment. Green lines indicate the mean of each population, and red lines indicate the 25th and 75th percentiles of the distribution. Width of the plot indicates the relative number of data points at each normalized feature value along the y-axis top left). In this experiment, the top ranked epithelial perimeter, area ratios, and invariant moment (Table 3). features were cell cluster graphs, and the top ranked The SVM classifier using the PCA-VIP feature ranking stromal features were shape features related to nuclear scheme yielded the highest classification accuracy with Whitney et al. BMC Cancer (2018) 18:610 Page 8 of 15 Fig. 5 ROC curves for each of the four experiments conducted (panels) and classification methods (lines) using PCA-VIP feature selection. Top left: Low ODx and Low mBR grade vs. High ODx and High mBR grade (Low-Low vs. High-High). Top Right: Low ODx vs. High ODx. Bottom Left: Low ODx vs. Intermediate and High ODx. Bottom Right: Low and Intermediate ODx vs. High ODx. Each panel displays the ROC curve using either (solid) random forest, (dashed) neural network, (dotted) SVM, or (intermediate dash) LDA classification. Feature set includes epithelial and stromal features. AUC values for each curve are displayed in the legend an AUC of 0.83, and a patch voting accuracy of 3) Low ODx vs. Intermediate and High ODx (Fig. 5, 86% (Table 2). AUC results using the same bottom left): The top ranked epithelial features classifier and feature ranking methodology were primarily disorder and number of nearest improved from 0.71 to 0.83 with the inclusion of neighbors features, while the highest ranked stromal features (Table 4). stromal features were primarily metrics regarding 2) Low ODx vs. High ODx (Fig. 5, top right) (Fig. 5, the invariant moment (Table 3). The random forest top right): The top ranked epithelial features were classifier using the PCA-VIP feature ranking scheme the cell cluster graph and disorder of nearest yielded a classification AUC of 0.58, and a patch neighbors features, while the highest ranked voting accuracy of 64% (Table 2). AUC results using stromal features were similar to those identified for the same classifier and feature ranking methodology the low-low vs. high-high discrimination problem, improved from 0.55 to 0.58 with the separation of namely perimeter ratio, area ratio, and invariant epithelial and stromal nuclei (Table 4). moment (Table 3). The SVM classifier using the 4) Low and Intermediate ODx vs. High ODx (Fig. 5, PCA-VIP feature ranking scheme yielded a bottom right):: The top ranked epithelial features classification AUC of 0.72, and a patch voting were metrics concerning the mean and variation in accuracy of 76% (Table 2). AUC results using the edge length associated with cell cluster graphs, same classifier and feature ranking methodology while the highest ranked stromal features were the improved from 0.61 to 0.72 with the separation of invariant moment and standard deviation of the epithelial and stromal nuclei (Table 4). Fourier descriptor (Table 3). The SVM classifier Whitney et al. BMC Cancer (2018) 18:610 Page 9 of 15 Table 2 Classification accuracy metrics for each of the four experiments. From left to right: Low ODx Low mBR vs. High ODx Low mBR, Low ODx vs. High ODx, Low ODx vs. Intermediate and High ODx, and Low and Intermediate ODx vs. High ODx. Data for each experiment includes the AUC, best patch Voting Accuracy results, and the optimal feature ranking and classifier used to achieve the optimized patch voting accuracy results. All experiments conducted with 3-fold cross-validation Experiment LL vs. HH L vs. H L vs. Int. and H L and Int. vs. H Number of Patients 37 75 125 111 AUC 0.81 0.69 0.58 0.6 AUC STDev 0.08 0.05 0.03 0.06 Patch Voting Accuracy 82% 80% 60% 86% Best Feat. Ranking for Patch voting MRMR-MID PCA-VIP Ranksum MRMR-MID Best Classifier for Patch voting LDA Random Forest Random Forest Random Forest and PCA-VIP feature ranking scheme yielded an Of the epithelial features considered, the most discrimin- AUC of 0.65, and a patch voting accuracy of 74% ating features identified across all 4 classification problems (Table 2). AUC results using the same classifier and were those pertaining to epithelial architecture of nuclei feature ranking methodology improved from 0.55 to (Table 3). Of the stromal features, the most significant 0.65 with the separation of epithelial and stromal tended to be those related to measuring changes in the nuclei (Table 4). shape of the stromal nuclei. In each experiment, the Table 3 Top three Epithelial and Stromal features for each of the four experiments: Low ODx and Low mBR grade vs. High ODx and High mBR grade (Low-Low vs. High-High), Low ODx vs. High ODx, Low ODx vs. Intermediate and High ODx, and Low and Intermediate ODx vs. High ODx Experiments Epithelial Features (EP) Low Low vs. High High 1 EP: CCG: Clustering Coefficient E 2 EP: CCG: standard deviation edge length 3 EP: CCG: Clustering Coefficient D Low vs. High 1 EP: CCG: standard deviation edge length 2 EP: CCG: mean edge length 3 EP: Arch: Disorder of Nearest Neighbors in a 40 Pixel Radius Low vs. Intermediate and High 1 EP: Arch: Disorder of Nearest Neighbors in a 40 Pixel Radius 2 EP: Arch: Disorder of Nearest Neighbors in a 50 Pixel Radius 3 EP: Arch: Avg. Nearest Neighbors in a 40 Pixel Radius Low and Intermediate vs. High 1 EP: CCG: standard deviation edge length 2 EP: CCG: mean edge length 3 EP: Arch: Disorder of Nearest Neighbors in a 40 Pixel Radius Stromal Features (ST) Low Low vs. High High 1 ST: Shape: Median Area Ratio 2 ST: Shape: Median Invariant Moment 2 3 ST: Shape: Mean Perimeter Ratio Low vs. High 1 ST: Shape: Mean Perimeter Ratio 2 ST: Shape: Mean Area Ratio 3 ST: Shape: Mean Invariant Moment 2 Low vs. Intermediate and High 1 ST: Shape: Mean Invariant Moment 2 2 ST: Shape: Median Invariant Moment 2 3 ST: Shape: Standard Deviation Invariant Moment 2 Low and Intermediate vs. High 1 ST: Shape: Median Invariant Moment 2 2 ST: Shape: Mean Invariant Moment 2 3 ST: Shape: Standard Deviation Fourier Descriptor 2 Whitney et al. BMC Cancer (2018) 18:610 Page 10 of 15 Table 4 Improvements in classification accuracy based on features extracted from all nuclei together (No Ep/St. Sep.) vs. features extracted from epithelial nuclei and stromal nuclei separately (Ep/St Sep.), ranked via the PCA-VIP feature selection scheme, and used to train an SVM classifier. All AUC scores were generated using 3-fold cross validation Experiment No Ep/St Separation Ep/St Separation AUC Improvement High-High vs. Low-Low 0.71 0.83 0.12 High vs. Low 0.61 0.72 0.11 Low vs. Intermediate and High 0.55 0.58 0.03 Low and Intermediate vs. High 0.55 0.65 0.1 Average 0.61 0.7 0.09 epithelial features were identified to be more significant in In addition, while each of the feature ranking methods separating the different risk categories compared to the had very comparable performance, the PCA-VIP feature stromalnucleifeatures (Fig. 6). The classification AUC for ranking scheme yielded slightly better performance, with the machine learning classifier was highest for the problems a peak AUC of 0.71 using a Support Vector Machine involving the extreme risk or grade categories (i.e. Low-Low (Fig. 6). vs High-High and Low ODx vs High ODx). Unsurprisingly, Comparisons between the classification efficacy with the AUC values were lower when the intermediate risk cat- and without the use of epithelial/stromal separation across egory was also included (i.e. Low ODx vs. Intermediate and the four experiments yielded an average improvement of High ODx and Low and Intermediate ODx vs. High ODx). 0.09 (Table 4). Fig. 6 Determining the optimal feature ranking method - ROC curves for different combinations of feature ranking methods (panels) and classification methods (lines) for separating low from high ODx patches. Top left: Ranksum (Wilcoxon rank sum). Top right: PCA-VIP. Bottom left: MRMR-MID. Bottom right: MRMR-MIQ. Each panel displays the ROC curve using either (solid) random forest, (dashed) neural network, (dotted) SVM, or (intermediate dash) LDA classification. Feature set includes stromal and epithelial features. AUC values for each curve are displayed in the legend Whitney et al. BMC Cancer (2018) 18:610 Page 11 of 15 Validation results Table 6 Validation dataset – Classification accuracy using Ranksum feature ranking and a SVM classifier for each of four We tested the results of the model on an external valid- classification separations ation set. The model was trained using Ranksum feature Ranksum - SVM & Classification Accuracy ranking and a Random forest classifier using 100 iterations of 3-fold cross-validation to determine the top-performing Low-Low vs. High-High 76% features. These features were then trained over the entire Low vs. High 79% training set before being evaluated on the validation set. Low and Intermediate vs. High 85% The validation set was obtained from the University of Low vs. Intermediate and High 84% Pennsylvania and contained 53 cases comprised of Low and High ODx risk cases of primarily Low and High mBR grade (Table 5). As described previously, the accuracy of stage ER+ breast cancer histology samples into different each model was determined using per-patient patch Oncotype DX determined risk categories. Nuclear feature voting, where pathologist selected ROIs were divided into extraction was accomplished by 1) obtaining nuclear sub-ROI patches, and each patch was then classified as segmentations with a deep learning algorithm, 2) using belonging to either low or high risk using each of the four deep learning epithelial/stromal separation of nuclei, and models. The classification of the patient into high or low 3) extracting nuclei shape and architectural features from risk was determined by the percentage of sample patches those segmentations. Those features were then given to a predicted to belong to either category. Because it is series of machine based classifiers and feature ranking possible that the optimal percentage threshold for distin- methods using 3-fold cross-validation to test the effective- guishing between high and low risk may not be a simple ness of each machine based classifier. These features were majority, the ideal percentage of patches that were need to then employed in the context of discriminating the following be identified as low for the patient to be categorized as 4 different grade-ODx risk categories: 1) Low ODx and Low low ODx risk was determined from the training set. Per- mBR grade vs. High ODx and High mBR grade (Low-Low patient accuracies ranged between 76 and 85% across all vs.High-High). 2) LowODx vs.HighODx.3) Low ODxvs. hypotheses evaluated. Improvements in classification ac- Intermediate and High ODx. 4) Low and Intermediate ODx curacy of low vs. high over low-low vs high-high may be vs. High ODx. explained by the fact that the validation set was composed We found that the best classifier accuracy (AUC = 0.83) exclusively of low and high ODx samples. In addition, the was obtained for the Low-Low vs. High-High classification larger number of samples which were low ODx as com- problem. Since the ODx risk category is strongly correlated pared to high ODx samples may explain why the model with tumor grade [9], by choosing to leave out conflicting trained to distinguish between low and intermediate vs cases (i.e. where the grade and ODx risk categories are not high had slightly improved performance over the model aligned), the Low-Low vs High-High categories represent trained to distinguish between low vs. intermediate and the extreme risk cases. The next highest accuracy was high. It may also reflect the fact that the low and inter- obtained for the Low ODx vs. High ODx categories, where mediate risk patients are more alike from a histomorpho- all intermediate risk cases were left out. The best classifier metric perspective compared to the intermediate and high AUC obtained in this experiment (AUC = 0.72) was lower risk patients. The accuracies were highest using models compared to the AUC obtained for the Low-Low vs High- trained to distinguish between Low vs. High and Low vs High problem, possibly due to presence of 64 cases (55 (Intermediate and High ODx) cases (Table 6). Intermediate mBR and Low ODx, and 9 Intermediate mBR High ODx) where the grade and ODx risk categories Discussion did not align. This most likely adversely affected the train- In this work, we evaluated the effectiveness of computer- ing and the evaluation of the machine learning classifiers. extracted measurements of size, shape, and architectural When evaluating the classifiers in distinguishing the Low features of epithelial and stromal nuclei in separating early vs. Intermediate and High and the Low and Intermediate vs. High ODx risk categories, the Low and Intermediate vs. High ODx distinction had slightly improved performance Table 5 Validation dataset characteristics – ODx and grade as compared to distinguishing Low vs. Intermediate and distribution High ODx risk categories. This may be due to the fact that Validation Set (N = 53) the intermediate cases identified by ODx were primarily mBR Tumor Grade\ODx Low (< 18) Intermediate High (> 30) low risk cases [38]. Category (> 18, leq30) Classifier models trained on Low vs. High and the Low Low (4, 5) 40 (75%) 0 (0%) 0 (0%) with Intermediate vs. High ODx cases yielded the highest Moderate (6, 7) 0 (0%) 0 (0%) 1(2%) classification accuracy on the validation set. These results High (8, 9) 0(0%) 0 (0%) 12 (23%) appear to suggest that histomorphometrically the low Whitney et al. BMC Cancer (2018) 18:610 Page 12 of 15 ODx and intermediate ODx appeared more similar and pathologist grading information, such as the Magee compared to the high ODx cases. Clearly this will need Equation [9]. Using these methods, low grade and low ER to be validated in additional, larger independent validation and PR (≤150) can be correctly categorized as being low studies, but if confirmed might suggest that a number of ODx 89% of the time; and when ignoring intermediate the patients currently classified as intermediate risk by ODx cases, low and high ODx samples can be correctly Oncotype DX might actually be low risk and should be identified with concordance rates between 96.9 and 100% classified as such. [25, 49]. However, these methods have between 54.3 and Tumor grade is determined by tubule formation, nuclear 59.4% concordance when considering intermediate cases pleomorphism, and mitotic count [39]. These same as well as low and high, and require pathologist-generated features are found to strongly correlate breast cancer data [25]. When considering the intermediate risk categor- outcome [40]. The state of tubule formation is reflected ies, our classification AUC ranged from 0.58 and 0.6 which in features such as the ratio of tubule nuclei to total appears to be in alignment with the findings in [25]. nuclei [41]. The architecture of tubule formation is also Several different groups have previously explored the use reflected in features used in the presented work, such of QH for predicting ODx risk categories. For example, as Cell Cluster Graphs [29], Cell Orientation Entropy Basavanhally et al. was able to separate high from low [19], and Disorder of Nearest Neighbors [19]. Nuclear grade breast cancer patients, with top performing architec- pleomorphism may be reflected in features such as the tural features such as Delaunay Triangle metrics, nuclei Mean Invariant Moment [42], and Area Ratio [43]. density, and Voronoi Diagram architectural information Thus, the features used in this work are implicitly [12]. Romo-Bucheli et al. was able to separate high-high reflective of the histomorphometric measurements used from low-low cases with an AUC of 0.76 using a single by pathologists to assess grade and breast cancer outcome. feature: the ratio of tubule nuclei to non-tubule nuclei [41]. However, the method presented can also identify complex This approach used Deep Learning to identify biologically and sub-visual (i.e. information which is present, but not relevant structures (separating tubule nuclei from non- easily discernable by a human, such as higher-order nuclei tubule nuclei), while the presented approach used a much architectural characteristics, or difficult to recognize chro- larger number of nuclei-specific features for classification matin patterns [44, 45]) relationships between quantitative purposes. features and ODx categories that are difficult for patholo- While related to these previous approaches [12], our gists to visually identify. The Oncotype gene expression focus was on quantitatively evaluating the role of test aims to capture changes in genetic expression in computer extracted features of nuclear morphology in genes that have been tied with specific cancer-related the stroma and epithelium with the Oncotype Dx risk traits [46]. For example, Ki-67, STK15, Survivin, Cyclin categories. Additionally, unlike previous related studies B1, and MYBL2 have all been associated with breast [13] our study looked at the most discriminating features cancer proliferation; Stromelysin 3 and Cathepsin L2 have to distinguish not just the extreme risk categories (low been associated with invasion; and ER, PR, Bcl2, and vs. high) but also looked at the ability of computer SCUBE2 have been associated with responsiveness to extracted nuclear morphologic features to distinguish Estrogen [47]. Variations in these genes could potentially the intermediate risk categories from the low and high lead to changes in visual presentation of the cancer, and risk categories. thus affect the features previously described. For example, We do however acknowledge the several limitations of increases in Ki-67 activity resulting in increased unregu- this work. Firstly, the validation set used only included lated cell proliferation may increase the density of cell high and low ODx cases, without any intermediate cases. nuclei, resulting in an increase in the Disorder of Nearest Secondly, the focus of this work was on finding features Neighbors, or decreased distance between nuclei in Cell that were associated with ODx risk categories and not Cluster Graphs. Tumor invasion resulting from activation patient outcome. Oncotype DX is a companion diagnostic of Stromelysin 3 could result in either a loss of tissue test, and while the risk categories have been validated differentiation, or the presence of large epithelial nuclei against outcome, it is not perfectly correlated [50]. Unfor- invading into the surrounding stroma [48]. These types of tunately, long-term disease recurrence or patient outcome phenotypic changes might be captured by architectural information was not available for the cases considered in features, or size and shape variation amongst stromal nuclei this study. We also did not conduct a detailed study of the features. For example, variation in stromal nuclei shape influence of staining and scanning variations on the could also be related to the connection between spindle-cell features identified as predictive and the influence of and round stromal nuclei contact and breast cancer patient these parameters on the subsequent classification results. survival discovered by Beck et al. [20]. Finally, we focused solely on the role of nuclear morph- Previous groups have been able to duplicate ODx ology in this work, there are clearly other features that are results using equations drawing from genetic expression known to have a prognostic role in early stage ER+ breast Whitney et al. BMC Cancer (2018) 18:610 Page 13 of 15 cancers, features relating to number and distribution of National Center for Research Resources under award number 1 C06 RR12463–01. tumor infiltrating lymphocytes, mitoses [11], and tubules The DOD Prostate Cancer Synergistic Idea Development Award (PC120857); [41]. These features have shown to be independently The DOD Lung Cancer Idea Development New Investigator Award useful in determining ODx risk categories in ER+ breast (LC130463), The DOD Prostate Cancer Idea Development Award; cancer, and would likely improve the classification results The DOD Peer Reviewed Cancer Research Program W81XWH-16-1-0329. when combined with the nuclear histomorphometric The Ohio Third Frontier Technology Validation Fund. features presented in this work. Another potential future The Hartwell Foundation. the Wallace H. Coulter Foundation Program in the Department of Biomedical avenue is the integration of histomorphometric approaches Engineering and the Clinical and Translational Science Award Program such as this with genomic based tests to determine if the (CTSA) at Case Western Reserve University. integration of morphologic and molecular measurements The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. enables more accurate risk assessment, especially for the patients currently identified as intermediate risk. We hope Availability of data and materials to address these limitations in future work. The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request. Conclusions Authors’ contributions In this work we evaluated the role of computer extracted All authors have read and approved the manuscript. JW, GC, and AJ were features relating to spatial architecture and shape within responsible for experiments run. Manuscript was written primarily by JW and AM, with support from all authors. SG, SD, JT, MF, and HG were responsible the epithelium and stroma and showed that these features for defining the clinical problem, reviewing and annotating imaging data, could distinguish early stage ER+ breast cancers into and providing biological interpretation of the findings. Validation dataset different ODx risk categories. Our results suggest that provided by MF. with additional validation, these features could be used Ethics approval and consent to participate to create an inexpensive, rapid, and nondestructive pre- The study was HIPAA compliant and was approved by the Institutional Review dictor of low and high ODx risk categories for early stage Board at the University Hospitals Case Medical Center. The informed consent ER+ breast cancer based off digitized images of H&E slides was waived by the institutional review board for this retrospective study. alone. Competing interests Dr. Madabhushi is an equity holder in Elucid Bioimaging and in Inspirata Inc. He is also a scientific advisory consultant for Inspirata Inc. In addition, he Additional file currently serves as a scientific advisory board member for Inspirata Inc. and for Astrazeneca. He also has sponsored research agreements with Philips and Additional file 1: Table S7. Features tested for significance, and Inspirata Inc. His technology has been licensed to Elucid Bioimaging and considered for use in final analysis. Comprehensive list of features Inspirata Inc. He is also involved in a NIH U24 grant with PathCore Inc. and a investigated for classification utility. Each feature was used to analyze R01 with Inspirata Inc. Drs John Tomaszewski. Michael Feldman and Shridar epithelial and stromal nuclei separately. (XLSX 15 kb) Ganesan are members of the scientific advisory board of Inspirata, Inc. a digital pathology start-up company, and receives board fees and stock options. The authors declare that they have no competing interests. Abbreviations CCG: Cell Cluster Graph; ER +: Estrogen Receptor Positive; H&E: Hematoxylin and eosin; LDA: Linear Discriminant Analysis; mBR: Modified Bloom-Richardson; Publisher’sNote MRMR MID: Maximum Relevance, Minimum Redundancy, Mutual Information Springer Nature remains neutral with regard to jurisdictional claims in Difference; MRMR MIQ: Maximum Relevance, Minimum Redundancy, Mutual published maps and institutional affiliations. Information Quotient; MST: Minimum Spanning Trees; ODx: Oncotype Dx; PCA-VIP: Primary Component Analysis – Variable Importance; QH: Quantitative Author details Histomorphometry; ROC: Region Under the Curve; SVM: Support Vector Department of Biomedical Engineering, Case Western Reserve University, Machine 2071 Martin Luther King Drive, Cleveland, OH 44106-7207, USA. Universidad Nacional de Colombia, Bogotá D.C, Colombia. Department of Medicine, Acknowledgements Division of Medical Oncology, Rutgers Robert Wood Johnson Medical NVIDIA -a Titan X GPU, Gift of Titan X GPU to support research. Special thanks School, Rutgers Cancer Institute of New Jersey, 195 Little Albany Street, New to Natalie Shih for helping procure validation data in a timely manner. Brunswick, NJ 08903, USA. SUNY at the University at Buffalo, 3435 Main Street, Buffalo, NY, USA. Department of Pathology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA. Funding Department of Pathology, University Hospitals, Cleveland Medical Center The following funding bodies provided funding for the data collection, and Case Western Reserve University, Cleveland, OH 44106, USA. digitization, annotation and the computational and statistical analysis, as also in the writing of the manuscript. Received: 27 October 2017 Accepted: 26 April 2018 Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under award numbers. 1U24CA199374–01, R01CA202752-01A1. References R01CA208236-01A1. 1. Early Breast Cancer Trialists’ Collaborative Group (EBCTCG). Effects of R21CA179327–01; chemotherapy and hormonal therapy for early breast cancer on recurrence R21CA195152–01. and 15-year survival: an overview of the randomised trials. Lancet Lond The National Institute of Diabetes and Digestive and Kidney Diseases under Engl. 2005;365(9472):1687–717. https://doi.org/10.1016/S0140- award number R01DK098503–02, 6736(05)66544-0. PMID: 15894097 Whitney et al. BMC Cancer (2018) 18:610 Page 14 of 15 2. Brezden CB, Phillips K-A, Abdolell M, Bunston T, Tannock IF. Cognitive 17. Yu K-H, Zhang C, Berry GJ, Altman RB, Ré C, Rubin DL, Snyder M. Predicting function in breast cancer patients receiving adjuvant chemotherapy. J Clin non-small cell lung cancer prognosis by fully automatede microscopic Oncol. 2000;18(14):2695–701. pathology image features. Nat Commun. 2016;7:12474. https://doi.org/10. 3. Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, Baehner FL, Walker MG, 1038/ncomms12474. Watson D, Park T, Hiller W, Fisher ER, Wickerham DL, Bryant J, Wolmark N. A 18. Lee G, Veltri RW, Zhu G, Ali S, Epstein JI, Madabhushi A. Nuclear shape and multigene assay to predict recurrence of tamoxifen-treated, node-negative architecture in benign fields predict biochemical recurrence in prostate breast cancer. N Engl J Med. 2004;351(27):2817–26. https://doi.org/10.1056/ Cancer patients following radical prostatectomy: preliminary findings. Eur NEJMoa041588. PMID: 15591335 Urol Focus. 2016; https://doi.org/10.1016/j.euf.2016.05.009. 4. Paik S, Tang G, Shak S, Kim C, Baker J, Kim W, Cronin M, Baehner FL, Watson 19. Lee G, Ali S, Veltri R, Epstein JI, Christudass C, Madabhushi A. Cell orientation D, Bryant J, Costantino JP, Geyer CE, Wickerham DL, Wolmark N. Gene entropy (COrE): predicting biochemical recurrence from prostate cancer expression and benefit of chemotherapy in women with node-negative, tissue microarrays. Med Image Comput Comput-Assist Interv MICCAI Int estrogen receptor-positive breast Cancer. J Clin Oncol. 2006;24(23):3726–34. Conf Med Image Comput Comput-Assist Interv. 2013;16(Pt 3):396–403. https://doi.org/10.1200/JCO.2005.04.7985. PMID: 24505786 5. Sparano JA, Gray RJ, Makower DF, Pritchard KI, Albain KS, Hayes DF, Geyer 20. Beck AH, Sangoi AR, Leung S, Marinelli RJ, Nielsen TO, van de Vijver MJ, CE, Dees EC, Perez EA, Olson JA, Zujewski J, Lively T, Badve SS, Saphner TJ, West RB, van de Rijn M, Koller D. Systematic analysis of breast cancer Wagner LI, Whelan TJ, Ellis MJ, Paik S, Wood WC, Ravdin P, Keane MM, morphology uncovers stromal features associated with survival. Sci Transl Gomez Moreno HL, Reddy PS, Goggins TF, Mayer IA, Brufsky AM, Med. 2011;3(108):108ra113. https://doi.org/10.1126/scitranslmed.3002564. Toppmeyer DL, Kaklamani VG, Atkins JN, Berenberg JL, Sledge GW. PMID: 22072638 Prospective validation of a 21-gene expression assay in breast Cancer. N 21. American Cancer Society. Types of breast Cancer. [cited 2016 Aug 16]. Engl J Med. 2015;373(21):2005–14. https://doi.org/10.1056/NEJMoa1510764. Available from: http://www.cancer.org/cancer/breastcancer/detailedguide/ 6. Wittner BS, Sgroi DC, Ryan PD, Bruinsma TJ, Glas AM, Male A, Dahiya S, breast-cancer-breast-cancer-types Habin K, Bernards R, Haber DA, Van’t Veer LJ, Ramaswamy S. Analysis of the 22. Bhowmick NA, Neilson EG, Moses HL. Stromal fibroblasts in cancer initiation MammaPrint breast Cancer assay in a predominantly postmenopausal and progression. Nature. 2004;432(7015):332–7. https://doi.org/10.1038/ cohort. Clin Cancer Res. 2008;14(10):2988–93. https://doi.org/10.1158/1078- nature03096. 0432.CCR-07-4723. 23. Van den Eynden GG, Colpaert CG, Couvelard A, Pezzella F, Dirix LY, 7. Nielsen TO, Parker JS, Leung S, Voduc D, Ebbert M, Vickery T, Davies SR, Vermeulen PB, Van Marck EA, Hasebe T. A fibrotic focus is a prognostic Snider J, Stijleman IJ, Reed J, Cheang MCU, Mardis ER, Perou CM, Bernard factor and a surrogate marker for hypoxia and (lymph)angiogenesis in PS, Ellis MJ. A comparison of PAM50 intrinsic subtyping with breast cancer: review of the literature and proposal on the criteria of immunohistochemistry and clinical prognostic factors in tamoxifen-treated evaluation. Histopathology. 2007;51(4):440–51. https://doi.org/10.1111/j. estrogen receptor-positive breast Cancer. Clin Cancer Res. 2010;16(21):5222– 1365-2559.2007.02761.x. PMID: 17593207 32. https://doi.org/10.1158/1078-0432.CCR-10-1282. 24. Henson DE, Ries L, Freedman LS, Carriaga M. Relationship among outcome, 8. Mina L, Soule SE, Badve S, Baehner FL, Baker J, Cronin M, Watson D, Liu M-L, stage of disease, and histologic grade for 22,616 cases of breast cancer. The Sledge GW, Shak S, Miller KD. Predicting response to primary basis for a prognostic index. Cancer. 1991;68(10):2142–9. https://doi.org/10. chemotherapy: gene expression profiling of paraffin-embedded core biopsy 1002/1097-0142(19911115)68:10<2142::AID-CNCR2820681010>3.0.CO;2-D. tissue. Breast Cancer Res Treat. 2007;103(2):197–208. https://doi.org/10.1007/ 25. Klein ME, Dabbs DJ, Shuai Y, Brufsky AM, Jankowitz R, Puhalla SL, Bhargava R. s10549-006-9366-x. Prediction of the Oncotype DX recurrence score: use of pathology-generated 9. Flanagan MB, Dabbs DJ, Brufsky AM, Beriwal S, Bhargava R. Histopathologic equations derived by linear regression analysis. Mod Pathol. 2013;26(5):658–64. variables predict Oncotype DX recurrence score. Mod Pathol Off J U S Can https://doi.org/10.1038/modpathol.2013.36. Acad Pathol Inc. 2008;21(10):1255–61. https://doi.org/10.1038/modpathol. 26. Janowczyk A, Madabhushi A. Deep learning for digital pathology image analysis: 2008.54. PMID: 18360352 A comprehensive tutorial with selected use case. J Pathol Inform. 2016. 10. Allsbrook WC, Mangold KA, Johnson MH, Lane RB, Lane CG, Epstein JI. 27. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Interobserver reproducibility of Gleason grading of prostatic carcinoma: Darrell T. Caffe: convolutional architecture for fast feature embedding. ACM general pathologist. Hum Pathol. 2001;32(1):81–8. https://doi.org/10.1053/ Press; 2014 [cited 2016 Aug 4]. p. 675–678. Available from: http://dl.acm.org/ hupa.2001.21135. PMID: 11172299 citation.cfm?doid=2647868.2654889. https://doi.org/10.1145/2647868. 11. Romo-Bucheli D, Janowczyk A, Gilmore H, Romero E, Madabhushi A. A deep 2654889. learning based strategy for identifying and associating mitotic activity with 28. Basavanhally A, Ganesan S, Shih N, Mies C, Feldman M, Tomaszewski J, gene expression derived risk categories in estrogen receptor positive breast Madabhushi A. A boosted classifier for integrating multiple fields of view: cancers. Cytom Part J Int Soc Anal Cytol. 2017; https://doi.org/10.1002/cyto. breast cancer grading in histopathology: IEEE; 2011 [cited 2016 Aug 1]. p. a.23065. PMID: 28192639 125–128. Available from: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper. 12. Basavanhally A, Ganesan S, Feldman M, Shih N, Mies C, Tomaszewski J, htm?arnumber=5872370. https://doi.org/10.1109/ISBI.2011.5872370. Madabhushi A. Multi-field-of-view framework for distinguishing tumor grade in 29. Ali S, Veltri R, Epstein JA, Christudass C, Madabhushi A. Gurcan MN, ER+ breast cancer from entire histopathology slides. IEEE Trans Biomed Eng. Madabhushi A, editors. Cell cluster graph for prediction of biochemical 2013;60(8):2089–99. https://doi.org/10.1109/TBME.2013.2245129. PMID: 23392336 recurrence in prostate cancer patients from tissue microarrays; 2013 [cited 13. Basavanhally A, Feldman M, Shih N, Mies C, Tomaszewski J, Ganesan S, 2016 Mar 18]. p. 86760H. Available from: http://proceedings. Madabhushi A. Multi-field-of-view strategy for image-based outcome spiedigitallibrary.org/proceeding.aspx?doi=10.1117/12.2008695. https://doi. prediction of multi-parametric estrogen receptor-positive breast cancer org/10.1117/12.2008695. histopathology: comparison to Oncotype DX. J Pathol Inform. 2011;2:S1. 30. Devore J. Probability and statistics for engineering and the sciences: https://doi.org/10.4103/2153-3539.92027. PMID: 22811953 PMCID: Cengage Learning; 2015. PMC3312707 31. Ginsburg SB, Viswanath SE, Bloch BN, Rofsky NM, Genega EM, Lenkinski 14. Gisselsson D, Björk J, Höglund M, Mertens F, Dal Cin P, Åkerman M, RE, Madabhushi A. Novel PCA-VIP scheme for ranking MRI protocols Mandahl N. Abnormal nuclear shape in solid tumors reflects mitotic and identifying computer-extracted MRI measurements associated with instability. Am J Pathol. 2001 Jan;158(1):199–206. https://doi.org/10.1016/ central gland and peripheral zone prostate tumors. J Magn Reson S0002-9440(10)63958-2. Imaging JMRI. 2015;41(5):1383–1393. https://doi.org/10.1002/jmri.24676. PMID: 24943647. 15. Trepat X, Wasserman MR, Angelini TE, Millet E, Weitz DA, Butler JP, Fredberg JJ. Physical forces during collective cell migration. Nat Phys. 2009;5(6):426–30. 32. Peng H, Long F, Ding C. Feature selection based on mutual information: https://doi.org/10.1038/nphys1269. criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell. 2005;27(8):1226–1238. https://doi.org/10.1109/ 16. Lewis JS, Ali S, Luo J, Thorstad WL, Madabhushi A. A quantitative TPAMI.2005.159 PMID: 16119262. histomorphometric classifier (QuHbIC) identifies aggressive versus indolent p16-positive oropharyngeal squamous cell carcinoma. Am J Surg Pathol. 33. Ding C, Peng H. Minimum redundancy feature selection from microarray 2014;38(1):128–37. https://doi.org/10.1097/PAS.0000000000000086. PMID: gene expression data. J Bioinforma Comput Biol. 2005;03(02):185–205. 24145650 PMCID: PMC3865861 https://doi.org/10.1142/S0219720005001004. Whitney et al. BMC Cancer (2018) 18:610 Page 15 of 15 34. Strobl C, Malley J, Tutz G. An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychol Methods. 2009 ;14(4):323–348. https:// doi.org/10.1037/a0016973 PMID: 19968396 PMCID: PMC2927982. 35. Demuth H, Beale M. Neural network toolbox for use with Matlab - User’s guide version. 1993. 36. Pelchmans K, Suykens J, Gestel T, Brabanter J, Lukaas L, Hamers B, Moor B, Vandewalle J. LS-SVMlab: a matlab/c toolbox for least squares support vector machines. 2002. 37. Izenman AJ. Linear discriminant analysis. In: Izenman AJ, editor. Mod Multivar stat tech Regres Classif manifold learn. New York, NY: Springer New York; 2008. p. 237–80. Available from: https://doi.org/10.1007/978-0-387- 78189-1_8. 38. JMS B, Bayani J, Marshall A, Dunn JA, Campbell A, Cunningham C, Sobol MS, Hall PS, Poole CJ, Cameron DA, Earl HM, Rea DW, Macpherson IR, Canney P, Francis A, McCabe C, Pinder SE, Hughes-Davies L, Makris A, Stein RC, on behalf of the OPTIMA TMG. Comparing breast Cancer multiparameter tests in the OPTIMA prelim trial: no test is more equal than the others. J Natl Cancer Inst. 2016;108(9):djw050. https://doi.org/10.1093/ jnci/djw050. 39. Elston CW, Ellis IO. Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology. 1991 Nov;19(5):403–10. https:// doi.org/10.1111/j.1365-2559.1991.tb00229.x. 40. Bloom H, Richardson W. Histological grading and prognosis in breast Cancer: a study of 1409 cases of which 359 have been followed for 15 years. Br J Cancer. 1957:359–77. 41. Romo-Bucheli D, Janowczyk A, Romero E, Gilmore H, Madabhushi A. Automated tubule nuclei quantification and correlation with oncotype DX risk categories in ER+ breast cancer whole slide images. In: Gurcan MN, Madabhushi A, editors. 2016 [cited 2016 Aug 3]. p. 979106. Available from: http://proceedings.spiedigitallibrary.org/proceeding.aspx?doi=10.1117/12. 2211368 doi:https://doi.org/10.1117/12.2211368 42. Ongun G, Halici U, Leblebicioglu K, Atalay V, Beksac M, Beksac S. Feature extraction and classification of blood cells for an automated differential blood count system: IEEE; 2001 [cited 2017 Jan 10]. p. 2461–2466. Available from: http://ieeexplore.ieee.org/document/938753/. https://doi.org/10.1109/ IJCNN.2001.938753. 43. Liotta LA, Kleinerman J, Saidel GM. Quantitative relationships of intravascular tumor cells, tumor vessels, and pulmonary metastases following tumor implantation. Cancer Res. 1974 May 1;34(5):997. 44. Madabhushi A. Computerized histologic image based risk predictor (CHIRP): identifying disease aggressiveness using sub-visual image cues from image data. Microsc Microanal. 2016 Jul;22(S3):1006–7. https://doi.org/10.1017/ S1431927616005870. 45. Guillaud M, Adler-Storthz K, Malpica A, Staerkel G, Matisic J, Van Niekirk D, Cox D, Poulin N, Follen M, Macaulay C. Subvisual chromatin changes in cervical epithelium measured by texture image analysis and correlated with HPV. Gynecol Oncol. 2005;99(3 Suppl 1):S16–S23. doi:https://doi.org/10.1016/ j.ygyno.2005.07.037 PMID: 16188299. 46. Cronin M, Sangli C, Liu M-L, Pho M, Dutta D, Nguyen A, Jeong J, Wu J, Langone KC, Watson D. Analytical validation of the Oncotype DX genomic diagnostic test for recurrence prognosis and therapeutic response prediction in node-negative, estrogen receptor–positive breast Cancer. Clin Chem. 2007;53(6):1084. https://doi.org/10.1373/clinchem.2006.076497. 47. Sparano JA, Paik S. Development of the 21-gene assay and its application in clinical practice and clinical trials. J Clin Oncol. 2008 Feb 10;26(5):721–8. https://doi.org/10.1200/JCO.2007.15.1068. 48. Muller D, Wolf C, Abecassis J, Millon R, Engelmann A, Bronner G, Rouyer N, Rio M-C, Eber M, Methlin G. Increased stromelysin 3 gene expression is associated with increased local invasiveness in head and neck squamous cell carcinomas. Cancer Res. 1993;53:165–9. 49. Turner BM, Skinner KA, Tang P, Jackson MC, Soukiazian N, Shayne M, Huston A, Ling M, Hicks DG. Use of modified Magee equations and histologic criteria to predict the Oncotype DX recurrence score. Mod Pathol. 2015 Jul; 28(7):921–31. https://doi.org/10.1038/modpathol.2015.50. 50. Győrffy B, Karn T, Sztupinszki Z, Weltz B, Müller V, Pusztai L. Dynamic classification using case-specific training cohorts outperforms static gene expression signatures in breast cancer. Int J Cancer. 2015;136(9):2091–2098. https://doi.org/10.1002/ijc.29247 PMID: 25274406 PMCID: PMC4354298. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png BMC Cancer Springer Journals

Quantitative nuclear histomorphometry predicts oncotype DX risk categories for early stage ER+ breast cancer

Free
15 pages

Loading next page...
 
/lp/springer_journal/quantitative-nuclear-histomorphometry-predicts-oncotype-dx-risk-66GjxW3HGc
Publisher
Springer Journals
Copyright
Copyright © 2018 by The Author(s).
Subject
Biomedicine; Cancer Research; Oncology; Surgical Oncology; Health Promotion and Disease Prevention; Biomedicine, general; Medicine/Public Health, general
eISSN
1471-2407
D.O.I.
10.1186/s12885-018-4448-9
Publisher site
See Article on Publisher Site

Abstract

Background: Gene-expression companion diagnostic tests, such as the Oncotype DX test, assess the risk of early stage Estrogen receptor (ER) positive (+) breast cancers, and guide clinicians in the decision of whether or not to use chemotherapy. However, these tests are typically expensive, time consuming, and tissue-destructive. Methods: In this paper, we evaluate the ability of computer-extracted nuclear morphology features from routine hematoxylin and eosin (H&E) stained images of 178 early stage ER+ breast cancer patients to predict corresponding risk categories derived using the Oncotype DX test. A total of 216 features corresponding to the nuclear shape and architecture categories from each of the pathologic images were extracted and four feature selection schemes: Ranksum, Principal Component Analysis with Variable Importance on Projection (PCA-VIP), Maximum-Relevance, Minimum Redundancy Mutual Information Difference (MRMR MID), and Maximum-Relevance, Minimum Redundancy - Mutual Information Quotient (MRMR MIQ), were employed to identify the most discriminating features. These features were employed to train 4 machine learning classifiers: Random Forest, Neural Network, Support Vector Machine, and Linear Discriminant Analysis, via 3-fold cross validation. Results: The four sets of risk categories, and the top Area Under the receiver operating characteristic Curve (AUC) machine classifier performances were: 1) Low ODx and Low mBR grade vs. High ODx and High mBR grade (Low-Low vs. High-High) (AUC = 0.83), 2) Low ODx vs. High ODx (AUC = 0.72), 3) Low ODx vs. Intermediate and High ODx (AUC = 0.58), and 4) Low and Intermediate ODx vs. High ODx (AUC = 0.65). Trained models were tested independent validation set of 53 cases which comprised of Low and High ODx risk, and demonstrated per-patient accuracies ranging from 75 to 86%. Conclusion: Our results suggest that computerized image analysis of digitized H&E pathology images of early stage ER+ breast cancer mightbeablepredict thecorresponding Oncotype DX risk categories. Background effects including loss of hair, taste, cognitive function, Estrogen Receptor positive (ER+) breast cancers are a and additional extensive medical care [2]. As such, it is common subtype of breast cancer that can frequently be critical to be able to determine the level of recurrence effectively treated using hormonal therapy if deemed to risk to plan treatment effectively so that the toxic side have a low risk of recurrence. However, early stage ER+ effects of chemotherapy can be avoided in low-risk breast cancers that are at high risk of recurrence are patients. typically treated with adjuvant chemotherapy in addition Several methods of assessing tumor risk have been to hormonal therapy. While chemotherapy increases developed, including gene assays such as the Oncotype survival rates by reducing rates of recurrence in these DX (ODx) Recurrence score, that stratify patients based high risk subgroups [1], there may be significant side on their risk of cancer recurrence [3]. The ODx test is a 21 gene assay that is currently employed for separating * Correspondence: Jon.whitney@case.edu breast cancer patients into low and high risk of recurrence Department of Biomedical Engineering, Case Western Reserve University, categories to help a clinician decide whether or not to pre- 2071 Martin Luther King Drive, Cleveland, OH 44106-7207, USA scribe adjuvant chemotherapy for early stage ER+ breast Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Whitney et al. BMC Cancer (2018) 18:610 Page 2 of 15 cancers [4]. The recurrence score is derived from the cells [21]. In addition, there is evidence that stromal cells expression levels of multiple cancer-related genes, and react to tumor growth over time, and stromal phenotype ranges from 0 to 100 [4]. Patients with an ODx score of can reflect a given cancer’s genetic profile [22, 23]. For 17 or below are in the low-risk category, patients with instance in [20], Beck et al. showed the importance of ODx scores between 18 and 30 were considered intermedi- stromal morphology in predicting overall breast cancer ate risk, and scores 31 and above are in the high ODx risk survival. It is therefore useful to consider the behavior category [5]. Unfortunately, Oncotype DX and similar com- of epithelial and stromal cells as distinct groups when panion diagnostic tests (e.g. Mammaprint [6], PAM50 [7]) profiling breast cancer. tend to be expensive and time consuming due to the need In this paper we evaluate the nuclear morphologic features for physical shipping of tissue samples to proprietary testing to distinguish digitized images of H&E sections from early facilities. They are also tissue-destructive, making additional stage ER+ breast cancers into ODx risk categories using evaluation of other biomarkers or genes difficult. supervised machine learning classifiers. ODx risk categories The modified Bloom Richardson (mBR) grading scale is are comprised of three groups to reflect distinctions based based on measuring nuclear grade (variation in nuclear off 5 year survival: low, intermediate, and high risk [5, 24]. shape and size), mitotic count, and tubule density. Each of However, there is both a high degree of correlation between these individual histologic primitives are assigned a score ODx risk categories and mBR grade [8], as well as overlap from 1 to 3 and then added to generate the cumulative between the intermediate and low and intermediate and mBR grade. Mina et al. [8] showed that mBR grade was high risk categories, making accurate separation of inter- also highly correlated the expression of proliferation genes mediate cases from other risk categories difficult [25]. We used in the determination of ODx risk categories, and have therefore selected four categories to distinguish using Flanagan et al. [9] identified a positive correlation between computer extracted nuclear morphology features: 1) Low ODx risk category and nuclear grade when creating a ODx and Low mBR grade vs. High ODx and High mBR predictive model of ODx based off clinical variables. grade (Low-Low vs. High-High) to evaluate whether nuclear Unfortunately, pathologic assessments of tumor grade are morphology features were able to predict risk category known to suffer from inter-observer variability [10]. when both the difficult to classify intermediate cases Quantitative histomorphometry (QH) refers to the use and differences between mBR grade and ODx risk category of computer-aided image analysis of digitized pathology are removed. 2) Low ODx vs. High ODx to evaluate the images to “unlock” more revealing sub-visual attributes predictive ability of the nuclear morphology features when about tumor morphology, which can possibly be corre- difficult to classify intermediate cases are removed. 3) Low lated with disease recurrence independent of other clinical ODx vs. Intermediate and High ODx to evaluate the ability and pathologic features. These features might also poten- of the nuclear morphology features to identify the low tially reveal the underlying biology or molecular phenotype ODx cases specifically. 4) Low and Intermediate ODx of the tumor. For example, Buchelli et al. showed that the vs. High ODx to evaluate the ability of the nuclear number of mitoses identified via a deep learning algorithm morphology features to identify high ODx cases specifically. was predictive of the ODx risk categories [11]. The approach presented in this paper comprises the Nuclear architecture is another image attribute that following main steps (Fig. 1). First, H&E slides of surgical has been implicated in the prediction of overall cancer or biopsy specimens of breast tissue are scanned and digi- grade and cancer aggressiveness [12, 13]. Additionally, tized (Fig. 1.1). Second, nuclear segmentation is performed variations in nuclear shape could reflect genetic instability using deep learning models trained on manual breast [14] and may impact the ability of cancer cells to travel nuclei annotations, followed by watershed separation to through tissue and create metastases that lead to recur- resolve overlapping nuclei (Fig. 1.2). Third, a deep learning rence [15]. A number of recent studies have shown the model was used to separate epithelial from stromal association of QH features of nuclear architecture and regions, helping us identify which nuclei were stromal and morphology with disease progression in oropharyngeal which were epithelial (Fig. 1.3). Fourth, we extracted nuclear cancers [16], cancer recurrence in lung cancers [17], architectural and shape features from the epithelial and biochemical recurrence in prostate cancers [18, 19]and stromal regions separately (Fig. 1.4). Fifth, we perform overall breast cancer survival [20]. feature selection on the resulting features using four There is also evidence that the performance of QH different feature ranking schemes - Ranksum, PCA-VIP, analysis improves when done separately on different cell MRMR MID, and MRMR MIQ. The predictive perform- types [20]. In the context of distinguishing breast cancers ance of these features was evaluated using four different with different degrees of risk, it is likely that these cancers supervised machine learning classifiers - random forest, are characterized by different phenotypical changes in support vector machine (SVM), linear discriminant different cell types. Breast cancers are predominantly analysis (LDA), and a neural network – via a 3-fold carcinomas –cancers which are derived from epithelial cross validation scheme (Fig. 1.5). The classifiers were Whitney et al. BMC Cancer (2018) 18:610 Page 3 of 15 Fig. 1 Illustration of the methodology used to classify whole slide images into ODx risk categories. 1) Image patches are extracted at 40× from regions within whole slides identified by pathologists as containing invasive cancer. 2) Nuclei detection is performed on these image patches and 3) combined with a Deep Learning epithelial/stromal separation model. 4) Nuclear architecture and shape features are extracted from the detected epithelial and stromal nuclei separately. These features are combined with (5) a trained classification model in order predict the ODx risk category for each patch. Classification results from the image patches for each patient are (6) combined in a patch-based-voting method to (7) yield the final risk prediction on a patient level evaluated by their ability to distinguish between the by pathologists at each of the participating institutions. 9 four different classification tasks presented above using cases in which the mBR score and ODx risk category were the area (AUC) under the Receiver Operating Charac- at opposite extremes (4 low mBR and High ODx, and 5 teristic (ROC) curve, which plots the true positive rate high mBR and low ODx) were excluded from this study. against the false positive rate. Finally, classifiers are trained to create per-patch risk category predictions, Nuclei segmentation identifying the optimal threshold of what percentage of We employed the approach described in [26] by Janowczyk positively classified patches should result in a positive et al. for segmenting individual nuclei. Two Deep Learning prediction based on training data, and then applied and (DL) models were employed. The first model identified the evaluated on testing folds to create a final prediction of likelihood that a given pixel was part of a nucleus and the the ODx risk category for each patient (Fig. 1.6, 1.7). second model identified the likelihood that a pixel was part of the epithelium or stroma. Both models were trained Methods using manual segmentations of the tissue primitives of Dataset description interest (i.e. nucleus or stroma or epithelium). DL was exe- Our study comprised of 178 H&E stained whole tissue cuted using Caffe, a popular open-source DL framework slides of ER+ Lymph node negative breast cancer patients [27]. The DL models were trained using 32 × 32 sized (Table 1). These whole slide breast cancer samples dataset image patches on a Titan XGPU running CUDA 7.5, and a was selected to include 1) early stage ER+ breast cancers, 9-layer convolutional neural network framework. 2) surgically resected tissue specimens, and 3) the avail- The nuclear segmentation model was trained on a ability of a corresponding Oncotype DX risk score. These dataset of 141 manually annotated ER+ breast cancer slides were obtained from patients treated between 2004 tissue images, each patch sized 2000 × 2000 pixels and at and 2009 at the Cancer Institute of New Jersey and the 40× magnification. The epithelium/stroma separation University of Pennsylvania, and between 2008 and 2013 at model was trained on a dataset of 236 ER+ breast cancer Case Western Reserve University. Slides were locally digi- tissue image patches, each sized at 1000 × 1000 pixels tized at their originating institutions using Aperio, Leica, and at 10× magnification. Lower magnification in the and Philips scanners. The Modified Bloom-Richardson epithelial/stromal separation model allowed for more con- Grade for each of the pathologic specimens was determined textual information to be included in the image patches Whitney et al. BMC Cancer (2018) 18:610 Page 4 of 15 Table 1 Dataset characteristics – demographic and cancer subtype distribution in each risk category for the cases from the 3 different institutions considered in this study Parameters Oncotype DX Risk Category Low (< 18) Intermediate (> 18, ≤30) High(> 30) No. of Patients (N = 125) 66 (53%) 44 (35%) 15 (12%) Age 20–77 25–70 45–70 Sex Female 66 (53%) 43 (34%) 15 (12%) Male 0 (0%) 1 (1%) 0 (0%) Patient Ethnicity White 33 (26%) 23 (18%) 5 (5%) African American 3 (2%) 2 (2%) 2 (2%) Asian 1 (1%) 2 (2%) 1 (1%) Unknown 22 (18%) 17 (14%) 7(6%) PR Status Positive 64 (51%) 39 (31%) 10 (8%) Negative 2 (2%) 3 (2%) 5 (4%) Unknown 0 (0%) 2 (2%) 0 (0%) HER2 Status Positive 0 (0%) 1 (1%) 0 (0%) Negative 66 (53%) 42 (34%) 15 (12%) Unknown 0 (0%) 1 (1%) 0 (0%) Histologic Tumor Grade Low (4, 5) 10 (8%) 14 (11%) 0 (0%) Moderate (6, 7) 48 (38%) 24 (19%) 4 (3%) High (8, 9) 8 (6%) 6 (5%) 11 (9%) Tumor Type Ductal 53 (42%) 37 (30%) 14 (11%) Ductal With Lobular Features 9 (7%) 3 (2%) 1 (1%) Ductal with Mucinous Features 1 (1%) 2 (2%) 0 (0%) Mixed 3 (2%) 2 (2%) 0 (0%) during model training, improving accuracy and speed. This nuclei as the vertices of the graph. The choice of vertex patch-based approach allowed for multiple identically- connectivity determines the type of nuclear graph (i.e. sized image patches to be used, increasing the size of the Delaunay, Voronoi, MST, CCG) constructed. Features training set. In addition, the patch size was selected to use extracted from the graphs included changes in the lengths the field of view identified as being optimal for extracting of edges and distance between nearest vertices. Cellular nuclear architecture features of the tumor [28]. disorder can be measured using features derived from Cell Orientation Graphs [19]. Shape features included Invariant Feature extraction Moment, Fourier Descriptor, and Length/Width ratios. A A total of 216 nuclear features were extracted from epi- comprehensive enumeration of all the image features thelial and stromal nuclei separately, resulting in a total extracted is presented in the Additional file 1. of 432 features per patch. These features consisted of architecture and shape features. Feature ranking Architectural features were obtained by performing Feature ranking was used to identify the most relevant quantitative analysis of nuclear graphs, such as Delaunay image features for predicting the corresponding ODx Triangles, Voronoi Diagrams, Minimum Spanning Trees risk category. Features were ranked in order of highest (MST), and Cell Cluster Graphs (CCG) [29] (Fig. 2). relevance to the classification problem. The most relevant These nuclear graphs were constructed using the individual features identified were subsequently used in conjunction Whitney et al. BMC Cancer (2018) 18:610 Page 5 of 15 Fig. 2 Nuclear graphs used to calculate features relating to spatial arrangement of nuclei. Left to right: Original images at 1×, 4×, and 40×, Voronoi Diagram, Minimum Spanning Tree, and Cell Cluster Graph, reflecting local nuclear architecture. Comparison between graph appearance for a low ODx example (top) and a high ODx example (bottom) with machine learning classifiers. A number of popular fea- simultaneously present in the training and hold-out ture ranking methods were evaluated including Wilcoxon groups. Two of these groups were used for model training, Ranksum [30], PCA-VIP [31], and Maximum-Relevance while the third group was used to test the trained Minimum-Redundancy (MRMR) [32] with two variants – model. Machine learning classifiers were trained on a per- Mutual Information Difference, and Mutual Information patch basis. This allowed for a simple patch-based voting Quotient (MRMR-MID and MRMR-MIQ) [33]. Each of method, in which the classification of the patient as being these feature ranking methods takes a slightly different in the low or high-risk category was based on if the number approach to identifying the most relevant features, and of class labels predicted for a given class surpassed a patch simultaneously suppressing features that are highly percentage threshold. The optimal threshold was deter- correlated with each other. The Ranksum method identi- mined from the training data in each iteration. This fies feature relevance to classification without explicitly method can also be used to classify individual patches considering the correlation between highly-ranked features spatially in an H&E slide, providing a spatially distributed [30]. PCA-VIP uses a combination criteria of both how assessment of cancer aggression across a given sample each of the principle component vectors relate to the (Fig. 3). outcome to be predicted, and which features most highly contribute to those principle component vectors (effect- Experiments ively measuring to what extent a given feature provides The four experiments were as follows unique information in a dataset) [31]. MRMR-MID and MRMR-MIQ both use maximal relevance criteria which 1) Low ODx and Low mBR grade vs. High ODx and use the mean mutual information values between features High mBR grade (Low-Low vs. High-High). This and the relevant output class, while minimizing the redun- experiment was used to look at the cases reflecting dancy (mutual information between any feature and the the extremes in terms of tumor morphology and other features in the dataset) [32]. ODx risk. While grade and ODx risk scores are correlated for the most part [8], in this experiment we Classifier construction chose to ignore conflicting cases (i.e. cases with a low A total of four different classifiers was tested in conjunction mBR grade but a high ODx score and vice-versa). with each of the four different feature selection methods. 2) Low ODx vs. High ODx. This experiment looks at The classifiers employed included a bagged C4.5 Random cases of high distinction in terms of ODx risk Forest [34], a ten-node four-layer Neural Network [35], a 3 category, but does not exclude cases with kernel Support Vector Machine [36], and a pseudolinear conflicting grade categories. discriminant Linear Discriminant Analysis [37]. Machine 3) Low ODx vs. Intermediate and High ODx. This is learning classifiers were trained using 100 iterations of the hypothesis that is closest to the question a randomly initialized 3-fold cross-validation. 3-fold cross- clinician is interested in answering: identifying cases validation was employed to divide the entire dataset of that are low ODx risk score from all others so that image patches into three equal groups by patient ID, thus low ODx risk patients can avoid aggressive ensuring that patches from each patient were not chemotherapies. Whitney et al. BMC Cancer (2018) 18:610 Page 6 of 15 Fig. 3 Example of the Low-Low vs. High-High random forest classifier using ranksum feature selection applied to patches from whole slide image. Machine classification uses the top ranked epithelial and stromal features. Green squares indicate patches that are predicted to be Low ODx while Blue squares are predicted to be High ODx 4) Low and Intermediate ODx vs. High ODx. This the true positive rate as a function of the false positive rate experiment considers the possibility that high ODx at varying confidence thresholds. The higher the area risk patients are histologically distinct from both under the curve (indicated by the curve extending into the other ODx risk categories. upper left quadrant), the more frequently the classifier is able to correctly identify the class, and the less frequently We also quantitatively assessed the performance of it is to falsely classify a case as positive. For comparison, a each of four different feature ranking methods over stromal diagonal line extending from the bottom left to the upper and epithelial features in conjunction with four different right corner would indicate an AUC of 0.5, which is machine learning classification schemes to determine considered to be the equivalent of guessing. which combination of classification and feature ranking In order to demonstrate the significance of epithelial/ approaches resulted in the highest per-patient patch voting stromal separation, we ran two sets of features using the accuracy for each of the four experiments. Per-patient optimized machine learning classifier and feature ranking patch voting simply means that the classifier was applied algorithm. The two feature sets were: 1) nuclei features to each patch extracted from a patient, thus generating an extracted from all nuclei, 2) nuclei features extracted from ODx risk category prediction for each patch. A simple epithelial and stromal nuclei separately. The utility of majority of the per-patch risk category predictions for each separating epithelial and stromal nuclei prior to feature patient is then used to determine the predicted patient extraction was measured by comparing the AUCs between ODx risk category. The per-patient patch voting accuracy models trained from features with no epithelial/stromal is defined as the percentage of patients whose ODx risk separation, and epithelial stromal separation prior to fea- category was correctly predicted using this method. ture extraction. Feature evaluation via supervised classification Evaluation of models on external validation set For each of the 4 classification experiments described above, In order to fully assess the effectiveness of the models we identified 1) the most highly ranked and predictive generated, the models with the highest performance were epithelial and stromal nuclear morphologic features which used on an external validation set. Models were trained were evaluated via violin plots (Figs. 4), and 2) classification over theentireprimarycohort before being applied without accuracy for the machine learning classifiers in conjunction any retraining to the external validation set. with the top ranked features in the form of AUC. Violin plots illustrate the distribution of normalized Results feature values for the top performing features between The results for the four primary experiments are as follows the two risk categories. Thus, high degrees of separation between the two distributions indicate a high level of 1) Low ODx and Low mBR grade vs. High ODx and discrimination from that feature. AUC curves indicate High mBR grade (Low-Low vs. High-High) (Fig. 5, Whitney et al. BMC Cancer (2018) 18:610 Page 7 of 15 Fig. 4 Feature Distributions for the top ranked epithelial (left) and stromal (right) features using PCA-VIP feature ranking for each experiment. Green lines indicate the mean of each population, and red lines indicate the 25th and 75th percentiles of the distribution. Width of the plot indicates the relative number of data points at each normalized feature value along the y-axis top left). In this experiment, the top ranked epithelial perimeter, area ratios, and invariant moment (Table 3). features were cell cluster graphs, and the top ranked The SVM classifier using the PCA-VIP feature ranking stromal features were shape features related to nuclear scheme yielded the highest classification accuracy with Whitney et al. BMC Cancer (2018) 18:610 Page 8 of 15 Fig. 5 ROC curves for each of the four experiments conducted (panels) and classification methods (lines) using PCA-VIP feature selection. Top left: Low ODx and Low mBR grade vs. High ODx and High mBR grade (Low-Low vs. High-High). Top Right: Low ODx vs. High ODx. Bottom Left: Low ODx vs. Intermediate and High ODx. Bottom Right: Low and Intermediate ODx vs. High ODx. Each panel displays the ROC curve using either (solid) random forest, (dashed) neural network, (dotted) SVM, or (intermediate dash) LDA classification. Feature set includes epithelial and stromal features. AUC values for each curve are displayed in the legend an AUC of 0.83, and a patch voting accuracy of 3) Low ODx vs. Intermediate and High ODx (Fig. 5, 86% (Table 2). AUC results using the same bottom left): The top ranked epithelial features classifier and feature ranking methodology were primarily disorder and number of nearest improved from 0.71 to 0.83 with the inclusion of neighbors features, while the highest ranked stromal features (Table 4). stromal features were primarily metrics regarding 2) Low ODx vs. High ODx (Fig. 5, top right) (Fig. 5, the invariant moment (Table 3). The random forest top right): The top ranked epithelial features were classifier using the PCA-VIP feature ranking scheme the cell cluster graph and disorder of nearest yielded a classification AUC of 0.58, and a patch neighbors features, while the highest ranked voting accuracy of 64% (Table 2). AUC results using stromal features were similar to those identified for the same classifier and feature ranking methodology the low-low vs. high-high discrimination problem, improved from 0.55 to 0.58 with the separation of namely perimeter ratio, area ratio, and invariant epithelial and stromal nuclei (Table 4). moment (Table 3). The SVM classifier using the 4) Low and Intermediate ODx vs. High ODx (Fig. 5, PCA-VIP feature ranking scheme yielded a bottom right):: The top ranked epithelial features classification AUC of 0.72, and a patch voting were metrics concerning the mean and variation in accuracy of 76% (Table 2). AUC results using the edge length associated with cell cluster graphs, same classifier and feature ranking methodology while the highest ranked stromal features were the improved from 0.61 to 0.72 with the separation of invariant moment and standard deviation of the epithelial and stromal nuclei (Table 4). Fourier descriptor (Table 3). The SVM classifier Whitney et al. BMC Cancer (2018) 18:610 Page 9 of 15 Table 2 Classification accuracy metrics for each of the four experiments. From left to right: Low ODx Low mBR vs. High ODx Low mBR, Low ODx vs. High ODx, Low ODx vs. Intermediate and High ODx, and Low and Intermediate ODx vs. High ODx. Data for each experiment includes the AUC, best patch Voting Accuracy results, and the optimal feature ranking and classifier used to achieve the optimized patch voting accuracy results. All experiments conducted with 3-fold cross-validation Experiment LL vs. HH L vs. H L vs. Int. and H L and Int. vs. H Number of Patients 37 75 125 111 AUC 0.81 0.69 0.58 0.6 AUC STDev 0.08 0.05 0.03 0.06 Patch Voting Accuracy 82% 80% 60% 86% Best Feat. Ranking for Patch voting MRMR-MID PCA-VIP Ranksum MRMR-MID Best Classifier for Patch voting LDA Random Forest Random Forest Random Forest and PCA-VIP feature ranking scheme yielded an Of the epithelial features considered, the most discrimin- AUC of 0.65, and a patch voting accuracy of 74% ating features identified across all 4 classification problems (Table 2). AUC results using the same classifier and were those pertaining to epithelial architecture of nuclei feature ranking methodology improved from 0.55 to (Table 3). Of the stromal features, the most significant 0.65 with the separation of epithelial and stromal tended to be those related to measuring changes in the nuclei (Table 4). shape of the stromal nuclei. In each experiment, the Table 3 Top three Epithelial and Stromal features for each of the four experiments: Low ODx and Low mBR grade vs. High ODx and High mBR grade (Low-Low vs. High-High), Low ODx vs. High ODx, Low ODx vs. Intermediate and High ODx, and Low and Intermediate ODx vs. High ODx Experiments Epithelial Features (EP) Low Low vs. High High 1 EP: CCG: Clustering Coefficient E 2 EP: CCG: standard deviation edge length 3 EP: CCG: Clustering Coefficient D Low vs. High 1 EP: CCG: standard deviation edge length 2 EP: CCG: mean edge length 3 EP: Arch: Disorder of Nearest Neighbors in a 40 Pixel Radius Low vs. Intermediate and High 1 EP: Arch: Disorder of Nearest Neighbors in a 40 Pixel Radius 2 EP: Arch: Disorder of Nearest Neighbors in a 50 Pixel Radius 3 EP: Arch: Avg. Nearest Neighbors in a 40 Pixel Radius Low and Intermediate vs. High 1 EP: CCG: standard deviation edge length 2 EP: CCG: mean edge length 3 EP: Arch: Disorder of Nearest Neighbors in a 40 Pixel Radius Stromal Features (ST) Low Low vs. High High 1 ST: Shape: Median Area Ratio 2 ST: Shape: Median Invariant Moment 2 3 ST: Shape: Mean Perimeter Ratio Low vs. High 1 ST: Shape: Mean Perimeter Ratio 2 ST: Shape: Mean Area Ratio 3 ST: Shape: Mean Invariant Moment 2 Low vs. Intermediate and High 1 ST: Shape: Mean Invariant Moment 2 2 ST: Shape: Median Invariant Moment 2 3 ST: Shape: Standard Deviation Invariant Moment 2 Low and Intermediate vs. High 1 ST: Shape: Median Invariant Moment 2 2 ST: Shape: Mean Invariant Moment 2 3 ST: Shape: Standard Deviation Fourier Descriptor 2 Whitney et al. BMC Cancer (2018) 18:610 Page 10 of 15 Table 4 Improvements in classification accuracy based on features extracted from all nuclei together (No Ep/St. Sep.) vs. features extracted from epithelial nuclei and stromal nuclei separately (Ep/St Sep.), ranked via the PCA-VIP feature selection scheme, and used to train an SVM classifier. All AUC scores were generated using 3-fold cross validation Experiment No Ep/St Separation Ep/St Separation AUC Improvement High-High vs. Low-Low 0.71 0.83 0.12 High vs. Low 0.61 0.72 0.11 Low vs. Intermediate and High 0.55 0.58 0.03 Low and Intermediate vs. High 0.55 0.65 0.1 Average 0.61 0.7 0.09 epithelial features were identified to be more significant in In addition, while each of the feature ranking methods separating the different risk categories compared to the had very comparable performance, the PCA-VIP feature stromalnucleifeatures (Fig. 6). The classification AUC for ranking scheme yielded slightly better performance, with the machine learning classifier was highest for the problems a peak AUC of 0.71 using a Support Vector Machine involving the extreme risk or grade categories (i.e. Low-Low (Fig. 6). vs High-High and Low ODx vs High ODx). Unsurprisingly, Comparisons between the classification efficacy with the AUC values were lower when the intermediate risk cat- and without the use of epithelial/stromal separation across egory was also included (i.e. Low ODx vs. Intermediate and the four experiments yielded an average improvement of High ODx and Low and Intermediate ODx vs. High ODx). 0.09 (Table 4). Fig. 6 Determining the optimal feature ranking method - ROC curves for different combinations of feature ranking methods (panels) and classification methods (lines) for separating low from high ODx patches. Top left: Ranksum (Wilcoxon rank sum). Top right: PCA-VIP. Bottom left: MRMR-MID. Bottom right: MRMR-MIQ. Each panel displays the ROC curve using either (solid) random forest, (dashed) neural network, (dotted) SVM, or (intermediate dash) LDA classification. Feature set includes stromal and epithelial features. AUC values for each curve are displayed in the legend Whitney et al. BMC Cancer (2018) 18:610 Page 11 of 15 Validation results Table 6 Validation dataset – Classification accuracy using Ranksum feature ranking and a SVM classifier for each of four We tested the results of the model on an external valid- classification separations ation set. The model was trained using Ranksum feature Ranksum - SVM & Classification Accuracy ranking and a Random forest classifier using 100 iterations of 3-fold cross-validation to determine the top-performing Low-Low vs. High-High 76% features. These features were then trained over the entire Low vs. High 79% training set before being evaluated on the validation set. Low and Intermediate vs. High 85% The validation set was obtained from the University of Low vs. Intermediate and High 84% Pennsylvania and contained 53 cases comprised of Low and High ODx risk cases of primarily Low and High mBR grade (Table 5). As described previously, the accuracy of stage ER+ breast cancer histology samples into different each model was determined using per-patient patch Oncotype DX determined risk categories. Nuclear feature voting, where pathologist selected ROIs were divided into extraction was accomplished by 1) obtaining nuclear sub-ROI patches, and each patch was then classified as segmentations with a deep learning algorithm, 2) using belonging to either low or high risk using each of the four deep learning epithelial/stromal separation of nuclei, and models. The classification of the patient into high or low 3) extracting nuclei shape and architectural features from risk was determined by the percentage of sample patches those segmentations. Those features were then given to a predicted to belong to either category. Because it is series of machine based classifiers and feature ranking possible that the optimal percentage threshold for distin- methods using 3-fold cross-validation to test the effective- guishing between high and low risk may not be a simple ness of each machine based classifier. These features were majority, the ideal percentage of patches that were need to then employed in the context of discriminating the following be identified as low for the patient to be categorized as 4 different grade-ODx risk categories: 1) Low ODx and Low low ODx risk was determined from the training set. Per- mBR grade vs. High ODx and High mBR grade (Low-Low patient accuracies ranged between 76 and 85% across all vs.High-High). 2) LowODx vs.HighODx.3) Low ODxvs. hypotheses evaluated. Improvements in classification ac- Intermediate and High ODx. 4) Low and Intermediate ODx curacy of low vs. high over low-low vs high-high may be vs. High ODx. explained by the fact that the validation set was composed We found that the best classifier accuracy (AUC = 0.83) exclusively of low and high ODx samples. In addition, the was obtained for the Low-Low vs. High-High classification larger number of samples which were low ODx as com- problem. Since the ODx risk category is strongly correlated pared to high ODx samples may explain why the model with tumor grade [9], by choosing to leave out conflicting trained to distinguish between low and intermediate vs cases (i.e. where the grade and ODx risk categories are not high had slightly improved performance over the model aligned), the Low-Low vs High-High categories represent trained to distinguish between low vs. intermediate and the extreme risk cases. The next highest accuracy was high. It may also reflect the fact that the low and inter- obtained for the Low ODx vs. High ODx categories, where mediate risk patients are more alike from a histomorpho- all intermediate risk cases were left out. The best classifier metric perspective compared to the intermediate and high AUC obtained in this experiment (AUC = 0.72) was lower risk patients. The accuracies were highest using models compared to the AUC obtained for the Low-Low vs High- trained to distinguish between Low vs. High and Low vs High problem, possibly due to presence of 64 cases (55 (Intermediate and High ODx) cases (Table 6). Intermediate mBR and Low ODx, and 9 Intermediate mBR High ODx) where the grade and ODx risk categories Discussion did not align. This most likely adversely affected the train- In this work, we evaluated the effectiveness of computer- ing and the evaluation of the machine learning classifiers. extracted measurements of size, shape, and architectural When evaluating the classifiers in distinguishing the Low features of epithelial and stromal nuclei in separating early vs. Intermediate and High and the Low and Intermediate vs. High ODx risk categories, the Low and Intermediate vs. High ODx distinction had slightly improved performance Table 5 Validation dataset characteristics – ODx and grade as compared to distinguishing Low vs. Intermediate and distribution High ODx risk categories. This may be due to the fact that Validation Set (N = 53) the intermediate cases identified by ODx were primarily mBR Tumor Grade\ODx Low (< 18) Intermediate High (> 30) low risk cases [38]. Category (> 18, leq30) Classifier models trained on Low vs. High and the Low Low (4, 5) 40 (75%) 0 (0%) 0 (0%) with Intermediate vs. High ODx cases yielded the highest Moderate (6, 7) 0 (0%) 0 (0%) 1(2%) classification accuracy on the validation set. These results High (8, 9) 0(0%) 0 (0%) 12 (23%) appear to suggest that histomorphometrically the low Whitney et al. BMC Cancer (2018) 18:610 Page 12 of 15 ODx and intermediate ODx appeared more similar and pathologist grading information, such as the Magee compared to the high ODx cases. Clearly this will need Equation [9]. Using these methods, low grade and low ER to be validated in additional, larger independent validation and PR (≤150) can be correctly categorized as being low studies, but if confirmed might suggest that a number of ODx 89% of the time; and when ignoring intermediate the patients currently classified as intermediate risk by ODx cases, low and high ODx samples can be correctly Oncotype DX might actually be low risk and should be identified with concordance rates between 96.9 and 100% classified as such. [25, 49]. However, these methods have between 54.3 and Tumor grade is determined by tubule formation, nuclear 59.4% concordance when considering intermediate cases pleomorphism, and mitotic count [39]. These same as well as low and high, and require pathologist-generated features are found to strongly correlate breast cancer data [25]. When considering the intermediate risk categor- outcome [40]. The state of tubule formation is reflected ies, our classification AUC ranged from 0.58 and 0.6 which in features such as the ratio of tubule nuclei to total appears to be in alignment with the findings in [25]. nuclei [41]. The architecture of tubule formation is also Several different groups have previously explored the use reflected in features used in the presented work, such of QH for predicting ODx risk categories. For example, as Cell Cluster Graphs [29], Cell Orientation Entropy Basavanhally et al. was able to separate high from low [19], and Disorder of Nearest Neighbors [19]. Nuclear grade breast cancer patients, with top performing architec- pleomorphism may be reflected in features such as the tural features such as Delaunay Triangle metrics, nuclei Mean Invariant Moment [42], and Area Ratio [43]. density, and Voronoi Diagram architectural information Thus, the features used in this work are implicitly [12]. Romo-Bucheli et al. was able to separate high-high reflective of the histomorphometric measurements used from low-low cases with an AUC of 0.76 using a single by pathologists to assess grade and breast cancer outcome. feature: the ratio of tubule nuclei to non-tubule nuclei [41]. However, the method presented can also identify complex This approach used Deep Learning to identify biologically and sub-visual (i.e. information which is present, but not relevant structures (separating tubule nuclei from non- easily discernable by a human, such as higher-order nuclei tubule nuclei), while the presented approach used a much architectural characteristics, or difficult to recognize chro- larger number of nuclei-specific features for classification matin patterns [44, 45]) relationships between quantitative purposes. features and ODx categories that are difficult for patholo- While related to these previous approaches [12], our gists to visually identify. The Oncotype gene expression focus was on quantitatively evaluating the role of test aims to capture changes in genetic expression in computer extracted features of nuclear morphology in genes that have been tied with specific cancer-related the stroma and epithelium with the Oncotype Dx risk traits [46]. For example, Ki-67, STK15, Survivin, Cyclin categories. Additionally, unlike previous related studies B1, and MYBL2 have all been associated with breast [13] our study looked at the most discriminating features cancer proliferation; Stromelysin 3 and Cathepsin L2 have to distinguish not just the extreme risk categories (low been associated with invasion; and ER, PR, Bcl2, and vs. high) but also looked at the ability of computer SCUBE2 have been associated with responsiveness to extracted nuclear morphologic features to distinguish Estrogen [47]. Variations in these genes could potentially the intermediate risk categories from the low and high lead to changes in visual presentation of the cancer, and risk categories. thus affect the features previously described. For example, We do however acknowledge the several limitations of increases in Ki-67 activity resulting in increased unregu- this work. Firstly, the validation set used only included lated cell proliferation may increase the density of cell high and low ODx cases, without any intermediate cases. nuclei, resulting in an increase in the Disorder of Nearest Secondly, the focus of this work was on finding features Neighbors, or decreased distance between nuclei in Cell that were associated with ODx risk categories and not Cluster Graphs. Tumor invasion resulting from activation patient outcome. Oncotype DX is a companion diagnostic of Stromelysin 3 could result in either a loss of tissue test, and while the risk categories have been validated differentiation, or the presence of large epithelial nuclei against outcome, it is not perfectly correlated [50]. Unfor- invading into the surrounding stroma [48]. These types of tunately, long-term disease recurrence or patient outcome phenotypic changes might be captured by architectural information was not available for the cases considered in features, or size and shape variation amongst stromal nuclei this study. We also did not conduct a detailed study of the features. For example, variation in stromal nuclei shape influence of staining and scanning variations on the could also be related to the connection between spindle-cell features identified as predictive and the influence of and round stromal nuclei contact and breast cancer patient these parameters on the subsequent classification results. survival discovered by Beck et al. [20]. Finally, we focused solely on the role of nuclear morph- Previous groups have been able to duplicate ODx ology in this work, there are clearly other features that are results using equations drawing from genetic expression known to have a prognostic role in early stage ER+ breast Whitney et al. BMC Cancer (2018) 18:610 Page 13 of 15 cancers, features relating to number and distribution of National Center for Research Resources under award number 1 C06 RR12463–01. tumor infiltrating lymphocytes, mitoses [11], and tubules The DOD Prostate Cancer Synergistic Idea Development Award (PC120857); [41]. These features have shown to be independently The DOD Lung Cancer Idea Development New Investigator Award useful in determining ODx risk categories in ER+ breast (LC130463), The DOD Prostate Cancer Idea Development Award; cancer, and would likely improve the classification results The DOD Peer Reviewed Cancer Research Program W81XWH-16-1-0329. when combined with the nuclear histomorphometric The Ohio Third Frontier Technology Validation Fund. features presented in this work. Another potential future The Hartwell Foundation. the Wallace H. Coulter Foundation Program in the Department of Biomedical avenue is the integration of histomorphometric approaches Engineering and the Clinical and Translational Science Award Program such as this with genomic based tests to determine if the (CTSA) at Case Western Reserve University. integration of morphologic and molecular measurements The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. enables more accurate risk assessment, especially for the patients currently identified as intermediate risk. We hope Availability of data and materials to address these limitations in future work. The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request. Conclusions Authors’ contributions In this work we evaluated the role of computer extracted All authors have read and approved the manuscript. JW, GC, and AJ were features relating to spatial architecture and shape within responsible for experiments run. Manuscript was written primarily by JW and AM, with support from all authors. SG, SD, JT, MF, and HG were responsible the epithelium and stroma and showed that these features for defining the clinical problem, reviewing and annotating imaging data, could distinguish early stage ER+ breast cancers into and providing biological interpretation of the findings. Validation dataset different ODx risk categories. Our results suggest that provided by MF. with additional validation, these features could be used Ethics approval and consent to participate to create an inexpensive, rapid, and nondestructive pre- The study was HIPAA compliant and was approved by the Institutional Review dictor of low and high ODx risk categories for early stage Board at the University Hospitals Case Medical Center. The informed consent ER+ breast cancer based off digitized images of H&E slides was waived by the institutional review board for this retrospective study. alone. Competing interests Dr. Madabhushi is an equity holder in Elucid Bioimaging and in Inspirata Inc. He is also a scientific advisory consultant for Inspirata Inc. In addition, he Additional file currently serves as a scientific advisory board member for Inspirata Inc. and for Astrazeneca. He also has sponsored research agreements with Philips and Additional file 1: Table S7. Features tested for significance, and Inspirata Inc. His technology has been licensed to Elucid Bioimaging and considered for use in final analysis. Comprehensive list of features Inspirata Inc. He is also involved in a NIH U24 grant with PathCore Inc. and a investigated for classification utility. Each feature was used to analyze R01 with Inspirata Inc. Drs John Tomaszewski. Michael Feldman and Shridar epithelial and stromal nuclei separately. (XLSX 15 kb) Ganesan are members of the scientific advisory board of Inspirata, Inc. a digital pathology start-up company, and receives board fees and stock options. The authors declare that they have no competing interests. Abbreviations CCG: Cell Cluster Graph; ER +: Estrogen Receptor Positive; H&E: Hematoxylin and eosin; LDA: Linear Discriminant Analysis; mBR: Modified Bloom-Richardson; Publisher’sNote MRMR MID: Maximum Relevance, Minimum Redundancy, Mutual Information Springer Nature remains neutral with regard to jurisdictional claims in Difference; MRMR MIQ: Maximum Relevance, Minimum Redundancy, Mutual published maps and institutional affiliations. Information Quotient; MST: Minimum Spanning Trees; ODx: Oncotype Dx; PCA-VIP: Primary Component Analysis – Variable Importance; QH: Quantitative Author details Histomorphometry; ROC: Region Under the Curve; SVM: Support Vector Department of Biomedical Engineering, Case Western Reserve University, Machine 2071 Martin Luther King Drive, Cleveland, OH 44106-7207, USA. Universidad Nacional de Colombia, Bogotá D.C, Colombia. Department of Medicine, Acknowledgements Division of Medical Oncology, Rutgers Robert Wood Johnson Medical NVIDIA -a Titan X GPU, Gift of Titan X GPU to support research. Special thanks School, Rutgers Cancer Institute of New Jersey, 195 Little Albany Street, New to Natalie Shih for helping procure validation data in a timely manner. Brunswick, NJ 08903, USA. SUNY at the University at Buffalo, 3435 Main Street, Buffalo, NY, USA. Department of Pathology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA. Funding Department of Pathology, University Hospitals, Cleveland Medical Center The following funding bodies provided funding for the data collection, and Case Western Reserve University, Cleveland, OH 44106, USA. digitization, annotation and the computational and statistical analysis, as also in the writing of the manuscript. Received: 27 October 2017 Accepted: 26 April 2018 Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under award numbers. 1U24CA199374–01, R01CA202752-01A1. References R01CA208236-01A1. 1. Early Breast Cancer Trialists’ Collaborative Group (EBCTCG). Effects of R21CA179327–01; chemotherapy and hormonal therapy for early breast cancer on recurrence R21CA195152–01. and 15-year survival: an overview of the randomised trials. Lancet Lond The National Institute of Diabetes and Digestive and Kidney Diseases under Engl. 2005;365(9472):1687–717. https://doi.org/10.1016/S0140- award number R01DK098503–02, 6736(05)66544-0. PMID: 15894097 Whitney et al. BMC Cancer (2018) 18:610 Page 14 of 15 2. Brezden CB, Phillips K-A, Abdolell M, Bunston T, Tannock IF. Cognitive 17. Yu K-H, Zhang C, Berry GJ, Altman RB, Ré C, Rubin DL, Snyder M. Predicting function in breast cancer patients receiving adjuvant chemotherapy. J Clin non-small cell lung cancer prognosis by fully automatede microscopic Oncol. 2000;18(14):2695–701. pathology image features. Nat Commun. 2016;7:12474. https://doi.org/10. 3. Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, Baehner FL, Walker MG, 1038/ncomms12474. Watson D, Park T, Hiller W, Fisher ER, Wickerham DL, Bryant J, Wolmark N. A 18. Lee G, Veltri RW, Zhu G, Ali S, Epstein JI, Madabhushi A. Nuclear shape and multigene assay to predict recurrence of tamoxifen-treated, node-negative architecture in benign fields predict biochemical recurrence in prostate breast cancer. N Engl J Med. 2004;351(27):2817–26. https://doi.org/10.1056/ Cancer patients following radical prostatectomy: preliminary findings. Eur NEJMoa041588. PMID: 15591335 Urol Focus. 2016; https://doi.org/10.1016/j.euf.2016.05.009. 4. Paik S, Tang G, Shak S, Kim C, Baker J, Kim W, Cronin M, Baehner FL, Watson 19. Lee G, Ali S, Veltri R, Epstein JI, Christudass C, Madabhushi A. Cell orientation D, Bryant J, Costantino JP, Geyer CE, Wickerham DL, Wolmark N. Gene entropy (COrE): predicting biochemical recurrence from prostate cancer expression and benefit of chemotherapy in women with node-negative, tissue microarrays. Med Image Comput Comput-Assist Interv MICCAI Int estrogen receptor-positive breast Cancer. J Clin Oncol. 2006;24(23):3726–34. Conf Med Image Comput Comput-Assist Interv. 2013;16(Pt 3):396–403. https://doi.org/10.1200/JCO.2005.04.7985. PMID: 24505786 5. Sparano JA, Gray RJ, Makower DF, Pritchard KI, Albain KS, Hayes DF, Geyer 20. Beck AH, Sangoi AR, Leung S, Marinelli RJ, Nielsen TO, van de Vijver MJ, CE, Dees EC, Perez EA, Olson JA, Zujewski J, Lively T, Badve SS, Saphner TJ, West RB, van de Rijn M, Koller D. Systematic analysis of breast cancer Wagner LI, Whelan TJ, Ellis MJ, Paik S, Wood WC, Ravdin P, Keane MM, morphology uncovers stromal features associated with survival. Sci Transl Gomez Moreno HL, Reddy PS, Goggins TF, Mayer IA, Brufsky AM, Med. 2011;3(108):108ra113. https://doi.org/10.1126/scitranslmed.3002564. Toppmeyer DL, Kaklamani VG, Atkins JN, Berenberg JL, Sledge GW. PMID: 22072638 Prospective validation of a 21-gene expression assay in breast Cancer. N 21. American Cancer Society. Types of breast Cancer. [cited 2016 Aug 16]. Engl J Med. 2015;373(21):2005–14. https://doi.org/10.1056/NEJMoa1510764. Available from: http://www.cancer.org/cancer/breastcancer/detailedguide/ 6. Wittner BS, Sgroi DC, Ryan PD, Bruinsma TJ, Glas AM, Male A, Dahiya S, breast-cancer-breast-cancer-types Habin K, Bernards R, Haber DA, Van’t Veer LJ, Ramaswamy S. Analysis of the 22. Bhowmick NA, Neilson EG, Moses HL. Stromal fibroblasts in cancer initiation MammaPrint breast Cancer assay in a predominantly postmenopausal and progression. Nature. 2004;432(7015):332–7. https://doi.org/10.1038/ cohort. Clin Cancer Res. 2008;14(10):2988–93. https://doi.org/10.1158/1078- nature03096. 0432.CCR-07-4723. 23. Van den Eynden GG, Colpaert CG, Couvelard A, Pezzella F, Dirix LY, 7. Nielsen TO, Parker JS, Leung S, Voduc D, Ebbert M, Vickery T, Davies SR, Vermeulen PB, Van Marck EA, Hasebe T. A fibrotic focus is a prognostic Snider J, Stijleman IJ, Reed J, Cheang MCU, Mardis ER, Perou CM, Bernard factor and a surrogate marker for hypoxia and (lymph)angiogenesis in PS, Ellis MJ. A comparison of PAM50 intrinsic subtyping with breast cancer: review of the literature and proposal on the criteria of immunohistochemistry and clinical prognostic factors in tamoxifen-treated evaluation. Histopathology. 2007;51(4):440–51. https://doi.org/10.1111/j. estrogen receptor-positive breast Cancer. Clin Cancer Res. 2010;16(21):5222– 1365-2559.2007.02761.x. PMID: 17593207 32. https://doi.org/10.1158/1078-0432.CCR-10-1282. 24. Henson DE, Ries L, Freedman LS, Carriaga M. Relationship among outcome, 8. Mina L, Soule SE, Badve S, Baehner FL, Baker J, Cronin M, Watson D, Liu M-L, stage of disease, and histologic grade for 22,616 cases of breast cancer. The Sledge GW, Shak S, Miller KD. Predicting response to primary basis for a prognostic index. Cancer. 1991;68(10):2142–9. https://doi.org/10. chemotherapy: gene expression profiling of paraffin-embedded core biopsy 1002/1097-0142(19911115)68:10<2142::AID-CNCR2820681010>3.0.CO;2-D. tissue. Breast Cancer Res Treat. 2007;103(2):197–208. https://doi.org/10.1007/ 25. Klein ME, Dabbs DJ, Shuai Y, Brufsky AM, Jankowitz R, Puhalla SL, Bhargava R. s10549-006-9366-x. Prediction of the Oncotype DX recurrence score: use of pathology-generated 9. Flanagan MB, Dabbs DJ, Brufsky AM, Beriwal S, Bhargava R. Histopathologic equations derived by linear regression analysis. Mod Pathol. 2013;26(5):658–64. variables predict Oncotype DX recurrence score. Mod Pathol Off J U S Can https://doi.org/10.1038/modpathol.2013.36. Acad Pathol Inc. 2008;21(10):1255–61. https://doi.org/10.1038/modpathol. 26. Janowczyk A, Madabhushi A. Deep learning for digital pathology image analysis: 2008.54. PMID: 18360352 A comprehensive tutorial with selected use case. J Pathol Inform. 2016. 10. Allsbrook WC, Mangold KA, Johnson MH, Lane RB, Lane CG, Epstein JI. 27. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Interobserver reproducibility of Gleason grading of prostatic carcinoma: Darrell T. Caffe: convolutional architecture for fast feature embedding. ACM general pathologist. Hum Pathol. 2001;32(1):81–8. https://doi.org/10.1053/ Press; 2014 [cited 2016 Aug 4]. p. 675–678. Available from: http://dl.acm.org/ hupa.2001.21135. PMID: 11172299 citation.cfm?doid=2647868.2654889. https://doi.org/10.1145/2647868. 11. Romo-Bucheli D, Janowczyk A, Gilmore H, Romero E, Madabhushi A. A deep 2654889. learning based strategy for identifying and associating mitotic activity with 28. Basavanhally A, Ganesan S, Shih N, Mies C, Feldman M, Tomaszewski J, gene expression derived risk categories in estrogen receptor positive breast Madabhushi A. A boosted classifier for integrating multiple fields of view: cancers. Cytom Part J Int Soc Anal Cytol. 2017; https://doi.org/10.1002/cyto. breast cancer grading in histopathology: IEEE; 2011 [cited 2016 Aug 1]. p. a.23065. PMID: 28192639 125–128. Available from: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper. 12. Basavanhally A, Ganesan S, Feldman M, Shih N, Mies C, Tomaszewski J, htm?arnumber=5872370. https://doi.org/10.1109/ISBI.2011.5872370. Madabhushi A. Multi-field-of-view framework for distinguishing tumor grade in 29. Ali S, Veltri R, Epstein JA, Christudass C, Madabhushi A. Gurcan MN, ER+ breast cancer from entire histopathology slides. IEEE Trans Biomed Eng. Madabhushi A, editors. Cell cluster graph for prediction of biochemical 2013;60(8):2089–99. https://doi.org/10.1109/TBME.2013.2245129. PMID: 23392336 recurrence in prostate cancer patients from tissue microarrays; 2013 [cited 13. Basavanhally A, Feldman M, Shih N, Mies C, Tomaszewski J, Ganesan S, 2016 Mar 18]. p. 86760H. Available from: http://proceedings. Madabhushi A. Multi-field-of-view strategy for image-based outcome spiedigitallibrary.org/proceeding.aspx?doi=10.1117/12.2008695. https://doi. prediction of multi-parametric estrogen receptor-positive breast cancer org/10.1117/12.2008695. histopathology: comparison to Oncotype DX. J Pathol Inform. 2011;2:S1. 30. Devore J. Probability and statistics for engineering and the sciences: https://doi.org/10.4103/2153-3539.92027. PMID: 22811953 PMCID: Cengage Learning; 2015. PMC3312707 31. Ginsburg SB, Viswanath SE, Bloch BN, Rofsky NM, Genega EM, Lenkinski 14. Gisselsson D, Björk J, Höglund M, Mertens F, Dal Cin P, Åkerman M, RE, Madabhushi A. Novel PCA-VIP scheme for ranking MRI protocols Mandahl N. Abnormal nuclear shape in solid tumors reflects mitotic and identifying computer-extracted MRI measurements associated with instability. Am J Pathol. 2001 Jan;158(1):199–206. https://doi.org/10.1016/ central gland and peripheral zone prostate tumors. J Magn Reson S0002-9440(10)63958-2. Imaging JMRI. 2015;41(5):1383–1393. https://doi.org/10.1002/jmri.24676. PMID: 24943647. 15. Trepat X, Wasserman MR, Angelini TE, Millet E, Weitz DA, Butler JP, Fredberg JJ. Physical forces during collective cell migration. Nat Phys. 2009;5(6):426–30. 32. Peng H, Long F, Ding C. Feature selection based on mutual information: https://doi.org/10.1038/nphys1269. criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell. 2005;27(8):1226–1238. https://doi.org/10.1109/ 16. Lewis JS, Ali S, Luo J, Thorstad WL, Madabhushi A. A quantitative TPAMI.2005.159 PMID: 16119262. histomorphometric classifier (QuHbIC) identifies aggressive versus indolent p16-positive oropharyngeal squamous cell carcinoma. Am J Surg Pathol. 33. Ding C, Peng H. Minimum redundancy feature selection from microarray 2014;38(1):128–37. https://doi.org/10.1097/PAS.0000000000000086. PMID: gene expression data. J Bioinforma Comput Biol. 2005;03(02):185–205. 24145650 PMCID: PMC3865861 https://doi.org/10.1142/S0219720005001004. Whitney et al. BMC Cancer (2018) 18:610 Page 15 of 15 34. Strobl C, Malley J, Tutz G. An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychol Methods. 2009 ;14(4):323–348. https:// doi.org/10.1037/a0016973 PMID: 19968396 PMCID: PMC2927982. 35. Demuth H, Beale M. Neural network toolbox for use with Matlab - User’s guide version. 1993. 36. Pelchmans K, Suykens J, Gestel T, Brabanter J, Lukaas L, Hamers B, Moor B, Vandewalle J. LS-SVMlab: a matlab/c toolbox for least squares support vector machines. 2002. 37. Izenman AJ. Linear discriminant analysis. In: Izenman AJ, editor. Mod Multivar stat tech Regres Classif manifold learn. New York, NY: Springer New York; 2008. p. 237–80. Available from: https://doi.org/10.1007/978-0-387- 78189-1_8. 38. JMS B, Bayani J, Marshall A, Dunn JA, Campbell A, Cunningham C, Sobol MS, Hall PS, Poole CJ, Cameron DA, Earl HM, Rea DW, Macpherson IR, Canney P, Francis A, McCabe C, Pinder SE, Hughes-Davies L, Makris A, Stein RC, on behalf of the OPTIMA TMG. Comparing breast Cancer multiparameter tests in the OPTIMA prelim trial: no test is more equal than the others. J Natl Cancer Inst. 2016;108(9):djw050. https://doi.org/10.1093/ jnci/djw050. 39. Elston CW, Ellis IO. Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology. 1991 Nov;19(5):403–10. https:// doi.org/10.1111/j.1365-2559.1991.tb00229.x. 40. Bloom H, Richardson W. Histological grading and prognosis in breast Cancer: a study of 1409 cases of which 359 have been followed for 15 years. Br J Cancer. 1957:359–77. 41. Romo-Bucheli D, Janowczyk A, Romero E, Gilmore H, Madabhushi A. Automated tubule nuclei quantification and correlation with oncotype DX risk categories in ER+ breast cancer whole slide images. In: Gurcan MN, Madabhushi A, editors. 2016 [cited 2016 Aug 3]. p. 979106. Available from: http://proceedings.spiedigitallibrary.org/proceeding.aspx?doi=10.1117/12. 2211368 doi:https://doi.org/10.1117/12.2211368 42. Ongun G, Halici U, Leblebicioglu K, Atalay V, Beksac M, Beksac S. Feature extraction and classification of blood cells for an automated differential blood count system: IEEE; 2001 [cited 2017 Jan 10]. p. 2461–2466. Available from: http://ieeexplore.ieee.org/document/938753/. https://doi.org/10.1109/ IJCNN.2001.938753. 43. Liotta LA, Kleinerman J, Saidel GM. Quantitative relationships of intravascular tumor cells, tumor vessels, and pulmonary metastases following tumor implantation. Cancer Res. 1974 May 1;34(5):997. 44. Madabhushi A. Computerized histologic image based risk predictor (CHIRP): identifying disease aggressiveness using sub-visual image cues from image data. Microsc Microanal. 2016 Jul;22(S3):1006–7. https://doi.org/10.1017/ S1431927616005870. 45. Guillaud M, Adler-Storthz K, Malpica A, Staerkel G, Matisic J, Van Niekirk D, Cox D, Poulin N, Follen M, Macaulay C. Subvisual chromatin changes in cervical epithelium measured by texture image analysis and correlated with HPV. Gynecol Oncol. 2005;99(3 Suppl 1):S16–S23. doi:https://doi.org/10.1016/ j.ygyno.2005.07.037 PMID: 16188299. 46. Cronin M, Sangli C, Liu M-L, Pho M, Dutta D, Nguyen A, Jeong J, Wu J, Langone KC, Watson D. Analytical validation of the Oncotype DX genomic diagnostic test for recurrence prognosis and therapeutic response prediction in node-negative, estrogen receptor–positive breast Cancer. Clin Chem. 2007;53(6):1084. https://doi.org/10.1373/clinchem.2006.076497. 47. Sparano JA, Paik S. Development of the 21-gene assay and its application in clinical practice and clinical trials. J Clin Oncol. 2008 Feb 10;26(5):721–8. https://doi.org/10.1200/JCO.2007.15.1068. 48. Muller D, Wolf C, Abecassis J, Millon R, Engelmann A, Bronner G, Rouyer N, Rio M-C, Eber M, Methlin G. Increased stromelysin 3 gene expression is associated with increased local invasiveness in head and neck squamous cell carcinomas. Cancer Res. 1993;53:165–9. 49. Turner BM, Skinner KA, Tang P, Jackson MC, Soukiazian N, Shayne M, Huston A, Ling M, Hicks DG. Use of modified Magee equations and histologic criteria to predict the Oncotype DX recurrence score. Mod Pathol. 2015 Jul; 28(7):921–31. https://doi.org/10.1038/modpathol.2015.50. 50. Győrffy B, Karn T, Sztupinszki Z, Weltz B, Müller V, Pusztai L. Dynamic classification using case-specific training cohorts outperforms static gene expression signatures in breast cancer. Int J Cancer. 2015;136(9):2091–2098. https://doi.org/10.1002/ijc.29247 PMID: 25274406 PMCID: PMC4354298.

Journal

BMC CancerSpringer Journals

Published: May 30, 2018

References

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off