Deep learning as a predictive tool for fetal heart pregnancy following time-lapse incubation and blastocyst transfer

Human Reproduction, Vol. 35, No. 2, pp. 482–483, 2020
Advance Access Publication on February 13, 2020
doi:10.1093/humrep/dez263

LETTER TO THE EDITOR

Sir,

The work presented by Tran et al. describes the results of a deep neural network model for predicting fetal heart (FH) pregnancy, trained on sequences of images extracted from time-lapse recordings acquired with the EmbryoScope in eight IVF laboratories (Tran et al., 2019).

The authors' work is very interesting, since it presents a novel approach to fully automate the analysis of time-lapse embryo images while supporting decisions about which blastocysts to transfer, in the hope of improving clinical success rates. However, the limited information available in the manuscript hampers a full understanding of the real impact of the contribution, the implications of the proposed tool and its clinical relevance. In particular, we identify two major concerns related to the deeply unbalanced data (8142 negative samples vs 694 positive samples) used in the generation and validation of the deep learning models, for which we would like to make some recommendations.

The first concern is the lack of detailed information regarding the number of positive and negative cases provided by each clinic and used for training each of the models in the 'eight laboratories hold-out validation', and, similarly, the number of positive and negative samples employed in the training and testing sets for the general model and for each of the 5-fold cross-validation models. This information is important because it would let the reader assess the degree of imbalance between the training and testing sets, and therefore form a better idea of the power of the proposed model to correctly identify those blastocysts with the potential of developing FH and to accurately detect those with a poor prognosis. To clarify this point, we recommend that the authors use confusion matrices to report the results of each experiment, since these make it easy to see the number of true positives, false positives, true negatives and false negatives produced by each model.
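For illustration, the sketch below shows how such a fold-level confusion matrix could be reported. It is only a minimal example: Python with scikit-learn, the 0.5 decision threshold and the simulated scores are our assumptions rather than part of the original study; only the 694-positive vs 8142-negative class ratio is taken from the published figures.

```python
# Minimal, hypothetical sketch (not the authors' code): the confusion matrix we
# suggest reporting for a single validation fold. scikit-learn, the 0.5 threshold
# and the simulated scores are illustrative assumptions; only the class ratio
# (694 positives vs 8142 negatives) comes from the letter.
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)

# Simulated fold mirroring the overall class imbalance (1 = fetal heart, 0 = no fetal heart).
y_true = np.concatenate([np.ones(694, dtype=int), np.zeros(8142, dtype=int)])
# Placeholder model scores: positives centred higher than negatives, clipped to [0, 1].
y_score = np.clip(rng.normal(loc=0.3 + 0.4 * y_true, scale=0.15), 0.0, 1.0)
y_pred = (y_score >= 0.5).astype(int)  # hypothetical decision threshold

# confusion_matrix returns [[TN, FP], [FN, TP]] for labels ordered 0, 1.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP = {tp}, FP = {fp}, TN = {tn}, FN = {fn}")
```

Reporting these four counts per clinic and per fold would make the degree of imbalance between the training and testing sets, and its effect on the predictions, immediately visible to the reader.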
The second issue is the use of the area under the curve (AUC) of the receiver operating characteristic (ROC) curve to report the performance of the trained models on the three validation schemes. The ROC and its AUC have been considered misleading measures of model performance (Adams and Hand, 1999; Lobo et al., 2008), or even inadequate for machine learning research (Drummond and Holte, 2006), in particular for unbalanced datasets (Berrar and Flach, 2012; Saito and Rehmsmeier, 2015) and when the number of positive examples is limited (Elazmeh et al., 2006), since the ROC may mask poor performance (Ng, 1997; Jeni et al., 2013). Therefore, we recommend that the authors report the performance of their models using confusion matrices, the F1 score and precision-recall curves, since these will offer the reader a better understanding of the distribution of the classification results on the available, highly unbalanced data.
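To make this recommendation concrete, the following sketch computes the F1 score, the precision-recall curve and its area (average precision) next to the ROC AUC. As before, Python with scikit-learn and the simulated scores are assumptions made only for illustration, not the authors' pipeline.

```python
# Minimal, hypothetical sketch (again assuming scikit-learn and simulated scores)
# of the metrics we recommend reporting on the unbalanced data: F1 score and the
# precision-recall curve with its area (average precision), shown alongside the
# ROC AUC used in the original report.
import numpy as np
from sklearn.metrics import (average_precision_score, f1_score,
                             precision_recall_curve, roc_auc_score)

rng = np.random.default_rng(1)
y_true = np.concatenate([np.ones(694, dtype=int), np.zeros(8142, dtype=int)])
y_score = np.clip(rng.normal(loc=0.3 + 0.4 * y_true, scale=0.15), 0.0, 1.0)

# Points of the precision-recall curve, which could be plotted per validation scheme.
precision, recall, thresholds = precision_recall_curve(y_true, y_score)

print(f"F1 (0.5 threshold): {f1_score(y_true, (y_score >= 0.5).astype(int)):.3f}")
print(f"Average precision : {average_precision_score(y_true, y_score):.3f}")
print(f"ROC AUC           : {roc_auc_score(y_true, y_score):.3f}")
# With 694 positives against 8142 negatives, the ROC AUC can look reassuring while
# the precision-recall summary exposes how many predicted positives are false.
```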

Conflict of interest

No competing interest.

References

Adams NM, Hand DJ. Comparing classifiers when the misallocation costs are uncertain. Pattern Recognit 1999;32:1139–1147.
Berrar D, Flach P. Caveats and pitfalls of ROC analysis in clinical microarray research (and how to avoid them). Brief Bioinform 2012;13:83–97.
Drummond C, Holte RC. Cost curves: an improved method for visualizing classifier performance. Mach Learn 2006;65:95–130.
Elazmeh W, Japkowicz N, Matwin S. Evaluating misclassifications in imbalanced data. In: Fürnkranz J, Scheffer T, Spiliopoulou M (eds). Berlin, Heidelberg: Springer, 2006, 126–137.
Jeni LA, Cohn JF, De La Torre F. Facing imbalanced data: recommendations for the use of performance metrics. In: 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction. Geneva, Switzerland: IEEE, 2013, 245–251.
Lobo JM, Jiménez-Valverde A, Real R. AUC: a misleading measure of the performance of predictive distribution models. Glob Ecol Biogeogr 2008;17:145–151.
Ng AY. Preventing "overfitting" of cross-validation data. In: ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann Publishers Inc., 1997, 245–253.
Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. Brock G (ed). PLoS One 2015;10:e0118432.
Tran D, Cooke S, Illingworth PJ, Gardner DK. Deep learning as a predictive tool for fetal heart pregnancy following time-lapse incubation and blastocyst transfer. Hum Reprod 2019;34:1011–1018.

A. Chavez-Badiola 1,2,*, G. Mendizabal-Ruiz 3, A. Flores-Saiffe Farias 1, R. Garcia-Sanchez 1 and Andrew J. Drakeley 4

1 Computational Biology, New Hope Fertility Center Mexico, Guadalajara, Mexico
2 Research and Development, Darwin Technologies Ltd, Liverpool, UK
3 Departamento de Ciencias Computacionales, Universidad de Guadalajara, Guadalajara, Mexico
4 Hewitt Centre for Reproductive Medicine, Liverpool Women's Hospital, Liverpool, UK

*Correspondence address. New Hope Fertility Mexico, Av. Prado Norte 135, Lomas de Chapultepec, Miguel Hidalgo, CP 11000, Mexico City, Mexico. E-mail: drchavez-badiola@nhfc.mx

© The Author(s) 2020. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For permissions, please e-mail: journals.permission@oup.com.
