Human Reproduction, Vol.35, No.2, pp. 482–483, 2020
Advance Access Publication on February 13, 2020
doi:10.1093/humrep/dez263

LETTER TO THE EDITOR

Deep learning as a predictive tool for fetal heart pregnancy following time-lapse incubation and blastocyst transfer

Sir,

The work presented by Tran et al. describes the results of a deep neural network model for predicting fetal heart (FH) pregnancy, trained on sequences of images extracted from time-lapse recordings from eight IVF laboratories using EmbryoScope (Tran et al., 2019).

The authors' work is very interesting since it presents a novel approach to fully automating the analysis of time-lapse embryo images while supporting decisions on which blastocysts to transfer, in the hope of improving the success rate in clinics. However, the limited information available in the manuscript makes it difficult to fully understand the real impact of the contribution, the implications of the proposed tool and its clinical relevance.

In particular, we identify two major concerns related to the highly unbalanced data (8142 negative samples vs 694 positive samples) used in the generation and validation of the deep learning models, for which we would like to make some recommendations.

The first concern is the lack of detailed information on the number of positive and negative cases provided by each clinic that were used for training each of the models in the 'eight laboratories hold-out validation'. Similarly, the numbers of positive and negative samples employed in the training and testing sets for the general model and for each of the 5-fold cross-validation models are not reported. This is important since it would let the reader identify the degree of imbalance between the training and testing sets and therefore gain a better idea of the predictive power of the proposed model, both for correctly identifying those blastocysts with the potential of developing FH and for accurately detecting those that do not have a good prognosis. To clarify this point for the reader, we recommend that the authors use confusion matrices to report the results of each experiment, since these make it easy to identify the number of true positives, false positives, true negatives and false negatives produced by each model.

The second concern is the use of the area under the curve (AUC) of the receiver operating characteristic (ROC) curve for reporting the performance of the trained models on the three validation schemes. The ROC curve and AUC have been considered misleading as measures of model performance (Adams and Hand, 1999; Lobo et al., 2008) or even inadequate for machine learning research (Drummond and Holte, 2006), in particular in the case of unbalanced datasets (Berrar and Flach, 2012; Saito and Rehmsmeier, 2015) and when the number of positive examples is limited (Elazmeh et al., 2006), since the ROC curve may mask poor performance (Ng, 1997; Jeni et al., 2013). Therefore, we recommend that the authors report the performance of their models using confusion matrices, F1 score and precision-recall curves, since these will offer the reader a better understanding of the distribution of the classification results on the available and highly unbalanced data.

Conflict of interest

No competing interest.

References

Adams NM, Hand DJ. Comparing classifiers when the misallocation costs are uncertain. Pattern Recognit 1999;32:1139–1147.
Berrar D, Flach P. Caveats and pitfalls of ROC analysis in clinical microarray research (and how to avoid them). Brief Bioinform 2012;13:83–97.
Drummond C, Holte RC. Cost curves: an improved method for visualizing classifier performance. Mach Learn 2006;65:95–130.
Elazmeh W, Japkowicz N, Matwin S. Evaluating misclassifications in imbalanced data. In: Fürnkranz J, Scheffer T, Spiliopoulou M (eds). Berlin, Heidelberg: Springer, 2006, 126–137.
Jeni LA, Cohn JF, De La Torre F. Facing imbalanced data–recommendations for the use of performance metrics. In: 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction. Geneva, Switzerland: IEEE, 2013, 245–251.
Lobo JM, Jiménez-Valverde A, Real R. AUC: a misleading measure of the performance of predictive distribution models. Glob Ecol Biogeogr 2008;17:145–151.
Ng AY. Preventing "overfitting" of cross-validation data. In: ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann Publishers Inc., 1997, 245–253.
Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. Brock G (ed). PLoS One 2015;10:e0118432.
Tran D, Cooke S, Illingworth PJ, Gardner DK. Deep learning as a predictive tool for fetal heart pregnancy following time-lapse incubation and blastocyst transfer. Hum Reprod 2019;34:1011–1018.

A. Chavez-Badiola1,2,*, G. Mendizabal-Ruiz3, A. Flores-Saiffe Farias1, R. Garcia-Sanchez1, and Andrew J. Drakeley4

1Computational Biology, New Hope Fertility Center Mexico, Guadalajara, Mexico
2Research and Development, Darwin Technologies Ltd, Liverpool, UK
3Departamento de Ciencias Computacionales, Universidad de Guadalajara, Guadalajara, Mexico
4Hewitt Centre for Reproductive Medicine, Liverpool Women's Hospital, Liverpool, UK

*Correspondence address. New Hope Fertility Mexico, Av. Prado Norte 135, Lomas de Chapultepec, Miguel Hidalgo, CP 11000, Mexico City, Mexico. E-mail: firstname.lastname@example.org

© The Author(s) 2020. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For permissions, please e-mail: email@example.com.
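As an illustration of the letter's second concern, the following minimal Python sketch shows why a confusion matrix, precision and F1 score can reveal what sensitivity and specificity (the quantities underlying an ROC curve) leave hidden on unbalanced data. Only the class counts (694 positive, 8142 negative) come from the letter. The operating point of 0.8 sensitivity and 0.8 specificity is a purely hypothetical assumption chosen for illustration and is not a result reported by Tran et al.; the names ConfusionMatrix and matrix_at_operating_point are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts of true/false positives and negatives for a binary classifier."""
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def sensitivity(self) -> float:
        # Recall / true positive rate: TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # True negative rate: TN / (TN + FP)
        return self.tn / (self.tn + self.fp)

    @property
    def precision(self) -> float:
        # Positive predictive value: TP / (TP + FP)
        predicted_pos = self.tp + self.fp
        return self.tp / predicted_pos if predicted_pos else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0


def matrix_at_operating_point(n_pos: int, n_neg: int,
                              sensitivity: float,
                              specificity: float) -> ConfusionMatrix:
    """Build the confusion matrix implied by a given sensitivity/specificity."""
    tp = round(n_pos * sensitivity)
    tn = round(n_neg * specificity)
    return ConfusionMatrix(tp=tp, fp=n_neg - tn, tn=tn, fn=n_pos - tp)


# Class counts quoted in the letter.
N_POS, N_NEG = 694, 8142

# HYPOTHETICAL operating point (assumption, not a reported result).
SENS, SPEC = 0.8, 0.8

unbalanced = matrix_at_operating_point(N_POS, N_NEG, SENS, SPEC)
# Same operating point on an artificially balanced set, for comparison.
balanced = matrix_at_operating_point(N_POS, N_POS, SENS, SPEC)

for name, cm in (("unbalanced", unbalanced), ("balanced", balanced)):
    print(f"{name}: {cm}")
    print(f"  sensitivity={cm.sensitivity:.3f} specificity={cm.specificity:.3f} "
          f"precision={cm.precision:.3f} F1={cm.f1:.3f}")
```

Under these assumptions, sensitivity and specificity are essentially the same in both settings, yet precision falls from roughly 0.80 on the balanced set to roughly 0.25 at the class ratio quoted in the letter, because the false positives drawn from the much larger negative class outnumber the true positives. This is the kind of effect that confusion matrices, F1 score and precision-recall curves make visible.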
Human Reproduction – Oxford University Press
Published: Feb 29, 2020