M. Sperrin, David Jenkins, G. Martin, N. Peek (2019)
Explicit causal reasoning is needed to prevent prognostic models being victims of their own success. Journal of the American Medical Informatics Association: JAMIA, 26
J. Sterne, I. White, J. Carlin, Michael Spratt, P. Royston, M. Kenward, A. Wood, J. Carpenter (2009)
Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. The BMJ, 338
H. Houwelingen (2000)
Validation, calibration, revision and combination of prognostic survival models. Statistics in Medicine, 19(24)
A. Fagot-Largeault (1992)
Human fertilisation and Embryology Authority. The Lancet, 340
S. Nelson, D. Lawlor (2011)
Predicting Live Birth, Preterm Delivery, and Low Birth Weight in Infants Born from In Vitro Fertilisation: A Prospective Study of 144,018 Treatment Cycles. PLoS Medicine, 8
B. Luke, Morton Brown, E. Wantman, J. Stern, V. Baker, E. Widra, C. Coddington, William Gibbons, G. Ball (2014)
A prediction model for live birth and multiple births within the first three cycles of assisted reproductive technology. Fertility and Sterility, 102(3)
(2019)
Stata Statistical Software: Release 16
Gary Collins, Johannes Reitsma, Douglas Altman, K. Moons (2015)
Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD Statement. Diabetic Medicine, 32
W. Bouwmeester, N. Zuithoff, S. Mallett, M. Geerlings, Y. Vergouwe, E. Steyerberg, D. Altman, K. Moons (2012)
Reporting and Methods in Clinical Prediction Research: A Systematic Review. PLoS Medicine, 9
F.E. Harrell, K.L. Lee, D.B. Mark (1996)
Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statist Med, 15, 361
A. Zarinara, H. Zeraati, K. Kamali, K. Mohammad, Parisa Shahnazari, M. Akhondi (2016)
Models Predicting Success of Infertility Treatment: A Systematic Review. Journal of Reproduction & Infertility, 17
D. McLernon, E. Raja, J. Toner, V. Baker, K. Doody, D. Seifer, A. Sparks, E. Wantman, P. Lin, S. Bhattacharya, B. Voorhis (2021)
Predicting personalized cumulative live birth following in vitro fertilization. Fertility and Sterility
A. Lynam, J. Dennis, K. Owen, R. Oram, A. Jones, B. Shields, Lauric Ferrat (2020)
Logistic regression has similar performance to optimised machine learning algorithms in a clinical setting: application to the discrimination between type 1 and type 2 diabetes in young adults. Diagnostic and Prognostic Research, 4
D. Cox (1958)
Two further applications of a model for binary regression. Biometrika, 45
G.S. Collins, J.B. Reitsma, D.G. Altman, K.G. Moons (2015)
Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Br J Surg, 102, 148
(2020)
Fertility Treatment 2018: Trends and Figures
S. Coppus, F. Veen, B. Opmeer, B. Mol, P. Bossuyt (2009)
Evaluating prediction models in reproductive medicine. Human Reproduction, 24(8)
Constanza Navarro, J. Damen, T. Takada, Steven Nijman, P. Dhiman, Jie Ma, G. Collins, R. Bajpai, R. Riley, K. Moons, L. Hooft (2021)
Risk of bias in studies on prediction models developed using supervised machine learning techniques: systematic review. The BMJ, 375
P. Arvis, Philippe Lehert, A. Guivarc'h-Levêque (2012)
Simple adaptations to the Templeton model for IVF outcome prediction make it current and clinically useful. Human Reproduction, 27(10)
O. Ishihara, R. Araki, A. Kuwahara, A. Itakura, H. Saito, G. Adamson (2014)
Impact of frozen-thawed single-blastocyst transfer on maternal and neonatal outcome: an analysis of 277,042 single-embryo transfer cycles from 2008 to 2010 in Japan. Fertility and Sterility, 101(1)
Gary Collins, Johannes Reitsma, Douglas Altman, K. Moons (2015)
Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. BMC Medicine, 13
T. Debray, J. Damen, K. Snell, J. Ensor, L. Hooft, J. Reitsma, R. Riley, K. Moons (2017)
A guide to systematic review and meta-analysis of prediction model performance. British Medical Journal, 356
E. Steyerberg, G. Borsboom, H. Houwelingen, M. Eijkemans, J. Habbema (2004)
Validation and updating of predictive logistic regression models: a study on sample size and shrinkage. Statistics in Medicine, 23
S. Greenland, W. Finkle (1995)
A critical look at methods for handling missing covariates in epidemiologic regression analyses. American Journal of Epidemiology, 142(12)
RStudio: Integrated Development Environment for
B. Calster, E. Steyerberg, L. Wynants, M. Smeden (2023)
There is no such thing as a validated prediction model. BMC Medicine, 21
F. Harrell, Kerry Lee, D. Mark (2005)
Prognostic/Clinical Prediction Models: Multivariable Prognostic Models: Issues in Developing Models, Evaluating Assumptions and Adequacy, and Measuring and Reducing Errors
M. Brandes, J.O.M. Steen, S. Bokdam, C. Hamilton, J. Bruin, W. Nelen, J. Kremer (2009)
When and why do subfertile couples discontinue their fertility care? A longitudinal cohort study in a secondary care subfertility population. Human Reproduction, 24(12)
D. McLernon, E. Steyerberg, E. Velde, Amanda Lee, S. Bhattacharya (2018)
An improvement in the method used to assess discriminatory ability when predicting the chances of a live birth after one or more complete cycles of in vitro fertilisation. British Medical Journal, 362
Jori Leijdekkers, M. Eijkemans, T. Tilborg, S. Oudshoorn, D. McLernon, S. Bhattacharya, B. Mol, F. Broekmans, H. Torrance (2018)
Predicting the cumulative chance of live birth over multiple complete cycles of in vitro fertilization: an external validation study. Human Reproduction, 33
(2018)
Fertility Treatment 2014–2016: Trends and Figures
Zhiyang Chen, Duoduo Zhang, J. Zhen, Zhengyi Sun, Qi Yu (2021)
Predicting cumulative live birth rate for patients undergoing in vitro fertilization (IVF)/intracytoplasmic sperm injection (ICSI) for tubal and male infertility: a machine learning approach using XGBoost. Chinese Medical Journal, 135
L. Loendersloot, S. Repping, P. Bossuyt, F. Veen, M. Wely (2013)
Prediction models in in vitro fertilization; where are we? A mini review. Journal of Advanced Research, 5
N. Cook (2008)
Statistical evaluation of prognostic versus diagnostic models: beyond the ROC curve. Clinical Chemistry, 54(1)
A. Maheshwari, D. McLernon, S. Bhattacharya (2015)
Cumulative live birth rate: time for a consensus? Human Reproduction, 30(12)
L. Shingshetty, A. Maheshwari, D. McLernon, S. Bhattacharya (2022)
Should we adopt a prognosis-based approach to unexplained infertility? Human Reproduction Open, 2022
(2023)
A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, 2021
K. Moons, A. Kengne, D. Grobbee, P. Royston, Y. Vergouwe, D. Altman, M. Woodward (2012)
Risk prediction models: II. External validation, model updating, and impact assessment. Heart, 98
D. McLernon, E. Steyerberg, E. Velde, Amanda Lee, S. Bhattacharya (2016)
Predicting the chances of a live birth after one or more complete cycles of in vitro fertilisation: population based study of linked cycle data from 113 873 women. The BMJ, 355
E. Leushuis, J. Steeg, P. Steures, P. Bossuyt, M. Eijkemans, F. Veen, B. Mol, P. Hompes (2009)
Prediction models in reproductive medicine: a critical appraisal. Human Reproduction Update, 15(5)
B. Liew, F. Kovacs, David Rügamer, A. Royuela (2022)
Machine learning versus logistic regression for prognostic modelling in individuals with non-specific neck pain. European Spine Journal, 31
A. Templeton, J. Morris, William Parslow (1996)
Factors that affect outcome of in-vitro fertilisation treatment. The Lancet, 348
M. Riegler, M. Stensen, O. Witczak, J. Andersen, Steven Hicks, H. Hammer, E. Delbarre, P. Halvorsen, A. Yazidi, N. Holst, T. Haugen (2021)
Artificial intelligence in the fertility clinic: status, pitfalls and possibilities. Human Reproduction
David Jenkins, M. Sperrin, G. Martin, N. Peek (2018)
Dynamic models to predict health outcomes: current status and methodological challenges. Diagnostic and Prognostic Research, 2
R. Team (2014)
R: A language and environment for statistical computing. MSOR Connections, 1
M. Ratna, Sohinee Bhattacharya, B. Abdulrahim, D. McLernon (2020)
A systematic review of the quality of clinical prediction models in in vitro fertilisation. Human Reproduction
Michael Miller, C. Langefeld, W. Tierney, S. Hui, C. McDonald (1993)
Validation of Probabilistic Predictions. Medical Decision Making, 13
H. Lavretsky, M. Sajatovic, C. Reynolds (2013)
Clinical Prediction Models
K. Janssen, K. Moons, C. Kalkman, D. Grobbee, Y. Vergouwe (2008)
Updating methods improved the performance of a clinical prediction model in new patients. Journal of Clinical Epidemiology, 61(1)
K. Wong, S. Mastenbroek, S. Repping (2014)
Cryopreservation of human embryos and its contribution to in vitro fertilization success rates. Fertility and Sterility, 102(1)
D. McLernon, S. Bhattacharya (2022)
Quality of clinical prediction models in in vitro fertilisation: Which covariates are really important to predict cumulative live birth and which models are best? Best Practice & Research. Clinical Obstetrics & Gynaecology
C. Olivius, B. Fridén, Gunilla Borg, C. Bergh (2004)
Why do couples discontinue in vitro fertilization treatment? A cohort study. Fertility and Sterility, 81(2)
Human Reproduction, 2023, 38(10), 1998–2010 https://doi.org/10.1093/humrep/dead165 Advance Access Publication Date: August 25, 2023 Original Article Reproductive epidemiology External validation of models for predicting cumulative live birth over multiple complete cycles of IVF treatment 1,2 3 1, Mariam B. Ratna , Siladitya Bhattacharya , and David J. McLernon * Institute of Applied Health Sciences, School of Medicine, Medical Sciences & Nutrition, University of Aberdeen, Aberdeen, UK Clinical Trials Unit, Warwick Medical School, University of Warwick, Warwick, UK School of Medicine, Medical Sciences & Nutrition, University of Aberdeen, Aberdeen, UK *Correspondence address. Medical Statistics Team, Institute of Applied Health Sciences, School of Medicine, Medical Sciences & Nutrition, Polwarth Building, Foresterhill, Cornhill Road, University of Aberdeen, Aberdeen AB25 2ZD, UK. E-mail: [email protected] https://orcid.org/0000-0001-8905-2429 ABSTRACT STUDY QUESTION: Can two prediction models developed using data from 1999 to 2009 accurately predict the cumulative probability of live birth per woman over multiple complete cycles of IVF in an updated UK cohort? SUMMARY ANSWER: After being updated, the models were able to estimate individualized chances of cumulative live birth over multiple complete cycles of IVF with greater accuracy. WHAT IS KNOWN ALREADY: The McLernon models were the first to predict cumulative live birth over multiple complete cycles of IVF. They were converted into an online calculator called OPIS (Outcome Prediction In Subfertility) which has 3000 users per month on average. A previous study externally validated the McLernon models using a Dutch prospective cohort containing data from 2011 to 2014. With changes in IVF practice over time, it is important that the McLernon models are externally validated on a more recent cohort of patients to ensure that predictions remain accurate. 
STUDY DESIGN, SIZE, DURATION: A population-based cohort of 91 035 women undergoing IVF in the UK between January 2010 and December 2016 was used for external validation. Data on frozen embryo transfers associated with these complete IVF cycles con- ducted from 1 January 2017 to 31 December 2017 were also collected. PARTICIPANTS/MATERIALS, SETTING, METHODS: Data on IVF treatments were obtained from the Human Fertilisation and Embryology Authority (HFEA). The predictive performances of the McLernon models were evaluated in terms of discrimination and calibration. Discrimination was assessed using the c-statistic and calibration was assessed using calibration-in-the-large, calibration slope, and calibration plots. Where any model demonstrated poor calibration in the validation cohort, the models were updated us- ing intercept recalibration, logistic recalibration, or model revision to improve model performance. MAIN RESULTS AND THE ROLE OF CHANCE: Following exclusions, 91 035 women who underwent 144 734 complete cycles were in- cluded. The validation cohort had a similar distribution age profile to women in the development cohort. Live birth rates over all complete cycles of IVF per woman were higher in the validation cohort. After calibration assessment, both models required updating. The coefficients of the pre-treatment model were revised, and the updated model showed reasonable discrimination (c-statistic: 0.67, 95% CI: 0.66 to 0.68). After logistic recalibration, the post-treatment model showed good discrimination (c-statistic: 0.75, 95% CI: 0.74 to 0.76). As an example, in the updated pre-treatment model, a 32-year-old woman with 2 years of primary infertility has a 42% chance of having a live birth in the first complete ICSI cycle and a 77% chance over three complete cycles. 
In a couple with 2 years of primary male factor infertility where a 30-year-old woman has 15 oocytes collected in the first cycle, a single fresh blastocyst embryo transferred in the first cycle and spare embryos cryopreserved, the estimated chance of live birth provided by the post-treatment model is 46% in the first complete ICSI cycle and 81% over three complete cycles. LIMITATIONS, REASONS FOR CAUTION: Two predictors from the original models, duration of infertility and previous pregnancy, which were not available in the recent HFEA dataset, were imputed using data from the older cohort used to develop the models. The HFEA dataset does not contain some other potentially important predictors, e.g. BMI, ethnicity, race, smoking and alcohol intake in women, as well as measures of ovarian reserve such as antral follicle count. WIDER IMPLICATIONS OF THE FINDINGS: Both updated models show improved predictive ability and provide estimates which are more reflective of current practice and patient case mix. The updated OPIS tool can be used by clinicians to help shape couples’ expectations by informing them of their individualized chances of live birth over a sequence of multiple complete cycles of IVF. STUDY FUNDING/COMPETING INTEREST(S): This study was supported by an Elphinstone scholarship scheme at the University of Aberdeen and Aberdeen Fertility Centre, University of Aberdeen. S.B. has a commitment of research funding from Merck. D.J.M. and M.B.R. declare support for the present manuscript from Elphinstone scholarship scheme at the University of Aberdeen and Assisted Reproduction Unit at Aberdeen Fertility Centre, University of Aberdeen. D.J.M. declares grants received by University of Aberdeen from NHS Grampian, The Meikle Foundation, and Chief Scientist Office in the past 3 years. D.J.M. declares receiving an honorarium Received: October 3, 2022. Revised: July 28, 2023. Editorial decision: August 4, 2023. V C The Author(s) 2023. 
Published by Oxford University Press on behalf of European Society of Human Reproduction and Embryology. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Validating IVF models predicting cumulative live birth | 1999 for lectures from Merck. D.J.M. is Associate Editor of Human Reproduction Open and Statistical Advisor for Reproductive BioMed Online. S.B. declares royalties from Cambridge University Press for a book. S.B. declares receiving an honorarium for lectures from Merck, Organon, Ferring, Obstetric and Gynaecological Society of Singapore, and Taiwanese Society for Reproductive Medicine. S.B. has re- ceived support from Merck, ESHRE, and Ferring for attending meetings as speaker and is on the METAFOR and CAPRE Trials Data Monitoring Committee. TRIAL REGISTRATION NUMBER: N/A. Keywords: IVF / live birth / clinical prediction model / validation / calibration Introduction increasing use of frozen-thawed embryo transfers (Human Fertilisation and Embryology Authority, 2018; Ishihara et al., A recent systematic review identified over 30 clinical prediction 2014). Therefore, it is important that the McLernon models are models which estimate individualized chances of pregnancy out- externally validated on a more up-to-date cohort of patients to comes following IVF treatment (Ratna et al., 2020). These models ensure that the predictions are still accurate. Therefore, the aim can help clinicians communicate chances of treatment success of the study is to conduct a temporal external validation of the to couples undergoing IVF, but their use in clinical practice has McLernon models in order to demonstrate the continued general- been limited. The quality of these models is impacted by issues izability of these models to the current UK IVF population. 
such as small sample sizes, lack of external validation and failure to demonstrate clinical impact (Leushuis et al., 2009; Van Loendersloot et al., 2014; Ratna et al., 2020). Materials and methods Five IVF prediction model studies have been conducted using Data sources large national databases (Templeton et al., 1996; Nelson and To perform external validation of the McLernon models, this Lawlor, 2011; Luke et al., 2014; McLernon et al., 2016; McLernon study used the HFEA database which links all fresh and frozen et al., 2021). Of these, three utilized data from the Human IVF treatment cycles to individual women. Database access was Fertilisation and Embryology Authority (HFEA) registry in the UK granted following approval by the North of Scotland Research to estimate the chances of a live birth after IVF (Templeton et al., Ethics Committee, the Confidentiality Advisory Group, and the 1996; Nelson and Lawlor, 2011; McLernon et al., 2016). Two of HFEA register research panel. The data were anonymized and these articles published models that predict cumulative live birth transferred to the University of Aberdeen where they were stored over complete cycles of IVF, where a complete cycle is defined as on the Data Safe Haven (DaSH) server for analysis. all fresh and frozen-thawed embryo transfers associated with one episode of ovarian stimulation (McLernon et al., 2016; Study population McLernon et al., 2021). With the increasing use of frozen-thawed Information was collected from 91 035 women who started their embryos in IVF (Wong et al., 2014), cumulative live birth rate first ovarian stimulation in the UK between January 2010 and (LBR) over multiple complete cycles is a more clinically relevant December 2016. The records of all complete IVF cycles which be- outcome than the chance of live birth following a single embryo gan during this period were extracted. 
Data on frozen embryo transfer (Maheshwari et al., 2015) and clinical prediction models transfers associated with these between 1 January 2017 and 31 need to make sure that they address this need (McLernon and December 2017 were also collected. No data recorded after the 31 Bhattacharya, 2023). December 2017 were extracted. This data selection method en- Two UK models by McLernon et al. were developed to predict sured a minimum of 1-year exposure to all embryo transfer the chances of cumulative live birth over multiple complete attempts within a complete cycle. Women whose treatment in- cycles of IVF: a pre-treatment model which predicts cumulative volved donor insemination, egg donation and/or surrogacy were live birth in women before the first complete cycle commences; excluded. and a post-treatment model which updates predictions of cumu- lative live birth after the first fresh embryo transfer episode Baseline characteristics (McLernon et al., 2016). The models were converted into an online For this validation study, the same baseline characteristics that prediction tool called OPIS (Outcome Prediction In Subfertility) were used in the original McLernon pre- and post-treatment (https://w3.abdn.ac.uk/clsm/opis/) and used by 3000 patients and models were selected from the new dataset (with the exception clinicians on average each month. The models which were devel- of duration of infertility and pregnancy history which are dis- oped using data from IVF treatments conducted from 1999 to cussed in the missing data section). The McLernon pre-treatment 2008 showed good predictive performance in the development model predicts the probability of a live birth over six complete dataset but have not been validated in the UK since. External val- cycles at the start of a first complete cycle. 
Predictions are based idation in an independent cohort is essential as it supports the on couple characteristics and the type of treatment (IVF or ICSI) generalizability of the model (Harrell et al., 1996; Steyerberg, to be used. The included predictors are female age (years), dura- 2019). Using prospectively collected Dutch data between 2011 tion of infertility (years), causes of infertility (tubal, male factor, and 2014, a study externally validated the performance of the anovulation, or unexplained), pregnancy history (yes or no), type McLernon et al. up to three complete cycles (Leijdekkers et al., of treatment (IVF or ICSI), and treatment year. 2018). The findings revealed that the pre-treatment model sys- After the first fresh embryo transfer, the McLernon post- tematically overestimated the probability of cumulative live birth treatment model revises the predictions using additional in the external cohort but provided more accurate predictions af- treatment-specific data from this cycle. The added predictors are ter recalibration, whilst the post-treatment model calibrated well number of eggs collected, availability of cryopreserved embryos, in the external cohort. number of embryos transferred (one, two, or three), and stage of IVF practice in the UK has since undergone major changes, transferred embryos, i.e. blastocyst (Day 5 or 6) or cleavage stage with greater emphasis on elective single embryo transfer and (Day 2 or 3). For the validation of the post-treatment model, 2000 | Ratna et al. women who had no eggs collected were excluded as it is impossi- results were pooled using the metamisc package in R version ble for them to achieve a live birth in the first complete cycle. 4.1.1 (Debray et al., 2017). Calibration plots and predicted curves The number of complete cycles was included in both models for hypothetical couples were generated using the first imputed as a discrete time variable to predict the probability of a live birth dataset. 
Supplementary data file S4 gives a detailed description in the ith cycle, assuming no live birth occurred in the previous of all calibration techniques used in the study. cycle(s). The formulae for calculating the cumulative predicted Updating the model probability of a live birth over six complete cycles can be seen in Supplementary data files S1 and S2 (McLernon et al., 2016). Where any model demonstrated poor calibration in the valida- tion cohort, the models were updated using the following three Statistical analysis methods to try to improve performance (Steyerberg et al., 2004; Missing data: multiple imputation Janssen et al., 2008; Moons et al.,2012): Data on the duration of infertility were missing for 97% of Update intercept (Method 1): adjustment of the intercept us- women, and pregnancy history was entirely missing. This is be- ing the calibration intercept; cause the HFEA stopped collecting this information since 2008 Logistic recalibration (Method 2): adjustment of the intercept (HFEA communication) (Supplementary data file S3). Since these and the regression coefficients using the calibration intercept variables were fully recorded from 1999 to 2007, data from this and calibration slope; and period were used to impute the missing values of these variables Model revision (Method 3): further model adjustment for indi- in the validation dataset (2010 to 2016). vidual predictors which had a different effect in the validation In the study, three predictor variables had missing values: du- cohort compared to the development cohort. ration of infertility, pregnancy history, and stage and number of embryos transferred. Multiple imputation of these predictors was The method which demonstrated the best agreement between performed to increase the statistical power of the model and to the predictions and observed outcomes was used to update each adjust for any biases caused by excluding women with missing model. information (Greenland and Finkle, 1995). 
Ten imputed datasets Supplementary data file S5 includes a detailed description of were created using the chained equation (MICE) method (to at- these methods. tain a monotone missing data pattern) (Sterne et al., 2009). Then All statistical analyses were conducted using STATA version each missing variable was considered as a dependent variable in 16 (StataCorp, 2019) and R version 4.1.1 (R Core Team, 2021; Posit its own imputation model where it was regressed onto all the team, 2023). other variables. The following variables were included to inform the imputation process: female age, year of treatment, cause of Patient involvement infertility, IVF versus ICSI, and whether embryos were cryopre- No patients were involved in framing the research question, served. For the continuous variable ‘duration of infertility’, a pre- choosing the outcome measures, or developing plans for the de- dicted mean matching regression model was used; to impute the sign or implementation of the study. Patient input was not binary variable ‘pregnancy history’, a logistic regression model sought on interpreting or writing up the results of the study. We was used; and to impute the nominal categorical variable ‘stage have plans to disseminate the results of this research study to and number of embryos transferred’, a discriminant function patients via national fertility charities and the HFEA. method was used. This imputation was performed under the as- sumption that the data were missing at random (MAR) which Results means that the missing data depend on the values of the ob- served variables and treatment outcome. Following exclusions, the dataset included 91 035 women who underwent 144 734 complete cycles of IVF/ICSI between January Model implementation 2010 and December 2016 (Supplementary Fig. S1). 
The baseline The predictor values for women in the validation cohort were characteristics of couples and the treatments they underwent be- multiplied by the corresponding parameter estimates of the pre- fore initiating IVF are presented in Table 1 for each cohort. The dictors from the original pre-treatment model and then added to- development cohort comprised women who started IVF between gether. The same was done for the post-treatment model 1999 and 2008, whereas the validation cohort consisted of women (McLernon et al., 2016). The predicted probabilities were calcu- who started IVF between 2010 and 2016. Women included in the lated using the formulas in Supplementary data files S1 and S2. validation cohort had a similar distribution of age to the women in the development cohort. There was also a similar distribution Predictive performance in causes of infertility between the two cohorts. The predictive performance of the McLernon models was evalu- A higher proportion of women underwent ICSI in the valida- ated in terms of discrimination and calibration. Discrimination tion cohort compared to women in the development cohort (51% refers to the ability of the models to distinguish between women versus 41%). After the first IVF/ICSI cycle, embryo cryopreserva- who will achieve a live birth and those who will not (Moons et al., tion was more frequently performed in women belonging to 2012) and was assessed using the c-statistic. the validation cohort compared to the development cohort (35% Calibration refers to the degree of agreement between the ob- versus 25%). Only 32% of women in the validation cohort had a served live birth in the external cohort and predicted live birth double cleavage embryo transfer compared to 66% in the devel- (Moons et al., 2012). This was formally assessed using calibration- opment cohort. 
in-the-large (CIL) and the calibration slope, and graphically assessed using a calibration plot (Cox, 1958; Miller et al., 1993). For perfect calibration, the calibration slope and calibration intercept should be 1 and 0, respectively. We calculated c-statistics, CIL, and calibration slope on each imputed dataset and separate

About half of the women in the validation sample had a single embryo transfer (17.8% had a single cleavage-stage transfer and 30.1% had a single blastocyst transfer), whereas only 9% of women in the development dataset had a single embryo transfer.

Table 1. Baseline characteristics of couples and their treatment before undergoing the first complete cycle of IVF in the cohorts used for model development and validation.

Characteristics*                                    HFEA 1999–2008        HFEA 2010–2016
                                                    Development cohort    Validation cohort
Number of patients                                  113 873               91 035
Number of complete cycles                           184 269               137 879
Patient characteristics
Woman's age (years), mean (SD)                      34.1 (5)              35 (4)
Duration of infertility (years), median (IQR)
  Complete cases                                    4 (3–6)               9 (7–12)
  Missing, n (%)                                    18 225 (16)           88 753 (97)
  After imputation in validation cohort             –                     4 (2–6)
Pregnancy history
  No                                                75 541 (66)           0 (0)
  Yes                                               28 070 (25)           0 (0)
  Missing, n (%)                                    10 262 (9)            91 035 (100)
  After imputation in validation cohort
    No                                              –                     57 039 (63)
    Yes                                             –                     33 996 (37)
Causes of infertility
  Tubal                                             26 545 (23)           13 493 (15)
  Anovulatory                                       15 942 (14)           11 474 (13)
  Male factor                                       49 753 (44)           35 275 (39)
  Unexplained                                       32 693 (29)           28 433 (31)
  Endometriosis                                     7590 (7)              6709 (7)
  More than one                                     13 414 (12)           10 882 (12)
Treatment characteristics at completed cycle 1
Type of treatment
  IVF                                               67 511 (59)           44 252 (49)
  ICSI                                              46 362 (41)           46 782 (51)
No. of oocytes collected, median (IQR)              8 (5–13)              9 (6–13)
No. of embryos created, median (IQR)                5 (2–8)               5 (3–8)
No. of embryos frozen, median (IQR)                 0 (0–1)               0 (0–1)
Cryopreservation of embryos                         28 950 (25)           31 874 (35)
Stage and number of embryos transferred
  Single cleavage                                   9248 (8)              16 180 (18)
  Single blastocyst                                 662 (1)               27 364 (30)
  Double cleavage                                   75 701 (66)           29 021 (32)
  Double blastocyst                                 2960 (3)              10 659 (12)
  Triple cleavage                                   8649 (8)              1144 (1)
  Triple blastocyst                                 130 (0.1)             241 (0.3)
  No transfer                                       15 501 (14)           5407 (6)
  Missing                                           1022 (1)              1019 (1)

* The variables listed were included as predictors in the development sample (HFEA 1999–2008 cohort) and the validation sample (HFEA 2010–2017 cohort). 93% of women had missing data on duration of infertility in 2010, which increased to almost 100% in 2017.
a,b From 2008, the Human Fertilisation and Embryology Authority (HFEA) changed the layout of their forms for recording data and removed questions regarding pregnancy history and duration of infertility (HFEA communication). Therefore, since these variables were fully recorded from 1998 to 2007, previous pregnancy status (which was 100% missing) and duration of infertility (97% missing) were imputed in the validation dataset using these data to inform the imputation process.
IQR: interquartile range.

The number of women in each cohort who started a treatment cycle, had a live birth, or discontinued treatment without having a live birth is presented in Fig. 1. The LBRs per woman were higher in the validation cohort (HFEA 2010–2016) than in the development cohort (HFEA 1999–2008) for all complete cycles of IVF. Over six complete cycles of IVF/ICSI, the overall LBR was 45% in the recent HFEA cohort and 43% in the older HFEA cohort.

Predictive performance of the original models

In the validation cohort, the pooled c-statistic for the pre-treatment model was 0.68 (95% CI: 0.68 to 0.68) and for the post-treatment model 0.75 (95% CI: 0.75 to 0.75). Figure 2a shows the calibration plot for the first imputed dataset (representative of all 10 imputations) depicting the observed cumulative LBR in the validation cohort versus the predicted probability of cumulative live birth from the pre-treatment model (Fig. 3a shows the post-treatment model calibration plot) (McLernon et al., 2016). The pre-treatment calibration plot had a calibration slope of 0.74 (95% CI: 0.72 to 0.76), and the post-treatment calibration plot had a calibration slope of 0.68 (95% CI: 0.67 to 0.70) (Supplementary Tables S1 and S2). The CIL analyses showed little systematic underestimation for the pre-treatment model (CIL = 0.01 or O/E = 1.01). Systematic overestimation was evidenced for the post-treatment model (CIL = −0.12 or O/E = 0.94) (see Supplementary Tables S1 and S2). Both calibration slopes indicated that the original regression coefficient estimates were too large, resulting in predictions that were too extreme in new patients. For example, the calibration slope of 0.74 for the pre-treatment model indicates that the original regression coefficient estimates are over-optimistic by around 26% in new patients, i.e. low chances of live birth calculated by the model are too low and high probabilities are too high compared with the observed LBRs.

Given the poor calibration, both the pre- and post-treatment models were updated in an effort to improve performance in the validation cohort.
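To make the interpretation of the calibration slope and the O/E ratio concrete, the following is a minimal Python sketch and not the authors' analysis code. It assumes the standard logistic recalibration form, in which the recalibrated log-odds equal an intercept plus the slope times the original log-odds. The slope of 0.74 is the pre-treatment estimate quoted in the text; the intercept of 0.0, the function names, and the toy data are hypothetical and chosen only for illustration.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    """Inverse of logit: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def recalibrate(p: float, intercept: float, slope: float) -> float:
    """Logistic recalibration: new log-odds = intercept + slope * old log-odds."""
    return expit(intercept + slope * logit(p))


def observed_expected_ratio(outcomes, predictions) -> float:
    """O/E ratio: number of observed events over the sum of predicted probabilities."""
    return sum(outcomes) / sum(predictions)


# Slope 0.74 is quoted in the text for the pre-treatment model.
# The intercept of 0.0 is a hypothetical value used only to isolate the slope effect.
SLOPE = 0.74
for p in (0.10, 0.50, 0.80):
    print(f"original {p:.2f} -> recalibrated {recalibrate(p, 0.0, SLOPE):.3f}")

# Toy data (hypothetical): five couples, three of whom had a live birth.
toy_outcomes = [1, 0, 1, 1, 0]
toy_predictions = [0.6, 0.3, 0.7, 0.5, 0.4]
print(f"O/E = {observed_expected_ratio(toy_outcomes, toy_predictions):.2f}")
```

With a slope below 1 and a zero intercept, predictions below 50% move up and predictions above 50% move down, which is the pattern described above for coefficients that are too large. In an actual logistic recalibration the intercept and slope would be estimated jointly from the validation data rather than fixed by hand.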
Figure 1. Flowchart of number of treatments and live birth outcomes over six complete cycles. Frequency and percentage of women having a live birth, continuing treatment without having a live birth, and discontinuing treatment without having a live birth over six complete cycles of IVF/ICSI in the HFEA 1999–2008 development cohort and 2010–2016 validation cohort. Percentages are in parentheses. HFEA: Human Fertilisation and Embryology Authority.

Complete   Cohort            Started    Live birth      IVF discontinued   Continued to
cycle                        cycle                                         next cycle
1          HFEA 1999–2008    113873     33154 (29.1)    35335 (31.0)       45384 (39.9)
1          HFEA 2010–2016    91035      28306 (31.1)    29955 (32.9)       32774 (36.0)
2          HFEA 1999–2008    45384      10885 (24.0)    18026 (39.7)       16473 (36.3)
2          HFEA 2010–2016    32774      9055 (27.6)     13378 (40.8)       10341 (31.6)
3          HFEA 1999–2008    16473      3441 (20.9)     7481 (45.4)        5551 (33.7)
3          HFEA 2010–2016    10341      2540 (24.6)     5119 (49.5)        2682 (25.9)
4          HFEA 1999–2008    5551       1035 (18.6)     2625 (47.3)        1891 (34.1)
4          HFEA 2010–2016    2682       561 (20.9)      1323 (49.3)        798 (29.8)
5          HFEA 1999–2008    1891       309 (16.3)      898 (47.5)         684 (36.2)
5          HFEA 2010–2016    798        151 (18.9)      398 (49.9)         249 (31.2)
6          HFEA 1999–2008    684        101 (14.8)      325 (47.5)         258 (37.7)
6          HFEA 2010–2016    249        39 (15.7)       122 (49.0)         88 (35.3)

Updating the models

The updated pre-treatment model

The estimated parameters of the original pre-treatment model and the three different updated versions of the pre-treatment model (i.e. updated intercept (Method 1), logistic recalibration (Method 2), and model revision (Method 3)) are summarized in Table 2. The estimates of the main parameters of the pre-treatment model using the different updating methods and the details of estimating these parameters are presented in Supplementary Table S3 and Supplementary data file S6, respectively.

The Method 1 updating approach did not lead to any improvement in calibration for the pre-treatment model when reapplied to the validation cohort. However, Methods 2 and 3 did result in improved calibration (Fig. 2b–d). We compared these approaches using Fig. 2 in order to identify the one which had the most beneficial impact on calibration. After recalibrating the original model by adjusting the intercept and slope (Method 2), calibration was good for all tenths except the seventh (the 95% CI of the seventh tenth does not overlap the diagonal reference line) (Fig. 2c). Model revision (Method 3) gave better calibration in the validation cohort (Fig. 2d), as the 95% CIs of all tenths overlap the diagonal reference line, and was therefore chosen as the best method to update the pre-treatment model (Supplementary data file S6).

The c-statistic of the model updated by model revision (Method 3) decreased very slightly to 0.67 (95% CI: 0.66 to 0.68) (using imputed dataset 1).

Figure 2. Calibration plots for the pre-treatment model showing the association between the predicted and observed cumulative live birth rates over six complete IVF/ICSI cycles in the validation dataset. (a) Calibration plot for the original McLernon pre-treatment model as explained by McLernon et al. (2016) applied to the validation dataset; (b) calibration plot for the recalibrated pre-treatment model following adjustment of the intercept in the validation dataset (update intercept method); (c) calibration plot for the recalibrated model following adjustment of both the intercept and slope in the validation dataset (logistic recalibration method); and (d) calibration plot for the revised model after updating some coefficients using the validation dataset (model revision method).

The updated post-treatment model

The estimated parameters of the original post-treatment model and the three versions of the updated post-treatment model (i.e. updated intercept (Method 1), logistic recalibration (Method 2), and model revision (Method 3)) are summarized in Table 3. Supplementary Table S4 and Supplementary data file S7 present the different updated estimates of the main post-treatment model parameters and the details of estimating these parameters, respectively.

All three updating approaches can be compared to the original model in Fig. 3. In Fig. 3b, some tenths were still outside the diagonal line, indicating that the model updated with Method 1 still needs further improvement. Method 2 (Fig. 3c) showed the best improvement in calibration and was chosen as the best method to update the post-treatment model.

The c-statistic of the model updated by logistic recalibration (Method 2) was 0.75 (95% CI: 0.74 to 0.76) (using imputed dataset 1) (Supplementary data file S7).

Examples of model predictions

Figure 4 shows examples of both the pre- and post-treatment model predictions in different case scenarios using the final updated models.

Figure 4a shows the cumulative predictions of live birth from the updated pre-treatment model over three complete ICSI cycles. These are presented for women aged 30 and 40 years with either a 2- or 5-year duration of male factor infertility. As shown in the figure, younger women have a much higher chance of success. A 30-year-old woman with 2 years of infertility has a 42% predicted chance of having a live birth in the first complete ICSI cycle. This increases to 77% over three complete cycles. For a 40-year-old woman with 2 years of primary infertility, these probabilities are 20% and 45% for one complete cycle and three complete cycles, respectively. For a similar woman with 5 years of infertility, the probabilities are slightly lower: 19% and 43% for one complete cycle and three complete cycles, respectively.

Figure 4b presents the predictions from the updated post-treatment model. The predicted probability of a live birth was updated for a couple with the following characteristics: 30-year-old woman, 2 years of male factor primary infertility, 15 oocytes collected at the start of the first cycle, embryos cryopreserved after fertilization, and a single fresh blastocyst embryo transferred in the first cycle. The predicted probability of live birth after the first complete ICSI cycle is 46%. Cumulatively, this increases to 81% over three complete cycles. A woman who is 40 years old, has five oocytes collected, no embryos cryopreserved, and has a single cleavage-stage embryo transferred has an 11% chance of a live birth after the first complete cycle. Cumulatively, this rises to 27% over three complete cycles.

Discussion

Main findings

The results of this study show that the pre- and post-treatment models discriminate reasonably well between couples with and without live birth when applied to a more recent cohort of IVF patients. However, both models required updating owing to poor calibration in the external dataset. The updated models should provide more accurate predictions in future patients and, like the original models, will be incorporated within the OPIS online calculator for regular clinical use.

Figure 3.
Calibration plots for the post-treatment model showing the association between the predicted and observed cumulative live birth rates over six complete IVF/ICSI cycles in the validation dataset. (a) Calibration plot for the original McLernon post-treatment model as explained by McLernon et al. (2016) applied to the validation dataset; (b) calibration plot for the recalibrated post-treatment model following adjustment of the intercept in the validation dataset (update intercept method); (c) calibration plot for the recalibrated model following adjustment of both the intercept and slope in the validation dataset (logistic recalibration method); and (d) calibration plot for the revised model after updating some coefficients using the validation dataset (model revision method).

Strengths and limitations

For external validation, this study selected the McLernon models, which were developed using appropriate methodology, showed good predictive performance at both internal and external validation, and scored better on the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) checklist than other IVF prediction models (Collins et al., 2015; Ratna et al., 2020).

In the study, calibration was assessed with multiple methods including CIL, logistic calibration, and by visualizing the agreement between the predicted and observed LBRs (Bouwmeester et al., 2012). To improve predictions, the study updated the pre-treatment model using a more extensive model revision method, while the post-treatment model was updated through the simpler approach of recalibration. The recalibration methods (intercept updating and logistic recalibration) are simple and stable because of the low number of parameters estimated. However, the model revision method is expected to lead to a lower bias in the updated model since more parameters are estimated (Steyerberg et al., 2004).

This study has some limitations. First, the external validation exercise involved a dataset with a very high proportion of missing values for duration of infertility (97%) and no data on previous pregnancy. Therefore, both predictors had to be imputed in our analysis. These variables were assumed to be MAR as the missingness is assumed to be conditional on observed variables and treatment outcome. Since these variables were consistently recorded between 1998 and 2007, the patient data from that time period were used to inform the imputation. Multiple, rather than single, imputation was performed because, with a large amount of missing data, single imputation may underestimate the uncertainty associated with the imputed values (Steyerberg, 2019). Female age explained most of the variation from all of the predictors included in the pre-treatment model, and female age, number of eggs and cryopreservation status explained most in the post-treatment model. However, we cannot rule out the possibility that imputed values for duration of infertility and previous pregnancy could have accounted for some of the difference in model performance in the external cohort compared to the development cohort.

The McLernon models estimate the individualized cumulative chances of live birth under the optimistic assumption that couples who discontinue IVF treatment without a live birth have the same chances of a live birth as couples who continue further treatment cycles. This assumption may lead to an overestimation of the predicted cumulative probability of live birth, as some of the women who discontinue treatment will have stopped because of poor prognosis (Olivius et al., 2004; Brandes et al., 2009), meaning they will have an almost zero chance of conceiving.

The original models were not able to account for other potential predictors, such as BMI, ovarian reserve tests and ethnicity, because they were absent in the HFEA database. We emphasize that the models can only be used in heterosexual couples using their own eggs and sperm and not undergoing preimplantation genetic testing. It should also be noted that the predictions from our models will represent an average prediction over all clinics within the UK. Clinic identifiers are not accessible from the HFEA and so it was not possible to adjust at the individual clinic level.

Table 2. Coefficients of the predictors from the original McLernon pre-treatment model and updated coefficients using three different methods in the validation dataset. Method 1: update intercept; Method 2: logistic recalibration; Method 3: model revision.

Predictors                                Original    Method 1    Method 2    Method 3
Intercept                                 0.995       0.983       1.193       1.775
Complete cycle number
  1 (reference)                           0           0           0           0
  2                                       0.239       0.239       0.178       0.226
  3                                       0.411       0.411       0.306       0.388
  4                                       0.563       0.563       0.419       0.531
  5                                       0.719       0.719       0.535       0.679
  6                                       0.814       0.814       0.606       0.768
Couple characteristics
Woman's age
  Age                                     0.028       0.028       0.021       0.025
  Age1                                    0.181       0.181       0.135       0.222
  Age2                                    0.455       0.455       0.339       0.732
  Age3                                    1.199       1.199       0.892       1.804
Duration of infertility (years)           0.029       0.029       0.022       0.016
Type of treatment, ICSI versus IVF        0.216       0.216       0.161       0.006
Pregnancy history, no versus yes          0.077       0.077       0.057       0.143
Tubal infertility, yes versus no          0.096       0.096       0.071       0.091
Male factor infertility, yes versus no    0.101       0.101       0.075       0.051
Anovulatory infertility, yes versus no    0.049       0.049       0.036       0.139
Unexplained infertility, yes versus no    0.060       0.060       0.045       0.057
Year of first oocyte collection
  Year                                    0.033       0.033       0.025       0.111
  Year1                                   0.037       0.037       0.028       0.255
  Year2                                   0.217       0.217       0.161       0.587

In October 2009, the HFEA changed their consent policy so that patients had to opt in for their IVF data to be used for research purposes. Our validation study used data from 2010 and so would only have included couples who opted in. We do not expect there to be a difference in characteristics and outcome between those who opt in and those who opt out, but it is difficult to know for sure without access to the data of those who opted out.

We were able to reassess calibration and discrimination after updating both models. This would be considered a type of internal validation as it involves assessing the performance of the updated models in the dataset used to update them. Ideally, we would like to be able to validate the updated models using a dataset from a separate population or to conduct a further temporal validation on a more up-to-date version of the HFEA dataset. The latter would be preferable from a practical perspective, as the models were developed for, and validated on, UK national data and are intended for use by UK couples. We aim to continue validating the models periodically in the future using UK data to ensure that they remain fit for purpose (Van Calster et al., 2023).

Interpretation of the findings

Our results show that when applied to more recently treated patients, our models underpredicted outcomes in women with low observed LBRs and slightly overpredicted in women with high observed LBRs. Therefore, it was very important to update these models to reflect current practice and to provide more accurate predictions for patients and clinicians. After updating, the models showed improved agreement between live birth predictions and observed LBRs, as expected. As such, they can be considered suitable for clinical use and can be used to inform future couples of their likely chances of treatment success (Arvis et al., 2012; Zarinara et al., 2016). When updating the McLernon models for the later time period, the differences in the relative weights of the variables were probably a result of a combination of differences in IVF protocols, improved IVF success rates, and differences in case mix between the two cohorts (McLernon et al., 2016; Leijdekkers et al., 2018). The proportions of women having embryo cryopreservation, single embryo transfer, and blastocyst transfer were higher in the validation cohort than in the development cohort. This is a result of the increased use of single embryo transfer following the introduction of the UK 'one-at-a-time' policy in 2007. It also reflects the increased use of embryo cryopreservation owing to improvements in embryo freezing techniques (Human Fertilisation and Embryology Authority, 2018; Ishihara et al., 2014). These changes in practice and techniques may have resulted in a degree of calibration drift which could explain the different performances of the McLernon models in the validation cohort (Jenkins et al., 2018). Even our updated models will have suffered some calibration drift since the end of our study period in 2016. Since then, the national LBR per embryo transferred has only increased by 1 percentage point, from 22% to 23% in 2018, which suggests that not much changed in the following 2 years (Human Fertilisation and Embryology Authority, 2018, 2020). The HFEA has yet to publish data on UK LBRs for 2019–2022 so it is difficult to estimate how much calibration drift has affected our updated model.

Both live birth and treatment discontinuation rates in all complete cycles of IVF were higher in the validation cohort than in the development cohort. Year of treatment was strongly positively associated with live birth, reflecting improvements in ART over time (McLernon et al., 2016). From October 2009, the HFEA patient consent forms were changed so that patients had to explicitly agree that their data could be used for research purposes. This change led to higher discontinuation rates owing to many women opting not to disclose their treatment information. Therefore, only couples who provided explicit consent for their information to be used in research were included in this study. Data collected in 2009 were also excluded from this study to ensure that the dataset only encompassed the time period after which the new forms were introduced across the whole of the UK.

Table 3. Coefficients of the predictors from the original McLernon post-treatment model and updated coefficients using three different methods in the validation dataset. Method 1: update intercept; Method 2: logistic recalibration; Method 3: model revision.

Predictors                                Original    Method 1    Method 2    Method 3
Intercept                                 1.761       1.882       2.085       2.272
Complete cycle number
  1 (reference)                           0           0           0           0
  2                                       0.193       0.193       0.132       0.123
  3                                       0.354       0.354       0.242       0.226
  4                                       0.512       0.512       0.351       0.327
  5                                       0.679       0.679       0.465       0.434
  6                                       0.767       0.767       0.525       0.490
Couple characteristics
Woman's age
  Age                                     0.027       0.027       0.019       0.028
  Age1                                    0.156       0.156       0.107       0.213
  Age2                                    0.382       0.382       0.261       0.769
  Age3                                    1.019       1.019       0.697       1.849
Duration of infertility (years)           0.021       0.021       0.014       0.004
Pregnancy history, no versus yes          0.050       0.050       0.035       0.008
Tubal infertility, yes versus no          0.221       0.221       0.151       0.141
Year of first oocyte collection
  Year                                    0.002       0.002       0.001       0.022
  Year1                                   0.062       0.062       0.042       0.014
Treatment characteristics at complete cycle 1
Number of oocytes collected
  Eggs                                    0.064       0.064       0.044       0.067
  Eggs1                                   0.050       0.050       0.034       0.061
Cryopreservation of embryos, yes vs no    0.650       0.650       0.445       0.517
Stage and number of embryos transferred
  Double cleavage stage                   0           0           0           0
  No embryos transferred                  1.083       1.083       0.742       1.218
  Single cleavage stage                   0.566       0.566       0.388       0.404
  Single blastocyst stage                 0.069       0.069       0.048       0.223
  Double blastocyst stage                 0.582       0.582       0.040       0.439
  Triple cleavage stage                   0.022       0.022       0.015       0.238
  Triple blastocyst stage                 0.456       0.456       0.312       0.573
Type of treatment, ICSI versus IVF        0.097       0.097       0.066       0.062

Regarding discrimination, the updated pre-treatment model had a slightly lower c-statistic (0.68, 95% CI: 0.67 to 0.69) in the validation cohort than in the development cohort (0.69, 95% CI: 0.68 to 0.69) (McLernon et al., 2018). The recalibrated post-treatment model had a good c-statistic of 0.75 (95% CI: 0.74 to 0.76) in the validation cohort, which is slightly lower than the c-statistic of 0.76 (95% CI: 0.75 to 0.77) in the development cohort (McLernon et al., 2018). A previous validation study also reported lower c-statistics of 0.62 (95% CI: 0.59 to 0.64) and 0.71 (95% CI: 0.69 to 0.74) for the recalibrated pre-treatment McLernon model and the calibrated post-treatment McLernon model, respectively (Leijdekkers et al., 2018). These reductions in model discrimination ability are likely due to the differences in couple and treatment characteristics and outcome prevalence between the two cohorts (Moons et al., 2012).

Poor calibration was evidenced in the external validation of the McLernon models. Three increasingly complicated methods were explored for updating the models (i.e. intercept updating, logistic recalibration, and model revision). The method that led to the most improvement (as evidenced by the calibration plot) was selected to update the models (Janssen et al., 2008). Model updating over time is expected given improvements in IVF practice and technology, and changes in patient case mix.

Although our post-treatment model showed good discrimination after recalibration, the discriminatory ability of the pre-treatment model remained relatively low, as is the case for almost all fertility-based prediction models (Leushuis et al., 2009). The literature suggests that the low c-statistic reflects the homogeneity of the study population, e.g. infertile women of reproductive age (Cook, 2008; Coppus et al., 2009). However, a low c-statistic does not necessarily imply that such prediction models have limited use in clinical practice. Couples with a fertility problem are more interested in knowing their chances of live birth (calibration) than in the ability of the model to discriminate between couples who will have a live birth and couples who will not. Therefore, assessment by calibration is more relevant.

Comparison with other studies

Two prediction models were developed using national US data from the Society for Assisted Reproductive Technology (SART) (McLernon et al., 2021). The first model is a pre-treatment model, similar to that validated in our current study. The second model is a post-treatment model but differs from the one validated here because it predicts cumulative live birth chances in couples starting a second complete cycle whose first complete cycle was unsuccessful. The pre-treatment model was adjusted for BMI, which was not available for the UK models. Furthermore, anti-Müllerian hormone (AMH) was included in a second pre-treatment model developed using a sub-population who had an AMH measurement. The SART data did not have duration of infertility, which was available in the HFEA data and included as a predictor in the UK model. The US models have yet to be externally validated but the c-statistic of the pre-treatment model in the development dataset was slightly higher than that for the UK pre-treatment model (0.71 versus 0.69) (McLernon et al., 2018).

Figure 4. Examples of the updated models predicting cumulative live birth over three complete cycles of ICSI for couples with different characteristics. (a) Couples with either 2 or 5 years of primary male factor infertility, where the female partner is aged either 30 or 40 years (pre-treatment model); (b) couples with 2 years of primary male factor infertility, where the female partner is aged either 30 or 40 years, with either 5 or 15 oocytes collected in the first complete cycle. Those with five oocytes have a single cleavage embryo transfer with no embryos cryopreserved, and those with 15 oocytes have a single blastocyst embryo transfer with embryos cryopreserved. S: single.

The amount of electronic data produced and stored in the field of reproductive medicine has increased considerably. Artificial intelligence (AI) (or machine learning) is progressively used in medical research to predict future outcomes and is often used in place of regression-based models. Approaches such as Bayesian neural networks and boosting algorithms are more suited to high dimensional datasets, i.e. those containing a large number of potential predictors which may include imaging information. Because of this, they require many patients to avoid risk of bias (Andaur Navarro et al., 2021). Models using such approaches that are developed in a single clinic may not be transportable to other clinics as they tend to detect patterns unique to that particular clinic (Chen et al., 2022). However, if clinics are able to share and combine their data to develop such models and then assess heterogeneity in predictive performance between clinics then they may be transportable (Riegler et al., 2021). High dimensional electronic health records are not commonly available yet in reproductive medicine (Shingshetty et al., 2022). Our regression-based models will be useful until a reliable and tested AI model has been developed, validated and shown to perform better than our model. There are many publications showing that traditional statistical regression models can match or even outperform AI models (Liew et al., 2022; Lynam et al., 2020). Indeed, statistical models are more generalizable to other populations and easier to interpret.

Clinical implications

Both the updated models provide more accurate predictions for the current IVF population and can be used as counselling tools in fertility clinics within the UK. Before initiating treatment, the revised pre-treatment model can be used to inform clinicians and couples of their individualized estimates of treatment success over multiple complete cycles of IVF. Then, after the first fresh embryo transfer, the recalibrated post-treatment model can provide a revised estimate of treatment success using treatment-related information. Clinicians can use these models in their daily practice to shape couples' expectations by informing them of their individualized chances of live birth over a sequence of multiple complete cycles of IVF.

Our models should not be used for excluding couples from treatment. A model which is intended for use in clinical decisions, such as whether or not to have treatment, should be developed using data from patients who were not treated as well as patients who were treated, preferably using data from randomized controlled trials with treated and untreated patients. This would allow us to assess treatment effectiveness (i.e. are couples more likely to have a baby with or without IVF?) and treatment benefit (if they are more likely to have a baby with IVF, is the increase in the predicted chance worth the physical, emotional and financial burden of the treatment?). Our prediction models are not meant to aid decisions around whether to have IVF or ICSI. Such a decision must be made before using the models to make predictions in new patients. For models that aim to facilitate decisions on treatment type, a different causal modelling approach is required when only observational data are available (Sperrin et al., 2019).

The original McLernon models were converted into the OPIS online calculator so that they could be used in clinical practice to estimate the probability of live birth based on the characteristics of the couple and treatment (https://w3.abdn.ac.uk/clsm/opis). Since both original models were poorly calibrated for couples in the recent UK IVF cohort, conversion of the updated models into a new online calculator is required. The updated online calculator will be able to provide accurate and more up-to-date predictions to both clinicians and couples considering IVF/ICSI treatment.

While we did not involve patients and clinicians in this validation study, our online OPIS calculator has been updated with an optional questionnaire for patients and healthcare professionals to obtain feedback on the tool. We will use the findings to make future refinements to the models and our calculator.

Conclusion

The updated McLernon prediction models provide accurate predictions of cumulative live birth over multiple complete cycles of treatment which reflect current UK IVF practice. These models, which will be available in our updated OPIS calculator (http://w3.abdn.ac.uk/clsm/opis), can be used as counselling tools to inform couples of their prognosis before commencing IVF/ICSI treatment as well as after the first fresh embryo transfer. They will help couples prepare emotionally and financially for their future treatment.

Supplementary data

Supplementary data are available at Human Reproduction online.

Data availability

The data underlying this article cannot be shared publicly due to the privacy of individuals that participated in the study. The data can be shared on reasonable request to the corresponding author with permission of the HFEA. Access to the anonymized HFEA database was approved by the North of Scotland Research Ethics Committee (12/NS/0119), the Confidentiality Advisory Group (CAG), and the HFEA Register Research Panel.

Acknowledgements

We are grateful to the HFEA for their permission to analyse the database, extracting the requested information, and assisting with all our queries in an efficient manner. We are also thankful for the data management support of the Grampian Data Safe Haven (DaSH) and the associated financial support of NHS Research Scotland, through NHS Grampian investment in the Grampian DaSH (www.abdn.ac.uk/iahs/facilities/grampian-data-safe-haven.php).

Authors' roles

D.J.M. and S.B. generated the research idea and designed the study. M.B.R. conducted the statistical analysis and literature search and wrote the initial draft of the article. D.J.M. supervised the statistical analysis. M.B.R., S.B., and D.J.M. contributed intellectually to the writing and revising of the manuscript. All authors approved the final version of the article.

Funding

This work was supported by the Elphinstone scholarship scheme at the University of Aberdeen and the Assisted Reproduction Unit at Aberdeen Fertility Centre, University of Aberdeen. The funder did not have any role in the study design; data collection, data analysis, and interpretation of data; the writing of the report; nor the decision to submit the paper for publication. S.B. has a commitment of research funding from Merck.

Conflict of interest

D.J.M. declares grants received by University of Aberdeen from NHS Grampian, The Meikle Foundation, and Chief Scientist Office in the past 3 years. D.J.M. declares receiving an honorarium for lectures from Merck. D.J.M. is Associate Editor of Human Reproduction Open and Statistical Advisor for Reproductive BioMed Online. S.B. declares royalties from Cambridge University Press for a book. S.B. declares receiving an honorarium for lectures from Merck, Organon, Ferring, Obstetric and Gynaecological Society of Singapore, and Taiwanese Society for Reproductive Medicine. S.B. has received support from Merck, ESHRE, and Ferring for attending meetings as speaker and is on the METAFOR and CAPRE Trials Data Monitoring Committee.

References

Andaur Navarro CL, Damen JAA, Takada T, Nijman SWJ, Dhiman P, Ma J, Collins GS, Bajpai R, Riley RD, Moons KGM et al. Risk of bias in studies on prediction models developed using supervised machine learning techniques: systematic review. BMJ 2021;375:n2281.
Arvis P, Lehert P, Guivarc'h LA. Simple adaptations to the Templeton model for IVF outcome prediction make it current and clinically useful. Hum Reprod 2012;27:2971–2978.
Bouwmeester W, Zuithoff NP, Mallett S, Geerlings MI, Vergouwe Y, Steyerberg EW, Altman DG, Moons KG. Reporting and methods in clinical prediction research: a systematic review. PLoS Med 2012;9:e1001221.
Brandes M, van der Steen JO, Bokdam SB, Hamilton CJ, de Bruin JP, Nelen WL, Kremer JA. When and why do subfertile couples discontinue their fertility care? A longitudinal cohort study in a secondary care subfertility population. Hum Reprod 2009;24:3127–3135.
Chen Z, Zhang D, Zhen J, Sun Z, Yu Q. Predicting cumulative live birth rate for patients undergoing in vitro fertilization (IVF)/intracytoplasmic sperm injection (ICSI) for tubal and male infertility: a machine learning approach using XGBoost. Chin Med J (Engl) 2022;135:997–999.
Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Br J Surg 2015;102:148–158.
Cook NR. Statistical evaluation of prognostic versus diagnostic models: beyond the ROC curve. Clin Chem 2008;54:17–23.
Coppus SF, van der Veen F, Opmeer BC, Mol BW, Bossuyt PM. Evaluating prediction models in reproductive medicine. Hum Reprod 2009;24:1774–1778.
Cox DR. Two further applications of a model for binary regression. Biometrika 1958;45:562–565.
Debray TPA, Damen JAAG, Snell KIE, Ensor J, Hooft L, Reitsma JB, Riley RD, Moons KGM. A guide to systematic review and meta-analysis of prediction model performance. BMJ 2017;356:i6460.
Greenland S, Finkle WD. A critical look at methods for handling missing covariates in epidemiologic regression analyses. Am J Epidemiol 1995;142:1255–1264.
Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statist Med 1996;15:361–387.
Human Fertilisation and Embryology Authority. Fertility Treatment 2014–2016: Trends and Figures. London: HFEA, 2018. https://www.hfea.gov.uk/media/3188/hfea-fertility-trends-and-figures-2014-2016.pdf (16 August 2023, date last accessed).
Human Fertilisation and Embryology Authority. Fertility Treatment 2018: Trends and Figures. London: HFEA, 2020. https://www.hfea.gov.uk/about-us/publications/research-and-data/fertility-treatment-2018-trends-and-figures/ (16 August 2023, date last accessed).
Ishihara O, Araki R, Kuwahara A, Itakura A, Saito H, Adamson GD. Impact of frozen-thawed single-blastocyst transfer on maternal and neonatal outcome: an analysis of 277,042 single-embryo transfer cycles from 2008 to 2010 in Japan. Fertil Steril 2014;101:128–133.
Janssen KJ, Moons KG, Kalkman CJ, Grobbee DE, Vergouwe Y. Updating methods improved the performance of a clinical prediction model in new patients. J Clin Epidemiol 2008;61:76–86.
Jenkins DA, Sperrin M, Martin GP, Peek N. Dynamic models to predict health outcomes: current status and methodological challenges. Diag Prognost Res 2018;2:23.
McLernon DJ, Steyerberg EW, Te Velde ER, Lee AJ, Bhattacharya S. Predicting the chances of a live birth after one or more complete cycles of in vitro fertilisation: population based study of linked cycle data from 113 873 women. BMJ 2016;355:i5735.
McLernon DJ, Steyerberg EW, Te Velde ER, Lee AJ, Bhattacharya S. An improvement in the method used to assess discriminatory ability when predicting the chances of a live birth after one or more complete cycles of in vitro fertilisation. BMJ 2018;362:k3598.
McLernon DJ, Raja EA, Toner JP, Baker VL, Doody KJ, Seifer DB, Sparks AE, Wantman E, Lin PC, Bhattacharya S et al. Predicting personalized cumulative live birth following in vitro fertilization. Fertil Steril 2021;117:326–338.
McLernon DJ, Bhattacharya S. Quality of clinical prediction models in in vitro fertilisation: Which covariates are really important to predict cumulative live birth and which models are best? Best Pract Res Clin Obstet Gynaecol 2023;86:102309.
Miller ME, Langefeld CD, Tierney WM, Hui SL, McDonald CJ. Validation of probabilistic predictions. Med Decis Making 1993;13:49–58.
Moons KGM, Kengne AP, Grobbee DE, Royston P, Vergouwe Y, Altman DG, Woodward M. Risk prediction models: II. External validation, model updating, and impact assessment. Heart 2012;98:691–698.
Nelson SM, Lawlor DA. Predicting live birth, preterm delivery, and low birth weight in infants born from in vitro fertilisation: a prospective study of 144,018 treatment cycles. PLoS Med 2011;8:e1000386.
Olivius C, Friden B, Borg G, Bergh C. Why do couples discontinue in vitro fertilization treatment? A cohort study. Fertil Steril 2004;81:258–261.
Posit team. RStudio: Integrated Development Environment for R. Boston, MA: PBC, Posit Software, 2023. http://www.posit.co/ (16 August 2023, date last accessed).
Ratna MB, Bhattacharya S, Abdulrahim B, McLernon DJ. A systematic review of the quality of clinical prediction models in in vitro fertilisation. Hum Reprod 2020;35:100–116.
R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, 2021. https://www.R-project.org/ (16 August 2023, date last accessed).
Riegler MA, Stensen MH, Witczak O, Andersen JM, Hicks SA, Leijdekkers JA, Eijkemans MJ, Van Tilborg TC, Oudshoorn SC, Hammer HL, Delbarre E, Halvorsen P, Yazidi A, Holst N et al. McLernon DJ, Bhattacharya S, Mol BW, Broekmans FJ, Torrance Artificial intelligence in the fertility clinic: status, pitfalls and pos- HL; OPTIMIST group. Predicting the cumulative chance of live sibilities. Hum Reprod 2021;36:2429–2442. birth over multiple complete cycles of in vitro fertilization: an ex- Shingshetty L, Maheshwari A, McLernon DJ, Bhattacharya S. Should ternal validation study. Hum Reprod 2018;33:1684–1695. we adopt a prognosis-based approach to unexplained infertility? Leushuis E, Van der Steeg JW, Steures P, Bossuyt PM, Eijkemans MJ, Hum Reprod Open 2022;4:hoac046. Van der Veen F, Mol BW, Hompes PG. Prediction models in repro- Sperrin M, Jenkins D, Martin GP, Peek N. Explicit causal reasoning is ductive medicine: a critical appraisal. Hum Reprod Update 2009;15: needed to prevent prognostic models being victims of their own 537–552. success. J Am Med Inform Assoc 2019;26:1675–1676. Liew BXW, Kovacs FM, Ru ¨ gamer D, Royuela A. Machine learning ver- StataCorp. Stata Statistical Software: Release 16. College Station, TX, sus logistic regression for prognostic modelling in individuals USA: StataCorp LLC, 2019. with non-specific neck pain. Eur Spine J 2022;31:2082–2091. Sterne JA, White IR, Carlin JB, Spratt M, Royston P, Kenward MG, Luke B, Brown MB, Wantman E, Stern JE, Baker VL, Widra E, Wood AM, Carpenter JR. Multiple imputation for missing data in Coddington IC, Gibbons WE, Ball GD. A prediction model for live epidemiological and clinical research: potential and pitfalls. BMJ birth and multiple births within the first three cycles of assisted 2009;338:b2393. reproductive technology. Fertil Steril 2014;102:744–752. Steyerberg EW, Borsboom GJ, van Houwelingen HC, Eijkemans MJ, Lynam AL, Dennis JM, Owen KR, Oram RA, Jones AG, Shields BM, Habbema JD. 
Validation and updating of predictive logistic re- Ferrat LA. Regression has similar performance to optimised ma- gression models: a study on sample size and shrinkage. Stat Med chine learning algorithms in a clinical setting: application to the 2004;23:2567–2586. discrimination between type 1 and type 2 diabetes in young Steyerberg EW. Clinical Prediction Models, 2nd edn. Cham: Springer adults. Diagn Progn Res 2020;4:6. International Publishing, 2019. Maheshwari A, McLernon D, Bhattacharya S. Cumulative live birth Templeton A, Morris JK, Parslow W. Factors that affect outcome of rate: time for a consensus? Hum Reprod 2015;30:2703–2707. in-vitro fertilisation treatment. Lancet 1996;348:1402–1406. 2010 | Ratna et al. Van Calster B, Steyerberg EW, Wynants L, van Smeden M. There is no Wong KM, Mastenbroek S, Repping S. Cryopreservation of human such thing as a validated prediction model. BMC Med 2023;21:70. embryos and its contribution to in vitro fertilization success van Houwelingen HC. Validation, calibration, revision and combina- rates. Fertil Steril 2014;102:19–26. tion of prognostic survival models. Statist Med 2000;19:3401–3415. Zarinara A, Zeraati H, Kamali K, Mohammad K, Shahnazari P, Van Loendersloot L, Repping S, Bossuyt PM, van der Veen F, van Akhondi MM. Models predicting success of infertility treatment: Wely M. Prediction models in in vitro fertilization; where are we? a systematic review. J Reprod Infertil 2016;17:68–81. A mini review. J Advanced Res 2014;5:295–301.
Human Reproduction – Oxford University Press
Published: Aug 25, 2023
Keywords: IVF; live birth; clinical prediction model; validation; calibration