PharmacoEconomics Open (2018) 2:165–177
https://doi.org/10.1007/s41669-017-0049-9

ORIGINAL RESEARCH ARTICLE

Direct Mapping of the QLQ-C30 to EQ-5D Preferences: A Comparison of Regression Methods

Ralph Crott

Published online: 7 August 2017
© The Author(s) 2017. This article is an open access publication

Abstract

Background Several mapping or cross-walking algorithms for deriving utilities from the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire for Cancer (EORTC QLQ-C30) scores have been published in recent years. However, the large majority used ordinary least squares (OLS) regression, which proved to be not very accurate because of the specifics of the quality-of-life measures.

Objective Our objective was to compare regression methods that have been used to map EuroQol 5 Dimensions 3 Levels (EQ-5D-3L) utility values from the general EORTC QLQ-C30 using OLS as a benchmark while fixing the number of explanatory variables and to explore an alternative three-part model.

Methods We conducted a regression analysis of predicted EQ-5D-3L utilities generated using data from an observational study in ambulatory patients with non-small-cell lung cancer in a Toronto hospital. Six alternative regression methods were compared with a simple OLS regression as benchmark. The six alternative regression models were Tobit, censored least absolute deviation, normal mixture, beta, zero–one inflated beta and a mix of piecewise OLS and logistic regression.

Results The best predictive fit was obtained by a mix of OLS regression(s) for utilities lower than 1 with a cut-off point of 0.50 and a separate binary logistic regression for utilities equal to one. Zero–one inflated beta regression was also promising. However, OLS regression proved to be the most accurate for the mean. The prediction of utilities equal to one was poor in all regression approaches.

Conclusions Three-part regression methods that separately target low, medium and high (< 0.50, 0.51–0.99 or 1) utilities seem to have better prediction power than OLS with EQ-5D-3L data, although OLS also seems quite robust. Exploration of three-part approaches compared with single (OLS) regression should be further tested in other similar datasets or using individual pooled data from various clinical or observational studies. The use of alternative goodness-of-fit measures for mapping studies and their influence on the choice of the best performing methods should also be investigated.

Key Points for Decision Makers

Studies mapping EuroQol-5 Dimensions (EQ-5D) utilities from cancer-specific non-preference measures have used ordinary least squares regression and, more recently, a variety of more complex statistical regression methods.

We have shown that these should be rejected in favour of three-part models that are more able to take into account the tri-modal distribution of the 3-level (EQ-5D-3L) measures.

Further research should be undertaken to validate our results in other cancer data and with the more recent 5-level (EQ-5D-5L) questionnaire.

Electronic supplementary material The online version of this article (doi:10.1007/s41669-017-0049-9) contains supplementary material, which is available to authorized users.

Ralph Crott
email@example.com
IRSS, Universite Catholique de Louvain, Clos Chapelle Aux Champs, 1200 Brussels, Belgium

1 Introduction

1.1 Study Rationale

Economic evaluation of medical technology often emphasizes that outcomes be expressed in terms of quality-adjusted life-years (QALYs). In cancer, the main accepted primary long-term endpoints are overall survival and disease- or progression-free survival; however, the aggressiveness of the treatments means health-related quality of life (HRQOL) is often also measured using various disease-specific questionnaires such as the European Organisation for Research and Treatment of Cancer Quality-of-Life Questionnaire for Cancer (EORTC QLQ-C30) or the Functional Assessment of Cancer Therapy-General (FACT-G) and their variants.

As clinical trials or other clinical studies do not often collect preference-based measures, statistical mapping would provide a statistical model or formula that allows the estimation of utilities and the subsequent calculation of QALYs in clinical studies that do not use any preference-based HRQOL instrument, provided it has a good predictive accuracy.

We previously showed that current ordinary least squares (OLS)-based mapping algorithms showed poor external validity [1, 2].

1.2 Study Objective

While most previous studies used OLS regression, more complex methods such as beta-binomial (BB), normal mixture (NMIX) and beta-regression have recently been proposed in the mapping literature.

The aims of the current exploratory study were to compare these existing regression methods that have been used to map EuroQol 5 Dimensions 3 Levels (EQ-5D-3L) utility values from the general EORTC QLQ-C30 using OLS as benchmark while fixing the number of explanatory variables and to propose a possible simple three-part method in practice.

Reporting and article structure followed the recent Mapping onto Preference-based measures reporting Standards (MAPS) recommendations [3].

2 Methods

2.1 Patient Population and Setting

2.1.1 Estimation Sample

Jang et al. [4] collected QLQ-C30 and EQ-5D-3L data from a sample (N = 172) of ambulatory patients with mainly stage III/IV non-small cell lung cancer (NSCLC) who were relapse free post-resection with or without undergoing chemotherapy or combined radio-chemotherapy in a single major Canadian centre in Toronto on a single visit in 2009.

The mean age of the patients was 66 years, 46.5% were male, and the mean EQ-5D utility score was 0.76 ± 0.20 (valued by the D1 US valuation tariff of Shaw et al. [5]). The mean QLQ-C30 scores were equal to 'physical function' (PF) 3.25; 'role function' (RF) 67.44; 'emotional function' (EF) 75.19; 'cognitive function' (CF) 79.84; 'social function' (SF) 73.16; and overall quality of life (QOL) 65.89. Most symptom scores were relatively low (< 0.30), except for fatigue 40.83; dyspnoea 31.20 and insomnia 34.88, reflecting the expected symptoms profile of this population (for further details, see Jang et al. [4]).

We re-analysed these data using instead the original UK EQ-5D-3L valuation tariff.

Jang et al. [4] performed a simple OLS regression with all the QLQ-C30 scores (called the full model) and a second one limited to a number of significant variables from the full regression (called the reduced model).

2.1.2 External Validation Sample

Given the exploratory nature of this study and the small number of observations, no external validation sample was used.

2.2 Instruments Description

2.2.1 Source and Target Measures

The EORTC QLQ-C30 version 3 is a cancer-specific patient-administered questionnaire of 30 questions (items) scored from 1 (very poor) to 7 (excellent) and incorporates five functional multi-item dimensions (PF, RF, CF, EF and SF); three symptom domains (fatigue, pain and nausea/vomiting); and a Global Health Status/QOL score (two items). A further six single items, mainly tracking symptoms, are also included (dyspnoea, insomnia, appetite loss, constipation, diarrhoea and financial difficulties).

The QLQ-C30 functional domain scores and item (i.e. symptom) scores can be standardized from the raw item scores to have a 0–100 range through a linear transformation. The combined HSQOL score was constructed as the average of the 'health status' (HS) and overall QOL scores.

For functional scores, a high score means a high level of functioning, whereas a high symptom score means a high level of symptom severity. The functional and symptom scores were constructed following the EORTC published scoring manual, resulting in a total of 15 distinct variables (five functions, eight symptoms, one overall QOL, one financial impact).

The EQ-5D-3L provides a simple descriptive QOL profile or vector of five items (mobility, self-care, usual activities, pain/discomfort and anxiety/depression) with three levels. Each individual EQ-5D-3L profile can be translated into utilities by applying country-specific general population-elicited 'tariffs' to generate a single utility index.

The EQ-5D-3L utilities were constructed by applying the original UK tariff to the observed EQ-5D-3L health dimensions, instead of the original US tariff used by Jang et al. [4]. The UK tariff was chosen to enhance comparisons, as it is the most widely used tariff in published mapping studies to date.

2.3 Statistical Analysis

2.3.1 Exploratory Analysis

The overlap of EQ-5D-3L items with those of the QLQ-C30 scores is only partial. To explore the overlap, we performed a non-parametric Spearman rank correlation analysis at a function/item level between the two.

2.3.2 Missing Data

All records were used; there were no missing data in the available dataset.

2.3.3 Modelling Approaches

Mapping methods can be divided into regression-based and non-regression methods (for an early literature review, see Mortimer and Segal). Regression-based methods can be further subdivided into direct one-step models that estimate the target utility value or two-step models that estimate first the response level for each item of the multiple attribute utility (MAU) target measure and then apply a tariff formula to the estimated responses.

We then regressed the observed EQ-5D-3L utilities on all QLQ-C30 functional scores and reran the OLS regression with the variables retained in the restricted model to get our benchmark OLS algorithm. As the goodness-of-fit (GOF) measures of the OLS regression between the full and reduced model were very close, we chose to use the reduced model for further analysis because including additional variables would not provide new information.

Six different regression methods were used to predict EQ-5D-3L utilities from the QLQ-C30 functional scores using OLS as benchmark. The other approaches were Tobit, censored least absolute deviation (CLAD), beta regression (BB), zero-one inflated beta regression (ZOIB), Gaussian Mixture (NMIX) with two or three components, and a three-part piecewise linear (PWL) model, comprising two separate OLS and one logistic regression. Together these cover the most common as well as some more recent published mapping regression models for the QLQ-C30.

We did not investigate a response-level model, as this was outside the scope of this article [9, 10]. All calculations were conducted in STATA version 14.

2.3.4 Estimation of Predicted Utilities

For ease of comparison between the different regression methods, the predictive variables were fixed in all regressions to include only the physical, emotional and pain QLQ-C30 scores, as these corresponded to the original reduced model from Jang et al. [4] (except for role functioning, see Table 1). This choice was based on the results of a preliminary OLS regression involving all the QLQ-C30 functional and symptom scores, which was compared with a reduced model by means of a likelihood ratio (LR) test.

The emphasis is therefore placed on the comparison between the different regression methods and not on providing a mapping algorithm as such (which would involve using all QLQ-C30 scores with a variable number of variables ultimately possibly being retained in each regression and exploring various functional forms of the regression equations).

2.3.5 Measures of Model Performance

First, the predicted utilities were plotted and visually compared with the observed utilities in a series of plots.

Second, the mean, standard deviation, median and upper and lower quintiles of the mapped utilities were compared with the original observed utilities. This allowed us to judge the bias and precision of the estimates.

Finally, a series of GOF statistics were calculated and summarized. These were mean absolute error (MAE), root mean squared error (RMSE) (or sigma for Tobit regression), the number of absolute errors > 0.05 as an indication of minimal clinical important difference (MCID) and the number of estimated observations greater than one and lower than zero.

2.3.6 Validation Methods

Given the exploratory nature of this study and the small number of available observations, no in-sample cross-validation or external validation was performed. Generally, in-sample validation is of limited use as it preserves the internal structure of the data, which is not the case with independent external samples. It is our intent to explore this aspect in further research using a set of different external NSCLC patient samples.
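The goodness-of-fit statistics listed in Section 2.3.5 can be sketched as follows. This is a minimal illustration, not the Stata code used in the study: the function name `gof_summary` and the example values are hypothetical, and the RMSE here divides by the number of observations, since the text does not state whether a degrees-of-freedom correction was applied.

```python
import math


def gof_summary(observed, predicted, threshold=0.05):
    """Summarise prediction errors for mapped utilities (illustrative sketch)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")

    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)

    return {
        # mean absolute error
        "mae": sum(abs(e) for e in errors) / n,
        # root mean squared error (assumption: divisor n, no df correction)
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        # number of absolute errors above the 0.05 threshold used as an MCID indication
        "n_abs_error_gt_threshold": sum(1 for e in errors if abs(e) > threshold),
        # predictions falling outside the [0, 1] interval
        "n_pred_gt_one": sum(1 for p in predicted if p > 1),
        "n_pred_lt_zero": sum(1 for p in predicted if p < 0),
    }


# Hypothetical values, for illustration only (not study data)
observed = [1.0, 0.76, 0.52, -0.10]
predicted = [1.02, 0.70, 0.60, -0.02]
summary = gof_summary(observed, predicted)
print(summary)
```

Counting predictions above one and below zero separately keeps the out-of-range behaviour of each regression method visible, which the MAE and RMSE alone would hide.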
Table 1 Original non-small-cell lung cancer ordinary least squares results (Jang et al. [4]) with USA tariff compared with UK tariff regression

Variables | Jang et al. [4] USA full model (a) | UK tariff full model | Jang et al. [4] USA reduced model (a) | UK tariff reduced model
Intercept | 0.3381 | 0.1873 (p = 0.177) | 0.4029 | 0.1963*** (p = 0.016)
Physical functioning (PF) | 0.0035*** | 0.0051*** (p = 0.000) | 0.0039*** | 0.0058*** (p = 0.000)
Role functioning (RF) | 0.0007 | 0.0011 (p = 0.158) | 0.0008*** |
Emotional functioning (EF) | 0.0011*** | 0.0016* (p = 0.064) | 0.0015*** | 0.0019*** (p = 0.005)
Cognitive functioning (CF) | 0.0007 | 0.0005 (p = 0.575) | |
Social functioning (SF) | -0.0007 | -0.0013 (p = 0.100) | -0.0007 |
Global health status/QOL (HSQOL) | 0.0009 | 0.0009 (p = 0.448) | |
Fatigue (FA) | 0.0003 | 0.0003 (p = 0.784) | |
Nausea and vomiting (NV) | -0.0002 | -0.0005 (p = 0.693) | |
Pain (PA)*** | -0.0021** | -0.0032*** (p = 0.000) | -0.0021** | -0.0034*** (p < 0.0001)
Dyspnoea (DY) | -0.0001 | -0.0002 (p = 0.735) | |
Insomnia (SL) | -0.0001 | -0.0002 (p = 0.712) | |
Appetite loss (AP) | -0.0001 | -0.0003 (p = 0.656) | |
Constipation (CO) | 0.0005 | 0.0006 (p = 0.267) | |
Diarrhoea (DI) | 0.0004 | 0.0006 (p = 0.380) | |
Financial difficulties (FI) | -0.0001 | -0.0001 (p = 0.494) | |

* p < 0.10, ** p < 0.05, *** p < 0.01
(a) p values not published

3 Results

3.1 Exploratory Analysis

Typical of EQ-5D-3L utilities, we observed a large ceiling effect, a gap around 0.90 and a left skew with some negative observations and a clustering of values in the 0.60–0.85 range [mean 0.667; median 0.743; standard deviation (SD) 0.285; skewness -1.365; kurtosis 4.564] (Table 2).

The tri-modal aspect of the distribution is apparent with a long lower tail, a clustering at medium values and a high upper ceiling effect (Fig. 1).

Mapping still requires checking for some concordance between the dimensions of both questionnaires.

As one would expect, the pain items were highly correlated in both scales.

Fatigue symptoms were associated at more or less the same degree with mobility, usual activities and pain/discomfort, whereas dyspnoea was only associated with usual activity performance but not strongly with mobility (rho 0.35). PF impairment was relatively highly associated with performing usual activities and somewhat less with mobility and self-care, as was RF except for self-care.

EF was clearly associated with depression/anxiety in the EQ-5D-3L.

Clearly, fatigue and diminished PF (which are themselves correlated, rho 0.68) have the broadest impact on the EQ-5D-3L dimensions, and there is a strong one-to-one relationship between the items for pain and depression. Dyspnoea is probably specific to this lung cancer patient population and was only moderately correlated with usual activity performance.

Some of the above QLQ-C30 items were also moderately to highly cross-correlated (rho > 0.50–0.70) with others, such as PF with EF, fatigue and dyspnoea, RF with SF and fatigue, SF with fatigue, and finally fatigue with pain.

These inter-item correlations in the QLQ-C30 mean that some multicollinearity might be present when performing regressions using all the QLQ-C30 scores.

Table 2 Pearson correlations between QLQ-C30 scores and EQ-5D-3L for significant variables in the full model by Jang et al. [4] (all p < 0.001)

QLQ-C30/EQ-5D-3L | Mobility | Self-care | Usual activity | Pain/discomfort | Depression/anxiety
Physical function (PF) | -0.603 | -0.595 | -0.609 | -0.427 | -0.268
Role function (RF) | -0.415 | -0.391 | -0.683 | -0.412 | -0.213
Emotional function (EF) | -0.162 | -0.286 | -0.374 | -0.350 | -0.590
Social function (SF) | -0.277 | -0.305 | -0.516 | -0.391 | -0.232
Fatigue (FA) | +0.469 | +0.380 | +0.555 | +0.501 | +0.256
Pain (PA) | +0.275 | +0.348 | +0.366 | +0.699 | +0.256
Dyspnoea (DY) | +0.372 | +0.301 | +0.453 | +0.278 | +0.167

Fig. 1 Observed EQ-5D-3L utility values (plot titled "Observed EQ_5D NSCLC Utilities, UK Tariff"; x-axis: observed EQ-5D-3L utility; y-axis: percent)

3.2 Individual Model Coefficients

3.2.1 Benchmark Ordinary Least Squares Regression on Non-Small-Cell Lung Cancer

We only retained the explanatory variables with p values < 0.10 from the overall linear regression including all QLQ-C30 scores. The set of variables retained using UK tariff values is more restricted than the original restricted formula published by Jang et al. [4] using a USA valuation tariff, i.e. respectively, PF-EF-PA versus PF-RF-EF-PA, with RF becoming non-significant. However, remarkably, the overall explained variance of the reduced model was similar (adjusted R² 0.58), with barely a change in the adjusted R² compared with the full model and a similar RMSE of 0.187, equal to that obtained by Jang et al. [4] (adjusted R² 0.57 and 0.58 for the full and reduced linear models, respectively). The GOF statistics of the full and reduced UK tariff benchmark OLS model are presented in Table 3.

Table 3 Goodness of fit measures for the full and reduced ordinary least squares regression non-small-cell lung cancer model (UK tariff)

Model | Adj-R² | Log-likelihood | AIC | BIC | RMSE
Full model | 0.58 | 48.5 | -89.0 | -76.4 | 0.1847
Reduced model | 0.57 | 52.77 | -73.6 | -23.2 | 0.1869

AIC Akaike information criterion, BIC Bayes information criterion, RMSE root mean square error

As the adjusted R² and RMSE were very close, we performed a classical LR test (χ² = 8.55, p = 0.74), which indicated the reduced model was not different from the full one; therefore, we decided to use the reduced model as our benchmark.

We also plotted the residuals to assess departure from normality (Fig. 2).

Clearly at the lower end of the quantile plot, residuals deviate from the normal quantile line but are otherwise rather well behaved.

Fig. 2 Normal quantile plot of residuals in benchmark non-small-cell lung cancer reduced ordinary least squares model

We also formally tested for the presence of heteroscedasticity of the residuals and their normality by applying the Breusch–Pagan test and the Shapiro–Wilk test on the OLS residuals (Table 4).

Table 4 Benchmark ordinary least squares model tests for heteroscedasticity and normality of residuals

Variable | Breusch–Pagan χ² | Breusch–Pagan p value (a) | Shapiro–Wilk W | V | z | Prob > z
Physical functioning | 18.98 | 0.0000 | – | – | – | –
Emotional functioning | 12.58 | 0.0012 | – | – | – | –
Pain | 14.65 | 0.0004 | – | – | – | –
Simultaneous | 25.51 | 0.0000 | 0.95326 | 6.117 | 4.135 | 0.00002

(a) Bonferroni corrected

The assumptions of homoscedasticity and normality of the residuals are rejected, with mainly a large non-normal residuals tail, which in theory leads to biased OLS estimators.

One can also see clearly that the estimated OLS utilities overestimate the 'true' observed utilities below 0.50 and underestimate utilities equal to one, with the 'best' fit occurring in the interval 0.50–0.85. Notice also the gap around 0.90 inherent to the UK Tariff valuation.

In the following sections, we present the results of alternative regression methods using the same reduced model. This allows us to estimate a 'pure method' effect compared with OLS without introducing additional explanatory variables.

3.2.2 Tobit Regression

We found results very comparable to those of OLS and a somewhat improved fit for utilities equal to one [see Appendix 4 and Fig. S1 in the Electronic Supplementary Material (ESM)].

3.2.3 CLAD Regression

Visually, CLAD regression with a lower limit set at -0.319 does not seem to improve the fit much compared with OLS, with the fit perhaps even slightly worse for lower utilities (see Appendix 4 and Fig. S2 in the ESM).

3.2.4 Normal Mixture Regression

We first fitted an uncensored NMIX model with two and three components to the data. Barely any difference can be distinguished between the two-component and three-component mixture models (see Appendix 4 and Figs. S3, S4 and S5 in the ESM). However, the fit for utilities = 1 was still poor in both models and did not improve in the three-component model.

3.2.5 Beta Regression

We also fitted a simple beta regression as proposed by Hunger et al. using a maximum likelihood procedure (Betafit procedure in Stata).

We first transformed the utility range to constrain the data in the range ]0,1[ by applying the formula

Uscale_UKbeta = (Uscale_UK × (172 − 1) + 0.5)/172.

This generated a more constrained range of utilities with mean 0.675 (±0.283), very similar to the original data but with a maximum of 0.997 instead of 1. However, there were still eight observations with negative values, which were discarded from the regression.

No significant improvement in GOF seems to appear except for a slightly better fit for lower utilities (see Appendix 4 and Fig. S5 in the ESM).

3.2.6 Beta-Binomial Regression

Recently, some authors used a BB regression similar in some respects to the zero–one inflated beta (ZOIB) model for mapping purposes [15, 16].

We performed a similar regression using the ZOIB procedure in Stata by putting all negative utility values equal to zero and considering only a one-inflated model.

This is obviously one of the drawbacks of all beta-regression approaches, as they are constrained to a [0, 1] interval. However, it did seem to improve somewhat the fit for low utility values compared with a simple beta-regression approach (see Appendix 4 and Fig. S6 in the ESM).

3.2.7 Piecewise Linear Regression

To construct a piecewise linear regression, we split the sample at 0.50 (following the OLS results in Fig. 3) to separate low utilities and higher utilities as demonstrated by Versteegh et al. Likewise, we separated utilities equal to one from the rest.
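The sample split described above can be sketched as follows. This is an illustrative sketch rather than the study's Stata code: the function names are hypothetical, and assigning a utility of exactly 0.50 to the low group is an assumption, since the article gives the boundary slightly differently in different places.

```python
def split_three_part(utilities, cutoff=0.50):
    """Return the indices of observations in each of the three utility subgroups."""
    groups = {"low": [], "high": [], "one": []}
    for index, utility in enumerate(utilities):
        if utility >= 1.0:
            # full health: handled by the binary logistic regression
            groups["one"].append(index)
        elif utility <= cutoff:
            # low utilities (assumption: the cut-off value itself counts as low)
            groups["low"].append(index)
        else:
            # utilities above the cut-off but below one
            groups["high"].append(index)
    return groups


def full_health_indicator(utilities):
    """Binary outcome for the logistic part: 1 if utility equals one, else 0."""
    return [1 if utility >= 1.0 else 0 for utility in utilities]


# Hypothetical utilities, for illustration only
example = [-0.319, 0.50, 0.51, 0.99, 1.0]
groups = split_three_part(example)
indicator = full_health_indicator(example)
print(groups, indicator)
```

Each subgroup would then be passed to its own regression (logistic for the indicator, OLS for the low and high groups); the fitting step itself is not shown here.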
Fig. 3 Predicted versus observed utilities in non-small-cell lung cancer ordinary least squares benchmark model. Diagonal line indicates the perfect fit (estimated utility, Yhat, plotted against observed utility, EQ-5D UK tariff)

We therefore had three separate subgroups to estimate, with utilities ranging from -0.319 to 0.50, from 0.51 to 0.99 and equal to one.

We first used a logistic regression to predict which observations would be equal to one by setting all other observations equal to zero to obtain a binary dataset. We then regressed the binary utility outcome (0–1) on all QLQ-C30 functional scales to obtain a predictive fit (see the tables in the appendix in the ESM).

The two other subgroups were then estimated separately by OLS using only the three retained significant scores from the reduced benchmark OLS regression, as we expected a difference in the coefficients between the low and high utility subgroups.

As can be seen, the slopes of the regression lines were nearly identical between the low and high utility groups for all three scores (Fig. 4).

Fig. 4 Low-high utilities separate regressions: a QLQ-C30 physical function (PFscore); b QLQ-C30 emotional function (EFscore); c QLQ-C30 pain score (PAscore) (each panel plots observed utility, EQ-5D UK tariff, and fitted values against the score on a 0–100 scale)

We then joined the predictions of all three sub-models and compared the results with the original utility values (Fig. 5).

Fig. 5 Predicted versus observed utilities in non-small-cell lung cancer: piecewise linear model. Diagonal line indicates the perfect fit

The piecewise OLS regression on utility values below one gave quite a good fit, with a nearly identical slope for the low and high regression lines in all cases. However, the logistic regression failed to adequately predict a number of observations with utility equal to one.

Even with the whole set of QLQ-C30 functional and symptom scores as predictors, the sensitivity was only equal to 0.52 with no more than 14 of the 29 observations correctly predicted, although specificity was high (0.95) (see Appendix 1 in the ESM). This is because a number of observations with observed utilities equal to one presented with some relatively low function scores and therefore these observations were not adequately predicted. Nevertheless, for utility values other than one, this approach seemed to give quite a good overall fit compared with OLS.

3.2.8 Summary of Goodness-of-Fit Measures Across Regression Methods

When looking at the regression coefficients per regression method (Table 5), we observed a relative closeness of the OLS, Tobit and CLAD coefficients but a much more pronounced difference between the simple beta and ZOIB approaches, whereas the two NMIX components are clearly different, as are the high and low parts of the piecewise regression. However, the odds ratios in the logistic regression are barely different from one, indicating a poor predictive value of the function scores for patients with utility equal to one.

Table 5 Regression coefficients per regression method

Dep variable | OLS | Tobit | CLAD | NMIX component 1 | NMIX component 2 | Simple beta | ZOIB | Piecewise linear logit | Piecewise linear OLS – low | Piecewise linear OLS – high
PFscore | 0.0058 | 0.0064 | 0.0059 | 0.0068 | 0.0031 | 0.0232 | 0.0156 | 0.6083 | -0.0053 | +0.0022
EFscore | 0.0018 | 0.0021 | 0.0014 | 0.0034 | 0.0010 | 0.0104 | 0.0056 | 0.0350 | -0.0010 | +0.0010
PAscore | -0.0033 | -0.0037 | -0.0033 | -0.0041 | -0.0016 | -0.0149 | -0.0092 | -0.0619 | +0.0033 | -0.0010
Constant | 0.196 | 0.159 | 0.251 | -0.0594 | 0.493 | -1.1748 | -0.7221 | -8.882 | +0.5528 | 0.5085

As our emphasis is on the choice between regression methods and their likeness with a fixed set of explanatory variables and not to provide a usable mapping algorithm as such for the QLQ-C30, we chose not to present the confidence intervals in this table. All coefficients in all regressions were significant at the p = 0.05 level with narrow confidence intervals
CLAD censored least absolute deviation, EFscore emotional function, NMIX normal mixture, OLS ordinary least squares, PAscore pain, PFscore physical function, ZOIB zero–one inflated beta

3.2.9 Model Performance

The three-part model scored better on most validation statistics in Table 6, except for the mean utility estimation. The lower mean of the piecewise regression is partly due to the choice of the replacement estimated utility for the 18 observations with a mismatch between the binary utility estimation by the logistic regression and the observed utility (observed 1, estimated 0). In those cases, we replaced the predicted utility with its estimated value from the high utilities (range 0.51–0.90) OLS regression. However, this underestimates the true utility value (mean 0.784, range 0.689–0.813). Using the predictive values from the overall benchmark OLS regression instead increased the estimate (mean 0.829, range 0.577–0.917) somewhat but not sufficiently, as both underestimated the utilities at the higher end above 0.90 (see Appendix 2 in the ESM).

When focusing on the predicted mean utility, OLS proved the most accurate because its underestimation of high utilities was compensated by its overestimation of low utilities. Whether this is by happenstance or is a constant feature in QLQ-C30 mapping to the EQ-5D-3L is unclear (Fig. 6; Table 7).

GOF measures are all in favour of the three-part model, except for the Bayesian information criterion (BIC), which favours a simple beta regression. However, when the simple beta regression was rerun per utility class of poor and good health patients, the BIC results were very similar (-20 and -285, respectively) to those of the piecewise model.

4 Discussion

Our results show that none of the alternative methods fared better than OLS except a three-part linear piecewise OLS/logit when based on the usual observation-based GOF measures.

The best predictive fit was obtained by a mix of OLS regression(s) for utilities lower than one with a cut-off point of 0.50 and a separate binary logistic regression for utilities equal to one, but single OLS had the best predicted mean utility. However, the prediction of utilities equal to one was poor in all regression approaches and should be further explored and improved in future mapping studies (see Appendix 4, Figs. S1 to S7 in the ESM).
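Table 6 reports a standard error of the mean (SEM) for each method, from which the 95% confidence interval of the mapped mean follows as mean ± 1.96 SEM. A minimal sketch of that arithmetic, using the observed-utility column of Table 6 (mean 0.676, SEM 0.017) as input; the function name is hypothetical.

```python
def mean_confidence_interval(mean, sem, z=1.96):
    """Return (lower, upper) bounds of the normal-approximation CI of a mean."""
    half_width = z * sem
    return mean - half_width, mean + half_width


# Observed utilities in Table 6: mean 0.676, SEM 0.017
lower, upper = mean_confidence_interval(0.676, 0.017)
print(round(lower, 3), round(upper, 3))  # 0.643 0.709
```

The result matches the lower and upper 95% limits listed for the observed utilities in Table 6.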
Table 6 Summary validation statistics of predicted utilities (YHAT)

Methods | Observed | OLS | Tobit | CLAD | Simple beta | ZOIB | NMIX 2 components | Piecewise linear with logit component
Mean | 0.676 | 0.676 | 0.700 | 0.707 | 0.694 | 0.667 | 0.688 | 0.654–0.663 (a)
Range | 1.319 | 1.075 | 1.194 | 0.872 | 0.823 | 0.934 | 0.917 | 1.203
SD | 0.28 | 0.22 | 0.24 | 0.17 | 0.191 | 0.204 | 0.185 | 0.26
Median | 0.74 | 0.73 | 0.76 | 0.75 | 0.755 | 0.698 | 0.72 | 0.74
Minimum | -0.319 | 0.110 | -0.174 | 0.073 | 0.076 | 0.001 | 0.017 | -0.203
Maximum | 1 | 0.965 | 1.021 | 0.946 | 0.898 | 0.935 | 0.934 | 1
SEM | 0.017 | 0.016 | 0.018 | 0.013 | 0.015 | 0.016 | 0.014 | 0.019
Lower 95% CI of mean | 0.643 | 0.645 | 0.674 | 0.681 | 0.664 | 0.637 | 0.660 | 0.617
Upper 95% CI of mean | 0.709 | 0.708 | 0.746 | 0.733 | 0.722 | 0.697 | 0.714 | 0.691
Skewness | -1.38 | -0.98 | -0.98 | -0.94 | -1.18 | -1.16 | -0.98 | -1.53

The SEM allows us to calculate the 95% confidence interval of the mapped means (CI = mean ± 1.96 SEM) in a hypothetical population
Bold in the original table marks the best fitting method according to the criterion in question
CI confidence interval, CLAD censored least absolute deviation, OLS ordinary least squares, SD standard deviation, SEM standard error of the mean, ZOIB zero–one inflated beta
(a) Depending on the mismatch imputation method used

Fig. 6 Mean predicted utility per observed utility decile (plot titled "Linear Regressions Comparison, Means per decile"; x-axis: utility decile; y-axis: estimated utility; series: avgUscaleUK, mean OLS, mean piecewise, mean betabinomial)

Table 7 Non-small-cell lung cancer regression goodness-of-fit data

Methods | Observed | OLS | Tobit | CLAD | Simple beta | ZOIB01 | NMIX model 2 components | Three-part piecewise linear
RMSE | – | 0.184 | 0.208 (a) | 0.197 | 0.174 | 0.183 | 0.186 | 0.104
MAE | – | 0.135 | 0.135 | 0.144 | 0.135 | 0.136 | 0.152 | 0.073
BIC | – | -23 | +31 | NA | -187 | +68 | -115 | +123, -22, -281 (b)
# Obs Abs error > 0.05 | – | N = 130 | N = 125 | N = 124 | N = 122 | N = 131 | N = 148 | N = 87
# Obs > 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0
# Obs < 0 (negative utilities) | 8 | 1 | 12 | 0 | 0 | 0 | 0 | 6

abs absolute, BIC Bayesian information criterion, CLAD censored least absolute deviation, MAE mean absolute error, NA not applicable, NMIX normal mixture, obs observed, OLS ordinary least squares, RMSE root mean square error, ZOIB zero–one inflated beta
(a) Sigma
(b) Logistic, OLS < 0.50, OLS ≥ 0.50

4.1 Comparison with Recent Studies

Khan and Morris used a BB approach and compared it to linear, quadratic, Tobit, CLAD and quantile regression in data from two NSCLC trials (TOPICAL and SOCCAR) and obtained an MAE of, respectively, 0.10 and 0.13 and an RMSE of 0.09 and 0.11. The predicted means compared with the observed mean utilities were 0.608 versus 0.61 and 0.749 versus 0.75, with the BB regression yielding the best accuracy.

Nonetheless, when testing each developed model on the other trial data, performance was degraded, especially for the SOCCAR algorithm, resulting in an RMSE of 0.132 (TOPICAL → SOCCAR) and 0.159 (SOCCAR → TOPICAL), with the 95% confidence interval of the estimated mean only containing the true mean in 60% of the cases. They also showed that the worse the health state, the more the regressions, whatever the method, overstated the EQ-5D-3L utilities.

Wailoo et al. used a bespoke mixture model with four components to map the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) to EQ-5D-3L utilities and compared it with a linear model and an indirect method based on a generalized ordered probit model. They showed that the best fit was obtained by their mixture model. However, MAE and RMSE were rather elevated: 0.158 and 0.210. To our knowledge, their method has not yet been applied to cancer data.

Skaltsa et al. used a separate logistic model in a three-part approach to estimate EQ-5D-3L utilities from the FACT-P questionnaire in patients with prostate cancer (mean utility 0.688 ± 0.0282) and compared it with a single linear generalized estimating equation (GEE) regression and with a three-part model consisting of a logistic regression and two separate GEE regressions with a breakpoint fixed at 76 points of the total FACT score. The latter showed the best performance, with an RMSE of 0.162, an MAE of 0.117 and a high R² of 0.718, with the predictive fit decreasing for utility values below 0.50.

Their results are largely in agreement with ours, highlighting the different nature of the data-generating process in patients in poor, good and perfect health.

4.2 Study Limitations

Our study compared alternative regression methods for mapping purposes in the cancer field. Nevertheless, it suffers from several limitations.

First, the UK EQ-5D-3L tariff was used in all of the datasets to enhance comparability. It is possible that using the original tariffs from other countries would lead to some changes in the results, although comparisons of EQ-5D-3L tariffs, at least within European countries, show them to be quite close. This effect would be expected to be more pronounced for non-European EQ-5D-3L tariffs [22, 23].

Second, some previous published studies used earlier versions of the QLQ-C30. Although the differences between the different versions of the QLQ-C30 are relatively small and relate only to two or three of the function scales, this may also possibly influence the external validity of the mapping algorithm. Regardless, QLQ-C30 version 3 is currently the most widely used.

Third, our sample is relatively small and does not include repeated measurements, which could introduce more variability and a possible time trend.

Fourth, we did not try to compare direct regression to indirect response mapping methods for mapping purposes, nor did we try to test our results on another independent data sample as this was outside the scope of the current study.

4.3 Scope of Applications

Although a linear piecewise three-part model approach looks promising and is relatively easy to use, more comparative research is needed with similar data both in lung cancer and in other cancer types to assess the stability and replicability of our results regarding the use of three-part models for the purposes of mapping QLQ-C30 scores to EQ-5D-3L utilities.

5 Conclusions

As yet, no preferred mapping method is advocated in the literature, so our primary goal was to compare whether some published or recommended single regression methods for mapping QLQ-C30 to the EQ-5D-3L would yield reasonably accurate predictive results in a selected dataset and whether we could improve on this using a three-part approach.

Our results indicate that the best approach is a piecewise mix of two separate OLS and one binary logistic regression, while, surprisingly, OLS still had the best predicted overall mean utility.

We conclude, nevertheless, that direct mapping regression methods based on a single distribution should be used with great care, especially for low and very high utilities, as these methods generally do not adequately represent the specifics of the tri-modal distribution of EQ-5D-3L preference values.

Qual Life Res. 2013;22(5):1045–54. doi:10.1007/s11136-012-0220-9.
2. Crott R. Mapping algorithms from QLQ-C30 to EQ-5D utilities: no firm ground to stand on yet. Expert Rev Pharmacoecon Outcomes Res. 2014;14(4):569–76.
3. Petrou S, Rivero-Arias O, Dakin H, Longworth L, Oppe M, Froud R, Gray A. The MAPS reporting statement for studies mapping onto generic preference-based outcome measures: explanation and elaboration. Pharmacoeconomics. 2015;33(10):993–1011.
4. Jang R, Isogai P, Mittmann N, et al. Derivation of utility values from European organization for research and treatment of cancer quality of life-core 30 questionnaire values in lung cancer. J Thorac Oncol. 2010;5(12):1953–7.
5. Shaw JW, Johnson JA, Coons SJ. US valuation of the EQ-5D health states: development and testing of the D1 valuation model. Med Care. 2005;43(3):203–20.
Therefore, EQ-5D-3L mapping methods based on three 6. Dolan P. Modeling valuations for EuroQol health states. Med components or three-part models should be preferred Care. 1997;35(11):1095–108. [25, 26] and further investigated with emphasis on the 7. EORTC QLQC30 Scoring Manual, EORTC, Brussels, Belgium http://groups.eortc.be/qol/manuals. Accessed 3 Aug 2017. upper ceiling problem. 8. Mortimer D, Segal L. Comparing the Incomparable? A system- Whether our results can also be extended to other cancer atic review of competing techniques for converting descriptive QOL scales such as the widely used FACT questionnaire or measures of health status into QALY-weights. Med Decis Mak. to generic utilities measures other than the EQ-5D-3L and 2008;28(1):66–89. 9. McKenzie L, Van der Pol M. Mapping the EORTC QLQ-C30 in other cancer types remains to be assessed. Whether our onto the EQ-5D instrument: the potential to estimate QALYs ﬁndings also apply to the more recently developed ﬁve- without generic preference data. Value Health. level scale (EQ-5D-5L) is unknown. 2009;12:167–71. 10. Versteegh MM, Leunis MA, Luime JJ, et al. Mapping QLQ-C30, Acknowledgements The author thanks Dr. Leighl from the Depart- HAQ, and MSIS-29 on EQ-5D. Med Decis Mak. ment of Medicine, Division of Medical Oncology, Princess Margaret 2012;32(4):554–68. Hospital, University of Toronto, and Dr. Mittmann from the Health 11. Round J, Hawton A. Statistical alchemy: conceptual validity and Outcomes and PharmacoEconomic (HOPE) Research Centre, Sun- mapping to generate health state utility values. Pharmacoecon nybrook Research Institute, University of Toronto, for their help in Open. 2017;. doi:10.1007/541669-017-0027-2. gaining access the NSCLC data. 12. Institute for Digital Research and Education. FAQ: How are the likelihood ratio, WALD, and Lagrange multiplier (score) tests Data Availability Statement The data that support the ﬁndings of different and/or similar. 
https://stats.idre.ucla.edu/other/mult-pkg/ this study are available from Dr Leighl from the Department of faq/general/faqhow-are-the-likelihood-ratio-wald-and-lagrange- Medicine, Division of Medical Oncology Princess Margaret Hospital, multiplier-score-tests-different-andor-similar/. Accessed 17 July University of Toronto, Toronto, Canada, but restrictions apply to the availability of these data, as they were used under license for the 13. Hunger M, Baumert J, Holle R. Analysis of SF-6D index data: is current study and so are not publicly available. beta regression appropriate? Value Health. 2011;14(5):759–67. 14. Smithson M, Verkuilen J. A better lemon squeezer? Maximum- likelihood regression with beta-distributed dependent variables. Psychol Methods. 2006;11(1):54–71. Compliance with Ethical Standards 15. Khan I, Morris S. A non-linear beta-binomial regression model for mapping EORTC QLQ- C30 to the EQ-5D-3L in lung cancer Funding No funding was received for this study. patients: a comparison with existing approaches. Health Qual Life Outcomes. 2014;12:163. Conﬂict of interest The author is not aware of any existing conﬂicts ´ ´ 16. Arostegui I, Nun˜ez-Anton V, Quintana J. Analysis of the short of interest. form-36 (SF-36): the beta-binomial distribution approach. Stat Med. 2006;26(6):1318–42. Open Access This article is distributed under the terms of the 17. Buis M, Maarten L. ZOIB: Stata module to ﬁt a zero-one inﬂated Creative Commons Attribution-NonCommercial 4.0 International beta distribution by maximum likelihood. Statistical Software License (http://creativecommons.org/licenses/by-nc/4.0/), which per- Components (2012). https://ideas.repec.org/c/boc/bocode/ mits any noncommercial use, distribution, and reproduction in any s457156.html. Accessed 30 July 2017. medium, provided you give appropriate credit to the original 18. Versteegh MM, Rowen D, Brazier JE, Stolk EA. 
Mapping onto author(s) and the source, provide a link to the Creative Commons Eq-5 D for patients in poor health. Health Qual Life Outcomes. license, and indicate if changes were made. 2010;8(1):1. 19. Wailoo A, Herna´ndez M, Philips C, Brophy S, Siebert S. mod- eling health state utility values in ankylosing spondylitis: com- References parisons of direct and indirect methods. Value Health. 2015;18(4):425–31. 20. Skaltsa K, Longworth L, Ivanescu C, et al. Mapping the FACT-P 1. Crott R, Versteegh M, Uyl-de-Groot C. An assessment of the to the preference-based EQ-5D questionnaire in metastatic external validity of mapping QLQ-C30 to EQ-5D preferences. Direct Mapping of the QLQ-C30 to EQ-5D Preferences 177 castration-resistant prostate cancer. Value Health. 24. Hernandez Alava M, Wailoo A, Wolfe F, Michaud K. A com- 2014;17(2):238–44. parison of direct and indirect methods for the estimation of health 21. Greiner W, Weijnen T, Nieuwenhuizen M, et al. A single utilities from clinical outcomes. Med Decis Mak. European currency for EQ-5D health states. Results from a six 2014;34(7):919–30. country study. Eur J Health Econ. 2003;4(3):222–31. 25. Kent S, Gray A, Schlackow I, Jenkinson C, McIntosh E. Mapping 22. Karlsson JA, Nilsson JA, Neovius M, Kristensen LE, Gu¨lfe A, from the Parkinson’s disease questionnaire PDQ-39 to the generic Saxne T, Geborek P. National EQ-5D tariffs and quality-adjusted EuroQol EQ-5D-3L: the value of mixture models. Med Decis life-year estimation: comparison of UK, US and Danish utilities Mak. 2015;35(7):902–11. in south Swedish rheumatoid arthritis patients. Ann Rheum Dis. 26. Verkuilen J, Smithson M. Mixed and mixture regression models 2011;70(12):2163–6. for continuous bounded responses using the beta distribution. 23. Lien K, Tam VC, Ko YJ, Mittmann N, Cheung MC, Chan KKW. J Educ Behav Stat. 2012;37(1):82–113. 
Impact of country-speciﬁc EQ-5D-3L tariffs on the economic value of systemic therapies used in the treatment of metastatic pancreatic cancer. Curr Oncol. 2015;22(6):e443–52.
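
Supplementary illustration. The note to Table 6 gives the interval formula CI = mean ± 1.96 SEM, and Table 7 reports RMSE and MAE as goodness-of-fit measures. The following is a minimal sketch of how such quantities could be computed, not the author's code. The function names are hypothetical, and the `obs` and `pred` lists are made-up illustration values, not study data. Only the mean (0.676) and SEM (0.017) of the observed utilities are taken from Table 6.

```python
import math


def ci95_from_sem(mean, sem, z=1.96):
    """Return (lower, upper) for the interval mean ± z * SEM."""
    half_width = z * sem
    return mean - half_width, mean + half_width


def mae(observed, predicted):
    """Mean absolute error between observed and predicted utilities."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted utilities."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(observed))


# Observed column of Table 6: mean 0.676, SEM 0.017.
lower, upper = ci95_from_sem(0.676, 0.017)
print(round(lower, 3), round(upper, 3))

# Hypothetical utilities for illustration only (not data from the study).
obs = [1.0, 0.76, 0.52, -0.10]
pred = [0.90, 0.80, 0.60, 0.10]
print(round(mae(obs, pred), 3), round(rmse(obs, pred), 3))
```

With the Table 6 inputs, the rounded bounds agree with the lower and upper 95% CI values listed for the observed utilities (0.643 and 0.709). RMSE penalises large individual errors more heavily than MAE, so the two measures can rank methods differently.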
PharmacoEconomics - Open – Springer Journals
Published: Aug 7, 2017