Randomized trials and propensity score analyses in transcatheter aortic valve replacement: how should we interpret the results?

Keywords: Randomized trials, Propensity score analyses, Transcatheter aortic valve replacement, Coronary artery bypass grafting, Clinical epidemiology, Meta-analysis

INTRODUCTION

Transcatheter aortic valve replacement (TAVR) is increasingly considered an established treatment for patients with severe symptomatic aortic stenosis who are at high risk of surgical mortality or who are not suitable for surgery [1, 2]. Success in high-risk patients has inevitably raised the potential to extend the use of TAVR technology to lower-risk subjects, fuelled by both clinician innovation and commercial interests. The evidence base to support this development has included randomized trials and quasi-experimental propensity score-based analyses. In this article, we discuss the landmark available studies (summarized in Table 1), their strengths and limitations, and make some general evidence-based recommendations on their interpretation.
Table 1: Overview of major TAVR studies

PARTNER 2 [3]
Design: 2032 intermediate-risk patients with severe aortic stenosis were randomized in an open label trial to TAVR or surgical aortic valve replacement
Primary end point: Composite of death from any cause or disabling stroke at 24 months
Non-inferiority boundary: The upper boundary of the 2-sided 95% confidence interval for the hazard ratio of the primary end point at 2 years was below a hazard ratio of 1.20
Funding: Edwards Lifesciences

Thourani [4]
Design: 963 patients from the SAPIEN 3 observational study at intermediate risk with severe, symptomatic aortic stenosis and treated with TAVR were compared with 747 surgical aortic valve replacement patients from PARTNER 2 [4] with 1 year follow-up using a propensity score methodology
Primary end point: Composite of death from any cause, all strokes, and incidence of moderate or severe aortic regurgitation
Non-inferiority boundary: Considered non-inferior if the lower confidence interval on the primary outcome excludes an absolute risk difference of 7.5%
Funding: No funding declared for these analyses; PARTNER 2 and SAPIEN 3 were funded by Edwards Lifesciences

SURTAVI [5]
Design: 1746 patients at intermediate risk with severe, symptomatic aortic stenosis were randomized in an open label trial to TAVR or surgical aortic valve replacement
Primary end point: Composite of death from any cause or disabling stroke at 24 months
Non-inferiority boundary: Considered non-inferior if the lower confidence interval on the primary outcome excludes an absolute risk difference of 7%
Funding: Medtronic Inc

TAVR: transcatheter aortic valve replacement.
DEVICE DEVELOPMENT AND REGULATION

A challenge of device development and regulation is that each current version of a Class 3 medical device (be it a pacemaker or a TAVR system) is effectively a prototype for future, more advanced versions. Unlike pharmaceutical regulation, where the form of a treatment is described in the patent and specified exactly through the broader regulatory process, devices often undergo ongoing development through a series of incremental steps. Individual innovation steps may often seem superficial or simply additive (e.g. improving the durability of batteries in pacemakers). However, it is challenging to separate substantive developments from less important ones, and a series of small steps may collectively raise real questions about differences in effectiveness and safety.

WHEN TO DO A RANDOMIZED TRIAL

Randomized outcomes trials cost millions of dollars to conduct. They are normally required for regulatory purposes each time a new pharmaceutical is granted a marketing authorization, although there are exceptions, such as when a pharmaceutical is used in high-risk patients and rare conditions [6] or where a new treatment bridges from an existing formulation [7]. Similarly, Class 3 medical devices often require randomized trials to support an application for marketing approval; however, such devices may not require a new randomized trial each time a new device version is marketed.

Edwards Lifesciences sponsored the PARTNER 2 trial to evaluate the SAPIEN XT valve system compared with conventional surgery in patients with severe aortic stenosis and intermediate-risk clinical profiles [3]. SAPIEN XT differed from the previous SAPIEN in having a thinner strut, a cobalt–chromium frame, a partially closed resting geometry of the bovine pericardial leaflets, the addition of a valve size that is 29 mm in diameter, and a reduced profile delivery catheter [3].
PARTNER 2 randomized 2032 patients on a 1:1 basis between the 2 experimental conditions and found a non-significant reduction in the composite event of death from any cause or disabling stroke at 2 years (hazard ratio 0.89, 95% confidence interval (CI) 0.73–1.09, P = 0.25). The trialists’ criterion for non-inferiority was an upper CI limit for the hazard ratio smaller than 1.2. Some may feel that this non-inferiority boundary (equating to a risk difference in death or disabling stroke of less than 4.2% on an absolute scale) includes values of the difference that are clinically relevant, although, of note, it is more rigorous than those applied in the 2 other studies.

The SAPIEN 3 observational study included patients at intermediate risk with severe, symptomatic aortic stenosis (Table 1) treated with the SAPIEN 3 valve system, which is described as differing from the previous XT system in having: improved geometry of the trileaflet bovine pericardial valve; a different cobalt alloy frame, which is longer than the earlier version of the balloon-expandable valve system (SAPIEN XT valve; Edwards Lifesciences), with more open outlet cells and denser inlet cells; a polyethylene terephthalate fabric skirt sewn to the bottom portion of the interior and exterior of the frame (providing an external circumferential seal to reduce paravalvular leak); 4 valve sizes (20 mm, 23 mm, 26 mm and 29 mm diameters); and lower-profile delivery catheters with more precise valve positioning, inserted through 14 or 16 Fr expandable sheaths for transfemoral access [4].

PROPENSITY SCORES

Rather than conducting a further randomized trial, Edwards Lifesciences undertook a propensity score analysis comparing the SAPIEN 3 observational study patients with the surgical patients in PARTNER 2 [4, 8]. Instead of using the more rigorous propensity score matched approach, the authors merely stratified patients into quintiles by propensity score to address confounding between the comparator groups.
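To make the idea of quintile stratification concrete, the following is a minimal sketch in Python. It is not the method used by Thourani et al.: the propensity scores are taken as given rather than estimated, the function names are illustrative, and the data are entirely hypothetical, constructed so that patients with a better prognosis are more likely to be treated. Under those assumptions, the crude comparison favours treatment while the stratified comparison shows no difference.

```python
from statistics import mean


def quintile_strata(records):
    """Split records into 5 strata of near-equal size by ascending propensity score."""
    ordered = sorted(records, key=lambda r: r["ps"])
    n = len(ordered)
    bounds = [round(i * n / 5) for i in range(6)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(5)]


def crude_risk_difference(records):
    """Unadjusted risk difference (treated minus control)."""
    treated = [r["event"] for r in records if r["treated"]]
    control = [r["event"] for r in records if not r["treated"]]
    return mean(treated) - mean(control)


def stratified_risk_difference(records):
    """Average the within-stratum risk differences, weighted by stratum size."""
    weighted_sum = 0.0
    total_weight = 0
    for stratum in quintile_strata(records):
        treated = [r["event"] for r in stratum if r["treated"]]
        control = [r["event"] for r in stratum if not r["treated"]]
        if not treated or not control:
            continue  # no within-stratum comparison is possible
        weighted_sum += (mean(treated) - mean(control)) * len(stratum)
        total_weight += len(stratum)
    if total_weight == 0:
        raise ValueError("no stratum contains both treated and control patients")
    return weighted_sum / total_weight


def build_example():
    """HYPOTHETICAL data: higher propensity score goes with lower baseline risk."""
    # (propensity score, treated n, treated events, control n, control events)
    layout = [
        (0.2, 4, 2, 16, 8),
        (0.3, 8, 4, 12, 6),
        (0.5, 10, 3, 10, 3),
        (0.7, 12, 3, 8, 2),
        (0.8, 16, 4, 4, 1),
    ]
    records = []
    for ps, n_t, e_t, n_c, e_c in layout:
        records += [{"ps": ps, "treated": True, "event": int(i < e_t)} for i in range(n_t)]
        records += [{"ps": ps, "treated": False, "event": int(i < e_c)} for i in range(n_c)]
    return records


example = build_example()
crude = crude_risk_difference(example)
adjusted = stratified_risk_difference(example)
print(f"crude: {crude:.3f}, stratified: {adjusted:.3f}")
```

In this constructed example the stratification removes the bias only because the score captures everything that drives both treatment choice and prognosis; when an important characteristic is unobserved, the strata remain imbalanced and the adjusted estimate stays biased.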
The propensity score approach was developed by Rosenbaum and Rubin [9] to provide an efficient method for quasi-experimental comparison between treatments using non-randomized comparative data. The method requires the calculation of a propensity score for each subject (the likelihood that a patient will receive the treatment of interest), derived from a logistic regression model that includes patients’ characteristics as explanatory variables and exposure to the experimental therapy as the response (dependent) variable. The propensity score is an imperfect instrument to account for bias: it relies upon the inclusion of the appropriate observed characteristics, with no important omissions, such that exposure to treatment carries no additional risk compared to control apart from that derived from the comparison of treatment strategies. However, the additional risks (if any) associated with exposure are completely confounded with that exposure and are thus latent in the data set and cannot be elicited directly [10]. In other words, the key assumption of the propensity score, that any additional risks other than the effect of treatment are conditioned out by the propensity score, cannot be verified directly, and its violation may lead to substantial bias in the results. Bias is particularly likely when a treatment exposure (e.g. TAVR) is the result of an expert clinical judgement, as inevitably the complexity of such judgements will not, indeed often cannot, be captured by an observational dataset. This problem is discussed extensively in the context of aldosterone antagonists in heart failure in Freemantle et al. [10] and is a form of ‘confounding by indication’. There is also a useful discussion of systematic bias in propensity scores in acute coronary syndromes by Dahabreh et al. [11].

In Fig. 1, the main results of PARTNER 2 [3] and Thourani et al. [4] on all-cause mortality and disabling stroke are contrasted.
The difference between the results is striking and may strain plausibility, implying through indirect comparison that the SAPIEN 3 valve system is significantly better than the SAPIEN XT system on this outcome, with a hazard ratio of 0.66 (95% CI 0.48–0.90; P = 0.008). In other words, it seems unlikely that the propensity score stratification has accounted adequately for confounding in the comparison; instead, it appears that patients with a relatively good prognosis were recruited to the SAPIEN 3 observational study [4].

Figure 1: Randomized and propensity score-based analyses.

NON-INFERIORITY MARGINS

Because of the probabilistic nature of estimation, it is challenging to demonstrate that there is no difference between 2 treatments; all comparisons are undertaken in the context of measurement error. The concept of non-inferiority [12] is best understood in the context of CIs. A study excludes a risk that is outside the CI. In clinical areas where there is regular need to undertake non-inferiority trials, the non-inferiority boundary is a given, declared by the regulatory authority on the basis of informed clinical experience. For example, in diabetes, the non-inferiority margin in regulatory studies is a difference in glycated haemoglobin (HbA1c) of <0.3% [13], which is widely recognised to be a clinically trivial value. In cardiac surgery, there is no prespecified non-inferiority boundary, and we observe substantial variability among the individual criteria specified in trials. Thus for SURTAVI, the non-inferiority boundary was an absolute risk difference of 7% [5]; for PARTNER 2 [3], it was effectively an absolute risk difference of 4.2% (although somewhat unhelpfully specified on the ratio scale as a hazard ratio of 1.2, and thus dependent upon the rate of events in the control condition).
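The dependence of a ratio-scale margin on the control event rate can be illustrated with a short Python sketch. The function names are illustrative, and the 21% control event rate is an assumption chosen only because it reproduces the 4.2% figure quoted above under the simple approximation (risk difference roughly equal to control risk times the hazard ratio minus one); it is not stated in the text. A second conversion assuming proportional hazards is shown for comparison.

```python
def is_non_inferior(upper_ci_limit, margin):
    """Non-inferiority is declared when the upper CI limit lies below the margin."""
    return upper_ci_limit < margin


def margin_as_risk_difference(hr_margin, control_risk):
    """Simple approximation that treats the hazard ratio as a risk ratio."""
    return control_risk * (hr_margin - 1.0)


def margin_as_risk_difference_ph(hr_margin, control_risk):
    """Conversion under proportional hazards: survival_treated = survival_control ** HR."""
    treated_risk = 1.0 - (1.0 - control_risk) ** hr_margin
    return treated_risk - control_risk


# Values quoted in the text for PARTNER 2: upper 95% CI limit 1.09, margin 1.20.
partner2_non_inferior = is_non_inferior(1.09, 1.20)

# ASSUMPTION: a 21% control event rate (illustrative, not taken from the text).
ASSUMED_CONTROL_RISK = 0.21
approx_margin = margin_as_risk_difference(1.20, ASSUMED_CONTROL_RISK)
ph_margin = margin_as_risk_difference_ph(1.20, ASSUMED_CONTROL_RISK)

# The same hazard-ratio margin implies different absolute margins
# as the (hypothetical) control event rate changes.
for control_risk in (0.10, 0.21, 0.30):
    absolute = margin_as_risk_difference(1.20, control_risk)
    print(f"control risk {control_risk:.2f} -> absolute margin {absolute:.3f}")
```

The loop makes the point of the paragraph above: a margin fixed on the ratio scale is not a fixed number of additional events per hundred patients treated.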
These boundaries may be considered surprisingly wide and varied; nearly 7 more subjects in one hundred treated experiencing major morbidity or death may not be considered trivial by patients and clinicians interpreting the trials. Regulators could usefully take a stronger position on this point, given the importance of the clinical area and the relative commonness of the intervention.

RANDOMIZED TRIALS OF TRANSCATHETER AORTIC VALVE REPLACEMENT VERSUS SURGERY

Ronald Fisher, the father of biostatistics, commented in 1935 that ‘the simple act of randomization assures the internal validity of the test for significance’ [14]. However, in order to benefit from this protection, the act of randomization must be preserved in the implementation of the trial and the analysis. In comparative trials, subjects must be prepared to receive either intervention on offer in the trial, and clinicians must also be content that either may be used. This can prove challenging. In the 2 main trials of TAVR [3, 5], the baseline characteristics of the included subjects highlight that these are samples which may be considered optimal for TAVR, with, for example, substantial rates of prior coronary artery bypass grafting (CABG).

The intention-to-treat principle preserves randomization, regardless of the treatment actually received, having the consequence ‘that subjects allocated to a treatment group should be followed up, assessed and analysed as members of that group irrespective of their compliance to the planned course of treatment’ [15]. In SURTAVI [5], there was a substantial imbalance in the extent to which subjects randomized to each experimental condition (TAVR or surgery) actually received that intervention, with 1.7% of the TAVR patients not receiving the allocated intervention versus 8.2% of the surgery patients not receiving that treatment (P < 0.0001).
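As a rough check on the size of this imbalance, the sketch below applies a two-proportion z-test in Python. Two things are assumptions rather than facts from the text: the arm sizes (873 per arm, i.e. the 1746 randomized patients split equally) and the choice of test, since the text does not say how the P value was obtained. The counts are back-calculated from the quoted percentages.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two independent proportions."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return z, p_value


# ASSUMPTION: equal allocation of the 1746 randomized patients.
N_PER_ARM = 873

# Counts back-calculated from the quoted 1.7% and 8.2% (approximate).
not_treated_tavr = round(0.017 * N_PER_ARM)
not_treated_surgery = round(0.082 * N_PER_ARM)

z, p_value = two_proportion_z_test(
    not_treated_tavr, N_PER_ARM, not_treated_surgery, N_PER_ARM
)
print(f"z = {z:.2f}, two-sided P = {p_value:.2e}")
```

Under these assumptions the difference is far beyond what chance allocation would produce, which is consistent with the reported P < 0.0001.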
The authors incorrectly undertook a biased, modified intention-to-treat analysis, which counted patients as having been randomized only when the randomized treatment was attempted. This approach is reasonably used in randomized double-blind pharmaceutical trials, where knowledge of which treatment the subject will take is obscured by blinding. However, in an unblinded case such as the SURTAVI trial, the subjects’ decision to undertake treatment (and their clinicians’ propensity to give them that treatment) may be mediated by knowledge of the treatment on offer, as was clearly the case in SURTAVI [5], and this results in bias. Fortunately, the authors also provided the analysis based on conventional intention to treat.

Both PARTNER 2 [3] and SURTAVI [5] randomized intermediate-risk patients with severe aortic stenosis to undergo either TAVR or surgical replacement. However, the characteristics of the patient population across both trials may not be representative of that patient group in practice, with, for example, a relatively high rate of prior CABG (21%) [3, 5]. Thus, despite broad inclusion criteria, the patient population actually recruited to the trials appears to be selective of likely candidates for TAVR.

INTERPRETATION OF THE RESULTS OF PARTNER 2 AND SURTAVI

Figure 2A–C describes the pooled 2-year results for the SURTAVI intention-to-treat population [5] and the PARTNER 2 trial population [3], for all-cause mortality plus disabling stroke and for each component of the composite primary outcome separately, on the absolute risk difference scale.

Figure 2: (A) Death or disabling stroke, percent difference and 95% CI at 24 months. (B) All cause mortality, percent difference and 95% CI at 24 months.
(C) Disabling stroke, risk difference and 95% CI at 24 months.

COMMENT

There are several observations of note on the comparative data for TAVR and surgery. First, these comparisons represent the difference in effectiveness of TAVR and surgery in a patient population that appears to be selected as candidates for TAVR, and thus may overestimate benefits in a less selected population. Second, in the major comparative randomized trials, the 95% CIs on all 3 pooled results are quite wide. The results reasonably exclude a benefit of surgery compared to TAVR of 0.7% on disabling stroke, but only 1.5% on the composite outcome of disabling stroke or all-cause mortality, and 2.2% on all-cause mortality. Third, given that surgery is the established therapy, the rational criterion for replacing surgery with transcatheter aortic valve implantation should be superiority, or at least non-inferiority, of TAVR compared with surgery, and we might expect tougher criteria for non-inferiority than we observe in trials. Fourth, given the increasing activity in randomized trials comparing established surgical procedures with new and less invasive technologies, the regulatory agencies should take a stronger view of what effect may be considered as a non-inferiority boundary in these trials. Professional organizations, in collaboration with patient representative groups, could also usefully address this question together.

Conflict of interest: none declared.

REFERENCES

1 Vahanian A, Alfieri O, Andreotti F, Antunes MJ, Barón-Esquivias G, Baumgartner H. Guidelines on the management of valvular heart disease (version 2012): the joint task force on the management of valvular heart disease of the European Society of Cardiology (ESC) and the European Association for Cardio-Thoracic Surgery (EACTS). Eur J Cardiothorac Surg 2012; 42: S1–44.
2 Nishimura RA, Otto CM, Bonow RO, Carabello BA, Erwin JP 3rd, Guyton RA et al. 2014 AHA/ACC guideline for the management of patients with valvular heart disease: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. J Am Coll Cardiol 2014; 63: e57–185.
3 Leon MB, Smith CR, Mack MJ, Makkar RR, Svensson LG, Kodali SK et al. Transcatheter or surgical aortic-valve replacement in intermediate-risk patients. N Engl J Med 2016; 374: 1609–20.
4 Thourani VH, Kodali S, Makkar RR, Herrmann HC, Williams M, Babaliaros V et al. Transcatheter aortic valve replacement versus surgical valve replacement in intermediate-risk patients: a propensity score analysis. Lancet 2016; 387: 2218–25.
5 Reardon MJ, Van Mieghem NM, Popma JJ, Kleiman NS, Søndergaard L, Mumtaz M et al. Surgical or transcatheter aortic-valve replacement in intermediate-risk patients. N Engl J Med 2017; 376: 1321–31.
6 Hatswell AJ, Baio G, Berlin JA, Irs A, Freemantle N. Regulatory approval of pharmaceuticals without a randomised controlled study: analysis of EMA and FDA approvals 1999–2014. BMJ Open 2016; 6: e011666.
7 Sanofi Aventis. Toujeo Summary of Product Characteristics. https://www.accessdata.fda.gov/drugsatfda_docs/label/2015/206538lbl.pdf (8 March 2018, date last accessed).
8 Barili F, Freemantle N, Folliguet T, Muneretto C, De Bonis M, Czerny M et al. The flaws in the detail of an observational study on transcatheter aortic valve implantation versus surgical aortic valve replacement in intermediate-risks patients. Eur J Cardiothorac Surg 2017; 51: 1031–35.
9 Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983; 70: 41–55.
10 Freemantle N, Marston L, Walters K, Wood J, Reynolds MR, Petersen I. Making inferences on treatment effects from real world data: propensity scores, confounding by indication and other perils for the unwary in observational research. BMJ 2013; 347: f6409.
11 Dahabreh IJ, Sheldrick RC, Paulus JK, Chung M, Varvarigou V, Jafri H et al. Do observational studies using propensity score methods agree with randomized trials? A systematic comparison of studies on acute coronary syndromes. Eur Heart J 2012; 33: 1893–901.
12 Mauri L, D’Agostino RB. Challenges in the design and interpretation of noninferiority trials. N Engl J Med 2017; 377: 1357–67.
13 Wangge G, Putzeist M, Knol MJ, Klungel OH, Gispen-De Wied CC, de Boer A et al. Regulatory scientific advice on non-inferiority drug trials. PLoS One 2013; 8: e74818.
14 Fisher RA. The Design of Experiments, 9th edn. London: Macmillan, 1935.
15 International conference on harmonisation of technical requirements for registration of pharmaceuticals for human use. ICH Harmonised tripartite guideline. Statistical principles for clinical trials e9. http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E9/Step4/E9_Guideline.pdf (8 March 2018, date last accessed).

© The Author(s) 2018. Published by Oxford University Press on behalf of the European Association for Cardio-Thoracic Surgery. All rights reserved. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

European Journal of Cardio-Thoracic Surgery, Oxford University Press

Randomized trials and propensity score analyses in transcatheter aortic valve replacement: how should we interpret the results?

European Journal of Cardio-Thoracic Surgery , Volume Advance Article (6) – Mar 21, 2018

Publisher
Oxford University Press
Copyright
© The Author(s) 2018. Published by Oxford University Press on behalf of the European Association for Cardio-Thoracic Surgery. All rights reserved.
ISSN
1010-7940
eISSN
1873-734X
DOI
10.1093/ejcts/ezy120
Publisher site
See Article on Publisher Site

Abstract

Randomized trials, Propensity score analyses, Transcatheter aortic valve replacement, Coronary artery bypass grafting, Clinical epidemiology, Meta-analysis INTRODUCTION Transcatheter aortic valve replacement (TAVR) is increasingly considered an established treatment for patients with severe symptomatic aortic stenosis who are at high risk of surgical mortality or who are not suitable for surgery [1, 2]. Success in high-risk patients has inevitably raised the potential to extend the use of TAVR technology to lower risk subjects, fuelled by both clinician innovation and commercial interests. The evidence base to support this development has included randomized trials and quasi-experimental propensity score based analyses. In this article, we discuss the landmark available studies (summarized in Table 1), their strengths and limitations, and make some general evidence based recommendations on their interpretation. Table 1: Overview of major TAVR studies PARTNER 2 [3] Thourani [4] SURTAVI [5] Design 2032 intermediate-risk patients with severe aortic stenosis were randomized in an open label trial to TAVR or surgical aortic valve repair 963 patients from the SAPIEN 3 observational study at intermediate risk with severe, symptomatic aortic stenosis and treated with TAVR were compared with 747 surgical aortic valve repair patients from PARTNERS 2 [4] with 1 year follow-up using a propensity score methodology 1746 patients at intermediate risk with severe, symptomatic aortic stenosis were randomized in an open label trial to TAVR or surgical aortic valve repair Primary end point Composite of death from any cause or disabling stroke at 24 months Composite of death from any cause, all strokes, and incidence of moderate or severe aortic regurgitation Composite of death from any cause or disabling stroke at 24 months Non-inferiority boundary The upper boundary of the 2-sided 95% confidence interval for the hazard ratio of the primary end point at 2 years was below a hazard 
ratio of 1.20 Considered non-inferior if the lower confidence interval on the primary outcome excludes an absolute risk difference of 7.5% Considered non-inferior if the lower confidence interval on the primary outcome excludes an absolute risk difference of 7% Funding Edwards Life Sciences No funding declared for these analyses; PARTNERS 2 and SAPIEN 3 were funded by Edwards Life Sciences Medtronic Inc PARTNER 2 [3] Thourani [4] SURTAVI [5] Design 2032 intermediate-risk patients with severe aortic stenosis were randomized in an open label trial to TAVR or surgical aortic valve repair 963 patients from the SAPIEN 3 observational study at intermediate risk with severe, symptomatic aortic stenosis and treated with TAVR were compared with 747 surgical aortic valve repair patients from PARTNERS 2 [4] with 1 year follow-up using a propensity score methodology 1746 patients at intermediate risk with severe, symptomatic aortic stenosis were randomized in an open label trial to TAVR or surgical aortic valve repair Primary end point Composite of death from any cause or disabling stroke at 24 months Composite of death from any cause, all strokes, and incidence of moderate or severe aortic regurgitation Composite of death from any cause or disabling stroke at 24 months Non-inferiority boundary The upper boundary of the 2-sided 95% confidence interval for the hazard ratio of the primary end point at 2 years was below a hazard ratio of 1.20 Considered non-inferior if the lower confidence interval on the primary outcome excludes an absolute risk difference of 7.5% Considered non-inferior if the lower confidence interval on the primary outcome excludes an absolute risk difference of 7% Funding Edwards Life Sciences No funding declared for these analyses; PARTNERS 2 and SAPIEN 3 were funded by Edwards Life Sciences Medtronic Inc TAVR: transcatheter aortic valve replacement. 
Table 1: Overview of major TAVR studies PARTNER 2 [3] Thourani [4] SURTAVI [5] Design 2032 intermediate-risk patients with severe aortic stenosis were randomized in an open label trial to TAVR or surgical aortic valve repair 963 patients from the SAPIEN 3 observational study at intermediate risk with severe, symptomatic aortic stenosis and treated with TAVR were compared with 747 surgical aortic valve repair patients from PARTNERS 2 [4] with 1 year follow-up using a propensity score methodology 1746 patients at intermediate risk with severe, symptomatic aortic stenosis were randomized in an open label trial to TAVR or surgical aortic valve repair Primary end point Composite of death from any cause or disabling stroke at 24 months Composite of death from any cause, all strokes, and incidence of moderate or severe aortic regurgitation Composite of death from any cause or disabling stroke at 24 months Non-inferiority boundary The upper boundary of the 2-sided 95% confidence interval for the hazard ratio of the primary end point at 2 years was below a hazard ratio of 1.20 Considered non-inferior if the lower confidence interval on the primary outcome excludes an absolute risk difference of 7.5% Considered non-inferior if the lower confidence interval on the primary outcome excludes an absolute risk difference of 7% Funding Edwards Life Sciences No funding declared for these analyses; PARTNERS 2 and SAPIEN 3 were funded by Edwards Life Sciences Medtronic Inc PARTNER 2 [3] Thourani [4] SURTAVI [5] Design 2032 intermediate-risk patients with severe aortic stenosis were randomized in an open label trial to TAVR or surgical aortic valve repair 963 patients from the SAPIEN 3 observational study at intermediate risk with severe, symptomatic aortic stenosis and treated with TAVR were compared with 747 surgical aortic valve repair patients from PARTNERS 2 [4] with 1 year follow-up using a propensity score methodology 1746 patients at intermediate risk with severe, symptomatic 
aortic stenosis were randomized in an open label trial to TAVR or surgical aortic valve repair Primary end point Composite of death from any cause or disabling stroke at 24 months Composite of death from any cause, all strokes, and incidence of moderate or severe aortic regurgitation Composite of death from any cause or disabling stroke at 24 months Non-inferiority boundary The upper boundary of the 2-sided 95% confidence interval for the hazard ratio of the primary end point at 2 years was below a hazard ratio of 1.20 Considered non-inferior if the lower confidence interval on the primary outcome excludes an absolute risk difference of 7.5% Considered non-inferior if the lower confidence interval on the primary outcome excludes an absolute risk difference of 7% Funding Edwards Life Sciences No funding declared for these analyses; PARTNERS 2 and SAPIEN 3 were funded by Edwards Life Sciences Medtronic Inc TAVR: transcatheter aortic valve replacement. DEVICE DEVELOPMENT AND REGULATION A challenge of device development and regulation is that each current version of a Class 3 medical device (be it a pacemaker or a TAVR system) is effectively a prototype for future more advanced versions. Unlike the field of pharmaceutical regulation, where the form of a treatment is described in the patent and specified exactly through the broader regulatory process, devices often experience an ongoing development based on a series of incremental steps. Often individual innovation steps may seem superficial or simply additive (e.g. improving the durability of batteries in pacemakers), although a challenge exists in separating substantive development from less important, and a series of small steps may collectively raise real questions about differences in effectiveness and safety. WHEN TO DO A RANDOMIZED TRIAL Randomized outcomes trials cost millions of dollars to conduct. 
They are normally required for regulatory purposes each time a new pharmaceutical is granted a marketing authorization, although there are exceptions such as when a pharmaceutical is used in high-risk patients and rare conditions [6] or where a new treatment bridges from an existing formulation [7]. Similarly, Class 3 medical devices often require randomized trials to support an application for marketing approval; however, such devices may not require a new randomized trial each time a new device version is marketed.

Edwards Life Sciences sponsored the PARTNERS 2 trial to evaluate the SAPIEN XT valve system compared with conventional surgery in patients with severe aortic stenosis and intermediate-risk clinical profiles [3]. SAPIEN XT differed from the previous SAPIEN in having a thinner strut, a cobalt–chromium frame, a partially closed resting geometry of the bovine pericardial leaflets, the addition of a valve size that is 29 mm in diameter, and a reduced profile delivery catheter [3]. PARTNERS 2 randomized 2032 patients on a 1:1 basis between the 2 experimental conditions and found a non-significant reduction in the composite event of death from any cause or disabling stroke at 2 years (hazard ratio 0.89, 95% confidence interval (CI) 0.73–1.09, P = 0.25). The trialists’ criterion for non-inferiority was an upper CI limit for the hazard ratio below 1.2. Some may feel that this non-inferiority boundary (equating to a risk difference in death or disabling stroke of less than 4.2% on an absolute scale) includes values of the difference that are clinically relevant, although, of note, it is more rigorous than those applied in the 2 other studies.
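The non-inferiority reading of the PARTNERS 2 result can be sketched in a few lines of Python. This is a minimal illustration, not the trialists' analysis: it assumes a normal approximation on the log hazard ratio scale and recovers the standard error from the reported 95% CI. The function names are hypothetical; only the hazard ratio, its CI and the margin of 1.20 are taken from the text.

```python
import math

def ci_to_se(lower, upper, z=1.96):
    """Standard error of log(HR), recovered from a 95% CI given on the HR scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def two_sided_p(hr, lower, upper):
    """Two-sided Wald P value for the null HR = 1 (normal approximation, an assumption)."""
    z = math.log(hr) / ci_to_se(lower, upper)
    return math.erfc(abs(z) / math.sqrt(2))

def non_inferior(upper, margin):
    """Non-inferiority is declared when the upper CI limit lies below the margin."""
    return upper < margin

# Values reported for the PARTNERS 2 primary end point in the text above.
hr, lo, hi = 0.89, 0.73, 1.09
margin = 1.20  # prespecified hazard ratio margin

p_value = two_sided_p(hr, lo, hi)
is_non_inferior = non_inferior(hi, margin)
is_superior = hi < 1.0  # superiority would need the whole CI below 1

print(f"P = {p_value:.2f}")
print(f"non-inferior: {is_non_inferior}, superior: {is_superior}")
```

Under these assumptions the recovered P value should be close to the reported 0.25, and the result is non-inferior (upper limit 1.09 below 1.20) without being superior (the CI includes 1).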
The SAPIEN 3 observational study included intermediate-risk patients with severe, symptomatic aortic stenosis treated with the SAPIEN 3 valve system. This system is described as differing from the previous XT system in the following respects: improved geometry of the trileaflet bovine pericardial valve; a different cobalt alloy frame, which is longer than the early version of the balloon-expandable valve system (SAPIEN XT valve; Edwards Lifesciences) with more open outlet cells and denser inlet cells; a polyethylene terephthalate fabric skirt sewn to the bottom portion of the interior and exterior of the frame (providing an external circumferential seal to reduce paravalvular leak); 4 valve sizes (20 mm, 23 mm, 26 mm and 29 mm diameters); and lower-profile delivery catheters with more precise valve positioning inserted through 14 or 16 Fr expandable sheaths for transfemoral access [4].

PROPENSITY SCORES

Rather than conducting a further randomized trial, Edwards Lifesciences undertook a propensity score analysis comparing the SAPIEN 3 observational study patients with the surgical patients in PARTNERS 2 [4, 8]. Unlike the more rigorous propensity score matched approach, the authors merely stratified patients into quintiles by propensity score to address confounders between the comparator groups. The propensity score approach was developed by Rosenbaum and Rubin [9] in order to provide an efficient method for quasi-experimental comparison between treatments using non-randomized comparative data. The method requires the calculation of a propensity score for each subject (the likelihood that a patient will receive the treatment of interest), derived from a logistic regression model that includes patients’ characteristics as explanatory variables and exposure to the experimental therapy as the response or dependent variable.
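The two steps just described, estimating a propensity score by logistic regression and then stratifying by quintile, can be sketched as follows. This is an illustrative sketch on synthetic data, not a reconstruction of the analysis in [4]: the two covariates, the coefficients used to simulate treatment assignment, and all function names are assumptions. The logistic model is fitted with plain gradient ascent to keep the example free of external libraries.

```python
import math
import random

def predict(beta, x):
    """Logistic model: probability of receiving treatment given covariates x."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, treated, lr=0.5, epochs=500):
    """Fit the model by batch gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)  # intercept followed by k coefficients
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, t in zip(X, treated):
            err = t - predict(beta, x)
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * x[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def quintile_strata(scores):
    """Rank subjects by score and cut the ranking into 5 equal strata (0..4)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * 5 // len(scores)
    return strata

# Synthetic cohort with two standardized covariates (hypothetical "age" and
# "risk score"); treatment assignment depends on both, as it would in practice.
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(500)]
treated = [1 if random.random() < predict([0.0, 0.8, -0.5], x) else 0 for x in X]

beta = fit_logistic(X, treated)
scores = [predict(beta, x) for x in X]  # one propensity score per subject
strata = quintile_strata(scores)

for s in range(5):
    members = [i for i in range(len(X)) if strata[i] == s]
    n_treated = sum(treated[i] for i in members)
    print(f"stratum {s}: n={len(members)}, treated={n_treated}")
```

Treatment effects would then be compared within each stratum and combined. Note that the score can only balance the covariates that enter the model; anything that influenced treatment choice but was not recorded remains unadjusted.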
The propensity score is an imperfect instrument to account for bias. It relies upon the inclusion of the appropriate observed characteristics, with no important omissions, such that exposure to treatment carries no additional risk compared to control apart from that derived from the comparison of treatment strategies. However, the additional risks (if any) associated with exposure are completely confounded with that exposure and are thus latent in the data set and cannot be elicited directly [10]. In other words, the key assumption of the propensity score, that any additional risks other than the effect of treatment are conditioned out by the score, cannot be verified directly, and its failure may lead to substantial bias in the results. Bias is particularly likely when a treatment exposure (e.g. TAVR) is the result of an expert clinical judgement, as inevitably the complexity of such judgements will not, indeed often cannot, be captured by an observational dataset. This problem is discussed extensively in the context of aldosterone antagonists in heart failure in Freemantle et al. [10] and is a form of ‘confounding by indication’. There is also a useful discussion of systematic bias in propensity scores in acute coronary syndromes by Dahabreh et al. [11].

In Fig. 1, the main results of PARTNERS 2 [3] and Thourani et al. [4] on all-cause mortality and disabling stroke are contrasted. The difference between the results is striking and may strain plausibility, implying through indirect comparison that the SAPIEN 3 valve system is significantly better than the SAPIEN XT system on this outcome with a hazard ratio of 0.66 (95% CI 0.48–0.90; P = 0.008). In other words, it seems unlikely that the propensity score stratification has accounted adequately for confounding in the comparison; instead, patients with a relatively good prognosis appear to have been recruited to the SAPIEN 3 observational study [4].
Figure 1: Randomized and propensity score based analyses.

NON-INFERIORITY MARGINS

Because of the probabilistic nature of estimation, it is challenging to demonstrate that there is no difference between the 2 treatments; all comparisons are undertaken in the context of measurement error. The concept of non-inferiority [12] is best understood in the context of CIs. A study excludes a risk that is outside the CI. In clinical areas where there is regular need to undertake non-inferiority trials, the non-inferiority boundary is a given, declared by the regulatory authority on the basis of informed clinical experience. For example, in diabetes, the non-inferiority margin in regulatory studies is a difference in glycated haemoglobin (HbA1c) of <0.3% [13], which is widely recognised to be a clinically trivial value.

In cardiac surgery, there is no prespecified non-inferiority boundary, and we observe substantial variability among the individual criteria specified in trials. Thus for SURTAVI, the non-inferiority boundary was an absolute risk difference of 7% [5]; for PARTNERS 2 [3], it was effectively an absolute risk difference of 4.2% (although somewhat unhelpfully specified on the ratio scale as a hazard ratio of 1.2, so that it depended upon the rate of events in the control condition). These boundaries may be considered surprisingly wide and varied; nearly 7 more subjects in every 100 treated experiencing major morbidity or death may not be considered trivial by patients and clinicians interpreting the trials. Regulators could usefully take a stronger position on this point, given the importance of the clinical area and the relative commonness of the intervention.
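The dependence of a ratio-scale margin on the control event rate can be made concrete with a short sketch. The function below is hypothetical and offers two conversions: simple scaling of the control risk by the margin, and a conversion that assumes proportional hazards over the whole follow-up. The text does not state the control-arm event rate; the 21% used here is an assumption, chosen because it reproduces the 4.2% figure under simple scaling.

```python
def margin_as_risk_difference(control_risk, hr_margin, proportional_hazards=False):
    """Absolute risk difference implied by a ratio-scale non-inferiority margin.

    control_risk: cumulative event risk in the control arm (assumed input).
    hr_margin: margin on the ratio scale, e.g. 1.20.
    """
    if proportional_hazards:
        # Survival in the treated arm is control survival raised to the HR.
        treated_risk = 1.0 - (1.0 - control_risk) ** hr_margin
    else:
        # Simple scaling, treating the hazard ratio as a risk ratio.
        treated_risk = control_risk * hr_margin
    return treated_risk - control_risk

# The same margin of 1.20 tolerates very different absolute excesses
# depending on how common the event is in the control arm.
for control_risk in (0.10, 0.15, 0.21, 0.30):
    simple = margin_as_risk_difference(control_risk, 1.20)
    ph = margin_as_risk_difference(control_risk, 1.20, proportional_hazards=True)
    print(f"control risk {control_risk:.0%}: "
          f"simple scaling {simple:.1%}, proportional hazards {ph:.1%}")
```

With a control risk of 21%, simple scaling gives an absolute margin of 4.2%, while the proportional hazards conversion gives a somewhat smaller value. Either way, the tolerated excess grows with the control event rate, which is why a margin fixed on the absolute scale is easier to interpret.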
RANDOMIZED TRIALS OF TRANSCATHETER AORTIC VALVE REPLACEMENT VERSUS SURGERY

Ronald Fisher, the father of biostatistics, commented in 1935 that ‘the simple act of randomization assures the internal validity of the test for significance’ [14]. However, in order to benefit from this protection, the act of randomization must be preserved in the implementation of the trial and the analysis. In comparative trials, subjects must be prepared to receive either intervention on offer in the trial, and clinicians must also be content that either may be used. This can prove challenging. In the 2 main trials of TAVR [3, 5], the baseline characteristics of the included subjects indicate that these are samples which may be considered optimal for TAVR, with for example substantial rates of prior coronary artery bypass grafting (CABG).

The intention-to-treat principle preserves randomization, regardless of the treatment actually received, having the consequence ‘that subjects allocated to a treatment group should be followed up, assessed and analysed as members of that group irrespective of their compliance to the planned course of treatment’ [15]. In SURTAVI [5] there was a substantial imbalance in the extent to which subjects randomized to each experimental condition (TAVR or surgery) actually received that intervention, with 1.7% of the TAVR patients not receiving the allocated intervention versus 8.2% of the surgery patients not receiving that treatment (P < 0.0001). The authors incorrectly undertook a biased, modified intention-to-treat analysis, which counted patients as having been randomized when randomized treatment was attempted. This approach is reasonably used in randomized double blind pharmaceutical trials where the knowledge of which treatment the subject will take is obfuscated by blinding.
However, in an unblinded case, such as the SURTAVI trial, the subjects’ decision to undertake treatment (and their clinicians’ propensity to give them that treatment) may be mediated by the knowledge of the treatment on offer, as was clearly the case in SURTAVI [5], and this results in bias. Fortunately, the authors also provided the analysis based on conventional intention to treat.

Both PARTNERS 2 [3] and SURTAVI [5] randomized intermediate-risk patients with severe aortic stenosis to undergo either TAVR or surgical replacement. However, the characteristics of the patient population across both trials may not be representative of that patient group in practice, with for example a relatively high rate of prior CABG (21%) [3, 5]. Thus, despite broad inclusion criteria, the patient population actually recruited to the trials appears to be selective of likely candidates for TAVR.

INTERPRETATION OF THE RESULTS OF PARTNERS 2 AND SURTAVI

Figure 2A–C describe the pooled 2 year results for the SURTAVI intention-to-treat population [5] and the PARTNERS 2 trial population [3], for all-cause mortality plus disabling stroke and for each component of the composite primary outcome separately, on the absolute risk difference scale.

Figure 2: (A) Death or disabling stroke, percent difference and 95% CI at 24 months. (B) All cause mortality, percent difference and 95% CI at 24 months. (C) Disabling stroke, risk difference and 95% CI at 24 months.

COMMENT

There are several observations of note on the comparative data for TAVR and surgery.
First, these comparisons represent the difference in effectiveness of TAVR and surgery in a patient population that appears to be selected as candidates for TAVR and thus may overestimate benefits in a less selected population. Second, in the major comparative randomized trials, the 95% CIs on all 3 pooled results are quite wide. The results reasonably exclude a benefit of surgery compared to TAVR of 0.7% on disabling stroke, but only 1.5% on the composite outcome of disabling stroke or all-cause mortality, and 2.2% on all-cause mortality. Third, given that surgery is the established therapy, the rational criterion for replacing surgery with TAVR should be superiority, or at least non-inferiority, of TAVR over surgery, and we might expect tougher criteria for non-inferiority than we observe in trials. Fourth, given the increasing activity in randomized trials comparing established surgical procedures with new and less invasive technologies, the regulatory agencies should take a stronger view of what effect may be considered as a non-inferiority boundary in these trials. Professional organizations in collaboration with patient representative groups could also usefully address this question together.

Conflict of interest: none declared.

REFERENCES

1 Vahanian A, Alfieri O, Andreotti F, Antunes MJ, Barón-Esquivias G, Baumgartner H. Guidelines on the management of valvular heart disease (version 2012): the joint task force on the management of valvular heart disease of the European Society of Cardiology (ESC) and the European Association for Cardio-Thoracic Surgery (EACTS). Eur J Cardiothorac Surg 2012;42:S1–44.
2 Nishimura RA, Otto CM, Bonow RO, Carabello BA, Erwin JP 3rd, Guyton RA et al.
2014 AHA/ACC guideline for the management of patients with valvular heart disease: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. J Am Coll Cardiol 2014;63:e57–185.
3 Leon MB, Smith CR, Mack MJ, Makkar RR, Svensson LG, Kodali SK et al. Transcatheter or surgical aortic-valve replacement in intermediate-risk patients. N Engl J Med 2016;374:1609–20.
4 Thourani VH, Kodali S, Makkar RR, Herrmann HC, Williams M, Babaliaros V et al. Transcatheter aortic valve replacement versus surgical valve replacement in intermediate-risk patients: a propensity score analysis. Lancet 2016;387:2218–25.
5 Reardon MJ, Van Mieghem NM, Popma JJ, Kleiman NS, Søndergaard L, Mumtaz M et al. Surgical or transcatheter aortic-valve replacement in intermediate-risk patients. N Engl J Med 2017;376:1321–31.
6 Hatswell AJ, Baio G, Berlin JA, Irs A, Freemantle N. Regulatory approval of pharmaceuticals without a randomised controlled study: analysis of EMA and FDA approvals 1999–2014. BMJ Open 2016;6:e011666.
7 Sanofi Aventis. Toujeo Summary of Product Characteristics. https://www.accessdata.fda.gov/drugsatfda_docs/label/2015/206538lbl.pdf (8 March 2018, date last accessed).
8 Barili F, Freemantle N, Folliguet T, Muneretto C, De Bonis M, Czerny M et al. The flaws in the detail of an observational study on transcatheter aortic valve implantation versus surgical aortic valve replacement in intermediate-risks patients. Eur J Cardiothorac Surg 2017;51:1031–35.
9 Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41–55.
10 Freemantle N, Marston L, Walters K, Wood J, Reynolds MR, Petersen I. Making inferences on treatment effects from real world data: propensity scores, confounding by indication and other perils for the unwary in observational research. BMJ 2013;347:f6409.
11 Dahabreh IJ, Sheldrick RC, Paulus JK, Chung M, Varvarigou V, Jafri H et al. Do observational studies using propensity score methods agree with randomized trials? A systematic comparison of studies on acute coronary syndromes. Eur Heart J 2012;33:1893–901.
12 Mauri L, D’Agostino RB. Challenges in the design and interpretation of noninferiority trials. N Engl J Med 2017;377:1357–67.
13 Wangge G, Putzeist M, Knol MJ, Klungel OH, Gispen-De Wied CC, de Boer A et al. Regulatory scientific advice on non-inferiority drug trials. PLoS One 2013;8:e74818.
14 Fisher RA. The Design of Experiments, 9th edn. London: Macmillan, 1935.
15 International conference on harmonisation of technical requirements for registration of pharmaceuticals for human use. ICH Harmonised tripartite guideline. Statistical principles for clinical trials E9. http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E9/Step4/E9_Guideline.pdf (8 March 2018, date last accessed).

© The Author(s) 2018. Published by Oxford University Press on behalf of the European Association for Cardio-Thoracic Surgery. All rights reserved. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Journal

European Journal of Cardio-Thoracic Surgery, Oxford University Press

Published: Mar 21, 2018
