Yes, Size Does Matter!

We will use the paper by Hunstad et al1 to address an oversight commonly made by investigators and frequently overlooked by reviewers. The issue we would like to address pertains to the sample size of a trial. The authors labeled their study a multicenter randomized study. Although there may be other issues with the design of this randomized controlled trial (RCT), we will not address them in this Hub; the optimal methodology of RCT design is provided in the references for interested readers.2,3 So let's return to the sample size calculation, which is the main objective of this Hub.

Why Do We Need to Calculate a Sample Size for Our Study?

The authors performed their study to compare a new type of liposuction (tissue liquefaction liposuction, TLL) to the traditional standard (suction-assisted liposuction, SAL). In their study design, the authors compared treatment with TLL on one side of each study subject to SAL on the other side. By the way, we like this type of side-to-side comparison, because each study patient serves as their own control, which, of course, is a pretty good match! In this study, the authors treated 31 female patients. Is 31 the right number? How do you know? More importantly, why is a chosen number the right − or wrong − number? When we perform a study such as the one performed by Hunstad and his coauthors,1 we cannot examine the whole population to see if TLL is superior to SAL. (Even if we could study the whole population, there would be no more patients left to treat!) To determine whether TLL (the novel technology) is better than SAL (the standard technology), we select a small sample from the target population (all those eligible for liposuction) and perform our clinical experiment. The number of patients selected is − yes, hold on to your seats − the sample size. If we find some positive results, we declare that this novel technology is beneficial in our sample and, by inference, to the larger population.
This interrelationship between the population and the sample is shown in Figure 1.

Figure 1. Making a statistical inference. The interrelationship between a population and the study sample is shown here. Sampling is the process of selecting a group from the population that is representative of the population. Inclusion and exclusion criteria are key in this process. For the given characteristic or outcome being assessed, there is a mean for the population (true mean, or μ) and a mean for the sample (sample mean, or X̄). Likewise, there is a standard deviation for the population (σ) and for the sample (SD).

How do we determine whether the inference we are making is true? The process of drawing conclusions valid for the population from the data of the sample group is called statistical inference. Choosing the correct number of patients to include in a study has everything to do with statistical inference. There are two approaches to statistical inference: (A) hypothesis testing; and (B) estimation, both point estimation and interval estimation (confidence intervals). Let's look at hypothesis testing first.
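The sampling relationship in Figure 1 can be sketched in a few lines of Python. This is a hypothetical illustration, not data from the study: the population mean of 12.0 and SD of 3.0 are invented values, chosen only so that a simulated population has a fixed μ and σ while a sample of 31 (the trial's size) yields the estimates X̄ and SD.

```python
# Hypothetical sketch of Figure 1: a population with a true mean (mu) and SD
# (sigma), and a sample whose statistics merely estimate them. Values invented.
import random
import statistics

random.seed(42)

# Simulated population of outcome scores for everyone eligible for liposuction.
population = [random.gauss(12.0, 3.0) for _ in range(100_000)]
true_mean = statistics.mean(population)    # population mean, mu
true_sd = statistics.pstdev(population)    # population SD, sigma

# A study can only examine a sample; n = 31 matches the trial discussed here.
sample = random.sample(population, k=31)
sample_mean = statistics.mean(sample)      # X-bar, the estimate of mu
sample_sd = statistics.stdev(sample)       # SD, the estimate of sigma

print(round(true_mean, 2), round(sample_mean, 2))
print(round(true_sd, 2), round(sample_sd, 2))
```

Rerunning with a different seed moves the sample estimates around the fixed population values; that gap between sample and population is exactly what statistical inference has to bridge.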
Hypothesis Testing

The question the authors want to answer is "In patients undergoing liposuction, does TLL result in a lower total score for bruising, swelling, tenderness, and incision appearance in comparison to SAL at 30 days post-surgery?" We want to know that the answer to this question is real. Hypothesis testing confirms or refutes the hypothesis that the study results did not occur by chance but rather occurred due to an effect or an association between the intervention and the primary outcome. A study is usually designed around a predetermined null hypothesis (H0) and alternate hypothesis (H1). Let's see how this works in the present study, focusing on bruising, swelling, tenderness, and incision appearance. The null hypothesis would state that "There is no difference between bruising with TLL and bruising with SAL." The alternative hypothesis would be the opposite, or "There is a difference between bruising with TLL and bruising with SAL." We can express this mathematically, where the hypotheses would look like the following:

The hypothesis | Mathematical expression | Plain 'ole English
Null (H0) | H0: δ = μTLL − μSAL = 0 | There is no difference (delta, or δ) in the averages (mu, or μ) between the treatments
Alternate (H1) | H1: δ = μTLL − μSAL ≠ 0 | There is a difference in the averages of the treatments

The basic format of the hypotheses above represents a two-sided equivalence study. It is two-sided because the alternative hypothesis states that the mean difference can be either higher or lower than the anticipated difference between interventions (ie, bruising could be greater with TLL, or it could be greater with SAL). It is an equivalence study because the null hypothesis states that there is no difference between TLL and SAL, ie, that the treatments are equivalent to one another. It is important to consider these hypotheses before designing the study, as they determine the type of study (superiority, equivalence, or noninferiority),4 as well as whether the study is going to be one- or two-sided. These pieces of information determine which sample size formula is best for that particular study.

OK, so we know how to write our null hypothesis and alternative hypothesis now, but we need to test them. In hypothesis testing, we talk about type I and type II errors. A type I error (denoted by the Greek letter α) is the probability of rejecting the null hypothesis when it is true (rejection error). Another way of expressing this is stating there is an association between TLL and total bruising score when in fact there is none. A type II error (denoted by the Greek letter β) is the probability of accepting the null hypothesis when it is false (acceptance error). This is akin to stating there is no association between TLL and total bruising score when in fact there is one. By convention, the probabilities of α and β are fixed a priori so that we know how much each of these may play a role in our hypothesis testing. The commonly accepted value for α is 0.05 (or 5%) . . . hey, wait a minute . . . 0.05 sounds kind of familiar.
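The meaning of α as a long-run error rate can be checked by simulation. The sketch below is a hypothetical illustration (pure standard library; the normal cutoff 1.96 is used in place of the slightly larger t cutoff for 31 pairs, so the observed rate runs marginally above 5%): we repeatedly run a "trial" in which H0 is true by construction and count how often a two-sided test at α = 0.05 nevertheless declares an effect.

```python
# Simulating the type I error rate: H0 is true in every trial below, so any
# "significant" result is a false positive. Purely illustrative numbers.
import random
import statistics

random.seed(0)

ALPHA_CUTOFF = 1.96      # two-sided normal cutoff for alpha = 0.05
N_PAIRS, TRIALS = 31, 2000
false_positives = 0

for _ in range(TRIALS):
    # Paired differences drawn from a zero-mean distribution: no real effect.
    diffs = [random.gauss(0.0, 1.0) for _ in range(N_PAIRS)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    z = mean / (sd / N_PAIRS ** 0.5)     # test statistic for the paired design
    if abs(z) > ALPHA_CUTOFF:
        false_positives += 1

rate = false_positives / TRIALS
print(rate)   # hovers around the predefined alpha of 0.05
```

Swapping the zero-mean distribution for one with a real difference would instead count the misses, ie, the type II error rate β.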
The commonly accepted value for β is 0.2 (or 20%). The power of the study is represented by 1−β, which is equal to 0.8 (or 80%). Once the statistical analyses are completed, the predefined significance level (α) is used to determine whether or not the null hypothesis should be rejected and the alternative hypothesis accepted. In general, this is done by comparing the P-value given by the statistical test to the predefined α. If the P-value is ≤ α, the null is rejected, and it can be concluded that there is likely an effect. Aha! So that's why we say an outcome with a P-value of less than or equal to 0.05 is significant: it means that the outcome is likely true and reflects the outcome in the general population. How likely is it to be true? 95%, because α was set at 0.05 (5%), so the chance of a type I error is 5%. Alternatively, if the P-value is > α, we do not reject the null hypothesis, and it can be concluded that there is likely no effect. (Similarly, it is also possible that a type II error exists, with a 20% probability of being wrong, because β is 0.20.) (Table 1).

Table 1. Hypothesis Testing

Result of study | The truth: there is no effect (H0: δ = μTLL − μSAL = 0) | The truth: there is an effect (H1: δ = μTLL − μSAL ≠ 0)
Study shows no effect | 1−α (correct conclusion) | β = type II error (say it isn't true, but it is)
Study shows effect | α = type I error (say it's true, but it isn't) | 1−β (correct conclusion; the power)

Sample Size Calculation

As mentioned above, the sample size calculation requires predetermined information such as the type of study (superiority, equivalence, or noninferiority), which determines whether the study will be one- or two-sided. In addition to this information, the primary outcome is another piece of information required to calculate the sample size. In other words, you design the study to have the right number of patients to answer the most important question (the primary outcome). In the present article by Hunstad et al,1 we do not know for certain what this primary outcome is. In the Methods section of the paper, we are told it is the "efficiency" of the device for surgeons (ie, strokes per unit of time). In the Statistical Methods subsection, however, we are told that the primary endpoint was the total score of bruising, swelling, tenderness, and incision appearance. If you don't define the primary outcome, you can't set up the study correctly. Authors therefore need to be very explicit in defining the primary outcome. The next important measure we need is the minimal clinically important difference (MCID), or Δ, between TLL and SAL. We also call this the effect size. If the primary outcome was the total score (bruising, etc.), the authors should have told us what difference (Δ) in score between TLL and SAL would be considered clinically important.
In other words, we need to know what difference in total score our colleagues would consider important enough to be willing to adopt TLL (the novel technology) and abandon SAL (the standard technology). Some of our colleagues may be willing to switch to the new technology for only a 2% difference in bruising score, some for 5%, and still others may require a 50% difference. You will shortly see how this variation in preference causes problems for our sample size. The sample size formula, which includes the values for Z (the number of standard deviations from the mean of a standard normal distribution; for a two-sided α of 0.05, Zα = 1.96), α (the predefined significance level), power (1−β), standard deviation (σ), and mean difference (Δ), looks something like this5:

n = 2(Zα + Z1−β)²σ² / Δ²

As you can see from the formula, it is easy for investigators to manipulate the sample size by inserting a "convenient Δ or MCID" into the formula. They do this when they are unable to recruit enough patients for their study. As the formula shows, if Δ is small, n (the sample size) increases: you need more patients for your study. If, on the other hand, Δ is made larger, then you do not need as many patients. So . . . readers, beware of this sleight of hand! Investigators should always support the choice of Δ by referencing their sources, such as the literature or the consensus of experts (or, better still, the consensus of the patients who will supposedly benefit from the new technology!). Again, the exact equation necessary for a given study is determined by all the factors listed above. This is an equation that could have been used by Hunstad et al1 to calculate a proper sample size. It should be noted that the above formula changes if proportions are used.
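The formula above is straightforward to compute. This is a hedged sketch using Python's statistics.NormalDist: it assumes a two-sided test (so the first term is the Z for α/2, ie, 1.96 when α = 0.05), and the σ and Δ values in the example calls are hypothetical, chosen only to show how shrinking Δ inflates n.

```python
# Two-group sample size from n = 2 * (Z_{alpha/2} + Z_{1-beta})^2 * sigma^2 / delta^2.
# The delta/sigma values used below are illustrative, not from the study.
import math
from statistics import NormalDist

def sample_size(delta: float, sigma: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 when alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 when power = 0.80
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

# A smaller delta (MCID) demands a larger n -- the "sleight of hand" to watch for:
print(sample_size(delta=0.5, sigma=1.0))  # 63 per group
print(sample_size(delta=1.0, sigma=1.0))  # 16 per group
```

Halving Δ quadruples n, which is why a conveniently generous MCID can make an under-recruited study look adequately sized.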
We encourage readers to be familiar with the calculation of sample size and the power of their study, and to be able to appraise an article's validity on this issue.6 The authors reported a combined mean total score reduction in bruising, swelling, treatment site tenderness, and incision appearance of 2.09, which is a 12.5% difference in mean total score (TLL, 11.4; SAL, 13.1). Since they don't say, we don't know what Hunstad et al1 consider an MCID in their study. Because this was not reported, it is difficult to determine what that 2.09 difference in mean total score actually means. When investigators use a validated scale (instrument, patient-reported outcome) in a clinical experiment, they also report the MCID for that particular scale. This MCID is entered into the sample size calculation formula. If recruitment achieved the predetermined sample size, there were no dropouts, and the P-value was < 0.05, then the conclusions would likely be valid.

You may be wondering, "what's the big deal?" What if you, as the surgeon, are most concerned with bruising, and a significant difference in a bruising score might compel you to begin to use the alternative treatment (in this case, the TLL device)? How much of a difference would you want to see? If we plug the typical values for α (0.05) and β (0.20) into the equation and work backwards from the authors' sample size of 31, we see that with 80% power they would be able to detect an effect size (Δ) of about a 53% reduction in bruising (see the 53% row in Table 2). In other words, the authors would not have been able to detect an effect size as small as the 12.5% (or 16%, which is what they said they observed) difference between treatments using their sample size of 31. The sample size of 31 actually yields a power of only approximately 14%. If other amounts of bruising reduction are of interest, you would need a different sample size to show them, as in the table below.

Table 2. The Relationship Between the MCID, Power, and Sample Size Based on Total Mean Score of Bruising, Swelling, Tenderness, and Incision Appearance

MCID (Δ) | β | α | Sample size (n)*
10% (0.10) | 0.20 | 0.05 | 787
20% (0.20) | 0.20 | 0.05 | 199
30% (0.30) | 0.20 | 0.05 | 90
40% (0.40) | 0.20 | 0.05 | 52
50% (0.50) | 0.20 | 0.05 | 34
53% (0.53) | 0.20 | 0.05 | 30

*As the article being evaluated used paired data, these sample size estimates refer to the total number of PAIRS required.

Not calculating the sample size based on the primary outcome and a "legitimate" MCID limits the value of the conclusions being made, opening the door to both type I and type II errors. As the study found an effect, a type I error (finding an effect when there is none) could have occurred in the Hunstad et al study. So now you can see the importance of the sample size and the related MCID in determining the impact of a study. A study outcome can be statistically significant but clinically irrelevant: here the authors reported a significant 16% (actually 12.5%) difference between TLL and SAL but did not have a sample size big enough to reliably detect a difference of that magnitude.
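The kind of figures shown in Table 2, and the roughly 14% power quoted above, can be approximated with paired-design versions of the same formulas. This is a hedged sketch, not the authors' calculation: it uses the normal approximation, so the pair counts it prints run slightly below the table's values (which appear to include a t-distribution correction), and the 0.16 standardized effect plugged into the power check is the article's reported 16% figure.

```python
# Paired-design sample size and power under the normal approximation.
import math
from statistics import NormalDist

Z = NormalDist()
z_alpha = Z.inv_cdf(0.975)   # two-sided alpha = 0.05 -> 1.96
z_beta = Z.inv_cdf(0.80)     # power of 80% -> 0.84

def pairs_needed(delta_sd: float) -> int:
    # Pairs needed to detect a standardized difference delta_sd (in SD units).
    return math.ceil((z_alpha + z_beta) ** 2 / delta_sd ** 2)

def power_at(n_pairs: int, delta_sd: float) -> float:
    # Approximate two-sided power of a paired test with n_pairs pairs.
    shift = delta_sd * math.sqrt(n_pairs)
    return Z.cdf(shift - z_alpha) + Z.cdf(-shift - z_alpha)

for mcid in (0.10, 0.20, 0.30, 0.40, 0.50, 0.53):
    print(f"MCID {mcid:.2f}: {pairs_needed(mcid)} pairs")

# 31 pairs against the reported ~16% effect leaves power at roughly 14%:
print(round(power_at(31, 0.16), 2))
```

Working either direction, from MCID to n or from n to power, exposes the same problem: 31 pairs can only reliably detect an effect several times larger than the one reported.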
Evaluating the sample size, the MCID, and whether the appropriate statistical analysis methods were used is how we can be sure that a study is both valid and relevant. If you are performing a study, once you have an idea of the required sample size, it is imperative to adjust for unavoidable factors in a clinical study such as low recruitment rates, loss to follow-up, and dropouts. This can be done by taking a common rate from other clinical studies and inflating the calculated sample size by that amount (ie, 20% or 30%).

CONCLUSION

The calculation of the sample size requires careful thought, and it should be done before the study begins. To determine your required sample size, the MCID needs to be determined beforehand by reviewing the literature. If you cannot find it there, you can obtain a consensus from experts (in this case, liposuction experts, or better still, a consensus of liposuction patients who will be benefitting from TLL). Most importantly, the sample size calculation needs guidance from a biostatistician before the project starts.

Disclosures

The authors have no conflicts of interest to disclose related to the content of this article.

Funding

The authors received no financial support for the research, authorship, and publication of this article.

REFERENCES

1. Hunstad JP, Godek CP, Van Natta BW, et al. A multicenter, prospective, randomized, contralateral study of tissue liquefaction liposuction versus suction-assisted liposuction. Aesthet Surg J. 2018;38(9):980-989.
2. Thoma A, Farrokhyar F, Bhandari M, Tandan V; Evidence-Based Surgery Working Group. Users' guide to the surgical literature. How to assess a randomized controlled trial in surgery. Can J Surg. 2004;47(3):200-208.
3. Farrokhyar F, Karanicolas PJ, Thoma A, et al. Randomized controlled trials of surgical interventions. Ann Surg. 2010;251(3):409-416.
4. Thoma A, Farrokhyar F, Waltho D, Braga LH, Sprague S, Goldsmith CH. Users' guide to the surgical literature: how to assess a noninferiority trial. Can J Surg. 2017;60(6):426-432.
5. Kadam P, Bhalerao S. Sample size calculation. Int J Ayurveda Res. 2010;1(1):55-57.
6. Cadeddu M, Farrokhyar F, Thoma A, Haines T, Garnett A, Goldsmith CH; Evidence-Based Surgery Working Group. Users' guide to the surgical literature: how to assess power and sample size. Laparoscopic vs open appendectomy. Can J Surg. 2008;51(6):476-482.

© 2018 The American Society for Aesthetic Plastic Surgery, Inc. Reprints and permission: journals.permissions@oup.com. Aesthetic Surgery Journal, Oxford University Press.

Aesthetic Surgery Journal, ISSN 1090-820X, eISSN 1527-330X. DOI: 10.1093/asj/sjy106.
We will use the paper by Hunstad et al1 to address an oversight commonly made by investigators and frequently overlooked by the reviewers. The issue we would like to address pertains to the sample size of a trial. The authors labeled their study as a multicenter randomized study. Although there may be other issues with the design of this randomized controlled trial (RCT), we will not address them in this Hub. The optimal methodology of an RCT design is provided in the references for interested readers.2,3 So let’s return to the sample size calculation, which is the main objective of this Hub. Why Do We Need to Calculate a Sample Size for Our Study? The authors performed their study to compare a new type of liposuction (tissue liquefaction liposuction, TLL) to the traditional standard (suction-assisted liposuction, SAL). In their study design, the authors compared treatments with TTL on one side of each study subject to SAL on the other side. By the way, we like this type of side-to-side comparison, because each study patient serves as their own control, which, of course, is a pretty good match! In this study, the authors treated 31 female patients. Is 31 the right number? How do you know? More importantly, why is a chosen number the right − or wrong − number? When we perform a study such as the one performed by Hunstad and his coauthors,1 we cannot examine the whole population to see if TLL is superior to SAL. (Even if we could study the whole population, there would be no more patients left to treat!) To determine whether TTL (novel technology) is better than SAL (standard technology), we select a small sample from the target population (all those eligible for liposuction) and perform our clinical experiment. The number of patients selected is – yes, hold on to your seats − the sample size. If we find some positive results, we declare that this novel technology is beneficial in our sample and, by inference, to the larger population. 
This interrelationship between the population and sample are shown in Figure 1. Figure 1. View largeDownload slide Making a statistical inference. The interrelationship between a population and the study sample is shown here. Sampling is the process of selecting a group from the population that is representative of the population. Inclusion and exclusion criteria are key in this process. For the given characteristic or outcome being assessed, there is a mean for the population (true mean, or μ) and a mean for the sample (sample mean, or X). Likewise, there is a standard deviation for the population (σ) and for the sample (SD) in terms of the characteristic or outcome being assessed. Figure 1. View largeDownload slide Making a statistical inference. The interrelationship between a population and the study sample is shown here. Sampling is the process of selecting a group from the population that is representative of the population. Inclusion and exclusion criteria are key in this process. For the given characteristic or outcome being assessed, there is a mean for the population (true mean, or μ) and a mean for the sample (sample mean, or X). Likewise, there is a standard deviation for the population (σ) and for the sample (SD) in terms of the characteristic or outcome being assessed. How do we determine whether this inference that we’re making is true? The process of drawing conclusions valid for the population from data from the sample group is called statistical inference. Choosing the correct number of patients to include in a study has everything to do with statistical inference. To make statistical inference, there are two approaches: (A) hypothesis testing; and (B) estimation-point estimate and interval estimation (confidence interval). Let’s look at hypothesis testing first. 
Hypothesis Testing The question the authors want to answer is “In patients undergoing liposuction, does TTL result in a lower total score of for bruising, swelling, tenderness, and incision appearance in comparison to SAL at 30 days post-surgery?” We want to know that the answer to this question is real. Hypothesis testing confirms or refutes the hypothesis that the study results did not occur by chance but rather occurred due to an effect or an association between the intervention and primary outcome. A study is usually designed around a predetermined null hypothesis (H0) and alternate hypothesis (H1). Let’s see how this works in the present study, focusing on bruising, swelling, tenderness, and incision appearance. The null hypothesis would state that “There is no difference between bruising with TTL and bruising with SAL.” The alternative hypothesis would be the opposite, or “There is a difference between bruising with TTL and bruising with SAL.” We can express this mathematically, where the hypotheses would look like following: The hypothesis Mathematical expression Plain ‘ole English Null (H0) H0: δ = μ TTL- μ SAL = 0 There is no difference (delta, or δ) in the averages (mu, or μ) between the treatments Alternate (H1) H1: δ = μ TTL- μ SAL ≠ 0 There is a difference in the averages of the treatments The hypothesis Mathematical expression Plain ‘ole English Null (H0) H0: δ = μ TTL- μ SAL = 0 There is no difference (delta, or δ) in the averages (mu, or μ) between the treatments Alternate (H1) H1: δ = μ TTL- μ SAL ≠ 0 There is a difference in the averages of the treatments View Large The hypothesis Mathematical expression Plain ‘ole English Null (H0) H0: δ = μ TTL- μ SAL = 0 There is no difference (delta, or δ) in the averages (mu, or μ) between the treatments Alternate (H1) H1: δ = μ TTL- μ SAL ≠ 0 There is a difference in the averages of the treatments The hypothesis Mathematical expression Plain ‘ole English Null (H0) H0: δ = μ TTL- μ SAL = 0 There is no 
difference (delta, or δ) in the averages (mu, or μ) between the treatments Alternate (H1) H1: δ = μ TTL- μ SAL ≠ 0 There is a difference in the averages of the treatments View Large The basic format of the hypotheses above represents a two-sided equivalence study. It is two-sided because the alternative hypothesis states that the mean difference can be either higher or lower than the anticipated difference between interventions (ie, that the bruising could be more with the TTL, or it could be more with the SAL). It is an equivalence study as the null hypothesis states that there is no difference between TTL and SAL, thus are equivalent to one another. It is important to consider these hypotheses before designing the study as they will determine the type of study (superiority, equivalence, or noninferiority),4 as well as whether the study is going to be a one- or two-sided study. These pieces of information will determine which sample size formula is the best to use for that particular study. OK, so we know how to write our null hypothesis and alternative hypothesis now, but we need to test them. In hypothesis testing, we talk about type I and type II errors. Type I errors (denoted by the Greek letter α) tell us about the probability of rejecting the null hypothesis when it is true (rejection error). Another way of expressing this is something like stating there is an association between TTL and total bruising score, when in fact there is none. Type II errors (denoted by the Greek letter β) tell us about the probability of accepting the null hypothesis when it is false (acceptance error). This is akin stating there is no association between TTL and total bruising score, when in fact there is one. By convention, the probabilities of α and β are fixed a priori so that we know how much each of these may play a role in our hypothesis testing. The commonly accepted values are 0.05 (or 5%) for α . . . hey wait a minute . . . 0.05 sounds kind of familiar. 
The commonly accepted value for β is 0.2 (or 20%). The power of the study is represented by 1-β, which is equal to 0.8 (or 80%). Once the statistical analyses are completed, the predefined significance level (P-value) is then used to determine whether or not the null hypothesis should be rejected and the alternative hypothesis is accepted. In general, this is done by comparing the P-value given by the statistical test to the predefined P-value. If the P-value is ≤ α, then the null is rejected and it can be concluded that there is likely an effect. Aha! So that’s why we say an outcome having a P-value of less than or equal to 0.05 is significant – it means that the outcome is likely true and reflects the outcome in the general population. How likely is that it is true? 95%, because α was set at less than or equal to 0.05 (5%), so the chance of a type I error is a possibility of 0.05 or 5%). Alternatively, if the P-value is > α, then we do not reject the null hypothesis and it can be concluded that there is likely no effect. (Or similar to above, it is also possible that a type II error exists with a probability of 20% of being wrong, because β is 0.20) (Table 1). Table 1. Hypothesis Testing Result of study Population = the truth There is no effect (H0) δ = μ TTL- μ SAL = 0 There is an effect (H1) δ = μ TTL- μ SAL ≠ 0 Study shows no effect (1-α) (β = Type II error = say it isn’t true but it is true) Study shows effect (α = Type I error = say it’s true but it isn’t true) (1-β) Result of study Population = the truth There is no effect (H0) δ = μ TTL- μ SAL = 0 There is an effect (H1) δ = μ TTL- μ SAL ≠ 0 Study shows no effect (1-α) (β = Type II error = say it isn’t true but it is true) Study shows effect (α = Type I error = say it’s true but it isn’t true) (1-β) View Large Table 1. 
Hypothesis Testing Result of study Population = the truth There is no effect (H0) δ = μ TTL- μ SAL = 0 There is an effect (H1) δ = μ TTL- μ SAL ≠ 0 Study shows no effect (1-α) (β = Type II error = say it isn’t true but it is true) Study shows effect (α = Type I error = say it’s true but it isn’t true) (1-β) Result of study Population = the truth There is no effect (H0) δ = μ TTL- μ SAL = 0 There is an effect (H1) δ = μ TTL- μ SAL ≠ 0 Study shows no effect (1-α) (β = Type II error = say it isn’t true but it is true) Study shows effect (α = Type I error = say it’s true but it isn’t true) (1-β) View Large Sample Size Calculation As mentioned above, the sample size calculation requires predetermined information such as the type of study (superiority, equivalence, or noninferiority), which will determine whether or not the study will be a one- or two-sided study. In addition to this information, the primary outcome is another piece of information required to calculate the sample size. In other words, you design the study to have the right number of patients to answer the most important question (the primary outcome). In the present article by Hunstad et al,1 we do not know for certain what this primary outcome is. In the Methods section of the paper, we are told this is “efficiency” of device for surgeons (ie, strokes/time unit). In the Statistical Methods subsection, however, we are told that the primary endpoint was the total score of bruising, swelling, tenderness, and incision appearance. If you don’t define the primary outcome, you can’t set up the study correctly. Authors therefore need to be very explicit in defining the primary outcome. The next important measure we need is the minimal clinical important difference (MCID) or Δ between TTL and SAL. We also call this the effect size. If the primary outcome was the total score (bruising, etc.), the authors should have told us what difference (Δ) in score between TTL and SAL would be considered clinically important. 
In other words, we need to know what difference in total score our colleagues would consider important enough to be willing to adopt TLL (the novel technology) and abandon SAL (the standard technology). Some of our colleagues may be willing to switch to the new technology with only a 2% difference in bruising score, some may require 5%, and still others may require a 50% difference. You will shortly see how this variation in preference causes problems for our sample size. The sample size formula, which includes the values for Z (the standard normal deviate corresponding to a given probability; for a two-sided test at α = 0.05, Zα/2 = 1.96), α (the predefined significance level), power (1 − β), standard deviation (σ), and mean difference (Δ), looks something like this5:

n = 2(Zα/2 + Z1−β)²σ² / Δ²

As you can see from the formula, it is easy for investigators to manipulate the sample size by inserting a "convenient" Δ or MCID into the formula. They do this when they are unable to recruit enough patients for their study. If Δ is small, then n (the sample size) increases, meaning you need more patients for your study. If, on the other hand, Δ is made larger, then you do not need as many patients. So, readers, beware of this sleight of hand! Investigators should always support their choice of Δ by referencing sources such as the literature or the consensus of experts (or, better still, the consensus of our patients, who supposedly will benefit from the new technology!). Again, the exact equation needed for a given study is determined by all the factors listed above. This is an equation that could have been used by Hunstad et al1 to calculate a proper sample size. It should be noted that the above formula changes if proportions are used.
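The formula above is easy to script, which also makes the Δ-versus-n trade-off tangible. This is a minimal sketch of the standard two-means calculation, assuming a two-sided test; the function name and the example σ and Δ values are ours for illustration, not the authors'.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """n = 2 * (Z_{alpha/2} + Z_{1-beta})^2 * sigma^2 / delta^2,
    the per-group sample size for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # 0.84 for 80% power
    return ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2)

# Illustrative values (not from the paper): sigma = 1.0, Delta = 0.5
print(sample_size_per_group(0.5, 1.0))    # 63 per group
print(sample_size_per_group(0.25, 1.0))   # 252: halving Delta quadruples n
```

Note how shrinking Δ from 0.5 to 0.25 roughly quadruples the required sample size, which is exactly the lever an investigator could pull in reverse by choosing a "convenient" Δ.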
We encourage readers to become familiar with the calculation of sample size and the power of their study, and to be able to appraise an article's validity on this issue.6 The authors reported a combined mean total score reduction in bruising, swelling, treatment site tenderness, and incision appearance of 2.09, which is a 12.5% difference in mean total score (TLL, 11.4; SAL, 13.1). Since they do not say, we do not know what Hunstad et al1 consider an MCID in their study. Because this was not reported, it is difficult to determine what that 2.09 difference in mean total score actually means. When investigators use a validated scale (instrument, patient-reported outcome) in a clinical experiment, they also report the MCID for that particular scale; this MCID is entered into the sample size calculation formula. If recruitment achieved the predetermined sample size, there were no dropouts, and the P value was < .05, then their conclusions would likely be valid. You may be wondering, "what's the big deal?" Suppose that you, as the surgeon, are most concerned with bruising, and a significant difference in a bruising score would compel you to begin using the alternative treatment (in this case, the TLL device). How much of a difference would you want to see? If we plug the typical values for α (0.05) and β (0.20) into the equation and work backwards from the authors' sample size of 31, we see that with 80% power they would only be able to detect an effect size (Δ) of about a 53% reduction in bruising (see Table 2). In other words, the authors would not have been able to detect an effect size as small as the 12.5% (or 16%, which is what they said they observed) difference between treatments using their sample size of 31. A sample size of 31 actually yields a power of only approximately 14%. If other amounts of bruising reduction are of interest, you would need a different sample size, as shown in the table below. Table 2.
The Relationship Between the MCID, Power, and Sample Size Based on Total Mean Score of Bruising, Swelling, Tenderness, and Incision Appearance

MCID (Δ)    | β    | α    | Sample size (n)*
10% (0.10)  | 0.20 | 0.05 | 787
20% (0.20)  | 0.20 | 0.05 | 199
30% (0.30)  | 0.20 | 0.05 | 90
40% (0.40)  | 0.20 | 0.05 | 52
50% (0.50)  | 0.20 | 0.05 | 34
53% (0.53)  | 0.20 | 0.05 | 30

*As the article being evaluated used paired data, these sample size estimates refer to the total number of PAIRS required.

Not calculating the sample size based on the primary outcome and a "legitimate" MCID limits the value of any conclusion drawn, with respect to both type I and type II errors. Because the study found an effect, a type I error (finding an effect when there is none) could have occurred in the Hunstad et al1 study. So now you can see the importance of the sample size and the related MCID in determining the impact of a study. A study outcome can also be statistically significant but clinically irrelevant; here, the authors reported a significant 16% (actually 12.5%) difference between TLL and SAL but did not have a sample size large enough to reliably detect a difference of that magnitude.
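The "working backwards" exercise behind Table 2 can be sketched by inverting the same sample-size formula: once for the power achieved at a fixed n, and once for the smallest Δ detectable at 80% power. The standard deviation σ = 0.70 below is a hypothetical value chosen for illustration only, since the paper does not report the value actually used, so the outputs approximate rather than reproduce the published figures.

```python
from math import sqrt
from statistics import NormalDist

ND = NormalDist()

def achieved_power(n, delta, sigma, alpha=0.05):
    """Power of a two-sided, two-mean comparison with n subjects per arm."""
    z_alpha = ND.inv_cdf(1 - alpha / 2)
    return ND.cdf(delta * sqrt(n / 2) / sigma - z_alpha)

def detectable_delta(n, sigma, alpha=0.05, power=0.80):
    """Smallest effect size detectable at the given power with n per arm."""
    return (ND.inv_cdf(1 - alpha / 2) + ND.inv_cdf(power)) * sigma * sqrt(2 / n)

# sigma = 0.70 is a hypothetical standard deviation for illustration only.
print(round(achieved_power(31, 0.125, 0.70), 2))   # about 0.10: far below 80% power
print(round(detectable_delta(31, 0.70), 2))        # about 0.50: only a ~50% difference is detectable
```

With this assumed σ, a sample of 31 is badly underpowered for a 12.5% difference, and the smallest detectable Δ lands near the 53% figure quoted above; exact agreement would require the σ and the paired-test method the authors actually used.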
Evaluating the sample size, the MCID, and whether appropriate statistical analysis methods were used is how we can be sure that a study is both valid and relevant. If you are performing a study, once you have an estimate of the required sample size, it is imperative to adjust for unavoidable realities of clinical research such as low recruitment rates, loss to follow-up, and dropouts. This can be done by taking a typical rate from other clinical studies and inflating the calculated sample size by that amount (eg, 20% or 30%).

CONCLUSION

The calculation of the sample size requires careful thought, and it should be done before the study begins. To determine your required sample size, the MCID needs to be established beforehand by reviewing the literature. If you cannot find it there, you can obtain a consensus from experts (in this case, liposuction experts, or better still a consensus of the liposuction patients who will be benefitting from TLL). Most importantly, the sample size calculation needs guidance from a biostatistician before the project starts.

Disclosures

The authors have no conflicts of interest to disclose related to the content of this article.

Funding

The authors received no financial support for the research, authorship, and publication of this article.

REFERENCES

1. Hunstad JP, Godek CP, Van Natta BW, et al. A multicenter, prospective, randomized, contralateral study of tissue liquefaction liposuction versus suction-assisted liposuction. Aesthet Surg J. 2018;38(9):980-989.
2. Thoma A, Farrokhyar F, Bhandari M, Tandan V; Evidence-Based Surgery Working Group. Users' guide to the surgical literature: how to assess a randomized controlled trial in surgery. Can J Surg. 2004;47(3):200-208.
3. Farrokhyar F, Karanicolas PJ, Thoma A, et al. Randomized controlled trials of surgical interventions. Ann Surg. 2010;251(3):409-416.
4. Thoma A, Farrokhyar F, Waltho D, Braga LH, Sprague S, Goldsmith CH. Users' guide to the surgical literature: how to assess a noninferiority trial. Can J Surg. 2017;60(6):426-432.
5. Kadam P, Bhalerao S. Sample size calculation. Int J Ayurveda Res. 2010;1(1):55-57.
6. Cadeddu M, Farrokhyar F, Thoma A, Haines T, Garnett A, Goldsmith CH; Evidence-Based Surgery Working Group. Users' guide to the surgical literature: how to assess power and sample size. Laparoscopic vs open appendectomy. Can J Surg. 2008;51(6):476-482.

© 2018 The American Society for Aesthetic Plastic Surgery, Inc. Reprints and permission: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model).

Aesthetic Surgery Journal (Oxford University Press). Published: Sep 1, 2018.
