Apparent Age is a Reliable Assessment Tool in 20 Facelift Patients

Abstract

Background: Although the literature is replete with favorable facelift results, there are few validated facial rejuvenation outcome measures. Apparent age (AA), a visual estimate of age by objective observers, has been utilized in several studies; although attractive, AA lacks validation.

Objective: The aim of this study is to examine the reliability of AA, highlighting the importance of the exclusive use of validated outcome measures in future studies.

Methods: Ten blinded reviewers assessed pre- and postoperative photographs of 32 patients who underwent facelift. Each reviewer completed 3 surveys at 3-month intervals, each composed of 40 randomly ordered photos, totaling 1200 photographs assigned an AA. The intra-class correlation coefficient was classified as “excellent,” “good,” “fair,” or “poor.” The accuracy of assigned AA, agreement within 5 years, and reduction in AA were also evaluated.

Results: The mean difference of preoperative true age from assigned AA was 2.74 ± 4.36 years. Forty-three percent of raters were within 5 years (±2.5) of the mean. Intra-rater reliability preoperatively and postoperatively was 0.77 (95% CI, 0.82-0.72) and 0.75 (95% CI, 0.79-0.71), respectively. Inter-rater reliability preoperatively was 0.98 (95% CI, 0.99-0.96), while postoperatively it was 0.95 (95% CI, 0.99-0.95). Mean AA reduction was 5.23 ± 2.81 years, with an intra-rater reliability of 0.15 (95% CI, 0.03-0.34) and inter-rater reliability of 0.65 (95% CI, 0.84-0.38).

Conclusion: Using current statistical measures and analysis, AA is an acceptable tool for pre- and postoperative facial evaluation when assessed by a group of 10 reviewers. Therefore, apparent age represents a reliable and valid objective observer assigned measure for evaluation of facelift outcomes.

Outcomes research has garnered a significant amount of rightful attention, focusing on patient-reported measures such as satisfaction and quality of life.
In the realm of aesthetic surgery, outcome analysis has relied largely on these subjective appraisals, making quantification of these qualitative results difficult.1 This lack of quantitative objective data has made evidence-based conclusions regarding the clinical result challenging. Further, while patient satisfaction is certainly of utmost importance, Reich reported that the basis of dissatisfaction in a sample of aesthetic surgery patients was predominantly the result of unfavorable interpersonal relationships,2 with individual patient character and personality also influencing his or her assessment of the surgical outcome.1 Thus, the surgical result may or may not correlate with patient-reported outcome measures.

While a variety of equipment, software, scales, and anthropometric assessments have been developed in an attempt to provide objective outcome assessment,3-9 few if any in this category have undergone tests of reliability and meet the guiding principles of simple, streamlined, and convenient.1,10 Further, the use of non-validated tests or measures to assess surgical outcomes is not considered acceptable in most fields today. In addition to the lack of measures, the terms valid and reliable are often inappropriately applied or vaguely described when classifying an outcome measure. Therefore, we evaluate the reliability and validity of apparent age (AA), a visual estimate of age by objective observers. More specifically, we aim to gauge the precision and accuracy of people as estimators of age when examining photographs of patients before and after facelift surgery. In the process, we hope to highlight the importance of outcome measure evaluation and encourage further study of this subject, providing more objective evidence-based data to the field of aesthetic surgery.

METHODS

This Cleveland Clinic institutional review board approved cross-sectional observational study was performed from December 1, 2015, through August 1, 2016.
Female patients who underwent facelift surgery performed by a single surgeon (J.E.Z.) between August 2001 and March 2015 were eligible. The inclusion criteria were as follows: (1) primary isolated face and necklift or face/necklift combined with blepharoplasty, and/or brow-lift, fat injections, chemical peel, and laser resurfacing; (2) patient signed photograph-release consent forms; (3) minimum of 8 months follow-up data; (4) standardized photographs (frontal, oblique, and lateral) taken a minimum of 4 months postoperatively with the same background color and camera settings, in neutral expression with the same degree of chin elevation; and (5) no nonsurgical treatments during the pre- and postoperative photograph interval. Males were excluded from this analysis to ensure consistency among the patient population.

Study photographs and data were collected and managed using REDCap electronic data capture tools (Vanderbilt University, Nashville, TN) hosted at the Cleveland Clinic.11 Each electronic survey consisted of 20 patients, totaling 40 pre- and postoperative photograph triplet sets (frontal, oblique, and lateral) (Appendix A, available online as Supplementary Material at www.aestheticsurgeryjournal.com). Facial sides were kept consistent in photo panels both pre- and postoperatively (eg, frontal, right oblique, right lateral). The survey administrator electronically mixed the photographs within the REDCap software, ensuring no patient photographs were presented consecutively. The order of the photographs was the same for each reviewer to ensure consistency. Reviewers were given the following instructions at survey initiation: “You are being asked to complete this survey by guessing the patient’s age at the time of their photos. Please use only numbers (20-100).” Below each photographic triplet was “How old do you think this patient looks?” There was no time limit for the reviewer’s numerical input.
The reviewers were not informed that the patients had undergone facial rejuvenation procedures or that they were viewing “before and after” photos. The surveys were sent to ten departmental plastic surgery trainees (fellows, residents, and researchers). Reviewers had no involvement in the care of these patients. Their responses constituted the “preoperative apparent age” and “postoperative apparent age” for each patient. Each reviewer completed 3 surveys at 3-month intervals, totaling 1200 photographs assigned an AA. In the first month, all pre- and postoperative photograph sets were new, but the subsequent surveys contained a mix of new and previously evaluated photographs to allow for the assessment of intra-rater reliability (Table 1). Reviewers had no access to previous age evaluations when performing subsequent surveys. The patients presented in the first survey acted as the primary cohort for which data were analyzed. All reviewers were blinded to patient and procedure-related information.

Table 1. Survey Composition of New and Repeated Patient Photographs at 3-Month Intervals

Month   New   Repeat   Total
0       20    0        20
3       12    8        20
6       0     20       20
Total   32    28       60

Statistical Methods

Regarding sample size calculations, for inter-rater agreement an intra-class correlation coefficient (ICC) of at least 0.75 is desirable, while for intra-rater agreement a higher standard of 0.85 is preferred. Using the sample size calculations described by Walter et al for intra-rater agreement,12 if the true ICC is 0.95, then with 20 repeated photo sets, there will be at least 80% power to demonstrate that the intra-rater ICC is at least 0.85. Intra-rater and inter-rater reliability were assessed using ICC. Briefly, analysis of variance models were used to evaluate the variability between reviewers and within reviewers to calculate the ICC.
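As a rough illustration of how an ICC can be derived from analysis of variance mean squares, the sketch below computes one common form, ICC(2,1) in the Shrout and Fleiss convention (two-way random effects, absolute agreement, single rater). The text does not state which ICC form or SPSS settings were used, so this choice, the function name `icc_2_1`, and the demo numbers are assumptions for illustration only.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per photograph set (target); each row
    holds one age estimate per reviewer (rater). Assumed layout, not taken
    from the study's data files.
    """
    n = len(ratings)        # number of targets
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: targets, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical estimates: 4 photo sets, each rated by 3 reviewers.
demo = [[60, 62, 61], [55, 54, 57], [70, 72, 69], [48, 50, 49]]
print(round(icc_2_1(demo), 3))
```

For an intra-rater analysis, the same function could be applied with the columns holding one reviewer's first and repeated estimates instead of different reviewers.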
Analysis was performed on the reviewer-reported preoperative age, postoperative age, and the difference between ages (AA reduction). Because the actual age can differ between preoperative and postoperative photographs, AA reduction was calculated with the formula (postoperative apparent age − postoperative actual age) − (preoperative apparent age − preoperative actual age) to account for the time lapse.13 This adjustment removed the influence of aging over time from the calculated reduction in AA. For interpretation of the ICC, this study used the Cicchetti et al ranges, classifying values less than 0.40 as poor, values between 0.40 and 0.59 as fair, values between 0.60 and 0.74 as good, and values between 0.75 and 1.00 as excellent.14 Additional descriptive measures included accuracy of assigned AA (rater-assigned apparent age − true age) and percent agreement within 5 years (±2.5). To examine whether the time interval between photo reviews had any effect on rater reliability, the ICC was analyzed independently for the 3-month and 6-month interval repeats. Statistical analysis was performed using SPSS 24.0 for Mac (IBM Corporation, New York, NY). Statistical tests were performed at a 0.05 significance level and estimates were calculated with 95% confidence intervals.

RESULTS

From 2001 to 2015, 1184 facelift procedures were performed by the primary investigator; of these patients, 112 had signed photo release consent forms on file for all uses. After application of our inclusion criteria, 43 patients were eligible, from which the most recent 32 patients were included in the study. Patients in this study ranged in age from 49 to 76 years at the time of surgery, with a mean age of 60.5 ± 6.5 years and mean photographic follow-up of 9.2 months (range, 4-27 months). Patient demographics and procedural characteristics are detailed in Table 2.
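The age-adjusted AA formula and the Cicchetti et al ranges quoted in the Statistical Methods can be sketched as follows. The function names and the example patient are hypothetical and not taken from the study data. With the formula exactly as written, a patient who looks younger after surgery yields a negative value; the positive reductions reported in the text presumably reflect the opposite sign convention, which is an assumption here.

```python
def aa_change(pre_apparent, pre_actual, post_apparent, post_actual):
    """Formula as quoted in the text:
    (postop AA - postop actual age) - (preop AA - preop actual age).
    """
    return (post_apparent - post_actual) - (pre_apparent - pre_actual)


def classify_icc(icc):
    """Cicchetti et al ranges as quoted in the text."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Hypothetical patient: photographed at 62.0 years and again at 62.6 years.
change = aa_change(pre_apparent=62.7, pre_actual=62.0,
                   post_apparent=58.0, post_actual=62.6)
print(round(change, 1))  # negative value: looks younger relative to actual age
print(classify_icc(0.77), classify_icc(0.65), classify_icc(0.15))
```

Subtracting the actual age at each time point is what keeps the months elapsed between photographs from being counted as part of the surgical effect.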
Three minor complications were observed in the primary cohort: one patient experienced hypertrophic scarring that resolved with intralesional steroid injection, one patient experienced hyperpigmentation that resolved with topical Retin-A 0.025% and hydroquinone 4%, and one patient experienced cellulitis of the right postauricular region that resolved with oral antibiotics.

Table 2. Patient Demographics and Details of Facelift and Adjunct Procedures

Description                        No. (%)
Age (range), yr                    60.5 (49-76)
Photo follow-up (range), mo        9.2 (4-27)
Female                             20 (100)
Facelift type
  Extended SMAS                    17 (85)
  SMAS plication                   1 (5)
  Deep plane                       1 (5)
  MACS                             1 (5)
Adjunct Procedures
  Lipofilling: cheeks              14 (70)
  Lipofilling: nasolabial          9 (45)
  Upper blepharoplasty             9 (45)
  Perioral peeling                 6 (30)
  Lower blepharoplasty             4 (20)
  Platysmaplasty with lipectomy    3 (15)
  Endoscopic brow lift             3 (15)
  Lipofilling: lips                2 (10)
  Lipofilling: infraorbital        1 (5)
  Periorbital peeling              1 (5)
  Lip lift                         1 (5)

When preoperative actual age was compared with preoperative AA, the patients appeared 2.74 ± 4.36 years older than their true age (Figures 1-2). Table 3 summarizes the patient data pertaining to actual age and AA. In order to examine the distribution of reviewer assigned AA, the percentage of reviewers within 5 years (±2.5) of the reviewer mean AA was calculated.
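The agreement measure just described can be sketched for a single photograph set as follows. The function name, the inclusive boundary at exactly 2.5 years, and the ten example estimates are assumptions for illustration and are not study data.

```python
def percent_within_window(estimates, half_width=2.5):
    """Percentage of reviewers whose estimate lies within +/- half_width
    years of the mean estimate for one photograph set."""
    mean = sum(estimates) / len(estimates)
    hits = sum(1 for e in estimates if abs(e - mean) <= half_width)
    return 100.0 * hits / len(estimates)


# Hypothetical estimates from 10 reviewers for one photograph set.
one_photo = [60, 62, 63, 65, 70, 58, 61, 64, 66, 61]
print(percent_within_window(one_photo))
```

Averaging this percentage across photograph sets would give a cohort-level figure, although the text does not state exactly how the values were pooled.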
The results reveal that 45.1% of reviewers were within 5 years of the mean AA preoperatively and 40.8% postoperatively, representing good agreement among the reviewers with minimal skew. Notably, the tendency to look younger following surgery, defined as the mean AA reduction, was found to be 5.23 ± 2.81 years. All patients experienced a reduction in AA following surgery, which ranged from 0.7 to 10.8 years at a mean postoperative photo follow-up of 9.2 months (Table 3). Figures 3-6 show the photograph presentation and results for four patients.

Figure 1. Difference between mean reviewer assigned preoperative apparent age and true age for primary patient cohort (n = 20). For example, 1-1 represents a mean preoperative apparent age of 5.7 years older than the patient’s true age for patient #1 based on the first viewing by 10 reviewers, while 1-2 represents a mean apparent age of 6.7 years older than the patient’s true age for patient #1 based on the second viewing by 10 reviewers 3 months after the first.

Figure 2. Pre- and postoperative apparent age values for all ten reviewers and the primary patient cohort (n = 20). The trend of higher preoperative apparent age (red circle) is evident.

Table 3. Reviewer Generated Preoperative and Postoperative Apparent Ages of Patients and True Age (n = 20)

                         Preoperative Apparent Age (yr)   Postoperative Apparent Age (yr)
Patient  True Age (yr)   Viewing 1   Viewing 2            Viewing 1   Viewing 2             Apparent Age Reduction (yr)
1        64              69.7        70.7                 60.1        58.7                  10.8
2        67              68.2        69.0                 60.9        66.3                  5.0
3        61              55.6        59.1                 50.8        51.3                  6.3
4        49              52.8        57.0                 52.7        55.7                  0.7
5        76              77.9        79.8                 75.7        74.3                  3.9
6        67              69.6        70.2                 69.6        67.5                  1.4
7        58              57.4        59.1                 51.7        59.6                  2.6
8        62              62.6        62.7                 58.0        57.9                  4.7
9        49              54.1        53.3                 47.8        50.3                  4.7
10       62              71.5        71.6                 64.5        65.1                  6.8
11       54              52.2        54.8                 49.4        52.1                  2.8
12       65              63.2        65.3                 56.5        60.3                  5.9
13       55              64.6        66.3                 58.0        60.0                  6.5
14       65              63.2        61.7                 57.7        60.9                  3.2
15       54              51.2        54.2                 45.6        47.3                  6.3
16       61              62.6        61.6                 53.8        54.1                  8.2
17       56              65          67.5                 57.7        60.2                  7.3
18       64              61.7        61.6                 53.2        56.3                  6.9
19       57              57.3        59.6                 54.8        55.3                  3.4
20       63              69.6        71.1                 61.8        63.2                  7.8

Figure 3.
(A, C, E) This female patient was 62 years old at the time of the preoperative photographs. (B, D, F) She is shown 7 months postoperatively following extended SMAS facelift combined with platysmaplasty, anterior lipectomy, and lipofilling to the cheeks and nasolabial folds. The reviewers estimated her preoperative apparent age to be 62.7 years old and postoperatively to be 58.0 years old; therefore, her apparent age reduction was 4.7 years.

Figure 4. (A, C, E) This female patient was 55 years old at the time of the preoperative photograph. (B, D, F) She is shown 6 months postoperatively following extended SMAS facelift combined with upper lid blepharoplasty, perioral phenol-croton oil peel, and lipofilling to the cheeks. The reviewers estimated her preoperative apparent age to be 65.5 years old and postoperatively to be 59.0 years old; therefore, her apparent age reduction was 6.5 years.

Figure 5. (A, C, E) This female patient was 62 years old at the time of the preoperative photograph. (B, D, F) She is shown 6 months postoperatively following extended SMAS facelift combined with lipofilling to the cheeks and upper lid blepharoplasty. The reviewers estimated her preoperative apparent age to be 71.6 years old and postoperatively to be 64.8 years old; therefore, her apparent age reduction was 6.8 years.

Figure 6. (A, C, E) This female patient was 54 years old at the time of the preoperative photograph. (B, D, F) She is shown 6 months postoperatively following extended SMAS facelift combined with lipofilling to the cheeks. The reviewers estimated her preoperative apparent age to be 53.5 years old and postoperatively to be 50.8 years old; therefore, her apparent age reduction was 2.7 years.

Intra-rater reliability was classified as “excellent” both preoperatively (0.77, 95% CI: 0.82-0.72) and postoperatively (0.75, 95% CI: 0.79-0.71) (Figure 7).
However, it has been suggested that the ICC should be greater than 0.90 to ensure reasonable validity for making clinical decisions based on individual performance.15 Inter-rater reliability was also classified as “excellent” both preoperatively (0.98, 95% CI: 0.99-0.96) and postoperatively (0.95, 95% CI: 0.99-0.95) (Figure 8). These values approaching 1.0 indicated that the 10 reviewers’ AA values were extremely similar, with excellent consistency and homogeneity. Furthermore, the intra-rater reliability (0.15, 95% CI: 0.03-0.34) and inter-rater reliability (0.65, 95% CI: 0.84-0.38) of the difference between pre- and postoperative AA (ie, AA reduction) were classified as “poor” and “good,” respectively (Table 4).

Table 4. Intra-class Correlation Coefficient Values With Confidence Intervals (CI)

                          Preoperative Apparent Age   Postoperative Apparent Age   Difference of Pre- and Postoperative Apparent Age
                          Value   CI                  Value   CI                   Value   CI
Intra-Rater Reliability   0.77    0.82-0.72           0.75    0.79-0.71            0.15    0.34-0.03
Inter-Rater Reliability   0.98    0.99-0.96           0.95    0.99-0.95            0.65    0.84-0.38

Figure 7. Graphical depiction of intra-rater reliability for reviewer 1 of the primary cohort (n = 20). The similarity of reviewer assigned preoperative apparent age at two different time points can be appreciated.

Figure 8. Graphical depiction of inter-rater reliability for patients 1-5. The similarity of reviewer assigned preoperative apparent age within the ten reviewers can be appreciated.

There was no statistically significant difference in reliability in any of the variables when comparing the 3- and 6-month time intervals between the first and the repeated viewing of a photo. Intra-rater reliability preoperatively was 0.82 for 3-month interval repeats and 0.73 for 6-month interval repeats (P = 0.19). Similarly, intra-rater reliability postoperatively was 0.72 and 0.75 at the 3- and 6-month intervals, respectively (P = 0.60).

The 10 reviewers consisted of 6 men and 4 women with a mean age of 31.4 years (range, 25-47). There was no significant difference in reviewer assigned preoperative AA (F, 63.10; M, 63.05; P = 0.85) or accuracy of assigned AA (F, +2.6; M, +2.78; P = 0.82) between the male and female reviewers. However, female reviewers were more likely to assign an older postoperative age (F, 59.03; M, 57.12; P = 0.04) and decreased AA reduction (F, 4.2; M, 5.93; P = 0.01).

DISCUSSION

While motives for undergoing facelift surgery may differ, following surgery patients expect an improvement in appearance.
A number of validated scales for various facial rejuvenation procedures have been developed, yet a scale that is reliable, simple, and can include the perception of appearance to both the patient and observer has been lacking.10,16 Three systematic reviews regarding outcome measures in aesthetic surgery reached similar conclusions, finding a significant paucity of valid and reliable instruments available to be used for outcomes assessment.1,10,16 While the FACE-Q represents a well-described and validated patient-reported outcome instrument, the universality, ease of use, and rigor vary widely for objective observer-reported outcome measures. Additionally, the FACE-Q does not provide a measure of apparent age reduction; therefore it is not valid for measuring the rejuvenating effects of surgery. This gap in scientifically grounded observer-reported outcomes has led to reliance primarily on non-validated patient satisfaction surveys. While these surveys are certainly a critical and reasonable element of outcome analysis, they should not replace the drive to develop and evaluate quantitative measures achievable on the basis of current professional knowledge.

While AA is simplistic, logical in nature, and repeatedly used in clinical studies, to date there has been no validation regarding the accuracy or precision of AA as an assessment tool since its first introduction by Swanson in 2011.13 Though there have been studies examining the reliability of facial age estimation17 and perceived age reversal via laypersons,7 there has been no study examining the reliability of apparent age reduction following facial rejuvenation procedures by reviewers in the field of plastic surgery.
AA has been utilized in at least four studies evaluating outcomes in facial cosmetic surgery.13,18-20 Swanson evaluated outcomes in deep plane facelift patients and described a patient reported subjective AA reduction of 11.9 years,20 and in another study he reported an observer reported objective AA reduction of 6.0 years for facelift in combination with other procedures and 4.6 years for facelift alone.13 Subsequently, Zins et al used AA to evaluate outcomes of combination facelift and perioral phenol-croton oil peel, reporting an observer reported objective AA reduction of 5.3 years,19 followed by a study of patients undergoing facelift following massive weight loss with an observer reported objective AA reduction of 6.0 years.18 The current study found a mean observer reported objective AA reduction of 5.23 years. While this information is interesting and valuable, the significance and application of these data are lost without evidence of reliability and validity.

Reliability denotes the reproducibility of an outcome measure, analogous to precision. In the current study, ICC was used to quantify the consistency of measurements made by multiple observers reviewing the same stimulus.21 Inter-rater reliability represents the degree of agreement among raters, giving a score to how much homogeneity or consensus there is in the ratings given by the reviewers. In other words, it represents how similar the ages assigned by all of the reviewers were. Our analysis demonstrated excellent pre- and postoperative AA inter-rater reliability (0.98 and 0.95, respectively), representing a strong consensus among our reviewers regarding the patient’s AA. Similarly, the intra-rater reliability, representing the ability of a reviewer to reproduce the same quantitative value for AA at repeated viewing of the same photograph, was excellent both pre- and postoperatively (0.77 and 0.75, respectively).
Yet, these values do not meet the preferred greater values (>0.85) for making clinical decisions based on individual evaluation. Thus, AA does meet rigorous reliability standards when data is examined by a group of 10 reviewers, but not at the individual level. Similarly, the reliability of the intra- and inter-rater AA reduction values was poor and good (0.15 and 0.65, respectively) indicating that the value produced from a single reviewer should not be considered a highly reproducible outcome measure, but when taken as a group of ten reviewers (inter-rater reliability of 0.65) this value constitutes good reliability according to Fleiss22 and Cicchetti et al.14 Based on these results if the process were repeated under similar conditions, the same AA results from a group of 10 reviewers should be obtained, representing the reproducibility and consistency of AA as an outcome measure. Nevertheless, reliability alone does not produce a valid measure. For example, a measure may be reliable (consistently yielding the same score), but it may not be valid if it is not measuring the outcome of interest for which conclusions are being drawn. Validity is a substantial term, involving a host of guiding principles (face validity, content validity, predictive validity, and convergent-discriminate validity10) often not fully appreciated. Simply, a valid test is one that measures what it intends to measure.23 In essence, a valid outcome measure should produce accurate results, encompass the measured condition, be evidence linked to the outcome of interest, and agree with other similar measures. 
Therefore, when examining the validity of AA, we find that it produces accurate results (2.74 years away from true age), it encompasses the core of facial rejuvenation, evidence has shown that aesthetic procedures reduce the signs of aging,18-20,24 and it agrees with other similar studies examining the perception of facial age.17,25 Therefore, the AA outcome measure evaluated in the current study was found to be clinically appropriate, reliable, and valid using a rigorous approach to provide the research and clinical community with an observer generated objective outcome measure. Furthermore, this measure and information can be used for the quantification of positive effects and patient education. It is important to note that the aim of the current study was not to compare techniques, make conclusions regarding adjunctive procedures, or draw conclusions regarding the favorability of outcomes, but only to examine the reliability of apparent age. Moreover, it seems appropriate to suggest that routine use of this measurement could be highly beneficial to all those concerned with the success of aesthetic treatments with the knowledge that it is a reliable measure.

Limitations of the current study include a sample of patients composed solely of women. The time interval between photograph reviewing was eliminated as a potential confounding variable, as there was no statistically significant difference when the intra-rater reliability was analyzed independently for the two time intervals. The reviewer reliability remained equivalent over the six-month interval and reviewers were no more reliable at the 3-month photo recurrence, indicating a personal methodology for assigning AA rather than simply recollecting what they assigned at the initial photo viewing. However, the exact technique that reviewers used to assign an AA and what aspects of a patient’s face have the greatest influence on the appearance of aging remain undetermined.
Although we attempted to ensure standardization of photographs, changes such as patient expression and hair styling could impact the results. Furthermore, while all photographs are professional and high quality, we recognized the possibility of inconsistency regarding chin inclination and oblique alignment with inner canthus approximating the nasion. Computer-based randomization software was not utilized in the ordering of patient photos. While this was not an interventional study, computer-based randomization could have controlled for the potential effects of waning attention spans and survey fatigue. While reviewer age could not be analyzed independently due to the homogeneity of the group, gender analysis did reveal that female reviewers were more likely to assign an older postoperative age and decreased AA reduction. This information indicates that reviewers should be composed of an equal gender makeup in order to avoid AA skew. However, given the small number of reviewers, it may be difficult to reliably draw conclusions regarding the reviewer demographic information. Additionally, the reviewers were younger than the patients and were 60% male, while none of the patients were male, potentially affecting our analysis. In addition, further studies should be performed to evaluate the external validity of AA as an outcome measure in situations beyond the facelift. Furthermore, the nonhomogeneous nature of the current study, with patients having procedures in addition to a facelift, while common in our practice, could represent a variable not adequately weighted in our analysis. Lastly, comparing the impressions of reviewers who are not in the plastic surgery field could be of interest, as those in the field are more likely to perceive the stigmata of facial procedures, however subtle, potentially skewing their assessment.

CONCLUSIONS

Apparent age represents a reliable and valid method to quantify objective observer assigned evaluations of patient outcomes.
By applying a simple, quick, inexpensive, and easily reproducible method, we found our reviewers to accurately and precisely estimate age when examining photographs of patients before and after facelift surgery. Aside from demonstrating the utility of apparent age as an outcome measure, we have demonstrated the necessity of evaluating validity and reliability as an approach to other outcome measures in the future.

Supplementary Material

This article contains supplementary material located online at www.aestheticsurgeryjournal.com.

Disclosures

The authors declared no potential conflicts of interest with respect to the research, authorship, and publication of this article.

Funding

The authors received no financial support for the research, authorship, and publication of this article.

REFERENCES

1. Alsarraf R. Outcomes research in facial plastic surgery: a review and new directions. Aesthetic Plast Surg. 2000;24(3):192-197.
2. Reich J. Factors influencing patient satisfaction with the results of esthetic plastic surgery. Plast Reconstr Surg. 1975;55(1):5-13.
3. Buchner L, Vamvakias G, Rom D. Validation of a photonumeric wrinkle assessment scale for assessing nasolabial fold wrinkles. Plast Reconstr Surg. 2010;126(2):596-601.
4. La Padula S, Hersant B, SidAhmed M, Niddam J, Meningaud JP. Objective estimation of patient age through a new composite scale for facial aging assessment: The face - objective assessment scale. J Craniomaxillofac Surg. 2016;44(7):775-782.
5. Lorenc ZP, Bank D, Kane M, Lin X, Smith S. Validation of a four-point photographic scale for the assessment of midface volume loss and/or contour deficiency. Plast Reconstr Surg. 2012;130(6):1330-1336.
6. van Dongen JA, Eyck BM, van der Lei B, Stevens HP. The rainbow scale: a simple, validated online method to score the outcome of aesthetic treatments. Aesthet Surg J. 2016;36(3):NP128-NP130.
7. Zimm AJ, Modabber M, Fernandes V, Karimi K, Adamson PA. Objective assessment of perceived age reversal and improvement in attractiveness after aging face surgery. JAMA Facial Plast Surg. 2013;15(6):405-410.
8. Pitanguy I, Pamplona D, Weber HI, Leta F, Salgado F, Radwanski HN. Numerical modeling of facial aging. Plast Reconstr Surg. 1998;102(1):200-204.
9. Tapia A, Etxeberria E, Blanch A, Laredo C. A review of 685 rhytidectomies: a new method of analysis based on digitally processed photographs with computer-processed data. Plast Reconstr Surg. 1999;104(6):1800-1810; discussion 1811.
10. Ching S, Thoma A, McCabe RE, Antony MM. Measuring outcomes in aesthetic surgery: a comprehensive review of the literature. Plast Reconstr Surg. 2003;111(1):469-480; discussion 481.
11. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap) - a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(2):377-381.
12. Walter SD, Eliasziw M, Donner A. Sample size and optimal designs for reliability studies. Stat Med. 1998;17(1):101-110.
13. Swanson E. Objective assessment of change in apparent age after facial rejuvenation surgery. J Plast Reconstr Aesthet Surg. 2011;64(9):1124-1131.
14. Cicchetti D, Bronen R, Spencer S, et al. Rating scales, scales of measurement, issues of reliability: resolving some critical issues for clinicians and researchers. J Nerv Ment Dis. 2006;194(8):557-564.
15. Portney L, Watkins M. Foundations of Clinical Research: Applications to Practice. New Jersey: Prentice Hall Inc.; 2009.
16. Kosowski TR, McCarthy C, Reavey PL, et al. A systematic review of patient-reported outcome measures after facial cosmetic surgery and/or nonsurgical facial rejuvenation. Plast Reconstr Surg. 2009;123(6):1819-1827.
17. Valente DS, da Silva JB, Lerias AG, Rossi DD, Padoin AV. Validation of a method for estimation of facial age by plastic surgeons. JAMA Facial Plast Surg. 2017;19(2):133-138.
18. Couto RA, Waltzman JT, Tadisina KK, et al. Objective assessment of facial rejuvenation after massive weight loss. Aesthetic Plast Surg. 2015;39(6):847-855.
19. Ozturk CN, Huettner F, Ozturk C, Bartz-Kurycki MA, Zins JE. Outcomes assessment of combination face lift and perioral phenol-croton oil peel. Plast Reconstr Surg. 2013;132(5):743e-753e.
20. Swanson E. Outcome analysis in 93 facial rejuvenation patients treated with a deep-plane face lift. Plast Reconstr Surg. 2011;127(2):823-834.
21. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86(2):420-428.
22. Fleiss JL. The Design and Analysis of Clinical Experiments. New York: John Wiley & Sons; 1986.
23. Swanson E. Validity, reliability, and the questionable role of psychometrics in plastic surgery. Plast Reconstr Surg Glob Open. 2014;2(6):e161.
24. Connell BF. Pushing the clock back 15 to 20 years with facial rejuvenation. Clin Plast Surg. 2008;35(4):553-566, vi.
25. Chauhan N, Warner JP, Adamson PA. Perceived age change after aesthetic facial surgical procedures quantifying outcomes of aging face surgery. Arch Facial Plast Surg. 2012;14(4):258-262.

© 2017 The American Society for Aesthetic Plastic Surgery, Inc. Reprints and permission: journals.permissions@oup.com
Aesthetic Surgery Journal, Oxford University Press

Publisher: Mosby Inc.
ISSN: 1090-820X
eISSN: 1527-330X
DOI: 10.1093/asj/sjx143

Abstract Background Although the literature is replete with favorable facelift results, there are few validated facial rejuvenation outcome measures. Apparent age (AA), a visual estimate of age by objective observers, has been utilized in several studies; although attractive, AA lacks validation. Objective The aim of this study is to examine the reliability of AA, highlighting the importance of the exclusive use of validated outcome measures in future studies. Methods Ten blinded reviewers assessed pre- and postoperative photographs of 32 patients who underwent facelift. Each reviewer completed 3 surveys at 3-month intervals composed of 40 randomly ordered photos; totaling 1200 photographs assigned an AA. The intra-class correlation coefficient was classified as “excellent,” “good,” “fair,” or “poor.” The accuracy of assigned AA, agreement within 5 years, and reduction in AA were also evaluated. Results The mean difference of preoperative true age from assigned AA was 2.74 ± 4.36 years. Forty-three percent of raters were within 5-years (±2.5) of the mean. Intra-rater reliability preoperatively and postoperatively were 0.77 (95% CI, 0.82-0.72) and 0.75 (95% CI, 0.79-0.71), respectively. Inter-rater reliability preoperatively was 0.98 (95% CI, 0.99-0.96), while postoperatively was 0.95 (95% CI, 0.99-0.95). Mean AA reduction was 5.23 ± 2.81, with an intra-rater reliability 0.15 (95% CI, 0.03-0.34) and inter-rater reliability 0.65 (95% CI, 0.84-0.38). Conclusion Using current statistical measures and analysis, AA is an acceptable tool for pre- and postoperative facial evaluation when assessed by a group of 10 reviewers. Therefore, apparent age represents a reliable and valid objective observer assigned measure for evaluation of facelift outcomes. Outcomes research has garnered a significant amount of rightful attention, focusing on patient-reported measures such as satisfaction and quality of life. 
In the realm of aesthetic surgery, outcome analysis has relied largely on these subjective appraisals, making quantification of these qualitative results difficult.1 This lack of quantitative objective data has made evidence based conclusion regarding the clinical result challenging. Further while patient satisfaction is certainly of utmost importance, Reich reported that the basis of dissatisfaction in a sample of aesthetic surgery patients was predominantly the result of unfavorable interpersonal relationships,2 with individual patient character and personality also influencing his or her assessment of the surgical outcome.1 Thus, the surgical result may or may not correlate with patient-reported outcome measures. While a variety of equipment, software, scales, and anthropometric assessments have been developed in an attempt to provide objective outcome assessment,3-9 few if any in this category have undergone tests of reliability and meet the guiding principles of simple, streamlined, and convenient.1,10 Further, the use of non-validated tests or measures to assess surgical outcomes is not considered acceptable in most fields today. In addition to the lack of measures, the terms valid and reliable are often inappropriately applied or vaguely described when classifying an outcome measure. Therefore, we examine apparent age (AA), a visual estimate of age by objective observers, by examining its reliability and validity. More specifically, we aim to gauge the precision and accuracy of people as estimators of age when examining photographs of patients before and after facelift surgery. In the process, we hope to highlight the importance of outcome measure evaluation and encourage further study of this subject, providing more objective evidence-based data to the field of aesthetic surgery. METHODS This Cleveland Clinic institutional review board approved cross-sectional observational study was performed from December 1, 2015, through August 1, 2016. 
Female patients who underwent facelift surgery performed by a single surgeon (J.E.Z.) between August of 2001 and March of 2015 were eligible. The inclusion criteria was as follows: (1) primary isolated face and necklift or face/necklift combined with blepharoplasty, and/or brow-lift, fat injections, chemical peel, and laser resurfacing; (2) patient signed photograph-release consent forms; (3) minimum of 8 months follow-up data; (4) standardized photographs (frontal, oblique, and lateral) taken a minimum of 4 months postoperatively with the same background color and camera settings, in neutral expression with the same degree of chin elevation; and (5) no nonsurgical treatments during the pre- and postoperative photograph interval. Males were excluded from this analysis to ensure consistency among the patient population. Study photographs and data were collected and managed using REDCap electronic data capture tools (Vanderbilt University, Nashville, TN) hosted at the Cleveland Clinic.11 Each electronic survey consisted of 20 patients, totaling 40 photographs (pre- postoperative) triplet sets (frontal, obliques, and lateral) (Appendix A, available online as Supplementary Material at www.aestheticsurgeryjournal.com). Facial sides were kept consistent in photo panels both pre- and postoperatively (eg, frontal, right oblique, right lateral). The survey administrator electronically mixed the photographs within the REDCap software, ensuring no patient photographs were presented consecutively. The order of the photographs was the same for each reviewer to ensure consistency. Reviewers were given the following instructions at survey initiation: “You are being asked to complete this survey by guessing the patient’s age at the time of their photos. Please use only numbers (20-100).” Below each photographic triplet was “How old do you think this patient looks?” There was no time limit for the reviewer’s numerical input. 
The reviewers were not informed that the patients had undergone facial rejuvenation procedures or that they were viewing “before and after” photos. The surveys were sent to ten departmental plastic surgery trainees (fellows, residents, and researchers). Reviewers had no involvement in the care of these patients. Their responses constituted the “preoperative apparent age” and “postoperative apparent age” for each patient. Each reviewer completed 3 surveys at 3-month intervals, totaling 1200 photographs assigned an AA. In the first month, all pre- and postoperative photograph sets were new, but the subsequent surveys contained a mix of new and previously evaluated photographs to allow for the assessment of intra-rater reliability (Table 1). Reviewers had no access to previous age evaluations when performing subsequent surveys. The patients presented in the first survey, acted as the primary cohort for which data was analyzed. All reviewers were blinded to patient and procedure-related information. Table 1. Survey Composition of New and Repeated Patient Photographs at 3-Month Intervals Month  New  Repeat  Total  0  20  0  20  3  12  8  20  6  0  20  20  Total  32  28  60  Month  New  Repeat  Total  0  20  0  20  3  12  8  20  6  0  20  20  Total  32  28  60  View Large Statistical Methods Regarding sample size calculations, for inter-rater agreement an intra-class correlation coefficient (ICC) of at least 0.75 is desirable, while for intra-rater agreement a higher standard of 0.85 is preferred. Using the sample size calculations described by Walter et al for intra-rater agreement,12 if the true ICC is 0.95, then with 20 repeated photo sets, there will be at least 80% power to demonstrate that the intra-rater ICC is at least 0.85. Intra-rater and inter-rater reliability were assessed using ICC. Briefly, analysis of variance models was used to evaluate the variability between reviewers and within reviewers to calculate the ICC. 
Analysis was performed on the reviewer reported preoperative age, postoperative age, and the difference between ages (AA reduction). Due to the potential for the actual age to differ between preoperative and postoperative photographs, AA reduction was calculated with the formula (postoperative apparent age − postoperative actual age) − (preoperative apparent age − preoperative actual age) to account for the time lapse.13 This calibration of the formula eliminated the influence of aging with time and yielded an accurate calculation of reduction in AA. For interpretation of the ICC this study used the Cicchetti et al ranges, classifying values less than 0.40 as poor, values between 0.40 and 0.59 as fair, values between 0.60 and 0.74 as good, and values between 0.75 and 1.00 as excellent.14 Additional descriptive measures assessed included accuracy of assigned AA (rater assigned apparent age − true age) and percent agreement within 5-years (±2.5). In order to examine if the time interval between photo reviewing had any effect on rater reliability, the ICC was analyzed independently for the 3-month and 6-month interval repeat. Statistical analysis was performed using SPSS 24.0 for Mac (IBM Corporation, New York, NY). Statistical tests were performed at a 0.05 significance level and estimates were calculated with 95% confidence intervals. RESULTS From 2001-2015 there were 1184 facelift procedures performed by the primary investigator, of these patients 112 had signed photo release consent forms on file for all uses. After application of our inclusion criteria 43 patients were eligible, from which the most recent 32 patients were included in the study. Patients in this study ranged in age from 49 to 76 years at the time of surgery, with a mean age of 60.5 ± 6.5 years and photographic follow-up of 9.2 months (range, 4-27 months). Patient demographics and procedural characteristics are detailed in Table 2. 
Three minor complications were observed in the primary cohort: one patient experienced hypertrophic scarring that resolved with intralesional steroid injection, one patient experienced hyperpigmentation that resolved with topical retin-a 0.025% and hydroquinone 4%, and one patient experienced cellulitis of right postauricular region that resolved with oral antibiotics. Table 2. Patient Demographics and Details of Facelift and Adjunct Procedures Description  No. (%)  Age (range), yr  60.5 (49-76)  Photo follow-up (range), mo  9.2 (4-27)  Female  20 (100)  Facelift type   Extended SMAS  17 (85)   SMAS placation  1 (5)   Deep plane  1 (5)   MACS  1 (5)  Adjunct Procedures   Lipofilling—cheeks  14 (70)   Lipofilling—nasolabial  9 (45)   Upper blepharoplasty  9 (45)   Perioral peeling  6 (30)   Lower blepharoplasty  4 (20)   Platysmaplasty with lipectomy  3 (15)   Endoscopic brow lift  3 (15)   Lipofilling—lips  2 (10)   Lipofilling—infraorbital  1 (5)   Periorbital peeling  1 (5)   Lip lift  1 (5)  Description  No. (%)  Age (range), yr  60.5 (49-76)  Photo follow-up (range), mo  9.2 (4-27)  Female  20 (100)  Facelift type   Extended SMAS  17 (85)   SMAS placation  1 (5)   Deep plane  1 (5)   MACS  1 (5)  Adjunct Procedures   Lipofilling—cheeks  14 (70)   Lipofilling—nasolabial  9 (45)   Upper blepharoplasty  9 (45)   Perioral peeling  6 (30)   Lower blepharoplasty  4 (20)   Platysmaplasty with lipectomy  3 (15)   Endoscopic brow lift  3 (15)   Lipofilling—lips  2 (10)   Lipofilling—infraorbital  1 (5)   Periorbital peeling  1 (5)   Lip lift  1 (5)  View Large When preoperative actual age was compared with preoperative AA, the patients appeared 2.74 ± 4.36 years older than their true age (Figures 1-2). Table 3 summarizes the patient data pertaining to actual and AA. In order to examine the distribution of reviewer assigned AA, the percentage of reviewers within 5 years (±2.5) of the reviewer mean AA was calculated. 
The results reveal that 45.1% of reviewers were within 5 years of the mean AA preoperatively and 40.8% postoperatively, representing good agreement among the reviewers with minimal skew. Notably the tendency to look younger following surgery, defined as the mean AA reduction, was found to be 5.23 ± 2.81 years. All patients experienced a reduction in AA following surgery, which ranged from 0.7 to 10.8 years at a mean postoperative photo follow-up of 9.2 months (Table 3). Figures 3, 4, 5, 6 represent the photograph presentation and results for four patients. Figure 1. View largeDownload slide Difference between mean reviewer assigned preoperative apparent age and true age for primary patient cohort (n = 20). For example, 1-1 represents a mean preoperative apparent age of 5.7 years older than the patients true age for patient #1 based on the first viewing by 10 reviewers, while 1-2 represents a mean apparent age of 6.7 years older than the patients true age for patient #1 based on the second viewing by 10 reviewers 3 months after the first. Figure 1. View largeDownload slide Difference between mean reviewer assigned preoperative apparent age and true age for primary patient cohort (n = 20). For example, 1-1 represents a mean preoperative apparent age of 5.7 years older than the patients true age for patient #1 based on the first viewing by 10 reviewers, while 1-2 represents a mean apparent age of 6.7 years older than the patients true age for patient #1 based on the second viewing by 10 reviewers 3 months after the first. Figure 2. View largeDownload slide Pre- and postoperative apparent age values for all ten reviwers and the primary patient cohort (n = 20). The trend of higher preoperative apparent age (red circle) is evident. Figure 2. View largeDownload slide Pre- and postoperative apparent age values for all ten reviwers and the primary patient cohort (n = 20). The trend of higher preoperative apparent age (red circle) is evident. Table 3. 
Reviewer Generated Preoperative and Postoperative Apparent Ages of Patients and True Age (n = 20) Patient  True Age (yr)  Preoperative Apparent Age (yr)  Postoperative Apparent Age (yr)  Apparent Age Reduction (yr)      Viewing 1  Viewing 2  Viewing 1  Viewing 2    1  64  69.7  70.7  60.1  58.7  10.8  2  67  68.2  69.0  60.9  66.3  5.0  3  61  55.6  59.1  50.8  51.3  6.3  4  49  52.8  57.0  52.7  55.7  0.7  5  76  77.9  79.8  75.7  74.3  3.9  6  67  69.6  70.2  69.6  67.5  1.4  7  58  57.4  59.1  51.7  59.6  2.6  8  62  62.6  62.7  58.0  57.9  4.7  9  49  54.1  53.3  47.8  50.3  4.7  10  62  71.5  71.6  64.5  65.1  6.8  11  54  52.2  54.8  49.4  52.1  2.8  12  65  63.2  65.3  56.5  60.3  5.9  13  55  64.6  66.3  58.0  60.0  6.5  14  65  63.2  61.7  57.7  60.9  3.2  15  54  51.2  54.2  45.6  47.3  6.3  16  61  62.6  61.6  53.8  54.1  8.2  17  56  65  67.5  57.7  60.2  7.3  18  64  61.7  61.6  53.2  56.3  6.9  19  57  57.3  59.6  54.8  55.3  3.4  20  63  69.6  71.1  61.8  63.2  7.8  Patient  True Age (yr)  Preoperative Apparent Age (yr)  Postoperative Apparent Age (yr)  Apparent Age Reduction (yr)      Viewing 1  Viewing 2  Viewing 1  Viewing 2    1  64  69.7  70.7  60.1  58.7  10.8  2  67  68.2  69.0  60.9  66.3  5.0  3  61  55.6  59.1  50.8  51.3  6.3  4  49  52.8  57.0  52.7  55.7  0.7  5  76  77.9  79.8  75.7  74.3  3.9  6  67  69.6  70.2  69.6  67.5  1.4  7  58  57.4  59.1  51.7  59.6  2.6  8  62  62.6  62.7  58.0  57.9  4.7  9  49  54.1  53.3  47.8  50.3  4.7  10  62  71.5  71.6  64.5  65.1  6.8  11  54  52.2  54.8  49.4  52.1  2.8  12  65  63.2  65.3  56.5  60.3  5.9  13  55  64.6  66.3  58.0  60.0  6.5  14  65  63.2  61.7  57.7  60.9  3.2  15  54  51.2  54.2  45.6  47.3  6.3  16  61  62.6  61.6  53.8  54.1  8.2  17  56  65  67.5  57.7  60.2  7.3  18  64  61.7  61.6  53.2  56.3  6.9  19  57  57.3  59.6  54.8  55.3  3.4  20  63  69.6  71.1  61.8  63.2  7.8  View Large Figure 3. 
View largeDownload slide (A, C, E) This female patient was 62 years old at the time of the preoperative photographs. (B, D, F) She is shown 7 months postoperatively following extended SMAS face lift combined with platysmaplasty, anterior lipectomy, and lipofilling to the cheeks and nasolabial folds. The reviewers estimated her preoperative apparent age to be 62.7 years old and postoperatively to be 58.0 years old; therefore, her apparent age reduction was 4.7 years. Figure 3. View largeDownload slide (A, C, E) This female patient was 62 years old at the time of the preoperative photographs. (B, D, F) She is shown 7 months postoperatively following extended SMAS face lift combined with platysmaplasty, anterior lipectomy, and lipofilling to the cheeks and nasolabial folds. The reviewers estimated her preoperative apparent age to be 62.7 years old and postoperatively to be 58.0 years old; therefore, her apparent age reduction was 4.7 years. Figure 4. View largeDownload slide (A, C, E) This female patient was 55 years old at the time of the preoperative photograph. (B, D, F) She is shown 6 months postoperatively following extended SMAS face lift combined with upper lid blepharoplasy, perioral phenol-croton oil peel, and lipofilling to the cheeks. The reviewers estimated her preoperative apparent age to be 65.5 years old and postoperatively to be 59.0 years old; therefore, her apparent age reduction was 6.5 years. Figure 4. View largeDownload slide (A, C, E) This female patient was 55 years old at the time of the preoperative photograph. (B, D, F) She is shown 6 months postoperatively following extended SMAS face lift combined with upper lid blepharoplasy, perioral phenol-croton oil peel, and lipofilling to the cheeks. The reviewers estimated her preoperative apparent age to be 65.5 years old and postoperatively to be 59.0 years old; therefore, her apparent age reduction was 6.5 years. Figure 5. 
View largeDownload slide (A, C, E) This female patient was 62 years old at the time of the preoperative photograph. (B, D, F) She is shown 6 months postoperatively following extended SMAS face lift combined with lipofilling to the cheeks and upper lid blepharoplasty. The reviewers estimated her preoperative apparent age to be 71.6 years old and postoperatively to be 64.8 years old; therefore, her apparent age reduction was 6.8 years. Figure 5. View largeDownload slide (A, C, E) This female patient was 62 years old at the time of the preoperative photograph. (B, D, F) She is shown 6 months postoperatively following extended SMAS face lift combined with lipofilling to the cheeks and upper lid blepharoplasty. The reviewers estimated her preoperative apparent age to be 71.6 years old and postoperatively to be 64.8 years old; therefore, her apparent age reduction was 6.8 years. Figure 6. View largeDownload slide (A, C, E) This female patient was 54 years old at the time of the preoperative photograph. (B, D, F) She is shown 6 months postoperatively following extended SMAS face lift combined with lipofilling to the cheeks. The reviewers estimated her preoperative apparent age to be 53.5 years old and postoperatively to be 50.8 years old; therefore, her apparent age reduction was 2.7 years. Figure 6. View largeDownload slide (A, C, E) This female patient was 54 years old at the time of the preoperative photograph. (B, D, F) She is shown 6 months postoperatively following extended SMAS face lift combined with lipofilling to the cheeks. The reviewers estimated her preoperative apparent age to be 53.5 years old and postoperatively to be 50.8 years old; therefore, her apparent age reduction was 2.7 years. Intra-rater reliability was classified as “excellent” both preoperatively (0.77, 95% CI: 0.82-0.72) and postoperatively (0.75, 95% CI: 0.79-0.71) (Figure 7). 
However, it has been suggested that the ICC should be greater than 0.90 to ensure reasonable validity for making clinical decisions based on individual performance.15Inter-rater reliability was also classified as “excellent” both preoperatively (0.98. 95% CI: 0.99-0.96), and postoperatively (0.95, 95% CI: 0.99-0.95) (Figure 8). These values approaching 1.0 indicated that the 10 reviewers AA values were extremely similar, with excellent consistency and homogeneity. Furthermore, the intra-rater reliability (0.15, 95% CI: 0.03-0.34) and inter-rater reliability (0.65, 95% CI: 0.84-0.38) of the difference between pre- and postoperative AA (ie, AA reduction) was classified as “poor” and “good,” respectively (Table 4). Table 4. Intra-class Correlation Coefficient Values With Confidence Intervals (CI)   Preoperative Apparent Age  Postoperative Apparent Age  Difference of Pre- and Postoperative Apparent Age    Value  CI  Value  CI  Value  CI  Intra-Rater Reliability  0.77  0.82-0.72  0.75  0.79-0.71  0.15  0.34-0.03  Inter-Rater Reliability  0.98  0.99-0.96  0.95  0.99-0.95  0.65  0.84-0.38    Preoperative Apparent Age  Postoperative Apparent Age  Difference of Pre- and Postoperative Apparent Age    Value  CI  Value  CI  Value  CI  Intra-Rater Reliability  0.77  0.82-0.72  0.75  0.79-0.71  0.15  0.34-0.03  Inter-Rater Reliability  0.98  0.99-0.96  0.95  0.99-0.95  0.65  0.84-0.38  View Large Figure 7. View largeDownload slide Graphical depiction of intra-rater reliability for reviewer 1 of the primary cohort (n = 20). The similarity of reviewer assigned preoperative apparent age at two different time points can be appreciated. Figure 7. View largeDownload slide Graphical depiction of intra-rater reliability for reviewer 1 of the primary cohort (n = 20). The similarity of reviewer assigned preoperative apparent age at two different time points can be appreciated. Figure 8. View largeDownload slide Graphical depiction of inter-rater reliability for patients 1-5. 
The similarity of reviewer assigned preoperative apparent age within the ten reviewers can be appreciated. Figure 8. View largeDownload slide Graphical depiction of inter-rater reliability for patients 1-5. The similarity of reviewer assigned preoperative apparent age within the ten reviewers can be appreciated. There was no statistically significant difference in reliability in any of the variables when comparing the 3 and 6-month time intervals between viewing the photo for the first time and repeated viewing. Intra-rater reliability preoperatively was 0.82 for 3-month interval repeats and 0.73 for 6-month interval repeats, with a p-value of 0.19 indicating no statistically significant difference between interval lengths. Similarly, intra-rater reliability postoperatively was 0.72 and 0.75 at the 3 and 6 intervals, respectively (P = 0.60). The 10 reviewers consisted of 6 men and 4 women with a mean age of 31.4 (25-47). There was no significant difference in reviewer assigned preoperative AA (F, 63.10; M, 63.05; P = 0.85) or accuracy of assigned AA (F, +2.6; M, +2.78; P = 0.82) between the male and female reviewers. However, female reviewers were more likely to assign an older postoperative age (F, 59.03; M, 57.12; P = 0.04) and decreased AA reduction (F, 4.2; M, 5.93; P = 0.01). DISCUSSION While motives for undergoing facelift surgery may differ, following surgery patients expect an improvement in appearance. 
A number of validated scales for various facial rejuvenation procedures have been developed, yet a scale that is reliable, simple, and can include the perception of appearance to both the patient and observer has been lacking.10,16 Three systematic reviews regarding outcome measures in aesthetic surgery reached similar conclusions, finding a significant paucity of valid and reliable instruments available to be used for outcomes assessment.1,10,16 While the FACE-Q represents a well-described and validated patient-reported outcome instrument, the universality, ease of use, and rigor varies widely for objective observer-reported outcome measures. Additionally, the FACE-Q does not provide a measure of apparent age reduction; therefore it is not valid for measuring the rejuvenating effects of surgery. This gap in scientifically grounded observer reported outcomes has led to the reliance primarily on non-validated patient satisfaction surveys. While these surveys are certainly a critical and reasonable element of outcome analysis, it should not replace the drive to develop and evaluate quantitative measures achievable on the basis of current professional knowledge. While AA simplistic, logical in nature, and repeatedly used in clinical studies, to date there has been no validation regarding the accuracy or precision of AA as an assessment tool since its first introduction by Swanson in 2011.13 Though there have been studies examining the reliability of facial age estimation17 and perceived age reversal via laypersons,7 there has been no study examining the reliability of apparent age reduction following facial rejuvenation procedures by reviewers in the field of plastic surgery. 
AA has been utilized in at least four studies evaluating outcomes in facial cosmetic surgery.13,18-20 Swanson evaluated outcomes in deep plane facelift patients and described a patient reported subjective AA reduction of 11.9 years,20 and in another study he reported an observer reported objective AA reduction of 6.0 years for facelift in combination with other procedures and 4.6 years for facelift alone.13 Subsequently, Zins et al used AA to evaluate outcomes of combination facelift and perioral phenol-croton oil peel reporting an observer reported objective AA reduction of 5.3 years,19 followed by a study of patients undergoing facelift following massive weight loss with an observer reported objective AA reduction of 6.0 years.18 The current study found a mean observer reported objective AA reduction of 5.23 years. While this information is interesting and valuable, the significance and application of this data is lost without evidence of reliability and validity. Reliability denotes the reproducibility of an outcome measure, analogous to precision. In the current study, ICC was used to quantify the consistency of measurements made by multiple observers reviewing the same stimulus.21Inter-rater reliability represents the degree of agreement among raters, giving a score to how much homogeneity or consensus, there is in the ratings given by the reviewers. In other words, it represents how similar the ages assigned by all of the reviewers were. Our analysis demonstrated excellent pre- and postoperative AA inter-rater reliability (0.98 and 0.95, respectively), representing a strong consensus among our reviewers regarding the patient’s AA. Similarly, the intra-rater reliability, representing the ability of a reviewer to reproduce the same quantitative value for AA at repeated viewing of the same photograph, was excellent both pre- and postoperatively (0.77 and 0.75, respectively). 
Yet the intra-rater values (0.77 and 0.75) do not meet the higher threshold (>0.85) preferred for making clinical decisions based on individual evaluation. Thus, AA does meet rigorous reliability standards when data are examined by a group of 10 reviewers, but not at the individual level. Similarly, the intra- and inter-rater reliability of the AA reduction values was poor and good, respectively (0.15 and 0.65), indicating that the value produced by a single reviewer should not be considered a highly reproducible outcome measure, but that the value from a group of 10 reviewers (inter-rater reliability of 0.65) constitutes good reliability according to Fleiss22 and Cicchetti et al.14 Based on these results, if the process were repeated under similar conditions, the same AA results should be obtained from a group of 10 reviewers, representing the reproducibility and consistency of AA as an outcome measure.

Nevertheless, reliability alone does not produce a valid measure. For example, a measure may be reliable (consistently yielding the same score), but it may not be valid if it does not measure the outcome of interest for which conclusions are being drawn. Validity is a broad term, involving a host of guiding principles (face validity, content validity, predictive validity, and convergent-discriminant validity10) that are often not fully appreciated. Simply, a valid test is one that measures what it intends to measure.23 In essence, a valid outcome measure should produce accurate results, encompass the measured condition, be linked by evidence to the outcome of interest, and agree with other similar measures.
Therefore, when examining the validity of AA, we find that it produces accurate results (2.74 years away from true age), it encompasses the core of facial rejuvenation, evidence has shown that aesthetic procedures reduce the signs of aging,18-20,24 and it agrees with other similar studies examining the perception of facial age.17,25 Accordingly, the AA outcome measure evaluated in the current study was found to be clinically appropriate, reliable, and valid using a rigorous approach to provide the research and clinical community with an observer-generated objective outcome measure. Furthermore, this measure and information can be used for the quantification of positive effects and patient education. It is important to note that the aim of the current study was not to compare techniques, make conclusions regarding adjunctive procedures, or draw conclusions regarding the favorability of outcomes, but only to examine the reliability of apparent age. Moreover, it seems appropriate to suggest that routine use of this measurement, with the knowledge that it is reliable, could be highly beneficial to all those concerned with the success of aesthetic treatments.

Limitations of the current study include a sample of patients composed solely of women. The time interval between photograph reviews was eliminated as a potential confounding variable, as there was no statistically significant difference when the intra-rater reliability was analyzed independently for the two time intervals. Reviewer reliability remained equivalent over the 6-month interval, and reviewers were no more reliable at the 3-month photo recurrence, indicating a personal methodology for assigning AA rather than simple recollection of what they assigned at the initial photo viewing. However, the exact technique that reviewers used to assign an AA and which aspects of a patient's face have the greatest influence on the appearance of aging remain undetermined.
Although we attempted to ensure standardization of photographs, changes such as patient expression and hair styling could impact the results. Furthermore, while all photographs were professional and of high quality, we recognize the possibility of inconsistency regarding chin inclination and oblique alignment with inner canthus approximating the nasion. Computer-based randomization software was not utilized in the ordering of patient photos. While this was not an interventional study, computer-based randomization could have controlled for the potential effects of waning attention spans and survey fatigue. While reviewer age could not be analyzed independently due to the homogeneity of the group, gender analysis did reveal that female reviewers were more likely to assign an older postoperative age and a smaller AA reduction. This finding suggests that reviewer groups should have an equal gender makeup in order to avoid AA skew. However, given the small number of reviewers, it may be difficult to reliably draw conclusions regarding the reviewer demographic information. Additionally, the reviewers were younger than the patients and were 60% male, while none of the patients were male, potentially affecting our analysis. In addition, further studies should be performed to evaluate the external validity of AA as an outcome measure in situations beyond the facelift. Furthermore, the nonhomogeneous nature of the current study, with patients having procedures in addition to a facelift, while common in our practice, could represent a variable not adequately weighted in our analysis. Lastly, comparing the impressions of reviewers who are not in the plastic surgery field could be of interest, as those in the field are more likely to perceive the stigmata of facial procedures, however subtle, potentially skewing their assessment.

CONCLUSIONS

Apparent age represents a reliable and valid method to quantify objective observer-assigned evaluations of patient outcomes.
By applying a simple, quick, inexpensive, and easily reproducible method, we found our reviewers to accurately and precisely estimate age when examining photographs of patients before and after facelift surgery. Aside from demonstrating the utility of apparent age as an outcome measure, we have demonstrated the necessity of evaluating validity and reliability as an approach to other outcome measures in the future.

Supplementary Material

This article contains supplementary material located online at www.aestheticsurgeryjournal.com.

Disclosures

The authors declared no potential conflicts of interest with respect to the research, authorship, and publication of this article.

Funding

The authors received no financial support for the research, authorship, and publication of this article.

REFERENCES

1. Alsarraf R. Outcomes research in facial plastic surgery: a review and new directions. Aesthetic Plast Surg. 2000;24(3):192-197.
2. Reich J. Factors influencing patient satisfaction with the results of esthetic plastic surgery. Plast Reconstr Surg. 1975;55(1):5-13.
3. Buchner L, Vamvakias G, Rom D. Validation of a photonumeric wrinkle assessment scale for assessing nasolabial fold wrinkles. Plast Reconstr Surg. 2010;126(2):596-601.
4. La Padula S, Hersant B, SidAhmed M, Niddam J, Meningaud JP. Objective estimation of patient age through a new composite scale for facial aging assessment: The face—objective assessment scale. J Craniomaxillofac Surg. 2016;44(7):775-782.
5. Lorenc ZP, Bank D, Kane M, Lin X, Smith S. Validation of a four-point photographic scale for the assessment of midface volume loss and/or contour deficiency. Plast Reconstr Surg. 2012;130(6):1330-1336.
6. van Dongen JA, Eyck BM, van der Lei B, Stevens HP. The rainbow scale: a simple, validated online method to score the outcome of aesthetic treatments. Aesthet Surg J. 2016;36(3):NP128-NP130.
7. Zimm AJ, Modabber M, Fernandes V, Karimi K, Adamson PA. Objective assessment of perceived age reversal and improvement in attractiveness after aging face surgery. JAMA Facial Plast Surg. 2013;15(6):405-410.
8. Pitanguy I, Pamplona D, Weber HI, Leta F, Salgado F, Radwanski HN. Numerical modeling of facial aging. Plast Reconstr Surg. 1998;102(1):200-204.
9. Tapia A, Etxeberria E, Blanch A, Laredo C. A review of 685 rhytidectomies: a new method of analysis based on digitally processed photographs with computer-processed data. Plast Reconstr Surg. 1999;104(6):1800-1810; discussion 1811.
10. Ching S, Thoma A, McCabe RE, Antony MM. Measuring outcomes in aesthetic surgery: a comprehensive review of the literature. Plast Reconstr Surg. 2003;111(1):469-480; discussion 481.
11. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap)—a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(2):377-381.
12. Walter SD, Eliasziw M, Donner A. Sample size and optimal designs for reliability studies. Stat Med. 1998;17(1):101-110.
13. Swanson E. Objective assessment of change in apparent age after facial rejuvenation surgery. J Plast Reconstr Aesthet Surg. 2011;64(9):1124-1131.
14. Cicchetti D, Bronen R, Spencer S, et al. Rating scales, scales of measurement, issues of reliability: resolving some critical issues for clinicians and researchers. J Nerv Ment Dis. 2006;194(8):557-564.
15. Portney L, Watkins M. Foundations of Clinical Research: Applications to Practice. New Jersey: Prentice Hall Inc.; 2009.
16. Kosowski TR, McCarthy C, Reavey PL, et al. A systematic review of patient-reported outcome measures after facial cosmetic surgery and/or nonsurgical facial rejuvenation. Plast Reconstr Surg. 2009;123(6):1819-1827.
17. Valente DS, da Silva JB, Lerias AG, Rossi DD, Padoin AV. Validation of a method for estimation of facial age by plastic surgeons. JAMA Facial Plast Surg. 2017;19(2):133-138.
18. Couto RA, Waltzman JT, Tadisina KK, et al. Objective assessment of facial rejuvenation after massive weight loss. Aesthetic Plast Surg. 2015;39(6):847-855.
19. Ozturk CN, Huettner F, Ozturk C, Bartz-Kurycki MA, Zins JE. Outcomes assessment of combination face lift and perioral phenol-croton oil peel. Plast Reconstr Surg. 2013;132(5):743e-753e.
20. Swanson E. Outcome analysis in 93 facial rejuvenation patients treated with a deep-plane face lift. Plast Reconstr Surg. 2011;127(2):823-834.
21. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86(2):420-428.
22. Fleiss JL. The Design and Analysis of Clinical Experiments. New York: John Wiley & Sons; 1986.
23. Swanson E. Validity, reliability, and the questionable role of psychometrics in plastic surgery. Plast Reconstr Surg Glob Open. 2014;2(6):e161.
24. Connell BF. Pushing the clock back 15 to 20 years with facial rejuvenation. Clin Plast Surg. 2008;35(4):553-566, vi.
25. Chauhan N, Warner JP, Adamson PA. Perceived age change after aesthetic facial surgical procedures quantifying outcomes of aging face surgery. Arch Facial Plast Surg. 2012;14(4):258-262.

© 2017 The American Society for Aesthetic Plastic Surgery, Inc. Reprints and permission: journals.permissions@oup.com

Journal

Aesthetic Surgery Journal, Oxford University Press

Published: Apr 1, 2018
