NIH Toolbox Picture Sequence Memory Test for Assessing Clinical Memory Function: Diagnostic Relationship to the Rey Auditory Verbal Learning Test

Abstract

Background: The NIH Cognitive Toolbox Picture Sequence Memory Test (PSMT) was developed as a measure of learning ability. Use of the PSMT in clinical populations is only beginning to be investigated.

Method: PSMT performance was analyzed in a retrospective series of 221 patients referred either to the Deep Brain Stimulation Clinic for presurgical evaluation (n = 128) or to the Cognitive Screening Clinic (n = 93). Patients were also administered the Rey Auditory Verbal Learning Test (AVLT). In addition to the correlation between measures, classification agreement was examined based upon performance ratings of normal (>16th percentile), borderline (5th–16th percentile), or impaired (<5th percentile).

Results: The correlation between measures was significant (r = 0.48, p < .0001), with classification agreement of 62% (weighted kappa = 0.43). For patients with valid performance validity test (PVT) scores (n = 147), the correlation between tests was 0.67 (p < .0001), with classification agreement of 72% (weighted kappa = 0.44). Multiple-level likelihood ratios (LRs) relating the PSMT to various dichotomous AVLT learning classifications were modest; the largest LR was obtained for impaired PSMT predicting impaired AVLT (LR = 7.62, 95% CI = 3.54–16.42).

Conclusion: Despite significant correlations between measures, the NIH Toolbox PSMT and the AVLT often generate different interpretive results. Impaired PSMT appears better at predicting impaired AVLT performance than at predicting combined borderline/impaired AVLT performance. Until further validation studies are completed, individual clinicians will need to determine whether the PSMT can be used without other memory tests in the clinical settings in which they practice.
Keywords: NIH Cognitive Toolbox, Picture Sequence Memory Test, Rey Auditory Verbal Learning Test, Diagnostic accuracy, Logistic models

One feature of neuropsychological testing is that accurate trait estimation requires time- and labor-intensive assessment, often involving many hours of testing and scoring. Partly because of this burden, the availability of cognitive testing, whether to aid clinical decision making or as part of large-scale multi-center projects, is often limited. Another concern, particularly in the context of big data, is that multiple neuropsychological tests exist to assess the same cognitive constructs (Loring & Hermann, 2011). The use of different tests reduces opportunities for data aggregation and integration, not only across studies but also across diseases. To address these concerns, the National Institutes of Health (NIH) developed the NIH Toolbox, which consists of four independent modules assessing cognitive, emotional, sensory, and motor functioning (Bleck, Nowinski, Gershon, & Koroshetz, 2013). The NIH Cognitive Toolbox module provides a ~30–45 min assessment of episodic memory, executive function and attention, working memory, language, and processing speed (Weintraub et al., 2013). The cognitive tasks were designed so that the same measures or assessment paradigms could be used from infancy through old age. Although originally developed for desktop computers, the Toolbox is now preferentially administered on the iPad, which permits assessment outside the traditional neuropsychology laboratory. In addition to the computerized tests developed for the NIH Cognitive Toolbox, there are two supplemental Toolbox measures: a 3-trial version of the Rey Auditory Verbal Learning Test and the Oral Symbol Digit Test. The supplemental tasks are not computer dependent, and only limited normative data presently exist for them.
Although the NIH Cognitive Toolbox has potential as a cognitive screening instrument because of its brevity and ease of administration, it is necessary to understand how Toolbox cognitive measures correspond to traditional paper-and-pencil approaches to neuropsychological assessment. In the present report, we compare the NIH Cognitive Toolbox Picture Sequence Memory Test (PSMT) to the Rey Auditory Verbal Learning Test (AVLT). Because the PSMT consists of only two learning trials with no delayed recall, we compared it with AVLT learning across trials so that comparable learning constructs were examined.

Methods

Subjects

Subjects were retrospectively identified following approval by the institutional review board (IRB) and included 221 patients referred to one of two Neuropsychology Service specialty clinics in which the NIH Cognitive Toolbox was part of the assessment protocol. The specialty clinics were: (1) the Deep Brain Stimulation (DBS) Clinic, which provides neuropsychological assessment of DBS candidates for the treatment of movement disorders (n = 128), including two patients evaluated in this clinic for more general cognitive concerns; and (2) the Cognitive Screening Clinic, which provides brief cognitive evaluation of patients referred from the General Neurology Clinic with possible cognitive impairment to determine whether more comprehensive testing would be beneficial (n = 93). Patients were tested between November 17, 2015 and August 3, 2017. Subjects represent a subsample of 340 consecutive referrals for neuropsychological evaluation performed during this period; the flow diagram of patient recruitment is presented in Fig. 1.

Fig. 1. Flow diagram of patient recruitment.

DBS Clinic patients averaged 63.2 (SD = 10.9) years of age and 14.5 (SD = 2.8) years of education, and included 51 females and 77 males.
Cognitive Screening Clinic referrals averaged 47.7 (SD = 16.4) years of age and 14.7 (SD = 2.4) years of education; there were 75 females and 18 males.

Memory Measures

Picture Sequence Memory Test

This NIH Cognitive Toolbox measure consists of pictures of objects and activities that are presented sequentially. The task is to learn the sequence of pictures across two learning trials, and scores are based upon the ability to recall each adjacent pair of pictures. Thus, the final score reflects the cumulative number of adjacent picture pairs remembered correctly over both learning trials. There is no delayed recall component. PSMT test–retest reliability for healthy community-dwelling adults aged 20 to 85 years is 0.77 (95% CI = 0.67–0.84) (Weintraub et al., 2013). The PSMT was developed from imitation-based memory tasks used with infants and young children (Bauer et al., 2013), and this approach was included in the NIH Cognitive Toolbox because it could be adapted for use across the lifespan, from 3 to 85 years. Demographically adjusted normative percentiles (age, education, and gender) based upon the NIH normative sample were used (Casaletto et al., 2015). Twenty-six patients were tested with the desktop Cognitive Toolbox and 195 with the iPad.

Rey Auditory Verbal Learning Test

This is a serial word-learning task in which 15 words are presented over five learning trials (Boake, 2000; Rey, 1964). Following the fifth presentation, a sixth trial is administered with 15 new words, and after free recall of the second list, the patient is asked to recall the initial 15-word list. Retention is assessed with ~30-min delayed recall and recognition trials. Although a 3-trial version of this test is considered an optional or supplemental NIH Toolbox measure, the traditional five-trial version was administered to all patients.
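The two learning scores compared in this study can be sketched as follows. This is a simplified Python illustration, not the NIH Toolbox scoring software: the picture labels and function names are hypothetical, an adjacent pair is assumed to count as correct when two pictures placed next to each other in a response were also adjacent, in the same order, in the target sequence, and the Toolbox's published scores and normative percentiles are not reproduced. The AVLT function simply sums the words recalled across the five learning trials, the quantity to which the age-based norms are applied.

```python
from typing import Sequence


def adjacent_pairs_correct(target: Sequence[str], response: Sequence[str]) -> int:
    """Count adjacent pairs in `response` that are also adjacent, in order, in `target`.

    Simplified illustration only; not the official NIH Toolbox scoring routine.
    """
    target_pairs = set(zip(target, target[1:]))
    return sum(1 for pair in zip(response, response[1:]) if pair in target_pairs)


def psmt_raw_score(target: Sequence[str], trials: Sequence[Sequence[str]]) -> int:
    """Cumulative number of correct adjacent pairs across all learning trials."""
    return sum(adjacent_pairs_correct(target, response) for response in trials)


def avlt_learning_sum(trial_scores: Sequence[int], list_length: int = 15) -> int:
    """Sum of words recalled across the five AVLT learning trials (range 0 to 75)."""
    if len(trial_scores) != 5:
        raise ValueError("expected recall scores for exactly five learning trials")
    if any(not 0 <= score <= list_length for score in trial_scores):
        raise ValueError("each trial score must lie between 0 and the list length")
    return sum(trial_scores)


# Hypothetical five-picture sequence and two learning-trial responses.
target = ["wake", "brush", "dress", "eat", "leave"]
trial_1 = ["wake", "dress", "brush", "eat", "leave"]  # only (eat, leave) is a correct pair
trial_2 = list(target)                                # all four pairs correct

print(psmt_raw_score(target, [trial_1, trial_2]))
print(avlt_learning_sum([5, 7, 9, 10, 12]))
```

Because both scores are cumulative across learning trials, the PSMT total depends on the number of trials given, which is relevant to the later discussion of the 2-trial versus 3-trial PSMT versions.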
The standard clinical AVLT was used because it served as the reference standard for characterizing memory function and because the standard version has been clinically validated in multiple diseases (Lezak, Howieson, & Loring, 2004; Strauss, Sherman, & Spreen, 2006). Age-based normative percentiles for the sum across the five AVLT learning trials were calculated using the Schmidt metanorms (Schmidt, 1996). Only the AVLT learning score was compared with the PSMT, since no delayed PSMT recall measure exists.

Performance Validity Tests

Patients referred to the Cognitive Screening Clinic were administered the Word Memory Test (WMT), a computerized Performance Validity Test (PVT) based upon patterns of verbal learning, recognition, and recall (Green, 2005). Most patients referred to the DBS Clinic (n = 113) were administered the Medical Symptom Validity Test (MSVT; Green, 2004), a PVT similar to the WMT but designed for use in medical rather than medico-legal settings (Howe, Anderson, Kaufman, Sachs, & Loring, 2007); two DBS Clinic patients were administered the WMT. Of the 115 DBS Clinic patients with PVT testing, 84 obtained passing scores on the three primary validity indices of the WMT/MSVT; 13 DBS Clinic patients were not administered any PVT. All 93 Cognitive Screening Clinic patients were administered a stand-alone PVT (91 MSVT, 2 WMT), and 63 had passing scores on all three primary MSVT or WMT validity indices.

Results

We first examined the effects of age and education upon memory performance. AVLT learning correlated significantly with both age (r = 0.30, p < .0001) and education (r = 0.37, p < .0001). The PSMT showed no significant relationship with age (r = 0.12, p = .12) but did correlate significantly with education (r = 0.30, p < .0001). The average percentile was 39.9 (SD = 35.1) for AVLT learning and 35.3 (SD = 28.1) for the PSMT.
In addition to Pearson correlations between task performances, we calculated disattenuated coefficients, which adjust the correlation for the attenuating effects of measurement error (Gravetter & Wallnau, 2007; Schmidt & Hunter, 1996), using software developed by Advanced Projects R&D Ltd. (http://www.pbarrett.net/Atten3/Attenuation.html#:1). Across all 221 patients, test performance was significantly correlated between measures (r = 0.48, p < .0001; disattenuated convergent validity coefficient = 0.69). When the correlation was restricted to subjects with passing PVT scores (n = 138), the average AVLT percentile was 53.9 (SD = 33.4) and the average PSMT percentile was 42.0 (SD = 29.6). The correlation between measures in this subsample was comparable to that in the full sample (r = 0.47, p < .0001; disattenuated convergent validity coefficient = 0.67).

To examine classification agreement, test performance was characterized as normal (>16th percentile), borderline (5th–16th percentile), or impaired (<5th percentile). As shown in Table 1, overall classification agreement between tests was 136/221 (62%). Ratings differed by one classification level in 61/221 (28%) cases [lower PSMT rating in 37 subjects (17%) and higher PSMT rating in 24 subjects (11%)]. A two-level difference was present in 21 (10%) subjects [higher PSMT rating in 14 subjects (6%) and lower PSMT rating in 7 subjects (3%)]. Agreement assessed with a weighted kappa reflecting category order was 0.42, which by convention indicates moderate agreement between measures (Gravetter & Wallnau, 2007).

Table 1. Classification agreement between AVLT learning and PSMT from entire sample (n = 221)

                   AVLT Impaired   AVLT Borderline   AVLT Normal   Total
PSMT Impaired      17 (8%)         17 (8%)           7 (3%)        41 (19%)
PSMT Borderline    20 (9%)         14 (6%)           20 (9%)       54 (24%)
PSMT Normal        14 (6%)         4 (2%)            108 (49%)     126 (57%)
Total              51 (23%)        35 (16%)          135 (61%)     221 (100%)

AVLT = Auditory Verbal Learning Test; PSMT = Picture Sequence Memory Test.

Because the sample included general neurology referrals to the Cognitive Screening Clinic, there was concern that variable task engagement could bias our findings relative to DBS Clinic patients, who are considered highly motivated because they are seeking to qualify for surgery. To ensure that poor task engagement was not driving our findings, we repeated the classification using only those patients who met “valid” thresholds on each of the three primary WMT or MSVT indices, resulting in a sample of 147 subjects. As shown in Table 2, classification agreement was present for 106/147 subjects (72%). Ratings differed by one classification level in 29/147 (20%) cases [higher PSMT rating in 6 subjects (4%) and lower PSMT rating in 23 subjects (16%)]. A two-level difference was present in 12/147 (8%) subjects [higher PSMT rating in 5 subjects (3%) and lower PSMT rating in 7 subjects (5%)]. The weighted kappa reflecting category order was 0.43, again indicating moderate agreement.

Table 2. Classification agreement between AVLT learning and PSMT from subsample of subjects passing PVT using either the WMT or MSVT (n = 147)

                   AVLT Impaired   AVLT Borderline   AVLT Normal   Total
PSMT Impaired      6 (4%)          10 (7%)           7 (5%)        23 (16%)
PSMT Borderline    5 (3%)          8 (5%)            13 (9%)       26 (18%)
PSMT Normal        5 (3%)          1 (1%)            92 (63%)      98 (67%)
Total              16 (11%)        19 (13%)          112 (76%)     147 (100%)

AVLT = Auditory Verbal Learning Test; PSMT = Picture Sequence Memory Test; PVT = Performance Validity Test; WMT = Word Memory Test; MSVT = Medical Symptom Validity Test.
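The classification rules and agreement analysis described above can be expressed compactly. The following minimal Python sketch (all function names are hypothetical) maps a percentile onto the three performance levels, tallies exact, one-level, and two-level disagreements, and computes a weighted kappa from the cell counts printed in Tables 1 and 2. The article states only that the weighted kappa "reflects order", so the use of linear weights here is an assumption; quadratic weights would give different values. The outputs are computed from the printed cell counts alone and can be compared against, but are not presented as, the values reported in the text.

```python
from typing import Dict, List

# Ordered performance levels, worst to best (row/column order of the tables).
CATEGORIES = ("impaired", "borderline", "normal")


def classify(percentile: float) -> str:
    """Map a normative percentile onto the study's three performance levels."""
    if percentile < 5:
        return "impaired"
    if percentile <= 16:
        return "borderline"
    return "normal"


def agreement_levels(table: List[List[int]]) -> Dict[int, int]:
    """Tally cases by how many levels the two ratings differ (0 = exact agreement).

    Rows are PSMT levels and columns are AVLT levels, both in CATEGORIES order.
    """
    counts = {d: 0 for d in range(len(table))}
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            counts[abs(i - j)] += n
    return counts


def linear_weighted_kappa(table: List[List[int]]) -> float:
    """Cohen's kappa with linear agreement weights w_ij = 1 - |i - j| / (k - 1)."""
    k = len(table)
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = 1 - abs(i - j) / (k - 1)
            observed += weight * table[i][j] / total
            expected += weight * row_totals[i] * col_totals[j] / total**2
    return (observed - expected) / (1 - expected)


# Cell counts transcribed from Tables 1 and 2 (rows: PSMT; columns: AVLT).
table_1 = [[17, 17, 7], [20, 14, 20], [14, 4, 108]]
table_2 = [[6, 10, 7], [5, 8, 13], [5, 1, 92]]

print([classify(p) for p in (3, 10, 40)])
for name, table in (("Table 1", table_1), ("Table 2", table_2)):
    print(name, agreement_levels(table), round(linear_weighted_kappa(table), 3))
```

Working from the full contingency table rather than summary percentages makes it straightforward to check that the agreement, one-level, and two-level counts sum to the sample size.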
Multiple-Level Likelihood Ratios

We calculated multiple-level likelihood ratios (LRs) with the PSMT as the index test, using its three performance classification levels, in relation to the AVLT as the reference standard, using software provided by the University of Oxford Centre for Evidence-Based Medicine (http://www.cebm.net/blog/2014/06/09/catmaker-ebm-calculators/). Because LRs require a dichotomous reference standard and AVLT scores were classified into three categories, we conducted two sets of analyses: one grouped borderline with impaired AVLT learning scores to contrast with normal AVLT performance, and the other grouped borderline with normal AVLT scores to contrast with impaired AVLT scores. In both cases, high LRs indicate agreement on impairment (test positive) and low LRs indicate agreement on normality (test negative).

As seen in Table 3, the LR associated with impaired PSMT predicting borderline/impaired AVLT learning for the entire sample was a modest 2.36 (95% CI = 1.38–4.04). A borderline PSMT also modestly increased the likelihood of a borderline/impaired AVLT (LR = 1.96, 95% CI = 1.24–3.09). A normal PSMT was associated with a decreased likelihood of borderline/impaired AVLT (LR = 0.42, 95% CI = 0.26–0.66). When only patients with passing scores on all three primary PVT indices were examined, a similar pattern was present, although the 95% CI for the borderline LR included 1.00 and was therefore not statistically significant (see Table 4).

Table 3. Multiple-level likelihood ratios for Picture Sequence Memory Test classification related to borderline/impaired vs. normal AVLT learning scores for entire sample

                   AVLT Borderline/Impaired   AVLT Normal   Likelihood ratio (95% CI)
PSMT Impaired      17                         24            2.36 (1.38–4.04)
PSMT Borderline    20                         34            1.96 (1.24–3.09)
PSMT Normal        14                         112           0.42 (0.26–0.66)

AVLT = Auditory Verbal Learning Test; PSMT = Picture Sequence Memory Test.

Table 4. Multiple-level likelihood ratios for Picture Sequence Memory Test classification related to borderline/impaired vs. normal AVLT learning scores for patients passing all primary PVT scores

                   AVLT Borderline/Impaired   AVLT Normal   Likelihood ratio (95% CI)
PSMT Impaired      6                          17            2.89 (1.33–6.26)
PSMT Borderline    5                          21            1.95 (0.85–4.45)
PSMT Normal        5                          93            0.44 (0.21–0.92)

AVLT = Auditory Verbal Learning Test; PSMT = Picture Sequence Memory Test; PVT = Performance Validity Test; WMT = Word Memory Test; MSVT = Medical Symptom Validity Test.

Somewhat more useful LRs were obtained when borderline scores were grouped with normal AVLT performance, with impaired PSMT predicting impaired AVLT. As seen in Table 5, the LR for impaired PSMT predicting impaired AVLT was 7.62 (95% CI = 3.54–16.42). A borderline PSMT also increased the likelihood of an impaired AVLT (LR = 2.67, 95% CI = 1.65–4.32), and a normal PSMT decreased the likelihood of an impaired AVLT (LR = 0.26, 95% CI = 0.17–0.40). A comparable pattern was observed in patients with passing PVT scores (see Table 6). Impaired PSMT was associated with an increased likelihood of impaired AVLT performance (LR = 7.31, 95% CI = 3.28–16.33). Borderline PSMT was associated with a smaller but still statistically significant increase in the likelihood of an impaired AVLT score (LR = 3.20, 95% CI = 1.64–6.24). A normal PSMT was associated with a decreased likelihood of impaired AVLT performance (LR = 0.21, 95% CI = 0.10–0.43). Of note, however, are the relatively wide confidence intervals associated with positive predictions.

Table 5. Multiple-level likelihood ratios for Picture Sequence Memory Test classification related to impaired vs. normal/borderline AVLT learning scores for entire sample

                   AVLT Impaired   AVLT Normal/Borderline   Likelihood ratio (95% CI)
PSMT Impaired      34              7                        7.62 (3.54–16.42)
PSMT Borderline    34              20                       2.67 (1.65–4.32)
PSMT Normal        18              108                      0.26 (0.17–0.40)

AVLT = Auditory Verbal Learning Test; PSMT = Picture Sequence Memory Test.

Table 6. Multiple-level likelihood ratios for Picture Sequence Memory Test classification related to impaired vs. normal/borderline AVLT learning scores for patients passing all primary PVT scores

                   AVLT Impaired   AVLT Normal/Borderline   Likelihood ratio (95% CI)
PSMT Impaired      16              7                        7.31 (3.28–16.33)
PSMT Borderline    13              13                       3.20 (1.64–6.24)
PSMT Normal        6               92                       0.21 (0.10–0.43)

AVLT = Auditory Verbal Learning Test; PSMT = Picture Sequence Memory Test.

Because patients came from two distinct referral sources (programmatic vs. clinical evaluations), and because the two groups differed in age while AVLT learning correlated significantly with age, we performed secondary classification analyses within each group (see Table 7). In the DBS population, impaired PSMT was associated with a non-significant increase in the likelihood of impaired/borderline AVLT (LR = 2.15, 95% CI = 0.77–6.02). However, borderline PSMT increased the likelihood of impaired/borderline AVLT (LR = 2.70, 95% CI = 1.47–4.94), and a normal PSMT was associated with a decreased likelihood of impaired/borderline AVLT (LR = 0.55, 95% CI = 0.38–0.79). For predicting impaired AVLT in the DBS group, impaired PSMT was associated with a non-significant increase (LR = 2.26, 95% CI = 0.77–6.67), whereas borderline PSMT was significantly associated with impaired AVLT (LR = 2.32, 95% CI = 1.29–4.15). An impaired AVLT was significantly less likely with a normal PSMT score (LR = 0.46, 95% CI = 0.25–0.87).

Table 7. Classification agreement between AVLT learning and PSMT by referral source

DBS Clinic         AVLT Impaired   AVLT Borderline   AVLT Normal   Total
PSMT Impaired      4 (3%)          3 (2%)            6 (5%)        13 (10%)
PSMT Borderline    10 (8%)         9 (7%)            13 (10%)      32 (25%)
PSMT Normal        7 (5%)          12 (9%)           64 (50%)      83 (65%)
Total              21 (16%)        24 (19%)          83 (65%)      128 (100%)

Screening Clinic   AVLT Impaired   AVLT Borderline   AVLT Normal   Total
PSMT Impaired      10 (11%)        1 (1%)            1 (1%)        12 (13%)
PSMT Borderline    11 (12%)        5 (5%)            7 (8%)        23 (25%)
PSMT Normal        9 (10%)         5 (5%)            44 (47%)      58 (62%)
Total              30 (32%)        11 (12%)          52 (56%)      93 (100%)

AVLT = Auditory Verbal Learning Test; PSMT = Picture Sequence Memory Test.

Similar analyses were conducted for Cognitive Screening Clinic patients. When PSMT performance ratings were used to predict impaired/borderline AVLT, a similar pattern was observed. Impaired PSMT was associated with a significant increase in the likelihood of impaired/borderline AVLT (LR = 13.95, 95% CI = 1.88–103.69).
Borderline PSMT increased the likelihood of impaired/borderline AVLT (LR = 2.90, 95% CI = 1.32–6.38), and a normal PSMT was associated with a decreased likelihood of impaired/borderline AVLT (LR = 0.40, 95% CI = 0.26–0.63). The relationship between impaired PSMT and impaired AVLT was also significant (LR = 10.50, 95% CI = 2.45–44.97). Borderline PSMT was associated with a non-significant LR of 1.93 (95% CI = 0.96–3.85), and a normal PSMT was associated with a significant decrease in the likelihood of an impaired AVLT score (LR = 0.39, 95% CI = 0.22–0.68).

Discussion

This report demonstrates that PSMT and AVLT learning scores may generate different interpretive results. In the entire sample, full classification agreement was present in 62% of subjects, partial agreement (i.e., a difference of a single category) in 28%, and full disagreement (i.e., a difference of two categories) in 10%. In the subgroup with no suggestion of poor task engagement based upon PVT scores, full classification agreement was present in 72% of the sample, partial agreement in 20%, and complete disagreement in 8%. Across all subjects, the correlation between PSMT and AVLT learning percentiles was statistically significant (r = 0.48; p < .0001) but lower than that reported in children (r = 0.58; Bauer et al., 2013) and in healthy adults (r = 0.65; Dikmen et al., 2014), although, as we discuss below, factors other than sample composition may contribute to this discrepancy. Despite the high statistical significance of the PSMT and AVLT correlations, the effect sizes were modest. These data suggest that for patients whose PVT performance does not indicate poor task engagement, the risk of fully discrepant classification between the two tests is 8%, with approximately equal risk in both directions of misclassification (5 vs. 7 patients).
While PVT performance is not a primary focus of this evaluation, it provides an objective means of identifying patients whose task engagement was not in doubt. Although many DBS Clinic patients “failed” PVT testing because they did not obtain passing scores on all three primary PVT indices, we are reluctant to infer that these patients were insufficiently motivated to produce valid neuropsychological test scores. This population is highly motivated to perform well, since their movement disorders often reduce quality of life and socialization and interfere with activities of daily living. As has been previously argued (Loring et al., 2016; Bowden, Shores, & Mathias, 2006), PVT false-positive error rates have been insufficiently studied in patients with established neurologic disease, with the exception of traumatic brain injury. When PVT failure occurs in these patients, there are multiple possible reasons for poor performance, including the effects of the neurologic disease itself. Nevertheless, to address questions regarding possible effects of insufficient task engagement, even if due to neurologic etiologies, we analyzed both the entire dataset and the subset of patients whose PVT performance raised no concerns regarding task engagement. The 13 DBS Clinic patients who were not administered a PVT were excluded from this secondary analysis because they had no PVT score for classification.

Based upon the multiple-level LR analyses, PSMT scores appear better at distinguishing impaired (<5th percentile) from non-impaired (5th percentile and higher) AVLT performance than at distinguishing non-normal (impaired or borderline) from normal AVLT performance. For example, the LR of 7.62 for impaired PSMT predicting impaired AVLT is more than three times that for impaired PSMT predicting impaired/borderline AVLT (LR = 2.36).
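The level-specific LRs compared here, and the widths of their confidence intervals, can be recomputed from the cell counts in Tables 3 and 5. The minimal Python sketch below assumes the conventional log-transformation method for the 95% CI of a stratum-specific LR; the article used the CEBM calculator, and whether that tool uses exactly this method is an assumption, although the sketch reproduces the tabulated LRs and CIs for the rows tested. The function name multilevel_lrs is hypothetical. Each row is (count in the first AVLT column, count in the second AVLT column) as printed, and the column totals are taken as the column sums.

```python
import math
from typing import List, Tuple


def multilevel_lrs(
    rows: List[Tuple[int, int]], z: float = 1.96
) -> List[Tuple[float, float, float]]:
    """Level-specific likelihood ratios with log-method confidence intervals.

    Each row is (count in the reference-positive column, count in the
    reference-negative column) for one level of the index test. Column totals
    are the column sums, so every patient must appear in exactly one row.
    """
    pos_total = sum(a for a, _ in rows)
    neg_total = sum(b for _, b in rows)
    results = []
    for a, b in rows:
        if a == 0 or b == 0:
            raise ValueError("zero cells are not handled by this sketch")
        lr = (a / pos_total) / (b / neg_total)
        # Standard error of ln(LR) for one level of a multilevel table.
        se = math.sqrt(1 / a - 1 / pos_total + 1 / b - 1 / neg_total)
        results.append((lr, lr * math.exp(-z * se), lr * math.exp(z * se)))
    return results


# Counts transcribed from Tables 3 and 5; rows are impaired, borderline, normal PSMT.
table_3 = [(17, 24), (20, 34), (14, 112)]
table_5 = [(34, 7), (34, 20), (18, 108)]

for name, rows in (("Table 3", table_3), ("Table 5", table_5)):
    for level, (lr, lo, hi) in zip(("impaired", "borderline", "normal"), multilevel_lrs(rows)):
        print(f"{name} PSMT {level}: LR = {lr:.2f} (95% CI {lo:.2f}-{hi:.2f}; width {hi - lo:.2f})")
```

For the impaired PSMT rows, the printed CI widths correspond to the 2.66 and 12.88 ranges compared in the text, which makes explicit how much less precise the larger LR is.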
The difference between these two LRs should, however, be interpreted in light of their 95% CIs: the CI for impaired PSMT predicting impaired AVLT is much wider than that for impaired PSMT predicting impaired or borderline AVLT (range of 12.88 vs. 2.66).

These data do not necessarily indicate that the PSMT cannot provide clinically relevant information about memory systems, but rather that clinical classifications based upon available percentile ratings often generate different interpretive results. For example, when test sensitivity is validated against established lateralized temporal lobe disease in patients with medically refractory epilepsy, story memory recall on the Logical Memory subtest of the Wechsler Memory Scale does not help identify lateralized memory impairment (Umfleet et al., 2015). However, story memory tests are superior at identifying cognitive side effects of various antiepileptic medications, which is thought to reflect the attentional and processing speed demands of the task (Meador et al., 1995). Even memory tasks with superficial structural similarity, such as the Rey AVLT and the California Verbal Learning Test, differ in their sensitivity to lateralized temporal lobe impairment in epilepsy (Loring et al., 2008).

Validation studies are in their early stages for both pediatric and adult versions of the PSMT. In a pediatric validation study of 208 healthy children aged 3 to 15 years, PSMT performance was significantly correlated with age (r = 0.78), more strongly than was performance on the sentence repetition subtest of the NEPSY-II (r = 0.58; Bauer et al., 2013). Test–retest reliability was good (intraclass correlation coefficient = .76).
In a validation study of 268 healthy adults relating the PSMT to 3-trial Toolbox AVLT scores, a strong correlation between tests was reported (r = 0.65; Dikmen et al., 2014). However, both the pediatric and adult studies report data using a slightly different PSMT version that includes three rather than two learning trials, and therefore generalization to the current two-trial version of the PSMT should be made with caution. As stated in both the pediatric and adult studies, "three trials were administered to improve test score variability and test–retest reliability." Thus, reducing the number of PSMT learning trials by one third likely alters psychometric characteristics relative to those initial validation studies, although we have been unable to find published reports describing the psychometric effects of changing the number of PSMT learning trials. The correlation of 0.48 between AVLT and PSMT measures is lower than the findings of both Bauer and colleagues (2013) and Dikmen and colleagues (2014). As noted, these studies employed a 3-trial version of the PSMT, and in the adult validation study, PSMT performance levels were derived from the study sample itself, which spanned 20 to 85 years of age. Thus, normal aging effects in both validation studies likely contributed to higher correlations between memory measures through this shared age-related variance. Nevertheless, an initial clinical validation study, which appears to have used the 2-trial test version, has demonstrated evidence of PSMT utility in discriminating across a variety of clinical groups (Carlozzi et al., 2017). The PSMT discriminated patients with spinal cord injury (M = 47.2, SD = 9.7) both from patients with mild-to-severe TBI (M = 42.4, SD = 10.9) and from patients with mild-to-severe stroke (ischemic or hemorrhagic) without significant aphasia (M = 42.6, SD = 12.4). TBI and stroke patients did not differ from each other.
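One rough way to gauge how much removing a learning trial might matter is the Spearman–Brown prophecy formula, which projects reliability when test length is multiplied by a factor k. This is an illustration only, resting on assumptions the text does not make: that PSMT learning trials behave like parallel test parts (doubtful for a learning task, in which later trials build on earlier ones), that the formula may be applied to a test–retest coefficient, and that the 0.77 value reported by Weintraub and colleagues (2013) reflects the 3-trial version. The function name is ours.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Spearman-Brown prophecy: projected reliability after scaling test length."""
    k = length_factor
    return k * reliability / (1 + (k - 1) * reliability)


# Hypothetical inputs: an assumed 3-trial reliability of 0.77, shortened to 2 of 3 trials
projected = spearman_brown(0.77, 2 / 3)
print(f"Projected 2-trial reliability: {projected:.2f}")
```

Under these assumptions the projection falls from 0.77 to about 0.69; only a psychometric study of the 2-trial version could establish the actual change.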
Despite the modest correlations between measures, the disattenuated convergent validity coefficients of approximately 0.7, which adjust for the attenuating effect of measurement error, indicate that both tests assess a similar memory construct. One of the difficulties in test validation is the selection of the gold standard reference. In this case, the AVLT serves as the gold standard, although the AVLT also has less-than-perfect reliability. For example, when computing the disattenuated coefficients, we used a reliability of 0.7 (Delaney, Prevey, Cramer, & Mattson, 1992), which is slightly lower than the 0.76 reported by Bauer and colleagues (2013). By convention, the gold-standard measure should be more valid than the index test, and this may not necessarily be the case with these two measures. Consequently, additional correlational and classification studies of these measures will provide further detail for identifying any gold standard memory measure. One limitation of this report is the reference normative databases. When contrasting test performance levels, AVLT sum percentiles were generated from metanorms based upon multiple reports rather than from a formal normative sample, with age as the only demographic correction (Schmidt, 1996). In contrast, formal normative sampling was performed for the PSMT, and reported percentile values were adjusted for age, gender, race, and education. Thus, the percentile values characterizing performance on the two tasks were obtained from different normative databases with differing degrees of demographic correction. Although the 3-trial version of the AVLT is considered a component of the Cognitive Toolbox, insufficient norms for this shortened version are presently available to guide interpretation.
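The disattenuated coefficients discussed above follow, in their classic form, Spearman's correction for attenuation: the observed correlation divided by the square root of the product of the two tests' reliabilities. The text does not state which reliability was entered for the PSMT; the minimal sketch below assumes, as our own inference, that 0.70 was used for both tests, because that reproduces the reported coefficients of 0.69 (entire sample, r = 0.48) and 0.67 (PVT-passing subsample, r = 0.47). We also assume the Advanced Projects R&D software implements this classic formula.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction: observed r divided by sqrt of the product of reliabilities."""
    return r_xy / math.sqrt(rel_x * rel_y)


ASSUMED_RELIABILITY = 0.70  # our assumption, applied to both AVLT and PSMT

full_sample = disattenuate(0.48, ASSUMED_RELIABILITY, ASSUMED_RELIABILITY)
pvt_passing = disattenuate(0.47, ASSUMED_RELIABILITY, ASSUMED_RELIABILITY)
print(f"Entire sample: {full_sample:.2f}; PVT-passing: {pvt_passing:.2f}")
```

Substituting the PSMT test–retest value of 0.77 for one of the two reliabilities would give a smaller coefficient (about 0.65), so the choice of reliability estimates matters for this argument.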
The PSMT validation studies conducted against a 3-trial AVLT have been correlational rather than classification-based, and consequently did not require percentile values to establish the magnitude of association. In addition, although the NIH Cognitive Toolbox tests were developed and validated using desktop computer administration, there are no data comparing desktop and iPad administration in terms of psychometric equivalence. In summary, the data from the present report indicate that clinical inferences regarding memory function from the PSMT and Rey AVLT are not strictly equivalent, although the rate of major disagreement between the two tests' classification levels is relatively small. Ultimately, it will be up to individual clinicians to determine whether a single measure can be used without other memory measures in the clinical environment in which they practice. Further studies should examine the relationship of both tests to various clinical factors, including clinical diagnosis, independent neuropsychological tests, and lesion location and lateralization.

Conflict of interest
None declared.

References

Bauer, P. J., Dikmen, S. S., Heaton, R. K., Mungas, D., Slotkin, J., & Beaumont, J. L. (2013). III. NIH Toolbox Cognition Battery (CB): Measuring episodic memory. Monographs of the Society for Research in Child Development, 78, 34–48. doi:10.1111/mono.12033

Bleck, T. P., Nowinski, C. J., Gershon, R., & Koroshetz, W. J. (2013). What is the NIH toolbox, and what will it mean to neurology? Neurology, 80, 874–875. doi:10.1212/WNL.0b013e3182872ea0

Boake, C. (2000). Édouard Claparède and the Auditory Verbal Learning Test. Journal of Clinical and Experimental Neuropsychology, 22, 286–292.

Bowden, S. C., Shores, E. A., & Mathias, J. L. (2006). Does effort suppress cognition after traumatic brain injury? A re-examination of the evidence for the Word Memory Test. The Clinical Neuropsychologist, 20, 858–872. doi:10.1080/13854040500246935

Carlozzi, N. E., Goodnight, S., Casaletto, K. B., Goldsmith, A., Heaton, R. K., Wong, A. W. K., et al. (2017). Validation of the NIH Toolbox in individuals with neurologic disorders. Archives of Clinical Neuropsychology, 32, 555–573. doi:10.1093/arclin/acx020

Casaletto, K. B., Umlauf, A., Beaumont, J., Gershon, R., Slotkin, J., Akshoomoff, N., et al. (2015). Demographically corrected normative standards for the English version of the NIH Toolbox Cognition Battery. Journal of the International Neuropsychological Society, 21, 378–391. doi:10.1017/s1355617715000351

Delaney, R. C., Prevey, M. L., Cramer, J., & Mattson, R. H. (1992). Test-retest comparability and control subject data for the Rey-Auditory Verbal Learning Test and Rey-Osterrieth/Taylor Complex Figures. Archives of Clinical Neuropsychology, 7, 523–528. doi:088761779290142A

Dikmen, S. S., Bauer, P. J., Weintraub, S., Mungas, D., Slotkin, J., Beaumont, J. L., et al. (2014). Measuring episodic memory across the lifespan: NIH Toolbox Picture Sequence Memory Test. Journal of the International Neuropsychological Society, 20, 611–619. doi:10.1017/S1355617714000460

Gravetter, F. J., & Wallnau, L. B. (2007). Statistics for the behavioral sciences (7th ed.). Belmont, CA: Thomson Wadsworth.

Green, P. (2004). Green's Medical Symptom Validity Test (MSVT) for Microsoft Windows: User's manual. Edmonton, Canada: Green's Publishing.

Green, P. (2005). Green's Word Memory Test for Windows User's Manual-Revised. Edmonton, Alberta: Green's Publishing.

Howe, L. L., Anderson, A. M., Kaufman, D. A., Sachs, B. C., & Loring, D. W. (2007). Characterization of the Medical Symptom Validity Test in evaluation of clinically referred memory disorders clinic patients. Archives of Clinical Neuropsychology, 22, 753–761.

Lezak, M. D., Howieson, D. B., & Loring, D. W. (2004). Neuropsychological assessment (4th ed.). New York: Oxford University Press.

Loring, D. W., Goldstein, F. C., Chen, C., Drane, D. L., Lah, J. J., Zhao, L., et al. (2016). False-positive error rates for Reliable Digit Span and Auditory Verbal Learning Test performance validity measures in amnestic Mild Cognitive Impairment and Early Alzheimer Disease. Archives of Clinical Neuropsychology, 31, 313–331. doi:10.1093/arclin/acw014

Loring, D. W., & Hermann, B. P. (2011). Neuropsychology and the Epilepsy Common Data Elements Project. In C. Helmstaedter, B. Hermann, M. Lassonde, P. Kahane, & A. Arzimanoglou (Eds.), Neuropsychology in the care of people with epilepsy (Vol. 11, pp. 59–65). Surrey, UK: John Libbey Eurotext.

Loring, D. W., Strauss, E., Hermann, B. P., Barr, W. B., Perrine, K., Trenerry, M. R., et al. (2008). Differential neuropsychological test sensitivity to left temporal lobe epilepsy. Journal of the International Neuropsychological Society, 14, 394–400.

Meador, K. J., Loring, D. W., Moore, E. E., Thompson, W. O., Nichols, M. E., Oberzan, R. E., et al. (1995). Comparative cognitive effects of phenobarbital, phenytoin, and valproate in healthy adults. Neurology, 45, 1494–1499.

Rey, A. (1964). L'examen clinique en psychologie. Paris: Presses Universitaires de France.

Schmidt, M. (1996). Rey Auditory and Verbal Learning Test: A handbook. Los Angeles: Western Psychological Services.

Schmidt, F. L., & Hunter, J. E. (1996). Measurement error in psychological research: Lessons from 26 research scenarios. Psychological Methods, 1, 199–233.

Strauss, E., Sherman, E. M. S., & Spreen, O. (2006). A compendium of neuropsychological tests: Administration, norms, and commentary (3rd ed.). New York: Oxford University Press.

Umfleet, L. G., Janecek, J. K., Quasney, E., Sabsevitz, D. S., Ryan, J. J., Binder, J. R., et al. (2015). Sensitivity and specificity of memory and naming tests for identifying left temporal-lobe epilepsy. Applied Neuropsychology: Adult, 22, 189–196. doi:10.1080/23279095.2014.895366

Weintraub, S., Dikmen, S. S., Heaton, R. K., Tulsky, D. S., Zelazo, P. D., Bauer, P. J., et al. (2013). Cognition assessment using the NIH Toolbox. Neurology, 80, S54–S64. doi:10.1212/WNL.0b013e3182872ded

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Archives of Clinical Neuropsychology, Oxford University Press

Publisher
Oxford University Press
ISSN
0887-6177
eISSN
1873-5843
D.O.I.
10.1093/arclin/acy028
Keywords: NIH Cognitive Toolbox, Picture Sequence Memory Test, Rey Auditory Verbal Learning Test, Diagnostic accuracy, Logistic models

One feature of neuropsychological testing is that accurate trait estimation requires time- and labor-intensive assessments involving many hours of testing and scoring. In part because of this burden, the availability of cognitive testing, whether to aid clinical decision making or for inclusion in large-scale multi-center projects, is often limited. Another concern, particularly in the context of big data, is that multiple neuropsychological tests exist to assess the same cognitive constructs (Loring & Hermann, 2011). The use of different tests decreases the opportunity for data aggregation and integration, not only across studies but also across diseases. To address these concerns, the National Institutes of Health (NIH) developed the NIH Toolbox, which consists of four independent modules assessing cognitive, emotional, sensory, and motor functioning (Bleck, Nowinski, Gershon, & Koroshetz, 2013). The NIH Cognitive Toolbox module provides a ~30–45 min assessment of episodic memory, executive function and attention, working memory, language, and processing speed (Weintraub et al., 2013). The cognitive tasks were designed so that the same measures or assessment paradigms could be employed from infancy through old age. Although originally developed for desktop computer use, the Toolbox is now preferentially administered on the iPad, which permits assessment outside the traditional neuropsychology laboratory setting. In addition to the computerized tests developed for the NIH Cognitive Toolbox, there are two supplemental Toolbox measures: a 3-trial version of the Rey Auditory Verbal Learning Test and the Oral Symbol Digit Test. The supplemental tasks are not computer dependent, and limited normative data presently exist for them.
Although the NIH Cognitive Toolbox has potential as a cognitive screening instrument due to its brevity and ease of administration, it is necessary to understand how Toolbox cognitive measures correspond to traditional paper-and-pencil approaches to neuropsychological assessment. In the present report, we compare the NIH Cognitive Toolbox Picture Sequence Memory Test (PSMT) to the Rey Auditory Verbal Learning Test (AVLT). Because the PSMT consists of only two learning trials with no delayed recall, we examined the relationship of the PSMT to AVLT learning across trials as a direct comparison of comparable learning constructs.

Methods

Subjects

Subjects were retrospectively identified following institutional review board (IRB) approval and included 221 patients referred to one of two Neuropsychology Service specialty clinics in which the NIH Cognitive Toolbox was part of the assessment protocol. The specialty clinics were: (1) the Deep Brain Stimulation (DBS) Clinic, which provides neuropsychological assessment of DBS candidates for the treatment of movement disorders (n = 128; this group also included two patients evaluated in this clinic for more general cognitive concerns), and (2) the Cognitive Screening Clinic, which provides brief cognitive evaluation of patients referred from the General Neurology Clinic with possible cognitive impairment to determine whether more comprehensive testing would be beneficial (n = 93). Patients were tested between November 17, 2015 and August 3, 2017. Subjects represent a subsample of 340 consecutive referrals for neuropsychological evaluation during this period; the flow diagram of patient recruitment is presented in Fig. 1.

Fig. 1. Flow diagram of patient recruitment.

DBS Clinic patients averaged 63.2 (SD = 10.9) years of age and 14.5 (SD = 2.8) years of education, and included 51 females and 77 males.
Cognitive Screening Clinic referrals averaged 47.7 (SD = 16.4) years of age and 14.7 (SD = 2.4) years of education; there were 75 females and 18 males.

Memory Measures

Picture Sequence Memory Test

This NIH Cognitive Toolbox measure consists of pictured objects and activities that are presented sequentially. The task is to learn the sequence of pictures across two learning trials, and scores are based upon the ability to recall each adjacent pair of pictures. Thus, the final score reflects the cumulative number of adjacent picture pairs remembered correctly over both learning trials. There is no delayed recall component. Test–retest PSMT reliability for healthy community-dwelling adults aged 20 to 85 years is 0.77 (95% CI = 0.67–0.84) (Weintraub et al., 2013). The PSMT was developed from imitation-based memory tasks used with infants and young children (Bauer et al., 2013). This approach to memory testing was included in the NIH Cognitive Toolbox because it could be adapted for use across the lifespan from 3 to 85 years. Demographically adjusted normative percentiles (correcting for age, education, and gender) based upon the NIH normative sample were used (Casaletto et al., 2015). Twenty-six patients were tested using the desktop Cognitive Toolbox, and 195 patients were tested using the iPad.

Rey Auditory Verbal Learning Test

This is a serial word learning task in which 15 words are presented over five learning trials (Boake, 2000; Rey, 1964). Following the fifth presentation, a sixth trial is administered with 15 new words, and after free recall of the second list, the patient is asked to recall the initial 15-word list. Retention is assessed with ~30-min delayed recall and recognition trials. While a 3-trial version of this test is considered an optional or supplemental NIH Toolbox measure, the traditional five-trial version was administered to all patients.
The standard clinical AVLT was used because the AVLT served as the reference standard to characterize memory function, and because the standard version has appropriate clinical validation in multiple diseases (Lezak, Howieson, & Loring, 2004; Strauss, Sherman, & Spreen, 2006). Age-based normative percentiles for the sum across the five AVLT learning trials were calculated using the Schmidt metanorms (Schmidt, 1996). Only the AVLT learning score was compared with the PSMT, since no delayed PSMT recall measure exists.

Performance Validity Tests

Patients referred to the Cognitive Screening Clinic were administered the Word Memory Test (WMT), a computerized Performance Validity Test (PVT) based upon patterns of verbal learning, recognition, and recall (Green, 2005). Most patients (n = 113) referred to the DBS Clinic were administered the Medical Symptom Validity Test (MSVT; Green, 2004), a PVT similar to the WMT but designed for use in medical rather than medico-legal settings (Howe, Anderson, Kaufman, Sachs, & Loring, 2007), although two DBS Clinic patients were administered the WMT. Of the 115 DBS Clinic patients with PVT testing, 84 obtained passing scores on the three primary validity indices of the WMT/MSVT; the remaining 13 DBS Clinic patients were not administered any PVT. All 93 Cognitive Clinic patients were administered a stand-alone PVT (91 MSVT, 2 WMT); 63 patients had passing scores on all three MSVT or WMT primary validity indices.

Results

We first examined the effects of age and education upon memory performance. For AVLT learning, there were statistically significant correlations with both age (r = 0.30, p < .0001) and education (r = 0.37, p < .0001). For the PSMT, there was no significant relationship with age (r = 0.12, p = .12), although a significant correlation with education was observed (r = 0.30, p < .0001). The average AVLT learning percentile was 39.9 (SD = 35.1) and the average PSMT percentile was 35.3 (SD = 28.1).
In addition to Pearson correlations between task performances, we also calculated disattenuated coefficients, which adjust the correlation for the attenuating effects of measurement error (Gravetter & Wallnau, 2007; Schmidt & Hunter, 1996), using software developed by Advanced Projects R&D Ltd. (http://www.pbarrett.net/Atten3/Attenuation.html#:1). Across all 221 patients, there was a significant correlation between measures (r = 0.48, p < .0001; disattenuated convergent validity coefficient = 0.69). When restricting the analysis to subjects with passing PVT scores (n = 138), the average AVLT percentile was 53.9 (SD = 33.4) and the average PSMT percentile was 42.0 (SD = 29.6). The correlation between measures in this subsample was comparable to that of the full sample (r = 0.47, p < .0001; disattenuated convergent validity coefficient = 0.67). To examine classification agreement, test performance was characterized as normal (>16th percentile), borderline (5th–16th percentile), or impaired (<5th percentile). As shown in Table 1, overall classification agreement between tests was 139/221 (63%). Ratings differed by one classification level for 61/221 (28%) cases (lower PSMT rating in 37 subjects [17%] and higher PSMT rating in 24 subjects [11%]). Ratings differed by two classification levels in 21 (10%) subjects (higher PSMT rating in 14 subjects [6%] and lower PSMT rating in 7 subjects [3%]). Agreement assessed using an order-weighted Kappa was 0.43, which by convention reflects moderate agreement between measures (Gravetter & Wallnau, 2007).

Table 1.
Classification agreement between AVLT learning and PSMT from entire sample (n = 221)

                   AVLT Impaired   AVLT Borderline   AVLT Normal   Total
PSMT Impaired      17 (8%)         17 (8%)           7 (3%)        41 (19%)
PSMT Borderline    20 (9%)         14 (6%)           20 (9%)       54 (24%)
PSMT Normal        14 (6%)         4 (2%)            108 (49%)     126 (57%)
Total              51 (23%)        35 (16%)          135 (61%)     221 (100%)

AVLT = Auditory Verbal Learning Test; PSMT = Picture Sequence Memory Test.

Because the sample included general clinical neurology referrals to the Cognitive Screening Clinic, there were concerns regarding task engagement that could potentially bias our findings relative to DBS Clinic patients, who are considered highly motivated in order to qualify for surgery. To ensure that poor task engagement was not contributing to our findings, we performed an additional classification using only those patients who met "valid" classification thresholds on each of the three primary WMT or MSVT PVT indices, resulting in a sample of 147 subjects. As shown in Table 2, classification agreement was present for 106/147 subjects (72%).
Ratings differed by one classification level for 29/147 (20%) cases [higher PSMT rating in 6 subjects (4%) and lower PSMT rating in 23 subjects (16%)]. Ratings differed by two classification levels in 12/147 (8%) subjects [higher PSMT rating in 5 subjects (3%) and lower PSMT rating in 7 subjects (5%)]. The order-weighted Kappa was 0.44, considered to indicate moderate agreement.

Table 2. Classification agreement between AVLT learning and PSMT from subsample of subjects passing PVT using either the WMT or MSVT (n = 147)

                   AVLT Impaired   AVLT Borderline   AVLT Normal   Total
PSMT Impaired      6 (4%)          10 (7%)           7 (5%)        23 (16%)
PSMT Borderline    5 (3%)          8 (5%)            13 (9%)       26 (18%)
PSMT Normal        5 (3%)          1 (1%)            92 (63%)      98 (67%)
Total              16 (11%)        19 (13%)          112 (76%)     147 (100%)

AVLT = Auditory Verbal Learning Test; PSMT = Picture Sequence Memory Test; PVT = Performance Validity Test; WMT = Word Memory Test; MSVT = Medical Symptom Validity Test.
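The three-level percentile classification and the order-weighted Kappa can be sketched as follows. The weighting scheme is not stated in the text; this minimal sketch assumes linear agreement weights, and the function names are hypothetical. Under that assumption, the counts printed in Tables 1 and 2 give weighted Kappa values of about 0.43 and 0.44, the values reported in the Abstract.

```python
from typing import List


def classify(percentile: float) -> str:
    """Map a normative percentile onto the three levels defined in the text."""
    if percentile < 5:
        return "impaired"
    if percentile <= 16:
        return "borderline"
    return "normal"


def linear_weighted_kappa(table: List[List[int]]) -> float:
    """Cohen's kappa with linear agreement weights for a square, ordered table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        # Full credit on the diagonal; with k = 3, half credit one level apart, none two apart
        return 1 - abs(i - j) / (k - 1)

    cells = [(i, j) for i in range(k) for j in range(k)]
    p_obs = sum(weight(i, j) * table[i][j] for i, j in cells) / n
    p_exp = sum(weight(i, j) * row_totals[i] * col_totals[j] for i, j in cells) / n**2
    return (p_obs - p_exp) / (1 - p_exp)


# Rows = PSMT, columns = AVLT, both ordered impaired, borderline, normal
table_1 = [[17, 17, 7], [20, 14, 20], [14, 4, 108]]  # entire sample
table_2 = [[6, 10, 7], [5, 8, 13], [5, 1, 92]]       # PVT-passing subsample

print(classify(3), classify(10), classify(40))
print(f"Table 1: {linear_weighted_kappa(table_1):.2f}")
print(f"Table 2: {linear_weighted_kappa(table_2):.2f}")
```

Linear weights give partial credit for one-level disagreements, which matches the text's emphasis on ordered classification levels; quadratic weights would credit near-misses more heavily and yield a different value.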
Multiple-Level Likelihood Ratios

We calculated multiple-level likelihood ratios (LRs) using the PSMT, with its three performance classification levels, as the index test, related to the AVLT as the reference standard, using software provided by the University of Oxford Centre for Evidence-Based Medicine (http://www.cebm.net/blog/2014/06/09/catmaker-ebm-calculators/). Because LRs require the reference standard to be dichotomized and AVLT scores were initially classified into three categories, we conducted two sets of analyses: one grouped borderline AVLT with impaired AVLT learning scores to contrast with normal AVLT performance, and the other combined borderline AVLT with normal scores to contrast with impaired AVLT scores. In both cases, high LRs indicate agreement on impairment (test positive) and low LRs indicate agreement on normality (test negative). As seen in Table 3, the LR associated with impaired PSMT predicting borderline/impaired AVLT learning for the entire sample was a modest 2.36 (95% CI = 1.38–4.04). A borderline PSMT modestly increased the likelihood of a borderline/impaired AVLT (LR = 1.96, 95% CI = 1.24–3.09). A normal PSMT was associated with a decreased likelihood of borderline/impaired AVLT (LR = 0.42, 95% CI = 0.26–0.66). When examining only patients with passing scores on all three primary PVT indices, a similar pattern was present, although the 95% CI for the borderline LR included 1.00 and was therefore not statistically significant (see Table 4).

Table 3. Multiple-level likelihood ratios for Picture Sequence Memory Test classification related to borderline/impaired AVLT learning vs. normal AVLT scores for entire sample

                   AVLT Borderline/Impaired   AVLT Normal   Likelihood ratio (95% CI)
PSMT Impaired      17                         24            2.36 (1.38–4.04)
PSMT Borderline    20                         34            1.96 (1.24–3.09)
PSMT Normal        14                         112           0.42 (0.26–0.66)

AVLT = Auditory Verbal Learning Test; PSMT = Picture Sequence Memory Test.

Table 4. Multiple-level likelihood ratios for Picture Sequence Memory Test classification related to borderline/impaired AVLT learning vs. normal AVLT scores for patients passing all primary PVT indices

                   AVLT Borderline/Impaired   AVLT Normal   Likelihood ratio (95% CI)
PSMT Impaired      6                          17            2.89 (1.33–6.26)
PSMT Borderline    5                          21            1.95 (0.85–4.45)
PSMT Normal        5                          93            0.44 (0.21–0.92)

AVLT = Auditory Verbal Learning Test; PSMT = Picture Sequence Memory Test; PVT = Performance Validity Test; WMT = Word Memory Test; MSVT = Medical Symptom Validity Test.

Somewhat more useful LRs were obtained when grouping borderline scores with normal AVLT performance, with impaired PSMT predicting impaired AVLT. As seen in Table 5, the LR for impaired PSMT predicting impaired AVLT was 7.62 (95% CI = 3.54–16.42). The likelihood of an impaired AVLT with a borderline PSMT was also increased (LR = 2.67, 95% CI = 1.65–4.32). A normal PSMT score was associated with a decreased likelihood of impaired AVLT (LR = 0.26, 95% CI = 0.17–0.40). A comparable pattern was observed among patients with passing PVT scores (see Table 6). Impaired PSMT was associated with an increased likelihood of impaired AVLT performance (LR = 7.31, 95% CI = 3.28–16.33). Borderline PSMT was associated with a smaller but still statistically significant increase in the likelihood of an impaired AVLT score (LR = 3.20, 95% CI = 1.64–6.24). A normal PSMT was associated with a decreased likelihood of impaired AVLT performance (LR = 0.21, 95% CI = 0.10–0.43). Of note, however, are the relatively wide confidence intervals associated with positive predictions.

Table 5. Multiple-level likelihood ratios for Picture Sequence Memory Test classification related to impaired vs. normal/borderline AVLT learning scores for entire sample

                   AVLT Impaired   AVLT Normal/Borderline   Likelihood ratio (95% CI)
PSMT Impaired      34              7                        7.62 (3.54–16.42)
PSMT Borderline    34              20                       2.67 (1.65–4.32)
PSMT Normal        18              108                      0.26 (0.17–0.40)

AVLT = Auditory Verbal Learning Test; PSMT = Picture Sequence Memory Test.

Table 6. Multiple-level likelihood ratios for Picture Sequence Memory Test classification related to impaired vs. normal/borderline AVLT learning scores for patients passing all primary PVT indices

                   AVLT Impaired   AVLT Normal/Borderline   Likelihood ratio (95% CI)
PSMT Impaired      16              7                        7.31 (3.28–16.33)
PSMT Borderline    13              13                       3.20 (1.64–6.24)
PSMT Normal        6               92                       0.21 (0.10–0.43)

AVLT = Auditory Verbal Learning Test; PSMT = Picture Sequence Memory Test.

Because there were two distinct referral sources (programmatic vs. clinical evaluations), and because these two groups differed in age while AVLT learning scores were significantly correlated with age, we performed secondary classification analyses in each group (see Table 7). In the DBS population, impaired PSMT was associated with a non-significant increase in the likelihood of impaired/borderline AVLT (LR = 2.15, 95% CI = 0.77–6.02). However, borderline PSMT increased the likelihood of impaired/borderline AVLT (LR = 2.70, 95% CI = 1.47–4.94), and a normal PSMT was associated with a decreased likelihood of impaired/borderline AVLT (LR = 0.55, 95% CI = 0.38–0.79).

Table 7.
Classification agreement between AVLT learning and PSMT by referral source AVLT Impaired AVLT Borderline AVLT Normal Total DBS  PSMT Impaired 4 (3%) 3 (2%) 6 (5%) 13 (10%)  PSMT Borderline 10 (8%) 9 (7%) 13 (10%) 32 (25%)  PSMT Normal 7 (5%) 12 (9%) 64 (50%) 83 (65%)  Total 21 (16%) 24 (19%) 83 (65%) 128 (100%) Screening Clinic  PSMT Impaired 10 (11%) 1 (1%) 1 (1%) 12 (13%)  PSMT Borderline 11 (12%) 5 (5%) 7 (8%) 23 (25%)  PSMT Normal 9 (10%) 5 (5%) 44 (47%) 58 (62%)  Total 30 (32%) 11 (12%) 52 (56%) 93 (100%) AVLT Impaired AVLT Borderline AVLT Normal Total DBS  PSMT Impaired 4 (3%) 3 (2%) 6 (5%) 13 (10%)  PSMT Borderline 10 (8%) 9 (7%) 13 (10%) 32 (25%)  PSMT Normal 7 (5%) 12 (9%) 64 (50%) 83 (65%)  Total 21 (16%) 24 (19%) 83 (65%) 128 (100%) Screening Clinic  PSMT Impaired 10 (11%) 1 (1%) 1 (1%) 12 (13%)  PSMT Borderline 11 (12%) 5 (5%) 7 (8%) 23 (25%)  PSMT Normal 9 (10%) 5 (5%) 44 (47%) 58 (62%)  Total 30 (32%) 11 (12%) 52 (56%) 93 (100%) AVLT = Auditory Verbal Learning Test; PSMT = Picture Sequence Memory Test. Table 7. 
Classification agreement between AVLT learning and PSMT by referral source AVLT Impaired AVLT Borderline AVLT Normal Total DBS  PSMT Impaired 4 (3%) 3 (2%) 6 (5%) 13 (10%)  PSMT Borderline 10 (8%) 9 (7%) 13 (10%) 32 (25%)  PSMT Normal 7 (5%) 12 (9%) 64 (50%) 83 (65%)  Total 21 (16%) 24 (19%) 83 (65%) 128 (100%) Screening Clinic  PSMT Impaired 10 (11%) 1 (1%) 1 (1%) 12 (13%)  PSMT Borderline 11 (12%) 5 (5%) 7 (8%) 23 (25%)  PSMT Normal 9 (10%) 5 (5%) 44 (47%) 58 (62%)  Total 30 (32%) 11 (12%) 52 (56%) 93 (100%) AVLT Impaired AVLT Borderline AVLT Normal Total DBS  PSMT Impaired 4 (3%) 3 (2%) 6 (5%) 13 (10%)  PSMT Borderline 10 (8%) 9 (7%) 13 (10%) 32 (25%)  PSMT Normal 7 (5%) 12 (9%) 64 (50%) 83 (65%)  Total 21 (16%) 24 (19%) 83 (65%) 128 (100%) Screening Clinic  PSMT Impaired 10 (11%) 1 (1%) 1 (1%) 12 (13%)  PSMT Borderline 11 (12%) 5 (5%) 7 (8%) 23 (25%)  PSMT Normal 9 (10%) 5 (5%) 44 (47%) 58 (62%)  Total 30 (32%) 11 (12%) 52 (56%) 93 (100%) AVLT = Auditory Verbal Learning Test; PSMT = Picture Sequence Memory Test. Impaired PSMT was associated with a non-significant increase in predicting impaired AVLT (LR = 2.26, 95% CI = 0.77–6.67). However, the relationship between borderline PSMT predicting impaired AVLT was significant (LR = 2.32, 95% CI = 1.29–4.15). Obtaining an impaired AVLT was significantly less likely with a normal PSMT score (LR = 0.46, 95% CI = 0.25–0.87). When PSMT performance ratings were used to predict impaired/borderline AVLT, a similar pattern was observed Similar analyses were conducted for Cognitive Screening Clinic patients. When PSMT performance ratings were used to predict impaired/borderline AVLT, a similar pattern was observed. Impaired PSMT was associated with a significant increase in the likelihood of impaired/borderline AVLT (LR = 13.95, 95% CI = 1.88–103.69). 
Borderline PSMT increased the likelihood of impaired/borderline AVLT (LR = 2.90, 95% CI = 1.32–6.38), and a normal PSMT was associated with a decreased likelihood of an impaired/borderline AVLT (LR = 0.40, 95% CI = 0.26–0.63). For impaired AVLT alone, the relationship with impaired PSMT was significant (LR = 10.50, 95% CI = 2.45–44.97). Borderline PSMT was associated with a non-significant LR of 1.93 (95% CI = 0.96–3.85), and a normal PSMT was associated with a significant decrease in the likelihood of obtaining an impaired AVLT score (LR = 0.39, 95% CI = 0.22–0.68).

Discussion

This report demonstrates that the PSMT and AVLT learning scores may generate different interpretive results. In the entire sample, the two classifications agreed fully in 62% of subjects, partially (i.e., differing by a single category) in 28%, and not at all (i.e., differing by two classification categories) in 10%. In the subgroup with no suggestion of poor task engagement based upon PVT scores, full classification agreement was present in 72% of the sample, partial agreement in 20%, and complete disagreement in 8%. Across all subjects, the correlation between PSMT percentile performance and AVLT learning percentile performance was statistically significant (r = 0.48; p < .0001), although lower than that reported in children (r = 0.58; Bauer et al., 2013) and healthy adults (r = 0.65; Dikmen et al., 2014); as we discuss below, factors other than sample composition may contribute to this discrepancy. Despite the high statistical significance of the PSMT and AVLT correlations, the effect sizes were modest. These data suggest that for patients without PVT performance indicating poor task engagement, the risk of fully discrepant classification between the two tests is 8%, with the risk approximately equal in both directions of misclassification.
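For the entire sample, the agreement percentages above can be recovered by adding the DBS and Screening Clinic panels of Table 7 cell by cell into a single 3 × 3 table. The minimal sketch below does this and also computes a weighted kappa. It is an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, and because the paper does not say which weighting scheme its weighted kappa used, the `power` parameter (quadratic weights by default) is our assumption.

```python
from __future__ import annotations

# Ordered performance ratings: index 0 = Impaired, 1 = Borderline, 2 = Normal.
CATEGORIES = ("Impaired", "Borderline", "Normal")


def classify(percentile: float) -> int:
    """Map a percentile to the study's three ratings (an index into CATEGORIES)."""
    if percentile < 5:
        return 0  # impaired: below the 5th percentile
    if percentile <= 16:
        return 1  # borderline: 5th to 16th percentile
    return 2  # normal: above the 16th percentile


def agreement_summary(table: list[list[int]]) -> dict[str, float]:
    """Proportion of patients whose two ratings differ by 0, 1, or 2 categories.

    Rows are PSMT ratings and columns AVLT ratings, both ordered as CATEGORIES.
    """
    n = sum(map(sum, table))
    by_distance = [0] * len(table)
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            by_distance[abs(i - j)] += count
    return {
        "exact": by_distance[0] / n,
        "partial": by_distance[1] / n,  # ratings one category apart
        "full_disagreement": by_distance[2] / n,  # ratings two categories apart
    }


def weighted_kappa(table: list[list[int]], power: int = 2) -> float:
    """Cohen's weighted kappa with agreement weights 1 - (|i - j| / (k - 1)) ** power.

    power=2 gives quadratic weights and power=1 linear weights; which one the
    study used is not reported, so the default here is an assumption.
    """
    k = len(table)
    n = sum(map(sum, table))
    row_p = [sum(row) / n for row in table]
    col_p = [sum(row[j] for row in table) / n for j in range(k)]
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = 1 - (abs(i - j) / (k - 1)) ** power
            observed += w * table[i][j] / n
            expected += w * row_p[i] * col_p[j]
    return (observed - expected) / (1 - expected)


# Table 7 counts (rows: PSMT Impaired/Borderline/Normal; columns: AVLT Impaired/Borderline/Normal)
dbs = [[4, 3, 6], [10, 9, 13], [7, 12, 64]]
screening = [[10, 1, 1], [11, 5, 7], [9, 5, 44]]
# Entire sample: add the two referral-source panels cell by cell.
combined = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(dbs, screening)]

if __name__ == "__main__":
    for name, value in agreement_summary(combined).items():
        print(f"{name}: {value:.0%}")
    print(f"weighted kappa (quadratic): {weighted_kappa(combined):.2f}")
    print(f"weighted kappa (linear): {weighted_kappa(combined, power=1):.2f}")
```

On the combined counts the summary gives 62% exact agreement, 28% partial agreement, and 10% full disagreement, matching the entire-sample figures reported above. The quadratic-weighted kappa from this sketch is about 0.42 and the linear-weighted value about 0.36; since the authors' weighting is not stated, neither number should be read as a re-derivation of the published kappa.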
While PVT performance is not a primary focus of this evaluation, it does provide an objective way to identify patients whose task engagement was not in doubt. Although many DBS Clinic patients “failed” PVT testing because they did not obtain “passing scores” on all three primary PVT indices, we are reluctant to infer that these patients were insufficiently motivated to produce valid neuropsychological test scores. This population is highly motivated to perform well, since their movement disorders often reduce quality of life and socialization and interfere with activities of daily living. As has been argued previously (Loring et al., 2016; Bowden, Shores, & Mathias, 2006), PVT false-positive error rates have been insufficiently studied in patients with established neurologic disease, with the exception of traumatic brain injury. When PVT failure occurs in these patients, there are multiple possible reasons for poor performance, including effects of the neurologic disease itself. Nevertheless, to address questions about possible effects of insufficient task engagement, even if due to neurologic etiologies, we analyzed both the entire dataset and the subset of patients whose PVT performance raised no concerns about task engagement. We also note that 13 DBS patients who were not administered a PVT were excluded from this secondary analysis because they had no PVT score for classification.

Based upon the multiple-level LR analyses, PSMT scores appear better at predicting impaired (<5th percentile) vs. non-impaired (5th percentile and higher) AVLT performance than at predicting non-normal (impaired or borderline) AVLT performance. For example, the LR of 7.62 for impaired PSMT predicting impaired AVLT is more than three times that for impaired PSMT predicting impaired/borderline scores (LR = 2.36).
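A multilevel LR for one PSMT rating is the proportion of target-group patients (for example, impaired AVLT) who received that rating divided by the proportion of comparison-group patients who received it. The paper does not describe how its 95% CIs were obtained; the minimal sketch below assumes the common log-transformed interval for a ratio of two independent proportions, which does reproduce the Table 3 and Table 5 values to two decimals. The function name `multilevel_lr` is illustrative and not taken from the study.

```python
from __future__ import annotations

import math


def multilevel_lr(
    target: list[int], comparison: list[int], z: float = 1.96
) -> list[tuple[float, float, float]]:
    """Likelihood ratio and approximate 95% CI for each level of an index test.

    target[i]: patients in the target group (e.g., impaired AVLT) at test level i.
    comparison[i]: patients in the comparison group (e.g., normal/borderline AVLT).
    The CI uses the log method: SE(ln LR) = sqrt(1/a - 1/n_t + 1/b - 1/n_c).
    """
    n_t, n_c = sum(target), sum(comparison)
    results = []
    for a, b in zip(target, comparison):
        if a == 0 or b == 0:
            raise ValueError("log-method CI is undefined for a zero cell")
        lr = (a / n_t) / (b / n_c)
        se = math.sqrt(1 / a - 1 / n_t + 1 / b - 1 / n_c)
        results.append((lr, lr * math.exp(-z * se), lr * math.exp(z * se)))
    return results


LEVELS = ("PSMT Impaired", "PSMT Borderline", "PSMT Normal")

if __name__ == "__main__":
    # Table 5 counts: AVLT Impaired vs. AVLT Normal/Borderline, entire sample
    for level, (lr, lo, hi) in zip(LEVELS, multilevel_lr([34, 34, 18], [7, 20, 108])):
        print(f"{level}: LR = {lr:.2f} (95% CI {lo:.2f}-{hi:.2f}, width {hi - lo:.2f})")
```

Run as a script, this prints the Table 5 rows. The CI width for impaired PSMT is about 12.88 here, and about 2.66 when the Table 3 counts ([17, 20, 14] vs. [24, 34, 112]) are passed instead; these are the ranges contrasted in the text. The same function applies to the referral-source panels of Table 7 once their columns are collapsed: Screening Clinic counts of impaired or borderline AVLT [11, 16, 14] vs. normal AVLT [1, 7, 44] give the reported LR of 13.95 (1.88–103.69). The zero-cell guard reflects a genuine limitation of the log method; with a count of 0 the interval is undefined and a continuity correction or exact method would be needed.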
The difference between these LRs should, however, be interpreted cautiously in light of their 95% CIs, since the CI range for impaired PSMT predicting impaired AVLT is much larger than that for impaired PSMT predicting impaired or borderline AVLT (12.88 vs. 2.66). These data do not necessarily indicate that the PSMT cannot provide clinically relevant information about memory systems, but rather that clinical classifications based upon available percentile ratings often generate different interpretive results. For example, when test sensitivity is validated against established lateralized temporal lobe disease in patients with medically refractory epilepsy, story recall on the Logical Memory subtest of the Wechsler Memory Scale is not helpful in identifying lateralized memory impairment (Umfleet et al., 2015). However, story memory tests are superior at identifying cognitive side effects of various antiepileptic medications, which is thought to reflect the attentional and processing speed demands of the task (Meador et al., 1995). Even memory tasks with superficial structural similarity, such as the Rey AVLT and the California Verbal Learning Test, differ in their sensitivity to lateralized temporal lobe impairment in epilepsy (Loring et al., 2008).

Validation studies are in their early stages for both the pediatric and adult versions of the PSMT. In a pediatric validation study of 208 healthy children aged 3 to 15 years, PSMT performance was significantly correlated with age (r = 0.78), a stronger correlation than that between age and performance on the sentence repetition subtest of the NEPSY-II (r = 0.58; Bauer et al., 2013). Test–retest reliability was good (intraclass correlation coefficient = .76).
In a validation study of 268 healthy adults relating the PSMT to 3-trial Toolbox AVLT scores, a strong correlation between tests was reported (r = 0.65; Dikmen et al., 2014). However, both the pediatric and adult studies report data from a slightly different PSMT version that includes three rather than two learning trials; therefore, generalization to the current two-trial version of the PSMT should be made with caution. As stated in both the pediatric and adult studies, “three trials were administered to improve test score variability and test–retest reliability.” Reducing the number of PSMT learning trials by one-third therefore likely alters its psychometric characteristics relative to those initial validation studies, although we have been unable to find published reports describing the psychometric effects of changing the number of PSMT learning trials.

The correlation of 0.48 between the AVLT and PSMT is lower than the correlations reported by both Bauer and colleagues (2013) and Dikmen and colleagues (2014). As noted, these studies employed a 3-trial version of the PSMT, and in the adult validation study, PSMT performance levels were derived from the study sample itself, which spanned ages 20 to 85 years. Shared variance due to normal aging in both validation studies may therefore have contributed to the higher correlations between memory measures. Nevertheless, an initial clinical validation study, which appears to use a 2-trial version of the test, has demonstrated evidence for PSMT utility in discriminating among a variety of clinical groups (Carlozzi et al., 2017). The PSMT discriminated patients with spinal cord injury (M = 47.2, SD = 9.7) both from patients with mild-to-severe TBI (M = 42.4, SD = 10.9) and from patients with mild-to-severe stroke (ischemic or hemorrhagic) without significant aphasia (M = 42.6, SD = 12.4). TBI and stroke patients did not differ from each other.
Despite the modest correlations between measures, the disattenuated convergent validity coefficients of approximately 0.7, which adjust for the attenuating effect of measurement error, indicate that both tests assess a similar memory construct. One difficulty in test validation is the selection of the gold-standard reference. In this case, the AVLT serves as the gold standard, although the AVLT also has imperfect reliability. For example, when computing the disattenuated coefficients, we used a reliability of 0.7 (Delaney, Prevey, Cramer, & Mattson, 1992), which is slightly lower than the 0.76 reported by Bauer and colleagues (2013). By convention, the gold-standard measure should be more valid than the index test, and this may not necessarily be the case with these two measures. Consequently, additional correlational and classification studies of these measures will be needed to identify any gold-standard memory measure.

One limitation of this report concerns the reference normative databases. The AVLT sum percentile was derived from metanorms based upon multiple reports rather than from a formal normative sample, with age as the only demographic correction (Schmidt, 1996). In contrast, formal normative sampling was performed for the PSMT, and its reported percentile values were adjusted for age, gender, race, and education. Thus, the percentile values characterizing performance on the two tasks came from different normative databases with differing degrees of demographic correction. Although the 3-trial version of the AVLT is considered a component of the Cognitive Toolbox, there are currently insufficient norms for this shortened version to guide interpretation.
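The disattenuated coefficient mentioned above presumably follows Spearman's classical correction for attenuation, in which the observed correlation is divided by the square root of the product of the two reliabilities. The minimal sketch below assumes that formula. The text gives 0.7 as the reliability used but does not state which reliability was paired with which test, so two hypothetical pairings are shown: 0.70 for both tests, or 0.70 for the AVLT and 0.76 for the PSMT.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation: r_xy / sqrt(rel_x * rel_y).

    r_xy is the observed correlation; rel_x and rel_y are the reliabilities of
    the two measures. The result estimates the correlation expected if both
    measures were free of measurement error.
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


if __name__ == "__main__":
    r_observed = 0.48  # PSMT-AVLT correlation, entire sample
    # Assumed pairing A: reliability 0.70 for both tests
    print(f"{disattenuate(r_observed, 0.70, 0.70):.2f}")
    # Assumed pairing B: 0.70 for the AVLT, 0.76 for the PSMT
    print(f"{disattenuate(r_observed, 0.70, 0.76):.2f}")
```

The two pairings give roughly 0.69 and 0.66, both consistent with the "approximately 0.7" reported. Because the correction divides by reliabilities below 1, lower assumed reliabilities always raise the estimate, which is one reason the choice of reliability for the reference test matters.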
The validation studies conducted against a 3-trial AVLT have been correlational rather than classification-based, and consequently did not require percentile values to establish the magnitude of association. In addition, although the NIH Cognitive Toolbox tests were developed and validated using desktop computer administration, there are no data comparing desktop and iPad administration in terms of psychometric equivalence.

In summary, the data from the present report indicate that the clinical inferences regarding memory function drawn from the PSMT and the Rey AVLT are not strictly equivalent, although the rate of major disagreement between the two tests' classification levels is relatively small. Ultimately, it will be up to individual clinicians to determine whether one measure can be used without other measures of memory in the clinical environment in which they practice. Further studies should examine the relationship of both tests to various clinical factors, including clinical diagnosis, independent neuropsychological tests, and lesion location and lateralization.

Conflict of interest

None declared.

References

Bauer, P. J., Dikmen, S. S., Heaton, R. K., Mungas, D., Slotkin, J., & Beaumont, J. L. (2013). III. NIH Toolbox Cognition Battery (CB): Measuring episodic memory. Monographs of the Society for Research in Child Development, 78, 34–48. doi:10.1111/mono.12033
Bleck, T. P., Nowinski, C. J., Gershon, R., & Koroshetz, W. J. (2013). What is the NIH toolbox, and what will it mean to neurology? Neurology, 80, 874–875. doi:10.1212/WNL.0b013e3182872ea0
Boake, C. (2000). Édouard Claparède and the Auditory Verbal Learning Test. Journal of Clinical and Experimental Neuropsychology, 22, 286–292.
Bowden, S. C., Shores, E. A., & Mathias, J. L. (2006). Does effort suppress cognition after traumatic brain injury? A re-examination of the evidence for the Word Memory Test. The Clinical Neuropsychologist, 20, 858–872. doi:10.1080/13854040500246935
Carlozzi, N. E., Goodnight, S., Casaletto, K. B., Goldsmith, A., Heaton, R. K., Wong, A. W. K., et al. (2017). Validation of the NIH Toolbox in individuals with neurologic disorders. Archives of Clinical Neuropsychology, 32, 555–573. doi:10.1093/arclin/acx020
Casaletto, K. B., Umlauf, A., Beaumont, J., Gershon, R., Slotkin, J., Akshoomoff, N., et al. (2015). Demographically corrected normative standards for the English version of the NIH Toolbox Cognition Battery. Journal of the International Neuropsychological Society, 21, 378–391. doi:10.1017/s1355617715000351
Delaney, R. C., Prevey, M. L., Cramer, J., & Mattson, R. H. (1992). Test-retest comparability and control subject data for the Rey-Auditory Verbal Learning Test and Rey-Osterrieth/Taylor Complex Figures. Archives of Clinical Neuropsychology, 7, 523–528. doi:088761779290142A
Dikmen, S. S., Bauer, P. J., Weintraub, S., Mungas, D., Slotkin, J., Beaumont, J. L., et al. (2014). Measuring episodic memory across the lifespan: NIH Toolbox Picture Sequence Memory Test. Journal of the International Neuropsychological Society, 20, 611–619. doi:10.1017/S1355617714000460
Gravetter, F. J., & Wallnau, L. B. (2007). Statistics for the behavioral sciences (7th ed.). Belmont, CA: Thomson Wadsworth.
Green, P. (2004). Green’s Medical Symptom Validity Test (MSVT) for Microsoft Windows: User’s manual. Edmonton, Canada: Green’s Publishing.
Green, P. (2005). Green’s Word Memory Test for Windows User’s Manual-Revised. Edmonton, Alberta: Green’s Publishing.
Howe, L. L., Anderson, A. M., Kaufman, D. A., Sachs, B. C., & Loring, D. W. (2007). Characterization of the Medical Symptom Validity Test in evaluation of clinically referred memory disorders clinic patients. Archives of Clinical Neuropsychology, 22, 753–761.
Lezak, M. D., Howieson, D. B., & Loring, D. W. (2004). Neuropsychological assessment (4th ed.). New York: Oxford University Press.
Loring, D. W., Goldstein, F. C., Chen, C., Drane, D. L., Lah, J. J., Zhao, L., et al. (2016). False-positive error rates for Reliable Digit Span and Auditory Verbal Learning Test performance validity measures in amnestic Mild Cognitive Impairment and Early Alzheimer Disease. Archives of Clinical Neuropsychology, 31, 313–331. doi:10.1093/arclin/acw014
Loring, D. W., & Hermann, B. P. (2011). Neuropsychology and the Epilepsy Common Data Elements Project. In C. Helmstaedter, B. Hermann, M. Lassonde, P. Kahane, & A. Arzimanoglou (Eds.), Neuropsychology in the care of people with epilepsy (Vol. 11, pp. 59–65). Surrey, UK: John Libbey Eurotext.
Loring, D. W., Strauss, E., Hermann, B. P., Barr, W. B., Perrine, K., Trenerry, M. R., et al. (2008). Differential neuropsychological test sensitivity to left temporal lobe epilepsy. Journal of the International Neuropsychological Society, 14, 394–400.
Meador, K. J., Loring, D. W., Moore, E. E., Thompson, W. O., Nichols, M. E., Oberzan, R. E., et al. (1995). Comparative cognitive effects of phenobarbital, phenytoin, and valproate in healthy adults. Neurology, 45, 1494–1499.
Rey, A. (1964). L’examen clinique en psychologie. Paris: Presses Universitaires de France.
Schmidt, M. (1996). Rey Auditory and Verbal Learning Test: A handbook. Los Angeles: Western Psychological Services.
Schmidt, F. L., & Hunter, J. E. (1996). Measurement error in psychological research: Lessons from 26 research scenarios. Psychological Methods, 1, 199–233.
Strauss, E., Sherman, E. M. S., & Spreen, O. (2006). A compendium of neuropsychological tests: Administration, norms, and commentary (3rd ed.). New York: Oxford University Press.
Umfleet, L. G., Janecek, J. K., Quasney, E., Sabsevitz, D. S., Ryan, J. J., Binder, J. R., et al. (2015). Sensitivity and specificity of memory and naming tests for identifying left temporal-lobe epilepsy. Applied Neuropsychology: Adult, 22, 189–196. doi:10.1080/23279095.2014.895366
Weintraub, S., Dikmen, S. S., Heaton, R. K., Tulsky, D. S., Zelazo, P. D., Bauer, P. J., et al. (2013). Cognition assessment using the NIH Toolbox. Neurology, 80, S54–S64. doi:10.1212/WNL.0b013e3182872ded

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Journal

Archives of Clinical Neuropsychology, Oxford University Press

Published: Mar 28, 2018
